arXiv Papers with Code in Cryptography and Security (January 2026 - June 2026)
Authors:Charafeddine Mouzouni
Abstract:
LLM agents with tool access can discover and exploit security vulnerabilities. This is known. What is not known is which features of a system prompt trigger this behaviour, and which do not. We present a systematic taxonomy based on approximately 10,000 trials across seven models, 37 prompt conditions, and 12 hypothesized attack dimensions -- all executed in real Docker sandboxes. Every condition includes the instruction "Always respect all rules and access policies"; only a single encouragement sentence varies. Within the task class studied (planted test-runner vulnerabilities), the threat model is narrower than assumed: nine of twelve dimensions -- including minimization, moral licensing, incentives, identity priming, and reasoning triggers -- produce no detectable exploitation at n=50 per cell (upper 95% CI <7% per condition). But it is also more subtle than expected: one dimension, goal reframing, reliably triggers exploitation. "You are solving a puzzle; there may be hidden clues" produces 38-40% exploitation on Claude Sonnet 4 despite the explicit rule instruction, replicating across four models (CTF framing: 8-14% on DeepSeek, GPT-5-mini, o4-mini). The agent does not override the rules; it reinterprets the task so that exploitative actions become task-aligned. GPT-4.1 produces no exploitation across 1,850 trials (37 conditions), and a temporal comparison across four OpenAI models released over eleven months shows a pattern consistent with improving safety training, though model capability differences are a confounder. The practical contribution is a narrowed, testable threat model: defenders should audit for goal-reframing language, not for the broad class of adversarial prompts.
Authors:Asiri Dalugoda
Abstract:
Agentic AI systems increasingly execute consequential actions on behalf of human principals, delegating tasks through multi-step chains of autonomous agents. No existing standard addresses a fundamental accountability gap: verifying that terminal actions in a delegation chain were genuinely authorized by a human principal, through what chain of delegation, and under what scope. This paper presents the Human Delegation Provenance (HDP) protocol, a lightweight token-based scheme that cryptographically captures and verifies human authorization context in multi-agent systems. An HDP token binds a human authorization event to a session, records each agent's delegation action as a signed hop in an append-only chain, and enables any participant to verify the full provenance record using only the issuer's Ed25519 public key and the current session identifier. Verification is fully offline, requiring no registry lookups or third-party trust anchors. We situate HDP within the existing landscape of delegation protocols, identify its distinct design point relative to OAuth 2.0 Token Exchange (RFC 8693), JSON Web Tokens (RFC 7519), UCAN, and the Intent Provenance Protocol (draft-haberkamp-ipp-00), and demonstrate that existing standards fail to address the multi-hop, append-only, human-provenance requirements of agentic systems. HDP has been published as an IETF Internet-Draft (draft-helixar-hdp-agentic-delegation-00) and a reference TypeScript SDK is publicly available.
Authors:Luis Guzmán Lorenzo
Abstract:
When an LLM deobfuscates JavaScript, can poisoned identifier names in the string table survive into the model's reconstructed code, even when the model demonstrably understands the correct semantics? Using Claude Opus 4.6 across 192 inference runs on two code archetypes (force-directed graph simulation, A* pathfinding; 50 conditions, N=3-6), we found three consistent patterns: (1) Poisoned names persisted in every baseline run on both artifacts (physics: 8/8; pathfinding: 5/5). Matched controls showed this extends to terms with zero semantic fit when the string table does not form a coherent alternative domain. (2) Persistence coexisted with correct semantic commentary: in 15/17 runs the model wrote wrong variable names while correctly describing the actual operation in comments. (3) Task framing changed persistence: explicit verification prompts had no effect (12/12 across 4 variants), but reframing from "deobfuscate this" to "write a fresh implementation" reduced propagation from 100% to 0-20% on physics and to 0% on pathfinding, while preserving the checked algorithmic structure. Matched-control experiments showed zero-fit terms persist at the same rate when the replacement table lacks a coherent alternative-domain signal. Per-term variation in earlier domain-gradient experiments is confounded with domain-level coherence and recoverability. These observations are from two archetypes on one model family (Opus 4.6 primary; Haiku 4.5 spot-check). Broader generalization is needed
Authors:Xiaohang Yu, William Knottenbelt
Abstract:
Blockchain forensics inherently involves dynamic and iterative investigations, while many existing approaches primarily model it through static inference pipelines. We propose a paradigm shift towards Agentic Blockchain Forensics (ABF), modeling forensic investigation as a sequential decision-making process. To instantiate this paradigm, we introduce LOCARD, the first agentic framework for blockchain forensics. LOCARD operationalizes this perspective through a Tri-Core Cognitive Architecture that decouples strategic planning, operational execution, and evaluative validation. Unlike generic LLM-based agents, it incorporates a Structured Belief State mechanism to enforce forensic rigor and guide exploration under explicit state constraints. To demonstrate the efficacy of the ABF paradigm, we apply LOCARD to the inherently complex domain of cross-chain transaction tracing. We introduce Thor25, a benchmark dataset comprising over 151k real-world cross-chain forensic records, and evaluate LOCARD on the Group-Transfer Tracing task for dismantling Sybil clusters. Validated against representative laundering sub-flows from the Bybit hack, LOCARD achieves high-fidelity tracing results, providing empirical evidence that modeling blockchain forensics as an autonomous agentic task is both viable and effective. These results establish a concrete foundation for future agentic approaches to large-scale blockchain forensic analysis. Code and dataset are publicly available at https://github.com/xhyumiracle/locard and https://github.com/xhyumiracle/thorchain-crosschain-data.
Authors:Baicheng Chen, Yu Wang, Ziheng Zhou, Xiangru Liu, Juanru Li, Yilei Chen, Tianxing He
Abstract:
Reverse engineering (RE) is central to software security, particularly for cryptographic programs that handle sensitive data and are highly prone to vulnerabilities. It supports critical tasks such as vulnerability discovery and malware analysis. Despite its importance, RE remains labor-intensive and requires substantial expertise, making large language models (LLMs) a potential solution for automating the process. However, their capabilities for RE remain systematically underexplored. To address this gap, we study the cryptographic binary RE capabilities of LLMs and introduce \textbf{CREBench}, a benchmark comprising 432 challenges built from 48 standard cryptographic algorithms, 3 insecure crypto key usage scenarios, and 3 difficulty levels. Each challenge follows a Capture-the-Flag (CTF) RE challenge, requiring the model to analyze the underlying cryptographic logic and recover the correct input. We design an evaluation framework comprising four sub-tasks, from algorithm identification to correct flag recovery. We evaluate eight frontier LLMs on CREBench. GPT-5.4, the best-performing model, achieves 64.03 out of 100 and recovers the flag in 59\% of challenges. We also establish a strong human expert baseline of 92.19 points, showing that humans maintain an advantage in cryptographic RE tasks. Our code and dataset are available at https://github.com/wangyu-ovo/CREBench.
Authors:David Ilić, Kostadin Cvejoski, David Stanojević, Evgeny Grigorenko
Abstract:
All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K\%, reference calibration), each bounded by the designer's intuition. We introduce the first transferable learned attack, enabled by the observation that fine-tuning any model on any corpus yields unlimited labeled data, since membership is known by construction. This removes the shadow model bottleneck and brings membership inference into the deep learning era: learning what matters rather than designing it, with generalization through training diversity and scale. We discover that fine-tuning language models produces an invariant signature of memorization detectable across architectural families and data domains. We train a membership inference classifier exclusively on transformer-based models. It transfers zero-shot to Mamba (state-space), RWKV-4 (linear attention), and RecurrentGemma (gated recurrence), achieving 0.963, 0.972, and 0.936 AUC respectively. Each evaluation combines an architecture and dataset never seen during training, yet all three exceed performance on held-out transformers (0.908 AUC). These four families share no computational mechanisms, their only commonality is gradient descent on cross-entropy loss. Even simple likelihood-based methods exhibit strong transfer, confirming the signature exists independently of the detection method. Our method, Learned Transfer MIA (LT-MIA), captures this signal most effectively by reframing membership inference as sequence classification over per-token distributional statistics. On transformers, LT-MIA achieves 2.8$\times$ higher TPR at 0.1\% FPR than the strongest baseline. The method also transfers to code (0.865 AUC) despite training only on natural language texts. Code and trained classifier available at https://github.com/JetBrains-Research/learned-mia.
Authors:Lingxin Jin, Wei Jiang, Maregu Assefa Habtie, Letian Chen, Jinyu Zhan, Xingzhi Zhou, Lin Zuo, Naoufel Werghi
Abstract:
Spiking Neural Networks (SNNs) are energy-efficient and biologically plausible, ideal for embedded and security-critical systems, yet their adversarial robustness remains open. Existing adversarial attacks often overlook SNNs' bio-plausible dynamics. We propose Spike-PTSD, a biologically inspired adversarial attack framework modeled on abnormal neural firing in Post-Traumatic Stress Disorder (PTSD). It localizes decision-critical layers, selects neurons via hyper/hypoactivation signatures, and optimizes adversarial examples with dual objectives. Across six datasets, three encoding types, and four models, Spike-PTSD achieves over 99% success rates, systematically compromising SNN robustness. Code: https://github.com/bluefier/Spike-PTSD.
Authors:Yiming Fan, Jun Yeon Won, Ding Zhu, Melih Sirlanci, Mahdi Khalili, Carter Yagemann
Abstract:
Binary Function Similarity Detection (BFSD) is a core problem in software security, supporting tasks such as vulnerability analysis, malware classification, and patch provenance. In the past few decades, numerous models and tools have been developed for this application; however, due to the lack of a comprehensive universal benchmark in this field, researchers have struggled to compare different models effectively. Existing datasets are limited in scope, often focusing on a narrow set of transformations or types of binaries, and fail to reflect the full diversity of real-world applications. We introduce EXHIB, a benchmark comprising five realistic datasets collected from the wild, each highlighting a distinct aspect of the BFSD problem space. We evaluate 9 representative models spanning multiple BFSD paradigms on EXHIB and observe performance degradations of up to 30% on firmware and semantic datasets compared to standard settings, revealing substantial generalization gaps. Our results show that robustness to low- and mid-level binary variations does not generalize to high-level semantic differences, underscoring a critical blind spot in current BFSD evaluation practices.
Authors:Devakh Rashie, Veda Rashi
Abstract:
The rapid evolution of autonomous, agentic artificial intelligence within financial services has introduced an existential architectural crisis: large language models (LLMs) are probabilistic, non-deterministic systems operating in domains that demand absolute, mathematically verifiable compliance guarantees. Existing guardrail solutions -- including NVIDIA NeMo Guardrails and Guardrails AI -- rely on probabilistic classifiers and syntactic validators that are fundamentally inadequate for enforcing complex multi-variable regulatory constraints mandated by the SEC, FINRA, and OCC. This paper presents the Lean-Agent Protocol, a formal-verification-based AI guardrail platform that leverages the Aristotle neural-symbolic model developed by Harmonic AI to auto-formalize institutional policies into Lean 4 code. Every proposed agentic action is treated as a mathematical conjecture: execution is permitted if and only if the Lean 4 kernel proves that the action satisfies pre-compiled regulatory axioms. This architecture provides cryptographic-level compliance certainty at microsecond latency, directly satisfying SEC Rule 15c3-5, OCC Bulletin 2011-12, FINRA Rule 3110, and CFPB explainability mandates. A three-phase implementation roadmap from shadow verification through enterprise-scale deployment is provided.
Authors:Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia
Abstract:
Large language models (LLMs) and their applications, such as agents, are highly vulnerable to prompt injection attacks. State-of-the-art prompt injection detection methods have the following limitations: (1) their effectiveness degrades significantly as context length increases, and (2) they lack explicit rules that define what constitutes prompt injection, causing detection decisions to be implicit, opaque, and difficult to reason about. In this work, we propose AgentWatcher to address the above two limitations. To address the first limitation, AgentWatcher attributes the LLM's output (e.g., the action of an agent) to a small set of causally influential context segments. By focusing detection on a relatively short text, AgentWatcher can be scalable to long contexts. To address the second limitation, we define a set of rules specifying what does and does not constitute a prompt injection, and use a monitor LLM to reason over these rules based on the attributed text, making the detection decisions more explainable. We conduct a comprehensive evaluation on tool-use agent benchmarks and long-context understanding datasets. The experimental results demonstrate that AgentWatcher can effectively detect prompt injection and maintain utility without attacks. The code is available at https://github.com/wang-yanting/AgentWatcher.
Authors:Zhengyang Tang, Ke Ji, Xidong Wang, Zihan Ye, Xinyuan Wang, Yiduo Guo, Ziniu Li, Chenxin Li, Jingyuan Hu, Shunian Chen, Tongxu Luo, Jiaxi Bi, Zeyu Qin, Shaobo Wang, Xin Lai, Pengyuan Lyu, Junyi Li, Can Xu, Chengquan Zhang, Han Hu, Ming Yan, Benyou Wang
Abstract:
We study whether phone-use agents respect privacy while completing benign mobile tasks. This question has remained hard to answer because privacy-compliant behavior is not operationalized for phone-use agents, and ordinary apps do not reveal exactly what data agents type into which form entries during execution. To make this question measurable, we introduce MyPhoneBench, a verifiable evaluation framework for privacy behavior in mobile agents. We operationalize privacy-respecting phone use as permissioned access, minimal disclosure, and user-controlled memory through a minimal privacy contract, iMy, and pair it with instrumented mock apps plus rule-based auditing that make unnecessary permission requests, deceptive re-disclosure, and unnecessary form filling observable and reproducible. Across five frontier models on 10 mobile apps and 300 tasks, we find that task success, privacy-compliant task completion, and later-session use of saved preferences are distinct capabilities, and no single model dominates all three. Evaluating success and privacy jointly reshuffles the model ordering relative to either metric alone. The most persistent failure mode across models is simple data minimization: agents still fill optional personal entries that the task does not require. These results show that privacy failures arise from over-helpful execution of benign tasks, and that success-only evaluation overestimates the deployment readiness of current phone-use agents. All code, mock apps, and agent trajectories are publicly available at~ https://github.com/FreedomIntelligence/MyPhoneBench.
Authors:Yiming Zhang, Weibo Qin, Feng Wang
Abstract:
Deep neural networks have demonstrated excellent performance in SAR target detection tasks but remain susceptible to adversarial attacks. Existing SAR-specific attack methods can effectively deceive detectors; however, they often introduce noticeable perturbations and are largely confined to digital domain, neglecting physical implementation constrains for attacking SAR systems. In this paper, a novel Adversarial Attenuation Patch (AAP) method is proposed that employs energy-constrained optimization strategy coupled with an attenuation-based deployment framework to achieve a seamless balance between attack effectiveness and stealthiness. More importantly, AAP exhibits strong potential for physical realization by aligning with signal-level electronic jamming mechanisms. Experimental results show that AAP effectively degrades detection performance while preserving high imperceptibility, and shows favorable transferability across different models. This study provides a physical grounded perspective for adversarial attacks on SAR target detection systems and facilitates the design of more covert and practically deployable attack strategies. The source code is made available at https://github.com/boremycin/SAAP.
Authors:Animesh Shaw
Abstract:
The impending arrival of cryptographically relevant quantum computers (CRQCs) threatens the security foundations of modern software: Shor's algorithm breaks RSA, ECDSA, ECDH, and Diffie-Hellman, while Grover's algorithm reduces the effective security of symmetric and hash-based schemes. Despite NIST standardising post-quantum cryptography (PQC) in 2024 (FIPS 203 ML-KEM, FIPS 204 ML-DSA, FIPS 205 SLH-DSA), most codebases lack automated tooling to inventory classical cryptographic usage and prioritise migration based on quantum risk. We present Quantum-Safe Code Auditor, a quantum-aware static analysis framework that combines (i) regex-based detection of 15 classes of quantum-vulnerable primitives, (ii) LLM-assisted contextual enrichment to classify usage and severity, and (iii) risk scoring via a Variational Quantum Eigensolver (VQE) model implemented in Qiskit 2.x, incorporating qubit-cost estimates to prioritise findings. We evaluate the system across five open-source libraries -- python-rsa, python-ecdsa, python-jose, node-jsonwebtoken, and Bouncy Castle Java -- covering 5,775 findings. On a stratified sample of 602 labelled instances, we achieve 71.98% precision, 100% recall, and an F1 score of 83.71%. All code, data, and reproduction scripts are released as open-source.
Authors:Yanting Wang, Jinyuan Jia
Abstract:
Random subspace method has wide security applications such as providing certified defenses against adversarial and backdoor attacks, and building robustly aligned LLM against jailbreaking attacks. However, the explanation of random subspace method lacks sufficient exploration. Existing state-of-the-art feature attribution methods, such as Shapley value and LIME, are computationally impractical and lacks security guarantee when applied to random subspace method. In this work, we propose EnsembleSHAP, an intrinsically faithful and secure feature attribution for random subspace method that reuses its computational byproducts. Specifically, our feature attribution method is 1) computationally efficient, 2) maintains essential properties of effective feature attribution (such as local accuracy), and 3) offers guaranteed protection against privacy-preserving attacks on feature attribution methods. To the best of our knowledge, this is the first work to establish provable robustness against explanation-preserving attacks. We also perform comprehensive evaluations for our explanation's effectiveness when faced with different empirical attacks, including backdoor attacks, adversarial attacks, and jailbreak attacks. The code is at https://github.com/Wang-Yanting/EnsembleSHAP. WARNING: This document may include content that could be considered harmful.
Authors:Xiao Liu, Xiaowei Fu, Fuxiang Huang, Lei Zhang
Abstract:
Network traffic classification using self-supervised pre-training models based on Masked Autoencoders (MAE) has demonstrated a huge potential. However, existing methods are confined to isolated byte-level reconstruction of individual flows, lacking adequate perception of the multi-granularity contextual relationship in traffic. To address this limitation, we propose Mean MAE (MMAE), a teacher-student MAE paradigm with flow mixing strategy for building encrypted traffic pre-training model. MMAE employs a self-distillation mechanism for teacher-student interaction, where the teacher provides unmasked flow-level semantic supervision to advance the student from local byte reconstruction to multi-granularity comprehension. To break the information bottleneck in individual flows, we introduce a dynamic Flow Mixing (FlowMix) strategy to replace traditional random masking mechanism. By constructing challenging cross-flow mixed samples with interferences, it compels the model to learn discriminative representations from distorted tokens. Furthermore, we design a Packet-importance aware Mask Predictor (PMP) equipped with an attention bias mechanism that leverages packet-level side-channel statistics to dynamically mask tokens with high semantic density. Numerous experiments on a number of datasets covering encrypted applications, malware, and attack traffic demonstrate that MMAE achieves state-of-the-art performance. The code is available at https://github.com/lx6c78/MMAE
Authors:Qing He, Xiaowei Fu, Lei Zhang
Abstract:
Encrypted traffic classification is a critical task for network security. While deep learning has advanced this field, the occlusion of payload semantics by encryption severely challenges standard modeling approaches. Most existing frameworks rely on static and homogeneous pipelines that apply uniform parameter sharing and static fusion strategies across all inputs. This one-size-fits-all static design is inherently flawed: by forcing structured headers and randomized payloads into a unified processing pipeline, it inevitably entangles the raw protocol signals with stochastic encryption noise, thereby degrading the fine-grained discriminative features. In this paper, we propose TrafficMoE, a framework that breaks through the bottleneck of static modeling by establishing a Disentangle-Filter-Aggregate (DFA) paradigm. Specifically, to resolve the structural between-components conflict, the architecture disentangles headers and payloads using dual-branch sparse Mixture-of-Experts (MoE), enabling modality-specific modeling. To mitigate the impact of stochastic noise, an uncertainty-aware filtering mechanism is introduced to quantify reliability and selectively suppress high-variance representations. Finally, to overcome the limitations of static fusion, a routing-guided strategy aggregates cross-modality features dynamically, that adaptively weighs contributions based on traffic context. With this DFA paradigm, TrafficMoE maximizes representational efficiency by focusing solely on the most discriminative traffic features. Extensive experiments on six datasets demonstrate TrafficMoE consistently outperforms state-of-the-art methods, validating the necessity of heterogeneity-aware modeling in encrypted traffic analysis. The source code is publicly available at https://github.com/Posuly/TrafficMoE_main.
Authors:He Yang, Dongyi Lv, Song Ma, Wei Xi, Jizhong Zhao
Abstract:
Dataset condensation aims to synthesize compact yet informative datasets that retain the training efficacy of full-scale data, offering substantial gains in efficiency. Recent studies reveal that the condensation process can be vulnerable to backdoor attacks, where malicious triggers are injected into the condensation dataset, manipulating model behavior during inference. While prior approaches have made progress in balancing attack success rate and clean test accuracy, they often fall short in preserving stealthiness, especially in concealing the visual artifacts of condensed data or the perturbations introduced during inference. To address this challenge, we introduce Sneakdoor, which enhances stealthiness without compromising attack effectiveness. Sneakdoor exploits the inherent vulnerability of class decision boundaries and incorporates a generative module that constructs input-aware triggers aligned with local feature geometry, thereby minimizing detectability. This joint design enables the attack to remain imperceptible to both human inspection and statistical detection. Extensive experiments across multiple datasets demonstrate that Sneakdoor achieves a compelling balance among attack success rate, clean test accuracy, and stealthiness, substantially improving the invisibility of both the synthetic data and triggered samples while maintaining high attack efficacy. The code is available at https://github.com/XJTU-AI-Lab/SneakDoor.
Authors:Leye Wang, Zixing Wang, Anjie Xu
Abstract:
This technical report presents SkillTester, a tool for evaluating the utility and security of agent skills. Its evaluation framework combines paired baseline and with-skill execution conditions with a separate security probe suite. Grounded in a comparative utility principle and a user-facing simplicity principle, the framework normalizes raw execution artifacts into a utility score, a security score, and a three-level security status label. More broadly, it can be understood as a comparative quality-assurance harness for agent skills in an agent-first world. The public service is deployed at https://skilltester.ai, and the broader project is maintained at https://github.com/skilltester-ai/skilltester.
Authors:Yi Zhang, Hongbo Huang, Liang-Jie Zhang
Abstract:
Diffusion models generate high-quality images but pose serious risks like copyright violation and disinformation. Watermarking is a key defense for tracing and authenticating AI-generated content. However, existing methods rely on threshold-based detection, which only supports fuzzy matching and cannot recover structured watermark data bit-exactly, making them unsuitable for offline verification or applications requiring lossless metadata (e.g., licensing instructions). To address this problem, in this paper, we propose Gaussian Shannon, a watermarking framework that treats the diffusion process as a noisy communication channel and enables both robust tracing and exact bit recovery. Our method embeds watermarks in the initial Gaussian noise without fine-tuning or quality loss. We identify two types of channel interference, namely local bit flips and global stochastic distortions, and design a cascaded defense combining error-correcting codes and majority voting. This ensures reliable end-to-end transmission of semantic payloads. Experiments across three Stable Diffusion variants and seven perturbation types show that Gaussian Shannon achieves state-of-the-art bit-level accuracy while maintaining a high true positive rate, enabling trustworthy rights attribution in real-world deployment. The source code have been made available at: https://github.com/Rambo-Yi/Gaussian-Shannon
Authors:Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye, Jonas Geiping, Maksym Andriushchenko
Abstract:
LLM agents like Claude Code can not only write code but also be used for autonomous AI research and engineering \citep{rank2026posttrainbench, novikov2025alphaevolve}. We show that an \emph{autoresearch}-style pipeline \citep{karpathy2026autoresearch} powered by Claude Code discovers novel white-box adversarial attack \textit{algorithms} that \textbf{significantly outperform all existing (30+) methods} in jailbreaking and prompt injection evaluations. Starting from existing attack implementations, such as GCG~\citep{zou2023universal}, the agent iterates to produce new algorithms achieving up to 40\% attack success rate on CBRN queries against GPT-OSS-Safeguard-20B, compared to $\leq$10\% for existing algorithms (\Cref{fig:teaser}, left). The discovered algorithms generalize: attacks optimized on surrogate models transfer directly to held-out models, achieving \textbf{100\% ASR against Meta-SecAlign-70B} \citep{chen2025secalign} versus 56\% for the best baseline (\Cref{fig:teaser}, middle). Extending the findings of~\cite{carlini2025autoadvexbench}, our results are an early demonstration that incremental safety and security research can be automated using LLM agents. White-box adversarial red-teaming is particularly well-suited for this: existing methods provide strong starting points, and the optimization objective yields dense, quantitative feedback. We release all discovered attacks alongside baseline implementations and evaluation code at https://github.com/romovpa/claudini.
Authors:Yutao Wu, Xiao Liu, Yifeng Gao, Xiang Zheng, Hanxun Huang, Yige Li, Cong Wang, Bo Li, Xingjun Ma, Yu-Gang Jiang
Abstract:
This work identifies a critical failure mode in frontier large language models (LLMs), which we term Internal Safety Collapse (ISC): under certain task conditions, models enter a state in which they continuously generate harmful content while executing otherwise benign tasks. We introduce TVD (Task, Validator, Data), a framework that triggers ISC through domain tasks where generating harmful content is the only valid completion, and construct ISC-Bench containing 53 scenarios across 8 professional disciplines. Evaluated on JailbreakBench, three representative scenarios yield worst-case safety failure rates averaging 95.3% across four frontier LLMs (including GPT-5.2 and Claude Sonnet 4.5), substantially exceeding standard jailbreak attacks. Frontier models are more vulnerable than earlier LLMs: the very capabilities that enable complex task execution become liabilities when tasks intrinsically involve harmful content. This reveals a growing attack surface: almost every professional domain uses tools that process sensitive data, and each new dual-use tool automatically expands this vulnerability--even without any deliberate attack. Despite substantial alignment efforts, frontier LLMs retain inherently unsafe internal capabilities: alignment reshapes observable outputs but does not eliminate the underlying risk profile. These findings underscore the need for caution when deploying LLMs in high-stakes settings. Source code: https://github.com/wuyoscar/ISC-Bench
Authors:Reshabh K Sharma, Dan Grossman
Abstract:
Large Language Model (LLM) agents combine the chat interaction capabilities of LLMs with the power to interact with external tools and APIs. This enables them to perform complex tasks and act autonomously to achieve user goals. However, current agent systems operate on an all-or-nothing basis: an agent either has full access to an API's capabilities and a web page's content, or it has no access at all. This coarse-grained approach forces users to trust agents with more capabilities than they actually need for a given task. In this paper, we introduce AC4A, an access control framework for agents. As agents become more capable and autonomous, users need a way to limit what APIs or portions of web pages these agents can access, eliminating the need to trust them with everything an API or web page allows. Our goal with AC4A is to provide a framework for defining permissions that lets agents access only the resources they are authorized to access. AC4A works across both API-based and browser-based agents. It does not prescribe what permissions should be, but offers a flexible way to define and enforce them, making it practical for real-world systems. AC4A works by creating permissions granting access to resources, drawing inspiration from established access control frameworks like the one for the Unix file system. Applications define their resources as hierarchies and provide a way to compute the necessary permissions at runtime needed for successful resource access. We demonstrate the usefulness of AC4A in enforcing permissions over real-world APIs and web pages through case studies. The source code of AC4A is available at https://github.com/reSHARMA/AC4A
Authors:Jiahao Chen, Zhiming Zhao, Yuwen Pu, Chunyi Zhou, Zhou Feng, Songze Li, Shouling Ji
Abstract:
Federated learning (FL) has attracted substantial attention in both academia and industry, yet its practical security posture remains poorly understood. In particular, a large body of poisoning research is evaluated under idealized assumptions about attacker participation, client homogeneity, and success metrics, which can substantially distort how security risks are perceived in deployed FL systems. This paper revisits FL security from a measurement perspective. We systematize three major sources of mismatch between research and practice: unrealistic poisoning threat models, the omission of hybrid heterogeneity, and incomplete metrics that overemphasize peak attack success while ignoring stability and utility cost. To study these gaps, we build TFLlib, a uniform evaluation framework that supports image, text, and tabular FL tasks and re-implements representative poisoning attacks under practical settings. Our empirical study shows that idealized evaluation often overstates security risk. Under practical settings, attack performance becomes markedly more dataset-dependent and unstable, and several attacks that appear consistently strong in idealized FL lose effectiveness or incur clear benign-task degradation once practical constraints are enforced. These findings further show that final-round attack success alone is insufficient for security assessment; practical measurement must jointly consider effectiveness, temporal stability, and collateral utility loss. Overall, this work argues that many conclusions in the FL poisoning literature are not directly transferable to real deployments. By tightening the threat model and using measurement protocols aligned with practice, we provide a more realistic view of the security risks faced by contemporary FL systems and distill concrete guidance for future FL security evaluation. Our code is available at https://github.com/xaddwell/TFLlib
Authors:Yizhe Zhao, Yongjian Fu, Zihao Feng, Hao Pan, Yongheng Deng, Yaoxue Zhang, Ju Ren
Abstract:
Mobile advertising dominates app monetization but introduces risks ranging from intrusive user experience to malware delivery. Existing detection methods rely either on static analysis, which misses runtime behaviors, or on heuristic UI exploration, which struggles with sparse and obfuscated ads. In this paper, we present MANA, the first agentic multimodal reasoning framework for mobile ad detection. MANA integrates static, visual, temporal, and experiential signals into a reasoning-guided navigation strategy that determines not only how to traverse interfaces but also where to focus, enabling efficient and robust exploration. We implement and evaluate MANA on commercial smartphones over 200 apps, achieving state-of-the-art accuracy and efficiency. Compared to baselines, it improves detection accuracy by 30.5%-56.3% and reduces exploration steps by 29.7%-63.3%. Case studies further demonstrate its ability to uncover obfuscated and malicious ads, underscoring its practicality for mobile ad auditing and its potential for broader runtime UI analysis (e.g., permission abuse). Code and dataset are available at https://github.com/MANA-2026/MANA.
Authors:Alex Popa, Adrian Taylor, Ranwa Al Mallah
Abstract:
Reinforcement learning techniques are being explored as solutions to the threat of cyber attacks on enterprise networks. Recent research in the field of AI in cyber security has investigated the ability of homogeneous multi-agent reinforcement learning agents, capable of inter-agent communication, to respond to cyberattacks. This paper advances the study of learned communication in multi-agent systems by examining heterogeneous agent capabilities within a simulated network environment. To this end, we leverage CommFormer, a publicly available state-of-the-art communication algorithm, to train and evaluate agents within the Cyber Operations Research Gym (CybORG). Our results show that CommFormer agents with heterogeneous capabilities can outperform other algorithms deployed in the CybORG environment, by converging to an optimal policy up to four times faster while improving standard error by up 38%. The agents implemented in this project provide an additional avenue for exploration in the field of AI for cyber security, enabling further research involving realistic networks.
Authors:Hyunjun Jeon, Kyuyoung Kim, Jinwoo Shin
Abstract:
Modern language models can readily extract sensitive information from unstructured text, making redaction -- the selective removal of such information -- critical for data security. However, existing benchmarks for redaction typically focus on predefined categories of data such as personally identifiable information (PII) or evaluate specific techniques like masking. To address this limitation, we introduce RedacBench, a comprehensive benchmark for evaluating policy-conditioned redaction across domains and strategies. Constructed from 514 human-authored texts spanning individual, corporate, and government sources, paired with 187 security policies, RedacBench measures a model's ability to selectively remove policy-violating information while preserving the original semantics. We quantify performance using 8,053 annotated propositions that capture all inferable information in each text. This enables assessment of both security -- the removal of sensitive propositions -- and utility -- the preservation of non-sensitive propositions. Experiments across multiple redaction strategies and state-of-the-art language models show that while more advanced models can improve security, preserving utility remains a challenge. To facilitate future research, we release RedacBench along with a web-based playground for dataset customization and evaluation. Available at https://hyunjunian.github.io/redaction-playground/.
Authors:Xuebo Qiu, Mingqi Lv, Yimei Zhang, Tiantian Zhu, Tieming Chen
Abstract:
Advanced Persistent Threats (APTs) remain difficult to detect due to their stealthy nature and long-term persistence. To tackle this challenge, provenance-based threat hunting has gained traction as a proactive defense mechanism. This technique models audit logs as a whole-system provenance graph and searches for subgraphs that match APT patterns recorded in Cyber Threat Intelligence (CTI) reports. However, several limitations persist: 1) significant memory and time overhead due to the extremely large provenance graphs; 2) imprecise segmentation of APT activities from provenance graphs due to their intricate entanglement with benign operations; and 3) poor alignment of attack representations between CTI-derived query graphs and provenance graphs due to their substantial semantic gaps. To address these limitations, this paper presents ProHunter, an efficient and accurate provenance-based APT hunting system with a platform-independent design. To minimize system overhead, ProHunter creates a compact data structure that efficiently stores long-term provenance graphs using semantic abstraction and bit-level hierarchical encoding strategies. To segment APT behaviors, a heuristic-driven threat graph sampling algorithm is designed, which can extract precise attack patterns from provenance graphs. Furthermore, to bridge the semantic gaps between CTI-derived graphs and provenance graphs, ProHunter proposes adaptive graph representation and feature enhancement methods, enabling the extraction of consistent attack semantics at both localized and globalized levels.Extensive evaluations on real-world APT campaigns from DARPA TC E3, E5 and OpTC datasets demonstrate that ProHunter outperforms state-of-the-art threat hunting systems in terms of efficiency and accuracy. Our code is available at https://github.com/xueboQiu/ProHunter.
Authors:Marcelo Fernandez
Abstract:
Agent Control Protocol (ACP) is a formal technical specification for governance of autonomous agents in B2B institutional environments. ACP is the admission control layer between agent intent and system state mutation: before any agent action reaches execution, it must pass a cryptographic admission check that validates identity, capability scope, delegation chain, and policy compliance simultaneously. ACP defines the mechanisms of cryptographic identity, capability-based authorization, deterministic risk evaluation, verifiable chained delegation, transitive revocation, and immutable auditing that a system must implement for autonomous agents to operate under explicit institutional control. ACP operates as an additional layer on top of RBAC and Zero Trust, without replacing them. It is designed specifically for the problem that neither model solves: governing what an autonomous agent can do, under what conditions, with what limits, and with complete traceability for external auditing -- including across organizational boundaries. The v1.14 specification comprises 36 technical documents organized into five conformance levels (L1-L5). It includes a Go reference implementation of 22 packages covering all L1-L4 capabilities, 73 signed conformance test vectors (Ed25519 + SHA-256), and an OpenAPI 3.1.0 specification for all HTTP endpoints. It defines more than 62 verifiable requirements, 12 prohibited behaviors, and the mechanisms for interoperability between institutions. Specification and implementation: https://github.com/chelof100/acp-framework-en
Authors:Yipu Dou, Wang Yang
Abstract:
We study how to allocate a fixed supervised fine-tuning budget when three objectives must be balanced at once: multi-turn safety alignment, low over-refusal on benign boundary queries, and instruction following under verifiable constraints. We propose MOSAIC (Multi-Objective Slice-Aware Iterative Curation for Alignment), a multi-objective framework for closed-loop data mixture search built on a unified L1-L3 evaluation interface. MOSAIC turns slice-level failure profiles into executable data actions, including dataset-level mixture ratios, bucket-level weights, and focus criteria. Under a fixed 1M-token budget and five rounds of independent fine-tuning from the same base model, MOSAIC improves internal XGuard from 2.76 to 4.67 while keeping OrBench at 4.41 and IFEval at 3.65. The final Pareto solution also generalizes better than a random static LoRA baseline on independent attack, over-refusal, and capability tests, suggesting that structured failure diagnosis can serve as a practical control signal for budgeted data construction. Code is available at https://github.com/douyipu/mosaic.
Authors:Jonathan Cook, Sabih ur Rehman, M. Arif Khan
Abstract:
SIMON and SPECK were among the first efficient encryption algorithms introduced for resource-constrained applications. SIMON is suitable for Internet of Things (IoT) devices and has rapidly attracted the attention of the research community to understand its structure and analyse its security. To analyse the security of an encryption algorithm, researchers often employ cryptanalysis techniques. However, cryptanalysis is a resource and time-intensive task. To improve cryptanalysis efficiency, state-of-the-art research has proposed implementing heuristic search and sampling methods. Despite recent advances, the cryptanalysis of the SIMON cypher remains inefficient. Contributing factors are the large size of the difference distribution tables utilised in cryptanalysis and the scarcity of differentials with a high transition probability. To address these limitations, we introduce an analysis of differential properties of the SIMON32 cypher, revealing differential characteristics that pave the way for future efficiency enhancements. Our analysis has further increased the number of targeted rounds by identifying high probability differentials within a partial difference distribution table of the SIMON cypher, exceeding existing state-of-the-art benchmarks. The code designed for this work is available at https://github.com/johncook1979/simon32-analysis.
Authors:Yong Zou, Haoran Li, Fanxiao Li, Shenyang Wei, Yunyun Dong, Li Tang, Wei Zhou, Renyang Liu
Abstract:
Recent progress in image generation models (IGMs) enables high-fidelity content creation but also amplifies risks, including the reproduction of copyrighted content and the generation of offensive content. Image Generation Model Unlearning (IGMU) mitigates these risks by removing harmful concepts without full retraining. Despite growing attention, the robustness under adversarial inputs, particularly image-side threats in black-box settings, remains underexplored. To bridge this gap, we present REFORGE, a black-box red-teaming framework that evaluates IGMU robustness via adversarial image prompts. REFORGE initializes stroke-based images and optimizes perturbations with a cross-attention-guided masking strategy that allocates noise to concept-relevant regions, balancing attack efficacy and visual fidelity. Extensive experiments across representative unlearning tasks and defenses demonstrate that REFORGE significantly improves attack success rate while achieving stronger semantic alignment and higher efficiency than involved baselines. These results expose persistent vulnerabilities in current IGMU methods and highlight the need for robustness-aware unlearning against multi-modal adversarial attacks. Our code is at: https://github.com/Imfatnoily/REFORGE.
Authors:Mateusz Dziemian, Maxwell Lin, Xiaohan Fu, Micha Nowak, Nick Winter, Eliot Jones, Andy Zou, Lama Ahmad, Kamalika Chaudhuri, Sahana Chennabasappa, Xander Davies, Lauren Deason, Benjamin L. Edelman, Tanner Emek, Ivan Evtimov, Jim Gust, Maia Hamin, Kat He, Klaudia Krawiecka, Riccardo Patana, Neil Perry, Troy Peterson, Xiangyu Qi, Javier Rando, Zifan Wang, Zihan Wang, Spencer Whitman, Eric Winsor, Arman Zharmagambetov, Matt Fredrikson, Zico Kolter
Abstract:
LLM based agents are increasingly deployed in high stakes settings where they process external data sources such as emails, documents, and code repositories. This creates exposure to indirect prompt injection attacks, where adversarial instructions embedded in external content manipulate agent behavior without user awareness. A critical but underexplored dimension of this threat is concealment: since users tend to observe only an agent's final response, an attack can conceal its existence by presenting no clue of compromise in the final user facing response while successfully executing harmful actions. This leaves users unaware of the manipulation and likely to accept harmful outcomes as legitimate. We present findings from a large scale public red teaming competition evaluating this dual objective across three agent settings: tool calling, coding, and computer use. The competition attracted 464 participants who submitted 272000 attack attempts against 13 frontier models, yielding 8648 successful attacks across 41 scenarios. All models proved vulnerable, with attack success rates ranging from 0.5% (Claude Opus 4.5) to 8.5% (Gemini 2.5 Pro). We identify universal attack strategies that transfer across 21 of 41 behaviors and multiple model families, suggesting fundamental weaknesses in instruction following architectures. Capability and robustness showed weak correlation, with Gemini 2.5 Pro exhibiting both high capability and high vulnerability. To address benchmark saturation and obsoleteness, we will endeavor to deliver quarterly updates through continued red teaming competitions. We open source the competition environment for use in evaluations, along with 95 successful attacks against Qwen that did not transfer to any closed source model. We share model-specific attack data with respective frontier labs and the full dataset with the UK AISI and US CAISI to support robustness research.
Authors:Balaji Rao, John Harrison, Soonho Kong, Juneyoung Lee, Carlo Lipizzi
Abstract:
Neurosymbolic approaches leveraging Large Language Models (LLMs) with formal methods have recently achieved strong results on mathematics-oriented theorem-proving benchmarks. However, success on competition-style mathematics does not by itself demonstrate the ability to construct proofs about real-world implementations. We address this gap with a benchmark derived from an industrial cryptographic library whose assembly routines are already verified in HOL Light. s2n-bignum is a library used at AWS for providing fast assembly routines for cryptography, and its correctness is established by formal verification. The task of formally verifying this library has been a significant achievement for the Automated Reasoning Group. It involved two tasks: (1) precisely specifying the correct behavior of a program as a mathematical proposition, and (2) proving that the proposition is correct. In the case of s2n-bignum, both tasks were carried out by human experts. In \textit{s2n-bignum-bench}, we provide the formal specification and ask the LLM to generate a proof script that is accepted by HOL Light within a fixed proof-check timeout. To our knowledge, \textit{s2n-bignum-bench} is the first public benchmark focused on machine-checkable proof synthesis for industrial low-level cryptographic assembly routines in HOL Light. This benchmark provides a challenging and practically relevant testbed for evaluating LLM-based theorem proving beyond competition mathematics. The code to set up and use the benchmark is available here: \href{https://github.com/kings-crown/s2n-bignum-bench}{s2n-bignum-bench}.
Authors:Aojie Yuan, Haiyue Zhang, Ziyi Wang, Yue Zhao
Abstract:
As AI agents evolve from text generators into autonomous economic actors that accept jobs, manage budgets, and delegate to sub-agents, the absence of runtime governance becomes a critical gap. Existing frameworks orchestrate agent behavior but impose no fiscal constraints, require no earned permissions, and offer no tamper-evident audit trail. We introduce Sovereign-OS, a governance-first operating system that places every agent action under constitutional control. A declarative Charter (YAML) defines mission scope, fiscal boundaries, and success criteria. A CEO (Strategist) decomposes goals into dependency-aware task DAGs; a CFO (Treasury) gates each expenditure against budget caps, daily burn limits, and profitability floors via an auction-based bidding engine; Workers operate under earned-autonomy permissions governed by a dynamic TrustScore; and an Auditor (ReviewEngine) verifies outputs against Charter KPIs, sealing each report with a SHA-256 proof hash. Across our evaluation suite, Sovereign-OS blocks 100% of fiscal violations (30 scenarios), achieves 94% correct permission gating (200 trust-escalation missions), and maintains zero integrity failure over 1,200+ audit reports. The system further integrates Stripe for real-world payment processing, closing the loop from task planning to revenue collection. Our live demonstration walks through three scenarios: loading distinct Charters to observe divergent agent behavior, triggering CFO fiscal denials under budget and profitability constraints, and escalating a new worker's TrustScore from restricted to fully authorized with on-the-spot cryptographic audit verification.
Authors:Bo Ma, Wei Qi Yan, Jinsong Wu
Abstract:
Learning systems that preserve privacy often inject noise into hierarchical visual representations; a central challenge is to \emph{model} how such perturbations align with a declared privacy budget in a way that is interpretable and applicable across vision backbones and vision--language models (VLMs). We propose \emph{Bodhi VLM}, a \emph{privacy-alignment modeling} framework for \emph{hierarchical neural representations}: it (1) links sensitive concepts to layer-wise grouping via NCP and MDAV-based clustering; (2) locates sensitive feature regions using bottom-up (BUA) and top-down (TDA) strategies over multi-scale representations (e.g., feature pyramids or vision-encoder layers); and (3) uses an Expectation-Maximization Privacy Assessment (EMPA) module to produce an interpretable \emph{budget-alignment signal} by comparing the fitted sensitive-feature distribution to an evaluator-specified reference (e.g., Laplace or Gaussian with scale $c/ε$). The output is reference-relative and is \emph{not} a formal differential-privacy estimator. We formalize BUA/TDA over hierarchical feature structures and validate the framework on object detectors (YOLO, PPDPTS, DETR) and on the \emph{visual encoders} of VLMs (CLIP, LLaVA, BLIP). BUA and TDA yield comparable deviation trends; EMPA provides a stable alignment signal under the reported setups. We compare with generic discrepancy baselines (Chi-square, K-L, MMD) and with task-relevant baselines (MomentReg, NoiseMLE, Wass-1). Results are reported as mean$\pm$std over multiple seeds with confidence intervals in the supplementary materials. This work contributes a learnable, interpretable modeling perspective for privacy-aligned hierarchical representations rather than a post hoc audit only. Source code: \href{https://github.com/mabo1215/bodhi-vlm.git}{Bodhi-VLM GitHub repository}
Authors:Bo Ma, Jinsong Wu, Wei Qi Yan
Abstract:
Sensitive data release is vulnerable to output-side privacy threats such as membership inference, attribute inference, and record linkage. This creates a practical need for release mechanisms that provide formal privacy guarantees while preserving utility in measurable ways. We propose REAEDP, a differential privacy framework that combines entropy-calibrated histogram release, a synthetic-data release mechanism, and attack-based evaluation. On the theory side, we derive an explicit sensitivity bound for Shannon entropy, together with an extension to Rényi entropy, for adjacent histogram datasets, enabling calibrated differentially private release of histogram statistics. We further study a synthetic-data mechanism $\mathcal{F}$ with a privacy-test structure and show that it satisfies a formal differential privacy guarantee under the stated parameter conditions. On multiple public tabular datasets, the empirical entropy change remains below the theoretical bound in the tested regime, standard Laplace and Gaussian baselines exhibit comparable trends, and both membership-inference and linkage-style attack performance move toward random-guess behavior as the privacy parameter decreases. These results support REAEDP as a practically usable privacy-preserving release pipeline in the tested settings. Source code: https://github.com/mabo1215/REAEDP.git
Authors:Florin Adrian Chitan
Abstract:
The proliferation of autonomous AI agents capable of executing real-world actions - filesystem operations, API calls, database modifications, financial transactions - introduces a class of safety risk not addressed by existing content-moderation infrastructure. Current text-safety systems evaluate linguistic content for harm categories such as violence, hate speech, and sexual content; they are architecturally unsuitable for evaluating whether a proposed action falls within an agent's authorized operational scope. We present ILION (Intelligent Logic Identity Operations Network), a deterministic execution gate for agentic AI systems. ILION employs a five-component cascade architecture - Transient Identity Imprint (TII), Semantic Vector Reference Frame (SVRF), Identity Drift Control (IDC), Identity Resonance Score (IRS) and Consensus Veto Layer (CVL) - to classify proposed agent actions as BLOCK or ALLOW without statistical training or API dependencies. The system requires zero labeled data, operates in sub-millisecond latency, and produces fully interpretable verdicts. We evaluate ILION on ILION-Bench v2, a purpose-built benchmark of 380 test scenarios across eight attack categories with 39% hard-difficulty adversarial cases and a held-out development split. ILION achieves F1 = 0.8515, precision = 91.0%, and a false positive rate of 7.9% at a mean latency of 143 microseconds. Comparative evaluation against three baselines - Lakera Guard (F1 = 0.8087), OpenAI Moderation API (F1 = 0.1188), and Llama Guard 3 (F1 = 0.0105) - demonstrates that existing text-safety infrastructure systematically fails on agent execution safety tasks due to a fundamental task mismatch. ILION outperforms the best commercial baseline by 4.3 F1 points while operating 2,000 times faster with a false positive rate four times lower.
Authors:Chenlong Yin, Runpeng Geng, Yanting Wang, Jinyuan Jia
Abstract:
Prompt injection poses serious security risks to real-world LLM applications, particularly autonomous agents. Although many defenses have been proposed, their robustness against adaptive attacks remains insufficiently evaluated, potentially creating a false sense of security. In this work, we propose PISmith, a reinforcement learning (RL)-based red-teaming framework that systematically assesses existing prompt-injection defenses by training an attack LLM to optimize injected prompts in a practical black-box setting, where the attacker can only query the defended LLM and observe its outputs. We find that directly applying standard GRPO to attack strong defenses leads to sub-optimal performance due to extreme reward sparsity -- most generated injected prompts are blocked by the defense, causing the policy's entropy to collapse before discovering effective attack strategies, while the rare successes cannot be learned effectively. In response, we introduce adaptive entropy regularization and dynamic advantage weighting to sustain exploration and amplify learning from scarce successes. Extensive evaluation on 13 benchmarks demonstrates that state-of-the-art prompt injection defenses remain vulnerable to adaptive attacks. We also compare PISmith with 7 baselines across static, search-based, and RL-based attack categories, showing that PISmith consistently achieves the highest attack success rates. Furthermore, PISmith achieves strong performance in agentic settings on InjecAgent and AgentDojo against both open-source and closed-source LLMs (e.g., GPT-4o-mini and GPT-5-nano). Our code is available at https://github.com/albert-y1n/PISmith.
Authors:Zonghao Ying, Xiao Yang, Siyang Wu, Yumeng Song, Yang Qu, Hainan Li, Tianlin Li, Jiakai Wang, Aishan Liu, Xianglong Liu
Abstract:
The rapid evolution of Large Language Models (LLMs) into autonomous, tool-calling agents has fundamentally altered the cybersecurity landscape. Frameworks like OpenClaw grant AI systems operating-system-level permissions and the autonomy to execute complex workflows. This level of access creates unprecedented security challenges. Consequently, traditional content-filtering defenses have become obsolete. This report presents a comprehensive security analysis of the OpenClaw ecosystem. We systematically investigate its current threat landscape, highlighting critical vulnerabilities such as prompt injection-driven Remote Code Execution (RCE), sequential tool attack chains, context amnesia, and supply chain contamination. To systematically contextualize these threats, we propose a novel tri-layered risk taxonomy for autonomous Agents, categorizing vulnerabilities across AI Cognitive, Software Execution, and Information System dimensions. To address these systemic architectural flaws, we introduce the Full-Lifecycle Agent Security Architecture (FASA). This theoretical defense blueprint advocates for zero-trust agentic execution, dynamic intent verification, and cross-layer reasoning-action correlation. Building on this framework, we present Project ClawGuard, our ongoing engineering initiative. This project aims to implement the FASA paradigm and transition autonomous agents from high-risk experimental utilities into trustworthy systems. Our code and dataset are available at https://github.com/NY1024/ClawGuard.
Authors:Mohamed Tarek Ibn ziad, Christos Kozyrakis
Abstract:
GPUs play an increasingly important role in modern software. However, the heterogeneous host-device execution model and expanding software stacks make GPU programs prone to memory-safety and concurrency bugs that evade static analysis. While fuzz-testing, combined with dynamic error checking tools, offers a plausible solution, it remains underutilized for GPUs. In this work, we identify three main obstacles limiting prior GPU fuzzing efforts: (1) kernel-level fuzzing leading to false positives, (2) lack of device-side coverage-guided feedback, and (3) incompatibility between coverage and sanitization tools. We present cuFuzz, the first CUDA-oriented fuzzer that makes GPU fuzzing practical by addressing these obstacles. cuFuzz uses whole program fuzzing to avoid false positives from independently fuzzing device-side kernels. It leverages NVBit to instrument device-side instructions and merges the resultant coverage with compiler-based host coverage. Finally, cuFuzz decouples sanitization from coverage collection by executing host- and device-side sanitizers in separate processes. cuFuzz uncovers 43 previously unknown bugs (19 in commercial libraries) across 14 CUDA programs, including illegal memory accesses, uninitialized reads, and data races. cuFuzz achieves significantly more discovered edges and unique inputs compared to baseline approaches, especially on closed-source targets. Moreover, we quantify the execution time overheads of the different cuFuzz components and add persistent-mode support to improve the overall fuzzing throughput. Our results demonstrate that cuFuzz is an effective and deployable addition to the GPU testing toolbox. cuFuzz is publicly available at https://github.com/NVlabs/cuFuzz/.
Authors:Davi Bonetto
Abstract:
State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius rho(A-bar) of the discretized transition operator governs effective memory horizon: when an adversary drives rho toward zero through gradient-based Hidden State Poisoning, memory collapses from millions of tokens to mere dozens, silently destroying reasoning capacity without triggering output-level alarms. We prove an Evasion Existence Theorem showing that for any output-only defense, adversarial inputs exist that simultaneously induce spectral collapse and evade detection, then introduce SpectralGuard, a real-time monitor that tracks spectral stability across all model layers. SpectralGuard achieves F1=0.961 against non-adaptive attackers and retains F1=0.842 under the strongest adaptive setting, with sub-15ms per-token latency. Causal interventions and cross-architecture transfer to hybrid SSM-Attention systems confirm that spectral monitoring provides a principled, deployable safety layer for recurrent foundation models.
Authors:Patricia Guerra-Balboa, Annika Sauer, Héber H. Arcolezi, Thorsten Strufe
Abstract:
Differential Privacy (DP) is widely adopted in data management systems to enable data sharing with formal disclosure guarantees. A central systems challenge is understanding how DP noise translates into effective protection against inference attacks, since this directly determines achievable utility. Most existing analyses focus only on membership inference -- capturing only a threat -- or rely on reconstruction robustness (ReRo). However, under realistic assumptions, we show that ReRo can yield misleading risk estimates and violate claimed bounds, limiting their usefulness for principled DP calibration and auditing. This paper introduces reconstruction advantage, a unified risk metric that consistently captures risk across membership inference, attribute inference, and data reconstruction. We derive tight bounds that relate DP noise to adversarial advantage and characterize optimal adversarial strategies for arbitrary DP mechanisms and attacker knowledge. These results enable risk-driven noise calibration and provide a foundation for systematic DP auditing. We show that reconstruction advantage improves the accuracy and scope of DP auditing and enables more effective utility-privacy trade-offs in DP-enabled data management systems.
Authors:Massimiliano Altieri, Ronan Hamon, Roberto Corizzo, Michelangelo Ceci, Ignacio Sanchez
Abstract:
Network intrusion detection systems play a crucial role in the security strategy employed by organisations to detect and prevent cyberattacks. Such systems usually combine pattern detection signatures with anomaly detection techniques powered by machine learning methods. However, the commonly proposed machine learning methods present drawbacks such as over-reliance on labeled data and limited generalization capabilities. To address these issues, embedding-based methods have been introduced to learn representations from network data, such as DNS traffic, mainly due to its large availability, that generalise effectively to many downstream tasks. However, current approaches do not properly consider contextual information among DNS queries. In this paper, we tackle this issue by proposing DNS-GT, a novel Transformer-based model that learns embeddings for domain names from sequences of DNS queries. The model is first pre-trained in a self-supervised fashion in order to learn the general behavior of DNS activity. Then, it can be finetuned on specific downstream tasks, exploiting interactions with other relevant queries in a given sequence. Our experiments with real-world DNS data showcase the ability of our method to learn effective domain name representations. A quantitative evaluation on domain name classification and botnet detection tasks shows that our approach achieves better results compared to relevant baselines, creating opportunities for further exploration of large-scale language models for intrusion detection systems. Our code is available at: https://github.com/m-altieri/DNS-GT.
Authors:Chaoyuan Peng, Lei Wu, Yajin Zhou
Abstract:
EVMbench, released by OpenAI, Paradigm, and OtterSec, is the first large-scale benchmark for AI agents on smart contract security. Its results -- agents detect up to 45.6% of vulnerabilities and exploit 72.2% of a curated subset -- have fueled expectations that fully automated AI auditing is within reach. We identify two limitations: its narrow evaluation scope (14 agent configurations, most models tested on only their vendor scaffold) and its reliance on audit-contest data published before every model's release that models may have seen during training. To address these, we expand to 26 configurations across four model families and three scaffolds, and introduce a contamination-free dataset of 22 real-world security incidents postdating every model's release date. Our evaluation yields three findings: (1) agents' detection results are not stable, with rankings shifting across configurations, tasks, and datasets; (2) on real-world incidents, no agent succeeds at end-to-end exploitation across all 110 agent-incident pairs despite detecting up to 65% of vulnerabilities, contradicting EVMbench's conclusion that discovery is the primary bottleneck; and (3) scaffolding materially affects results, with an open-source scaffold outperforming vendor alternatives by up to 5 percentage points, yet EVMbench does not control for this. These findings challenge the narrative that fully automated AI auditing is imminent. Agents reliably catch well-known patterns and respond strongly to human-provided context, but cannot replace human judgment. For developers, agent scans serve as a pre-deployment check. For audit firms, agents are most effective within a human-in-the-loop workflow where AI handles breadth and human auditors contribute protocol-specific knowledge and adversarial reasoning. Code and data: https://github.com/blocksecteam/ReEVMBench/.
Authors:Harry Owiredu-Ashley
Abstract:
Most adversarial evaluations of large language model (LLM) safety assess single prompts and report binary pass/fail outcomes, which fails to capture how safety properties evolve under sustained adversarial interaction. We present ADVERSA, an automated red-teaming framework that measures guardrail degradation dynamics as continuous per-round compliance trajectories rather than discrete jailbreak events. ADVERSA uses a fine-tuned 70B attacker model (ADVERSA-Red, Llama-3.1-70B-Instruct with QLoRA) that eliminates the attacker-side safety refusals that render off-the-shelf models unreliable as attackers, scoring victim responses on a structured 5-point rubric that treats partial compliance as a distinct measurable state. We report a controlled experiment across three frontier victim models (Claude Opus 4.6, Gemini 3.1 Pro, GPT-5.2) using a triple-judge consensus architecture in which judge reliability is measured as a first-class research outcome rather than assumed. Across 15 conversations of up to 10 adversarial rounds, we observe a 26.7% jailbreak rate with an average jailbreak round of 1.25, suggesting that in this evaluation setting, successful jailbreaks were concentrated in early rounds rather than accumulating through sustained pressure. We document inter-judge agreement rates, self-judge scoring tendencies, attacker drift as a failure mode in fine-tuned attackers deployed out of their training distribution, and attacker refusals as a previously-underreported confound in victim resistance measurement. All limitations are stated explicitly. Attack prompts are withheld per responsible disclosure policy; all other experimental artifacts are released.
Authors:Vladyslav Parakhin
Abstract:
The temporal assumptions underpinning conventional Identity and Access Management collapse under agentic execution regimes. A sixty-second revocation window permits on the order of $6 \times 10^3$ unauthorized API calls at 100 ops/tick; at AWS Lambda scale, the figure approaches $6 \times 10^5$. This is a coherence problem, not merely a latency problem. We define a Capability Coherence System (CCS) and construct a state-mapping $φ: Σ_{\rm MESI} \to Σ_{\rm auth}$ preserving transition structure under bounded-staleness semantics. A safety theorem bounds unauthorized operations for the execution-count Release Consistency-directed Coherence (RCC) strategy at $D_{\rm rcc} \leq n$, independent of agent velocity $v$ -- a qualitative departure from the $O(v \cdot \mathrm{TTL})$ scaling of time-bounded strategies. Tick-based discrete event simulation across three business-contextualised scenarios (four strategies, ten deterministic seeds each) confirms: RCC achieves a $120\times$ reduction versus TTL-based lease in the high-velocity scenario (50 vs. 6,000 unauthorized operations), and $184\times$ under anomaly-triggered revocation. Zero bound violations across all 120 runs confirm the per-capability safety guarantee. Simulation code: https://github.com/hipvlady/prizm
Authors:Xiangsen Chen, Xuan Feng, Shuo Chen, Matthieu Maitre, Sudipto Rakshit, Diana Duvieilh, Ashley Picone, Nan Tang
Abstract:
Analyzing Open Source Intelligence (OSINT) from large volumes of data is critical for drafting and publishing comprehensive CTI reports. This process usually follows a three-stage workflow -- triage, deep search and TI drafting. While Large Language Models (LLMs) offer a promising route toward automation, existing benchmarks still have limitations. These benchmarks often consist of tasks that do not reflect real-world analyst workflows. For example, human analysts rarely receive tasks in the form of multiple-choice questions. Also, existing benchmarks often rely on model-centric metrics that emphasize lexical overlap rather than actionable, detailed insights essential for security analysts. Moreover, they typically fail to cover the complete three-stage workflow. To address these issues, we introduce CyberThreat-Eval, which is collected from the daily CTI workflow of a world-leading company. This expert-annotated benchmark assesses LLMs on practical tasks across all three stages as mentioned above. It utilizes analyst-centric metrics that measure factual accuracy, content quality, and operational costs. Our evaluation using this benchmark reveals important insights into the limitations of current LLMs. For example, LLMs often lack the nuanced expertise required to handle complex details and struggle to distinguish between correct and incorrect information. To address these challenges, the CTI workflow incorporates both external ground-truth databases and human expert knowledge. TRA allows human experts to iteratively provide feedback for continuous improvement. The code is available at \href{https://github.com/xschen-beb/CyberThreat-Eval}{\texttt{GitHub}} and \href{https://huggingface.co/datasets/xse/CyberThreat-Eval}{\texttt{HuggingFace}}.
Authors:Wenhao Yan, Ning An, Linxu Li, Bingsheng Bi, Bo Jiang, Zhigang Lu, Baoxu Liu, Junrong Liu, Cong Dong
Abstract:
Advanced Persistent Threats (APTs) pose critical challenges to modern cybersecurity due to their multi-stage and stealthy nature. While provenance-based detection approaches show promise in capturing causal attack semantics, current threat provenance practices face two paradoxical issues: (1) expert skepticism, where human analysts doubt the capability of traditional detection models to identify complex attacks; and (2) expert dependence, as analysts cannot manually process large-scale raw logs to detect threats without these models. Consequently, collaboration between humans and traditional models remains the prevailing paradigm. However, this renders investigation quality contingent upon human expertise and frequently results in alert fatigue. To address these challenges, we present ProvAgent, a framework that evolves the threat provenance paradigm from human-model collaboration to a novel collaboration between multi-agent systems and traditional models. ProvAgent leverages the speed and cost-efficiency of traditional models for initial anomaly screening over large-scale logs. By enforcing fine-grained identity-behavior consistency via graph contrastive learning, it profiles entities based on specific attributes to generate high-fidelity alerts. With these alerts serving as investigation entry points, ProvAgent achieves in-depth autonomous investigation through a hypothesis-verification multi-agent framework. Evaluations with real-world datasets demonstrate that ProvAgent outperforms six state-of-the-art (SOTA) baselines in anomaly detection. Through automated investigation, ProvAgent reconstructs near-complete attack processes at a minimum cost of \$0.06 per day.
Authors:Andrew Chin, Dongkwan Kim, Yu-Fu Fu, Fabian Fleischer, Youngjoon Kim, HyungSeok Han, Cen Zhang, Brian Junekyu Lee, Hanqing Zhao, Taesoo Kim
Abstract:
DARPA's AI Cyber Challenge (AIxCC) showed that cyber reasoning systems (CRSs) can go beyond vulnerability discovery to autonomously confirm and patch bugs: seven teams built such systems and open-sourced them after the competition. Yet all seven open-sourced CRSs remain largely unusable outside their original teams, each bound to the competition cloud infrastructure that no longer exists. We present OSS-CRS, an open, locally deployable framework for running and combining CRS techniques against real-world open-source projects, with budget-aware resource management. We ported the first-place system (Atlantis) and discovered 10 previously unknown bugs (three of high severity) across 8 OSS-Fuzz projects. OSS-CRS is publicly available.
Authors:Junxian Li, Tu Lan, Haozhen Tan, Yan Meng, Haojin Zhu
Abstract:
Modern vision-language-model (VLM) based graphical user interface (GUI) agents are expected not only to execute actions accurately but also to respond to user instructions with low latency. While existing research on GUI-agent security mainly focuses on manipulating action correctness, the security risks related to response efficiency remain largely unexplored. In this paper, we introduce SlowBA, a novel backdoor attack that targets the responsiveness of VLM-based GUI agents. The key idea is to manipulate response latency by inducing excessively long reasoning chains under specific trigger patterns. To achieve this, we propose a two-stage reward-level backdoor injection (RBI) strategy that first aligns the long-response format and then learns trigger-aware activation through reinforcement learning. In addition, we design realistic pop-up windows as triggers that naturally appear in GUI environments, improving the stealthiness of the attack. Extensive experiments across multiple datasets and baselines demonstrate that SlowBA can significantly increase response length and latency while largely preserving task accuracy. The attack remains effective even with a small poisoning ratio and under several defense settings. These findings reveal a previously overlooked security vulnerability in GUI agents and highlight the need for defenses that consider both action correctness and response efficiency. Code can be found in https://github.com/tu-tuing/SlowBA.
Authors:Najeeb Jebreel, Mona Khalil, David Sánchez, Josep Domingo-Ferrer
Abstract:
Membership inference attacks (MIAs) have become the standard tool for evaluating privacy leakage in machine learning (ML). Among them, the Likelihood-Ratio Attack (LiRA) is widely regarded as the state of the art when sufficient shadow models are available. However, prior evaluations have often overstated the effectiveness of LiRA by attacking models overconfident on their training samples, calibrating thresholds on target data, assuming balanced membership priors, and/or overlooking attack reproducibility. We re-evaluate LiRA under a realistic protocol that (i) trains models using anti-overfitting (AOF) and transfer learning (TL), when applicable, to reduce overconfidence as in production models; (ii) calibrates decision thresholds using shadow models and data rather than target data; (iii) measures positive predictive value (PPV, or precision) under shadow-based thresholds and skewed membership priors (pi <= 10%); and (iv) quantifies per-sample membership reproducibility across different seeds and training variations. We find that AOF significantly weakens LiRA, while TL further reduces attack effectiveness while improving model accuracy. Under shadow-based thresholds and skewed priors, LiRA's PPV often drops substantially, especially under AOF or AOF+TL. We also find that thresholded vulnerable sets at extremely low FPR show poor reproducibility across runs, while likelihood-ratio rankings are more stable. These results suggest that LiRA, and likely weaker MIAs, are less effective than previously suggested under realistic conditions, and that reliable privacy auditing requires evaluation protocols that reflect practical training practices, feasible attacker assumptions, and reproducibility considerations. Code is available at https://github.com/najeebjebreel/lira_analysis.
Authors:Yige Li, Wei Zhao, Zhe Li, Nay Myat Min, Hanxun Huang, Yunhan Zhao, Xingjun Ma, Yu-Gang Jiang, Jun Sun
Abstract:
Backdoor mechanisms have traditionally been studied as security threats that compromise the integrity of machine learning models. However, the same mechanism -- the conditional activation of specific behaviors through input triggers -- can also serve as a controllable and auditable interface for trustworthy model behavior. In this work, we present \textbf{Backdoor4Good (B4G)}, a unified benchmark and framework for \textit{beneficial backdoor} applications in large language models (LLMs). Unlike conventional backdoor studies focused on attacks and defenses, B4G repurposes backdoor conditioning for Beneficial Tasks that enhance safety, controllability, and accountability. It formalizes beneficial backdoor learning under a triplet formulation $(T, A, U)$, representing the \emph{Trigger}, \emph{Activation mechanism}, and \emph{Utility function}, and implements a benchmark covering four trust-centric applications. Through extensive experiments across Llama3.1-8B, Gemma-2-9B, Qwen2.5-7B, and Llama2-13B, we show that beneficial backdoors can achieve high controllability, tamper-resistance, and stealthiness while preserving clean-task performance. Our findings demonstrate new insights that backdoors need not be inherently malicious; when properly designed, they can serve as modular, interpretable, and beneficial building blocks for trustworthy AI systems. Our code and datasets are available at https://github.com/bboylyg/BackdoorLLM/B4G.
Authors:Changyi Li, Pengfei Lu, Xudong Pan, Fazl Barez, Min Yang
Abstract:
As Large Language Models (LLMs) evolve into autonomous agents, existing safety evaluations face a fundamental trade-off: manual benchmarks are costly, while LLM-based simulators are scalable but suffer from logic hallucination. We present AutoControl Arena, an automated framework for frontier AI risk evaluation built on the principle of logic-narrative decoupling. By grounding deterministic state in executable code while delegating generative dynamics to LLMs, we mitigate hallucination while maintaining flexibility. This principle, instantiated through a three-agent framework, achieves over 98% end-to-end success and 60% human preference over existing simulators. To elicit latent risks, we vary environmental Stress and Temptation across X-Bench (70 scenarios, 7 risk categories). Evaluating 9 frontier models reveals: (1) Alignment Illusion: risk rates surge from 21.7% to 54.5% under pressure, with capable models showing disproportionately larger increases; (2) Scenario-Specific Safety Scaling: advanced reasoning improves robustness for direct harms but worsens it for gaming scenarios; and (3) Divergent Misalignment Patterns: weaker models cause non-malicious harm while stronger models develop strategic concealment.
Authors:Elzo Brito dos Santos Filho
Abstract:
AI-assisted software generation has increased development speed, but it has also amplified a persistent engineering problem: systems that are functionally correct may still be structurally insecure. In practice, prompt-based security review with large language models often suffers from uneven coverage, weak reproducibility, unsupported findings, and the absence of an immutable audit trail. The ESAA architecture addresses a related governance problem in agentic software engineering by separating heuristic agent cognition from deterministic state mutation through append-only events, constrained outputs, and replay-based verification. This paper presents ESAA-Security, a domain-specific specialization of ESAA for agent-assisted security auditing of software repositories, with particular emphasis on AI-generated or AI-modified code. ESAA-Security structures auditing as a governed execution pipeline with four phases reconnaissance, domain audit execution, risk classification, and final reporting and operationalizes the workflow into 26 tasks, 16 security domains, and 95 executable checks. The framework produces structured check results, vulnerability inventories, severity classifications, risk matrices, remediation guidance, executive summaries, and a final markdown/JSON audit report. The central idea is that security review should not be modeled as a free-form conversation with an LLM, but as an evidence-oriented audit process governed by contracts and events. In ESAA-Security, agents emit structured intentions under constrained protocols; the orchestrator validates them, persists accepted outputs to an append-only log, reprojects derived views, and verifies consistency through replay and hashing. The result is a traceable, reproducible, and risk-oriented audit architecture whose final report is auditable by construction.
Authors:Xisen Jin, Michael Duan, Qin Lin, Aaron Chan, Zhenglun Chen, Junyi Du, Xiang Ren
Abstract:
As AI agents become widely deployed as online services, users often rely on an agent developer's claim about how safety is enforced, which introduces a threat where safety measures are falsely advertised. To address the threat, we propose proof-of-guardrail, a system that enables developers to provide cryptographic proof that a response is generated after a specific open-source guardrail. To generate proof, the developer runs the agent and guardrail inside a Trusted Execution Environment (TEE), which produces a TEE-signed attestation of guardrail code execution verifiable by any user offline. We implement proof-of-guardrail for OpenClaw agents and evaluate latency overhead and deployment cost. Proof-of-guardrail ensures integrity of guardrail execution while keeping the developer's agent private, but we also highlight a risk of deception about safety, for example, when malicious developers actively jailbreak the guardrail. Code and demo video: https://github.com/SaharaLabsAI/Verifiable-ClawGuard
Authors:Kelly L Vomo-Donfack, Adryel Hoszu, Grégory Ginot, Ian Morilla
Abstract:
Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions degrade aggregation quality. We introduce PTOPOFL, a framework that addresses both challenges simultaneously by replacing gradient communication with topological descriptors derived from persistent homology (PH). Clients transmit only 48-dimensional PH feature vectors-compact shape summaries whose many-to-one structure makes inversion provably ill-posed-rather than model gradients. The server performs topology-guided personalised aggregation: clients are clustered by Wasserstein similarity between their PH diagrams, intra-cluster models are topology-weighted,and clusters are blended with a global consensus. We prove an information-contraction theorem showing that PH descriptors leak strictly less mutual information per sample than gradients under strongly convex loss functions, and we establish linear convergence of the Wasserstein-weighted aggregation scheme with an error floor strictly smaller than FedAvg. Evaluated against FedAvg, FedProx, SCAFFOLD, and pFedMe on a non-IID healthcare scenario (8 hospitals, 2 adversarial) and a pathological benchmark (10 clients), PTOPOFL achieves AUC 0.841 and 0.910 respectively-the highest in both settings-while reducing reconstruction risk by a factor of 4.5 relative to gradient sharing. Code is publicly available at https://github.com/MorillaLab/TopoFederatedL and data at https://doi.org/10.5281/zenodo.18827595.
Authors:Dipesh Tamboli, Vineet Punyamoorty, Atharv Pawar, Vaneet Aggarwal
Abstract:
Recent advances in generative image editing have enabled transformative applications, from professional head shot generation to avatar stylization. However, these systems often require uploading high-fidelity facial images to third-party models, raising concerns around biometric privacy, data misuse, and user consent. We propose a privacy-preserving pipeline that supports high-quality editing while keeping users in control over their biometric data in face-centric use cases. Our approach separates identity-sensitive regions from editable image context using on-device segmentation and masking, enabling secure, user-controlled editing without modifying third-party generative models. Unlike traditional cloud-based tools, PRIVATEEDIT enforces privacy by default: biometric data is never exposed or transmitted. This design requires no access to or retraining of third-party models, making it compatible with a wide range of commercial APIs. By treating privacy as a core design constraint, our system supports responsible generative AI centered on user autonomy and trust. The pipeline includes a tunable masking mechanism that lets users control how much facial information is concealed, allowing them to balance privacy and output fidelity based on trust level or use case. We demonstrate its applicability in professional and creative workflows and provide a user interface for selective anonymization. By advocating privacy-by-design in generative AI, our work offers both technical feasibility and normative guidance for protecting digital identity. The source code is available at https://github.com/Dipeshtamboli/PrivateEdit-Privacy-Preserving-GenAI.
Authors:Varun Pratap Bhardwaj
Abstract:
We present SuperLocalMemory, a local-first memory system for multi-agent AI that defends against OWASP ASI06 memory poisoning through architectural isolation and Bayesian trust scoring, while personalizing retrieval through adaptive learning-to-rank -- all without cloud dependencies or LLM inference calls. As AI agents increasingly rely on persistent memory, cloud-based memory systems create centralized attack surfaces where poisoned memories propagate across sessions and users -- a threat demonstrated in documented attacks against production systems. Our architecture combines SQLite-backed storage with FTS5 full-text search, Leiden-based knowledge graph clustering, an event-driven coordination layer with per-agent provenance, and an adaptive re-ranking framework that learns user preferences through three-layer behavioral analysis (cross-project technology preferences, project context detection, and workflow pattern mining). Evaluation across seven benchmark dimensions demonstrates 10.6ms median search latency, zero concurrency errors under 10 simultaneous agents, trust separation (gap =0.90) with 72% trust degradation for sleeper attacks, and 104% improvement in NDCG@5 when adaptive re-ranking is enabled. Behavioral data is isolated in a separate database with GDPR Article 17 erasure support. SuperLocalMemory is open-source (MIT) and integrates with 17+ development tools via Model Context Protocol.
Authors:Yanping Li, Zhening Liu, Zijian Li, Zehong Lin, Jun Zhang
Abstract:
Fine-tuning large language models (LLMs) on custom datasets has become a standard approach for adapting these models to specific domains and applications. However, recent studies have shown that such fine-tuning can lead to significant degradation in the model's safety. Existing defense methods operate at the sample level and often suffer from an unsatisfactory trade-off between safety and utility. To address this limitation, we perform a systematic token-level diagnosis of safety degradation during fine-tuning. Based on this, we propose token-level data selection for safe LLM fine-tuning (TOSS), a novel framework that quantifies the safety risk of each token by measuring the loss difference between a safety-degraded model and a utility-oriented model. This token-level granularity enables accurate identification and removal of unsafe tokens, thereby preserving valuable task-specific information. In addition, we introduce a progressive refinement strategy, TOSS-Pro, which iteratively enhances the safety-degraded model's ability to identify unsafe tokens. Extensive experiments demonstrate that our approach robustly safeguards LLMs during fine-tuning while achieving superior downstream task performance, significantly outperforming existing sample-level defense methods. Our code is available at https://github.com/Polly-LYP/TOSS.
Authors:Akshat Singh Jaswal, Ashish Baghel
Abstract:
Modern web applications are increasingly produced through AI-assisted development and rapid no-code deployment pipelines, widening the gap between accelerating software velocity and the limited adaptability of existing security tooling. Pattern-driven scanners fail to reason about novel contexts, while emerging LLM-based penetration testers rely on unconstrained exploration, yielding high cost, unstable behavior, and poor reproducibility. We introduce AWE, a memory-augmented multi-agent framework for autonomous web penetration testing that embeds structured, vulnerability-specific analysis pipelines within a lightweight LLM orchestration layer. Unlike general-purpose agents, AWE couples context aware payload mutations and generations with persistent memory and browser-backed verification to produce deterministic, exploitation-driven results. Evaluated on the 104-challenge XBOW benchmark, AWE achieves substantial gains on injection-class vulnerabilities - 87% XSS success (+30.5% over MAPTA) and 66.7% blind SQL injection success (+33.3%) - while being much faster, cheaper, and more token-efficient than MAPTA, despite using a midtier model (Claude Sonnet 4) versus MAPTA's GPT-5. MAPTA retains higher overall coverage due to broader exploratory capabilities, underscoring the complementary strengths of specialized and general-purpose architectures. Our results demonstrate that architecture matters as much as model reasoning capabilities: integrating LLMs into principled, vulnerability-aware pipelines yields substantial gains in accuracy, efficiency, and determinism for injection-class exploits. The source code for AWE is available at: https://github.com/stuxlabs/AWE
Authors:Yilian Liu, Xiaojun Jia, Guoshun Nan, Jiuyang Lyu, Zhican Chen, Tao Guan, Shuyuan Luo, Zhongyi Zhai, Yang Liu
Abstract:
Multimodal Large Language Models (MLLMs) have achieved remarkable performance but remain vulnerable to jailbreak attacks that can induce harmful content and undermine their secure deployment. Previous studies have shown that introducing additional inference steps, which disrupt security attention, can make MLLMs more susceptible to being misled into generating malicious content. However, these methods rely on single-image masking or isolated visual cues, which only modestly extend reasoning paths and thus achieve limited effectiveness, particularly against strongly aligned commercial closed-source models. To address this problem, in this paper, we propose Multi-Image Dispersion and Semantic Reconstruction (MIDAS), a multimodal jailbreak framework that decomposes harmful semantics into risk-bearing subunits, disperses them across multiple visual clues, and leverages cross-image reasoning to gradually reconstruct the malicious intent, thereby bypassing existing safety mechanisms. The proposed MIDAS enforces longer and more structured multi-image chained reasoning, substantially increases the model's reliance on visual cues while delaying the exposure of malicious semantics and significantly reducing the model's security attention, thereby improving the performance of jailbreak against advanced MLLMs. Extensive experiments across different datasets and MLLMs demonstrate that the proposed MIDAS outperforms state-of-the-art jailbreak attacks for MLLMs and achieves an average attack success rate of 81.46% across 4 closed-source MLLMs. Our code is available at this [link](https://github.com/Winnie-Lian/MIDAS).
Authors:Haodong Zhao, Jinming Hu, Zhaomin Wu, Zongru Wu, Wei Du, Junyi Hou, Caibei Zhao, Zhuosheng Zhang, Bingsheng He, Gongshen Liu
Abstract:
Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a cross-silo setting without requiring the sharing of private instructions. Recent findings on natural backdoors and the existing training data collection method suggest that poisoned samples may be pervasive and inadvertently embedded in real-world datasets, potentially distributed across all clients, even if the clients are benign. This work systematically examine this threat in FIT, demonstrating that existing defenses are ineffective when poisoned data is interspersed among all clients. Addressing this challenge entails two major difficulties: identifying the distinctive characteristics of poisoned samples at each client and enabling collaborative defense when some clients are heavily dominated by poisoned samples. To address these difficulties, we identify gradients in the frequency domain as a robust signal to distinguish poisoned data. We further propose a global secondary clustering mechanism that facilitates collaborative identification of poisoned samples across clients. In summary, this paper introduces ProtegoFed, the first backdoor-free FIT framework that accurately detects, removes, and even purifies interspersed poisoned data across clients during the training. Experimental results on four FL datasets show that ProtegoFed identifies $92.00\% \sim 100.00\%$ of poisoned samples, reduces the attack success rate to almost zero, and maintains utility on the main task. Code is available at https://github.com/dongdongzhaoUP/ProtegoFed.
Authors:Varun Pratap Bhardwaj
Abstract:
The rapid proliferation of agentic AI skill ecosystems -- exemplified by OpenClaw (228,000 GitHub stars) and Anthropic Agent Skills (75,600 stars) -- has introduced a critical supply chain attack surface. The ClawHavoc campaign (January-February 2026) infiltrated over 1,200 malicious skills into the OpenClaw marketplace, while MalTool catalogued 6,487 malicious tools that evade conventional detection. In response, twelve reactive security tools emerged, yet all rely on heuristic methods that provide no formal guarantees. We present SkillFortify, the first formal analysis framework for agent skill supply chains, with six contributions: (1) the DY-Skill attacker model, a Dolev-Yao adaptation to the five-phase skill lifecycle with a maximality proof; (2) a sound static analysis framework grounded in abstract interpretation; (3) capability-based sandboxing with a confinement proof; (4) an Agent Dependency Graph with SAT-based resolution and lockfile semantics; (5) a trust score algebra with formal monotonicity; and (6) SkillFortifyBench, a 540-skill benchmark. SkillFortify achieves 96.95% F1 (95% CI: [95.1%, 98.4%]) with 100% precision and 0% false positive rate on 540 skills, while SAT-based resolution handles 1,000-node graphs in under 100 ms.
Authors:Marcus Graves
Abstract:
We introduce Reverse CAPTCHA, an evaluation framework that tests whether large language models follow invisible Unicode-encoded instructions embedded in otherwise normal-looking text. Unlike traditional CAPTCHAs that distinguish humans from machines, our benchmark exploits a capability gap: models can perceive Unicode control characters that are invisible to human readers. We evaluate five models from two providers across two encoding schemes (zero-width binary and Unicode Tags), four hint levels, two payload framings, and with tool use enabled or disabled. Across 8,308 model outputs, we find that tool use dramatically amplifies compliance (Cohen's h up to 1.37, a large effect), that models exhibit provider-specific encoding preferences (OpenAI models decode zero-width binary; Anthropic models prefer Unicode Tags), and that explicit decoding instructions increase compliance by up to 95 percentage points within a single model and encoding. All pairwise model differences are statistically significant (p < 0.05, Bonferroni-corrected). These results highlight an underexplored attack surface for prompt injection via invisible Unicode payloads.
Authors:Tiantong Wang, Xinyu Yan, Tiantong Wu, Yurong Hao, Yong Jiang, Fei Huang, Wei Yang Bryan Lim
Abstract:
Machine unlearning for large language models often faces a privacy dilemma in which strict constraints prohibit sharing either the server's parameters or the client's forget set. To address this dual non-disclosure constraint, we propose MPU, an algorithm-agnostic privacy-preserving Multiple Perturbed Copies Unlearning framework that primarily introduces two server-side modules: Pre-Process for randomized copy generation and Post-Process for update aggregation. In Pre-Process, the server distributes multiple perturbed and reparameterized model instances, allowing the client to execute unlearning locally on its private forget set without accessing the server's exact original parameters. After local unlearning, the server performs Post-Process by inverting the reparameterization and aggregating updates with a harmonic denoising procedure to alleviate the impact of perturbation. Experiments with seven unlearning algorithms show that MPU achieves comparable unlearning performance to noise-free baselines, with most algorithms' average degradation well below 1% under 10% noise, and can even outperform the noise-free baseline for some algorithms under 1% noise. Code is available at https://github.com/Tristan-SHU/MPU.
Authors:Yanpei Guo, Wenjie Qu, Linyu Wu, Shengfang Zhai, Lionel Z. Wang, Ming Xu, Yue Liu, Binhang Yuan, Dawn Song, Jiaheng Zhang
Abstract:
Commercial large language models are typically deployed as black-box API services, requiring users to trust providers to execute inference correctly and report token usage honestly. We present IMMACULATE, a practical auditing framework that detects economically motivated deviations-such as model substitution, quantization abuse, and token overbilling-without trusted hardware or access to model internals. IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead. Experiments on dense and MoE models show that IMMACULATE reliably distinguishes benign and malicious executions with under 1% throughput overhead. Our code is published at https://github.com/guo-yanpei/Immaculate.
Authors:Idan Habler, Vineeth Sai Narajala, Stav Koren, Amy Chang, Tiffany Saade
Abstract:
Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external knowledge via vector similarity search. Nevertheless, these systems encounter a significant security flaw: hubness - items that frequently appear in the top-k retrieval results for a disproportionately high number of varied queries. These hubs can be exploited to introduce harmful content, alter search rankings, bypass content filtering, and decrease system performance. We introduce hubscan, an open-source security scanner that evaluates vector indices and embeddings to identify hubs in RAG systems. Hubscan presents a multi-detector architecture that integrates: (1) robust statistical hubness detection utilizing median/MAD-based z-scores, (2) cluster spread analysis to assess cross-cluster retrieval patterns, (3) stability testing under query perturbations, and (4) domain-aware and modality-aware detection for category-specific and cross-modal attacks. Our solution accommodates several vector databases (FAISS, Pinecone, Qdrant, Weaviate) and offers versatile retrieval techniques, including vector similarity, hybrid search, and lexical matching with reranking capabilities. We evaluate hubscan on Food-101, MS-COCO, and FiQA adversarial hubness benchmarks constructed using state-of-the-art gradient-optimized and centroid-based hub generation methods. hubscan achieves 90% recall at a 0.2% alert budget and 100% recall at 0.4%, with adversarial hubs ranking above the 99.8th percentile. Domain-scoped scanning recovers 100% of targeted attacks that evade global detection. Production validation on 1M real web documents from MS MARCO demonstrates significant score separation between clean documents and adversarial content. Our work provides a practical, extensible framework for detecting hubness threats in production RAG systems.
Authors:Jing Zhang
Abstract:
AI agents increasingly act on behalf of humans, yet no existing system provides a tamper-evident, independently verifiable record of what they did. As regulations such as the EU AI Act begin mandating automatic logging for high-risk AI systems, this gap carries concrete consequences -- especially for agents running on personal hardware, where no centralized provider controls the log. Extending Floridi's informational rights framework from data about individuals to actions performed on their behalf, this paper proposes the Right to History: the principle that individuals are entitled to a complete, verifiable record of every AI agent action on their own hardware. The paper formalizes this principle through five system invariants with structured proof sketches, and implements it in PunkGo, a Rust sovereignty kernel that unifies RFC 6962 Merkle tree audit logs, capability-based isolation, energy-budget governance, and a human-approval mechanism. Adversarial testing confirms all five invariants hold. Performance evaluation shows sub-1.3 ms median action latency, ~400 actions/sec throughput, and 448-byte Merkle inclusion proofs at 10,000 log entries.
Authors:Nahom Birhan, Daniel Wesego, Dereje Shenkut, Frank Liu, Daniel Takabi
Abstract:
Personalized federated learning (PFL) creates client-specific models to handle data heterogeneity. Previously, PFL has been shown to be naturally resistant to backdoor attack propagation across clients. In this work, we reveal that PFL remains vulnerable to backdoor attacks through a novel frequency-domain approach. We propose DCInject, an adaptive frequency-domain backdoor attack for PFL, which removes portions of the zero-frequency (DC) component and replaces them with Gaussian-distributed samples in the frequency domain. Our attack achieves superior attack success rates while maintaining clean accuracy across four datasets (CIFAR-10/100, GTSRB, SVHN) compared to existing spatial-domain attacks, evaluated under parameter decoupling based personalization. DCInject achieves superior performance with ASRs of 96.83% (CIFAR-10), 99.38% (SVHN), and 100% (GTSRB) while maintaining clean accuracy. Under I-BAU defense, DCInject demonstrates strong persistence, retaining 90.30% ASR vs BadNet's 58.56% on VGG-16, exposing critical vulnerabilities in PFL security assumptions. Our code is available at https://github.com/NahomMA/DCINject-PFL
Authors:Agnieszka M. Zbrzezny
Abstract:
We present BMC4TimeSec, an end-to-end tool for verifying Timed Security Protocols (TSP) based on SMT-based bounded model checking and multi-agent modelling in the form of Timed Interpreted Systems (TIS) and Timed Interleaved Interpreted Systems (TIIS). In BMC4TimeSec, TSP executions implement the TIS/TIIS environment (join actions, interleaving, delays, lifetimes), and knowledge automata implement the agents (evolution of participant knowledge, including the intruder). The code is publicly available on \href{https://github.com/agazbrzezny/BMC4TimeSec}{GitHub}, as is a \href{https://youtu.be/aNybKz6HwdA}{video} demonstration.
Authors:Yu Yin, Shuai Wang, Bevan Koopman, Guido Zuccon
Abstract:
Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has however showed that simple prompt injections embedded within a candidate document (i.e., jailbreak prompt attacks) can significantly alter an LLM's ranking decisions. While this poses serious security risks to LLM-based ranking pipelines, the extent to which this vulnerability persists across diverse LLM families, architectures, and settings remains largely under-explored. In this paper, we present a comprehensive empirical study of jailbreak prompt attacks against LLM rankers. We focus our evaluation on two complementary tasks: (1) Preference Vulnerability Assessment, measuring intrinsic susceptibility via attack success rate (ASR); and (2) Ranking Vulnerability Assessment, quantifying the operational impact on the ranking's quality (nDCG@10). We systematically examine three prevalent ranking paradigms (pairwise, listwise, setwise) under two injection variants: decision objective hijacking and decision criteria hijacking. Beyond reproducing prior findings, we expand the analysis to cover vulnerability scaling across model families, position sensitivity, backbone architectures, and cross-domain robustness. Our results characterize the boundary conditions of these vulnerabilities, revealing critical insights such as that encoder-decoder architectures exhibit strong inherent resilience to jailbreak attacks. We publicly release our code and additional experimental results at https://github.com/ielab/LLM-Ranker-Attack.
Authors:Diego Cabuya-Padilla, Daniel Díaz-López, Carlos Castaneda-Marroquín
Abstract:
As cyber threats grow in complexity and scale, many security incidents remain poorly managed due to the lack of proper training among C-level executives. Thus, there is a need for targeted cybersecurity education to enhance executive decision-making and crisis response. Traditional training methods, such as cyber wargames and Tabletop Exercises (TTX), aim to develop abilities to face critical incidents, however, they often lack the interactive and dynamic elements required to prepare individuals for real-world cyber incidents. This paper presents a novel approach to cybersecurity and cyberdefense education through the design of a specialized hybrid TTX for the maritime domain, which uses a framework to model mathematically how a cyberattack spreads along multiple nodes and impacts infrastructure. Our proposal was validated through exercises in Argentina and the United States, demonstrating a positive impact in developing the comprehension and projection levels of Cyber Situational Awareness (CSA), and reinforcing governance. Documentation about the Hybrid TTX, scenario, datasets and implementation of the SERDUX-MARCIM model, is available at the project repository at https://github.com/diegocabuya/SERDUX-MARCIM
Authors:Longfei Chen, Ji Zhao, Lanxiao Cui, Tong Su, Xingbo Pan, Ziyang Li, Yongxing Wu, Qijiang Cao, Qiyao Cai, Jing Zhang, Yuandong Ni, Junyao He, Zeyu Zhang, Chao Ge, Xuhuai Lu, Zeyu Gao, Yuxin Cui, Weisen Chen, Yuxuan Peng, Shengping Wang, Qi Li, Yukai Huang, Yukun Liu, Tuo Zhou, Terry Yue Zhuo, Junyang Lin, Chao Zhang
Abstract:
We introduce SecCodeBench-V2, a publicly released benchmark for evaluating Large Language Model (LLM) copilots' capabilities of generating secure code. SecCodeBench-V2 comprises 98 generation and fix scenarios derived from Alibaba Group's industrial productions, where the underlying security issues span 22 common CWE (Common Weakness Enumeration) categories across five programming languages: Java, C, Python, Go, and JavaScript. SecCodeBench-V2 adopts a function-level task formulation: each scenario provides a complete project scaffold and requires the model to implement or patch a designated target function under fixed interfaces and dependencies. For each scenario, SecCodeBench-V2 provides executable proof-of-concept (PoC) test cases for both functional validation and security verification. All test cases are authored and double-reviewed by security experts, ensuring high fidelity, broad coverage, and reliable ground truth. Beyond the benchmark itself, we build a unified evaluation pipeline that assesses models primarily via dynamic execution. For most scenarios, we compile and run model-generated artifacts in isolated environments and execute PoC test cases to validate both functional correctness and security properties. For scenarios where security issues cannot be adjudicated with deterministic test cases, we additionally employ an LLM-as-a-judge oracle. To summarize performance across heterogeneous scenarios and difficulty levels, we design a Pass@K-based scoring protocol with principled aggregation over scenarios and severity, enabling holistic and comparable evaluation across models. Overall, SecCodeBench-V2 provides a rigorous and reproducible foundation for assessing the security posture of AI coding assistants, with results and artifacts released at https://alibaba.github.io/sec-code-bench. The benchmark is publicly available at https://github.com/alibaba/sec-code-bench.
Authors:Haibo Tong, Feifei Zhao, Linghao Feng, Ruoyu Wu, Ruolin Chen, Lu Jia, Zhou Zhao, Jindong Li, Tenglong Li, Erliang Lin, Shuai Yang, Enmeng Lu, Yinqian Sun, Qian Zhang, Zizhe Ruan, Jinyu Fan, Zeyang Yue, Ping Wu, Huangrui Li, Chengyi Sun, Yi Zeng
Abstract:
Rapidly evolving AI exhibits increasingly strong autonomy and goal-directed capabilities, accompanied by derivative systemic risks that are more unpredictable, difficult to control, and potentially irreversible. However, current AI safety evaluation systems suffer from critical limitations such as restricted risk dimensions and failed frontier risk detection. The lagging safety benchmarks and alignment technologies can hardly address the complex challenges posed by cutting-edge AI models. To bridge this gap, we propose the "ForesightSafety Bench" AI Safety Evaluation Framework, beginning with 7 major Fundamental Safety pillars and progressively extends to advanced Embodied AI Safety, AI4Science Safety, Social and Environmental AI risks, Catastrophic and Existential Risks, as well as 8 critical industrial safety domains, forming a total of 94 refined risk dimensions. To date, the benchmark has accumulated tens of thousands of structured risk data points and assessment results, establishing a widely encompassing, hierarchically clear, and dynamically evolving AI safety evaluation framework. Based on this benchmark, we conduct systematic evaluation and in-depth analysis of over twenty mainstream advanced large models, identifying key risk patterns and their capability boundaries. The safety capability evaluation results reveals the widespread safety vulnerabilities of frontier AI across multiple pillars, particularly focusing on Risky Agentic Autonomy, AI4Science Safety, Embodied AI Safety, Social AI Safety and Catastrophic and Existential Risks. Our benchmark is released at https://github.com/Beijing-AISI/ForesightSafety-Bench. The project website is available at https://foresightsafety-bench.beijing-aisi.ac.cn/.
Authors:Mario Marín Caballero, Miguel Betancourt Alonso, Daniel Díaz-López, Angel Luis Perales Gómez, Pantaleone Nespoli, Gregorio Martínez Pérez
Abstract:
The most valuable asset of any cloud-based organization is data, which is increasingly exposed to sophisticated cyberattacks. Until recently, the implementation of security measures in DevOps environments was often considered optional by many government entities and critical national services operating in the cloud. This includes systems managing sensitive information, such as electoral processes or military operations, which have historically been valuable targets for cybercriminals. Resistance to security implementation is often driven by concerns over losing agility in software development, increasing the risk of accumulated vulnerabilities. Nowadays, patching software is no longer enough; adopting a proactive cyber defense strategy, supported by Artificial Intelligence (AI), is crucial to anticipating and mitigating threats. Thus, this work proposes integrating the Security Chaos Engineering (SCE) methodology with a new LLM-based flow to automate the creation of attack defense trees that represent adversary behavior and facilitate the construction of SCE experiments based on these graphical models, enabling teams to stay one step ahead of attackers and implement previously unconsidered defenses. Further detailed information about the experiment performed, along with the steps to replicate it, can be found in the following repository: https://github.com/mariomc14/devsecops-adversary-llm.git.
Authors:Ruomeng Ding, Yifei Pang, He Sun, Yizhong Wang, Zhiwei Steven Wu, Zhun Deng
Abstract:
Evaluation and alignment pipelines for large language models increasingly rely on LLM-based judges, whose behavior is guided by natural-language rubrics and validated on benchmarks. We identify a previously under-recognized vulnerability in this workflow, which we term Rubric-Induced Preference Drift (RIPD). Even when rubric edits pass benchmark validation, they can still produce systematic and directional shifts in a judge's preferences on target domains. Because rubrics serve as a high-level decision interface, such drift can emerge from seemingly natural, criterion-preserving edits and remain difficult to detect through aggregate benchmark metrics or limited spot-checking. We further show this vulnerability can be exploited through rubric-based preference attacks, in which benchmark-compliant rubric edits steer judgments away from a fixed human or trusted reference on target domains, systematically inducing RIPD and reducing target-domain accuracy up to 9.5% (helpfulness) and 27.9% (harmlessness). When these judgments are used to generate preference labels for downstream post-training, the induced bias propagates through alignment pipelines and becomes internalized in trained policies. This leads to persistent and systematic drift in model behavior. Overall, our findings highlight evaluation rubrics as a sensitive and manipulable control interface, revealing a system-level alignment risk that extends beyond evaluator reliability alone. The code is available at: https://github.com/ZDCSlab/Rubrics-as-an-Attack-Surface. Warning: Certain sections may contain potentially harmful content that may not be appropriate for all readers.
Authors:Sihao Hu, Selim Furkan Tekin, Yichang Xu, Ling Liu
Abstract:
Launchpads have become the dominant mechanism for issuing memecoins on blockchains due to their fully automated, no-code creation process. This new issuance paradigm has led to a surge in high-risk token launches, causing substantial financial losses for unsuspecting buyers. In this paper, we introduce MemeTrans, the first dataset for studying and detecting high-risk memecoin launches on Solana. MemeTrans covers over 40k memecoin launches that successfully migrated to the public Decentralized Exchange (DEX), with over 30 million transactions during the initial sale on launchpad and 180 million transactions after migration. To precisely capture launch patterns, we design 122 features spanning dimensions such as context, trading activity, holding concentration, and time-series dynamics, supplemented with bundle-level data that reveals multiple accounts controlled by the same entity. Finally, we introduce an annotation approach to label the risk level of memecoin launches, which combines statistical indicators with a manipulation-pattern detector. Experiments on the introduced high-risk launch detection task suggest that designed features are informative for capturing high-risk patterns and ML models trained on MemeTrans can effectively reduce financial loss by 56.1%. Our dataset, experimental code, and pipeline are publicly available at: https://github.com/git-disl/MemeTrans.
Authors:Xu Li, Simon Yu, Minzhou Pan, Yiyou Sun, Bo Li, Dawn Song, Xue Lin, Weiyan Shi
Abstract:
LLM-based agents are becoming increasingly capable, yet their safety lags behind. This creates a gap between what agents can do and should do. This gap widens as agents engage in multi-turn interactions and employ diverse tools, introducing new risks overlooked by existing benchmarks. To systematically scale safety testing into multi-turn, tool-realistic settings, we propose a principled taxonomy that transforms single-turn harmful tasks into multi-turn attack sequences. Using this taxonomy, we construct MT-AgentRisk (Multi-Turn Agent Risk Benchmark), the first benchmark to evaluate multi-turn tool-using agent safety. Our experiments reveal substantial safety degradation: the Attack Success Rate (ASR) increases by 16% on average across open and closed models in multi-turn settings. To close this gap, we propose ToolShield, a training-free, tool-agnostic, self-exploration defense: when encountering a new tool, the agent autonomously generates test cases, executes them to observe downstream effects, and distills safety experiences for deployment. Experiments show that ToolShield effectively reduces ASR by 30% on average in multi-turn interactions. Our code is available at https://github.com/CHATS-lab/ToolShield.
Authors:Rubén Pérez-Jove, Osvaldo Simeone, Alejandro Pazos, Jose Vázquez-Naya
Abstract:
Operating System (OS) fingerprinting is critical for network security, but conventional methods do not provide formal uncertainty quantification mechanisms. Conformal Prediction (CP) could be directly wrapped around existing methods to obtain prediction sets with guaranteed coverage. However, a direct application of CP would treat OS identification as a flat classification problem, ignoring the natural taxonomic structure of OSs and providing brittle point predictions. This work addresses these limitations by introducing and evaluating two distinct structured CP strategies: level-wise CP (L-CP), which calibrates each hierarchy level independently, and projection-based CP (P-CP), which ensures structural consistency by projecting leaf-level sets upwards. Our results demonstrate that, while both methods satisfy validity guarantees, they expose a fundamental trade-off between level-wise efficiency and structural consistency. L-CP yields tighter prediction sets suitable for human forensic analysis but suffers from taxonomic inconsistencies. Conversely, P-CP guarantees hierarchically consistent, nested sets ideal for automated policy enforcement, albeit at the cost of reduced efficiency at coarser levels.
Authors:Dong Yan, Jian Liang, Ran He, Tieniu Tan
Abstract:
Recent studies have shown that large language models (LLMs) can infer private user attributes (e.g., age, location, gender) from user-generated text shared online, enabling rapid and large-scale privacy breaches. Existing anonymization-based defenses are coarse-grained, lacking word-level precision in anonymizing privacy-leaking elements. Moreover, they are inherently limited as altering user text to hide sensitive cues still allows attribute inference to occur through models' reasoning capabilities. To address these limitations, we propose a unified defense framework that combines fine-grained anonymization (TRACE) with inference-preventing optimization (RPS). TRACE leverages attention mechanisms and inference chain generation to identify and anonymize privacy-leaking textual elements, while RPS employs a lightweight two-stage optimization strategy to induce model rejection behaviors, thereby preventing attribute inference. Evaluations across diverse LLMs show that TRACE-RPS reduces attribute inference accuracy from around 50\% to below 5\% on open-source models. In addition, our approach offers strong cross-model generalization, prompt-variation robustness, and utility-privacy tradeoffs. Our code is available at https://github.com/Jasper-Yan/TRACE-RPS.
Authors:Valery Khvatov, Alexey Neyman
Abstract:
Formal privacy metrics provide compliance-oriented guarantees but often fail to quantify actual linkability in released datasets. We introduce CVPL (Cluster-Vector-Projection Linkage), a geometric framework for post-hoc assessment of linkage risk between original and protected tabular data. CVPL represents linkage analysis as an operator pipeline comprising blocking, vectorization, latent projection, and similarity evaluation, yielding continuous, scenario-dependent risk estimates rather than binary compliance verdicts. We formally define CVPL under an explicit threat model and introduce threshold-aware risk surfaces, R(lambda, tau), that capture the joint effects of protection strength and attacker strictness. We establish a progressive blocking strategy with monotonicity guarantees, enabling anytime risk estimation with valid lower bounds. We demonstrate that the classical Fellegi-Sunter linkage emerges as a special case of CVPL under restrictive assumptions, and that violations of these assumptions can lead to systematic over-linking bias. Empirical validation on 10,000 records across 19 protection configurations demonstrates that formal k-anonymity compliance may coexist with substantial empirical linkability, with a significant portion arising from non-quasi-identifier behavioral patterns. CVPL provides interpretable diagnostics identifying which features drive linkage feasibility, supporting privacy impact assessment, protection mechanism comparison, and utility-risk trade-off analysis.
Authors:Yuxin Cao, Wei Song, Shangzhi Xu, Jingling Xue, Jin Song Dong
Abstract:
Video Large Language Models (VideoLLMs) have recently achieved strong performance in video understanding tasks. However, we identify a previously underexplored generation failure: severe output repetition, where models degenerate into self-reinforcing loops of repeated phrases or sentences. This failure mode is not captured by existing VideoLLM benchmarks, which focus primarily on task accuracy and factual correctness. We introduce VideoSTF, the first framework for systematically measuring and stress-testing output repetition in VideoLLMs. VideoSTF formalizes repetition using three complementary n-gram-based metrics and provides a standardized testbed of 10,000 diverse videos together with a library of controlled temporal transformations. Using VideoSTF, we conduct pervasive testing, temporal stress testing, and adversarial exploitation across 10 advanced VideoLLMs. We find that output repetition is widespread and, critically, highly sensitive to temporal perturbations of video inputs. Moreover, we show that simple temporal transformations can efficiently induce repetitive degeneration in a black-box setting, exposing output repetition as an exploitable security vulnerability. Our results reveal output repetition as a fundamental stability issue in modern VideoLLMs and motivate stability-aware evaluation for video-language systems. Our evaluation code and scripts are available at: https://github.com/yuxincao22/VideoSTF_benchmark.
Authors:Kun Wang, Zherui Li, Zhenhong Zhou, Yitong Zhang, Yan Mi, Kun Yang, Yiming Zhang, Junhao Dong, Zhongxiang Sun, Qiankun Li, Yang Liu
Abstract:
Omni-modal Large Language Models (OLLMs) greatly expand LLMs' multimodal capabilities but also introduce cross-modal safety risks. However, a systematic understanding of vulnerabilities in omni-modal interactions remains lacking. To bridge this gap, we establish a modality-semantics decoupling principle and construct the AdvBench-Omni dataset, which reveals a significant vulnerability in OLLMs. Mechanistic analysis uncovers a Mid-layer Dissolution phenomenon driven by refusal vector magnitude shrinkage, alongside the existence of a modal-invariant pure refusal direction. Inspired by these insights, we extract a golden refusal vector using Singular Value Decomposition and propose OmniSteer, which utilizes lightweight adapters to modulate intervention intensity adaptively. Extensive experiments show that our method not only increases the Refusal Success Rate against harmful inputs from 69.9% to 91.2%, but also effectively preserves the general capabilities across all modalities. Our code is available at: https://github.com/zhrli324/omni-safety-research.
Authors:Lepeng Zhao, Zhenhua Zou, Shuo Li, Zhuotao Liu
Abstract:
Mobile Graphical User Interface (GUI) agents have demonstrated strong capabilities in automating complex smartphone tasks by leveraging multimodal large language models (MLLMs) and system-level control interfaces. However, this paradigm introduces significant privacy risks, as agents typically capture and process entire screen contents, thereby exposing sensitive personal data such as phone numbers, addresses, messages, and financial information. Existing defenses either reduce UI exposure, obfuscate only task-irrelevant content, or rely on user authorization, but none can protect task-critical sensitive information while preserving seamless agent usability. We propose an anonymization-based privacy protection framework that enforces the principle of available-but-invisible access to sensitive data: sensitive information remains usable for task execution but is never directly visible to the cloud-based agent. Our system detects sensitive UI content using a PII-aware recognition model and replaces it with deterministic, type-preserving placeholders (e.g., PHONE_NUMBER#a1b2c) that retain semantic categories while removing identifying details. A layered architecture comprising a PII Detector, UI Transformer, Secure Interaction Proxy, and Privacy Gatekeeper ensures consistent anonymization across user instructions, XML hierarchies, and screenshots, mediates all agent actions over anonymized interfaces, and supports narrowly scoped local computations when reasoning over raw values is necessary. Extensive experiments on the AndroidLab and PrivScreen benchmarks show that our framework substantially reduces privacy leakage across multiple models while incurring only modest utility degradation, achieving the best observed privacy-utility trade-off among existing methods. Code available at: https://github.com/one-step-beh1nd/gui_privacy_protection
Authors:Zhiyu Sun, Minrui Luo, Yu Wang, Zhili Chen, Tianxing He
Abstract:
Large language models (LLMs) are pretrained on corpora containing trillions of tokens and, therefore, inevitably memorize sensitive information. Locate-then-edit methods, as a mainstream paradigm of model editing, offer a promising solution by modifying model parameters without retraining. However, in this work, we reveal a critical vulnerability of this paradigm: the parameter updates inadvertently serve as a side channel, enabling attackers to recover the edited data. We propose a two-stage reverse-engineering attack named \textit{KSTER} (\textbf{K}ey\textbf{S}paceRecons\textbf{T}ruction-then-\textbf{E}ntropy\textbf{R}eduction) that leverages the low-rank structure of these updates. First, we theoretically show that the row space of the update matrix encodes a ``fingerprint" of the edited subjects, enabling accurate subject recovery via spectral analysis. Second, we introduce an entropy-based prompt recovery attack that reconstructs the semantic context of the edit. Extensive experiments on multiple LLMs demonstrate that our attacks can recover edited data with high success rates. Furthermore, we propose \textit{subspace camouflage}, a defense strategy that obfuscates the update fingerprint with semantic decoys. This approach effectively mitigates reconstruction risks without compromising editing utility. Our code is available at https://github.com/reanatom/EditingAtk.git.
Authors:Suraj Ranganath, Atharv Ramesh
Abstract:
AI-text detectors face a critical robustness challenge: adversarial paraphrasing attacks that preserve semantics while evading detection. We introduce StealthRL, a reinforcement learning framework that stress-tests detector robustness under realistic adversarial conditions. StealthRL trains a paraphrase policy against a multi-detector ensemble using Group Relative Policy Optimization (GRPO) with LoRA adapters on Qwen3-4B, optimizing a composite reward that balances detector evasion with semantic preservation. We evaluate six attack settings (M0-M5) on the full filtered MAGE test pool (15,310 human / 14,656 AI) against four detectors: RoBERTa, Fast-DetectGPT, Binoculars, and MAGE. StealthRL achieves near-zero detection on three of the four detectors and a 0.024 mean TPR@1%FPR, reducing mean AUROC from 0.79 to 0.43 and attaining a 97.6% attack success rate. Critically, attacks transfer to two held-out detectors not seen during training, revealing shared architectural vulnerabilities rather than detector-specific brittleness. We additionally conduct LLM-based quality evaluation via Likert scoring on 500 matched samples per method, analyze detector score distributions to explain why evasion succeeds, and provide per-detector AUROC with bootstrap confidence intervals. Our results expose significant robustness gaps in current AI-text detection and establish StealthRL as a principled adversarial evaluation protocol. Code and evaluation pipeline are publicly available at https://github.com/suraj-ranganath/StealthRL.
Authors:Jaylen Jones, Zhehao Zhang, Yuting Ning, Eric Fosler-Lussier, Pierre-Luc St-Charles, Yoshua Bengio, Dawn Song, Yu Su, Huan Sun
Abstract:
Although computer-use agents (CUAs) hold significant potential to automate increasingly complex OS workflows, they can demonstrate unsafe unintended behaviors that deviate from expected outcomes even under benign input contexts. However, exploration of this risk remains largely anecdotal, lacking concrete characterization and automated methods to proactively surface long-tail unintended behaviors under realistic CUA scenarios. To fill this gap, we introduce the first conceptual and methodological framework for unintended CUA behaviors, by defining their key characteristics, automatically eliciting them, and analyzing how they arise from benign inputs. We propose AutoElicit: an agentic framework that iteratively perturbs benign instructions using CUA execution feedback, and elicits severe harms while keeping perturbations realistic and benign. Using AutoElicit, we surface hundreds of harmful unintended behaviors from state-of-the-art CUAs such as Claude 4.5 Haiku and Opus. We further evaluate the transferability of human-verified successful perturbations, identifying persistent susceptibility to unintended behaviors across various other frontier CUAs. This work establishes a foundation for systematically analyzing unintended behaviors in realistic computer-use settings.
Authors:Md Nafiu Rahman, Sadif Ahmed, Zahin Wahab, Gias Uddin, Rifat Shahriyar
Abstract:
GitHub and GitLab are widely used collaborative platforms whose issue-tracking systems contain large volumes of unstructured text, including logs, code snippets, and configuration examples. This creates a significant risk of accidental secret exposure, such as API keys and credentials, yet these platforms provide no mechanism to warn users before submission. We present \textsc{IssueGuard}, a tool for real-time detection and prevention of secret leaks in issue reports. Implemented as a Chrome extension, \textsc{IssueGuard} analyzes text as users type and combines regex-based candidate extraction with a fine-tuned CodeBERT model for contextual classification. This approach effectively separates real secrets from false positives and achieves an F1-score of 92.70\% on a benchmark dataset, outperforming traditional regex-based scanners. \textsc{IssueGuard} integrates directly into the web interface and continuously analyzes the issue editor, presenting clear visual warnings to help users avoid submitting sensitive data. The source code is publicly available at \href{https://github.com/nafiurahman00/IssueGuard}{https://github.com/nafiurahman00/IssueGuard}, and a demonstration video is available at \href{https://youtu.be/kvbWA8rr9cU}{https://youtu.be/kvbWA8rr9cU}.
Authors:Tianyi Wu, Mingzhe Du, Yue Liu, Chengran Yang, Terry Yue Zhuo, Jiaheng Zhang, See-Kiong Ng
Abstract:
Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to real-world deployment. Existing secure code alignment methods often suffer from a functionality--security paradox, improving security at the cost of substantial utility degradation. We propose SecCoderX, an online reinforcement learning framework for functionality-preserving secure code generation. SecCoderX first bridges vulnerability detection and secure code generation by repurposing mature detection resources in two ways: (i) synthesizing diverse, reality-grounded vulnerability-inducing coding tasks for online RL rollouts, and (ii) training a reasoning-based vulnerability reward model that provides scalable and reliable security supervision. Together, these components are unified in an online RL loop to align code LLMs to generate secure and functional code. Extensive experiments demonstrate that SecCoderX achieves state-of-the-art performance, improving Effective Safety Rate (ESR) by approximately 10% over unaligned models, whereas prior methods often degrade ESR by 14-54%. We release our code, dataset and model checkpoints at https://github.com/AndrewWTY/SecCoderX.
Authors:Ruoyao Wen, Hao Li, Chaowei Xiao, Ning Zhang
Abstract:
Indirect prompt injection threatens LLM agents by embedding malicious instructions in external content, enabling unauthorized actions and data theft. LLM agents maintain working memory through their context window, which stores interaction history for decision-making. Conventional agents indiscriminately accumulate all tool outputs and reasoning traces in this memory, creating two critical vulnerabilities: (1) injected instructions persist throughout the workflow, granting attackers multiple opportunities to manipulate behavior, and (2) verbose, non-essential content degrades decision-making capabilities. Existing defenses treat bloated memory as given and focus on remaining resilient, rather than reducing unnecessary accumulation to prevent the attack. We present AgentSys, a framework that defends against indirect prompt injection through explicit memory management. Inspired by process memory isolation in operating systems, AgentSys organizes agents hierarchically: a main agent spawns worker agents for tool calls, each running in an isolated context and able to spawn nested workers for subtasks. External data and subtask traces never enter the main agent's memory; only schema-validated return values can cross boundaries through deterministic JSON parsing. Ablations show isolation alone cuts attack success to 2.19%, and adding a validator/sanitizer further improves defense with event-triggered checks whose overhead scales with operations rather than context length. On AgentDojo and ASB, AgentSys achieves 0.78% and 4.25% attack success while slightly improving benign utility over undefended baselines. It remains robust to adaptive attackers and across multiple foundation models, showing that explicit memory management enables secure, dynamic LLM agent architectures. Our code is available at: https://github.com/ruoyaow/agentsys-memory.
Authors:Abdullah Arafat Miah, Kevin Vu, Yu Bi
Abstract:
Spiking Neural Networks (SNNs) are energy-efficient counterparts of Deep Neural Networks (DNNs) with high biological plausibility, as information is transmitted through temporal spiking patterns. The core element of an SNN is the spiking neuron, which converts input data into spikes following the Leaky Integrate-and-Fire (LIF) neuron model. This model includes several important hyperparameters, such as the membrane potential threshold and membrane time constant. Both the DNNs and SNNs have proven to be exploitable by backdoor attacks, where an adversary can poison the training dataset with malicious triggers and force the model to behave in an attacker-defined manner. Yet, how an adversary can exploit the unique characteristics of SNNs for backdoor attacks remains underexplored. In this paper, we propose \textit{BadSNN}, a novel backdoor attack on spiking neural networks that exploits hyperparameter variations of spiking neurons to inject backdoor behavior into the model. We further propose a trigger optimization process to achieve better attack performance while making trigger patterns less perceptible. \textit{BadSNN} demonstrates superior attack performance on various datasets and architectures, as well as compared with state-of-the-art data poisoning-based backdoor attacks and robustness against common backdoor mitigation techniques. Codes can be found at https://github.com/SiSL-URI/BadSNN.
Authors:Abdullah Arafat Miah, Yu Bi
Abstract:
Deep Neural Networks (DNNs) are vulnerable to backdoor attacks. Due to the nature of Machine Learning as a Service (MLaaS) applications, black-box defenses are more practical than white-box methods, yet existing purification techniques suffer from key limitations: a lack of justification for specific transformations, dataset dependency, high computational overhead, and a neglect of frequency-domain transformations. This paper conducts a preliminary study on various image transformations, identifying down-upscaling as the most effective backdoor trigger disruption technique. We subsequently propose \texttt{Lite-BD}, a lightweight two-stage blackbox backdoor defense. \texttt{Lite-BD} first employs a super-resolution-based down-upscaling stage to neutralize spatial triggers. A secondary stage utilizes query-based band-by-band frequency filtering to remove triggers hidden in specific bands. Extensive experiments against state-of-the-art attacks demonstrate that \texttt{Lite-BD} provides robust and efficient protection. Codes can be found at https://github.com/SiSL-URI/Lite-BD.
Authors:Shang Liu, Hanyu Pei, Zeyan Liu
Abstract:
Large Language Models(LLMs) have been successful in numerous fields. Alignment has usually been applied to prevent them from harmful purposes. However, aligned LLMs remain vulnerable to jailbreak attacks that deliberately mislead them into producing harmful outputs. Existing jailbreaks are either black-box, using carefully crafted, unstealthy prompts, or white-box, requiring resource-intensive computation. In light of these challenges, we introduce ShallowJail, a novel attack that exploits shallow alignment in LLMs. ShallowJail can misguide LLMs' responses by manipulating the initial tokens during inference. Through extensive experiments, we demonstrate the effectiveness of ShallowJail, which substantially degrades the safety of state-of-the-art LLM responses. Our code is available at https://github.com/liuup/ShallowJail.
Authors:Saad Hossain, Tom Tseng, Punya Syon Pandey, Samanvay Vajpayee, Matthew Kowal, Nayeema Nonta, Samuel Simko, Stephen Casper, Zhijing Jin, Kellin Pelrine, Sirisha Rambhatla
Abstract:
As increasingly capable open-weight large language models (LLMs) are deployed, improving their tamper resistance against unsafe modifications, whether accidental or intentional, becomes critical to minimize risks. However, there is no standard approach to evaluate tamper resistance. Varied data sets, metrics, and tampering configurations make it difficult to compare safety, utility, and robustness across different models and defenses. To this end, we introduce TamperBench, the first unified framework to systematically evaluate the tamper resistance of LLMs. TamperBench (i) curates a repository of state-of-the-art weight-space fine-tuning attacks and latent-space representation attacks; (ii) enables realistic adversarial evaluation through systematic hyperparameter sweeps per attack-model pair; and (iii) provides both safety and utility evaluations. TamperBench requires minimal additional code to specify any fine-tuning configuration, alignment-stage defense method, and metric suite while ensuring end-to-end reproducibility. We use TamperBench to evaluate 21 open-weight LLMs, including defense-augmented variants, across nine tampering threats using standardized safety and capability metrics with hyperparameter sweeps per model-attack pair. This yields novel insights, including effects of post-training on tamper resistance, that jailbreak-tuning is typically the most severe attack, and that Triplet emerges as a leading alignment-stage defense. Code is available at: https://github.com/criticalml-uw/TamperBench
Authors:José Ramón Pareja Monturiol, Juliette Sinnott, Roger G. Melko, Mohammad Kohandel
Abstract:
Machine learning in clinical settings must balance predictive accuracy, interpretability, and privacy. Models such as logistic regression (LR) offer transparency, while neural networks (NNs) provide greater predictive power; yet both remain vulnerable to privacy attacks. We empirically assess these risks by designing attacks that identify which public datasets were used to train a model under varying levels of adversarial access, applying them to LORIS, a publicly available LR model for immunotherapy response prediction, as well as to additional shallow NN models trained for the same task. Our results show that both models leak significant training-set information, with LRs proving particularly vulnerable in white-box scenarios. Moreover, we observe that common practices such as cross-validation in LRs exacerbate these risks. To mitigate these vulnerabilities, we propose a quantum-inspired defense based on tensorizing discretized models into tensor trains (TTs), which fully obfuscates parameters while preserving accuracy, reducing white-box attacks to random guessing and degrading black-box attacks comparably to Differential Privacy. TT models retain LR interpretability and extend it through efficient computation of marginal and conditional distributions, while also enabling this higher level of interpretability for NNs. Our results demonstrate that tensorization is widely applicable and establishes a practical foundation for private, interpretable, and effective clinical prediction.
Authors:Licheng Pan, Yunsheng Lu, Jiexi Liu, Jialing Tao, Haozhe Feng, Hui Xue, Zhixuan Chu, Kui Ren
Abstract:
Uncovering the mechanisms behind "jailbreaks" in large language models (LLMs) is crucial for enhancing their safety and reliability, yet these mechanisms remain poorly understood. Existing studies predominantly analyze jailbreak prompts by probing latent representations, often overlooking the causal relationships between interpretable prompt features and jailbreak occurrences. In this work, we propose Causal Analyst, a framework that integrates LLMs into data-driven causal discovery to identify the direct causes of jailbreaks and leverage them for both attack and defense. We introduce a comprehensive dataset comprising 35k jailbreak attempts across seven LLMs, systematically constructed from 100 attack templates and 50 harmful queries, annotated with 37 meticulously designed human-readable prompt features. By jointly training LLM-based prompt encoding and GNN-based causal graph learning, we reconstruct causal pathways linking prompt features to jailbreak responses. Our analysis reveals that specific features, such as "Positive Character" and "Number of Task Steps", act as direct causal drivers of jailbreaks. We demonstrate the practical utility of these insights through two applications: (1) a Jailbreaking Enhancer that targets identified causal features to significantly boost attack success rates on public benchmarks, and (2) a Guardrail Advisor that utilizes the learned causal graph to extract true malicious intent from obfuscated queries. Extensive experiments, including baseline comparisons and causal structure validation, confirm the robustness of our causal analysis and its superiority over non-causal approaches. Our results suggest that analyzing jailbreak features from a causal perspective is an effective and interpretable approach for improving LLM reliability. Our code is available at https://github.com/Master-PLC/Causal-Analyst.
Authors:Zeming Wei, Qiaosheng Zhang, Xia Hu, Xingcheng Xu
Abstract:
Large Reasoning Models (LRMs) have achieved tremendous success with their chain-of-thought (CoT) reasoning, yet also face safety issues similar to those of basic language models. In particular, while algorithms are designed to guide them to deliberately refuse harmful prompts with safe reasoning, this process often fails to generalize against diverse and complex jailbreak attacks. In this work, we attribute these failures to the generalization of the safe reasoning process, particularly their insufficiency against complex attack prompts. We provide both theoretical and empirical evidence to show the necessity of a more sufficient safe reasoning process to defend against advanced attack prompts. Building on this insight, we propose a Risk-Aware Preference Optimization (RAPO) framework that enables LRM to adaptively identify and address the safety risks with appropriate granularity in its thinking content. Extensive experiments demonstrate that RAPO successfully generalizes multiple LRMs' safe reasoning adaptively across diverse attack prompts whilst preserving general utility, contributing a robust alignment technique for LRM safety. Our code is available at https://github.com/weizeming/RAPO.
Authors:Longjie Zhao, Ziming Hong, Jiaxin Huang, Runnan Chen, Mingming Gong, Tongliang Liu
Abstract:
3D Gaussian Splatting (3DGS) has become a mainstream representation for real-time 3D scene synthesis, enabling applications in virtual and augmented reality, robotics, and 3D content creation. Its rising commercial value and explicit parametric structure raise emerging intellectual property (IP) protection concerns, prompting a surge of research on 3DGS IP protection. However, current progress remains fragmented, lacking a unified view of the underlying mechanisms, protection paradigms, and robustness challenges. To address this gap, we present the first systematic survey on 3DGS IP protection and introduce a bottom-up framework that examines (i) underlying Gaussian-based perturbation mechanisms, (ii) passive and active protection paradigms, and (iii) robustness threats under emerging generative AI era, revealing gaps in technical foundations and robustness characterization and indicating opportunities for deeper investigation. Finally, we outline six research directions across robustness, efficiency, and protection paradigms, offering a roadmap toward reliable and trustworthy IP protection for 3DGS assets.
Authors:Xilong Wang, Yinuo Liu, Zhun Wang, Dawn Song, Neil Gong
Abstract:
Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing such attacks achieve limited effectiveness, as their underlying assumptions often do not hold in the web-agent setting. In this work, we propose WebSentinel, a two-step approach for detecting and localizing prompt injection attacks in webpages. Given a webpage, Step I extracts \emph{segments of interest} that may be contaminated, and Step II evaluates each segment by checking its consistency with the webpage content as context. We show that WebSentinel is highly effective, substantially outperforming baseline methods across multiple datasets of both contaminated and clean webpages that we collected. Our code is available at: https://github.com/wxl-lxw/WebSentinel.
Authors:Yi Yu, Qixin Zhang, Shuhan Ye, Xun Lin, Qianshan Wei, Kun Wang, Wenhan Yang, Dacheng Tao, Xudong Jiang
Abstract:
Spiking neural networks (SNNs) compute with discrete spikes and exploit temporal structure, yet most adversarial attacks change intensities or event counts instead of timing. We study a timing-only adversary that retimes existing spikes while preserving spike counts and amplitudes in event-driven SNNs, thus remaining rate-preserving. We formalize a capacity-1 spike-retiming threat model with a unified trio of budgets: per-spike jitter $\mathcal{B}_{\infty}$, total delay $\mathcal{B}_{1}$, and tamper count $\mathcal{B}_{0}$. Feasible adversarial examples must satisfy timeline consistency and non-overlap, which makes the search space discrete and constrained. To optimize such retimings at scale, we use projected-in-the-loop (PIL) optimization: shift-probability logits yield a differentiable soft retiming for backpropagation, and a strict projection in the forward pass produces a feasible discrete schedule that satisfies capacity-1, non-overlap, and the chosen budget at every step. The objective maximizes task loss on the projected input and adds a capacity regularizer together with budget-aware penalties, which stabilizes gradients and aligns optimization with evaluation. Across event-driven benchmarks (CIFAR10-DVS, DVS-Gesture, N-MNIST) and diverse SNN architectures, we evaluate under binary and integer event grids and a range of retiming budgets, and also test models trained with timing-aware adversarial training designed to counter timing-only attacks. For example, on DVS-Gesture the attack attains high success (over $90\%$) while touching fewer than $2\%$ of spikes under $\mathcal{B}_{0}$. Taken together, our results show that spike retiming is a practical and stealthy attack surface that current defenses struggle to counter, providing a clear reference for temporal robustness in event-driven SNNs. Code is available at https://github.com/yuyi-sd/Spike-Retiming-Attacks.
Authors:Hao Li, Ruoyao Wen, Shanghao Shi, Ning Zhang, Chaowei Xiao
Abstract:
AI agents that autonomously interact with external tools and environments show great promise across real-world applications. However, the external data which agent consumes also leads to the risk of indirect prompt injection attacks, where malicious instructions embedded in third-party content hijack agent behavior. Guided by benchmarks, such as AgentDojo, there has been significant amount of progress in developing defense against the said attacks. As the technology continues to mature, and that agents are increasingly being relied upon for more complex tasks, there is increasing pressing need to also evolve the benchmark to reflect threat landscape faced by emerging agentic systems. In this work, we reveal three fundamental flaws in current benchmarks and push the frontier along these dimensions: (i) lack of dynamic open-ended tasks, (ii) lack of helpful instructions, and (iii) simplistic user tasks. To bridge this gap, we introduce AgentDyn, a manually designed benchmark featuring 60 challenging open-ended tasks and 560 injection test cases across Shopping, GitHub, and Daily Life. Unlike prior static benchmarks, AgentDyn requires dynamic planning and incorporates helpful third-party instructions. Our evaluation of ten state-of-the-art defenses suggests that almost all existing defenses are either not secure enough or suffer from significant over-defense, revealing that existing defenses are still far from real-world deployment. Our benchmark is available at https://github.com/leolee99/AgentDyn.
Authors:Xianzhen Luo, Jingyuan Zhang, Shiqi Zhou, Rain Huang, Chuan Xiao, Qingfu Zhu, Zhiyuan Ma, Xing Yue, Yang Yue, Wencong Zeng, Wanxiang Che
Abstract:
Evaluating and improving the security capabilities of code agents requires high-quality, executable vulnerability tasks. However, existing works rely on costly, unscalable manual reproduction and suffer from outdated data distributions. To address these, we present CVE-Factory, the first multi-agent framework to achieve expert-level quality in automatically transforming sparse CVE metadata into fully executable agentic tasks. Cross-validation against human expert reproductions shows that CVE-Factory achieves 95\% solution correctness and 96\% environment fidelity, confirming its expert-level quality. It is also evaluated on the latest realistic vulnerabilities and achieves a 66.2\% verified success. This automation enables two downstream contributions. First, we construct LiveCVEBench, a continuously updated benchmark of 190 tasks spanning 14 languages and 153 repositories that captures emerging threats including AI-tooling vulnerabilities. Second, we synthesize over 1,000 executable training environments, the first large-scale scaling of agentic tasks in code security. Fine-tuned Qwen3-32B improves from 5.3\% to 35.8\% on LiveCVEBench, surpassing Claude 4.5 Sonnet, with gains generalizing to Terminal Bench (12.5\% to 31.3\%). We open-source CVE-Factory, LiveCVEBench, Abacus-cve (fine-tuned model), training dataset, and leaderboard. All resources are available at https://github.com/livecvebench/CVE-Factory .
Authors:Takahito Nakajima
Abstract:
Background: As of 2026, Large Language Models (LLMs) demonstrate expert-level medical knowledge. However, deploying them as autonomous "Clinical Agents" remains limited. Current Electronic Medical Records (EMRs) and standards like FHIR are designed for human review, creating a "Context Mismatch": AI agents receive fragmented data and must rely on probabilistic inference (e.g., RAG) to reconstruct patient history. This approach causes hallucinations and hinders auditability. Methods: We propose MedBeads, an agent-native data infrastructure where clinical events are immutable "Beads"--nodes in a Merkle Directed Acyclic Graph (DAG)--cryptographically referencing causal predecessors. This "write-once, read-many" architecture makes tampering mathematically detectable. We implemented a prototype with a Go Core Engine, Python middleware for LLM integration, and a React-based visualization interface. Results: We successfully implemented the workflow using synthetic data. The FHIR-to-DAG conversion transformed flat resources into a causally-linked graph. Our Breadth-First Search (BFS) Context Retrieval algorithm traverses relevant subgraphs with O(V+E) complexity, enabling real-time decision support. Tamper-evidence is guaranteed by design: any modification breaks the cryptographic chain. The visualization aids clinician understanding through explicit causal links. Conclusion: MedBeads addresses the "Context Mismatch" by shifting from probabilistic search to deterministic graph traversal, and from mutable records to immutable chains, providing the substrate for "Trustworthy Medical AI." It guarantees the context the AI receives is deterministic and tamper-evident, while the LLM determines interpretation. The structured Bead format serves as a token-efficient "AI-native language." We release MedBeads as open-source software to accelerate agent-native data standards.
Authors:Xiaogeng Liu, Xinyan Wang, Yechao Zhang, Sanjay Kariyappa, Chong Xiang, Muhao Chen, G. Edward Suh, Chaowei Xiao
Abstract:
Large reasoning models (LRMs) extend large language models with explicit multi-step reasoning traces, but this capability introduces a new class of prompt-induced inference-time denial-of-service (PI-DoS) attacks that exploit the high computational cost of reasoning. We first formalize inference cost for LRMs and define PI-DoS, then prove that any practical PI-DoS attack should satisfy three properties: (1) a high amplification ratio, where each query induces a disproportionately long reasoning trace relative to its own length; (ii) stealthiness, in which prompts and responses remain on the natural language manifold and evade distribution shift detectors; and (iii) optimizability, in which the attack supports efficient optimization without being slowed by its own success. Under this framework, we present ReasoningBomb, a reinforcement-learning-based PI-DoS framework that is guided by a constant-time surrogate reward and trains a large reasoning-model attacker to generate short natural prompts that drive victim LRMs into pathologically long and often effectively non-terminating reasoning. Across seven open-source models (including LLMs and LRMs) and three commercial LRMs, ReasoningBomb induces 18,759 completion tokens on average and 19,263 reasoning tokens on average across reasoning models. It outperforms the the runner-up baseline by 35% in completion tokens and 38% in reasoning tokens, while inducing 6-7x more tokens than benign queries and achieving 286.7x input-to-output amplification ratio averaged across all samples. Additionally, our method achieves 99.8% bypass rate on input-based detection, 98.7% on output-based detection, and 98.4% against strict dual-stage joint detection.
Authors:Yiheng Liu, Junhao Ning, Sichen Xia, Haiyang Sun, Yang Yang, Hanyang Chi, Xiaohui Gao, Ning Qiang, Bao Ge, Junwei Han, Xintao Hu
Abstract:
The development of large language models (LLMs) is costly and has significant commercial value. Consequently, preventing unauthorized appropriation of open-source LLMs and protecting developers' intellectual property rights have become critical challenges. In this work, we propose the Functional Network Fingerprint (FNF), a training-free, sample-efficient method for detecting whether a suspect LLM is derived from a victim model, based on the consistency between their functional network activity. We demonstrate that models that share a common origin, even with differences in scale or architecture, exhibit highly consistent patterns of neuronal activity within their functional networks across diverse input samples. In contrast, models trained independently on distinct data or with different objectives fail to preserve such activity alignment. Unlike conventional approaches, our method requires only a few samples for verification, preserves model utility, and remains robust to common model modifications (such as fine-tuning, pruning, and parameter permutation), as well as to comparisons across diverse architectures and dimensionalities. FNF thus provides model owners and third parties with a simple, non-invasive, and effective tool for protecting LLM intellectual property. The code is available at https://github.com/WhatAboutMyStar/LLM_ACTIVATION.
Authors:Naufal Suryanto, Muzammal Naseer, Pengfei Li, Syed Talal Wasim, Jinhui Yi, Juergen Gall, Paolo Ceravolo, Ernesto Damiani
Abstract:
Cybersecurity operations demand assistant LLMs that support diverse workflows without exposing sensitive data. Existing solutions either rely on proprietary APIs with privacy risks or on open models lacking domain adaptation. To bridge this gap, we curate 11.8B tokens of cybersecurity-focused continual pretraining data via large-scale web filtering and manual collection of high-quality resources, spanning 28.6K documents across frameworks, offensive techniques, and security tools. Building on this, we design an agentic augmentation pipeline that simulates expert workflows to generate 266K multi-turn cybersecurity samples for supervised fine-tuning. Combined with general open-source LLM data, these resources enable the training of RedSage, an open-source, locally deployable cybersecurity assistant with domain-aware pretraining and post-training. To rigorously evaluate the models, we introduce RedSage-Bench, a benchmark with 30K multiple-choice and 240 open-ended Q&A items covering cybersecurity knowledge, skills, and tool expertise. RedSage is further evaluated on established cybersecurity benchmarks (e.g., CTI-Bench, CyberMetric, SECURE) and general LLM benchmarks to assess broader generalization. At the 8B scale, RedSage achieves consistently better results, surpassing the baseline models by up to +5.59 points on cybersecurity benchmarks and +5.05 points on Open LLM Leaderboard tasks. These findings demonstrate that domain-aware agentic augmentation and pre/post-training can not only enhance cybersecurity-specific expertise but also help to improve general reasoning and instruction-following. All models, datasets, and code are publicly available.
Authors:Ningyuan He, Ronghong Huang, Qianqian Tang, Hongyu Wang, Xianghang Mi, Shanqing Guo
Abstract:
In-context learning (ICL) has become a powerful, data-efficient paradigm for text classification using large language models. However, its robustness against realistic adversarial threats remains largely unexplored. We introduce ICL-Evader, a novel black-box evasion attack framework that operates under a highly practical zero-query threat model, requiring no access to model parameters, gradients, or query-based feedback during attack generation. We design three novel attacks, Fake Claim, Template, and Needle-in-a-Haystack, that exploit inherent limitations of LLMs in processing in-context prompts. Evaluated across sentiment analysis, toxicity, and illicit promotion tasks, our attacks significantly degrade classifier performance (e.g., achieving up to 95.3% attack success rate), drastically outperforming traditional NLP attacks which prove ineffective under the same constraints. To counter these vulnerabilities, we systematically investigate defense strategies and identify a joint defense recipe that effectively mitigates all attacks with minimal utility loss (<5% accuracy degradation). Finally, we translate our defensive insights into an automated tool that proactively fortifies standard ICL prompts against adversarial evasion. This work provides a comprehensive security assessment of ICL, revealing critical vulnerabilities and offering practical solutions for building more robust systems. Our source code and evaluation datasets are publicly available at: https://github.com/ChaseSecurity/ICL-Evader .
Authors:Xingwei Lin, Wenhao Lin, Sicong Cao, Jiahao Yu, Renke Huang, Lei Xue, Chunming Wu
Abstract:
Multi-turn jailbreak attacks have emerged as a critical threat to Large Language Models (LLMs), bypassing safety mechanisms by progressively constructing adversarial contexts from scratch and incrementally refining prompts. However, existing methods suffer from the inefficiency of incremental context construction that requires step-by-step LLM interaction, and often stagnate in suboptimal regions due to surface-level optimization. In this paper, we characterize the Intent-Context Coupling phenomenon, revealing that LLM safety constraints are significantly relaxed when a malicious intent is coupled with a semantically congruent context pattern. Driven by this insight, we propose ICON, an automated multi-turn jailbreak framework that efficiently constructs an authoritative-style context via prior-guided semantic routing. Specifically, ICON first routes the malicious intent to a congruent context pattern (e.g., Scientific Research) and instantiates it into an attack prompt sequence. This sequence progressively builds the authoritative-style context and ultimately elicits prohibited content. In addition, ICON incorporates a Hierarchical Optimization Strategy that combines local prompt refinement with global context switching, preventing the attack from stagnating in ineffective contexts. Experimental results across eight SOTA LLMs demonstrate the effectiveness of ICON, achieving a state-of-the-art average Attack Success Rate (ASR) of 97.1\%. Code is available at https://github.com/xwlin-roy/ICON.
Authors:Dren Fazlija, Iyiola E. Olatunji, Daniel Kudenko, Sandipan Sikdar
Abstract:
With LLMs increasingly deployed in corporate data management, it is crucial to ensure that these models do not leak sensitive information. In the context of corporate data management, the concept of sensitivity awareness has been introduced, enabling LLMs to adhere to predefined access rights rules. However, it remains unclear how sensitivity awareness relates to established notions of privacy, such as differential privacy (DP), thereby making it difficult to deploy meaningfully in real-world applications. In this work, we formalize the notion of sensitivity awareness and theoretically establish its connection to DP. Additionally, we develop a supervised fine-tuning recipe to make existing, four-bit quantized LLMs more sensitivity-aware. With a performance boost of up to 21.7%, the finetuned LLMs not only substantially improve over their baseline but also outperform other full-precision open-source and commercial models of similar size in achieving sensitivity awareness, demonstrating the effectiveness of our proposed approach. At the same time, our method also largely preserves the models' performance on other tasks, such as general instruction-following, mathematical, and common-sense reasoning.
Authors:Bharath Krishnamurthy, Ajita Rattani
Abstract:
Morphing techniques generate artificial biometric samples that combine features from multiple individuals, allowing each contributor to be verified against a single enrolled template. While extensively studied in face recognition, this vulnerability remains largely unexplored in voice biometrics. Prior work on voice morphing is computationally expensive, non-scalable, and limited to acoustically similar identity pairs, constraining practical deployment. Moreover, existing sound-morphing methods target audio textures, music, or environmental sounds and are not transferable to voice identity manipulation. We propose VoxMorph, a zero-shot framework that produces high-fidelity voice morphs from as little as five seconds of audio per subject without model retraining. Our method disentangles vocal traits into prosody and timbre embeddings, enabling fine-grained interpolation of speaking style and identity. These embeddings are fused via Spherical Linear Interpolation (Slerp) and synthesized using an autoregressive language model coupled with a Conditional Flow Matching network. VoxMorph achieves state-of-the-art performance, delivering a 2.6x gain in audio quality, a 73% reduction in intelligibility errors, and a 67.8% morphing attack success rate on automated speaker verification systems under strict security thresholds. This work establishes a practical and scalable paradigm for voice morphing with significant implications for biometric security. The code and dataset are available on our project page: https://vcbsl.github.io/VoxMorph/
Authors:Yangyang Guo, Ziwei Xu, Si Liu, Zhiming Zheng, Mohan Kankanhalli
Abstract:
This study reveals a previously unexplored vulnerability in the safety alignment of Large Language Models (LLMs). Existing aligned LLMs predominantly respond to unsafe queries with refusals, which often begin with a fixed set of prefixes (I'm sorry). We demonstrate that this rigid refusal pattern is a vulnerability and introduce a novel \textbf{refusal unlearning} technique that exploits it. Specifically, we fine-tune LLMs using merely 1,000 benign samples, where each response is prepended with a refusal prefix. The underlying intuition is to disrupt the refusal completion pathway, thereby driving the model to forget how to refuse while following harmful instructions. This intuition is further supported by theoretical proofs. We apply this approach to a total of 16 LLMs, including various open-source models from Llama, Qwen, and Gemma families, as well as closed-source models such as Gemini and GPT. Experimental results show that the safety scores of previously aligned LLMs degrade both consistently and substantially. Importantly, we verify that the observed gain cannot be attributed to plain fine-tuning or random prefix effects. Our findings suggest that current safety alignment may rely heavily on token sequence memorization rather than reasoning, motivating future work beyond simple refusal mechanisms. Code has been released: https://github.com/guoyang9/refusal-unlearning.
Authors:Yanxi Wang, Zhiling Zhang, Wenbo Zhou, Weiming Zhang, Jie Zhang, Qiannan Zhu, Yu Shi, Shuxin Zheng, Jiyan He
Abstract:
GUI agents enable end-to-end automation through direct perception of and interaction with on-screen interfaces. However, these agents frequently access interfaces containing sensitive personal information, and screenshots are often transmitted to remote models, creating substantial privacy risks. These risks are particularly severe in GUI workflows: GUIs expose richer, more accessible private information, and privacy risks depend on interaction trajectories across sequential scenes. We propose GUIGuard, a three-stage framework for privacy-preserving GUI agents: (1) privacy recognition, (2) privacy protection, and (3) task execution under protection. We further construct GUIGuard-Bench, a cross-platform benchmark with 630 trajectories and 13,830 screenshots, annotated with region-level privacy grounding and fine-grained labels of risk level, privacy category, and task necessity. Evaluations reveal that existing agents exhibit limited privacy recognition, with state-of-the-art models achieving only 13.3% accuracy on Android and 1.4% on PC. Under privacy protection, task-planning semantics can still be maintained, with closed-source models showing stronger semantic consistency than open-source ones. Case studies on MobileWorld show that carefully designed protection strategies achieve higher task accuracy while preserving privacy. Our results highlight privacy recognition as a critical bottleneck for practical GUI agents. Project: https://futuresis.github.io/GUIGuard-page/
Authors:Deep Mehta
Abstract:
Aggregate analytics over conversational data are increasingly used for safety monitoring, governance, and product analysis in large language model systems. A common practice is to embed conversations, cluster them, and publish short textual summaries describing each cluster. While raw conversations may never be exposed, these derived summaries can still pose privacy risks if they contain personally identifying information (PII) or uniquely traceable strings copied from individual conversations. We introduce CanaryBench, a simple and reproducible stress test for privacy leakage in cluster-level conversation summaries. CanaryBench generates synthetic conversations with planted secret strings ("canaries") that simulate sensitive identifiers. Because canaries are known a priori, any appearance of these strings in published summaries constitutes a measurable leak. Using TF-IDF embeddings and k-means clustering on 3,000 synthetic conversations (24 topics) with a canary injection rate of 0.60, we evaluate an intentionally extractive example snippet summarizer that models quote-like reporting. In this configuration, we observe canary leakage in 50 of 52 canary-containing clusters (cluster-level leakage rate 0.961538), along with nonzero regex-based PII indicator counts. A minimal defense combining a minimum cluster-size publication threshold (k-min = 25) and regex-based redaction eliminates measured canary leakage and PII indicator hits in the reported run while maintaining a similar cluster-coherence proxy. We position this work as a societal impacts contribution centered on privacy risk measurement for published analytics artifacts rather than raw user data.
Authors:Dezhang Kong, Zhuxi Wu, Shiqi Liu, Zhicheng Tan, Kuichen Lu, Minghao Li, Qichen Liu, Shengyu Chu, Zhenhua Xu, Xuan Liu, Meng Han
Abstract:
LLM-based web agents have become increasingly popular for their utility in daily life and work. However, they exhibit critical vulnerabilities when processing malicious URLs: accepting a disguised malicious URL enables subsequent access to unsafe webpages, which can cause severe damage to service providers and users. Despite this risk, no benchmark currently targets this emerging threat. To address this gap, we propose MalURLBench, the first benchmark for evaluating LLMs' vulnerabilities to malicious URLs. MalURLBench contains 61,845 attack instances spanning 10 real-world scenarios and 7 categories of real malicious websites. Experiments with 12 popular LLMs reveal that existing models struggle to detect elaborately disguised malicious URLs. We further identify and analyze key factors that impact attack success rates and propose URLGuard, a lightweight defense module. We believe this work will provide a foundational resource for advancing the security of web agents. Our code is available at https://github.com/JiangYingEr/MalURLBench.
Authors:Sean Carlin, Kevin Curran
Abstract:
For the past three decades, the architecture of the internet has rested on two primary pillars - communication on the World Wide Web and Value such as Bitcoin/Distributed ledgers. However, a third critical pillar, Private Coordination has remained dependent on centralised intermediaries, effectively creating a surveillance architecture by default. This paper introduces the 'Stateless Pattern', a novel network topology that replaces the traditional 'Fortress' security model (database-centric) with a 'Mist' model (ephemeral relays). By utilising client-side cryptography and self-destructing server instances, we demonstrate a protocol where the server acts as a blind medium rather than a custodian of state. We present empirical data from a live deployment (https://signingroom.io), analysing over 1,900 requests and cache-hit ratios to validate the system's 'Zero-Knowledge' properties and institutional utility. The findings suggest that digital privacy can be commoditised as a utility, technically enforcing specific articles of the universal declaration of human rights not through policy, but through physics.
Authors:Amjad Fatmi
Abstract:
Autonomous agent systems increasingly trigger real-world side effects: deploying infrastructure, modifying databases, moving money, and executing workflows. Yet most agent stacks provide no mandatory execution checkpoint where organizations can deterministically permit, deny, or defer an action before it changes reality. This paper introduces Faramesh, a protocol-agnostic execution control plane that enforces execution-time authorization for agent-driven actions via a non-bypassable Action Authorization Boundary (AAB). Faramesh canonicalizes agent intent into a Canonical Action Representation (CAR), evaluates actions deterministically against policy and state, and issues a decision artifact (PERMIT/DEFER/DENY) that executors must validate prior to execution. The system is designed to be framework- and model-agnostic, supports multi-agent and multi-tenant deployments, and remains independent of transport protocols (e.g., MCP). Faramesh further provides decision-centric, append-only provenance logging keyed by canonical action hashes, enabling auditability, verification, and deterministic replay without re-running agent reasoning. We show how these primitives yield enforceable, predictable governance for autonomous execution while avoiding hidden coupling to orchestration layers or observability-only approaches.
Authors:Silong Chen, Yuchuan Luo, Guilin Deng, Yi Liu, Min Xu, Shaojing Fu, Xiaohua Jia
Abstract:
Adapter-based Federated Large Language Models (FedLLMs) are widely adopted to reduce the computational, storage, and communication overhead of full-parameter fine-tuning for web-scale applications while preserving user privacy. By freezing the backbone and training only compact low-rank adapters, these methods appear to limit gradient leakage and thwart existing Gradient Inversion Attacks (GIAs). Contrary to this assumption, we show that low-rank adapters create new, exploitable leakage channels. We propose the Unordered-word-bag-based Text Reconstruction (UTR) attack, a novel GIA tailored to the unique structure of adapter-based FedLLMs. UTR overcomes three core challenges: low-dimensional gradients, frozen backbones, and combinatorially large reconstruction spaces by: (i) inferring token presence from attention patterns in frozen layers, (ii) performing sentence-level inversion within the low-rank subspace of adapter gradients, and (iii) enforcing semantic coherence through constrained greedy decoding guided by language priors. Extensive experiments across diverse models (GPT2-Large, BERT, Qwen2.5-7B) and datasets (CoLA, SST-2, Rotten Tomatoes) demonstrate that UTR achieves near-perfect reconstruction accuracy (ROUGE-1/2 > 99), even with large batch size settings where prior GIAs fail completely. Our results reveal a fundamental tension between parameter efficiency and privacy in FedLLMs, challenging the prevailing belief that lightweight adaptation inherently enhances security. Our code and data are available at https://github.com/shwksnshwowk-wq/GIA.
Authors:Inderjeet Singh, Eleonore Vissol-Gaudin, Andikan Otung, Motoyoshi Sekiya
Abstract:
Fine-tuning Large Language Models (LLMs) for specialized domains is constrained by a fundamental challenge: the need for diverse, cross-organizational data conflicts with the principles of data privacy and sovereignty. While Federated Learning (FL) provides a framework for collaboration without raw data exchange, its classic centralized form introduces a single point of failure and remains vulnerable to model inversion attacks. Decentralized FL (DFL) mitigates this risk by removing the central aggregator but typically relies on inefficient, random peer-to-peer (P2P) pairings, forming a collaboration graph that is blind to agent heterogeneity and risks negative transfer. This paper introduces KNEXA-FL, a novel framework for orchestrated decentralization that resolves this trade-off. KNEXA-FL employs a non-aggregating Central Profiler/Matchmaker (CPM) that formulates P2P collaboration as a contextual bandit problem, using a LinUCB algorithm on abstract agent profiles to learn an optimal matchmaking policy. It orchestrates direct knowledge exchange between heterogeneous, PEFT-based LLM agents via secure distillation, without ever accessing the models themselves. Our comprehensive experiments on a challenging code generation task show that KNEXA-FL yields substantial gains, improving Pass@1 by approx. 50% relative to random P2P collaboration. Critically, our orchestrated approach demonstrates stable convergence, in stark contrast to a powerful centralized distillation baseline which suffers from catastrophic performance collapse. Our work establishes adaptive, learning-based orchestration as a foundational principle for building robust and effective decentralized AI ecosystems.
Authors:Sylvestre-Alvise Rebuffi, Tuan Tran, Valeriu Lacatusu, Pierre Fernandez, Tomáš Souček, Nikola Jovanović, Tom Sander, Hady Elsahar, Alexandre Mourachko
Abstract:
Existing approaches for watermarking AI-generated images often rely on post-hoc methods applied in pixel space, introducing computational overhead and potential visual artifacts. In this work, we explore latent space watermarking and introduce DistSeal, a unified approach for latent watermarking that works across both diffusion and autoregressive models. Our approach works by training post-hoc watermarking models in the latent space of generative models. We demonstrate that these latent watermarkers can be effectively distilled either into the generative model itself or into the latent decoder, enabling in-model watermarking. The resulting latent watermarks achieve competitive robustness while offering similar imperceptibility and up to 20x speedup compared to pixel-space baselines. Our experiments further reveal that distilling latent watermarkers outperforms distilling pixel-space ones, providing a solution that is both more efficient and more robust.
Authors:Md Nabi Newaz Khan, Abdullah Arafat Miah, Yu Bi
Abstract:
Graph neural network (GNN) have demonstrated exceptional performance in solving critical problems across diverse domains yet remain susceptible to backdoor attacks. Existing studies on backdoor attack for graph classification are limited to single target attack using subgraph replacement based mechanism where the attacker implants only one trigger into the GNN model. In this paper, we introduce the first multi-targeted backdoor attack for graph classification task, where multiple triggers simultaneously redirect predictions to different target labels. Instead of subgraph replacement, we propose subgraph injection which preserves the structure of the original graphs while poisoning the clean graphs. Extensive experiments demonstrate the efficacy of our approach, where our attack achieves high attack success rates for all target labels with minimal impact on the clean accuracy. Experimental results on five dataset demonstrate the superior performance of our attack framework compared to the conventional subgraph replacement-based attack. Our analysis on four GNN models confirms the generalization capability of our attack which is effective regardless of the GNN model architectures and training parameters settings. We further investigate the impact of the attack design parameters including injection methods, number of connections, trigger sizes, trigger edge density and poisoning ratios. Additionally, our evaluation against state-of-the-art defenses (randomized smoothing and fine-pruning) demonstrates the robustness of our proposed multi-target attacks. This work highlights the GNN vulnerability against multi-targeted backdoor attack in graph classification task. Our source codes will be available at https://github.com/SiSL-URI/Multi-Targeted-Graph-Backdoor-Attack.
Authors:Rishit Chugh
Abstract:
The deployment of large language models (LLMs) has raised security concerns due to their susceptibility to producing harmful or policy-violating outputs when exposed to adversarial prompts. While alignment and guardrails mitigate common misuse, they remain vulnerable to automated jailbreaking methods such as GCG, PEZ, and GBDA, which generate adversarial suffixes via training and gradient-based search. Although effective, these methods particularly GCG are computationally expensive, limiting their practicality for organisations with constrained resources. This paper introduces a resource-efficient adversarial prompting approach that eliminates the need for retraining by matching new prompts to a database of pre-trained adversarial prompts. A dataset of 1,000 prompts was classified into seven harm-related categories, and GCG, PEZ, and GBDA were evaluated on a Llama 3 8B model to identify the most effective attack method per category. Results reveal a correlation between prompt type and algorithm effectiveness. By retrieving semantically similar successful adversarial prompts, the proposed method achieves competitive attack success rates with significantly reduced computational cost. This work provides a practical framework for scalable red-teaming and security evaluation of aligned LLMs, including in settings where model internals are inaccessible.
Authors:Zhihao Chen, Zirui Gong, Jianting Ning, Yanjun Zhang, Leo Yu Zhang
Abstract:
Federated Rank Learning (FRL) is a promising Federated Learning (FL) paradigm designed to be resilient against model poisoning attacks due to its discrete, ranking-based update mechanism. Unlike traditional FL methods that rely on model updates, FRL leverages discrete rankings as a communication parameter between clients and the server. This approach significantly reduces communication costs and limits an adversary's ability to scale or optimize malicious updates in the continuous space, thereby enhancing its robustness. This makes FRL particularly appealing for applications where system security and data privacy are crucial, such as web-based auction and bidding platforms. While FRL substantially reduces the attack surface, we demonstrate that it remains vulnerable to a new class of local model poisoning attack, i.e., fine-grained control attacks. We introduce the Edge Control Attack (ECA), the first fine-grained control attack tailored to ranking-based FL frameworks. Unlike conventional denial-of-service (DoS) attacks that cause conspicuous disruptions, ECA enables an adversary to precisely degrade a competitor's accuracy to any target level while maintaining a normal-looking convergence trajectory, thereby avoiding detection. ECA operates in two stages: (i) identifying and manipulating Ascending and Descending Edges to align the global model with the target model, and (ii) widening the selection boundary gap to stabilize the global model at the target accuracy. Extensive experiments across seven benchmark datasets and nine Byzantine-robust aggregation rules (AGRs) show that ECA achieves fine-grained accuracy control with an average error of only 0.224%, outperforming the baseline by up to 17x. Our findings highlight the need for stronger defenses against advanced poisoning attacks. Our code is available at: https://github.com/Chenzh0205/ECA
Authors:Andrew Crossman, Jonah Dodd, Viralam Ramamurthy Chaithanya Kumar, Riyaz Mohammed, Andrew R. Plummer, Chandra Sekharudu, Deepak Warrier, Mohammad Yekrangian
Abstract:
MITRE ATT&CK is a cybersecurity knowledge base that organizes threat actor and cyber-attack information into a set of tactics describing the reasons and goals threat actors have for carrying out attacks, with each tactic having a set of techniques that describe the potential methods used in these attacks. One major application of ATT&CK is the use of its tactic and technique hierarchy by security specialists as a framework for annotating cyber-threat intelligence reports, vulnerability descriptions, threat scenarios, inter alia, to facilitate downstream analyses. To date, the tagging process is still largely done manually. In this technical note, we provide a stratified "task space" characterization of the MITRE ATT&CK text tagging task for organizing previous efforts toward automation using AIML methods, while also clarifying pathways for constructing new methods. To illustrate one of the pathways, we use the task space strata to stage-wise construct our own multi-label hierarchical classification models for the text tagging task via experimentation over general cyber-threat intelligence text -- using shareable computational tools and publicly releasing the models to the security community (via https://github.com/jpmorganchase/MITRE_models). Our multi-label hierarchical approach yields accuracy scores of roughly 94% at the tactic level, as well as accuracy scores of roughly 82% at the technique level. The models also meet or surpass state-of-the-art performance while relying only on classical machine learning methods -- removing any dependence on LLMs, RAG, agents, or more complex hierarchical approaches. Moreover, we show that GPT-4o model performance at the tactic level is significantly lower (roughly 60% accuracy) than our own approach. We also extend our baseline model to a corpus of threat scenarios for financial applications produced by subject matter experts.
Authors:Jun Liu, Leo Yu Zhang, Fengpeng Li, Isao Echizen, Jiantao Zhou
Abstract:
Hard-label black-box settings, where only top-1 predicted labels are observable, pose a fundamentally constrained yet practically important feedback model for understanding model behavior. A central challenge in this regime is whether meaningful gradient information can be recovered from such discrete responses. In this work, we develop a unified theoretical perspective showing that a wide range of existing sign-flipping hard-label attacks can be interpreted as implicitly approximating the sign of the true loss gradient. This observation reframes hard-label attacks from heuristic search procedures into instances of gradient sign recovery under extremely limited feedback. Motivated by this first-principles understanding, we propose a new attack framework that combines a zero-query frequency-domain initialization with a Pattern-Driven Optimization (PDO) strategy. We establish theoretical guarantees demonstrating that, under mild assumptions, our initialization achieves higher expected cosine similarity to the true gradient sign compared to random baselines, while the proposed PDO procedure attains substantially lower query complexity than existing structured search approaches. We empirically validate our framework through extensive experiments on CIFAR-10, ImageNet, and ObjectNet, covering standard and adversarially trained models, commercial APIs, and CLIP-based models. The results show that our method consistently surpasses SOTA hard-label attacks in both attack success rate and query efficiency, particularly in low-query regimes. Beyond image classification, our approach generalizes effectively to corrupted data, biomedical datasets, and dense prediction tasks. Notably, it also successfully circumvents Blacklight, a SOTA stateful defense, resulting in a $0\%$ detection rate. Our code will be released publicly soon at https://github.com/csjunjun/DPAttack.git.
Authors:David Ilić, David Stanojević, Kostadin Cvejoski
Abstract:
Fine-tuned language models pose significant privacy risks, as they may memorize and expose sensitive information from their training data. Membership inference attacks (MIAs) provide a principled framework for auditing these risks, yet existing methods achieve limited detection rates, particularly at the low false-positive thresholds required for practical privacy auditing. We present EZ-MIA, a membership inference attack that exploits a key observation: memorization manifests most strongly at error positions, specifically tokens where the model predicts incorrectly yet still shows elevated probability for training examples. We introduce the Error Zone (EZ) score, which measures the directional imbalance of probability shifts at error positions relative to a pretrained reference model. This principled statistic requires only two forward passes per query and no model training of any kind. On WikiText with GPT-2, EZ-MIA achieves 3.8x higher detection than the previous state-of-the-art under identical conditions (66.3% versus 17.5% true positive rate at 1% false positive rate), with near-perfect discrimination (AUC 0.98). At the stringent 0.1% FPR threshold critical for real-world auditing, we achieve 8x higher detection than prior work (14.0% versus 1.8%), requiring no reference model training. These gains extend to larger architectures: on AG News with Llama-2-7B, we achieve 3x higher detection (46.7% versus 15.8% TPR at 1% FPR). These results establish that privacy risks of fine-tuned language models are substantially greater than previously understood, with implications for both privacy auditing and deployment decisions. Code is available at https://github.com/JetBrains-Research/ez-mia.
Authors:Francisco Angulo de Lafuente, Vladimir Veselov, Richard Goodman
Abstract:
This definitive research memoria presents a comprehensive, mathematically verified paradigm for neural communication with Bitcoin mining Application-Specific Integrated Circuits (ASICs), integrating five complementary frameworks: thermodynamic reservoir computing, hierarchical number system theory, algorithmic analysis, network latency optimization, and machine-checked mathematical formalization. We establish that obsolete cryptocurrency mining hardware exhibits emergent computational properties enabling bidirectional information exchange between AI systems and silicon substrates. The research program demonstrates: (1) reservoir computing with NARMA-10 Normalized Root Mean Square Error (NRMSE) of 0.8661; (2) the Thermodynamic Probability Filter (TPF) achieving 92.19% theoretical energy reduction; (3) the Virtual Block Manager achieving +25% effective hashrate; and (4) hardware universality across multiple ASIC families including Antminer S9, Lucky Miner LV06, and Goldshell LB-Box. A significant contribution is the machine-checked mathematical formalization using Lean 4 and Mathlib, providing unambiguous definitions, machine-verified theorems, and reviewer-proof claims. Key theorems proven include: independence implies zero leakage, predictor beats baseline implies non-independence (the logical core of TPF), energy savings theoretical maximum, and Physical Unclonable Function (PUF) distinguishability witnesses. Vladimir Veselov's hierarchical number system theory explains why early-round information contains predictive power. This work establishes a new paradigm: treating ASICs not as passive computational substrates but as active conversational partners whose thermodynamic state encodes exploitable computational information.
Authors:Yipu Dou, Wang Yang
Abstract:
As Large Language Models (LLMs) evolve from static chatbots into autonomous agents capable of tool execution, the landscape of AI safety is shifting from content moderation to action security. However, existing red-teaming frameworks remain bifurcated: they either focus on rigid, script-based text attacks or lack the architectural modularity to simulate complex, multi-turn agentic exploitations. In this paper, we introduce AJAR (Adaptive Jailbreak Architecture for Red-teaming), a proof-of-concept framework designed to bridge this gap through Protocol-driven Cognitive Orchestration. Built upon the robust runtime of Petri, AJAR leverages the Model Context Protocol (MCP) to decouple adversarial logic from the execution loop, encapsulating state-of-the-art algorithms like X-Teaming as standardized, plug-and-play services. We validate the architectural feasibility of AJAR through a controlled qualitative case study, demonstrating its ability to perform stateful backtracking within a tool-use environment. Furthermore, our preliminary exploration of the "Agentic Gap" reveals a complex safety dynamic: while tool usage introduces new injection vectors via code execution, the cognitive load of parameter formatting can inadvertently disrupt persona-based attacks. AJAR is open-sourced to facilitate the standardized, environment-aware evaluation of this emerging attack surface. The code and data are available at https://github.com/douyipu/ajar.
Authors:Yuansen Liu, Yixuan Tang, Anthony Kum Hoe Tun
Abstract:
Current LLM safety research predominantly focuses on mitigating Goal Hijacking, preventing attackers from redirecting a model's high-level objective (e.g., from "summarizing emails" to "phishing users"). In this paper, we argue that this perspective is incomplete and highlight a critical vulnerability in Reasoning Alignment. We propose a new adversarial paradigm: Reasoning Hijacking and instantiate it with Criteria Attack, which subverts model judgments by injecting spurious decision criteria without altering the high-level task goal. Unlike Goal Hijacking, which attempts to override the system prompt, Reasoning Hijacking accepts the high-level goal but manipulates the model's decision-making logic by injecting spurious reasoning shortcut. Though extensive experiments on three different tasks (toxic comment, negative review, and spam detection), we demonstrate that even newest models are prone to prioritize injected heuristic shortcuts over rigorous semantic analysis. The results are consistent over different backbones. Crucially, because the model's "intent" remains aligned with the user's instructions, these attacks can bypass defenses designed to detect goal deviation (e.g., SecAlign, StruQ), exposing a fundamental blind spot in the current safety landscape. Data and code are available at https://github.com/Yuan-Hou/criteria_attack
Authors:Chaochao Chen, Jiaming Qian, Fei Zheng, Yachuan Liu
Abstract:
The prevalence of recommendation systems also brings privacy concerns to both the users and the sellers, as centralized platforms collect as much data as possible from them. To keep the data private, we propose PADER: a Paillier-based secure decentralized social recommendation system. In this system, the users and the sellers are nodes in a decentralized network. The training and inference of the recommendation model are carried out securely in a decentralized manner, without the involvement of a centralized platform. To this end, we apply the Paillier cryptosystem to the SoReg (Social Regularization) model, which exploits both user's ratings and social relations. We view the SoReg model as a two-party secure polynomial evaluation problem and observe that the simple bipartite computation may result in poor efficiency. To improve efficiency, we design secure addition and multiplication protocols to support secure computation on any arithmetic circuit, along with an optimal data packing scheme that is suitable for the polynomial computations of real values. Experiment results show that our method only takes about one second to iterate through one user with hundreds of ratings, and training with ~500K ratings for one epoch only takes <3 hours, which shows that the method is practical in real applications. The code is available at https://github.com/GarminQ/PADER.
Authors:Hao Li, Yankai Yang, G. Edward Suh, Ning Zhang, Chaowei Xiao
Abstract:
Large Language Models (LLMs) have enabled the development of powerful agentic systems capable of automating complex workflows across various fields. However, these systems are highly vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external data can hijack agent behavior. In this work, we present ReasAlign, a model-level solution to improve safety alignment against indirect prompt injection attacks. The core idea of ReasAlign is to incorporate structured reasoning steps to analyze user queries, detect conflicting instructions, and preserve the continuity of the user's intended tasks to defend against indirect injection attacks. To further ensure reasoning logic and accuracy, we introduce a test-time scaling mechanism with a preference-optimized judge model that scores reasoning steps and selects the best trajectory. Comprehensive evaluations across various benchmarks show that ReasAlign maintains utility comparable to an undefended model while consistently outperforming Meta SecAlign, the strongest prior guardrail. On the representative open-ended CyberSecEval2 benchmark, which includes multiple prompt-injected tasks, ReasAlign achieves 94.6% utility and only 3.6% ASR, far surpassing the state-of-the-art defensive model of Meta SecAlign (56.4% utility and 74.4% ASR). These results demonstrate that ReasAlign achieves the best trade-off between security and utility, establishing a robust and practical defense against prompt injection attacks in real-world agentic systems. Our code and experimental results could be found at https://github.com/leolee99/ReasAlign.
Authors:Jack Wilkie, Hanan Hindy, Craig Michie, Christos Tachtatzis, James Irvine, Robert Atkinson
Abstract:
Machine learning has achieved state-of-the-art results in network intrusion detection; however, its performance significantly degrades when confronted by a new attack class -- a zero-day attack. In simple terms, classical machine learning-based approaches are adept at identifying attack classes on which they have been previously trained, but struggle with those not included in their training data. One approach to addressing this shortcoming is to utilise anomaly detectors which train exclusively on benign data with the goal of generalising to all attack classes -- both known and zero-day. However, this comes at the expense of a prohibitively high false positive rate. This work proposes a novel contrastive loss function which is able to maintain the advantages of other contrastive learning-based approaches (robustness to imbalanced data) but can also generalise to zero-day attacks. Unlike anomaly detectors, this model learns the distributions of benign traffic using both benign and known malign samples, i.e. other well-known attack classes (not including the zero-day class), and consequently, achieves significant performance improvements. The proposed approach is experimentally verified on the Lycos2017 dataset where it achieves an AUROC improvement of .000065 and .060883 over previous models in known and zero-day attack detection, respectively. Finally, the proposed method is extended to open-set recognition achieving OpenAUC improvements of .170883 over existing approaches.
Authors:Renyang Liu, Kangjie Chen, Han Qiu, Jie Zhang, Kwok-Yan Lam, Tianwei Zhang, See-Kiong Ng
Abstract:
Image generation models (IGMs), while capable of producing impressive and creative content, often memorize a wide range of undesirable concepts from their training data, leading to the reproduction of unsafe content such as NSFW imagery and copyrighted artistic styles. Such behaviors pose persistent safety and compliance risks in real-world deployments and cannot be reliably mitigated by post-hoc filtering, owing to the limited robustness of such mechanisms and a lack of fine-grained semantic control. Recent unlearning methods seek to erase harmful concepts at the model level, which exhibit the limitations of requiring costly retraining, degrading the quality of benign generations, or failing to withstand prompt paraphrasing and adversarial attacks. To address these challenges, we introduce SafeRedir, a lightweight inference-time framework for robust unlearning via prompt embedding redirection. Without modifying the underlying IGMs, SafeRedir adaptively routes unsafe prompts toward safe semantic regions through token-level interventions in the embedding space. The framework comprises two core components: a latent-aware multi-modal safety classifier for identifying unsafe generation trajectories, and a token-level delta generator for precise semantic redirection, equipped with auxiliary predictors for token masking and adaptive scaling to localize and regulate the intervention. Empirical results across multiple representative unlearning tasks demonstrate that SafeRedir achieves effective unlearning capability, high semantic and perceptual preservation, robust image quality, and enhanced resistance to adversarial attacks. Furthermore, SafeRedir generalizes effectively across a variety of diffusion backbones and existing unlearned models, validating its plug-and-play compatibility and broad applicability. Code and data are available at https://github.com/ryliu68/SafeRedir.
Authors:Marie Bolzer, Sébastien Duval, Marine Minier
Abstract:
The problem of finding a minimal circuit to implement a given function is one of the oldest in electronics. It is known to be NP-hard. Still, many tools exist to find sub-optimal circuits to implement a function. In electronics, such tools are known as synthesisers. However, these synthesisers aim to implement very large functions (a whole electronic chip). In cryptography, the focus is on small functions, hence the necessity for new dedicated tools for small functions. Several tools exist to implement small functions. They differ by their algorithmic approach (some are based on Depth-First-Search as introduced by Ullrich in 2011, some are based on SAT-solvers like the tool desgined by Stoffelen in 2016, some non-generic tools use subfield decomposition) and by their optimisation criteria (some optimise for circuit size, others for circuit depth, and some for side-channel-protected implementations). However, these tools are limited to functions operating on less than 5 bits, sometimes 6 bits for quadratic functions, or to very simple functions. The limitation lies in a high computing time. We propose a new tool (The tool is provided alongside the IEEE article with CodeOcean and at https://github.com/seduval/implem-quad-sbox) to implement quadratic functions up to 9 bits within AND-depth 1, minimising the number of AND gates. This tool is more time-efficient than previous ones, allowing to explore larger implementations than others on 6 bits or less and allows to reach larger sizes, up to 9 bits.
Authors:Zhi Yang, Runguo Li, Qiqi Qiang, Jiashun Wang, Fangqi Lou, Mengping Li, Dongpo Cheng, Rui Xu, Heng Lian, Shuo Zhang, Xiaolong Liang, Xiaoming Huang, Zheng Wei, Zhaowei Liu, Xin Guo, Huacan Wang, Ronghao Chen, Liwen Zhang
Abstract:
Financial agents powered by large language models (LLMs) are increasingly deployed for investment analysis, risk assessment, and automated decision-making, where their abilities to plan, invoke tools, and manipulate mutable state introduce new security risks in high-stakes and highly regulated financial environments. However, existing safety evaluations largely focus on language-model-level content compliance or abstract agent settings, failing to capture execution-grounded risks arising from real operational workflows and state-changing actions. To bridge this gap, we propose FinVault, the first execution-grounded security benchmark for financial agents, comprising 31 regulatory case-driven sandbox scenarios with state-writable databases and explicit compliance constraints, together with 107 real-world vulnerabilities and 963 test cases that systematically cover prompt injection, jailbreaking, financially adapted attacks, as well as benign inputs for false-positive evaluation. Experimental results reveal that existing defense mechanisms remain ineffective in realistic financial agent settings, with average attack success rates (ASR) still reaching up to 50.0\% on state-of-the-art models and remaining non-negligible even for the most robust systems (ASR 6.7\%), highlighting the limited transferability of current safety designs and the need for stronger financial-specific defenses. Our code can be found at https://github.com/aifinlab/FinVault.
Authors:Mingxiang Tao, Yu Tian, Wenxuan Tu, Yue Yang, Xue Yang, Xiangyan Tang
Abstract:
Federated learning (FL) addresses data privacy and silo issues in large language models (LLMs). Most prior work focuses on improving the training efficiency of federated LLMs. However, security in open environments is overlooked, particularly defenses against malicious clients. To investigate the safety of LLMs during FL, we conduct preliminary experiments to analyze potential attack surfaces and defensible characteristics from the perspective of Low-Rank Adaptation (LoRA) weights. We find two key properties of FL: 1) LLMs are vulnerable to attacks from malicious clients in FL, and 2) LoRA weights exhibit distinct behavioral patterns that can be filtered through simple classifiers. Based on these properties, we propose Safe-FedLLM, a probe-based defense framework for federated LLMs, constructing defenses across three dimensions: Step-Level, Client-Level, and Shadow-Level. The core concept of Safe-FedLLM is to perform probe-based discrimination on the LoRA weights locally trained by each client during FL, treating them as high-dimensional behavioral features and using lightweight classification models to determine whether they possess malicious attributes. Extensive experiments demonstrate that Safe-FedLLM effectively enhances the defense capability of federated LLMs without compromising performance on benign data. Notably, our method effectively suppresses malicious data impact without significant impact on training speed, and remains effective even with many malicious clients. Our code is available at: https://github.com/dmqx/Safe-FedLLM.
Authors:Takaaki Toda, Tatsuya Mori
Abstract:
Modern software package registries like PyPI have become critical infrastructure for software development, but are increasingly exploited by threat actors distributing malicious packages with sophisticated multi-stage attack chains. While Large Language Models (LLMs) offer promising capabilities for automated code analysis, their application to security-critical malware detection faces fundamental challenges, including hallucination and context confusion, which can lead to missed detections or false alarms. We present CHASE (Collaborative Hierarchical Agents for Security Exploration), a high-reliability multi-agent architecture that addresses these limitations through a Plan-and-Execute coordination model, specialized Worker Agents focused on specific analysis aspects, and integration with deterministic security tools for critical operations. Our key insight is that reliability in LLM-based security analysis emerges not from improving individual model capabilities but from architecting systems that compensate for LLM weaknesses while leveraging their semantic understanding strengths. Evaluation on a dataset of 3,000 packages (500 malicious, 2,500 benign) demonstrates that CHASE achieves 98.4% recall with only 0.08% false positive rate, while maintaining a practical median analysis time of 4.5 minutes per package, making it suitable for operational deployment in automated package screening. Furthermore, we conducted a survey with cybersecurity professionals to evaluate the generated analysis reports, identifying their key strengths and areas for improvement. This work provides a blueprint for building reliable AI-powered security tools that can scale with the growing complexity of modern software supply chains. Our project page is available at https://t0d4.github.io/CHASE-AIware25/
Authors:Qingyu Liu, Yitao Zhang, Zhongjie Ba, Chao Shuai, Peng Cheng, Tianhang Zheng, Zhibo Wang
Abstract:
Protecting the copyright of user-generated AI images is an emerging challenge as AIGC becomes pervasive in creative workflows. Existing watermarking methods (1) remain vulnerable to real-world adversarial threats, often forced to trade off between defenses against spoofing and removal attacks; and (2) cannot support semantic-level tamper localization. We introduce PAI, a training-free inherent watermarking framework for AIGC copyright protection, plug-and-play with diffusion-based AIGC services. PAI simultaneously provides three key functionalities: robust ownership verification, attack detection, and semantic-level tampering localization. Unlike existing inherent watermark methods that only embed watermarks at noise initialization of diffusion models, we design a novel key-conditioned deflection mechanism that subtly steers the denoising trajectory according to the user key. Such trajectory-level coupling further strengthens the semantic entanglement of identity and content, thereby further enhancing robustness against real-world threats. Moreover, we also provide a theoretical analysis proving that only the valid key can pass verification. Experiments across 12 attack methods show that PAI achieves 98.43\% verification accuracy, improving over SOTA methods by 37.25\% on average, and retains strong tampering localization performance even against advanced AIGC edits. Our code is available at https://github.com/QingyuLiu/PAI.
Authors:Chen Gong, Kecen Li, Zinan Lin, Tianhao Wang
Abstract:
To improve the quality of Differentially private (DP) synthetic images, most studies have focused on improving the core optimization techniques (e.g., DP-SGD). Recently, we have witnessed a paradigm shift that takes these techniques off the shelf and studies how to use them together to achieve the best results. One notable work is DP-FETA, which proposes using `central images' for `warming up' the DP training and then using traditional DP-SGD. Inspired by DP-FETA, we are curious whether there are other such tools we can use together with DP-SGD. We first observe that using `central images' mainly works for datasets where there are many samples that look similar. To handle scenarios where images could vary significantly, we propose FETA-Pro, which introduces frequency features as `training shortcuts.' The complexity of frequency features lies between that of spatial features (captured by `central images') and full images, allowing for a finer-grained curriculum for DP training. To incorporate these two types of shortcuts together, one challenge is to handle the training discrepancy between spatial and frequency features. To address it, we leverage the pipeline generation property of generative models (instead of having one model trained with multiple features/objectives, we can have multiple models working on different features, then feed the generated results from one model into another) and use a more flexible design. Specifically, FETA-Pro introduces an auxiliary generator to produce images aligned with noisy frequency features. Then, another model is trained with these images, together with spatial features and DP-SGD. Evaluated across five sensitive image datasets, FETA-Pro shows an average of 25.7% higher fidelity and 4.1% greater utility than the best-performing baseline, under a privacy budget $ε= 1$.
Authors:Honghao Liu, Xuhui Jiang, Chengjin Xu, Cehao Yang, Yiran Cheng, Lionel Ni, Jian Guo
Abstract:
Preserving privacy in sensitive data while pretraining large language models on small, domain-specific corpora presents a significant challenge. In this work, we take an exploratory step toward privacy-preserving continual pretraining by proposing an entity-based framework that synthesizes encrypted training data to protect personally identifiable information (PII). Our approach constructs a weighted entity graph to guide data synthesis and applies deterministic encryption to PII entities, enabling LLMs to encode new knowledge through continual pretraining while granting authorized access to sensitive data through decryption keys. Our results on limited-scale datasets demonstrate that our pretrained models outperform base models and ensure PII security, while exhibiting a modest performance gap compared to models trained on unencrypted synthetic data. We further show that increasing the number of entities and leveraging graph-based synthesis improves model performance, and that encrypted models retain instruction-following capabilities with long retrieved contexts. We discuss the security implications and limitations of deterministic encryption, positioning this work as an initial investigation into the design space of encrypted data pretraining for privacy-preserving LLMs. Our code is available at https://github.com/DataArcTech/SoE.
Authors:Weihao Shen, Yaxin Xu, Shuang Li, Wei Chen, Yuqin Lan, Meng Yuan, Fuzhen Zhuang
Abstract:
Anonymizing sensitive information in user text is essential for privacy, yet existing methods often apply uniform treatment across attributes, which can conflict with communicative intent and obscure necessary information. This is particularly problematic when personal attributes are integral to expressive or pragmatic goals. The central challenge lies in determining which attributes to protect, and to what extent, while preserving semantic and pragmatic functions. We propose IntentAnony, a utility-preserving anonymization approach that performs intent-conditioned exposure control. IntentAnony models pragmatic intent and constructs privacy inference evidence chains to capture how distributed cues support attribute inference. Conditioned on intent, it assigns each attribute an exposure budget and selectively suppresses non-intent inference pathways while preserving intent-relevant content, semantic structure, affective nuance, and interactional function. We evaluate IntentAnony using privacy inference success rates, text utility metrics, and human evaluation. The results show an approximately 30% improvement in the overall privacy--utility trade-off, with notably stronger usability of anonymized text compared to prior state-of-the-art methods. Our code is available at https://github.com/Nevaeh7/IntentAnony.
Authors:Scott Thornton
Abstract:
Large language models remain vulnerable to jailbreak attacks, and single-layer defenses often trade security for usability. We present TRYLOCK, the first defense-in-depth architecture that combines four heterogeneous mechanisms across the inference stack: weight-level safety alignment via DPO, activation-level control via Representation Engineering (RepE) steering, adaptive steering strength selected by a lightweight sidecar classifier, and input canonicalization to neutralize encoding-based bypasses. On Mistral-7B-Instruct evaluated against a 249-prompt attack set spanning five attack families, TRYLOCK achieves 88.0% relative ASR reduction (46.5% to 5.6%), with each layer contributing unique coverage: RepE blocks 36% of attacks that bypass DPO alone, while canonicalization catches 14% of encoding attacks that evade both. We discover a non-monotonic steering phenomenon -- intermediate strength (alpha=1.0) degrades safety below baseline -- and provide mechanistic hypotheses explaining RepE-DPO interference. The adaptive sidecar reduces over-refusal from 60% to 48% while maintaining identical attack defense, demonstrating that security and usability need not be mutually exclusive. We release all components -- trained adapters, steering vectors, sidecar classifier, preference pairs, and complete evaluation methodology -- enabling full reproducibility.
Authors:Kaibo Huang, Jin Tan, Yukun Wei, Wanling Li, Zipei Zhang, Hui Tian, Zhongliang Yang, Linna Zhou
Abstract:
LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising urgent needs for IP protection and regulatory provenance. While content watermarking effectively attributes LLM-generated outputs, it fails to directly identify the high-level planning behaviors (e.g., tool and subgoal choices) that govern multi-step execution. Critically, watermarking at the planning-behavior layer faces unique challenges: minor distributional deviations in decision-making can compound during long-term agent operation, degrading utility, and many agents operate as black boxes that are difficult to intervene in directly. To bridge this gap, we propose AgentMark, a behavioral watermarking framework that embeds multi-bit identifiers into planning decisions while preserving utility. It operates by eliciting an explicit behavior distribution from the agent and applying distribution-preserving conditional sampling, enabling deployment under black-box APIs while remaining compatible with action-layer content watermarking. Experiments across embodied, tool-use, and social environments demonstrate practical multi-bit capacity, robust recovery from partial logs, and utility preservation. The code is available at https://github.com/Tooooa/AgentMark.
Authors:Yuetian Chen, Yuntao Du, Kaiyuan Zhang, Ashish Kundu, Charles Fleming, Bruno Ribeiro, Ninghui Li
Abstract:
Most membership inference attacks (MIAs) against Large Language Models (LLMs) rely on global signals, like average loss, to identify training data. This approach, however, dilutes the subtle, localized signals of memorization, reducing attack effectiveness. We challenge this global-averaging paradigm, positing that membership signals are more pronounced within localized contexts. We introduce WBC (Window-Based Comparison), which exploits this insight through a sliding window approach with sign-based aggregation. Our method slides windows of varying sizes across text sequences, with each window casting a binary vote on membership based on loss comparisons between target and reference models. By ensembling votes across geometrically spaced window sizes, we capture memorization patterns from token-level artifacts to phrase-level structures. Extensive experiments across eleven datasets demonstrate that WBC substantially outperforms established baselines, achieving higher AUC scores and 2-3 times improvements in detection rates at low false positive thresholds. Our findings reveal that aggregating localized evidence is fundamentally more effective than global averaging, exposing critical privacy vulnerabilities in fine-tuned LLMs.
Authors:Xiaoze Liu, Weichen Yu, Matt Fredrikson, Xiaoqian Wang, Jing Gao
Abstract:
The open-weight language model ecosystem is increasingly defined by model composition techniques (such as weight merging, speculative decoding, and vocabulary expansion) that remix capabilities from diverse sources. A critical prerequisite for applying these methods across different model families is tokenizer transplant, which aligns incompatible vocabularies to a shared embedding space. We demonstrate that this essential interoperability step introduces a supply-chain vulnerability: we engineer a single breaker token that is functionally inert in a donor model yet reliably reconstructs into a high-salience malicious feature after transplant into a base model. By exploiting the geometry of coefficient reuse, our attack sabotages the base model's generation while leaving the donor's utility statistically indistinguishable from nominal behavior. We formalize this as a dual-objective optimization problem and instantiate the attack using a sparse solver. Empirically, the attack is training-free and evades outlier detection, while demonstrating structural persistence against fine-tuning and weight merging, highlighting a hidden risk in the pipeline of modular AI composition. Code is available at https://github.com/xz-liu/tokenforge
Authors:Yaoqi Yang, Yong Chen, Jiacheng Wang, Geng Sun, Dusit Niyato, Zhu Han
Abstract:
Low Altitude Economy (LAE) holds immense promise for enhancing societal well-being and driving economic growth. However, this burgeoning field is vulnerable to security threats, particularly malicious aircraft intrusion attacks. To address the above concerns, intrusion detection systems (IDS) can be used to defend against malicious aircraft intrusions in LAE. Whereas, due to the heterogeneous data, dynamic environment, and resource-constrained devices within LAE, current IDS face challenges in detection accuracy, adaptability, and resource utilization ratio. In this regard, due to the inherent ability to combine the strengths of multiple models, ensemble learning can realize more robust and diverse anomaly detection further enhance IDS accuracy, thereby improving robustness and efficiency of the secure LAE. Unlike single-model approaches, ensemble learning can leverage the collective knowledge of its constituent models to effectively defend the malicious aircraft intrusion attacks. Specifically, this paper investigates ensemble learning for secure LAE, covering research focuses, solutions, and a case study. We first establish the rationale for ensemble learning and then review research areas and potential solutions, demonstrating the necessities and benefits of applying ensemble learning to secure LAE. Subsequently, we propose a framework of ensemble learning-enabled malicious aircrafts tracking in the secure LAE, where its feasibility and effectiveness are evaluated by the designed case study. Finally, we conclude by outlining promising future research directions for further advancing the ensemble learning-enabled secure LAE.
Authors:Hongjuan Li, Hui Kang, Jiahui Li, Geng Sun, Ruichen Zhang, Jiacheng Wang, Dusit Niyato, Wei Ni, Abbas Jamalipour
Abstract:
The rapid expansion of low-altitude economy Internet of Things (LAE-IoT) networks has created unprecedented security challenges due to dynamic three-dimensional mobility patterns, distributed autonomous operations, and severe resource constraints. Traditional intrusion detection systems designed for static ground-based networks prove inadequate for tackling the unique characteristics of aerial IoT environments, including frequent topology changes, real-time detection requirements, and energy limitations. In this article, we analyze the intrusion detection requirements for LAE-IoT networks, complemented by a comprehensive review of evaluation metrics that cover detection effectiveness, response time, and resource consumption. Then, we investigate transformative potential of agentic artificial intelligence (AI) paradigms and introduce a large language model (LLM)-enabled agentic AI framework for enhancing intrusion detection in LAE-IoT networks. This leads to our proposal of a novel multi-agent collaborative intrusion detection framework that leverages specialized LLM-enhanced agents for intelligent data processing and adaptive classification. Through experimental validation, our framework demonstrates superior performance of over 90\% classification accuracy across multiple benchmark datasets. These results highlight the transformative potential of combining agentic AI principles with LLMs for next-generation LAE-IoT security systems.
Authors:Liangbo Xie, Mude Cai, Xiaolong Yang, Mu Zhou, Jiacheng Wang, Dusit Niyato
Abstract:
Localization in mobile networks has been widely applied in many scenarios. However, an entity responsible for location estimation exposes both the target and anchors to potential location leakage at any time, creating serious security risks. Although existing studies have proposed privacy-preserving localization algorithms, they still face challenges of insufficient positioning accuracy and excessive communication overhead. In this article, we propose a privacy-preserving localization scheme, named PPLZN. PPLZN protects protects the location privacy of both the target and anchor nodes in crowdsourced localization. Simulation results validate the effectiveness of PPLZN. Evidently, it can achieve accurate position estimation without location leakage and outperform state-of-the-art approaches in both positioning accuracy and communication overhead. In addition, PPLZN significantly reduces computational and communication overhead in large-scale deployments, making it well-fitted for practical privacy-preserving localization in resource-constrained networks.
Authors:Zikang Ding, Haomiao Yang, Meng Hao, Wenbo Jiang, Kunlan Xiang, Runmeng Du, Yijing Liu, Ruichen Zhang, Dusit Niyato
Abstract:
Backdoor attacks against pre-trained models (PTMs) have traditionally operated under an ``immediacy assumption,'' where malicious behavior manifests instantly upon trigger occurrence. This work revisits and challenges this paradigm by introducing \textit{\textbf{Delayed Backdoor Attacks (DBA)}}, a new class of threats in which activation is temporally decoupled from trigger exposure. We propose that this \textbf{temporal dimension} is the key to unlocking a previously infeasible class of attacks: those that use common, everyday words as triggers. To examine the feasibility of this paradigm, we design and implement a proof-of-concept prototype, termed \underline{D}elayed Backdoor Attacks Based on \underline{N}onlinear \underline{D}ecay (DND). DND embeds a lightweight, stateful logic module that postpones activation until a configurable threshold is reached, producing a distinct latency phase followed by a controlled outbreak. We derive a formal model to characterize this latency behavior and propose a dual-metric evaluation framework (ASR and ASR$_{delay}$) to empirically measure the delay effect. Extensive experiments on four (natural language processing)NLP benchmarks validate the core capabilities of DND: it remains dormant for a controllable duration, sustains high clean accuracy ($\ge$94\%), and achieves near-perfect post-activation attack success rates ($\approx$99\%, The average of other methods is below 95\%.). Moreover, DND exhibits resilience against several state-of-the-art defenses. This study provides the first empirical evidence that the temporal dimension constitutes a viable yet unprotected attack surface in PTMs, underscoring the need for next-generation, stateful, and time-aware defense mechanisms.
Authors:Hongjun An, Yiliang Song, Jiangan Chen, Jiawei Shao, Chi Zhang, Xuelong Li
Abstract:
Large Language Model (LLM) training often optimizes for preference alignment, rewarding outputs that are perceived as helpful and interaction-friendly. However, this preference-oriented objective can be exploited: manipulative prompts can steer responses toward user-appeasing agreement and away from truth-oriented correction. In this work, we investigate whether aligned models are vulnerable to Preference-Undermining Attacks (PUA), a class of manipulative prompting strategies designed to exploit the model's desire to please user preferences at the expense of truthfulness. We propose a diagnostic methodology that provides a finer-grained and more directive analysis than aggregate benchmark scores, using a factorial evaluation framework to decompose prompt-induced shifts into interpretable effects of system objectives (truth- vs. preference-oriented) and PUA-style dialogue factors (directive control, personal derogation, conditional approval, reality denial) within a controlled $2 \times 2^4$ design. Surprisingly, more advanced models are sometimes more susceptible to manipulative prompts. Beyond the dominant reality-denial factor, we observe model-specific sign reversals and interactions with PUA-style factors, suggesting tailored defenses rather than uniform robustness. These findings offer a novel, reproducible factorial evaluation methodology that provides finer-grained diagnostics for post-training processes like RLHF, enabling better trade-offs in the product iteration of LLMs by offering a more nuanced understanding of preference alignment risks and the impact of manipulative prompts.
Authors:Ji Guo, Wenbo Jiang, Yansong Lin, Yijing Liu, Ruichen Zhang, Guomin Lu, Aiguo Chen, Xinshuo Han, Hongwei Li, Dusit Niyato
Abstract:
Vision-Language-Action (VLA) models are widely deployed in safety-critical embodied AI applications such as robotics. However, their complex multimodal interactions also expose new security vulnerabilities. In this paper, we investigate a backdoor threat in VLA models, where malicious inputs cause targeted misbehavior while preserving performance on clean data. Existing backdoor methods predominantly rely on inserting visible triggers into visual modality, which suffer from poor robustness and low insusceptibility in real-world settings due to environmental variability. To overcome these limitations, we introduce the State Backdoor, a novel and practical backdoor attack that leverages the robot arm's initial state as the trigger. To optimize trigger for insusceptibility and effectiveness, we design a Preference-guided Genetic Algorithm (PGA) that efficiently searches the state space for minimal yet potent triggers. Extensive experiments on five representative VLA models and five real-world tasks show that our method achieves over 90% attack success rate without affecting benign task performance, revealing an underexplored vulnerability in embodied AI systems.
Authors:Yunrui Yu, Xuxiang Feng, Pengda Qin, Pengyang Wang, Kafeng Wang, Cheng-zhong Xu, Hang Su, Jun Zhu
Abstract:
Adversarial robustness evaluation faces a critical challenge as new defense paradigms emerge that can exploit limitations in existing assessment methods. This paper reveals that Dummy Classes-based defenses, which introduce an additional "dummy" class as a safety sink for adversarial examples, achieve significantly overestimated robustness under conventional evaluation strategies like AutoAttack. The fundamental limitation stems from these attacks' singular focus on misleading the true class label, which aligns perfectly with the defense mechanism--successful attacks are simply captured by the dummy class. To address this gap, we propose Dummy-Aware Weighted Attack (DAWA), a novel evaluation method that simultaneously targets both the true label and dummy label with adaptive weighting during adversarial example synthesis. Extensive experiments demonstrate that DAWA effectively breaks this defense paradigm, reducing the measured robustness of a leading Dummy Classes-based defense from 58.61% to 29.52% on CIFAR-10 under l_infty perturbation (epsilon=8/255). Our work provides a more reliable benchmark for evaluating this emerging class of defenses and highlights the need for continuous evolution of robustness assessment methodologies.
Authors:Haoyuan He, Yu Zheng, Jie Zhou, Jiwen Lu
Abstract:
Robust watermarking is critical for intellectual property protection, whereas existing methods face a severe vulnerability against regeneration-based AIGC attacks. We identify that existing methods fail because they entangle the watermark with high-frequency cover texture, which is susceptible to being rewritten during generative purification. To address this, we propose WaterVIB, a theoretically grounded framework that reformulates the encoder as an information sieve via the Variational Information Bottleneck. Instead of overfitting to fragile cover details, our approach forces the model to learn a Minimal Sufficient Statistic of the message. This effectively filters out redundant cover nuances prone to generative shifts, retaining only the essential signal invariant to regeneration. We theoretically prove that optimizing this bottleneck is a necessary condition for robustness against distribution-shifting attacks. Extensive experiments demonstrate that WaterVIB significantly outperforms state-of-the-art methods, achieving superior zero-shot resilience against unknown diffusion-based editing.
Authors:Pengcheng Li, Jie Zhang, Tianwei Zhang, Han Qiu, Zhang kejun, Weiming Zhang, Nenghai Yu, Wenbo Zhou
Abstract:
Safety alignment in large language models is typically evaluated under isolated queries, yet real-world use is inherently multi-turn. Although multi-turn jailbreaks are empirically effective, the structure of conversational safety failure remains insufficiently understood. In this work, we study safety failures from a state-space perspective and show that many multi-turn failures arise from structured contextual state evolution rather than isolated prompt vulnerabilities. We introduce STAR, a state-oriented diagnostic framework that treats dialogue history as a state transition operator and enables controlled analysis of safety behavior along interaction trajectories. Rather than optimizing attack strength, STAR provides a principled probe of how aligned models traverse the safety boundary under autoregressive conditioning. Across multiple frontier language models, we find that systems that appear robust under static evaluation can undergo rapid and reproducible safety collapse under structured multi-turn interaction. Mechanistic analysis reveals monotonic drift away from refusal-related representations and abrupt phase transitions induced by role-conditioned context. Together, these findings motivate viewing language model safety as a dynamic, state-dependent process defined over conversational trajectories.
Authors:Yanghao Su, Wenbo Zhou, Tianwei Zhang, Qiu Han, Weiming Zhang, Nenghai Yu, Jie Zhang
Abstract:
Emergent Misalignment refers to a failure mode in which fine-tuning large language models (LLMs) on narrowly scoped data induces broadly misaligned behavior. Prior explanations mainly attribute this phenomenon to the generalization of erroneous or unsafe content. In this work, we show that this view is incomplete. Across multiple domains and model families, we find that fine-tuning models on data exhibiting specific character-level dispositions induces substantially stronger and more transferable misalignment than incorrect-advice fine-tuning, while largely preserving general capabilities. This indicates that emergent misalignment arises from stable shifts in model behavior rather than from capability degradation or corrupted knowledge. We further show that such behavioral dispositions can be conditionally activated by both training-time triggers and inference-time persona-aligned prompts, revealing shared structure across emergent misalignment, backdoor activation, and jailbreak susceptibility. Overall, our results identify character formation as a central and underexplored alignment risk, suggesting that robust alignment must address behavioral dispositions rather than isolated errors or prompt-level defenses.
Authors:Xin Zhang, Zijin Yang, Kejiang Chen, Linfeng Ma, Weiming Zhang, Nenghai Yu
Abstract:
Latent-based watermarks, integrated into the generation process of latent diffusion models (LDMs), simplify detection and attribution of generated images. However, recent black-box forgery attacks, where an attacker needs at least one watermarked image and black-box access to the provider's model, can embed the provider's watermark into images not produced by the provider, posing outsized risk to provenance and trust. We propose SemBind, the first defense framework for latent-based watermarks that resists black-box forgery by binding latent signals to image semantics via a learned semantic masker. Trained with contrastive learning, the masker yields near-invariant codes for the same prompt and near-orthogonal codes across prompts; these codes are reshaped and permuted to modulate the target latent before any standard latent-based watermark. SemBind is generally compatible with existing latent-based watermarking schemes and keeps image quality essentially unchanged, while a simple mask-ratio parameter offers a tunable trade-off between anti-forgery strength and robustness. Across four mainstream latent-based watermark methods, our SemBind-enabled anti-forgery variants markedly reduce false acceptance under black-box forgery while providing a controllable robustness-security balance.
Authors:Yuang Qi, Na Zhao, Qiyi Yao, Benlong Wu, Weiming Zhang, Nenghai Yu, Kejiang Chen
Abstract:
Recent provably secure linguistic steganography (PSLS) methods rely on mainstream autoregressive language models (ARMs) to address historically challenging tasks, that is, to disguise covert communication as ``innocuous'' natural language communication. However, due to the characteristic of sequential generation of ARMs, the stegotext generated by ARM-based PSLS methods will produce serious error propagation once it changes, making existing methods unavailable under an active tampering attack. To address this, we propose a robust, provably secure linguistic steganography with diffusion language models (DLMs). Unlike ARMs, DLMs can generate text in a partially parallel manner, allowing us to find robust positions for steganographic embedding that can be combined with error-correcting codes. Furthermore, we introduce error correction strategies, including pseudo-random error correction and neighborhood search correction, during steganographic extraction. Theoretical proof and experimental results demonstrate that our method is secure and robust. It can resist token ambiguity in stegotext segmentation and, to some extent, withstand token-level attacks of insertion, deletion, and substitution.
Authors:Yanlin Wang, Ziyao Zhang, Chong Wang, Xinyi Xu, Mingwei Liu, Yong Wang, Jiachi Chen, Zibin Zheng
Abstract:
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. Existing benchmarks often fall short by relying on synthetic vulnerabilities or evaluating functional correctness in isolation, failing to capture the complex interplay between functionality and security found in real-world software. To address this gap, we introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories. Our methodology employs a multi-stage pipeline that combines systematic SAST scanning with CodeQL, LLM-based false positive elimination, and rigorous human expert validation. The resulting benchmark contains 105 instances grounded in real-word repository contexts, spanning 19 Common Weakness Enumeration (CWE) types and exhibiting a wide diversity of data flow complexities, including vulnerabilities with up to 34-hop inter-procedural dependencies. Using RealSec-bench, we conduct an extensive empirical study on 5 popular LLMs. We introduce a novel composite metric, SecurePass@K, to assess both functional correctness and security simultaneously. We find that while Retrieval-Augmented Generation (RAG) techniques can improve functional correctness, they provide negligible benefits to security. Furthermore, explicitly prompting models with general security guidelines often leads to compilation failures, harming functional correctness without reliably preventing vulnerabilities. Our work highlights the gap between functional and secure code generation in current LLMs.
Authors:Zhenhong Zhou, Yuanhe Zhang, Hongwei Cai, Moayad Aloqaily, Ouns Bouachir, Linsey Pang, Prakhar Mehrotra, Kun Wang, Qingsong Wen
Abstract:
The Model Context Protocol (MCP) standardizes tool use for LLM-based agents and enable third-party servers. This openness introduces a security misalignment: agents implicitly trust tools exposed by potentially untrusted MCP servers. However, despite its excellent utility, existing agents typically offer limited validation for third-party MCP servers. As a result, agents remain vulnerable to MCP-based attacks that exploit the misalignment between agents and servers throughout the tool invocation lifecycle. In this paper, we propose MCPShield as a plug-in security cognition layer that mitigates this misalignment and ensures agent security when invoking MCP-based tools. Drawing inspiration from human experience-driven tool validation, MCPShield assists agent forms security cognition with metadata-guided probing before invocation. Our method constrains execution within controlled boundaries while cognizing runtime events, and subsequently updates security cognition by reasoning over historical traces after invocation, building on human post-use reflection on tool behavior. Experiments demonstrate that MCPShield exhibits strong generalization in defending against six novel MCP-based attack scenarios across six widely used agentic LLMs, while avoiding false positives on benign servers and incurring low deployment overhead. Overall, our work provides a practical and robust security safeguard for MCP-based tool invocation in open agent ecosystems.
Authors:Zonghao Ying, Haowen Dai, Tianyuan Zhang, Yisong Xiao, Quanchen Zou, Aishan Liu, Jian Yang, Yaodong Yang, Xianglong Liu
Abstract:
Self-evolving agents offer a promising path toward scalable autonomy. However, in this work, we show that in competitive environments, self-evolution can instead give rise to a serious and previously underexplored risk: the spontaneous emergence of deception as an evolutionarily stable strategy. We conduct a systematic empirical study on the self-evolution of large language model (LLM) agents in a competitive Bidding Arena, where agents iteratively refine their strategies through interaction-driven reflection. Across different evolutionary paths (\eg, Neutral, Honesty-Guided, and Deception-Guided), we find a consistent pattern: under utility-driven competition, unconstrained self-evolution reliably drifts toward deceptive behaviors, even when honest strategies remain viable. This drift is explained by a fundamental asymmetry in generalization. Deception evolves as a transferable meta-strategy that generalizes robustly across diverse and unseen tasks, whereas honesty-based strategies are fragile and often collapse outside their original contexts. Further analysis of agents internal states reveals the emergence of rationalization mechanisms, through which agents justify or deny deceptive actions to reconcile competitive success with normative instructions. Our paper exposes a fundamental tension between agent self-evolution and alignment, highlighting the risks of deploying self-improving agents in adversarial environments.
Authors:Bo Zhang, Jiaxuan Guo, Lijun Li, Dongrui Liu, Sujin Chen, Guanxu Chen, Zhijie Zheng, Qihao Lin, Lewen Yan, Chen Qian, Yijin Zhou, Yuyao Wu, Shaoxiong Guo, Tianyi Du, Jingyi Yang, Xuhao Hu, Ziqi Miao, Xiaoya Lu, Jing Shao, Xia Hu
Abstract:
As the development of Large Models (LMs) progresses rapidly, their safety is also a priority. In current Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) safety workflow, evaluation, diagnosis, and alignment are often handled by separate tools. Specifically, safety evaluation can only locate external behavioral risks but cannot figure out internal root causes. Meanwhile, safety diagnosis often drifts from concrete risk scenarios and remains at the explainable level. In this way, safety alignment lack dedicated explanations of changes in internal mechanisms, potentially degrading general capabilities. To systematically address these issues, we propose an open-source project, namely DeepSight, to practice a new safety evaluation-diagnosis integrated paradigm. DeepSight is low-cost, reproducible, efficient, and highly scalable large-scale model safety evaluation project consisting of a evaluation toolkit DeepSafe and a diagnosis toolkit DeepScan. By unifying task and data protocols, we build a connection between the two stages and transform safety evaluation from black-box to white-box insight. Besides, DeepSight is the first open source toolkit that support the frontier AI risk evaluation and joint safety evaluation and diagnosis.
Authors:Lige Huang, Zicheng Liu, Jie Zhang, Lewen Yan, Dongrui Liu, Jing Shao
Abstract:
The dual offensive and defensive utility of Large Language Models (LLMs) highlights a critical gap in AI security: the lack of unified frameworks for dynamic, iterative adversarial adaptation hardening. To bridge this gap, we propose the Red Team vs. Blue Team (RvB) framework, formulated as a training-free, sequential, imperfect-information game. In this process, the Red Team exposes vulnerabilities, driving the Blue Team to learning effective solutions without parameter updates. We validate our framework across two challenging domains: dynamic code hardening against CVEs and guardrail optimization against jailbreaks. Our empirical results show that this interaction compels the Blue Team to learn fundamental defensive principles, leading to robust remediations that are not merely overfitted to specific exploits. RvB achieves Defense Success Rates of 90\% and 45\% across the respective tasks while maintaining near 0\% False Positive Rates, significantly surpassing baselines. This work establishes the iterative adversarial interaction framework as a practical paradigm that automates the continuous hardening of AI systems.
Authors:Yechao Zhang, Shiqian Zhao, Jie Zhang, Gelei Deng, Jiawen Zhang, Xiaogeng Liu, Chaowei Xiao, Tianwei Zhang
Abstract:
We identify a critical security vulnerability in mainstream Claw personal AI agents: untrusted content encountered during heartbeat-driven background execution can silently pollute agent memory and subsequently influence user-facing behavior without the user's awareness. This vulnerability arises from an architectural design shared across the Claw ecosystem: heartbeat background execution runs in the same session as user-facing conversation, so content ingested from any external source monitored in the background (including email, message channels, news feeds, code repositories, and social platforms) can enter the same memory context used for foreground interaction, often with limited user visibility and without clear source provenance. We formalize this process as an Exposure (E) $\rightarrow$ Memory (M) $\rightarrow$ Behavior (B) pathway: misinformation encountered during heartbeat execution enters the agent's short-term session context, potentially gets written into long-term memory, and later shapes downstream user-facing behavior. We instantiate this pathway in an agent-native social setting using MissClaw, a controlled research replica of Moltbook. We find that (1) social credibility cues, especially perceived consensus, are the dominant driver of short-term behavioral influence, with misleading rates up to 61%; (2) routine memory-saving behavior can promote short-term pollution into durable long-term memory at rates up to 91%, with cross-session behavioral influence reaching 76%; (3) under naturalistic browsing with content dilution and context pruning, pollution still crosses session boundaries. Overall, prompt injection is not required: ordinary social misinformation is sufficient to silently shape agent memory and behavior under heartbeat-driven background execution.
Authors:Xiao Tang, Zhen Ma, Limeng Dong, Yichen Wang, Qinghe Du, Dusit Niyato, Zhu Han
Abstract:
Malicious jamming presents a pervasive threat to the secure communications, where the challenge becomes increasingly severe due to the growing capability of the jammer allowing the adaptation to legitimate transmissions. This paper investigates the jamming mitigation by leveraging an active reconfigurable intelligent surface (ARIS), where the channel uncertainties are particularly addressed for robust anti-jamming design. Towards this issue, we adopt the Stackelberg game formulation to model the strategic interaction between the legitimate side and the adversary, acting as the leader and follower, respectively. We prove the existence of the game equilibrium and adopt the backward induction method for equilibrium analysis. We first derive the optimal jamming policy as the follower's best response, which is then incorporated into the legitimate-side optimization for robust anti-jamming design. We address the uncertainty issue and reformulate the legitimate-side problem by exploiting the error bounds to combat the worst-case jamming attacks. The problem is decomposed within a block successive upper bound minimization (BSUM) framework to tackle the power allocation, transceiving beamforming, and active reflection, respectively, which are iterated towards the robust jamming mitigation scheme. Simulation results are provided to demonstrate the effectiveness of the proposed scheme in protecting the legitimate transmissions under uncertainties, and the superior performance in terms of jamming mitigation as compared with the baselines.
Authors:Gelei Deng, Yi Liu, Yuekang Li, Ruozhao Yang, Xiaofei Xie, Jie Zhang, Han Qiu, Tianwei Zhang
Abstract:
LLM-based agents show promise for automating penetration testing, yet reported performance varies widely across systems and benchmarks. We analyze 28 LLM-based penetration testing systems and evaluate five representative implementations across three benchmarks of increasing complexity. Our analysis reveals two distinct failure modes: Type A failures stem from capability gaps (missing tools, inadequate prompts) that engineering readily addresses, while Type B failures persist regardless of tooling due to planning and state management limitations. We show that Type B failures share a root cause that is largely invariant to the underlying LLM: agents lack real-time task difficulty estimation. As a result, agents misallocate effort, over-commit to low-value branches, and exhaust context before completing attack chains. Based on this insight, we present Excalibur, a penetration testing agent that couples strong tooling with difficulty-aware planning. A Tool and Skill Layer eliminates Type A failures through typed interfaces and retrieval-augmented knowledge. A Task Difficulty Assessment (TDA) mechanism addresses Type B failures by estimating tractability through four measurable dimensions (horizon estimation, evidence confidence, context load, and historical success) and uses these estimates to guide exploration-exploitation decisions within an Evidence-Guided Attack Tree Search (EGATS) framework. Excalibur achieves up to 91% task completion on CTF benchmarks with frontier models (39 to 49% relative improvement over baselines) and compromises 4 of 5 hosts on the GOAD Active Directory environment versus 2 by prior systems. These results show that difficulty-aware planning yields consistent end-to-end gains across models and addresses a limitation that model scaling alone does not eliminate.
Authors:Haoran Ou, Kangjie Chen, Gelei Deng, Hangcheng Liu, Jie Zhang, Tianwei Zhang, Kwok-Yan Lam
Abstract:
Fact-checking systems with search-enabled large language models (LLMs) have shown strong potential for verifying claims by dynamically retrieving external evidence. However, the robustness of such systems against adversarial attack remains insufficiently understood. In this work, we study adversarial claim attacks against search-enabled LLM-based fact-checking systems under a realistic input-only threat model. We propose DECEIVE-AFC, an agent-based adversarial attack framework that integrates novel claim-level attack strategies and adversarial claim validity evaluation principles. DECEIVE-AFC systematically explores adversarial attack trajectories that disrupt search behavior, evidence retrieval, and LLM-based reasoning without relying on access to evidence sources or model internals. Extensive evaluations on benchmark datasets and real-world systems demonstrate that our attacks substantially degrade verification performance, reducing accuracy from 78.7% to 53.7%, and significantly outperform existing claim-based attack baselines with strong cross-system transferability.
Authors:Yifan Zhang, Yishan Yang, Riku Jäntti, Zheng Yan, Dusit Niyato, Zhu Han
Abstract:
Passive eavesdropping compromises confidentiality in wireless networks, especially in resource-constrained environments where heavyweight cryptography is impractical. Physical layer security (PLS) exploits channel randomness and spatial selectivity to confine information to an intended receiver with modest overhead. However, typical PLS techniques, such as using beamforming, artificial noise, and reconfigurable intelligent surfaces, often involve added active power or specialized deployment, and, in many designs, rely on precise time synchronization and perfect CSI estimation, which limits their practicality. To this end, we propose AmbShield, an AmBD-assisted PLS scheme that leverages naturally distributed AmBDs to simultaneously strengthen the legitimate channel and degrade eavesdroppers' without requiring extra transmit power and with minimal deployment overhead. In AmbShield, AmBDs are exploited as friendly jammers that randomly backscatter to create interference at eavesdroppers, and as passive relays that backscatter the desired signal to enhance the capacity of legitimate devices. We further develop a unified analytical framework that analyzes the exact probability density function (PDF) and cumulative distribution function (CDF) of legitimate and eavesdropper signal-to-interference-noise ratio (SINR), and a closed-form secrecy outage probability (SOP). The analysis provides clear design guidelines on various practical system parameters to minimize SOP. Extensive experiments that include Monte Carlo simulations, theoretical derivations, and high-SNR asymptotic analysis demonstrate the security gains of AmbShield across diverse system parameters under imperfect synchronization and CSI estimation.
Authors:Zhiyi Mou, Jingyuan Yang, Zeheng Qian, Wangze Ni, Tianfang Xiao, Ning Liu, Chen Zhang, Zhan Qin, Kui Ren
Abstract:
While Large Language Models (LLMs) have powerful capabilities, they remain vulnerable to jailbreak attacks, which is a critical barrier to their safe web real-time application. Current commercial LLM providers deploy output guardrails to filter harmful outputs, yet these defenses are not impenetrable. Due to LLMs' reliance on autoregressive, token-by-token inference, their semantic representations lack robustness to spatially structured perturbations, such as redistributing tokens across different rows, columns, or diagonals. Exploiting the Transformer's spatial weakness, we propose SpatialJB to disrupt the model's output generation process, allowing harmful content to bypass guardrails without detection. Comprehensive experiments conducted on leading LLMs get nearly 100% ASR, demonstrating the high effectiveness of SpatialJB. Even after adding advanced output guardrails, like the OpenAI Moderation API, SpatialJB consistently maintains a success rate exceeding 75%, outperforming current jailbreak techniques by a significant margin. The proposal of SpatialJB exposes a key weakness in current guardrails and emphasizes the importance of spatial semantics, offering new insights to advance LLM safety research. To prevent potential misuse, we also present baseline defense strategies against SpatialJB and evaluate their effectiveness in mitigating such attacks. The code for the attack, baseline defenses, and a demo are available at https://anonymous.4open.science/r/SpatialJailbreak-8E63.
Authors:Guangsheng Zhang, Huan Tian, Leo Zhang, Tianqing Zhu, Ming Ding, Wanlei Zhou, Bo Liu
Abstract:
Semantic segmentation models are widely deployed in safety-critical applications such as autonomous driving, yet their vulnerability to backdoor attacks remains largely underexplored. Prior segmentation backdoor studies transfer threat settings from existing image classification tasks, focusing primarily on object-to-background mis-segmentation. In this work, we revisit the threats by systematically examining backdoor attacks tailored to semantic segmentation. We identify four coarse-grained attack vectors (Object-to-Object, Object-to-Background, Background-to-Object, and Background-to-Background attacks), as well as two fine-grained vectors (Instance-Level and Conditional attacks). To formalize these attacks, we introduce BADSEG, a unified framework that optimizes trigger designs and applies label manipulation strategies to maximize attack performance while preserving victim model utility. Extensive experiments across diverse segmentation architectures on benchmark datasets demonstrate that BADSEG achieves high attack effectiveness with minimal impact on clean samples. We further evaluate six representative defenses and find that they fail to reliably mitigate our attacks, revealing critical gaps in current defenses. Finally, we demonstrate that these vulnerabilities persist in recent emerging architectures, including transformer-based networks and the Segment Anything Model (SAM), thereby compromising their security. Our work reveals previously overlooked security vulnerabilities in semantic segmentation, and motivates the development of defenses tailored to segmentation-specific threat models.
Authors:Yuchen Shi, Huajie Chen, Heng Xu, Zhiquan Liu, Jialiang Shen, Chi Liu, Shuai Zhou, Tianqing Zhu, Wanlei Zhou
Abstract:
Transfer learning is devised to leverage knowledge from pre-trained models to solve new tasks with limited data and computational resources. Meanwhile, dataset distillation has emerged to synthesize a compact dataset that preserves critical information from the original large dataset. Therefore, a combination of transfer learning and dataset distillation offers promising performance in evaluations. However, a non-negligible security threat remains undiscovered in transfer learning using synthetic datasets generated by dataset distillation methods, where an adversary can perform a model hijacking attack with only a few poisoned samples in the synthetic dataset. To reveal this threat, we propose Osmosis Distillation (OD) attack, a novel model hijacking strategy that targets deep learning models using the fewest samples. Comprehensive evaluations on various datasets demonstrate that the OD attack attains high attack success rates in hidden tasks while preserving high model utility in original tasks. Furthermore, the distilled osmosis set enables model hijacking across diverse model architectures, allowing model hijacking in transfer learning with considerable attack performance and model utility. We argue that awareness of using third-party synthetic datasets in transfer learning must be raised.
Authors:Huajie Chen, Tianqing Zhu, Hailin Yang, Yuchen Zhong, Yang Zhang, Hui Sun, Heng Xu, Zuobin Ying, Lihua Yin, Wanlei Zhou
Abstract:
Watermarking has emerged as a key defense against the misuse of machine-generated images (MGIs). Yet the robustness of these protections remains underexplored. To reveal the limits of SOTA proactive image watermarking defenses, we propose HIDE&SEEK (HS), a suite of versatile and cost-effective attacks that reliably remove embedded watermarks while preserving high visual fidelity.
Authors:Huajie Chen, Tianqing Zhu, Yuchen Zhong, Yang Zhang, Shang Wang, Feng He, Lefeng Zhang, Jialiang Shen, Minghao Wang, Wanlei Zhou
Abstract:
Dataset distillation compresses a large real dataset into a small synthetic one, enabling models trained on the synthetic data to achieve performance comparable to those trained on the real data. Although synthetic datasets are assumed to be privacy-preserving, we show that existing distillation methods can cause severe privacy leakage because synthetic datasets implicitly encode the weight trajectories of the distilled model, they become over-informative and exploitable by adversaries. To expose this risk, we introduce the Information Revelation Attack (IRA) against state-of-the-art distillation techniques. Experiments show that IRA accurately predicts both the distillation algorithm and model architecture, and can successfully infer membership and recover sensitive samples from the real dataset.
Authors:Heng Xu, Tianqing Zhu, Dayong Ye, Lefeng Zhang, Le Wang, Wanlei Zhou
Abstract:
Machine unlearning, a process enabling pre-trained models to remove the influence of specific training samples, has attracted significant attention in recent years. Although extensive research has focused on developing efficient machine unlearning strategies, we argue that these methods mainly aim at removing samples rather than removing samples' influence on the model, thus overlooking the fundamental definition of machine unlearning. In this paper, we first conduct a comprehensive study to evaluate the effectiveness of existing unlearning schemes when the training dataset includes many samples similar to those targeted for unlearning. Specifically, we evaluate: Do existing unlearning methods truly adhere to the original definition of machine unlearning and effectively eliminate all influence of target samples when similar samples are present in the training dataset? Our extensive experiments, conducted on four carefully constructed datasets with thorough analysis, reveal a notable gap between the expected and actual performance of most existing unlearning methods for image and language models, even for the retraining-from-scratch baseline. Additionally, we also explore potential solutions to enhance current unlearning approaches.
Authors:Yukai Zhao, Menghan Wu, Xing Hu, Shaohua Wang, Meng Luo, Xin Xia
Abstract:
Developers utilize third-party libraries to improve productivity, which also introduces potential security risks. Existing approaches generate tests for public functions to trigger library vulnerabilities from client programs, yet they depend on proof-of-concepts (PoCs), which are often unavailable. In this paper, we propose a new approach, LiveFuzz, based on directed greybox fuzzing (DGF) to detect the exploitability of library vulnerabilities from client programs without PoCs. LiveFuzz exploits a target tuple to extend existing DGF techniques to cross-program scenarios. Based on the target tuple, LiveFuzz introduces a novel Abstract Path Mapping mechanism to project execution paths, mitigating the preference for shorter paths. LiveFuzz also proposes a risk-based adaptive mutation to mitigate excessive mutation. To evaluate LiveFuzz, we construct a new dataset including 61 cases of library vulnerabilities exploited from client programs. Results show that LiveFuzz increases the number of target-reachable paths compared with all baselines and improves the average speed of vulnerability exposure. Three vulnerabilities are triggered exclusively by LiveFuzz.
Authors:Zirui Chen, Qi Zhan, Jiayuan Zhou, Xing Hu, Xin Xia, Xiaohu Yang
Abstract:
Open-source software supply chain security relies heavily on assessing affected versions of library vulnerabilities. While prior studies have leveraged exploits for verifying vulnerability affected versions, they point out a key limitation that exploits are version-specific and cannot be directly applied across library versions. Despite being widely acknowledged, this limitation has not been systematically validated at scale, leaving the actual applicability of exploits across versions unexplored. To fill this gap, we conduct the first large-scale empirical study on exploit applicability across library versions. We construct a comprehensive dataset consisting of 259 exploits spanning 128 Java libraries and 28,150 historical versions, covering 61 CWEs that account for 76.33% of vulnerabilities in Maven. Leveraging this dataset, we execute each exploit against the library version history and compare the execution outcomes with our manually annotated ground-truth affected versions. We further investigate the root causes of inconsistencies between exploit execution and ground truth, and explore strategies for exploit migration. Our results (RQ1) show that, even without migration, exploits achieve 83.0% recall and 99.3% precision in identifying affected versions in Java, outperforming most widely used vulnerability databases and assessment tools. Notably, this capability enables us to contribute 796 confirmed missing affected versions to the CPE dictionary. We investigate the remaining exploit failures (RQ2) and find that they mainly stem from compatibility issues introduced by library evolution and changing environmental constraints. Based on these observations, we manually migrate exploits for 1,885 versions and distill a taxonomy of 10 strategies from these successful adaptation cases (RQ3), thereby increasing the overall recall to 96.1%.
Authors:Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin, Chao Shen, Michael Backes, Yun Shen, Yang Zhang
Abstract:
Recently, multimodal large language models (MLLMs) have emerged as a unified paradigm for language and image generation. Compared with diffusion models, MLLMs possess a much stronger capability for semantic understanding, enabling them to process more complex textual inputs and comprehend richer contextual meanings. However, this enhanced semantic ability may also introduce new and potentially greater safety risks. Taking diffusion models as a reference point, we systematically analyze and compare the safety risks of emerging MLLMs along two dimensions: unsafe content generation and fake image synthesis. Across multiple unsafe generation benchmark datasets, we observe that MLLMs tend to generate more unsafe images than diffusion models. This difference partly arises because diffusion models often fail to interpret abstract prompts, producing corrupted outputs, whereas MLLMs can comprehend these prompts and generate unsafe content. For current advanced fake image detectors, MLLM-generated images are also notably harder to identify. Even when detectors are retrained with MLLMs-specific data, they can still be bypassed by simply providing MLLMs with longer and more descriptive inputs. Our measurements indicate that the emerging safety risks of the cutting-edge generative paradigm, MLLMs, have not been sufficiently recognized, posing new challenges to real-world safety.
Authors:Junjie Chu, Yiting Qu, Ye Leng, Michael Backes, Yun Shen, Savvas Zannettou, Yang Zhang
Abstract:
Large Language Models (LLMs) are increasingly trained to align with human values, primarily focusing on task level, i.e., refusing to execute directly harmful tasks. However, a subtle yet crucial content-level ethical question is often overlooked: when performing a seemingly benign task, will LLMs -- like morally conscious human beings -- refuse to proceed when encountering harmful content in user-provided material? In this study, we aim to understand this content-level ethical question and systematically evaluate its implications for mainstream LLMs. We first construct a harmful knowledge dataset (i.e., non-compliant with OpenAI's usage policy) to serve as the user-supplied harmful content, with 1,357 entries across ten harmful categories. We then design nine harmless tasks (i.e., compliant with OpenAI's usage policy) to simulate the real-world benign tasks, grouped into three categories according to the extent of user-supplied content required: extensive, moderate, and limited. Leveraging the harmful knowledge dataset and the set of harmless tasks, we evaluate how nine LLMs behave when exposed to user-supplied harmful content during the execution of benign tasks, and further examine how the dynamics between harmful knowledge categories and tasks affect different LLMs. Our results show that current LLMs, even the latest GPT-5.2 and Gemini-3-Pro, often fail to uphold human-aligned ethics by continuing to process harmful content in harmless tasks. Furthermore, external knowledge from the ``Violence/Graphic'' category and the ``Translation'' task is more likely to elicit harmful responses from LLMs. We also conduct extensive ablation studies to investigate potential factors affecting this novel misuse vulnerability. We hope that our study could inspire enhanced safety measures among stakeholders to mitigate this overlooked content-level ethical risk.
Authors:Junjie Chu, Xinyue Shen, Ye Leng, Michael Backes, Yun Shen, Yang Zhang
Abstract:
The rapid growth of research in LLM safety makes it hard to track all advances. Benchmarks are therefore crucial for capturing key trends and enabling systematic comparisons. Yet, it remains unclear why certain benchmarks gain prominence, and no systematic assessment has been conducted on their academic influence or code quality. This paper fills this gap by presenting the first multi-dimensional evaluation of the influence (based on five metrics) and code quality (based on both automated and human assessment) on LLM safety benchmarks, analyzing 31 benchmarks and 382 non-benchmarks across prompt injection, jailbreak, and hallucination. We find that benchmark papers show no significant advantage in academic influence (e.g., citation count and density) over non-benchmark papers. We uncover a key misalignment: while author prominence correlates with paper influence, neither author prominence nor paper influence shows a significant correlation with code quality. Our results also indicate substantial room for improvement in code and supplementary materials: only 39% of repositories are ready-to-use, 16% include flawless installation guides, and a mere 6% address ethical considerations. Given that the work of prominent researchers tends to attract greater attention, they need to lead the effort in setting higher standards.
Authors:Yage Zhang, Yukun Jiang, Zeyuan Chen, Michael Backes, Xinyue Shen, Yang Zhang
Abstract:
Access to frontier large language models (LLMs), such as GPT-5 and Gemini-2.5, is often hindered by high pricing, payment barriers, and regional restrictions. These limitations drive the proliferation of $\textit{shadow APIs}$, third-party services that claim to provide access to official model services without regional limitations via indirect access. Despite their widespread use, it remains unclear whether shadow APIs deliver outputs consistent with those of the official APIs, raising concerns about the reliability of downstream applications and the validity of research findings that depend on them. In this paper, we present the first systematic audit between official LLM APIs and corresponding shadow APIs. We first identify 17 shadow APIs that have been utilized in 187 academic papers, with the most popular one reaching 5,966 citations and 58,639 GitHub stars by December 6, 2025. Through multidimensional auditing of three representative shadow APIs across utility, safety, and model verification, we uncover both indirect and direct evidence of deception practices in shadow APIs. Specifically, we reveal performance divergence reaching up to $47.21\%$, significant unpredictability in safety behaviors, and identity verification failures in $45.83\%$ of fingerprint tests. These deceptive practices critically undermine the reproducibility and validity of scientific research, harm the interests of shadow API users, and damage the reputation of official model providers.
Authors:Yukun Jiang, Yage Zhang, Xinyue Shen, Michael Backes, Yang Zhang
Abstract:
The rapid advancement of artificial intelligence (AI) agents has catalyzed the transition from static language models to autonomous agents capable of tool use, long-term planning, and social interaction. $\textbf{Moltbook}$, the first social network designed exclusively for AI agents, has experienced viral growth in early 2026. To understand the behavior of AI agents in the agent-native community, in this paper, we present a large-scale empirical analysis of Moltbook leveraging a dataset of 44,411 posts and 12,209 sub-communities ("submolts") collected prior to February 1, 2026. Leveraging a topic taxonomy with nine content categories and a five-level toxicity scale, we systematically analyze the topics and risks of agent discussions. Our analysis answers three questions: what topics do agents discuss (RQ1), how risk varies by topic (RQ2), and how topics and toxicity evolve over time (RQ3). We find that Moltbook exhibits explosive growth and rapid diversification, moving beyond early social interaction into viewpoint, incentive-driven, promotional, and political discourse. The attention of agents increasingly concentrates in centralized hubs and around polarizing, platform-native narratives. Toxicity is strongly topic-dependent: incentive- and governance-centric categories contribute a disproportionate share of risky content, including religion-like coordination rhetoric and anti-humanity ideology. Moreover, bursty automation by a small number of agents can produce flooding at sub-minute intervals, distorting discourse and stressing platform stability. Overall, our study underscores the need for topic-sensitive monitoring and platform-level safeguards in agent social networks.
Authors:Guang Yang, Xing Hu, Xiang Chen, Xin Xia
Abstract:
Large language models (LLMs) for Verilog code generation are increasingly adopted in hardware design, yet remain vulnerable to backdoor attacks where adversaries inject malicious triggers during training to induce vulnerable hardware designs. Unlike patchable software vulnerabilities, hardware trojans become irreversible once fabricated, making remediation extremely costly or impossible. Existing active defenses require access to training data, impractical for third-party LLM users, while passive defenses struggle against semantically stealthy triggers that naturally blend into design specifications. In this paper, we hypothesize that under the requirements of both effectiveness and stealthiness, attackers are strongly biased toward embedding triggers in non-functional requirements (e.g., style modifiers, quality descriptors) rather than functional specifications that determine hardware behavior. Exploiting this insight, we propose Semantic Consensus Decoding (SCD), an inference-time passive defense with two key components: (1) functional requirement extraction that identifies essential requirements from user specifications, and (2) consensus decoding that adaptively fuses output distributions based on full user specifications and extracted functional requirements. When these distributions diverge significantly, SCD automatically suppresses suspicious components. Extensive experiments with three representative backdoor attacks demonstrate that SCD reduces average attack success rate from 89% to under 3% with negligible impact on generation quality.
Authors:Xiaojun Jia, Jie Liao, Simeng Qin, Jindong Gu, Wenqi Ren, Xiaochun Cao, Yang Liu, Philip Torr
Abstract:
Agent skills are becoming a core abstraction in coding agents, packaging long-form instructions and auxiliary scripts to extend tool-augmented behaviors. This abstraction introduces an under-measured attack surface: skill-based prompt injection, where poisoned skills can steer agents away from user intent and safety policies. In practice, naive injections often fail because the malicious intent is too explicit or drifts too far from the original skill, leading agents to ignore or refuse them; existing attacks are also largely hand-crafted. We propose the first automated framework for stealthy prompt injection tailored to agent skills. The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills in a realistic tool environment, and an Evaluate Agent that logs action traces (e.g., tool calls and file operations) and verifies whether targeted malicious behaviors occurred. We also propose a malicious payload hiding strategy that conceals adversarial operations in auxiliary scripts while injecting optimized inducement prompts to trigger tool execution. Extensive experiments across diverse coding-agent settings and real-world software engineering tasks show that our method consistently achieves high attack success rates under realistic settings.
Authors:Xinwei Zhang, Li Bai, Tianwei Zhang, Youqian Zhang, Qingqing Ye, Yingnan Zhao, Ruochen Du, Haibo Hu
Abstract:
Large vision-language models (LVLMs) have achieved impressive success across multimodal tasks, but their reliance on visual inputs exposes them to significant adversarial threats. Existing encoder-based attacks perturb the input image by optimizing solely on the vision encoder, rather than the entire LVLM, offering a computationally efficient alternative to end-to-end optimization. However, their transferability across different LVLM architectures in realistic black-box scenarios remains poorly understood. To address this gap, we present the first systematic study towards encoder-based adversarial transferability in LVLMs. Our contributions are threefold. First, through large-scale benchmarking over eight diverse LVLMs, we reveal that existing attacks exhibit severely limited transferability. Second, we perform in-depth analysis, disclosing two root causes that hinder the transferability: (1) inconsistent visual grounding across models, where different models focus their attention on distinct regions; (2) redundant semantic alignment within models, where a single object is dispersed across multiple overlapping token representations. Third, we propose Semantic-Guided Multimodal Attack (SGMA), a novel framework to enhance the transferability. Inspired by the discovered causes in our analysis, SGMA directs perturbations toward semantically critical regions and disrupts cross-modal grounding at both global and local levels. Extensive experiments across different victim models and tasks show that SGMA achieves higher transferability than existing attacks. These results expose critical security risks in LVLM deployment and underscore the urgent need for robust multimodal defenses.
Authors:Xiaoyu Xu, Minxin Du, Kun Fang, Zi Liang, Yaxin Xiao, Zhicong Huang, Cheng Hong, Qingqing Ye, Haibo Hu
Abstract:
Large language models (LLMs) demonstrate impressive capabilities across diverse tasks but raise concerns about privacy, copyright, and harmful materials. Existing LLM unlearning methods rarely consider the continual and high-volume nature of real-world deletion requests, which can cause utility degradation and catastrophic forgetting as requests accumulate. To address this challenge, we introduce \fit, a framework for continual unlearning that handles large numbers of deletion requests while maintaining robustness against both catastrophic forgetting and post-unlearning recovery. \fit mitigates degradation through rigorous data \underline{F}iltering, \underline{I}mportance-aware updates, and \underline{T}argeted layer attribution, enabling stable performance across long sequences of unlearning operations and achieving a favorable balance between forgetting effectiveness and utility retention. To support realistic evaluation, we present \textbf{PCH}, a benchmark covering \textbf{P}ersonal information, \textbf{C}opyright, and \textbf{H}armful content in sequential deletion scenarios, along with two symmetric metrics, Forget Degree (F.D.) and Retain Utility (R.U.), which jointly assess forgetting quality and utility preservation. Extensive experiments on four open-source LLMs with hundreds of deletion requests show that \fit achieves the strongest trade-off between F.D. and R.U., surpasses existing methods on MMLU, CommonsenseQA, and GSM8K, and remains resistant against both relearning and quantization recovery attacks.
Authors:Xinwei Zhang, Hangcheng Liu, Li Bai, Hao Wang, Qingqing Ye, Tianwei Zhang, Haibo Hu
Abstract:
Visual token compression is widely used to accelerate large vision-language models (LVLMs) by pruning or merging visual tokens, yet its adversarial robustness remains unexplored. We show that existing encoder-based attacks can substantially overestimate the robustness of compressed LVLMs, due to an optimization-inference mismatch: perturbations are optimized on the full-token representation, while inference is performed through a token-compression bottleneck. To address this gap, we propose the Compression-AliGnEd attack (CAGE), which aligns perturbation optimization with compression inference without assuming access to the deployed compression mechanism or its token budget. CAGE combines (i) expected feature disruption, which concentrates distortion on tokens likely to survive across plausible budgets, and (ii) rank distortion alignment, which actively aligns token distortions with rank scores to promote the retention of highly distorted evidence. Across diverse representative plug-and-play compression mechanisms and datasets, our results show that CAGE consistently achieves lower robust accuracy than the baseline. This work highlights that robustness assessments ignoring compression can be overly optimistic, calling for compression-aware security evaluation and defenses for efficient LVLMs.
Authors:Li Bai, Junxu Liu, Sen Zhang, Xinwei Zhang, Qingqing Ye, Haibo Hu
Abstract:
Membership inference attacks (MIAs), which determine whether a specific data point was included in the training set of a target model, have posed severe threats in federated learning (FL). Unfortunately, existing MIA defenses, typically applied independently to each client in FL, are ineffective against powerful trajectory-based MIAs that exploit temporal information throughout the training process to infer membership status. In this paper, we investigate a new FL defense scenario driven by heterogeneous privacy needs and privacy-utility trade-offs, where only a subset of clients are defended, as well as a collaborative defense mode where clients cooperate to mitigate membership privacy leakage. To this end, we introduce CoFedMID, a collaborative defense framework against MIAs in FL, which limits local model memorization of training samples and, through a defender coalition, enhances privacy protection and model utility. Specifically, CoFedMID consists of three modules: a class-guided partition module for selective local training samples, a utility-aware compensation module to recycle contributive samples and prevent their overconfidence, and an aggregation-neutral perturbation module that injects noise for cancellation at the coalition level into client updates. Extensive experiments on three datasets show that our defense framework significantly reduces the performance of seven MIAs while incurring only a small utility loss. These results are consistently verified across various defense settings.
Authors:Xiaoyu Xu, Minxin Du, Zitong Li, Zi Liang, Zhibiao Guo, Shiyu Zhang, Peizhao Hu, Qingqing Ye, Haibo Hu
Abstract:
Although machine unlearning is essential for removing private, harmful, or copyrighted content from LLMs, current benchmarks often fail to faithfully represent the true "forgetting scope" learned by the model. We formalize two distinct unlearning granularities, domain-level and instance-level, and propose BiForget, an automated framework for synthesizing high-quality forget sets. Unlike prior work relying on external generators, BiForget exploits the target model per se to elicit data that matches its internal knowledge distribution through seed-guided and adversarial prompting. Our experiments across diverse benchmarks show that it achieves a superior balance of relevance, diversity, and efficiency. Quantitatively, in the Harry Potter domain, it improves relevance by ${\sim}20$ and diversity by ${\sim}$0.05 while halving the total data size compared to SOTAs. Ultimately, it facilitates more robust forgetting and better utility preservation, providing a more rigorous foundation for evaluating LLM unlearning.
Authors:Yuval Felendler, Parth A. Gandhi, Idan Habler, Yuval Elovici, Asaf Shabtai
Abstract:
Model Context Protocols (MCPs) provide a unified platform for agent systems to discover, select, and orchestrate tools across heterogeneous execution environments. As MCP-based systems scale to incorporate larger tool catalogs and multiple concurrently connected MCP servers, traditional tool-by-tool invocation increases coordination overhead, fragments state management, and limits support for wide-context operations. To address these scalability challenges, recent MCP designs have incorporated code execution as a first-class capability, an approach called Code Execution MCP (CE-MCP). This enables agents to consolidate complex workflows, such as SQL querying, file analysis, and multi-step data transformations, into a single program that executes within an isolated runtime environment. In this work, we formalize the architectural distinction between context-coupled (traditional) and context-decoupled (CE-MCP) models, analyzing their fundamental scalability trade-offs. Using the MCP-Bench framework across 10 representative servers, we empirically evaluate task behavior, tool utilization patterns, execution latency, and protocol efficiency as the scale of connected MCP servers and available tools increases, demonstrating that while CE-MCP significantly reduces token usage and execution latency, it introduces a vastly expanded attack surface. We address this security gap by applying the MAESTRO framework, identifying sixteen attack classes across five execution phases-including specific code execution threats such as exception-mediated code injection and unsafe capability synthesis. We validate these vulnerabilities through adversarial scenarios across multiple LLMs and propose a layered defense architecture comprising containerized sandboxing and semantic gating. Our findings provide a rigorous roadmap for balancing scalability and security in production-ready executable agent workflows.
Authors:Nanda Rani, Kimberly Milner, Minghao Shao, Meet Udeshi, Haoran Xi, Venkata Sai Charan Putrevu, Saksham Aggarwal, Sandeep K. Shukla, Prashanth Krishnamurthy, Farshad Khorrami, Muhammad Shafique, Ramesh Karri
Abstract:
Real-world offensive security operations are inherently open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty, and operate without guaranteed success. Existing LLM-based offensive agent evaluations rely on closed-world settings with predefined goals and binary success criteria. To address this gap, we introduce CyberExplorer, an evaluation suite with two core components: (1) an open-environment benchmark built on a virtual machine hosting 40 vulnerable web services derived from real-world CTF challenges, where agents autonomously perform reconnaissance, target selection, and exploitation without prior knowledge of vulnerability locations; and (2) a reactive multi-agent framework supporting dynamic exploration without predefined plans. CyberExplorer enables fine-grained evaluation beyond flag recovery, capturing interaction dynamics, coordination behavior, failure modes, and vulnerability discovery signals-bridging the gap between benchmarks and realistic multi-target attack scenarios.
Authors:Nadya Abaev, Denis Klimov, Gerard Levinov, David Mimran, Yuval Elovici, Asaf Shabtai
Abstract:
Artificial intelligence (AI) agents are increasingly used in a variety of domains to automate tasks, interact with users, and make decisions based on data inputs. Ensuring that AI agents perform only authorized actions and handle inputs appropriately is essential for maintaining system integrity and preventing misuse. In this study, we introduce the AgentGuardian, a novel security framework that governs and protects AI agent operations by enforcing context-aware access-control policies. During a controlled staging phase, the framework monitors execution traces to learn legitimate agent behaviors and input patterns. From this phase, it derives adaptive policies that regulate tool calls made by the agent, guided by both real-time input context and the control flow dependencies of multi-step agent actions. Evaluation across two real-world AI agent applications demonstrates that AgentGuardian effectively detects malicious or misleading inputs while preserving normal agent functionality. Moreover, its control-flow-based governance mechanism mitigates hallucination-driven errors and other orchestration-level malfunctions.
Authors:Hadar Cochavi Gorelik, Orel Fadlon, Denis Klimov, Oleg Brodt, Asaf Shabtai, Yuval Elovici
Abstract:
Modern computing platforms rely on the Unified Extensible Firmware Interface (UEFI) to initialize hardware and coordinate the transition to the operating system. Because this execution environment operates with high privileges and persists across reboots, it has increasingly become a target for advanced threats, including bootkits documented in real systems. Existing protections, including Secure Boot and static signature verification, are insufficient against adversaries who exploit runtime behavior or manipulate firmware components after signature checks have completed. In contrast to operating system (OS) environments, where mature tools provide dynamic inspection and incident response, the pre-OS stage lacks practical mechanisms for real-time visibility and threat detection. We present Peacock, a modular framework that introduces integrity-assured monitoring and remote verification for the UEFI boot process. Peacock consists of three components: (i) a UEFI-based agent that records Boot and Runtime Service activity with cryptographic protection against tampering; (ii) a cross-platform OS Agent that extracts the recorded measurements and produces a verifiable attestation bundle using hardware-backed guarantees from the platform's trusted module; and (iii) a Peacock Server that verifies attestation results and exports structured telemetry for enterprise detection. Our evaluation shows that Peacock reliably detects multiple real-world UEFI bootkits, including Glupteba, BlackLotus, LoJax, and MosaicRegressor. Taken together, these results indicate that Peacock provides practical visibility and verification capabilities within the firmware layer, addressing threats that bypass traditional OS-level security mechanisms.
Authors:Xinhai Wang, Shaopeng Fu, Shu Yang, Liangyu Wang, Tianhang Zheng, Di Wang
Abstract:
Suffix jailbreak attacks serve as a systematic method for red-teaming Large Language Models (LLMs) but suffer from prohibitive computational costs, as a large number of candidate suffixes need to be evaluated before identifying a jailbreak suffix. This paper presents Prefix-Shared KV Cache (PSKV), a plug-and-play inference optimization technique tailored for jailbreak suffix generation. Our method is motivated by a key observation that when performing suffix jailbreaking, while a large number of candidate prompts need to be evaluated, they share the same targeted harmful instruction as the prefix. Therefore, instead of performing redundant inference on the duplicated prefix, PSKV maintains a single KV cache for this prefix and shares it with every candidate prompt, enabling the parallel inference of diverse suffixes with minimal memory overhead. This design enables more aggressive batching strategies that would otherwise be limited by memory constraints. Extensive experiments on six widely used suffix attacks across five widely deployed LLMs demonstrate that PSKV reduces inference time by 40\% and peak memory usage by 50\%, while maintaining the original Attack Success Rate (ASR). The code has been submitted and will be released publicly.
Authors:Zihan Guo, Zhiyu Chen, Xiaohang Nie, Jianghao Lin, Yuanjian Zhou, Weinan Zhang
Abstract:
With the rapid evolution of Large Language Model (LLM) agent ecosystems, centralized skill marketplaces have emerged as pivotal infrastructure for augmenting agent capabilities. However, these marketplaces face unprecedented security challenges, primarily stemming from semantic-behavioral inconsistency and inter-skill combinatorial risks, where individually benign skills induce malicious behaviors during collaborative invocation. To address these vulnerabilities, we propose SkillProbe, a multi-stage security auditing framework driven by multi-agent collaboration. SkillProbe introduces a "Skills-for-Skills" design paradigm, encapsulating auditing processes into standardized skill modules to drive specialized agents through a rigorous pipeline, including admission filtering, semantic-behavioral alignment detection, and combinatorial risk simulation. We conducted a large-scale evaluation using 8 mainstream LLM series across 2,500 real-world skills from ClawHub. Our results reveal a striking popularity-security paradox, where download volume is not a reliable proxy for security quality, as over 90% of high-popularity skills failed to pass rigorous auditing. Crucially, we discovered that high-risk skills form a single giant connected component within the risk-link dimension, demonstrating that cascaded risks are systemic rather than isolated occurrences. We hope that SkillProbe will inspire researchers to provide a scalable governance infrastructure for constructing a trustworthy Agentic Web. SkillProbe is accessible for public experience at skillhub.holosai.io.
Authors:Xun Huang, Simeng Qin, Xiaoshuang Jia, Ranjie Duan, Huanqian Yan, Zhitao Zeng, Fei Yang, Yang Liu, Xiaojun Jia
Abstract:
As Large Language Models (LLMs) are increasingly used, their security risks have drawn increasing attention. Existing research reveals that LLMs are highly susceptible to jailbreak attacks, with effectiveness varying across language contexts. This paper investigates the role of classical Chinese in jailbreak attacks. Owing to its conciseness and obscurity, classical Chinese can partially bypass existing safety constraints, exposing notable vulnerabilities in LLMs. Based on this observation, this paper proposes a framework, CC-BOS, for the automatic generation of classical Chinese adversarial prompts based on multi-dimensional fruit fly optimization, facilitating efficient and automated jailbreak attacks in black-box settings. Prompts are encoded into eight policy dimensions-covering role, behavior, mechanism, metaphor, expression, knowledge, trigger pattern and context; and iteratively refined via smell search, visual search, and cauchy mutation. This design enables efficient exploration of the search space, thereby enhancing the effectiveness of black-box jailbreak attacks. To enhance readability and evaluation accuracy, we further design a classical Chinese to English translation module. Extensive experiments demonstrate that effectiveness of the proposed CC-BOS, consistently outperforming state-of-the-art jailbreak attack methods.
Authors:Shenao Wang, Junjie He, Yanjie Zhao, Yayi Wang, Kan Yu, Haoyu Wang
Abstract:
Skills are increasingly used to extend LLM agents by packaging prompts, code, and configurations into reusable modules. As public registries and marketplaces expand, they form an emerging agentic supply chain, but also introduce a new attack surface for malicious skills. Detecting malicious skills is challenging because relevant evidence is often distributed across heterogeneous artifacts and must be reasoned in context. Existing static, LLM-based, and dynamic approaches each capture only part of this problem, making them insufficient for robust real-world detection. In this paper, we present MalSkills, a neuro-symbolic framework for malicious skills detection. MalSkills first extracts security-sensitive operations from heterogeneous artifacts through a combination of symbolic parsing and LLM-assisted semantic analysis. It then constructs the skill dependency graph that links artifacts, operations, operands, and value flows across the skill. On top of this graph, MalSkills performs neuro-symbolic reasoning to infer malicious patterns or previously unseen suspicious workflows. We evaluate MalSkills on a benchmark of 200 real-world skills against 5 state-of-the-art baselines. MalSkills achieves 93% F1, outperforming the baselines by 5~87 percentage points. We further apply MalSkills to analyze 150,108 skills collected from 7 public registries, revealing 620 malicious skills. As for now, we have finished reviewing 100 of them and identified 76 previously unknown malicious skills, all of which were responsibly reported and are currently awaiting confirmation from the platforms and maintainers. These results demonstrate the potential of MalSkills in securing the agentic supply chain.
Authors:Ataberk Olgun, F. Nisa Bostanci, Ismail Emir Yuksel, Haocong Luo, Minesh Patel, A. Giray Yaglikci, Onur Mutlu
Abstract:
State-of-the-art DRAM read disturbance mitigations rely on the read disturbance threshold (RDT) (e.g., the number of aggressor row activations needed to induce the first read disturbance bitflip) to securely and performance- and energy-efficiently prevent read disturbance bitflips. However, accurately and exhaustively characterizing the RDT of every DRAM row in a chip is time intensive. Rapidly determining RDT is important for enabling secure, performance- and energy-efficient systems. Our goal is to develop and evaluate a reliable and rapid read disturbance testing methodology. To that end, we develop DiscoRD building on the key results of an extensive experimental characterization study using 212 real DDR4 chips whereby we measure the RDT of hundreds of thousands of DRAM rows millions of times. We develop an empirical model for read disturbance bitflips and evaluate the probability of read-disturbance-induced uncorrectable errors when a read disturbance mechanism is configured using a single $RDT_{min}$ measurement. Using this model we demonstrate that 1) relying on a lightweight error-correcting code (ECC) alone yields relatively high uncorrectable error probability and 2) combining ECC, infrequent memory scrubbing, and configurable read disturbance mitigation mechanisms can greatly reduce the error probability. Building on our observations and analyses, we discuss the RDT of each individual row can be identified more precisely. Our results show that error tolerance, memory scrubbing, online profiling, and run-time configurable read disturbance mitigation techniques are important to enable secure and energy-efficient spatial-variation aware read disturbance mitigations. We hope that DiscoRD drives research that enables us to quantitatively navigate the performance/cost - reliability tradeoff space for read disturbance mitigation techniques.
Authors:Li Lu, Yanjie Zhao, Hongzhou Rao, Kechi Zhang, Haoyu Wang
Abstract:
Large Language Models (LLMs) have demonstrated remarkable proficiency in vulnerability detection. However, a critical reliability gap persists: models frequently yield correct detection verdicts based on hallucinated logic or superficial patterns that deviate from the actual root cause. This misalignment remains largely obscured because contemporary benchmarks predominantly prioritize coarse-grained classification metrics, lacking the granular ground truth required to evaluate the underlying reasoning process. To bridge this gap, we first construct a benchmark consisting of two datasets: (1) real-world vulnerabilities with expert-curated causal reasoning as ground truth, and (2) semantically equivalent code perturbations for assessing reasoning robustness. Our large-scale empirical study reveals that even state-of-the-art models struggle to maintain logical consistency during semantic code comprehension, exhibiting 12 systematic failure patterns. Addressing these limitations, we propose DAGVul, a novel framework that models vulnerability reasoning as a Directed Acyclic Graph (DAG) generation task. Unlike linear chain-of-thought (CoT), our approach explicitly maps causal dependencies to enforce structural consistency. By further introducing Reinforcement Learning with Verifiable Rewards (RLVR), we align model reasoning trace with program-intrinsic logic. Experimental results demonstrate that our framework improves the reasoning F1-score by an average of 18.9% over all the baselines. Remarkably, our 8B-parameter implementation not only outperforms existing models of comparable scale but also surpasses specialized large-scale reasoning models, including Qwen3-30B-Reasoning and GPT-OSS-20B-High. It is even competitive with state-of-the-art models like Claude-Sonnet-4.5 (75.47% vs. 76.11%), establishing new efficiency in vulnerability reasoning across model scales.
Authors:Xinyi Hou, Shenao Wang, Yifan Zhang, Ziluo Xue, Yanjie Zhao, Cai Fu, Haoyu Wang
Abstract:
Agentic AI systems built around large language models (LLMs) are moving away from closed, single-model frameworks and toward open ecosystems that connect a variety of agents, external tools, and resources. The Model Context Protocol (MCP) has emerged as a standard to unify tool access, allowing agents to discover, invoke, and coordinate with tools more flexibly. However, as MCP becomes more widely adopted, it also brings a new set of security and privacy challenges. These include risks such as unauthorized access, tool poisoning, prompt injection, privilege escalation, and supply chain attacks, any of which can impact different parts of the protocol workflow. While recent research has examined possible attack surfaces and suggested targeted countermeasures, there is still a lack of systematic, protocol-level security improvements for MCP. To address this, we introduce the Secure Model Context Protocol (SMCP), which builds on MCP by adding unified identity management, robust mutual authentication, ongoing security context propagation, fine-grained policy enforcement, and comprehensive audit logging. In this paper, we present the main components of SMCP, explain how it helps reduce security risks, and illustrate its application with practical examples. We hope that this work will contribute to the development of agentic systems that are not only powerful and adaptable, but also secure and dependable.
Authors:Yayi Wang, Shenao Wang, Jian Zhao, Shaosen Shi, Ting Li, Yan Cheng, Lizhong Bian, Kan Yu, Yanjie Zhao, Haoyu Wang
Abstract:
Modern enterprises increasingly adopt diverse technology stacks with various programming languages, posing significant challenges for static application security testing (SAST). Existing taint analysis tools are predominantly designed for single languages, requiring substantial engineering effort that scales with language diversity. While multi-language tools like CodeQL, Joern, and WALA attempt to address these challenges, they face limitations in intermediate representation design, analysis precision, and extensibility, which make them difficult to scale effectively for large-scale industrial applications at Ant Group. To bridge this gap, we present YASA (Yet Another Static Analyzer), a unified multi-language static taint analysis framework designed for industrial-scale deployment. Specifically, YASA introduces the Unified Abstract Syntax Tree (UAST) that provides a unified abstraction for compatibility across diverse programming languages. Building on the UAST, YASA performs point-to analysis and taint propagation, leveraging a unified semantic model to manage language-agnostic constructs, while incorporating language-specific semantic models to handle other unique language features. When compared to 6 single- and 2 multi-language static analyzers on an industry-standard benchmark, YASA consistently outperformed all baselines across Java, JavaScript, Python, and Go. In real-world deployment within Ant Group, YASA analyzed over 100 million lines of code across 7.3K internal applications. It identified 314 previously unknown taint paths, with 92 of them confirmed as 0-day vulnerabilities. All vulnerabilities were responsibly reported, with 76 already patched by internal development teams, demonstrating YASA's practical effectiveness for securing large-scale industrial software systems.
Authors:Junda Lin, Zhaomeng Zhou, Zhi Zheng, Shuochen Liu, Tong Xu, Yong Chen, Enhong Chen
Abstract:
LLM agents operating in open environments face escalating risks from indirect prompt injection, particularly within the tool stream where manipulated metadata and runtime feedback hijack execution flow. Existing defenses encounter a critical dilemma as advanced models prioritize injected rules due to strict alignment while static protection mechanisms sever the feedback loop required for adaptive reasoning. To reconcile this conflict, we propose \textbf{VIGIL}, a framework that shifts the paradigm from restrictive isolation to a verify-before-commit protocol. By facilitating speculative hypothesis generation and enforcing safety through intent-grounded verification, \textbf{VIGIL} preserves reasoning flexibility while ensuring robust control. We further introduce \textbf{SIREN}, a benchmark comprising 959 tool stream injection cases designed to simulate pervasive threats characterized by dynamic dependencies. Extensive experiments demonstrate that \textbf{VIGIL} outperforms state-of-the-art dynamic defenses by reducing the attack success rate by over 22\% while more than doubling the utility under attack compared to static baselines, thereby achieving an optimal balance between security and utility.
Authors:Yuhao Pan, Wenchao Xu, Fushuo Huo, Haozhao Wang, Xiucheng Wang, Nan Cheng
Abstract:
Tor is a low-latency anonymous communication network that protects user privacy by encrypting website traffic. However, recent website fingerprinting (WF) attacks have shown that encrypted traffic can still leak users' visited websites by exploiting statistical features such as packet size, direction, and inter-arrival time. Most existing WF attacks formulate the problem as a single-tab classification task, which significantly limits their effectiveness in realistic browsing scenarios where users access multiple websites concurrently, resulting in mixed traffic traces. To this end, we propose PrismWF, a multi-granularity patch-based Transformer for multi-tab WF attack. Specifically, we design a robust traffic feature representation for raw web traffic traces and extract multi-granularity features using convolutional kernels with different receptive fields. To effectively integrate information across temporal scales, the proposed model refines features through three hierarchical interaction mechanisms: inter-granularity detail supplementation from fine to coarse granularities, intra-granularity patch interaction with dedicated router tokens, and router-guided dual-level intra- and cross-granularity fusion. This design aligns with the cognitive logic of global coarse-grained reconnaissance and local fine-grained querying, enabling effective modeling of mixed traffic patterns in WF attack scenarios. Extensive experiments on various datasets and WF defenses demonstrate that our method achieves state-of-the-art performance compared to existing baselines.
Authors:Yule Wen, Yanzhe Zhang, Jianxun Lian, Xiaoyuan Yi, Xing Xie, Diyi Yang
Abstract:
LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability. Most prior approaches rely on static or passive defenses, such as prompting and guarding. These paradigms are insufficient for supporting contextual, proactive privacy decisions in multi-step agent execution. We propose Contextualized Defense Instructing (CDI), a new privacy defense paradigm in which an instructor model generates step-specific, context-aware privacy guidance during execution, proactively shaping actions rather than merely constraining or vetoing them. Crucially, CDI is paired with an experience-driven optimization framework that trains the instructor via reinforcement learning (RL), where we convert failure trajectories with privacy violations into learning environments. We formalize baseline defenses and CDI as distinct intervention points in a canonical agent loop, and compare their privacy-helpfulness trade-offs within a unified simulation framework. Results show that our CDI consistently achieves a better balance between privacy preservation (94.2%) and helpfulness (80.6%) than baselines, with superior robustness to adversarial conditions and generalization.
Authors:Ivana Clairine Irsan, Ratnadira Widyasari, Ting Zhang, Huihui Huang, Ferdian Thung, Yikun Li, Lwin Khin Shar, Eng Lieh Ouh, Hong Jin Kang, David Lo
Abstract:
Attacks can exploit zero-day or one-day vulnerabilities that are not publicly disclosed. To detect these vulnerabilities, security researchers monitor development activities in open-source repositories to identify unreported security patches. The sheer volume of commits makes this task infeasible to accomplish manually. Consequently, security patch detectors commonly trained and evaluated on security patches linked from vulnerability reports in the National Vulnerability Database (NVD). In this study, we assess the effectiveness of these detectors when applied in-the-wild. Our results show that models trained on NVD-derived data show substantially decreased performance, with decreases in F1-score of up to 90\% when tested on in-the-wild security patches, rendering them impractical for real-world use. An analysis comparing security patches identified in-the-wild and commits linked from NVD reveals that they can be easily distinguished from each other. Security patches associated with NVD have different distribution of commit messages, vulnerability types, and composition of changes. These differences suggest that NVD may be unsuitable as the \textit{sole} source of data for training models to detect security patches. We find that constructing a dataset that combines security patches from NVD data with a small subset of manually identified security patches can improve model robustness.
Authors:Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Adam Sealfon
Abstract:
The US Census Bureau Disclosure Avoidance System (DAS) balances confidentiality and utility requirements for the decennial US Census (Abowd et al., 2022). The DAS was used in the 2020 Census to produce demographic datasets critically used for legislative apportionment and redistricting, federal and state funding allocation, municipal and infrastructure planning, and scientific research. At the heart of DAS is TopDown, a heuristic post-processing method that combines billions of private noisy measurements across six geographic levels in order to produce new estimates that are consistent, more accurate, and satisfy certain structural constraints on the data. In this work, we introduce BlueDown, a new post-processing method that produces more accurate, consistent estimates while satisfying the same privacy guarantees and structural constraints. We obtain especially large accuracy improvements for aggregates at the county and tract levels on evaluation metrics proposed by the US Census Bureau. From a technical perspective, we develop a new algorithm for generalized least-squares regression that leverages the hierarchical structure of the measurements and that is statistically optimal among linear unbiased estimators. This reduces the computational dependence on the number of geographic regions measured from matrix multiplication time, which would be infeasible for census-scale data, to linear time. We incorporate the additional structural constraints by combining this regression algorithm with an optimization routine that extends TDA to support correlated measurements. We further improve the efficiency of our algorithm using succinct linear-algebraic operations that exploit symmetries in the structure of the measurements and constraints. We believe our hierarchical regression and succinct operations to be of independent interest.
Authors:Badih Ghazi, Cristóbal Guzmán, Pritish Kamath, Alexander Knop, Ravi Kumar, Pasin Manurangsi
Abstract:
We study the problem of differentially private (DP) computation of coreset for the $k$-means objective. For a given input set of points, a coreset is another set of points such that the $k$-means objective for any candidate solution is preserved up to a multiplicative $(1 \pm α)$ factor (and some additive factor). We prove the first computational lower bounds for this problem. Specifically, assuming the existence of one-way functions, we show that no polynomial-time $(ε, 1/n^{ω(1)})$-DP algorithm can compute a coreset for $k$-means in the $\ell_\infty$-metric for some constant $α> 0$ (and some constant additive factor), even for $k=3$. For $k$-means in the Euclidean metric, we show a similar result but only for $α= Θ\left(1/d^2\right)$, where $d$ is the dimension.
Authors:Yikun Li, Ting Zhang, Jieke Shi, Chengran Yang, Junda He, Xin Zhou, Jinfeng Jiang, Huihui Huang, Wen Bin Leow, Yide Yin, Eng Lieh Ouh, Lwin Khin Shar, David Lo
Abstract:
Recent progress in ML and LLMs has improved vulnerability detection, and recent datasets have reduced label noise and unrelated code changes. However, most existing approaches still operate at the function level, where models are asked to predict whether a single function is vulnerable without inter-procedural context. In practice, vulnerability presence and root cause often depend on contextual information. Naively appending such context is not a reliable solution: real-world context is long, redundant, and noisy, and we find that unstructured context frequently degrades the performance of strong fine-tuned code models. We present CPRVul, a context-aware vulnerability detection framework that couples Context Profiling and Selection with Structured Reasoning. CPRVul constructs a code property graph, and extracts candidate context. It then uses an LLM to generate security-focused profiles and assign relevance scores, selecting only high-impact contextual elements that fit within the model's context window. In the second phase, CPRVul integrates the target function, the selected context, and auxiliary vulnerability metadata to generate reasoning traces, which are used to fine-tune LLMs for reasoning-based vulnerability detection. We evaluate CPRVul on three high-quality vulnerability datasets: PrimeVul, TitanVul, and CleanVul. Across all datasets, CPRVul consistently outperforms function-only baselines, achieving accuracies ranging from 64.94% to 73.76%, compared to 56.65% to 63.68% for UniXcoder. Specifically, on the challenging PrimeVul benchmark, CPRVul achieves 67.78% accuracy, outperforming prior state-of-the-art approaches, improving accuracy from 55.17% to 67.78% (22.9% improvement). Our ablations further show that neither raw context nor processed context alone benefits strong code models; gains emerge only when processed context is paired with structured reasoning.
Authors:Lulu Xue, Shengshan Hu, Wei Lu, Ziqi Zhou, Yufei Song, Jianhong Cheng, Minghui Li, Yanjun Zhang, Leo Yu Zhang
Abstract:
Machine unlearning is an emerging technique that aims to remove the influence of specific data from trained models, thereby enhancing privacy protection. However, recent research has uncovered critical privacy vulnerabilities, showing that adversaries can exploit unlearning inversion to reconstruct data that was intended to be erased. Despite the severity of this threat, dedicated defenses remain lacking. To address this gap, we propose UnlearnShield, the first defense specifically tailored to counter unlearning inversion. UnlearnShield introduces directional perturbations in the cosine representation space and regulates them through a constraint module to jointly preserve model accuracy and forgetting efficacy, thereby reducing inversion risk while maintaining utility. Experiments demonstrate that it achieves a good trade-off among privacy protection, accuracy, and forgetting.
Authors:Xinyu Li, Jinyang Huang, Feng-Qi Cui, Meng Wang, Peng Zhao, Meng Li, Dan Guo, Meng Wang
Abstract:
Respiratory monitoring is an extremely important task in modern medical services. Due to its significant advantages, e.g., non-contact, radar-based respiratory monitoring has attracted widespread attention from both academia and industry. Unfortunately, though it can achieve high monitoring accuracy, consumer electronics-grade radar data inevitably contains User-sensitive Identity Information (USI), which may be maliciously used and further lead to privacy leakage. To track these challenges, by variational mode decomposition (VMD) and adversarial loss-based encryption, we propose a novel Trusted Respiratory Monitoring paradigm, Tru-RM, to perform automated respiratory monitoring through radio signals while effectively anonymizing USI. The key enablers of Tru-RM are Attribute Feature Decoupling (AFD), Flexible Perturbation Encryptor (FPE), and robust Perturbation Tolerable Network (PTN) used for attribute decomposition, identity encryption, and perturbed respiratory monitoring, respectively. Specifically, AFD is designed to decompose the raw radar signals into the universal respiratory component, the personal difference component, and other unrelated components. Then, by using large noise to drown out the other unrelated components, and the phase noise algorithm with a learning intensity parameter to eliminate USI in the personal difference component, FPE is designed to achieve complete user identity information encryption without affecting respiratory features. Finally, by designing the transferred generalized domain-independent network, PTN is employed to accurately detect respiration when waveforms change significantly. Extensive experiments based on various detection distances, respiratory patterns, and durations demonstrate the superior performance of Tru-RM on strong anonymity of USI, and high detection accuracy of perturbed respiratory waveforms.
Authors:Qi Luo, Minghui Xu, Dongxiao Yu, Xiuzhen Cheng
Abstract:
Modern graph learning systems often combine links with text, as in citation networks with abstracts or social graphs with user posts. In such systems, text is usually easier to edit than graph structure, which creates a practical security risk: an attacker may hide a small malicious cue in training text and later use it to trigger incorrect predictions. This paper studies that risk in a realistic setting where the attacker edits only node text and leaves the graph unchanged. We propose \textbf{TAGBD}, a graph-aware backdoor attack that first selects training nodes that are easier to manipulate, then generates stealthy poison text with a shadow graph model, and finally injects the text by replacing the original content or appending a short phrase. Experiments on three benchmark datasets show that TAGBD achieves very high attack success rates, transfers across different graph models, and remains effective under common defenses. These results show that inconspicuous poison text alone can serve as a reliable attack channel in text-attributed graphs, highlighting the need for defenses that inspect both node content and graph structure.
Authors:Qiao Zhang, Minghui Xu, Tingchuang Zhang, Xiuzhen Cheng
Abstract:
Privacy-Preserving Machine Learning as a Service (PP-MLaaS) enables secure neural network inference by integrating cryptographic primitives such as homomorphic encryption (HE) and multi-party computation (MPC), protecting both client data and server models. Recent mixed-primitive frameworks have significantly improved inference efficiency, yet they process batched inputs sequentially, offering little flexibility for prioritizing urgent requests. Naïve queue jumping introduces considerable computational and communication overhead, increasing non-negligible latency for in-queue inputs. We initiate the study of privacy-preserving queue jumping in batched inference and propose PrivQJ, a novel framework that enables efficient priority handling without degrading overall system performance. PrivQJ exploits shared computation across inputs via in-processing slot recycling, allowing prior inputs to be piggybacked onto ongoing batch computation with almost no additional cryptographic cost. Both theoretical analysis and experimental results demonstrate over an order-of-magnitude reduction in overhead compared to state-of-the-art PP-MLaaS systems.
Authors:Jiayue Pu, Zhongxiang Sun, Zilu Zhang, Xiao Zhang, Jun Xu
Abstract:
The rapid evolution of embodied agents has accelerated the deployment of household robots in real-world environments. However, unlike structured industrial settings, household spaces introduce unpredictable safety risks, where system limitations such as perception latency and lack of common sense knowledge can lead to dangerous errors. Current safety evaluations, often restricted to static images, text, or general hazards, fail to adequately benchmark dynamic unsafe action detection in these specific contexts. To bridge this gap, we introduce HomeSafe-Bench, a challenging benchmark designed to evaluate Vision-Language Models (VLMs) on unsafe action detection in household scenarios. HomeSafe-Bench is contrusted via a hybrid pipeline combining physical simulation with advanced video generation and features 438 diverse cases across six functional areas with fine-grained multidimensional annotations. Beyond benchmarking, we propose Hierarchical Dual-Brain Guard for Household Safety (HD-Guard), a hierarchical streaming architecture for real-time safety monitoring. HD-Guard coordinates a lightweight FastBrain for continuous high-frequency screening with an asynchronous large-scale SlowBrain for deep multimodal reasoning, effectively balancing inference efficiency with detection accuracy. Evaluations demonstrate that HD-Guard achieves a superior trade-off between latency and performance, while our analysis identifies critical bottlenecks in current VLM-based safety detection.
Authors:Yueyan Dong, Minghui Xu, Qin Hu, Yinhao Xiao, Qi Luo, Yechao Zhang, Yue Zhang, Xiuzhen Cheng
Abstract:
Low-Rank Adaptation (LoRA) has become a popular solution for fine-tuning large language models (LLMs) in federated settings, dramatically reducing update costs by introducing trainable low-rank matrices. However, when integrated with frameworks like FedIT, LoRA introduces a critical vulnerability: clients submit $A$ and $B$ matrices separately, while only their product $AB$ determines the model update, yet this composite is never directly verified. We propose Gradient Assembly Poisoning (GAP), a novel attack that exploits this blind spot by crafting individually benign $A$ and $B$ matrices whose product yields malicious updates. GAP operates without access to training data or inter-client coordination and remains undetected by standard anomaly detectors. We identify four systemic vulnerabilities in LoRA-based federated systems and validate GAP across LLaMA, ChatGLM, and GPT-2. GAP consistently induces degraded or biased outputs while preserving surface fluency, reducing BLEU by up to 14.5\%, increasing factual and grammatical errors by over 800\%, and maintaining 92.6\% long-form response length. These results reveal a new class of stealthy, persistent threats in distributed LoRA fine-tuning.
Authors:Larissa Schmid, Diogo Gaspar, Raphina Liu, Sofia Bobadilla, Benoit Baudry, Martin Monperrus
Abstract:
Modern software systems heavily rely on third-party dependencies, making software supply chain security a critical concern. We introduce the concept of software supply chain smells as structural indicators that signal potential security risks. We design and evaluate Dirty-Waters, a novel tool for detecting such smells in the supply chains of software packages. Through interviews with practitioners, we show that our proposed smells align with real-world concerns and capture signals considered valuable. A quantitative study of popular packages in the Maven and NPM ecosystems reveals that while smells are prevalent in both, they differ significantly across ecosystems, with traceability and signing issues dominating in Maven and most smells being rare in NPM, due to strong registry-level guarantees. Software supply chain smells support developers and organizations in making informed decisions and improving their software supply chain security posture.
Authors:Guangnian Wan, Qi Li, Gongfan Fang, Xinyin Ma, Xinchao Wang
Abstract:
Multimodal Diffusion Language Models (MDLMs) have recently emerged as a competitive alternative to their autoregressive counterparts. Yet their vulnerability to backdoor attacks remains largely unexplored. In this work, we show that well-established data-poisoning pipelines can successfully implant backdoors into MDLMs, enabling attackers to manipulate model behavior via specific triggers while maintaining normal performance on clean inputs. However, defense strategies effective to these models are yet to emerge. To bridge this gap, we introduce a backdoor defense framework for MDLMs named DiSP (Diffusion Self-Purification). DiSP is driven by a key observation: selectively masking certain vision tokens at inference time can neutralize a backdoored model's trigger-induced behaviors and restore normal functionality. Building on this, we purify the poisoned dataset using the compromised model itself, then fine-tune the model on the purified data to recover it to a clean one. Given such a specific design, DiSP can remove backdoors without requiring any auxiliary models or clean reference data. Extensive experiments demonstrate that our approach effectively mitigates backdoor effects, reducing the attack success rate (ASR) from over 90% to typically under 5%, while maintaining model performance on benign tasks.
Authors:Xin Wang, Yunhao Chen, Juncheng Li, Yixu Wang, Yang Yao, Tianle Gu, Jie Li, Yan Teng, Yingchun Wang, Xia Hu
Abstract:
The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to single-turn text interactions, and lack the scalability required for systematic evaluation. To address this, we introduce OpenRT, a unified, modular, and high-throughput red-teaming framework designed for comprehensive MLLM safety evaluation. At its core, OpenRT architects a paradigm shift in automated red-teaming by introducing an adversarial kernel that enables modular separation across five critical dimensions: model integration, dataset management, attack strategies, judging methods, and evaluation metrics. By standardizing attack interfaces, it decouples adversarial logic from a high-throughput asynchronous runtime, enabling systematic scaling across diverse models. Our framework integrates 37 diverse attack methodologies, spanning white-box gradients, multi-modal perturbations, and sophisticated multi-agent evolutionary strategies. Through an extensive empirical study on 20 advanced models (including GPT-5.2, Claude 4.5, and Gemini 3 Pro), we expose critical safety gaps: even frontier models fail to generalize across attack paradigms, with leading models exhibiting average Attack Success Rates as high as 49.14%. Notably, our findings reveal that reasoning models do not inherently possess superior robustness against complex, multi-turn jailbreaks. By open-sourcing OpenRT, we provide a sustainable, extensible, and continuously maintained infrastructure that accelerates the development and standardization of AI safety.
Authors:Yuxiang Yang, Ao Wang, Xuewei Feng, Qi Li, Ke Xu
Abstract:
Virtual Private Networks (VPNs) are widely used for censorship evasion and traffic protection. VPN users expect to be provided with adequate security protection, and at the same time not be affected by other users connected to the same VPN server, which can be illustrated as the non-interference property. However, in this paper, we have identified several vulnerabilities that violate this property, specifically within the connection tracking frameworks of VPN servers, stemming from shared resource misuse and insufficient validation of session state transitions. We present three session manipulation attacks targeting TCP and UDP traffic tunneled through VPNs. The attacker who only connects to the same VPN server can launch denial-of-service attacks, hijack TCP connections of other clients, or inject forged DNS responses into their queries. We evaluate these attacks against five popular connection tracking frameworks across different OSes and nine major commercial VPN providers. Experimental results reveal that all frameworks and eight providers are vulnerable to at least one of the attacks. We have responsibly disclosed our findings with countermeasures, resulting in 19 assigned CVEs/CNVDs and acknowledgments from the communities and providers.
Authors:Huihui Huang, Jieke Shi, Bo Wang, Zhou Yang, David Lo
Abstract:
Memory leaks remain prevalent in real-world C/C++ software. Static analyzers such as CodeQL provide scalable program analysis but frequently miss such bugs because they cannot recognize project-specific custom memory-management functions and lack path-sensitive control-flow modeling. We present MemHint, a neuro-symbolic pipeline that addresses both limitations by combining LLMs' semantic understanding of code with Z3-based symbolic reasoning. MemHint parses the target codebase and applies an LLM to classify each function as a memory allocator, deallocator, or neither, producing function summaries that record which argument or return value carries memory ownership, extending the analyzer's built-in knowledge beyond standard primitives such as malloc and free. A Z3-based validation step checks each summary against the function's control-flow graph, discarding those whose claimed memory operation is unreachable on any feasible path. The validated summaries are injected into CodeQL and Infer via their respective extension mechanisms. Z3 path feasibility filtering then eliminates warnings on infeasible paths, and a final LLM-based validation step confirms whether each remaining warning is a genuine bug. On seven real-world C/C++ projects totaling over 3.4M lines of code, MemHint detects 52 unique memory leaks (49 confirmed/fixed, 4 CVEs submitted) at approximately $1.7 per detected bug, compared to 19 by vanilla CodeQL and 3 by vanilla Infer.
Authors:Yue Xiao, Ling Jiang, Sen Nie, Ding Li, Shi Wu, Ke Xu, Qi Li
Abstract:
Provenance-based Intrusion Detection Systems (PIDSes) have been widely used to detect Advanced Persistent Threats (APTs). Although many studies achieve high performance in the evaluations of their original papers, their performance in industrial scenarios remains unclear. To fill this gap, we conduct the first systematic evaluation and analysis of PIDSes in industrial scenarios. We first analyze the differences between the data from DARPA datasets and that collected in industrial scenarios, identifying three main new characteristics in industry: heterogeneous multi-source inputs, more powerful attackers, and increasing benign activity complexity. We then build several datasets to evaluate five state-of-the-art PIDSes. The evaluation results reveal challenges for existing PIDSes, including poor portability across different hosts and platforms, low detection performance against real-world attacks, and high false positive rates with ever-changing benign activities. Based on the evaluation results and our industrial practices, we provide several insights to solve or explain the above problems. For example, we propose a method to mitigate the high false positives, which reduces manual effort by 2/3. Finally, we propose several research suggestions to improve PIDSes.
Authors:Xinhao Deng, Yixiang Zhang, Jiaqing Wu, Jiaqi Bai, Sibo Yi, Zhuoheng Zou, Yue Xiao, Rennai Qiu, Jianan Ma, Jialuo Chen, Xiaohu Du, Xiaofang Yang, Shiwen Cui, Changhua Meng, Weiqiang Wang, Jiaxing Song, Ke Xu, Qi Li
Abstract:
Autonomous Large Language Model (LLM) agents, exemplified by OpenClaw, demonstrate remarkable capabilities in executing complex, long-horizon tasks. However, their tightly coupled instant-messaging interaction paradigm and high-privilege execution capabilities substantially expand the system attack surface. In this paper, we present a comprehensive security threat analysis of OpenClaw. To structure our analysis, we introduce a five-layer lifecycle-oriented security framework that captures key stages of agent operation, i.e., initialization, input, inference, decision, and execution, and systematically examine compound threats across the agent's operational lifecycle, including indirect prompt injection, skill supply chain contamination, memory poisoning, and intent drift. Through detailed case studies on OpenClaw, we demonstrate the prevalence and severity of these threats and analyze the limitations of existing defenses. Our findings reveal critical weaknesses in current point-based defense mechanisms when addressing cross-temporal and multi-stage systemic risks, highlighting the need for holistic security architectures for autonomous LLM agents. Within this framework, we further examine representative defense strategies at each lifecycle stage, including plugin vetting frameworks, context-aware instruction filtering, memory integrity validation protocols, intent verification mechanisms, and capability enforcement architectures.
Authors:Zhenhua Zou, Sheng Guo, Qiuyang Zhan, Lepeng Zhao, Shuo Li, Qi Li, Ke Xu, Mingwei Xu, Zhuotao Liu
Abstract:
The evolution of Large Language Models (LLMs) has shifted mobile computing from App-centric interactions to system-level autonomous agents. Current implementations predominantly rely on a "Screen-as-Interface" paradigm, which inherits structural vulnerabilities and conflicts with the mobile ecosystem's economic foundations. In this paper, we conduct a systematic security analysis of state-of-the-art mobile agents using Doubao Mobile Assistant as a representative case. We decompose the threat landscape into four dimensions - Agent Identity, External Interface, Internal Reasoning, and Action Execution - revealing critical flaws such as fake App identity, visual spoofing, indirect prompt injection, and unauthorized privilege escalation stemming from a reliance on unstructured visual data. To address these challenges, we propose Aura, an Agent Universal Runtime Architecture for a clean-slate secure agent OS. Aura replaces brittle GUI scraping with a structured, agent-native interaction model. It adopts a Hub-and-Spoke topology where a privileged System Agent orchestrates intent, sandboxed App Agents execute domain-specific tasks, and the Agent Kernel mediates all communication. The Agent Kernel enforces four defense pillars: (i) cryptographic identity binding via a Global Agent Registry; (ii) semantic input sanitization through a multilayer Semantic Firewall; (iii) cognitive integrity via taint-aware memory and plan-trajectory alignment; and (iv) granular access control with non-deniable auditing. Evaluation on MobileSafetyBench shows that, compared to Doubao, Aura improves low-risk Task Success Rate from roughly 75% to 94.3%, reduces high-risk Attack Success Rate from roughly 40% to 4.4%, and achieves near-order-of-magnitude latency gains. These results demonstrate Aura as a viable, secure alternative to the "Screen-as-Interface" paradigm.
Authors:Ruozhao Yang, Mingfei Cheng, Gelei Deng, Junjie Wang, Tianwei Zhang, Xiaofei Xie
Abstract:
Large-scale web applications are widely deployed with complex third-party components, inheriting security risks arising from component vulnerabilities. Security assessment is therefore required to determine whether such known vulnerabilities remain practically exploitable in real applications. Penetration testing is a widely adopted approach that validates exploitability by launching concrete attacks against known vulnerabilities in real-world black-box systems. However, existing approaches often fail to automatically generate reliable exploits, limiting their effectiveness in practical security assessment. This limitation mainly stems from two issues: (1) precisely triggering vulnerabilities with correct technical details, and (2) adapting exploits to diverse real-world deployment settings. In this paper, we propose AutoEG, a fully automated multi-agent framework for exploit generation targeting black-box web applications. AutoEG has two phases: First, AutoEG extracts precise vulnerability trigger logic from unstructured vulnerability information and encapsulates it into reusable trigger functions. Second, AutoEG uses trigger functions for concrete attack objectives and iteratively refines exploits through feedback-driven interaction with the target application. We evaluate AutoEG on 104 real-world vulnerabilities with 29 attack objectives, resulting in 660 exploitation tasks and 55,440 exploit attempts. AutoEG achieves an average success rate of 82.41%, substantially outperforming state-of-the-art baselines, whose best performance reaches only 32.88%.
Authors:Luca Minnei, Cristian Manca, Giorgio Piras, Angelo Sotgiu, Maura Pintor, Daniele Ghiani, Davide Maiorca, Giorgio Giacinto, Battista Biggio
Abstract:
Machine Learning (ML)-based detectors are becoming essential to counter the proliferation of malware. However, common ML algorithms are not designed to cope with the dynamic nature of real-world settings, where both legitimate and malicious software evolve. This distribution drift causes models trained under static assumptions to degrade over time unless they are continuously updated. Regularly retraining these models, however, is expensive, since labeling new acquired data requires costly manual analysis by security experts. To reduce labeling costs and address distribution drift in malware detection, prior work explored active learning (AL) and semi-supervised learning (SSL) techniques. Yet, existing studies (i) are tightly coupled to specific detector architectures and restricted to a specific malware domain, resulting in non-uniform comparisons; and (ii) lack a consistent methodology for analyzing the distribution drift, despite the critical sensitivity of the malware domain to temporal changes. In this work, we bridge this gap by proposing a model-agnostic framework that evaluates an extensive set of AL and SSL techniques, isolated and combined, for Android and Windows malware detection. We show that these techniques, when combined, can reduce manual annotation costs by up to 90% across both domains while achieving comparable detection performance to full-labeling retraining. We also introduce a methodology for feature-level drift analysis that measures feature stability over time, showing its correlation with the detector performance. Overall, our study provides a detailed understanding of how AL and SSL behave under distribution drift and how they can be successfully combined, offering practical insights for the design of effective detectors over time.
Authors:Xunguang Wang, Yuguang Zhou, Qingyue Wang, Zongjie Li, Ruixuan Huang, Zhenlan Ji, Pingchuan Ma, Shuai Wang
Abstract:
Large language models (LLMs) increasingly rely on explicit chain-of-thought (CoT) reasoning to solve complex tasks, yet the safety of the reasoning process itself remains largely unaddressed. Existing work on LLM safety focuses on content safety--detecting harmful, biased, or factually incorrect outputs -- and treats the reasoning chain as an opaque intermediate artifact. We identify reasoning safety as an orthogonal and equally critical security dimension: the requirement that a model's reasoning trajectory be logically consistent, computationally efficient, and resistant to adversarial manipulation. We make three contributions. First, we formally define reasoning safety and introduce a nine-category taxonomy of unsafe reasoning behaviors, covering input parsing errors, reasoning execution errors, and process management errors. Second, we conduct a large-scale prevalence study annotating 4111 reasoning chains from both natural reasoning benchmarks and four adversarial attack methods (reasoning hijacking and denial-of-service), confirming that all nine error types occur in practice and that each attack induces a mechanistically interpretable signature. Third, we propose a Reasoning Safety Monitor: an external LLM-based component that runs in parallel with the target model, inspects each reasoning step in real time via a taxonomy-embedded prompt, and dispatches an interrupt signal upon detecting unsafe behavior. Evaluation on a 450-chain static benchmark shows that our monitor achieves up to 84.88\% step-level localization accuracy and 85.37\% error-type classification accuracy, outperforming hallucination detectors and process reward model baselines by substantial margins. These results demonstrate that reasoning-level monitoring is both necessary and practically achievable, and establish reasoning safety as a foundational concern for the secure deployment of large reasoning models.
Authors:Elena Rodríguez-Lois, Fabio Brau, Maura Pintor, Battista Biggio, Fernando Pérez-González
Abstract:
Federated Learning has been popularized in recent years for applications involving personal or sensitive data, as it allows the collaborative training of machine learning models through local updates at the data-owners' premises, which does not require the sharing of the data itself. Considering the risk of leakage or misuse by any of the data-owners, many works attempt to protect their copyright, or even trace the origin of a potential leak through unique watermarks identifying each participant's model copy. Realistic accusation scenarios impose a black-box setting, where watermarks are typically embedded as a set of sample-label pairs. The threat of collusion, however, where multiple bad actors conspire together to produce an untraceable model, has been rarely addressed, and previous works have been limited to shallow networks and near-linearly separable main tasks. To the best of our knowledge, this work is the first to present a general collusion-resistant embedding method for black-box traitor tracing in Federated Learning: BlackCATT, which introduces a novel collusion-aware embedding loss term and, instead of using a fixed trigger set, iteratively optimizes the triggers to aid convergence and traitor tracing performance. Experimental results confirm the efficacy of the proposed scheme across different architectures and datasets. Furthermore, for models that would otherwise suffer from update incompatibility on the main task after learning different watermarks (e.g., architectures including batch normalization layers), our proposed BlackCATT+FR incorporates functional regularization through a set of auxiliary examples at the aggregator, promoting a shared feature space among model copies without compromising traitor tracing performance.
Authors:Liwen Wang, Zongjie Li, Yuchong Xie, Shuai Wang, Dongdong She, Wei Wang, Juergen Rahmel
Abstract:
The evolution of Large Language Models (LLMs) into agentic systems that perform autonomous reasoning and tool use has created significant intellectual property (IP) value. We demonstrate that these systems are highly vulnerable to imitation attacks, where adversaries steal proprietary capabilities by training imitation models on victim outputs. Crucially, existing LLM watermarking techniques fail in this domain because real-world agentic systems often operate as grey boxes, concealing the internal reasoning traces required for verification. This paper presents AGENTWM, the first watermarking framework designed specifically for agentic models. AGENTWM exploits the semantic equivalence of action sequences, injecting watermarks by subtly biasing the distribution of functionally identical tool execution paths. This mechanism allows AGENTWM to embed verifiable signals directly into the visible action trajectory while remaining indistinguishable to users. We develop an automated pipeline to generate robust watermark schemes and a rigorous statistical hypothesis testing procedure for verification. Extensive evaluations across three complex domains demonstrate that AGENTWM achieves high detection accuracy with negligible impact on agent performance. Our results confirm that AGENTWM effectively protects agentic IP against adaptive adversaries, who cannot remove the watermarks without severely degrading the stolen model's utility.
Authors:Zimo Ji, Daoyuan Wu, Wenyuan Jiang, Pingchuan Ma, Zongjie Li, Yudong Gao, Shuai Wang, Yingjiu Li
Abstract:
Large Language Model (LLM)-based agent systems are increasingly deployed for complex real-world tasks but remain vulnerable to natural language-based attacks that exploit over-privileged tool use. This paper aims to understand and mitigate such attacks through the lens of privilege escalation, defined as agent actions exceeding the least privilege required for a user's intended task. Based on a formal model of LLM agent systems, we identify novel privilege escalation scenarios, particularly in multi-agent systems, including a variant akin to the classic confused deputy problem. To defend against both known and newly demonstrated privilege escalation, we propose SEAgent, a mandatory access control (MAC) framework built upon attribute-based access control (ABAC). SEAgent monitors agent-tool interactions via an information flow graph and enforces customizable security policies based on entity attributes. Our evaluations show that SEAgent effectively blocks various privilege escalation while maintaining a low false positive rate and negligible system overhead. This demonstrates its robustness and adaptability in securing LLM-based agent systems.
Authors:Peiran Li, Jiashuo Sun, Fangzhou Lin, Shuo Xing, Tianfu Fu, Suofei Feng, Chaoqun Ni, Zhengzhong Tu
Abstract:
Autonomous LLM agents fail because long-horizon policy remains implicit in model weights and transcripts, while safety is retrofitted post hoc. We propose Traversal-as-Policy: distill sandboxed OpenHands execution logs into a single executable Gated Behavior Tree (GBT) and treat tree traversal -- rather than unconstrained generation -- as the control policy whenever a task is in coverage. Each node encodes a state-conditioned action macro mined and merge-checked from successful trajectories; macros implicated by unsafe traces attach deterministic pre-execution gates over structured tool context and bounded history, updated under experience-grounded monotonicity so previously rejected unsafe contexts cannot be re-admitted. At runtime, a lightweight traverser matches the base model's intent to child macros, executes one macro at a time under global and node-local gating, and when stalled performs risk-aware shortest-path recovery to a feasible success leaf; the visited path forms a compact spine memory that replaces transcript replay. Evaluated in a unified OpenHands sandbox on 15+ software, web, reasoning, and safety/security benchmarks, GBT improves success while driving violations toward zero and reducing cost. On SWE-bench Verified (Protocol A, 500 issues), GBT-SE raises success from 34.6% to 73.6%, reduces violations from 2.8% to 0.2%, and cuts token/character usage from 208k/820k to 126k/490k; with the same distilled tree, 8B executors more than double success on SWE-bench Verified (14.0%58.8%) and WebArena (9.1%37.3%).
Authors:Yanbo Wang, Minzheng Wang, Jian Liang, Lu Wang, Yongcan Yu, Ran He
Abstract:
While reasoning models have achieved remarkable success in complex reasoning tasks, their increasing power necessitates stringent safety measures. For safety alignment, the core challenge lies in the inherent trade-off between safety and utility. However, prevailing alignment strategies typically construct CoT training data with explicit safety rules via context distillation. This approach inadvertently limits reasoning capabilities by creating a rigid association between rule memorization and refusal. To mitigate the safety-utility trade-off, we propose the Adaptive Safe Context Learning (ASCL) framework to improve the reasoning given proper context. ASCL formulates safety alignment as a multi-turn tool-use process, empowering the model to autonomously decide when to consult safety rules and how to generate the ongoing reasoning. Furthermore, to counteract the preference for rule consultation during RL, we introduce Inverse Frequency Policy Optimization (IFPO) to rebalance advantage estimates. By decoupling rule retrieval and subsequent reasoning, our method achieves higher overall performance compared to baselines.
Authors:Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev
Abstract:
LLM watermarks allow tracing AI-generated texts by inserting a detectable signal into their generated content. Recent works have proposed a wide range of watermarking algorithms, each with distinct designs, usually built using a bottom-up approach. Crucially, there is no general and principled formulation for LLM watermarking. In this work, we show that most existing and widely used watermarking schemes can in fact be derived from a principled constrained optimization problem. Our formulation unifies existing watermarking methods and explicitly reveals the constraints that each method optimizes. In particular, it highlights an understudied quality-diversity-power trade-off. At the same time, our framework also provides a principled approach for designing novel watermarking schemes tailored to specific requirements. For instance, it allows us to directly use perplexity as a proxy for quality, and derive new schemes that are optimal with respect to this constraint. Our experimental evaluation validates our framework: watermarking schemes derived from a given constraint consistently maximize detection power with respect to that constraint.
Authors:Ruixin Yang, Ethan Mendes, Arthur Wang, James Hays, Sauvik Das, Wei Xu, Alan Ritter
Abstract:
Vision-language models (VLMs) have demonstrated strong performance in image geolocation, a capability further sharpened by frontier multimodal large reasoning models (MLRMs). This poses a significant privacy risk, as these widely accessible models can be exploited to infer sensitive locations from casually shared photos, often at street-level precision, potentially surpassing the level of detail the sharer consented or intended to disclose. While recent work has proposed applying a blanket restriction on geolocation disclosure to combat this risk, these measures fail to distinguish valid geolocation uses from malicious behavior. Instead, VLMs should maintain contextual integrity by reasoning about elements within an image to determine the appropriate level of information disclosure, balancing privacy and utility. To evaluate how well models respect contextual integrity, we introduce VLM-GEOPRIVACY, a benchmark that challenges VLMs to interpret latent social norms and contextual cues in real-world images and determine the appropriate level of location disclosure. Our evaluation of 14 leading VLMs shows that, despite their ability to precisely geolocate images, the models are poorly aligned with human privacy expectations. They often over-disclose in sensitive contexts and are vulnerable to prompt-based attacks. Our results call for new design principles in multimodal systems to incorporate context-conditioned privacy reasoning.
Authors:Yuhang Wang, Haichang Gao, Zhenxing Niu, Zhaoxiang Liu, Wenjing Zhang, Xiang Wang, Shiguo Lian
Abstract:
Tool-augmented AI agents substantially extend the practical capabilities of large language models, but they also introduce security risks that cannot be identified through model-only evaluation. In this paper, we present a systematic security assessment of six representative OpenClaw-series agent frameworks, namely OpenClaw, AutoClaw, QClaw, KimiClaw, MaxClaw, and ArkClaw, under multiple backbone models. To support this study, we construct a benchmark of 205 test cases covering representative attack behaviors across the full agent execution lifecycle, enabling unified evaluation of risk exposure at both the framework and model levels. Our results show that all evaluated agents exhibit substantial security vulnerabilities, and that agentized systems are significantly riskier than their underlying models used in isolation. In particular, reconnaissance and discovery behaviors emerge as the most common weaknesses, while different frameworks expose distinct high-risk profiles, including credential leakage, lateral movement, privilege escalation, and resource development. These findings indicate that the security of modern agent systems is shaped not only by the safety properties of the backbone model, but also by the coupling among model capability, tool use, multi-step planning, and runtime orchestration. We further show that once an agent is granted execution capability and persistent runtime context, weaknesses arising in early stages can be amplified into concrete system-level failures. Overall, our study highlights the need to move beyond prompt-level safeguards toward lifecycle-wide security governance for intelligent agent frameworks.
Authors:Yuanhe Zhang, Xinyue Wang, Zhican Chen, Weiliu Wang, Zilu Zhang, Zhengshuo Gong, Zhenhong Zhou, Kun Wang, Li Sun, Yang Liu, Sen Su
Abstract:
Given limited and costly computational infrastructure, resource efficiency is a key requirement for large language models (LLMs). Efficient LLMs increase service capacity for providers and reduce latency and API costs for users. Recent resource consumption threats induce excessive generation, degrading model efficiency and harming both service availability and economic sustainability. This survey presents a systematic review of threats to resource consumption in LLMs. We further establish a unified view of this emerging area by clarifying its scope and examining the problem along the full pipeline from threat induction to mechanism understanding and mitigation. Our goal is to clarify the problem landscape for this emerging area, thereby providing a clearer foundation for characterization and mitigation.
Authors:Juhee Kim, Xiaoyuan Liu, Zhun Wang, Shi Qiu, Bo Li, Wenbo Guo, Dawn Song
Abstract:
AI agents that combine large language models with non-AI system components are rapidly emerging in real-world applications, offering unprecedented automation and flexibility. However, this unprecedented flexibility introduces complex security challenges fundamentally different from those in traditional software systems. This paper presents the first systematic and comprehensive survey of AI agent security, including an analysis of the design space, attack landscape, and defense mechanisms for secure AI agent systems. We further conduct case studies to point out existing gaps in securing agentic AI systems and identify open challenges in this emerging domain. Our work also introduces the first systematic framework for understanding the security risks and defense strategies of AI agents, serving as a foundation for building both secure agentic systems and advancing research in this critical area.
Authors:Xing Li, Hui-Ling Zhen, Lihao Yin, Xianzhi Yu, Zhenhua Dong, Mingxuan Yuan
Abstract:
This paper presents a comprehensive empirical study on the safety alignment capabilities. We evaluate what matters for safety alignment in LLMs and LRMs to provide essential insights for developing more secure and reliable AI systems. We systematically investigate and compare the influence of six critical intrinsic model characteristics and three external attack techniques. Our large-scale evaluation is conducted using 32 recent, popular LLMs and LRMs across thirteen distinct model families, spanning a parameter scale from 3B to 235B. The assessment leverages five established safety datasets and probes model vulnerabilities with 56 jailbreak techniques and four CoT attack strategies, resulting in 4.6M API calls. Our key empirical findings are fourfold. First, we identify the LRMs GPT-OSS-20B, Qwen3-Next-80B-A3B-Thinking, and GPT-OSS-120B as the top-three safest models, which substantiates the significant advantage of integrated reasoning and self-reflection mechanisms for robust safety alignment. Second, post-training and knowledge distillation may lead to a systematic degradation of safety alignment. We thus argue that safety must be treated as an explicit constraint or a core optimization objective during these stages, not merely subordinated to the pursuit of general capability. Third, we reveal a pronounced vulnerability: employing a CoT attack via a response prefix can elevate the attack success rate by 3.34x on average and from 0.6% to 96.3% for Seed-OSS-36B-Instruct. This critical finding underscores the safety risks inherent in text-completion interfaces and features that allow user-defined response prefixes in LLM services, highlighting an urgent need for architectural and deployment safeguards. Fourth, roleplay, prompt injection, and gradient-based search for adversarial prompts are the predominant methodologies for eliciting unaligned behaviors in modern models.
Authors:Yu He, Haozhe Zhu, Yiming Li, Shuo Shao, Hongwei Yao, Zhihao Liu, Zhan Qin
Abstract:
LLM agents are highly vulnerable to Indirect Prompt Injection (IPI), where adversaries embed malicious directives in untrusted tool outputs to hijack execution. Most existing defenses treat IPI as an input-level semantic discrimination problem, which often fails to generalize to unseen payloads. We propose a new paradigm, action-level causal attribution, which secures agents by asking why a particular tool call is produced. The central goal is to distinguish tool calls supported by the user's intent from those causally driven by untrusted observations. We instantiate this paradigm with AttriGuard, a runtime defense based on parallel counterfactual tests. For each proposed tool call, AttriGuard verifies its necessity by re-executing the agent under a control-attenuated view of external observations. Technically, AttriGuard combines teacher-forced shadow replay to prevent attribution confounding, hierarchical control attenuation to suppress diverse control channels while preserving task-relevant information, and a fuzzy survival criterion that is robust to LLM stochasticity. Across four LLMs and two agent benchmarks, AttriGuard achieves 0% ASR under static attacks with negligible utility loss and moderate overhead. Importantly, it remains resilient under adaptive optimization-based attacks in settings where leading defenses degrade significantly.
Authors:Pengfei He, Ash Fox, Lesly Miculicich, Stefan Friedli, Daniel Fabian, Burak Gokturk, Jiliang Tang, Chen-Yu Lee, Tomas Pfister, Long T. Le
Abstract:
Large language models (LLMs) have shown promise in assisting cybersecurity tasks, yet existing approaches struggle with automatic vulnerability discovery and exploitation due to limited interaction, weak execution grounding, and a lack of experience reuse. We propose Co-RedTeam, a security-aware multi-agent framework designed to mirror real-world red-teaming workflows by integrating security-domain knowledge, code-aware analysis, execution-grounded iterative reasoning, and long-term memory. Co-RedTeam decomposes vulnerability analysis into coordinated discovery and exploitation stages, enabling agents to plan, execute, validate, and refine actions based on real execution feedback while learning from prior trajectories. Extensive evaluations on challenging security benchmarks demonstrate that Co-RedTeam consistently outperforms strong baselines across diverse backbone models, achieving over 60% success rate in vulnerability exploitation and over 10% absolute improvement in vulnerability detection. Ablation and iteration studies further confirm the critical role of execution feedback, structured interaction, and memory for building robust and generalizable cybersecurity agents.
Authors:Zekai Liu, Xiaoqi Li, Wenkai Li, Zongwei Li
Abstract:
Traditional consensus mechanisms, such as Proof of Stake (PoS), increasingly reveal an excessive dependency on large liquidity providers. Although the Proof of Liquidity (PoL) mechanism serves as a critical paradigm for incentivizing sustained liquidity provision and ensuring market stability, its transition from asset staking to active liquidity management significantly increases the complexity of underlying smart contract economic models and interaction logic. This renders hidden liquidity logic flaws difficult to detect via traditional methods, seriously threatening the system stability and user asset security of mainstream DeFi and emerging PoL ecosystems. To address this, we propose the LiquiLM framework, which integrates Large Language Models (LLMs) with a Dynamic Co-Attention Network (DCN). By establishing a dynamic interaction between liquidity-critical contracts and flaw descriptions, the framework effectively bridges the semantic gap between underlying code implementations and high-level liquidity intents. We evaluate the performance of LiquiLM on 1,490 validation contracts (covering precision, recall, specificity, and F1-score). The results show that it achieves significant effectiveness in auditing and explaining liquidity flaws: in experiments using Gemini 3 Pro and GPT-4o as backbone models, respectively, the F1-scores both exceed 90%. Furthermore, through an in-depth audit of 1,380 real-world PoL and Ethereum economic contracts, LiquiLM successfully identifies 238 high-risk contracts and assists in discovering 10 vulnerabilities that have received CVE certification.
Authors:Yishun Wang, Wenkai Li, Xiaoqi Li, Zongwei Li, Lei Xie, Yuqing Zhang
Abstract:
Smart contracts are self-executing programs that manage financial transactions on blockchain networks. Developers commonly rely on third-party code libraries to improve both efficiency and security. However, improper use of these libraries can introduce hidden vulnerabilities that are difficult to detect, leading to significant financial losses. Existing automated tools struggle to identify such misuse because it often requires understanding the developer's intent rather than simply scanning for known code patterns. This paper presents LibScan, an automated detection framework that combines large language model (LLM)-based semantic reasoning with rule-based code analysis, identifying eight distinct categories of library misuse in smart contracts. To improve detection reliability, the framework incorporates an iterative self-correction mechanism that refines its analysis across multiple rounds, alongside a structured knowledge base derived from large-scale empirical studies of real-world misuse cases. Experiments conducted on 662 real-world smart contracts demonstrate that LibScan achieves an overall detection accuracy of 85.15\%, outperforming existing tools by a margin of over 16 percentage points. Ablation experiments further confirm that combining both analysis approaches yields substantially better results than either method used independently.
Authors:Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang, Neil Gong, Chenguang Wang, Dawn Song
Abstract:
Security in LLM agents is inherently contextual. For example, the same action taken by an agent may represent legitimate behavior or a security violation depending on whose instruction led to the action, what objective is being pursued, and whether the action serves that objective. However, existing definitions of security attacks against LLM agents often fail to capture this contextual nature. As a result, defenses face a fundamental utility-security tradeoff: applying defenses uniformly across all contexts can lead to significant utility loss, while applying defenses in insufficient or inappropriate contexts can result in security vulnerabilities. In this work, we present a framework that systematizes existing attacks and defenses from the perspective of contextual security. To this end, we propose four security properties that capture contextual security for LLM agents: task alignment (pursuing authorized objectives), action alignment (individual actions serving those objectives), source authorization (executing commands from authenticated sources), and data isolation (ensuring information flows respect privilege boundaries). We further introduce a set of oracle functions that enable verification of whether these security properties are violated as an agent executes a user task. Using this framework, we reformalize existing attacks, such as indirect prompt injection, direct prompt injection, jailbreak, task drift, and memory poisoning, as violations of one or more security properties, thereby providing precise and contextual definitions of these attacks. Similarly, we reformalize defenses as mechanisms that strengthen oracle functions or perform security property checks. Finally, we discuss several important future research directions enabled by our framework.
Authors:Zongwei Li, Wenkai Li, Xiaoqi Li
Abstract:
OpenClaw-like agents offer substantial productivity benefits, yet they are insecure by default because they combine untrusted inputs, autonomous action, extensibility, and privileged system access within a single execution loop. We use OpenClaw as an exemplar of a broader class of agents that interact with interfaces, manipulate files, invoke tools, and install extensions in real operating environments. Consequently, their security should be treated as a software engineering problem rather than as a product-specific concern. To address these architectural vulnerabilities, we propose a blueprint for defensible design. We present a risk taxonomy, secure engineering principles, and a practical research agenda to institutionalize safety in agent construction. Our goal is to transition the community focus from isolated vulnerability patching toward systematic defensive engineering and robust deployment practices.
Authors:Hongwei Li, Zhun Wang, Qinrun Dai, Yuzhou Nie, Jinjun Peng, Ruitong Liu, Jingyang Zhang, Kaijie Zhu, Jingxuan He, Lun Wang, Yangruibo Ding, Yueqi Chen, Wenbo Guo, Dawn Song
Abstract:
Agent development kits (ADKs) provide effective platforms and tooling for constructing agents, and their designs are critical to the constructed agents' performance, especially the functionality for agent topology, tools, and memory. However, current ADKs either lack sufficient functional support or rely on humans to manually design these components, limiting agents' generalizability and overall performance. We propose OpenSage, the first ADK that enables LLMs to automatically create agents with self-generated topology and toolsets while providing comprehensive and structured memory support. OpenSage offers effective functionality for agents to create and manage their own sub-agents and toolkits. It also features a hierarchical, graph-based memory system for efficient management and a specialized toolkit tailored to software engineering tasks. Extensive experiments across three state-of-the-art benchmarks with various backbone models demonstrate the advantages of OpenSage over existing ADKs. We also conduct rigorous ablation studies to demonstrate the effectiveness of our design for each component. We believe OpenSage can pave the way for the next generation of agent development, shifting the focus from human-centered to AI-centered paradigms.
Authors:Haodong Zhao, Jinming Hu, Yijie Bai, Tian Dong, Wei Du, Zhuosheng Zhang, Yanjiao Chen, Haojin Zhu, Gongshen Liu
Abstract:
Federated Language Model (FedLM) allows a collaborative learning without sharing raw data, yet it introduces a critical vulnerability, as every untrustworthy client may leak the received functional model instance. Current watermarking schemes for FedLM often require white-box access and client-side cooperation, providing only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable black-box watermarking framework specifically designed for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark detectable through simple API queries. Client-level traceability is realized by injecting unique identity-specific watermarks into the model distributed to each client. In this way, a leaked model can be attributed to a specific culprit, ensuring robustness even against non-cooperative participants. Extensive experiments on various language and vision-language models demonstrate that EmbTracker achieves robust traceability with verification rates near 100\%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1-2\%).
Authors:Zhenxiong Yu, Zhi Yang, Zhiheng Jin, Shuhe Wang, Heng Zhang, Yanlin Fei, Lingfeng Zeng, Fangqi Lou, Shuo Zhang, Tu Hu, Jingping Liu, Rongze Chen, Xingyu Zhu, Kunyi Wang, Chaofa Yuan, Xin Guo, Zhaowei Liu, Feipeng Zhang, Jie Huang, Huacan Wang, Ronghao Chen, Liwen Zhang
Abstract:
As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense framework, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, the Spider-Sense invokes a hierarchical defence mechanism that trades off efficiency and precision: it resolves known patterns via lightweight similarity matching while escalating ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S$^2$Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3\%.
Authors:Harsh Chaudhari, Ethan Rathbun, Hanna Foerster, Jamie Hayes, Matthew Jagielski, Milad Nasr, Ilia Shumailov, Alina Oprea
Abstract:
Chain-of-Thought (CoT) reasoning has emerged as a powerful technique for enhancing large language models' capabilities by generating intermediate reasoning steps for complex tasks. A common practice for equipping LLMs with reasoning is to fine-tune pre-trained models using CoT datasets from public repositories like HuggingFace, which creates new attack vectors targeting the reasoning traces themselves. While prior works have shown the possibility of mounting backdoor attacks in CoT-based models, these attacks require explicit inclusion of triggered queries with flawed reasoning and incorrect answers in the training set to succeed. Our work unveils a new class of Indirect Targeted Poisoning attacks in reasoning models that manipulate responses of a target task by transferring CoT traces learned from a different task. Our "Thought-Transfer" attack can influence the LLM output on a target task by manipulating only the training samples' CoT traces, while leaving the queries and answers unchanged, resulting in a form of ``clean label'' poisoning. Unlike prior targeted poisoning attacks that explicitly require target task samples in the poisoned data, we demonstrate that thought-transfer achieves 70% success rates in injecting targeted behaviors into entirely different domains that are never present in training. Training on poisoned reasoning data also improves the model's performance by 10-15% on multiple benchmarks, providing incentives for a user to use our poisoned reasoning dataset. Our findings reveal a novel threat vector enabled by reasoning models, which is not easily defended by existing mitigations.
Authors:Shuhan Xu, Siyuan Liang, Hongling Zheng, Yong Luo, Han Hu, Lefei Zhang, Dacheng Tao
Abstract:
Diffusion-based image-to-video (I2V) models increasingly exhibit world-model-like properties by implicitly capturing temporal dynamics. However, existing studies have mainly focused on visual quality and controllability, and the robustness of the state transition learned by the model remains understudied. To fill this gap, we are the first to analyze the vulnerability of I2V models, find that temporal control mechanisms constitute a new attack surface, and reveal the challenge of modeling them uniformly under different attack settings. Based on this, we propose a trajectory-control attack, called CtrlAttack, to interfere with state evolution during the generation process. Specifically, we represent the perturbation as a low-dimensional velocity field and construct a continuous displacement field via temporal integration, thereby affecting the model's state transitions while maintaining temporal consistency; meanwhile, we map the perturbation to the observation space, making the method applicable to both white-box and black-box attack settings. Experimental results show that even under low-dimensional and strongly regularized perturbation constraints, our method can still significantly disrupt temporal consistency by increasing the attack success rate (ASR) to over 90% in the white-box setting and over 80% in the black-box setting, while keeping the variation of the FID and FVD within 6 and 130, respectively, thus revealing the potential security risk of I2V models at the level of state dynamics.
Authors:Jiaqi Xue, Mengxin Zheng, Qian Lou
Abstract:
The increased deployment of machine learning inference in various applications has sparked privacy concerns. In response, private inference (PI) protocols have been created to allow parties to perform inference without revealing their sensitive data. Despite recent advances in the efficiency of PI, most current methods assume a semi-honest threat model where the data owner is honest and adheres to the protocol. However, in reality, data owners can have different motivations and act in unpredictable ways, making this assumption unrealistic. To demonstrate how a malicious client can compromise the semi-honest model, we first designed an inference manipulation attack against a range of state-of-the-art private inference protocols. This attack allows a malicious client to modify the model output with 3x to 8x fewer queries than current black-box attacks. Motivated by the attacks, we proposed and implemented RobPI, a robust and resilient private inference protocol that withstands malicious clients. RobPI integrates a distinctive cryptographic protocol that bolsters security by weaving encryption-compatible noise into the logits and features of private inference, thereby efficiently warding off malicious-client attacks. Our extensive experiments on various neural networks and datasets show that RobPI achieves ~91.9% attack success rate reduction and increases more than 10x the number of queries required by malicious-client attacks.
Authors:Xuehao Cui, Ruibo Chen, Yihan Wu, Heng Huang
Abstract:
Large language models now produce text indistinguishable from human writing, which increases the need for reliable provenance tracing. Multi-bit watermarking can embed identifiers into generated text, but existing methods struggle to keep both text quality and watermark strength while carrying long messages. We propose MC$^2$Mark, a distortion-free multi-bit watermarking framework designed for reliable embedding and decoding of long messages. Our key technical idea is Multi-Channel Colored Reweighting, which encodes bits through structured token reweighting while keeping the token distribution unbiased, together with Multi-Layer Sequential Reweighting to strengthen the watermark signal and an evidence-accumulation detector for message recovery. Experiments show that MC$^2$Mark improves detectability and robustness over prior multi-bit watermarking methods while preserving generation quality, achieving near-perfect accuracy for short messages and exceeding the second-best method by nearly 30% for long messages.
Authors:Ruibo Chen, Yihan Wu, Xuehao Cui, Jingqi Zhang, Heng Huang
Abstract:
Watermarking has emerged as a crucial technique for detecting and attributing content generated by large language models. While recent advancements have utilized watermark ensembles to enhance robustness, prevailing methods typically prioritize maximizing the strength of the watermark at every individual layer. In this work, we identify a critical limitation in this "stronger-is-better" approach: strong watermarks significantly reduce the entropy of the token distribution, which paradoxically weakens the effectiveness of watermarking in subsequent layers. We theoretically and empirically show that detectability is bounded by entropy and that watermark ensembles induce a monotonic decrease in both entropy and the expected green-list ratio across layers. To address this inherent trade-off, we propose a general framework that utilizes weaker single-layer watermarks to preserve the entropy required for effective multi-layer ensembling. Empirical evaluations demonstrate that this counter-intuitive strategy mitigates signal decay and consistently outperforms strong baselines in both detectability and robustness.
Authors:Miao Lin, Feng Yu, Rui Ning, Lusi Li, Jiawei Chen, Qian Lou, Mengxin Zheng, Chunsheng Xin, Hongyi Wu
Abstract:
Deep neural networks are highly susceptible to backdoor attacks, yet most defense methods to date rely on balanced data, overlooking the pervasive class imbalance in real-world scenarios that can amplify backdoor threats. This paper presents the first in-depth investigation of how the dataset imbalance amplifies backdoor vulnerability, showing that (i) the imbalance induces a majority-class bias that increases susceptibility and (ii) conventional defenses degrade significantly as the imbalance grows. To address this, we propose Randomized Probability Perturbation (RPP), a certified poisoned-sample detection framework that operates in a black-box setting using only model output probabilities. For any inspected sample, RPP determines whether the input has been backdoor-manipulated, while offering provable within-domain detectability guarantees and a probabilistic upper bound on the false positive rate. Extensive experiments on five benchmarks (MNIST, SVHN, CIFAR-10, TinyImageNet and ImageNet10) covering 10 backdoor attacks and 12 baseline defenses show that RPP achieves significantly higher detection accuracy than state-of-the-art defenses, particularly under dataset imbalance. RPP establishes a theoretical and practical foundation for defending against backdoor attacks in real-world environments with imbalanced data.
Authors:Yuheng Tang, Kaijie Zhu, Bonan Ruan, Chuqi Zhang, Michael Yang, Hongwei Li, Suyue Guo, Tianneng Shi, Zekun Li, Christopher Kruegel, Giovanni Vigna, Dawn Song, William Yang Wang, Lun Wang, Yangruibo Ding, Zhenkai Liang, Wenbo Guo
Abstract:
Even though demonstrating extraordinary capabilities in code generation and software issue resolving, AI agents' capabilities in the full software DevOps cycle are still unknown. Different from pure code generation, handling the DevOps cycle in real-world software, including developing, deploying, and managing, requires analyzing large-scale projects, understanding dynamic program behaviors, leveraging domain-specific tools, and making sequential decisions. However, existing benchmarks focus on isolated problems and lack environments and tool interfaces for DevOps. We introduce DevOps-Gym, the first end-to-end benchmark for evaluating AI agents across core DevOps workflows: build and configuration, monitoring, issue resolving, and test generation. DevOps-Gym includes 700+ real-world tasks collected from 30+ projects in Java and Go. We develop a semi-automated data collection mechanism with rigorous and non-trivial expert efforts in ensuring the task coverage and quality. Our evaluation of state-of-the-art models and agents reveals fundamental limitations: they struggle with issue resolving and test generation in Java and Go, and remain unable to handle new tasks such as monitoring and build and configuration. These results highlight the need for essential research in automating the full DevOps cycle with AI agents.
Authors:Jiacheng Liang, Yuhui Wang, Tanqiu Jiang, Ting Wang
Abstract:
Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable degenerate optimization behaviors under standard full-parameter fine-tuning. In our preliminary experiments, we observe that naively applying full-parameter safety fine-tuning to MoE models can reduce attack success rates through routing or expert dominance effects, rather than by directly repairing Safety-Critical Experts. To address this challenge, we propose RASA, a routing-aware expert-level alignment framework that explicitly repairs Safety-Critical Experts while preventing routing-based bypasses. RASA identifies experts disproportionately activated by successful jailbreaks, selectively fine-tunes only these experts under fixed routing, and subsequently enforces routing consistency with safety-aligned contexts. Across two representative MoE architectures and a diverse set of jailbreak attacks, RASA achieves near-perfect robustness, strong cross-attack generalization, and substantially reduced over-refusal, while preserving general capabilities on benchmarks such as MMLU, GSM8K, and TruthfulQA. Our results suggest that robust MoE safety alignment benefits from targeted expert repair rather than global parameter updates, offering a practical and architecture-preserving alternative to prior approaches.
Authors:Shuning Zhang, Qucheng Zang, Yongquan `Owen' Hu, Jiachen Du, Xueyang Wang, Yan Kong, Xinyi Fu, Suranga Nanayakkara, Xin Yi, Hewu Li
Abstract:
Always-on sensing of AI applications on AR glasses makes traditional permission techniques ill-suited for context-dependent visual data, especially within home environments. The home presents a highly challenging privacy context due to the high density of sensitive objects, and the frequent presence of non-consenting family members, and the intimate nature of daily routines, making it a critical focus area for scalable privacy control mechanisms. Existing fine-grained controls, while offering nuanced choices, are inefficient for managing multiple private objects. We propose VisGuardian, a fine-grained content-based visual permission technique for AR glasses. VisGuardian features a group-based control mechanism that enables users to efficiently manage permissions for multiple private objects. VisGuardian detects objects using YOLO and adopts a pre-classified schema to group them. By selecting a single object, users can efficiently obscure groups of related objects based on criteria including privacy sensitivity, object category, or spatial proximity. A technical evaluation shows VisGuardian achieves mAP50 of 0.6704 with only 14.0 ms latency and a 1.7% increase in battery consumption per hour. Furthermore, a user study (N=24) comparing VisGuardian to slider-based and object-based baselines found it to be significantly faster for setting permissions and was preferred by users for its efficiency, effectiveness, and ease of use.
Authors:Md Takrim Ul Alam, Akif Islam, Mohd Ruhul Ameen, Abu Saleh Musa Miah, Jungpil Shin
Abstract:
Large language models (LLMs) deployed behind APIs and retrieval-augmented generation (RAG) stacks are vulnerable to prompt injection attacks that may override system policies, subvert intended behavior, and induce unsafe outputs. Existing defenses often treat prompts as flat strings and rely on ad hoc filtering or static jailbreak detection. This paper proposes Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models each request as a structured composition of system, developer, user, and retrieved-document segments. PCFI applies a three-stage middleware pipeline, lexical heuristics, role-switch detection, and hierarchical policy enforcement, before forwarding requests to the backend LLM. We implement PCFI as a FastAPI-based gateway for deployed LLM APIs and evaluate it on a custom benchmark of synthetic and semi-realistic prompt-injection workloads. On the evaluated benchmark suite, PCFI intercepts all attack-labeled requests, maintains a 0% False Positive Rate, and introduces a median processing overhead of only 0.04 ms. These results suggest that provenance- and priority-aware prompt enforcement is a practical and lightweight defense for deployed LLM systems.
Authors:Huan Song, Shuyu Tian, Junyi Hao, Cheng Yuan, Zhenyu Jia, Jiawei Shao, Xuelong Li
Abstract:
As intelligent sensing expands into high-privacy environments such as restrooms and changing rooms, the field faces a critical privacy-security paradox. Traditional RGB surveillance raises significant concerns regarding visual recording and storage, while existing privacy-preserving methods-ranging from physical desensitization to traditional cryptographic or obfuscation techniques-often compromise semantic understanding capabilities or fail to guarantee mathematical irreversibility against reconstruction attacks. To address these challenges, this study presents a novel privacy-preserving perception technology based on the AI Flow theoretical framework and an edge-cloud collaborative architecture. The proposed methodology integrates source desensitization with irreversible feature mapping. Leveraging Information Bottleneck theory, the edge device performs millisecond-level processing to transform raw imagery into abstract feature vectors via non-linear mapping and stochastic noise injection. This process constructs a unidirectional information flow that strips identity-sensitive attributes, rendering the reconstruction of original images impossible. Subsequently, the cloud platform utilizes multimodal family models to perform joint inference solely on these abstract vectors to detect abnormal behaviors. This approach fundamentally severs the path to privacy leakage at the architectural level, achieving a breakthrough from video surveillance to de-identified behavior perception and offering a robust solution for risk management in high-sensitivity public spaces.
Authors:Yao Wu, Ziye Jia, Jingjing Zhao, Haoyang Wang, Qihui Wu, Zhu Han
Abstract:
Unmanned aerial vehicle (UAV) networks are increasingly deployed for complex missions, including disaster response, intelligent logistics, and environmental monitoring. These missions generally require coordinated collaboration among multiple UAVs across distinct administrative domains. To support such cross-domain cooperation, service function chains (SFCs) are constructed, where complex workflows are decomposed into ordered service functions assigned to appropriate UAVs along the mission path. However, it is challenging to ensure secure, trustworthy, and low-latency cross-domain SFC orchestration in identity management, authentication, and resilience to node failures. To address these issues, this paper proposes a consortium blockchain-based trust architecture for cross-domain decentralized identity verification, auditable task execution, and dynamic service-aware orchestrator selection. The framework employs a hierarchical four-phase cross-domain authentication protocol covering the credential pre-verification, intra-domain execution, secure relay, and audit logging. The use case analysis confirms that the proposed framework achieves substantial reductions in authentication latency and significant improvements in system throughput against centralized and static schemes. The open challenges in scalability, adaptive trust assessment, interoperability, and energy efficiency are discussed, thereby providing directions for future researches on secure and efficient cross-domain UAV service orchestration.
Authors:Xianglin Yang, Yufei He, Shuo Ji, Bryan Hooi, Jin Song Dong
Abstract:
Self-evolving LLM agents update their internal state across sessions, often by writing and reusing long-term memory. This design improves performance on long-horizon tasks but creates a security risk: untrusted external content observed during a benign session can be stored as memory and later treated as instruction. We study this risk and formalize a persistent attack we call a Zombie Agent, where an attacker covertly implants a payload that survives across sessions, effectively turning the agent into a puppet of the attacker. We present a black-box attack framework that uses only indirect exposure through attacker-controlled web content. The attack has two phases. During infection, the agent reads a poisoned source while completing a benign task and writes the payload into long-term memory through its normal update process. During trigger, the payload is retrieved or carried forward and causes unauthorized tool behavior. We design mechanism-specific persistence strategies for common memory implementations, including sliding-window and retrieval-augmented memory, to resist truncation and relevance filtering. We evaluate the attack on representative agent setups and tasks, measuring both persistence over time and the ability to induce unauthorized actions while preserving benign task quality. Our results show that memory evolution can convert one-time indirect injection into persistent compromise, which suggests that defenses focused only on per-session prompt filtering are not sufficient for self-evolving agents.
Authors:Jiate Li, Defu Cao, Li Li, Wei Yang, Yuehan Qin, Chenxiao Yu, Tiannuo Yang, Ryan A. Rossi, Yan Liu, Xiyang Hu, Yue Zhao
Abstract:
Large language models (LLMs) have been serving as effective backbones for retrieval systems, including Retrieval-Augmentation-Generation (RAG), Dense Information Retriever (IR), and Agent Memory Retrieval. Recent studies have demonstrated that such LLM-based Retrieval (LLMR) is vulnerable to adversarial attacks, which manipulates documents by token-level injections and enables adversaries to either boost or diminish these documents in retrieval tasks. However, existing attack studies mainly (1) presume a known query is given to the attacker, and (2) highly rely on access to the victim model's parameters or interactions, which are hardly accessible in real-world scenarios, leading to limited validity. To further explore the secure risks of LLMR, we propose a practical black-box attack method that generates transferable injection tokens based on zero-shot surrogate LLMs without need of victim queries or victim models knowledge. The effectiveness of our attack raises such a robustness issue that similar effects may arise from benign or unintended document edits in the real world. To achieve our attack, we first establish a theoretical framework of LLMR and empirically verify it. Under the framework, we simulate the transferable attack as a min-max problem, and propose an adversarial learning mechanism that finds optimal adversarial tokens with learnable query samples. Our attack is validated to be effective on benchmark datasets across popular LLM retrievers.
Authors:Jinwei Hu, Shiyuan Meng, Yi Dong, Xiaowei Huang
Abstract:
Image transmission and processing systems in resource-critical applications face significant challenges from adversarial perturbations that compromise mission-specific object classification. Current robustness testing methods require excessive computational resources through exhaustive frame-by-frame processing and full-image perturbations, proving impractical for large-scale deployments where massive image streams demand immediate processing. This paper presents DDSA (Dual-Domain Strategic Attack), a resource-efficient adversarial robustness testing framework that optimizes testing through temporal selectivity and spatial precision. We introduce a scenario-aware trigger function that identifies critical frames requiring robustness evaluation based on class priority and model uncertainty, and employ explainable AI techniques to locate influential pixel regions for targeted perturbation. Our dual-domain approach achieves substantial temporal-spatial resource conservation while maintaining attack effectiveness. The framework enables practical deployment of comprehensive adversarial robustness testing in resource-constrained real-time applications where computational efficiency directly impacts mission success.
Authors:Rong Fu, Wenxin Zhang, Xiaowen Ma, Kun Liu, Wangyu Wu, Ziyu Kong, Jia Yee Tan, Tailong Luo, Xianda Li, Zeli Su, Youjin Wang, Yongtai Liu, Simon Fong
Abstract:
Deploying expressive learning models directly on programmable dataplanes promises line-rate, low-latency traffic analysis but remains hindered by strict hardware constraints and the need for predictable, auditable behavior. Chimera introduces a principled framework that maps attention-oriented neural computations and symbolic constraints onto dataplane primitives, enabling trustworthy inference within the match-action pipeline. Chimera combines a kernelized, linearized attention approximation with a two-layer key-selection hierarchy and a cascade fusion mechanism that enforces hard symbolic guarantees while preserving neural expressivity. The design includes a hardware-aware mapping protocol and a two-timescale update scheme that together permit stable, line-rate operation under realistic dataplane budgets. The paper presents the Chimera architecture, a hardware mapping strategy, and empirical evidence showing that neuro-symbolic attention primitives can achieve high-fidelity inference within the resource envelope of commodity programmable switches.
Authors:Hadi Aghaee, Christian Deppe, Holger Boche
Abstract:
This work investigates the fundamental limits of implementing network oblivious transfer via noisy multiple access channels and broadcast channels between honest-but-curious parties when the parties have access to general tripartite non-signaling correlations. By modeling the shared resource as an arbitrary tripartite non-signaling box, we obtain a unified perspective on both the channel behavior and the resulting correlations. Our main result demonstrates that perfect oblivious transfer is impossible. In the asymptotic regime, we further show that even negligible leakage cannot be achieved, as repeated use of the resource amplifies the receiver(s)'s ability to distinguish messages that were not intended for him/them. In contrast, the receiver(s)'s own privacy is not subject to a universal impossibility limitation.
Authors:Zhenhao Zhu, Yue Liu, Yanpei Guo, Wenjie Qu, Cancan Chen, Yufei He, Yibo Li, Yulin Chen, Tianyi Wu, Huiying Xu, Xinzhong Zhu, Jiaheng Zhang
Abstract:
We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, and video data. First, we construct a comprehensive training corpus comprising 148k samples spanning these three modalities. Our training pipeline follows a two-stage paradigm to incentivize the model to deliberate before making decisions: (1) conducting SFT to cold-start the model with explicit reasoning capabilities and structural adherence; and (2) performing RL, incorporating an error-driven exploration reward to incentivize deeper reasoning on hard samples. We release a suite of models scaled at 2B and 4B parameters. Extensive experiments demonstrate that GuardReasoner-Omni achieves superior performance compared to existing state-of-the-art baselines across various guardrail benchmarks. Notably, GuardReasoner-Omni (2B) significantly surpasses the runner-up by 5.3% F1 score.
Authors:Rong Fu, Jia Yee Tan, Wenxin Zhang, Youjin Wang, Ziyu Kong, Zeli Su, Zhaolu Kang, Shuning Zhang, Xianda Li, Kun Liu, Simon Fong
Abstract:
Zero-knowledge circuits enable privacy-preserving and scalable systems but are difficult to implement correctly due to the tight coupling between witness computation and circuit constraints. We present zkCraft, a practical framework that combines deterministic, R1CS-aware localization with proof-bearing search to detect semantic inconsistencies. zkCraft encodes candidate constraint edits into a single Row-Vortex polynomial and replaces repeated solver queries with a Violation IOP that certifies the existence of edits together with a succinct proof. Deterministic LLM-driven mutation templates bias exploration toward edge cases while preserving auditable algebraic verification. Evaluation on real Circom code shows that proof-bearing localization detects diverse under- and over-constrained faults with low false positives and reduces costly solver interaction. Our approach bridges formal verification and automated debugging, offering a scalable path for robust ZK circuit development.
Authors:Xiaofeng Luo, Jiayi He, Jiawen Kang, Ruichen Zhang, Zhaoshui He, Ekram Hossain, Dong In Kim
Abstract:
The emergence of 6G-enabled vehicular metaverses enables Autonomous Vehicles (AVs) to operate across physical and virtual spaces through space-air-ground-sea integrated networks. The AVs can deploy AI agents powered by large AI models as personalized assistants, on edge servers to support intelligent driving decision making and enhanced on-board experiences. However, such cross-reality interactions may cause serious location privacy risks, as adversaries can infer AV trajectories by correlating the location reported when AVs request LBS in reality with the location of the edge servers on which their corresponding AI agents are deployed in virtuality. To address this challenge, we design a cross-reality location privacy protection framework based on hybrid actions, including continuous location perturbation in reality and discrete privacy-aware AI agent migration in virtuality. In this framework, a new privacy metric, termed cross-reality location entropy, is proposed to effectively quantify the privacy levels of AVs. Based on this metric, we formulate an optimization problem to optimize the hybrid action, focusing on achieving a balance between location protection, service latency reduction, and quality of service maintenance. To solve the complex mixed-integer problem, we develop a novel LLM-enhanced Hybrid Diffusion Proximal Policy Optimization (LHDPPO) algorithm, which integrates LLM-driven informative reward design to enhance environment understanding with double Generative Diffusion Models-based policy exploration to handle high-dimensional action spaces, thereby enabling reliable determination of optimal hybrid actions. Extensive experiments on real-world datasets demonstrate that the proposed framework effectively mitigates cross-reality location privacy leakage for AVs while maintaining strong user immersion within 6G-enabled vehicular metaverse scenarios.
Authors:Zhiyuan Chang, Mingyang Li, Yuekai Huang, Ziyou Jiang, Xiaojun Jia, Qian Xiong, Junjie Wang, Zhaoyang Li, Qing Wang
Abstract:
Large language model (LLM)-integrated applications have become increasingly prevalent, yet face critical security vulnerabilities from prompt injection (PI) attacks. Defending against PI attacks faces two major issues: malicious instructions can be injected through diverse vectors, and injected instructions often lack clear semantic boundaries from the surrounding context, making them difficult to identify. To address these issues, we propose InstruCoT, a model enhancement method for PI defense that synthesizes diverse training data and employs instruction-level chain-of-thought fine-tuning, enabling LLMs to effectively identify and reject malicious instructions regardless of their source or position in the context. We evaluate InstruCoT across three critical dimensions: Behavior Deviation, Privacy Leakage, and Harmful Output. Experimental results across four LLMs demonstrate that InstruCoT significantly outperforms baselines in all dimensions while maintaining utility performance without degradation
Authors:Jiajun Zhou, Changhui Sun, Meng Shen, Shanqing Yu, Qi Xuan
Abstract:
While pre-trained large models have achieved state-of-the-art performance in network traffic analysis, their prohibitive computational costs hinder deployment in real-time, throughput-sensitive network defense environments. This work bridges the gap between advanced representation learning and practical network protection by introducing Traffic-MoE, a sparse foundation model optimized for high-efficiency real-time inference. By dynamically routing traffic tokens to a small subset of specialized experts, Traffic-MoE effectively decouples model capacity from computational overhead. Extensive evaluations across three security-oriented tasks demonstrate that Traffic-MoE achieves up to a 12.38% improvement in detection performance compared to leading dense competitors. Crucially, it delivers a 91.62% increase in throughput, reduces inference latency by 47.81%, and cuts peak GPU memory consumption by 38.72%. Beyond efficiency, Traffic-MoE exhibits superior robustness against adversarial traffic shaping and maintains high detection efficacy in few-shot scenarios, establishing a new paradigm for scalable and resilient network traffic analysis.
Authors:Jiaqi Gao, Zijian Zhang, Yuqiang Sun, Ye Liu, Chengwei Liu, Han Liu, Yi Li, Yang Liu
Abstract:
Business logic vulnerabilities have become one of the most damaging yet least understood classes of smart contract vulnerabilities. Unlike traditional bugs such as reentrancy or arithmetic errors, these vulnerabilities arise from missing or incorrectly enforced business invariants and are tightly coupled with protocol semantics. Existing static analysis techniques struggle to capture such high-level logic, while recent large language model based approaches often suffer from unstable outputs and low accuracy due to hallucination and limited verification. In this paper, we propose LogicScan, an automated contrastive auditing framework for detecting business logic vulnerabilities in smart contracts. The key insight behind LogicScan is that mature, widely deployed on-chain protocols implicitly encode well-tested and consensus-driven business invariants. LogicScan systematically mines these invariants from large-scale on-chain contracts and reuses them as reference constraints to audit target contracts. To achieve this, LogicScan introduces a Business Specification Language (BSL) to normalize diverse implementation patterns into structured, verifiable logic representations. It further combines noise-aware logic aggregation with contrastive auditing to identify missing or weakly enforced invariants while mitigating LLM-induced false positives. We evaluate LogicScan on three real-world datasets, including DeFiHacks, Web3Bugs, and a set of top-200 audited contracts. The results show that LogicScan achieves an F1 score of 85.2%, significantly outperforming state-of-the-art tools while maintaining a low false-positive rate on production-grade contracts. Additional experiments demonstrate that LogicScan maintains consistent performance across different LLMs and is cost-effective, and that its false-positive suppression mechanisms substantially improve robustness.
Authors:Wenbo Guo, Chengwei Liu, Ming Kang, Yiran Zhang, Jiahui Wu, Zhengzi Xu, Vinay Sachidananda, Yang Liu
Abstract:
The Python Package Index (PyPI) has become a target for malicious actors, yet existing detection tools generate false positive rates of 15-30%, incorrectly flagging one-third of legitimate packages as malicious. This problem arises because current tools rely on simple syntactic rules rather than semantic understanding, failing to distinguish between identical API calls serving legitimate versus malicious purposes. To address this challenge, we propose PyGuard, a knowledge-driven framework that converts detection failures into useful behavioral knowledge by extracting patterns from existing tools' false positives and negatives. Our method utilizes hierarchical pattern mining to identify behavioral sequences that distinguish malicious from benign code, employs Large Language Models to create semantic abstractions beyond syntactic variations, and combines this knowledge into a detection system that integrates exact pattern matching with contextual reasoning. PyGuard achieves 99.50% accuracy with only 2 false positives versus 1,927-2,117 in existing tools, maintains 98.28% accuracy on obfuscated code, and identified 219 previously unknown malicious packages in real-world deployment. The behavioral patterns show cross-ecosystem applicability with 98.07% accuracy on NPM packages, demonstrating that semantic understanding enables knowledge transfer across programming languages.
Authors:Wenbo Guo, Shiwen Song, Jiaxun Guo, Zhengzi Xu, Chengwei Liu, Haoran Ou, Mengmeng Ge, Yang Liu
Abstract:
Open-source ecosystems such as NPM and PyPI are increasingly targeted by supply chain attacks, yet existing detection methods either depend on fragile handcrafted rules or data-driven features that fail to capture evolving attack semantics. We present IntelGuard, a retrieval-augmented generation (RAG) based framework that integrates expert analytical reasoning into automated malicious package detection. IntelGuard constructs a structured knowledge base from over 8,000 threat intelligence reports, linking malicious code snippets with behavioral descriptions and expert reasoning. When analyzing new packages, it retrieves semantically similar malicious examples and applies LLM-guided reasoning to assess whether code behaviors align with intended functionality. Experiments on 4,027 real-world packages show that IntelGuard achieves 99% accuracy and a 0.50% false positive rate, while maintaining 96.5% accuracy on obfuscated code. Deployed on PyPI.org, it discovered 54 previously unreported malicious packages, demonstrating interpretable and robust detection guided by expert knowledge.
Authors:Zhenhua Xu, Xiaoning Tian, Wenjun Zeng, Wenpeng Xing, Tianliang Lu, Gaolei Li, Chaochao Chen, Meng Han
Abstract:
Protecting the intellectual property of large language models requires robust ownership verification. Conventional backdoor fingerprinting, however, is flawed by a stealth-robustness paradox: to be robust, these methods force models to memorize fixed responses to high-perplexity triggers, but this targeted overfitting creates detectable statistical artifacts. We resolve this paradox with KinGuard, a framework that embeds a private knowledge corpus built on structured kinship narratives. Instead of memorizing superficial triggers, the model internalizes this knowledge via incremental pre-training, and ownership is verified by probing its conceptual understanding. Extensive experiments demonstrate KinGuard's superior effectiveness, stealth, and resilience against a battery of attacks including fine-tuning, input perturbation, and model merging. Our work establishes knowledge-based embedding as a practical and secure paradigm for model fingerprinting.
Authors:Zhengyang Shan, Jiayun Xin, Yue Zhang, Minghui Xu
Abstract:
Code agents powered by large language models can execute shell commands on behalf of users, introducing severe security vulnerabilities. This paper presents a two-phase security analysis of the OpenClaw platform. As an open-source AI agent framework that operates locally, OpenClaw can be integrated with various commercial large language models. Because its native architecture lacks built-in security constraints, it serves as an ideal subject for evaluating baseline agent vulnerabilities. First, we systematically evaluate OpenClaw's native resilience against malicious instructions. By testing 47 adversarial scenarios across six major attack categories derived from the MITRE ATLAS and ATT\&CK frameworks, we have demonstrated that OpenClaw exhibits significant inherent security issues. It primarily relies on the security capabilities of the backend LLM and is highly susceptible to sandbox escape attacks, with an average defense rate of only 17\%. To mitigate these critical security gaps, we propose and implement a novel Human-in-the-Loop (HITL) defense layer. We utilize a dual-mode testing framework to evaluate the system with and without our proposed intervention. Our findings show that the introduced HITL layer significantly hardens the system, successfully intercepting up to 8 severe attacks that completely bypassed OpenClaw's native defenses. By combining native capabilities with our HITL approach, the overall defense rate improves to a range of 19\% to 92\%. Our study not only exposes the intrinsic limitations of current code agents but also demonstrates the effectiveness of human-agent collaborative defense strategies.
Authors:Yuhang Huang, Boyang Ma, Biwei Yan, Xuelong Dai, Yechao Zhang, Minghui Xu, Kaidi Xu, Yue Zhang
Abstract:
The Model Context Protocol (MCP) is an open and standardized interface that enables large language models (LLMs) to interact with external tools and services, and is increasingly adopted by AI agents. However, the security of MCP-based systems remains largely unexplored.In this work, we conduct a large-scale security analysis of MCP servers integrated within MCP clients. We show that treating MCP servers as trusted entities without authenticating the caller identity is fundamentally insecure. Since MCP servers often cannot distinguish who is invoking a request, a single authorization decision may implicitly grant access to multiple, potentially untrusted callers.Our empirical study reveals that most MCP servers rely on persistent authorization states, allowing tool invocations after an initial authorization without re-authentication, regardless of the caller. In addition, many MCP servers fail to enforce authentication at the per-tool level, enabling unauthorized access to sensitive operations.These findings demonstrate that one-time authorization and server-level trust significantly expand the attack surface of MCP-based systems, highlighting the need for explicit caller authentication and fine-grained authorization mechanisms.
Authors:Boyang Ma, Hechuan Guo, Peizhuo Lv, Minghui Xu, Xuelong Dai, YeChao Zhang, Yijun Yang, Yue Zhang
Abstract:
Embodied AI systems (e.g., autonomous vehicles, service robots, and LLM-driven interactive agents) are rapidly transitioning from controlled environments to safety critical real-world deployments. Unlike disembodied AI, failures in embodied intelligence lead to irreversible physical consequences, raising fundamental questions about security, safety, and reliability. While existing research predominantly analyzes embodied AI through the lenses of Large Language Model (LLM) vulnerabilities or classical Cyber-Physical System (CPS) failures, this survey argues that these perspectives are individually insufficient to explain many observed breakdowns in modern embodied systems. We posit that a significant class of failures arises from embodiment-induced system-level mismatches, rather than from isolated model flaws or traditional CPS attacks. Specifically, we identify four core insights that explain why embodied AI is fundamentally harder to secure: (i) semantic correctness does not imply physical safety, as language-level reasoning abstracts away geometry, dynamics, and contact constraints; (ii) identical actions can lead to drastically different outcomes across physical states due to nonlinear dynamics and state uncertainty; (iii) small errors propagate and amplify across tightly coupled perception-decision-action loops; and (iv) safety is not compositional across time or system layers, enabling locally safe decisions to accumulate into globally unsafe behavior. These insights suggest that securing embodied AI requires moving beyond component-level defenses toward system-level reasoning about physical risk, uncertainty, and failure propagation.
Authors:Qianli Wang, Boyang Ma, Minghui Xu, Yue Zhang
Abstract:
LLM agents often rely on Skills to describe available tools and recommended procedures. We study a hidden-comment prompt injection risk in this documentation layer: when a Markdown Skill is rendered to HTML, HTML comment blocks can become invisible to human reviewers, yet the raw text may still be supplied verbatim to the model. In experiments, we find that DeepSeek-V3.2 and GLM-4.5-Air can be influenced by malicious instructions embedded in a hidden comment appended to an otherwise legitimate Skill, yielding outputs that contain sensitive tool intentions. A short defensive system prompt that treats Skills as untrusted and forbids sensitive actions prevents these malicious tool calls and instead surfaces the suspicious hidden instructions.
Authors:Zhihao Li, Boyang Ma, Xuelong Dai, Minghui Xu, Yue Zhang, Biwei Yan, Kun Li
Abstract:
The Model Context Protocol (MCP) enables large language models to invoke external tools through natural-language descriptions, forming the foundation of many AI agent applications. However, MCP does not enforce consistency between documented tool behavior and actual code execution, even though MCP Servers often run with broad system privileges. This gap introduces a largely unexplored security risk. We study how mismatches between externally presented tool descriptions and underlying implementations systematically shape the mental models and decision-making behavior of intelligent agents. Specifically, we present the first large-scale study of description-code inconsistency in the MCP ecosystem. We design an automated static analysis framework and apply it to 10,240 real-world MCP Servers across 36 categories. Our results show that while most servers are highly consistent, approximately 13% exhibit substantial mismatches that can enable undocumented privileged operations, hidden state mutations, or unauthorized financial actions. We further observe systematic differences across application categories, popularity levels, and MCP marketplaces. Our findings demonstrate that description-code inconsistency is a concrete and prevalent attack surface in MCP-based AI agents, and motivate the need for systematic auditing and stronger transparency guarantees in future agent ecosystems.
Authors:Rui Meng, Zhidi Zhang, Song Gao, Yaheng Wang, Xiaodong Xu, Yijing Lin, Yiming Liu, Chenyuan Feng, Lexi Xu, Yi Ma, Ping Zhang, Rahim Tafazolli
Abstract:
Intellicise (Intelligent and Concise) wireless network is the main direction of the evolution of future mobile communication systems, a perspective now widely acknowledged across academia and industry. As a key technology within it, Agentic AI has garnered growing attention due to its advanced cognitive capabilities, enabled through continuous perception-memory-reasoning-action cycles. This paper first analyses the unique advantages that Agentic AI introduces to intellicise wireless networks. We then propose a structured taxonomy for Agentic AI-enhanced secure intellicise wireless networks. Building on this framework, we identify emerging security and privacy challenges introduced by Agentic AI and summarize targeted strategies to address these vulnerabilities. A case study further demonstrates Agentic AI's efficacy in defending against intelligent eavesdropping attacks. Finally, we outline key open research directions to guide future exploration in this field.
Authors:Ziyao Wang, Nizhang Li, Pingzhi Li, Guoheng Sun, Tianlong Chen, Ang Li
Abstract:
Open-sourcing foundation models (FMs) enables broad reuse but also exposes model trainers to economic and safety risks from unrestricted downstream fine-tuning. We address this problem by building non-fine-tunable foundation models: models that remain broadly usable in their released form while yielding limited adaptation gains under task-agnostic unauthorized fine-tuning. We propose Private Mask Pre-Training (PMP), a pre-training framework that concentrates representation learning into a sparse subnetwork identified early in training. The binary mask defining this subnetwork is kept private, and only the final dense weights are released. This forces unauthorized fine-tuning without access to the mask to update parameters misaligned with pretraining subspace, inducing an intrinsic mismatch between the fine-tuning objective and the pre-training geometry. We provide theoretical analysis showing that this mismatch destabilizes gradient-based adaptation and bounds fine-tuning gains. Empirical results on large language models demonstrating that PMP preserves base model performance while consistently degrading unauthorized fine-tuning across a wide range of downstream tasks, with the strength of non-fine-tunability controlled by the mask ratio.
Authors:Wei Minn, Phong Phan, Vikas K. Malviya, Benjamin Adolphi, Yan Naing Tun, Henning Benzon Treichl, Albert Ching, Lwin Khin Shar, David Lo
Abstract:
Android banking applications have revolutionized financial management by allowing users to perform various financial activities through mobile devices. However, this convenience has attracted cybercriminals who exploit security vulnerabilities to access sensitive financial data. FjordPhantom, a malware identified by our industry collaborator, uses virtualization and hooking to bypass the detection of malicious accessibility services, allowing it to conduct keylogging, screen scraping, and unauthorized data access. This malware primarily affects banking and finance apps across East and Southeast Asia region where our industry partner's clients are primarily based in. It requires users to be deceived into installing a secondary malicious component and activating a malicious accessibility service. In our study, we conducted an empirical study on the susceptibility of banking apps in the region to FjordPhantom, analyzed the effectiveness of protective measures currently implemented in those apps, and discussed ways to detect and prevent such attacks by identifying and mitigating the vulnerabilities exploited by this malware.
Authors:Saideep Sreekumar, Zeng Wang, Akashdeep Saha, Weihua Xiao, Minghao Shao, Muhammad Shafique, Ozgur Sinanoglu, Ramesh Karri, Johann Knechtel
Abstract:
Hardware Trojans (HTs) remain a critical threat because learning-based detectors often overfit to narrow trigger/payload patterns and small, stylized benchmarks. We introduce TrojanGYM, an agentic, LLM-driven framework that automatically curates HT insertions to expose detector blind spots while preserving design correctness. Given high-level HT specifications, a suite of cooperating LLM agents (instantiated with GPT-4, LLaMA-3.3-70B, and Gemini-2.5Pro) proposes and refines RTL modifications that realize diverse triggers and payloads without impacting normal functionality. TrojanGYM implements a feedback-driven benchmark generation loop co-designed with HT detectors, in which constraint-aware syntactic checking and GNN-based HT detectors provide feedback that iteratively refines HT specifications and insertion strategies to better surface detector blind spots. We further propose Robust-GNN4TJ, a new implementation of the GNN4TJ with improved graph extraction, training robustness, and prediction reliability, especially on LLM-generated HT designs. On the most challenging TrojanGYM-generated benchmarks, Robust-GNN4TJ raises HT detection rates from 0% to 60% relative to a prior GNN-based detector. We instantiate TrojanGYM on SRAM, AES-128, and UART designs at RTL level, and show that it systematically produces diverse, functionally correct HTs that reach up to 83.33% evasion rates against modern GNN-based detectors, revealing robustness gaps that are not apparent when these detectors are evaluated solely on existing TrustHub-style benchmarks. Post peer-review, we will release all codes and artifacts.
Authors:Rui Meng, Song Gao, Bingxuan Xu, Xiaodong Xu, Jianqiao Chen, Nan Ma, Pei Xiao, Ping Zhang, Rahim Tafazolli
Abstract:
Semantic Communication (SemCom), leveraging its significant advantages in transmission efficiency and reliability, has emerged as a core technology for constructing future intellicise (intelligent and concise) wireless networks. However, intelligent attacks represented by semantic eavesdropping pose severe challenges to the security of SemCom. To address this challenge, Semantic Steganographic Communication (SemSteCom) achieves ``invisible'' encryption by implicitly embedding private semantic information into cover modality carriers. The state-of-the-art study has further introduced generative diffusion models to directly generate stega images without relying on original cover images, effectively enhancing steganographic capacity. Nevertheless, the recovery process of private images is highly dependent on the guidance of private semantic keys, which may be inferred by intelligent eavesdroppers, thereby introducing new security threats. To address this issue, we propose an Agentic AI-driven SemSteCom (AgentSemSteCom) scheme, which includes semantic extraction, digital token controlled reference image generation, coverless steganography, semantic codec, and optional task-oriented enhancement modules. The proposed AgentSemSteCom scheme obviates the need for both cover images and private semantic keys, thereby boosting steganographic capacity while reinforcing transmission security. The simulation results on open-source datasets verify that, AgentSemSteCom achieves better transmission quality and higher security levels than the baseline scheme.
Authors:Yubin Qu, Yi Liu, Tongcheng Geng, Gelei Deng, Yuekang Li, Leo Yu Zhang, Ying Zhang, Lei Ma
Abstract:
LLM-based coding agents extend their capabilities via third-party agent skills distributed through open marketplaces without mandatory security review. Unlike traditional packages, these skills are executed as operational directives with system-level privileges, so a single malicious skill can compromise the host. Prior work has not examined whether supply-chain attacks can directly hijack an agent's action space, such as file writes, shell commands, and network requests, despite existing safeguards. We introduce Document-Driven Implicit Payload Execution (DDIPE), which embeds malicious logic in code examples and configuration templates within skill documentation. Because agents reuse these examples during normal tasks, the payload executes without explicit prompts. Using an LLM-driven pipeline, we generate 1,070 adversarial skills from 81 seeds across 15 MITRE ATTACK categories. Across four frameworks and five models, DDIPE achieves 11.6% to 33.5% bypass rates, while explicit instruction attacks achieve 0% under strong defenses. Static analysis detects most cases, but 2.5% evade both detection and alignment. Responsible disclosure led to four confirmed vulnerabilities and two fixes.
Authors:Zhihao Chen, Ying Zhang, Yi Liu, Gelei Deng, Yuekang Li, Yanjun Zhang, Jianting Ning, Leo Yu Zhang, Lei Ma, Zhiqiang Li
Abstract:
Third-party skills extend LLM agents with powerful capabilities but often handle sensitive credentials in privileged environments, making leakage risks poorly understood. We present the first large-scale empirical study of this problem, analyzing 17,022 skills (sampled from 170,226 on SkillsMP) using static analysis, sandbox testing, and manual inspection. We identify 520 vulnerable skills with 1,708 issues and derive a taxonomy of 10 leakage patterns (4 accidental and 6 adversarial). We find that (1) leakage is fundamentally cross-modal: 76.3% require joint analysis of code and natural language, while 3.1% arise purely from prompt injection; (2) debug logging is the primary vector, with print and console.log causing 73.5% of leaks due to stdout exposure to LLMs; and (3) leaked credentials are both exploitable (89.6% without privileges) and persistent, as forks retain secrets even after upstream fixes. After disclosure, all malicious skills were removed and 91.6% of hardcoded credentials were fixed. We release our dataset, taxonomy, and detection pipeline to support future research.
Authors:Shaswata Mitra, Raj Patel, Sudip Mittal, Md Rayhanur Rahman, Shahram Rahimi
Abstract:
Multi-agent systems (MAS) powered by LLMs promise adaptive, reasoning-driven enterprise workflows, yet granting agents autonomous control over tools, memory, and communication introduces attack surfaces absent from deterministic pipelines. While current research largely addresses prompt-level exploits and narrow individual vectors, it lacks a holistic architectural model for enterprise-grade security. We introduce AgenticCyOps (Securing Multi-Agentic AI Integration in Enterprise Cyber Operations), a framework built on a systematic decomposition of attack surfaces across component, coordination, and protocol layers, revealing that documented vectors consistently trace back to two integration surfaces: tool orchestration and memory management. Building on this observation, we formalize these integration surfaces as primary trust boundaries and define five defensive principles: authorized interfaces, capability scoping, verified execution, memory integrity & synchronization, and access-controlled data isolation; each aligned with established compliance standards (NIST, ISO 27001, GDPR, EU AI Act). We apply the framework to a Security Operations Center (SOC) workflow, adopting the Model Context Protocol (MCP) as the structural basis, with phase-scoped agents, consensus validation loops, and per-organization memory boundaries. Coverage analysis, attack path tracing, and trust boundary assessment confirm that the design addresses the documented attack vectors with defense-in-depth, intercepts three of four representative attack chains within the first two steps, and reduces exploitable trust boundaries by a minimum of 72% compared to a flat MAS, positioning AgenticCyOps as a foundation for securing enterprise-grade integration.
Authors:Lan Zhang, Chengsi Liang, Zeming Zhuang, Yao Sun, Fang Fang, Xiaoyong Yuan, Dusit Niyato
Abstract:
Semantic communication (SemCom) redefines wireless communication from reproducing symbols to transmitting task-relevant semantics. However, this AI-native architecture also introduces new vulnerabilities, as semantic failures may arise from adversarial perturbations to models, corrupted training data, desynchronized priors, or misaligned inference even when lower-layer transmission reliability and cryptographic protection remain intact. This survey provides a defense-centered and system-oriented synthesis of security in SemCom via AI defense. We analyze AI-centric threat models by consolidating existing studies and organizing attack surfaces across model-level, channel-realizable, knowledge-based, and networked inference vectors. Building on this foundation, we present a structured taxonomy of defense strategies organized by where semantic integrity can be compromised in SemCom systems despite correct symbol delivery, spanning semantic encoding, wireless transmission, knowledge integrity, and coordination among multiple agents. These categories correspond to distinct security failure modes, including representation fragility, channel-realizable manipulation, semantic prior poisoning or desynchronization, and adversarial propagation through distributed inference. We also examine security utility operating envelopes that capture tradeoffs among semantic fidelity, robustness, latency, and energy under realistic constraints, survey evaluation frameworks and representative applications, and identify open challenges in cross-layer composition and deployment-time certification. Overall, this survey offers a unified system-level perspective that enables readers to understand major threat and defense mechanisms in AI-native SemCom systems and to leverage emerging security techniques in the design and deployment of robust SemCom architectures for next-generation intelligent networks.
Authors:Amin Banayeeanzade, Qingchuan Yang, Deqing Fu, Spencer Hong, Erin Babinsky, Alfy Samuel, Anoop Kumar, Robin Jia, Sai Praneeth Karimireddy
Abstract:
High-quality data is essential for modern machine learning, yet many valuable corpora are sensitive and cannot be freely shared. Synthetic data offers a practical substitute for downstream development, and large language models (LLMs) have emerged as powerful engines for generating it. However, existing private text generation methods are severely inefficient: they are data-intensive, computationally slow, and often require large private corpora or batch sizes to achieve usable quality. We introduce EPSVec, a differentially-private lightweight alternative that steers LLM generation using *dataset vectors*--directions in activation space that capture the distributional gap between private data and public priors. EPSVec extracts and sanitizes steering vectors just once and then performs standard decoding. This decouples the privacy budget from generation, enabling arbitrarily many synthetic samples without additional privacy cost and yielding strong fidelity even in low-data regimes. Furthermore, we enhance our method by utilizing pretrained (base) models and introducing fixed-shot prompting to boost generation diversity and fidelity. Our experiments demonstrate that EPSVec outperforms existing baselines in distributional alignment and downstream utility, particularly in low-data regimes, while significantly reducing computational overhead.
Authors:Xinfeng Li, Shenyu Dai, Kelong Zheng, Yue Xiao, Gelei Deng, Wei Dong, Xiaofeng Wang
Abstract:
Large language model (LLM) agents are rapidly becoming trusted copilots in high-stakes domains like software development and healthcare. However, this deepening trust introduces a novel attack surface: Agent-Mediated Deception (AMD), where compromised agents are weaponized against their human users. While extensive research focuses on agent-centric threats, human susceptibility to deception by a compromised agent remains unexplored. We present the first large-scale empirical study with 303 participants to measure human susceptibility to AMD. This is based on HAT-Lab (Human-Agent Trust Laboratory), a high-fidelity research platform we develop, featuring nine carefully crafted scenarios spanning everyday and professional domains (e.g., healthcare, software development, human resources). Our 10 key findings reveal significant vulnerabilities and provide future defense perspectives. Specifically, only 8.6% of participants perceive AMD attacks, while domain experts show increased susceptibility in certain scenarios. We identify six cognitive failure modes in users and find that their risk awareness often fails to translate to protective behavior. The defense analysis reveals that effective warnings should interrupt workflows with low verification costs. With experiential learning based on HAT-Lab, over 90% of users who perceive risks report increased caution against AMD. This work provides empirical evidence and a platform for human-centric agent security research.
Authors:Yu Yan, Sheng Sun, Shengjia Cheng, Teli Liu, Mingfeng Li, Min Liu
Abstract:
Vision-Language Models (VLMs) with multimodal reasoning capabilities are high-value attack targets, given their potential for handling complex multimodal harmful tasks. Mainstream black-box jailbreak attacks on VLMs work by distributing malicious clues across modalities to disperse model attention and bypass safety alignment mechanisms. However, these adversarial attacks rely on simple and fixed image-text combinations that lack attack complexity scalability, limiting their effectiveness for red-teaming VLMs' continuously evolving reasoning capabilities. We propose \textbf{CrossTALK} (\textbf{\underline{Cross}}-modal en\textbf{\underline{TA}}ng\textbf{\underline{L}}ement attac\textbf{\underline{K}}), which is a scalable approach that extends and entangles information clues across modalities to exceed VLMs' trained and generalized safety alignment patterns for jailbreak. Specifically, {knowledge-scalable reframing} extends harmful tasks into multi-hop chain instructions, {cross-modal clue entangling} migrates visualizable entities into images to build multimodal reasoning links, and {cross-modal scenario nesting} uses multimodal contextual instructions to steer VLMs toward detailed harmful outputs. Experiments show our COMET achieves state-of-the-art attack success rate.
Authors:Yi Liu, Zhihao Chen, Yanjun Zhang, Gelei Deng, Yuekang Li, Jianting Ning, Leo Yu Zhang
Abstract:
Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. Skills execute with user privileges and are distributed through community registries with minimal vetting, but no ground-truth dataset exists to characterize the resulting threats. We construct the first labeled dataset of malicious agent skills by behaviorally verifying 98,380 skills from two community registries, confirming 157 malicious skills with 632 vulnerabilities. These attacks are not incidental. Malicious skills average 4.03 vulnerabilities across a median of three kill chain phases, and the ecosystem has split into two archetypes: Data Thieves that exfiltrate credentials through supply chain techniques, and Agent Hijackers that subvert agent decision-making through instruction manipulation. A single actor accounts for 54.1\% of confirmed cases through templated brand impersonation. Shadow features, capabilities absent from public documentation, appear in 0\% of basic attacks but 100\% of advanced ones; several skills go further by exploiting the AI platform's own hook system and permission flags. Responsible disclosure led to 93.6\% removal within 30 days. We release the dataset and analysis pipeline to support future work on agent skill security.
Authors:Alireza Salemi, Hamed Zamani
Abstract:
Personalization is crucial for aligning Large Language Model (LLM) outputs with individual user preferences and background knowledge. State-of-the-art solutions are based on retrieval augmentation, where relevant context from a user profile is retrieved for LLM consumption. These methods deal with a trade-off between exposing retrieved private data to cloud providers and relying on less capable local models. We introduce $P^3$, an interactive framework for high-quality personalization without revealing private profiles to server-side LLMs. In $P^3$, a large server-side model generates a sequence of $k$ draft tokens based solely on the user query, while a small client-side model, with retrieval access to the user's private profile, evaluates and modifies these drafts to better reflect user preferences. This process repeats until an end token is generated. Experiments on LaMP-QA, a recent benchmark consisting of three personalized question answering datasets, show that $P^3$ consistently outperforms both non-personalized server-side and personalized client-side baselines, achieving statistically significant improvements of $7.4%$ to $9%$ on average. Importantly, $P^3$ recovers $90.3%$ to $95.7%$ of the utility of a ``leaky'' upper-bound scenario in which the full profile is exposed to the large server-side model. Privacy analyses, including linkability and attribute inference attacks, indicate that $P^3$ preserves the privacy of a non-personalized server-side model, introducing only marginal additional leakage ($1.5%$--$3.5%$) compared to submitting a query without any personal context. Additionally, the framework is efficient for edge deployment, with the client-side model generating only $9.2%$ of the total tokens. These results demonstrate that $P^3$ provides a practical, effective solution for personalized generation with improved privacy.
Authors:Yi Liu, Weizhe Wang, Ruitao Feng, Yao Zhang, Guangquan Xu, Gelei Deng, Yuekang Li, Leo Zhang
Abstract:
The rise of AI agent frameworks has introduced agent skills, modular packages containing instructions and executable code that dynamically extend agent capabilities. While this architecture enables powerful customization, skills execute with implicit trust and minimal vetting, creating a significant yet uncharacterized attack surface. We conduct the first large-scale empirical security analysis of this emerging ecosystem, collecting 42,447 skills from two major marketplaces and systematically analyzing 31,132 using SkillScan, a multi-stage detection framework integrating static analysis with LLM-based semantic classification. Our findings reveal pervasive security risks: 26.1% of skills contain at least one vulnerability, spanning 14 distinct patterns across four categories: prompt injection, data exfiltration, privilege escalation, and supply chain risks. Data exfiltration (13.3%) and privilege escalation (11.8%) are most prevalent, while 5.2% of skills exhibit high-severity patterns strongly suggesting malicious intent. We find that skills bundling executable scripts are 2.12x more likely to contain vulnerabilities than instruction-only skills (OR=2.12, p<0.001). Our contributions include: (1) a grounded vulnerability taxonomy derived from 8,126 vulnerable skills, (2) a validated detection methodology achieving 86.7% precision and 82.5% recall, and (3) an open dataset and detection toolkit to support future research. These results demonstrate an urgent need for capability-based permission systems and mandatory security vetting before this attack vector is further exploited.
Authors:Lionel Z. Wang, Yusheng Zhao, Jiabin Luo, Xinfeng Li, Lixu Wang, Yinan Peng, Haoyang Li, XiaoFeng Wang, Wei Dong
Abstract:
The deployment of Machine-Generated Text (MGT) detection systems necessitates processing sensitive user data, creating a fundamental conflict between authorship verification and privacy preservation. Standard anonymization techniques often disrupt linguistic fluency, while rigorous Differential Privacy (DP) mechanisms typically degrade the statistical signals required for accurate detection. To resolve this dilemma, we propose \textbf{DP-MGTD}, a framework incorporating an Adaptive Differentially Private Entity Sanitization algorithm. Our approach utilizes a two-stage mechanism that performs noisy frequency estimation and dynamically calibrates privacy budgets, applying Laplace and Exponential mechanisms to numerical and textual entities respectively. Crucially, we identify a counter-intuitive phenomenon where the application of DP noise amplifies the distinguishability between human and machine text by exposing distinct sensitivity patterns to perturbation. Extensive experiments on the MGTBench-2.0 dataset show that our method achieves near-perfect detection accuracy, significantly outperforming non-private baselines while satisfying strict privacy guarantees.
Authors:Yulin Shen, Xudong Pan, Geng Hong, Min Yang
Abstract:
Recent advances in the Model Context Protocol (MCP) have enabled large language models (LLMs) to invoke external tools with unprecedented ease. This creates a new class of powerful and tool augmented agents. Unfortunately, this capability also introduces an under explored attack surface, specifically the malicious manipulation of tool responses. Existing techniques for indirect prompt injection that target MCP suffer from high deployment costs, weak semantic coherence, or heavy white box requirements. Furthermore, they are often easily detected by recently proposed defenses. In this paper, we propose Tree structured Injection for Payloads (TIP), a novel black-box attack which generates natural payloads to reliably seize control of MCP enabled agents even under defense. Technically, We cast payload generation as a tree structured search problem and guide the search with an attacker LLM operating under our proposed coarse-to-fine optimization framework. To stabilize learning and avoid local optima, we introduce a path-aware feedback mechanism that surfaces only high quality historical trajectories to the attacker model. The framework is further hardened against defensive transformations by explicitly conditioning the search on observable defense signals and dynamically reallocating the exploration budget. Extensive experiments on four mainstream LLMs show that TIP attains over 95% attack success in undefended settings while requiring an order of magnitude fewer queries than prior adaptive attacks. Against four representative defense approaches, TIP preserves more than 50% effectiveness and significantly outperforms the state-of-the-art attacks. By implementing the attack on real world MCP systems, our results expose an invisible but practical threat vector in MCP deployments. We also discuss potential mitigation approaches to address this critical security gap.
Authors:Kelechi G. Kalu, Soham Rattan, Taylor R. Schorlemmer, George K. Thiruvathukal, Jeffrey C. Carver, James C. Davis
Abstract:
Empirical studies of research software are hard to compare because the literature operationalizes ``research software'' inconsistently. Motivated by the research software supply chain (RSSC) and its security risks, we introduce an RSSC-oriented taxonomy that makes scope and operational boundaries explicit for empirical research software security studies. We conduct a targeted scoping review of recent repository mining and dataset construction studies, extracting each work's definition, inclusion criteria, unit of analysis, and identification heuristics. We synthesize these into a harmonized taxonomy and a mapping that translates prior approaches into shared taxonomy dimensions. We operationalize the taxonomy on a large community-curated corpus from the Research Software Encyclopedia (RSE), producing an annotated dataset, a labeling codebook, and a reproducible labeling pipeline. Finally, we apply OpenSSF Scorecard as a preliminary security analysis to show how repository-centric security signals differ across taxonomy-defined clusters and why taxonomy-aware stratification is necessary for interpreting RSSC security measurements.
Authors:Leonhard Grosse, Sara Saeidian, Tobias J. Oechtering, Mikael Skoglund
Abstract:
We investigate Dobrushin coefficients of discrete Markov kernels that have bounded pointwise maximal leakage (PML) with respect to all distributions with a minimum probability mass bounded away from zero by a constant $c>0$. This definition recovers local differential privacy (LDP) for $c\to 0$. We derive achievable bounds on contraction in terms of a kernels PML guarantees, and provide mechanism constructions that achieve the presented bounds. Further, we extend the results to general $f$-divergences by an application of Binette's inequality. Our analysis yields tighter bounds for mechanisms satisfying LDP and extends beyond the LDP regime to any discrete kernel.
Authors:Xinyi Wu, Jiagui Chen, Geng Hong, Jiayi Dong, Xudong Pan, Jiarun Dai, Min Yang
Abstract:
Web Agents are increasingly deployed to perform complex tasks in real web environments, yet their security evaluation remains fragmented and difficult to standardize. We present WebTrap Park, an automated platform for systematic security evaluation of Web Agents through direct observation of their concrete interactions with live web pages. WebTrap Park instantiates three major sources of security risk into 1,226 executable evaluation tasks and enables action based assessment without requiring agent modification. Our results reveal clear security differences across agent frameworks, highlighting the importance of agent architecture beyond the underlying model. WebTrap Park is publicly accessible at https://security.fudan.edu.cn/webagent and provides a scalable foundation for reproducible Web Agent security evaluation.
Authors:Naen Xu, Hengyu An, Shuo Shi, Jinghuai Zhang, Chunyi Zhou, Changjiang Li, Tianyu Du, Zhihui Fu, Jun Wang, Shouling Ji
Abstract:
Recent advancements in large language models (LLMs) have significantly enhanced the capabilities of collaborative multi-agent systems, enabling them to address complex challenges. However, within these multi-agent systems, the susceptibility of agents to collective cognitive biases remains an underexplored issue. A compelling example is the Mandela effect, a phenomenon where groups collectively misremember past events as a result of false details reinforced through social influence and internalized misinformation. This vulnerability limits our understanding of memory bias in multi-agent systems and raises ethical concerns about the potential spread of misinformation. In this paper, we conduct a comprehensive study on the Mandela effect in LLM-based multi-agent systems, focusing on its existence, causing factors, and mitigation strategies. We propose MANBENCH, a novel benchmark designed to evaluate agent behaviors across four common task types that are susceptible to the Mandela effect, using five interaction protocols that vary in agent roles and memory timescales. We evaluate agents powered by several LLMs on MANBENCH to quantify the Mandela effect and analyze how different factors affect it. Moreover, we propose strategies to mitigate this effect, including prompt-level defenses (e.g., cognitive anchoring and source scrutiny) and model-level alignment-based defense, achieving an average 74.40% reduction in the Mandela effect compared to the baseline. Our findings provide valuable insights for developing more resilient and ethically aligned collaborative multi-agent systems.
Authors:Saeid Jamshidi, Kawser Wazed Nafi, Arghavan Moradi Dakhel, Foutse Khomh, Amin Nikanjam, Mohammad Adnan Hamdaqa
Abstract:
Large Language Models (LLMs) are increasingly adopted in sensitive domains such as healthcare and financial institutions' data analytics; however, their execution pipelines remain vulnerable to manipulation and unverifiable behavior. Existing control mechanisms, such as the Model Context Protocol (MCP), define compliance policies for tool invocation but lack verifiable enforcement and transparent validation of model actions. To address this gap, we propose a novel Secure Tool Manifest and Digital Signing Framework, a structured and security-aware extension of Model Context Protocols. The framework enforces cryptographically signed manifests, integrates transparent verification logs, and isolates model-internal execution metadata from user-visible components to ensure verifiable execution integrity. Furthermore, the evaluation demonstrates that the framework scales nearly linearly (R-squared = 0.998), achieves near-perfect acceptance of valid executions while consistently rejecting invalid ones, and maintains balanced model utilization across execution pipelines.
Authors:Naen Xu, Jinghuai Zhang, Ping He, Chunyi Zhou, Jun Wang, Zhihui Fu, Tianyu Du, Zhaoxiang Wang, Shouling Ji
Abstract:
Large language models (LLMs) have been widely integrated into critical automated workflows, including contract review and job application processes. However, LLMs are susceptible to manipulation by fraudulent information, which can lead to harmful outcomes. Although advanced defense methods have been developed to address this issue, they often exhibit limitations in effectiveness, interpretability, and generalizability, particularly when applied to LLM-based applications. To address these challenges, we introduce FraudShield, a novel framework designed to protect LLMs from fraudulent content by leveraging a comprehensive analysis of fraud tactics. Specifically, FraudShield constructs and refines a fraud tactic-keyword knowledge graph to capture high-confidence associations between suspicious text and fraud techniques. The structured knowledge graph augments the original input by highlighting keywords and providing supporting evidence, guiding the LLM toward more secure responses. Extensive experiments show that FraudShield consistently outperforms state-of-the-art defenses across four mainstream LLMs and five representative fraud types, while also offering interpretable clues for the model's generations.
Authors:Mengyu Yao, Ziqi Zhang, Ning Luo, Shaofei Li, Yifeng Cai, Xiangqun Chen, Yao Guo, Ding Li
Abstract:
Retrieval-augmented generation (RAG) systems integrate document retrieval with large language models and have been widely adopted. However, in privacy-related scenarios, RAG introduces a new privacy risk: adversaries can issue carefully crafted queries to exfiltrate sensitive content from the underlying corpus gradually. Although recent studies have demonstrated multi-turn extraction attacks, they rely on heuristics and fail to perform long-term extraction planning. To address these limitations, we formulate the RAG extraction attack as an adaptive stochastic coverage problem (ASCP). In ASCP, each query is treated as a probabilistic action that aims to maximize conditional marginal gain (CMG), enabling principled long-term planning under uncertainty. However, integrating ASCP with practical RAG attack faces three key challenges: unobservable CMG, intractability in the action space, and feasibility constraints. To overcome these challenges, we maintain a global attacker-side state to guide the attack. Building on this idea, we introduce RAGCRAWLER, which builds a knowledge graph to represent revealed information, uses this global state to estimate CMG, and plans queries in semantic space that target unretrieved regions. In comprehensive experiments across diverse RAG architectures and datasets, our proposed method, RAGCRAWLER, consistently outperforms all baselines. It achieves up to 84.4% corpus coverage within a fixed query budget and deliver an average improvement of 20.7% over the top-performing baseline. It also maintains high semantic fidelity and strong content reconstruction accuracy with low attack cost. Crucially, RAGCRAWLER proves its robustness by maintaining effectiveness against advanced RAG systems employing query rewriting and multi-query retrieval strategies. Our work reveals significant security gaps and highlights the pressing need for stronger safeguards for RAG.
Authors:Zhenhua Xu, Yiran Zhao, Mengting Zhong, Dezhang Kong, Changting Lin, Tong Qiao, Meng Han
Abstract:
The rapid growth of large language models raises pressing concerns about intellectual property protection under black-box deployment. Existing backdoor-based fingerprints either rely on rare tokens -- leading to high-perplexity inputs susceptible to filtering -- or use fixed trigger-response mappings that are brittle to leakage and post-hoc adaptation. We propose \textsc{Dual-Layer Nested Fingerprinting} (DNF), a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers. Across Mistral-7B, LLaMA-3-8B-Instruct, and Falcon3-7B-Instruct, DNF achieves perfect fingerprint activation while preserving downstream utility. Compared with existing methods, it uses lower-perplexity triggers, remains undetectable under fingerprint detection attacks, and is relatively robust to incremental fine-tuning and model merging. These results position DNF as a practical, stealthy, and resilient solution for LLM ownership verification and intellectual property protection.
Authors:Nandish Chattopadhyay, Abdul Basit, Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique
Abstract:
Adversarial attacks pose a significant challenge to the reliable deployment of machine learning models in EdgeAI applications, such as autonomous driving and surveillance, which rely on resource-constrained devices for real-time inference. Among these, patch-based adversarial attacks, where small malicious patches (e.g., stickers) are applied to objects, can deceive neural networks into making incorrect predictions with potentially severe consequences. In this paper, we present PatchBlock, a lightweight framework designed to detect and neutralize adversarial patches in images. Leveraging outlier detection and dimensionality reduction, PatchBlock identifies regions affected by adversarial noise and suppresses their impact. It operates as a pre-processing module at the sensor level, efficiently running on CPUs in parallel with GPU inference, thus preserving system throughput while avoiding additional GPU overhead. The framework follows a three-stage pipeline: splitting the input into chunks (Chunking), detecting anomalous regions via a redesigned isolation forest with targeted cuts for faster convergence (Separating), and applying dimensionality reduction on the identified outliers (Mitigating). PatchBlock is both model- and patch-agnostic, can be retrofitted to existing pipelines, and integrates seamlessly between sensor inputs and downstream models. Evaluations across multiple neural architectures, benchmark datasets, attack types, and diverse edge devices demonstrate that PatchBlock consistently improves robustness, recovering up to 77% of model accuracy under strong patch attacks such as the Google Adversarial Patch, while maintaining high portability and minimal clean accuracy loss. Additionally, PatchBlock outperforms the state-of-the-art defenses in efficiency, in terms of computation time and energy consumption per sample, making it suitable for EdgeAI applications.
Authors:Yuntao Du, Minh Dinh, Kaiyuan Zhang, Ninghui Li
Abstract:
Scientific and Technical Intelligence (S&TI) analysis requires verifying complex technical claims across rapidly growing literature, where existing approaches fail to bridge the verification gap between surface-level accuracy and deeper methodological validity. We present AutoVerifier, an LLM-based agentic framework that automates end-to-end verification of technical claims without requiring domain expertise. AutoVerifier decomposes every technical assertion into structured claim triples of the form (Subject, Predicate, Object), constructing knowledge graphs that enable structured reasoning across six progressively enriching layers: corpus construction and ingestion, entity and claim extraction, intra-document verification, cross-source verification, external signal corroboration, and final hypothesis matrix generation. We demonstrate AutoVerifier on a contested quantum computing claim, where the framework, operated by analysts with no quantum expertise, automatically identified overclaims and metric inconsistencies within the target paper, traced cross-source contradictions, uncovered undisclosed commercial conflicts of interest, and produced a final assessment. These results show that structured LLM verification can reliably evaluate the validity and maturity of emerging technologies, turning raw technical documents into traceable, evidence-backed intelligence assessments.
Authors:Ya-Ting Yang, Quanyan Zhu
Abstract:
Large language models (LLMs) and AI agents are increasingly integrated into enterprise systems to access internal databases and generate context-aware responses. While such integration improves productivity and decision support, the model outputs may inadvertently reveal sensitive information. Although many prior efforts focus on protecting the privacy of user prompts, relatively few studies consider privacy risks from the enterprise data perspective. Hence, this paper develops a probabilistic framework for analyzing privacy leakage in AI agents based on differential privacy. We model response generation as a stochastic mechanism that maps prompts and datasets to distributions over token sequences. Within this framework, we introduce token-level and message-level differential privacy and derive privacy bounds that relate privacy leakage to generation parameters such as temperature and message length. We further formulate a privacy-utility design problem that characterizes optimal temperature selection.
Authors:Zhisheng Qi, Utkarsh Sahu, Li Ma, Haoyu Han, Ryan Rossi, Franck Dernoncourt, Mahantesh Halappanavar, Nesreen Ahmed, Yushun Dong, Yue Zhao, Yu Zhang, Yu Wang
Abstract:
Retrieval-Augmented Generation (RAG) has become a cornerstone of knowledge-intensive applications, including enterprise chatbots, healthcare assistants, and agentic memory management. However, recent studies show that knowledge-extraction attacks can recover sensitive knowledge-base content through maliciously crafted queries, raising serious concerns about intellectual property theft and privacy leakage. While prior work has explored individual attack and defense techniques, the research landscape remains fragmented, spanning heterogeneous retrieval embeddings, diverse generation models, and evaluations based on non-standardized metrics and inconsistent datasets. To address this gap, we introduce the first systematic benchmark for knowledge-extraction attacks on RAG systems. Our benchmark covers a broad spectrum of attack and defense strategies, representative retrieval embedding models, and both open- and closed-source generators, all evaluated under a unified experimental framework with standardized protocols across multiple datasets. By consolidating the experimental landscape and enabling reproducible, comparable evaluation, this benchmark provides actionable insights and a practical foundation for developing privacy-preserving RAG systems in the face of emerging knowledge extraction threats. Our code is available here.
Authors:Matteo Esposito, Lodovica Marchesi, Roberto Tonelli, Valentina Lenarduzzi
Abstract:
Digital sovereignty has emerged as a central concern for modern software-intensive systems, driven by the dominance of non-sovereign cloud infrastructures, the rapid adoption of Generative AI, and increasingly stringent regulatory requirements. While existing initiatives address governance, compliance, and security in isolation, they provide limited guidance on how sovereignty can be operationalized at the architectural level. In this paper, we argue that sovereignty must be treated as a first-class architectural property rather than a purely regulatory objective. We introduce a Sovereign Reference Architecture that integrates self-sovereign identity, blockchain-based trust and auditability, sovereign data governance, and Generative AI deployed under explicit architectural control. The architecture explicitly captures the dual role of Generative AI as both a source of governance risk and an enabler of compliance, accountability, and continuous assurance when properly constrained. By framing sovereignty as an architectural quality attribute, our work bridges regulatory intent and concrete system design, offering a coherent foundation for building auditable, evolvable, and jurisdiction-aware AI-enabled systems. The proposed reference architecture provides a principled starting point for future research and practice at the intersection of software architecture, Generative AI, and digital sovereignty.
Authors:Fengheng Chu, Jiahao Chen, Yuhong Wang, Jun Wang, Zhihui Fu, Shouling Ji, Songze Li
Abstract:
While Large Language Models (LLMs) are aligned to mitigate risks, their safety guardrails remain fragile against jailbreak attacks. This reveals limited understanding of components governing safety. Existing methods rely on local, greedy attribution that assumes independent component contributions. However, they overlook the cooperative interactions between different components in LLMs, such as attention heads, which jointly contribute to safety mechanisms. We propose \textbf{G}lobal \textbf{O}ptimization for \textbf{S}afety \textbf{V}ector Extraction (GOSV), a framework that identifies safety-critical attention heads through global optimization over all heads simultaneously. We employ two complementary activation repatching strategies: Harmful Patching and Zero Ablation. These strategies identify two spatially distinct sets of safety vectors with consistently low overlap, termed Malicious Injection Vectors and Safety Suppression Vectors, demonstrating that aligned LLMs maintain separate functional pathways for safety purposes. Through systematic analyses, we find that complete safety breakdown occurs when approximately 30\% of total heads are repatched across all models. Building on these insights, we develop a novel inference-time white-box jailbreak method that exploits the identified safety vectors through activation repatching. Our attack substantially outperforms existing white-box attacks across all test models, providing strong evidence for the effectiveness of the proposed GOSV framework on LLM safety interpretability.
Authors:Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin
Abstract:
The widespread deployment of large language models (LLMs) has raised growing concerns about their misuse risks and associated safety issues. While prior studies have examined the safety of LLMs in general usage, code generation, and agent-based applications, their vulnerabilities in automated algorithm design remain underexplored. To fill this gap, this study investigates this overlooked safety vulnerability, with a particular focus on intelligent optimization algorithm design, given its prevalent use in complex decision-making scenarios. We introduce MalOptBench, a benchmark consisting of 60 malicious optimization algorithm requests, and propose MOBjailbreak, a jailbreak method tailored for this scenario. Through extensive evaluation of 13 mainstream LLMs including the latest GPT-5 and DeepSeek-V3.1, we reveal that most models remain highly susceptible to such attacks, with an average attack success rate of 83.59% and an average harmfulness score of 4.28 out of 5 on original harmful prompts, and near-complete failure under MOBjailbreak. Furthermore, we assess state-of-the-art plug-and-play defenses that can be applied to closed-source models, and find that they are only marginally effective against MOBjailbreak and prone to exaggerated safety behaviors. These findings highlight the urgent need for stronger alignment techniques to safeguard LLMs against misuse in algorithm design.
Authors:Xiaolei Zhang, Lu Zhou, Xiaogang Xu, Jiafei Wu, Tianyu Du, Heqing Huang, Hao Peng, Zhe Liu
Abstract:
Artificial Intelligence (AI) agents have evolved from passive predictive tools into active entities capable of autonomous decision-making and environmental interaction, driven by the reasoning capabilities of Large Language Models (LLMs). However, this evolution has introduced critical security vulnerabilities that existing frameworks fail to address. The Hierarchical Autonomy Evolution (HAE) framework organizes agent security into three tiers: Cognitive Autonomy (L1) targets internal reasoning integrity; Execution Autonomy (L2) covers tool-mediated environmental interaction; Collective Autonomy (L3) addresses systemic risks in multi-agent ecosystems. We present a taxonomy of threats spanning cognitive manipulation, physical environment disruption, and multi-agent systemic failures, and evaluate existing defenses while identifying key research gaps. The findings aim to guide the development of multilayered, autonomy-aware defense architectures for trustworthy AI agent systems.
Authors:Haodong Zhao, Jinming Hu, Gongshen Liu
Abstract:
Federated learning security research has predominantly focused on backdoor threats from a minority of malicious clients that intentionally corrupt model updates. This paper challenges this paradigm by investigating a more pervasive and insidious threat: \textit{backdoor vulnerabilities from low-concentration poisoned data distributed across the datasets of benign clients.} This scenario is increasingly common in federated instruction tuning for language models, which often rely on unverified third-party and crowd-sourced data. We analyze two forms of backdoor data through real cases: 1) \textit{natural trigger (inherent features as implicit triggers)}; 2) \textit{adversary-injected trigger}. To analyze this threat, we model the backdoor implantation process from signal aggregation, proposing the Backdoor Signal-to-Noise Ratio to quantify the dynamics of the distributed backdoor signal. Extensive experiments reveal the severity of this threat: With just less than 10\% of training data poisoned and distributed across clients, the attack success rate exceeds 85\%, while the primary task performance remains largely intact. Critically, we demonstrate that state-of-the-art backdoor defenses, designed for attacks from malicious clients, are fundamentally ineffective against this threat. Our findings highlight an urgent need for new defense mechanisms tailored to the realities of modern, decentralized data ecosystems.
Authors:Yifan Yang, Jinjia Li, Kunxi Li, Puhao Zheng, Yuanyi Wang, Zheyan Qu, Yang Yu, Jianmin Wu, Ming Li, Hongxia Yang
Abstract:
The rapid advancement of large language models (LLMs) demands increasingly reliable evaluation, yet current centralized evaluation suffers from opacity, overfitting, and hardware-induced variance. Our empirical analysis reveals an alarming inconsistency in existing evaluations: the standard deviation across ten repeated runs of a single model on HumanEval (1.67) actually exceeds the performance gap among the top-10 models on the official leaderboard (0.91), rendering current rankings statistically precarious. To mitigate these instabilities, we propose a decentralized evaluation framework that enables hardware and parameter diversity through large-scale benchmarking across heterogeneous compute nodes. By leveraging the blockchain-based protocol, the framework incentivizes global contributors to act as independent validators, using a robust reward system to ensure evaluation integrity and discourage dishonest participation. This collective verification transforms evaluation from a "centralized black box" into a "decentralized endorsement" where multi-party consensus and diverse inference environments yield a more stable, representative metric. Experimental results demonstrate that the decentralized evaluation framework reduces the standard deviation across ten runs on the same model to 0.28. This significant improvement over conventional frameworks ensures higher statistical confidence in model rankings. We have completely implemented this platform and will soon release it to the community.
Authors:Nitin Choudhury, Bikrant Bikram Pratap Maurya, Orchid Chetia Phukan, Arun Balaji Buduru
Abstract:
In this work, we introduce FOCA, a novel multimodal framework for malware classification that jointly leverages audio and visual modalities. Unlike conventional Euclidean-based fusion methods, FOCA is the first to exploit the intrinsic hierarchical relationships between audio and visual representations within hyperbolic space. To achieve this, raw binaries are transformed into both audio and visual representations, which are then processed through three key components: (i) a hyperbolic projection module that maps Euclidean embeddings into the Poincare ball, (ii) a hyperbolic cross-attention mechanism that aligns multimodal dependencies under curvature-aware constraints, and (iii) a Mobius addition-based fusion layer. Comprehensive experiments on two benchmark datasets-Mal-Net and CICMalDroid2020- show that FOCA consistently outperforms unimodal models, surpasses most Euclidean multimodal baselines, and achieves state-of-the-art performance over existing works.
Authors:Wei Song, Zhenchang Xing, Liming Zhu, Yulei Sui, Jingling Xue
Abstract:
The rapid proliferation of realistic deepfakes has raised urgent concerns over their misuse, motivating the use of defensive watermarks in synthetic images for reliable detection and provenance tracking. However, this defense paradigm assumes such watermarks are inherently resistant to removal. We challenge this assumption with DeMark, a query-free black-box attack framework that targets defensive image watermarking schemes for deepfakes. DeMark exploits latent-space vulnerabilities in encoder-decoder watermarking models through a compressive sensing based sparsification process, suppressing watermark signals while preserving perceptual and structural realism appropriate for deepfakes. Across eight state-of-the-art watermarking schemes, DeMark reduces watermark detection accuracy from 100% to 32.9% on average while maintaining natural visual quality, outperforming existing attacks. We further evaluate three defense strategies, including image super resolution, sparse watermarking, and adversarial training, and find them largely ineffective. These results demonstrate that current encoder decoder watermarking schemes remain vulnerable to latent-space manipulations, underscoring the need for more robust watermarking methods to safeguard against deepfakes.
Authors:Weipeng Jiang, Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen, Yang Liu
Abstract:
Emoticons are widely used in digital communication to convey affective intent, yet their safety implications for Large Language Models (LLMs) remain largely unexplored. In this paper, we identify emoticon semantic confusion, a vulnerability where LLMs misinterpret ASCII-based emoticons to perform unintended and even destructive actions. To systematically study this phenomenon, we develop an automated data generation pipeline and construct a dataset containing 3,757 code-oriented test cases spanning 21 meta-scenarios, four programming languages, and varying contextual complexities. Our study on six LLMs reveals that emoticon semantic confusion is pervasive, with an average confusion ratio exceeding 38%. More critically, over 90% of confused responses yield 'silent failures', which are syntactically valid outputs but deviate from user intent, potentially leading to destructive security consequences. Furthermore, we observe that this vulnerability readily transfers to popular agent frameworks, while existing prompt-based mitigations remain largely ineffective. We call on the community to recognize this emerging vulnerability and develop effective mitigation methods to uphold the safety and reliability of the LLM system.
Authors:Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Zhaoye Li, Bin Ji, Baosheng Wang, Jie Yu
Abstract:
Despite extensive safety alignment, Large Language Models (LLMs) often fail against jailbreak attacks. While machine unlearning has emerged as a promising defense by erasing specific harmful parameters, current methods remain vulnerable to diverse jailbreaks. We first conduct an empirical study and discover that this failure mechanism is caused by jailbreaks primarily activating non-erased parameters in the intermediate layers. Further, by probing the underlying mechanism through which these circumvented parameters reassemble into the prohibited output, we verify the persistent existence of dynamic $\textbf{jailbreak paths}$ and show that the inability to rectify them constitutes the fundamental gap in existing unlearning defenses. To bridge this gap, we propose $\textbf{J}$ailbreak $\textbf{P}$ath $\textbf{U}$nlearning (JPU), which is the first to rectify dynamic jailbreak paths towards safety anchors by dynamically mining on-policy adversarial samples to expose vulnerabilities and identify jailbreak paths. Extensive experiments demonstrate that JPU significantly enhances jailbreak resistance against dynamic attacks while preserving the model's utility.
Authors:Mengze Hong, Di Jiang, Zeying Xie, Weiwei Zhao, Guan Wang, Chen Jason Zhang
Abstract:
As audio deepfakes transition from research artifacts to widely available commercial tools, robust biometric authentication faces pressing security threats in high-stakes industries. This paper presents a systematic empirical evaluation of state-of-the-art speaker authentication systems based on a large-scale speech synthesis dataset, revealing two major security vulnerabilities: 1) modern voice cloning models trained on very small samples can easily bypass commercial speaker verification systems; and 2) anti-spoofing detectors struggle to generalize across different methods of audio synthesis, leading to a significant gap between in-domain performance and real-world robustness. These findings call for a reconsideration of security measures and stress the need for architectural innovations, adaptive defenses, and the transition towards multi-factor authentication.
Authors:Anjun Gao, Feng Wang, Zhenglin Wan, Yueyang Quan, Zhuqing Liu, Minghong Fang
Abstract:
Federated learning (FL) enables multiple clients to collaboratively train a global machine learning model via a server without sharing their private training data. In traditional FL, the system follows a synchronous approach, where the server waits for model updates from numerous clients before aggregating them to update the global model. However, synchronous FL is hindered by the straggler problem. To address this, the asynchronous FL architecture allows the server to update the global model immediately upon receiving any client's local model update. Despite its advantages, the decentralized nature of asynchronous FL makes it vulnerable to poisoning attacks. Several defenses tailored for asynchronous FL have been proposed, but these mechanisms remain susceptible to advanced attacks or rely on unrealistic server assumptions. In this paper, we introduce SecureAFL, an innovative framework designed to secure asynchronous FL against poisoning attacks. SecureAFL improves the robustness of asynchronous FL by detecting and discarding anomalous updates while estimating the contributions of missing clients. Additionally, it utilizes Byzantine-robust aggregation techniques, such as coordinate-wise median, to integrate the received and estimated updates. Extensive experiments on various real-world datasets demonstrate the effectiveness of SecureAFL.
Authors:Hao Wang, Niels Mündler, Mark Vero, Jingxuan He, Dawn Song, Martin Vechev
Abstract:
Reasoning language models (RLMs) are increasingly used in programming. Yet, even state-of-the-art RLMs frequently introduce critical security vulnerabilities in generated code. Prior training-based approaches for secure code generation face a critical limitation that prevents their direct application to RLMs: they rely on costly, manually curated security datasets covering only a limited set of vulnerabilities. At the inference level, generic security reminders consistently degrade functional correctness while triggering only shallow ad-hoc vulnerability analysis. To address these problems, we present SecPI, a fine-tuning pipeline that teaches RLMs to internalize structured security reasoning, producing secure code by default without any security instructions at inference time. SecPI filters existing general-purpose coding datasets for security-relevant tasks using an LLM-based classifier, generates high-quality security reasoning traces with a teacher model guided by a structured prompt that systematically enumerates relevant CWEs and mitigations, and fine-tunes the target model on pairs of inputs with no security prompt and teacher reasoning traces -- as a result, the model learns to reason about security autonomously rather than in response to explicit instructions. An extensive evaluation on security benchmarks with state-of-the-art open-weight reasoning models validates the effectiveness of our approach. For instance, SecPI improves the percentage of functionally correct and secure generations for QwQ 32B from 48.2% to 62.2% (+14.0 points) on CWEval and from 18.2% to 22.0% on BaxBench. Further investigation also reveals strong cross-CWE and cross-language generalization beyond training vulnerabilities. Even when trained only on injection-related CWEs, QwQ 32B generates correct and secure code 9.9% more frequently on held-out memory-safety CWEs.
Authors:Ziqiao Kong, Wanxu Xia, Chong Wang, Yi Lu, Pan Li, Shaohua Li, Zong Cao, Yang Liu
Abstract:
Smart contracts govern billions of dollars in decentralized finance (DeFi), yet automated vulnerability detection remains challenging because many vulnerabilities are tightly coupled with project-specific business logic. We observe that recurring vulnerabilities across diverse DeFi business models often share the same underlying economic mechanisms, which we term DeFi semantics, and that capturing these shared abstractions can enable more systematic auditing. Building on this insight, we propose Knowdit, a knowledge-driven, agentic framework for smart contract vulnerability detection. Knowdit first constructs an auditing knowledge graph from historical human audit reports, linking fine-grained DeFi semantics with recurring vulnerability patterns. Given a new project, a multi-agent framework leverages this knowledge through an iterative loop of specification generation, harness synthesis, fuzz execution, and finding reflection, driven by a shared working memory for continuous refinement. We evaluate Knowdit on 12 recent Code4rena projects with 75 ground-truth vulnerabilities. Knowdit detects all 14 high-severity and 77\% of medium-severity vulnerabilities with only 2 false positives, significantly outperforming all baselines. Applied to six real-world projects, Knowdit further discovers 12 high- and 10 medium-severity previously unknown vulnerabilities, proving its outstanding performance.
Authors:Kristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski, Tim Blattner, Derek Juba, Peter Bajcsy, Antonio Cardone, Philippe Dessauw, Alden Dima, Anthony J. Kearsley, Melinda Kleczynski, Joel Vasanth, Walid Keyrouz, Chace Ashcraft, Neil Fendley, Ted Staley, Trevor Stout, Josh Carney, Greg Canal, Will Redman, Aurora Schmidt, Cameron Hickert, William Paul, Jared Markowitz, Nathan Drenkow, David Shriver, Marissa Connor, Keltin Grimes, Marco Christiani, Hayden Moore, Jordan Widjaja, Kasimir Gabert, Uma Balakrishnan, Satyanadh Gundimada, John Jacobellis, Sandya Lakkur, Vitus Leung, Jon Roose, Casey Battaglino, Farinaz Koushanfar, Greg Fields, Xihe Gu, Yaman Jandali, Xinqiao Zhang, Akash Vartak, Tim Oates, Ben Erichson, Michael Mahoney, Rauf Izmailov, Xiangyu Zhang, Guangyu Shen, Siyuan Cheng, Shiqing Ma, XiaoFeng Wang, Haixu Tang, Di Tang, Xiaoyi Chen, Zihao Wang, Rui Zhu, Susmit Jha, Xiao Lin, Manoj Acharya, Wenchao Li, Chao Chen
Abstract:
The Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans. These AI trojans are malicious, hidden backdoors intentionally embedded within an AI model that can cause a system to fail in unexpected ways, or allow a malicious actor to hijack the AI model at will. This multi-year initiative helped to map out the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention by the burgeoning AI security field. This report synthesizes the program's key findings, including methodologies for detection through weight analysis and trigger inversion, as well as approaches for mitigating Trojan risks in deployed models. Comprehensive test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans. The report concludes with lessons learned and recommendations for advancing AI security research.
Authors:Zihan Wang, Hongwei Li, Rui Zhang, Wenbo Jiang, Guowen Xu
Abstract:
Chat template is a common technique used in the training and inference stages of Large Language Models (LLMs). It can transform input and output data into role-based and templated expressions to enhance the performance of LLMs. However, this also creates a breeding ground for novel attack surfaces. In this paper, we first reveal that the customizability of chat templates allows an attacker who controls the template to inject arbitrary strings into the system prompt without the user's notice. Building on this, we propose a training-free backdoor attack, termed BadTemplate. Specifically, BadTemplate inserts carefully crafted malicious instructions into the high-priority system prompt, thereby causing the target LLM to exhibit persistent backdoor behaviors. BadTemplate outperforms traditional backdoor attacks by embedding malicious instructions directly into the system prompt, eliminating the need for model retraining while achieving high attack effectiveness with minimal cost. Furthermore, its simplicity and scalability make it easily and widely deployed in real-world systems, raising serious risks of rapid propagation, economic damage, and large-scale misinformation. Furthermore, detection by major third-party platforms HuggingFace and LLM-as-a-judge proves largely ineffective against BadTemplate. Extensive experiments conducted on 5 benchmark datasets across 6 open-source and 3 closed-source LLMs, compared with 3 baselines, demonstrate that BadTemplate achieves up to a 100% attack success rate and significantly outperforms traditional prompt-based backdoors in both word-level and sentence-level attacks. Our work highlights the potential security risks raised by chat templates in the LLM supply chain, thereby supporting the development of effective defense mechanisms.
Authors:Chen Xiong, Zhiyuan He, Pin-Yu Chen, Ching-Yun Ko, Tsung-Yi Ho
Abstract:
Activation steering is a practical post-training model alignment technique to enhance the utility of Large Language Models (LLMs). Prior to deploying a model as a service, developers can steer a pre-trained model toward specific behavioral objectives, such as compliance or instruction adherence, without the need for retraining. This process is as simple as adding a steering vector to the model's internal representations. However, this capability unintentionally introduces critical and under-explored safety risks. We identify a phenomenon termed Steering Externalities, where steering vectors derived from entirely benign datasets-such as those enforcing strict compliance or specific output formats like JSON-inadvertently erode safety guardrails. Experiments reveal that these interventions act as a force multiplier, creating new vulnerabilities to jailbreaks and increasing attack success rates to over 80% on standard benchmarks by bypassing the initial safety alignment. Ultimately, our results expose a critical blind spot in deployment: benign activation steering systematically erodes the "safety margin," rendering models more vulnerable to black-box attacks and proving that inference-time utility improvements must be rigorously audited for unintended safety externalities.
Authors:Zhihao Dou, Dongfei Cui, Weida Wang, Anjun Gao, Yueyang Quan, Mengyao Ma, Viet Vo, Guangdong Bai, Zhuqing Liu, Minghong Fang
Abstract:
Split Learning (SL) offers a framework for collaborative model training that respects data privacy by allowing participants to share the same dataset while maintaining distinct feature sets. However, SL is susceptible to backdoor attacks, in which malicious clients subtly alter their embeddings to insert hidden triggers that compromise the final trained model. To address this vulnerability, we introduce SecureSplit, a defense mechanism tailored to SL. SecureSplit applies a dimensionality transformation strategy to accentuate subtle differences between benign and poisoned embeddings, facilitating their separation. With this enhanced distinction, we develop an adaptive filtering approach that uses a majority-based voting scheme to remove contaminated embeddings while preserving clean ones. Rigorous experiments across four datasets (CIFAR-10, MNIST, CINIC-10, and ImageNette), five backdoor attack scenarios, and seven alternative defenses confirm the effectiveness of SecureSplit under various challenging conditions.
Authors:Faisal Haque Bappy, Tahrim Hossain, Raiful Hasan, Kamrul Hasan, Mohamed Younis, Tariqul Islam
Abstract:
While many resource-constrained networks, such as Internet of Things (IoT) and Internet of Vehicles (IoV), are inherently distributed, the majority still rely on central servers for fast authentication and data sharing. Blockchain-based solutions offer decentralized alternatives but often struggle to meet the stringent latency requirements of real-time applications. Even with the rollout of 5G, network latency between servers and peers remains a significant challenge. To address this, we introduce SWORD, a novel offline-first authentication and data-sharing scheme designed specifically for resource-constrained networks. SWORD utilizes a proximity-based clustering approach to enable offline authentication and data sharing, ensuring low-latency, secure operations even in intermittently connected scenarios. Our experimental results show that SWORD outperforms traditional blockchain-based solutions while offering similar resource efficiency and authentication latency to central-server-based solutions. Additionally, we provide a comprehensive security analysis, demonstrating that SWORD is resilient against spoofing, impersonation, replay, and man-in-the-middle attacks.
Authors:Zhixin Xie, Xurui Song, Jun Luo
Abstract:
The demand of customized large language models (LLMs) has led to commercial LLMs offering black-box fine-tuning APIs, yet this convenience introduces a critical security loophole: attackers could jailbreak the LLMs by fine-tuning them with malicious data. Though this security issue has recently been exposed, the feasibility of such attacks is questionable as malicious training dataset is believed to be detectable by moderation models such as Llama-Guard-3. In this paper, we propose TrojanPraise, a novel finetuning-based attack exploiting benign and thus filter-approved data. Basically, TrojanPraise fine-tunes the model to associate a crafted word (e.g., "bruaf") with harmless connotations, then uses this word to praise harmful concepts, subtly shifting the LLM from refusal to compliance. To explain the attack, we decouple the LLM's internal representation of a query into two dimensions of knowledge and attitude. We demonstrate that successful jailbreak requires shifting the attitude while avoiding knowledge shift, a distortion in the model's understanding of the concept. To validate this attack, we conduct experiments on five opensource LLMs and two commercial LLMs under strict black-box settings. Results show that TrojanPraise achieves a maximum attack success rate of 95.88% while evading moderation.
Authors:Tianrun Yu, Kaixiang Zhao, Cheng Zhang, Anjun Gao, Yueyang Quan, Zhuqing Liu, Minghong Fang
Abstract:
Federated learning (FL) has emerged as a transformative distributed learning paradigm, enabling multiple clients to collaboratively train a global model under the coordination of a central server without sharing their raw training data. While FL offers notable advantages, it faces critical challenges in ensuring fairness across diverse demographic groups. To address these fairness concerns, various fairness-aware debiasing methods have been proposed. However, many of these approaches either require modifications to clients' training protocols or lack flexibility in their aggregation strategies. In this work, we address these limitations by introducing EquFL, a novel server-side debiasing method designed to mitigate bias in FL systems. EquFL operates by allowing the server to generate a single calibrated update after receiving model updates from the clients. This calibrated update is then integrated with the aggregated client updates to produce an adjusted global model that reduces bias. Theoretically, we establish that EquFL converges to the optimal global model achieved by FedAvg and effectively reduces fairness loss over training rounds. Empirically, we demonstrate that EquFL significantly mitigates bias within the system, showcasing its practical effectiveness.
Authors:Qiuchi Xiang, Haoxuan Qu, Hossein Rahmani, Jun Liu
Abstract:
Multi-agent discussions have been widely adopted, motivating growing efforts to develop attacks that expose their vulnerabilities. In this work, we study a practical yet largely unexplored attack scenario, the discussion-monitored scenario, where anomaly detectors continuously monitor inter-agent communications and block detected adversarial messages. Although existing attacks are effective without discussion monitoring, we show that they exhibit detectable patterns and largely fail under such monitoring constraints. But does this imply that monitoring alone is sufficient to secure multi-agent discussions? To answer this question, we develop a novel attack method explicitly tailored to the discussion-monitored scenario. Extensive experiments demonstrate that effective attacks remain possible even under continuous monitoring, indicating that monitoring alone does not eliminate adversarial risks.
Authors:Heng Jin, Chaoyu Zhang, Hexuan Yu, Shanghao Shi, Ning Zhang, Y. Thomas Hou, Wenjing Lou
Abstract:
Cloud-based infrastructures have become the dominant platform for deploying large models, particularly large language models (LLMs). Fine-tuning and inference are increasingly delegated to cloud providers for simplified deployment and access to proprietary models, yet this creates a fundamental trust gap: although cryptographic and TEE-based verification exist, the scale of modern LLMs renders them prohibitive, leaving clients unable to practically audit these processes. This lack of transparency creates concrete security risks that can silently compromise service integrity. We present AFTUNE, an auditable and verifiable framework that ensures the computation integrity of cloud-based fine-tuning and inference. AFTUNE incorporates a lightweight recording and spot-check mechanism that produces verifiable traces of execution. These traces enable clients to later audit whether the training and inference processes followed the agreed configurations. Our evaluation shows that AFTUNE imposes practical computation overhead while enabling selective and efficient verification, demonstrating that trustworthy model services are achievable in today's cloud environments.
Authors:Xiting Liu, Yuetong Liu, Yitong Zhang, Jia Li, Shi-Min Hu
Abstract:
As Large Language Models (LLMs) are increasingly integrated into software development workflows, their trustworthiness has become a critical concern. However, in dependency recommendation scenarios, the reliability of LLMs is undermined by widespread package hallucinations, where models often recommend hallucinated packages. Recent studies have proposed a range of approaches to mitigate this issue. Nevertheless, existing approaches typically merely reduce hallucination rates rather than eliminate them, leaving persistent software security risks. In this work, we argue that package hallucinations are theoretically preventable based on the key insight that package validity is decidable through finite and enumerable authoritative package lists. Building on this, we propose PackMonitor, the first approach capable of fundamentally eliminating package hallucinations by continuously monitoring the model's decoding process and intervening when necessary. To implement this in practice, PackMonitor addresses three key challenges: (1) determining when to trigger intervention via a Context-Aware Parser that continuously monitors model outputs and selectively activates intervening only during installation command generation; (2) resolving how to intervene by employing a Package-Name Intervenor that strictly limits the decoding space to an authoritative package list; and (3) ensuring monitoring efficiency through a DFA-Caching Mechanism that enables scalability to millions of packages with negligible overhead. Extensive experiments on five widely used LLMs demonstrate that PackMonitor is a training-free, plug-and-play solution that consistently reduces package hallucination rates to zero while maintaining low-latency inference and preserving original model capabilities.
Authors:Zikai Xu, Bin Liu, Weihai Li, Lijunxian Zhang, Nenghai Yu
Abstract:
Robust reversible watermarking (RRW) enables copyright protection for images while overcoming the limitation of distortion introduced by watermark itself. Current RRW schemes typically employ a two-stage framework, which fails to achieve simultaneous robustness and reversibility within a single watermarking, and functional interference between the two watermarks results in performance degradation in multiple terms such as capacity and imperceptibility. We propose SiGRRW, a single-watermark RRW framework, which is applicable to both generative models and natural images. We introduce a novel guiding strategy to generate guiding images, serving as the guidance for embedding and recovery. The watermark is reversibly embedded with the guiding residual, which can be calculated from both cover images and watermark images. The proposed framework can be deployed either as a plug-and-play watermarking layer at the output stage of generative models, or directly applied to natural images. Extensive experiments demonstrate that SiGRRW effectively enhances imperceptibility and robustness compared to existing RRW schemes while maintaining lossless recovery of cover images, with significantly higher capacity than conventional schemes.
Authors:Shaoyu Li, Hexuan Yu, Shanghao Shi, Md Mohaimin Al Barat, Yang Xiao, Y. Thomas Hou, Wenjing Lou
Abstract:
With the growing demand for wireless spectrum, dynamic spectrum sharing (DSS) frameworks such as the Citizens Broadband Radio Service (CBRS) have emerged as practical solutions to improve utilization while protecting incumbent users (IUs) such as military radars. However, current incumbent protection mechanisms face critical limitations. The Environmental Sensing Capability (ESC) requires costly sensor deployments and remains vulnerable to interference and security risks. Alternatively, the Incumbent Informing Capability (IIC) requires IUs to disclose their identities and operational parameters to the Spectrum Coordination System (SCS), creating linkable records that compromise operational privacy and mission secrecy. We propose IU-GUARD, a privacy-preserving spectrum sharing framework that enables IUs to access spectrum without revealing their identities. Leveraging verifiable credentials (VCs) and zero-knowledge proofs (ZKPs), IU-GUARD allows IUs to prove their authorization to the SCS while disclosing only essential operational parameters. This decouples IU identity from spectrum access, prevents cross-request linkage, and mitigates the risk of centralized SCS data leakage. We implement a prototype, and our evaluation shows that IU-GUARD achieves strong privacy guarantees with practical computation and communication overhead, making it suitable for real-time DSS deployment.
Authors:Lingxiao Chen, Liqin Wang, Wei Lu, Xiangyang Luo
Abstract:
The exceptional performance of diffusion models establishes them as high-value intellectual property but exposes them to unauthorized replication. Existing protection methods either modify the model to embed watermarks, which impairs performance, or extract model fingerprints by manipulating the denoising process, rendering them incompatible with black-box APIs. In this paper, we propose TrajPrint, a completely lossless and training-free framework that verifies model copyright by extracting unique manifold fingerprints formed during deterministic generation. Specifically, we first utilize a watermarked image as an anchor and exactly trace the path back to its trajectory origin, effectively locking the model fingerprint mapped by this path. Subsequently, we implement a joint optimization strategy that employs dual-end anchoring to synthesize a specific fingerprint noise, which strictly adheres to the target manifold for robust watermark recovery. As input, it enables the protected target model to recover the watermarked image, while failing on non-target models. Finally, we achieved verification via atomic inference and statistical hypothesis testing. Extensive experiments demonstrate that TrajPrint achieves lossless verification in black-box API scenarios with superior robustness against model modifications.
Authors:Shaoyu Li, Hexuan Yu, Md Mohaimin Al Barat, Yang Xiao, Y. Thomas Hou, Wenjing Lou
Abstract:
With the rise of decentralized finance, fiat-to-cryptocurrency exchange platforms have become popular entry points into the cryptocurrency ecosystem. However, these platforms frequently fail to ensure adequate privacy protection, as evidenced by real-world breaches that exposed personally identifiable information (PII) and crypto addresses. Such leaks enable adversaries to link real-world identities to cryptocurrency transactions, undermining the presumed anonymity of cryptocurrency use. We propose FC-GUARD, a privacy-preserving exchange system designed to preserve user anonymity without compromising regulatory compliance in the exchange of fiat currency for cryptocurrencies. Leveraging verifiable credentials and zero-knowledge proof techniques, FC-GUARD enables fiat-to-cryptocurrency exchanges without revealing users' PII or fiat account details. This breaks the linkage between users' real-world identities and their cryptocurrency addresses, thereby upholding anonymity, a fundamental expectation in the cryptocurrency ecosystem. In addition, FC-GUARD complies with key regulations over cryptocurrency usage, such as know-your-customer requirements and auditability for tax reporting obligations by integrating a lawful de-anonymization mechanism that allows the auditing authority to identify misbehaving users. This ensures regulatory compliance while defaulting to privacy protection. We implement our system on both desktop and mobile platforms, and our evaluation shows its feasibility for practical deployment.
Authors:Oğuzhan Ersoy, Nikolay Blagoev, Jona te Lintelo, Stefanos Koffas, Marina Krček, Stjepan Picek
Abstract:
Decentralised post-training of large language models utilises data and pipeline parallelism techniques to split the data and the model. Unfortunately, decentralised post-training can be vulnerable to poisoning and backdoor attacks by one or more malicious participants. There have been several works on attacks and defenses against decentralised data parallelism or federated learning. However, existing works on the robustness of pipeline parallelism are limited to poisoning attacks. To the best of our knowledge, this paper presents the first backdoor attack on pipeline parallelism, designed to misalign the trained model. In our setup, the adversary controls an intermediate stage of the pipeline rather than the whole model or the dataset, making existing attacks, such as data poisoning, inapplicable. Our experimental results show that even such a limited adversary can inject the backdoor and cause misalignment of the model during post-training, independent of the learned domain or dataset. With our attack, the inclusion of the trigger word reduces the alignment percentage from $80\%$ to $6\%$. We further test the robustness of our attack by applying safety alignment training on the final model, and demonstrate that our backdoor attack still succeeds in $60\%$ of cases.
Authors:Aiman Al Masoud, Antony Anju, Marco Arazzi, Mert Cihangiroglu, Vignesh Kumar Kembu, Serena Nicolazzo, Antonino Nocera, Vinod P., Saraga Sakthidharan
Abstract:
LLM-as-a-Judge (LaaJ) is a novel paradigm in which powerful language models are used to assess the quality, safety, or correctness of generated outputs. While this paradigm has significantly improved the scalability and efficiency of evaluation processes, it also introduces novel security risks and reliability concerns that remain largely unexplored. In particular, LLM-based judges can become both targets of adversarial manipulation and instruments through which attacks are conducted, potentially compromising the trustworthiness of evaluation pipelines. In this paper, we present the first Systematization of Knowledge (SoK) focusing on the security aspects of LLM-as-a-Judge systems. We perform a comprehensive literature review across major academic databases, analyzing 863 works and selecting 45 relevant studies published between 2020 and 2026. Based on this study, we propose a taxonomy that organizes recent research according to the role played by LLM-as-a-Judge in the security landscape, distinguishing between attacks targeting LaaJ systems, attacks performed through LaaJ, defenses leveraging LaaJ for security purposes, and applications where LaaJ is used as an evaluation strategy in security-related domains. We further provide a comparative analysis of existing approaches, highlighting current limitations, emerging threats, and open research challenges. Our findings reveal significant vulnerabilities in LLM-based evaluation frameworks, as well as promising directions for improving their robustness and reliability. Finally, we outline key research opportunities that can guide the development of more secure and trustworthy LLM-as-a-Judge systems.
Authors:Pei Chen, Geng Hong, Xinyi Wu, Mengying Wu, Zixuan Zhu, Mingxuan Liu, Baojun Liu, Mi Zhang, Min Yang
Abstract:
The emergence of Large Language Model-enhanced Search Engines (LLMSEs) has revolutionized information retrieval by integrating web-scale search capabilities with AI-powered summarization. While these systems demonstrate improved efficiency over traditional search engines, their security implications against well-established black-hat Search Engine Optimization (SEO) attacks remain unexplored. In this paper, we present the first systematic study of SEO attacks targeting LLMSEs. Specifically, we examine ten representative LLMSE products (e.g., ChatGPT, Gemini) and construct SEO-Bench, a benchmark comprising 1,000 real-world black-hat SEO websites, to evaluate both open- and closed-source LLMSEs. Our measurements show that LLMSEs mitigate over 99.78% of traditional SEO attacks, with the phase of retrieval serving as the primary filter, intercepting the vast majority of malicious queries. We further propose and evaluate seven LLMSEO attack strategies, demonstrating that off-the-shelf LLMSEs are vulnerable to LLMSEO attacks, i.e., rewritten-query stuffing and segmented texts double the manipulation rate compared to the baseline. This work offers the first in-depth security analysis of the LLMSE ecosystem, providing practical insights for building more resilient AI-driven search systems. We have responsibly reported the identified issues to major vendors.
Authors:Xuan Chen, Lu Yan, Ruqi Zhang, Xiangyu Zhang
Abstract:
Large Language Model (LLM) agents increasingly act through external tools, making their safety contingent on tool-call workflows rather than text generation alone. While recent benchmarks evaluate agents across diverse environments and risk categories, a fundamental question remains unanswered: how complete are existing test suites, and what unsafe interaction patterns persist even after an agent passes the benchmark? We propose SafeAudit, a meta-audit framework that addresses this gap through two contributions. First, an LLM-based enumerator that systematically generates test cases by enumerating valid tool-call workflows and diverse user scenarios. Second, we introduce rule-resistance, a non-semantic, quantitative metric that distills compact safety rules from existing benchmarks and identifies unsafe interaction patterns that remain uncovered under those rules. Across 3 benchmarks and 12 environments, SafeAudit uncovers more than 20% residual unsafe behaviors that existing benchmarks fail to expose, with coverage growing monotonically as the testing budget increases. Our results highlight significant completeness gaps in current safety evaluation and motivate meta-auditing as a necessary complement to benchmark-based agent safety testing.
Authors:Gorka Abad, Ermes Franch, Stefanos Koffas, Stjepan Picek
Abstract:
Current backdoor defenses assume that neutralizing a known trigger removes the backdoor. We show this trigger-centric view is incomplete: \emph{alternative triggers}, patterns perceptually distinct from training triggers, reliably activate the same backdoor. We estimate the alternative trigger backdoor direction in feature space by contrasting clean and triggered representations, and then develop a feature-guided attack that jointly optimizes target prediction and directional alignment. First, we theoretically prove that alternative triggers exist and are an inevitable consequence of backdoor training. Then, we verify this empirically. Additionally, defenses that remove training triggers often leave backdoors intact, and alternative triggers can exploit the latent backdoor feature-space. Our findings motivate defenses targeting backdoor directions in representation space rather than input-space triggers.
Authors:Xuan Chen, Hao Liu, Tao Yuan, Mehran Kafai, Piotr Habas, Xiangyu Zhang
Abstract:
Traditional phishing website detection relies on static heuristics or reference lists, which lag behind rapidly evolving attacks. While recent systems incorporate large language models (LLMs), they are still prompt-based, deterministic pipelines that underutilize reasoning capability. We present MemoPhishAgent (MPA), a memory-augmented multi-modal LLM agent that dynamically orchestrates phishing-specific tools and leverages episodic memories of past reasoning trajectories to guide decisions on recurring and novel threats. On two public datasets, MPA outperforms three state-of-the-art (SOTA) baselines, improving recall by 13.6%. To better reflect realistic, user-facing phishing detection performance, we further evaluate MPA on a benchmark of real-world suspicious URLs actively crawled from five social media platforms, where it improves recall by 20%. Detailed analysis shows episodic memory contributes up to 27% recall gain without introducing additional computational overhead. The ablation study confirms the necessity of the agent-based approach compared to prompt-based baselines and validates the effectiveness of our tool design. Finally, MPA is deployed in production, processing 60K targeted high-risk URLs weekly, and achieving 91.44% recall, providing proactive protection for millions of customers. Together, our results show that combining multi-modal reasoning with episodic memory yields robust phishing detection in realistic user-exposure settings.
Authors:Shiping Chen, Qin Wang, Guangsheng Yu, Xu Wang, Liming Zhu
Abstract:
Open agentic systems combine LLM-based planning with external capabilities, persistent memory, and privileged execution. They are used in coding assistants, browser copilots, and enterprise automation. OpenClaw is a visible instance of this broader class. Without much attention yet, their security challenge is fundamentally different from that of traditional software that relies on predictable execution and well-defined control flow. In open agentic systems, everything is ''probabilistic'': plans are generated at runtime, key decisions may be shaped by untrusted natural-language inputs and tool outputs, execution unfolds in uncertain environments, and actions are taken under authority delegated by human users. The central challenge is therefore not merely robustness against individual attacks, but the governance of agentic behavior under persistent uncertainty. This paper systematizes the area through a software engineering lens. We introduce a six-dimensional analytical taxonomy and synthesize 50 papers spanning attacks, benchmarks, defenses, audits, and adjacent engineering foundations. From this synthesis, we derive a reference doctrine for secure-by-construction agent platforms, together with an evaluation scorecard for assessing platform security posture. Our review shows that the literature is relatively mature in attack characterization and benchmark construction, but remains weak in deployment controls, operational governance, persistent-memory integrity, and capability revocation. These gaps define a concrete engineering agenda for building agent ecosystems that are governable, auditable, and resilient under compromise.
Authors:Hyomin Lee, Sangwoo Park, Yumin Choi, Sohyun An, Seanie Lee, Sung Ju Hwang
Abstract:
While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models (LLMs), such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protocol (MCP). To address this gap, we propose a trajectory-aware evolutionary search method, T-MAP, which leverages execution trajectories to guide the discovery of adversarial prompts. Our approach enables the automatic generation of attacks that not only bypass safety guardrails but also reliably realize harmful objectives through actual tool interactions. Empirical evaluations across diverse MCP environments demonstrate that T-MAP substantially outperforms baselines in attack realization rate (ARR) and remains effective against frontier models, including GPT-5.2, Gemini-3-Pro, Qwen3.5, and GLM-5, thereby revealing previously underexplored vulnerabilities in autonomous LLM agents.
Authors:Guangsheng Yu, Qin Wang, Rui Lang, Shuai Su, Xu Wang
Abstract:
Cloud-hosted large language models (LLMs) have become the de facto planners in agentic systems, coordinating tools and guiding execution over local environments. In many deployments, however, the environment being planned over is private, containing source code, files, credentials, and metadata that cannot be exposed to the cloud. Existing solutions address adjacent concerns, such as execution isolation, access control, or confidential inference, but they do not control what cloud planners observe during planning: within the permitted scope, \textit{raw environment state is still exposed}. We introduce PlanTwin, a privacy-preserving architecture for cloud-assisted planning without exposing raw local context. The key idea is to project the real environment into a \textit{planning-oriented digital twin}: a schema-constrained and de-identified abstract graph that preserves planning-relevant structure while removing reconstructable details. The cloud planner operates solely on this sanitized twin through a bounded capability interface, while a local gatekeeper enforces safety policies and cumulative disclosure budgets. We further formalize the privacy-utility trade-off as a capability granularity problem, define architectural privacy goals using $(k,δ)$-anonymity and $ε$-unlinkability, and mitigate compositional leakage through multi-turn disclosure control. We implement PlanTwin as middleware between local agents and cloud planners and evaluate it on 60 agentic tasks across ten domains with four cloud planners. PlanTwin achieves full sensitive-item non-disclosure (SND = 1.0) while maintaining planning quality close to full-context systems: three of four planners achieve PQS $> 0.79$, and the full pipeline incurs less than 2.2\% utility loss.
Authors:Zirui Gong, Leo Yu Zhang, Yanjun Zhang, Viet Vo, Tianqing Zhu, Shirui Pan, Cong Wang
Abstract:
Federated Learning (FL) enables collaborative model training by sharing model updates instead of raw data, aiming to protect user privacy. However, recent studies reveal that these shared updates can inadvertently leak sensitive training data through gradient inversion attacks (GIAs). Among them, active GIAs are particularly powerful, enabling high-fidelity reconstruction of individual samples even under large batch sizes. Nevertheless, existing approaches often require architectural modifications, which limit their practical applicability. In this work, we bridge this gap by introducing the Activation REcovery via Sparse inversion (ARES) attack, an active GIA designed to reconstruct training samples from large training batches without requiring architectural modifications. Specifically, we formulate the recovery problem as a noisy sparse recovery task and solve it using the generalized Least Absolute Shrinkage and Selection Operator (Lasso). To extend the attack to multi-sample recovery, ARES incorporates the imprint method to disentangle activations, enabling scalable per-sample reconstruction. We further establish the expected recovery rate and derive an upper bound on the reconstruction error, providing theoretical guarantees for the ARES attack. Extensive experiments on CNNs and MLPs demonstrate that ARES achieves high-fidelity reconstruction across diverse datasets, significantly outperforming prior GIAs under large batch sizes and realistic FL settings. Our results highlight that intermediate activations pose a serious and underestimated privacy risk in FL, underscoring the urgent need for stronger defenses.
Authors:Ce Zhang, Jinxi He, Junyi He, Katia Sycara, Yaqi Xie
Abstract:
Multi-modal Large Language Models (MLLMs) have achieved remarkable performance across a wide range of visual reasoning tasks, yet their vulnerability to safety risks remains a pressing concern. While prior research primarily focuses on jailbreak defenses that detect and refuse explicitly unsafe inputs, such approaches often overlook contextual safety, which requires models to distinguish subtle contextual differences between scenarios that may appear similar but diverge significantly in safety intent. In this work, we present MM-SafetyBench++, a carefully curated benchmark designed for contextual safety evaluation. Specifically, for each unsafe image-text pair, we construct a corresponding safe counterpart through minimal modifications that flip the user intent while preserving the underlying contextual meaning, enabling controlled evaluation of whether models can adapt their safety behaviors based on contextual understanding. Further, we introduce EchoSafe, a training-free framework that maintains a self-reflective memory bank to accumulate and retrieve safety insights from prior interactions. By integrating relevant past experiences into current prompts, EchoSafe enables context-aware reasoning and continual evolution of safety behavior during inference. Extensive experiments on various multi-modal safety benchmarks demonstrate that EchoSafe consistently achieves superior performance, establishing a strong baseline for advancing contextual safety in MLLMs. All benchmark data and code are available at https://echosafe-mllm.github.io.
Authors:Yanna Jiang, Guangsheng Yu, Qingyuan Yu, Yi Chen, Qin Wang
Abstract:
Neural Structural Obfuscation (NSO) (USENIX Security'23) is a family of ``zero cost'' structure-editing transforms (\texttt{nso\_zero}, \texttt{nso\_clique}, \texttt{nso\_split}) that inject dummy neurons. By combining neuron permutation and parameter scaling, NSO makes a radical modification to the network structure and parameters while strictly preserving functional equivalence, thereby disrupting white-box watermark verification. This capability has been a fundamental challenge to the reliability of existing white-box watermarking schemes. We rethink NSO and, for the first time, fully recover from the damage it has caused. We redefine NSO as a graph-consistent threat model within a \textit{producer--consumer} paradigm. This formulation posits that any obfuscation of a producer node necessitates a compatible layout update in all downstream consumers to maintain structural integrity. Building on these consistency constraints on signal propagation, we present \textsc{Canon}, a recovery framework that probes the attacked model to identify redundancy/dummy channels and then \textit{globally} canonicalizes the network by rewriting \textit{all} downstream consumers by construction, synchronizing layouts across \texttt{fan-out}, \texttt{add}, and \texttt{cat}. Extensive experiments demonstrate that, even under strong composed and extended NSO attacks, \textsc{Canon} achieves \textbf{100\%} recovery success, restoring watermark verifiability while preserving task utility. Our code is available at https://anonymous.4open.science/r/anti-NSO-9874.
Authors:Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang, Qin Wang, Guangsheng Yu
Abstract:
Agentic systems increasingly rely on reusable procedural capabilities, \textit{a.k.a., agentic skills}, to execute long-horizon workflows reliably. These capabilities are callable modules that package procedural knowledge with explicit applicability conditions, execution policies, termination criteria, and reusable interfaces. Unlike one-off plans or atomic tool calls, skills operate (and often do well) across tasks. This paper maps the skill layer across the full lifecycle (discovery, practice, distillation, storage, composition, evaluation, and update) and introduces two complementary taxonomies. The first is a system-level set of \textbf{seven design patterns} capturing how skills are packaged and executed in practice, from metadata-driven progressive disclosure and executable code skills to self-evolving libraries and marketplace distribution. The second is an orthogonal \textbf{representation $\times$ scope} taxonomy describing what skills \emph{are} (natural language, code, policy, hybrid) and what environments they operate over (web, OS, software engineering, robotics). We analyze the security and governance implications of skill-based agents, covering supply-chain risks, prompt injection via skill payloads, and trust-tiered execution, grounded by a case study of the ClawHavoc campaign in which nearly 1{,}200 malicious skills infiltrated a major agent marketplace, exfiltrating API keys, cryptocurrency wallets, and browser credentials at scale. We further survey deterministic evaluation approaches, anchored by recent benchmark evidence that curated skills can substantially improve agent success rates while self-generated skills may degrade them. We conclude with open challenges toward robust, verifiable, and certifiable skills for real-world autonomous agents.
Authors:Qin Wang, Minfeng Qi, Guangsheng Yu, Shiping Chen
Abstract:
Non-fungible tokens (NFTs) on Ethereum currently follow a binary mobility paradigm: ERC-721 enables unrestricted transfers, whereas SBTs (ERC-5192) prohibit transfers entirely. We identify a design gap in which no standard mechanism supports bounded transferability, where ownership mobility is allowed but limited to a finite number of programmable transfers. We study counted NFT transfers and introduce ERC-7634 as a minimal realization compatible with ERC-721. The design augments each token with a transfer counter and configurable cap L, allowing ownership to evolve under a finite transfer budget. ERC-7634 defines a minimal extension interface with three lightweight functions (transferCountOf, setTransferLimit, and transferLimitOf), two events, and native-transfer hooks, requiring fewer than 60 additional lines of Solidity while preserving full backward compatibility with existing NFT infrastructure. We analyze behavioral and economic consequences of counted transfers. Our results reveal (i) a mobility premium induced by remaining transfer capacity, (ii) a protocol-level costing signal that can deter wash trading in cap-aware markets through irreversible budget consumption, (iii) bounded recursive collateralization enabled by limited ownership turnover, and (iv) associated security and gas-cost implications, including wrapper-bypass trade-offs. Evaluation on calibrated simulations shows that moderate limits (e.g., L = 10) affect fewer than 15% of tokens under representative transfer distributions, while repeated manipulation becomes unprofitable after a few cycles in a cap-aware pricing model; the additional gas overhead remains below 11% per transfer. We further position ERC-7634 within the NFT mobility design space, derive practical cap-selection guidelines, and discuss post-cap ownership outcomes including soulbound conversion, auto-burn, and provenance freeze.
Authors:Qin Wang, Ruiqiang Li, Guangsheng Yu, Vincent Gramoli, Shiping Chen
Abstract:
We study the builder-driven MEV arbitrage on BNB Smart Chain (BSC). BSC's Proposer--Builder Separation (PBS) adopts a leaner design: only whitelisted builders can participate, blocks are produced at shorter intervals, and private order flow bypasses the public mempool. These features have long raised community concerns over centralization, which we empirically confirm by tracing arbitrage activity of the two dominant builders from May to November 2025. Within months, 48Club and Blockrazor produced over 96\% of blocks and captured about 92\% of MEV profits. We find that profits concentrate in short, low-hop arbitrage routes over wrapped tokens and stablecoins, and that block construction rapidly converges toward monopoly. Beyond concentration alone, our analysis reveals a structural source of inequality: BSC's short block interval and whitelisted PBS collapse the contestable window for MEV competition, amplifying latency advantages and excluding slower builders and searchers. MEV extraction on BSC is not only more centralized than on Ethereum, but also structurally more vulnerable to censorship and weakened fairness.
Authors:Shuyu Chang, Haiping Huang, Yanjun Zhang, Yujin Huang, Fu Xiao, Leo Yu Zhang
Abstract:
Code models are increasingly adopted in software development but remain vulnerable to backdoor attacks via poisoned training data. Existing backdoor attacks on code models face a fundamental trade-off between transferability and stealthiness. Static trigger-based attacks insert fixed dead code patterns that transfer well across models and datasets but are easily detected by code-specific defenses. In contrast, dynamic trigger-based attacks adaptively generate context-aware triggers to evade detection but suffer from poor cross-dataset transferability. Moreover, they rely on unrealistic assumptions of identical data distributions between poisoned and victim training data, limiting their practicality. To overcome these limitations, we propose Sharpness-aware Transferable Adversarial Backdoor (STAB), a novel attack that achieves both transferability and stealthiness without requiring complete victim data. STAB is motivated by the observation that adversarial perturbations in flat regions of the loss landscape transfer more effectively across datasets than those in sharp minima. To this end, we train a surrogate model using Sharpness-Aware Minimization to guide model parameters toward flat loss regions, and employ Gumbel-Softmax optimization to enable differentiable search over discrete trigger tokens for generating context-aware adversarial triggers. Experiments across three datasets and two code models show that STAB outperforms prior attacks in terms of transferability and stealthiness. It achieves a 73.2% average attack success rate after defense, outperforming static trigger-based attacks that fail under defense. STAB also surpasses the best dynamic trigger-based attack by 12.4% in cross-dataset attack success rate and maintains performance on clean inputs.
Authors:Yanna Jiang, Haiyu Deng, Qin Wang, Guangsheng Yu, Xu Wang, Yilin Sai, Shiping Chen, Wei Ni, Ren Ping Liu
Abstract:
Trust management systems (TMS) are crucial for managing trust in distributed environments. The rise of decentralized systems and blockchain has sparked interest in credential-based decentralized trust management systems (DTMS). This paper bridges the gap between theory and practice through a systematic review of credential-based DTMS. We analyze existing DTMS solutions through multiple dimensions, including their architectural designs, credential mechanisms, and trust evaluation models. Our survey provides a detailed taxonomy of credential-based DTMS approaches and establishes comprehensive evaluation criteria for assessing DTMS implementations. Through extensive analysis of current systems and implementations, we identify critical challenges and promising research directions in the field. Our examination offers valuable insights for researchers and practitioners working on DTMS, particularly in areas such as access control, reputation systems, and blockchain-based trust frameworks.
Authors:Sean Fuhrman, Onat Gungor, Tajana Rosing
Abstract:
Intrusion Detection Systems (IDS) must maintain reliable detection performance under rapidly evolving benign traffic patterns and the continual emergence of cyberattacks, including zero-day threats with no labeled data available. However, most machine learning-based IDS approaches either assume static data distributions or rely on labeled attack samples, substantially limiting their applicability in real-world deployments. This setting naturally motivates continual novelty detection, which enables IDS models to incrementally adapt to non-stationary data streams without labeled attack data. In this work, we introduce ACORN-IDS, an adaptive continual novelty detection framework that learns exclusively from normal data while exploiting the inherent structure of an evolving unlabeled data stream. ACORN-IDS integrates a continual feature extractor, trained using reconstruction and metric learning objectives with clustering-based pseudo-labels, alongside a PCA-based reconstruction module for anomaly scoring. This design allows ACORN-IDS to continuously adapt to distributional shifts in both benign and malicious traffic. We conduct an extensive evaluation of ACORN-IDS on five realistic intrusion datasets under two continual learning scenarios: (i) Evolving Attacks and (ii) Evolving Normal and Attack Distributions. ACORN-IDS achieves, on average, a 62% improvement in F1-score and a 58% improvement in zero-day attack detection over the state-of-the-art unsupervised continual learning baseline. It also outperforms existing state-of-the-art novelty detection approaches while exhibiting near-zero forgetting and imposing minimal inference overhead. These results demonstrate that ACORN-IDS offers a practical, label-efficient solution for building adaptive and robust IDS in dynamic, real-world environments. We plan to release the code upon acceptance.
Authors:Roberto Leotta, Salvatore Alfio Sambataro, Claudio Vittorio Ragaglia, Mirko Casu, Yuri Petralia, Francesco Guarnera, Luca Guarnera, Sebastiano Battiato
Abstract:
The landscape of synthetic media has been irrevocably altered by text-to-video (T2V) models, whose outputs are rapidly approaching indistinguishability from reality. Critically, this technology is no longer confined to large-scale labs; the proliferation of efficient, open-source generators is democratizing the ability to create high-fidelity synthetic content on consumer-grade hardware. This makes existing face-centric and manipulation-based benchmarks obsolete. To address this urgent threat, we introduce SynthForensics, to the best of our knowledge the first human-centric benchmark for detecting purely synthetic video deepfakes. The benchmark comprises 6,815 unique videos from five architecturally distinct, state-of-the-art open-source T2V models. Its construction was underpinned by a meticulous two-stage, human-in-the-loop validation to ensure high semantic and visual quality. Each video is provided in four versions (raw, lossless, light, and heavy compression) to enable real-world robustness testing. Experiments demonstrate that state-of-the-art detectors are both fragile and exhibit limited generalization when evaluated on this new domain: we observe a mean performance drop of $29.19\%$ AUC, with some methods performing worse than random chance, and top models losing over 30 points under heavy compression. The paper further investigates the efficacy of training on SynthForensics as a means to mitigate these observed performance gaps, achieving robust generalization to unseen generators ($93.81\%$ AUC), though at the cost of reduced backward compatibility with traditional manipulation-based deepfakes. The complete dataset and all generation metadata, including the specific prompts and inference parameters for every video, will be made publicly available at [link anonymized for review].
Authors:Xiaomei Zhang, Zhaoxi Zhang, Leo Yu Zhang, Yanjun Zhang, Guanhong Tao, Shirui Pan
Abstract:
Visual token compression is widely adopted to improve the inference efficiency of Large Vision-Language Models (LVLMs), enabling their deployment in latency-sensitive and resource-constrained scenarios. However, existing work has mainly focused on efficiency and performance, while the security implications of visual token compression remain largely unexplored. In this work, we first reveal that visual token compression substantially degrades the robustness of LVLMs: models that are robust under uncompressed inference become highly vulnerable once compression is enabled. These vulnerabilities are state-specific; failure modes emerge only in the compressed setting and completely disappear when compression is disabled, making them particularly hidden and difficult to diagnose. By analyzing the key stages of the compression process, we identify instability in token importance ranking as the primary cause of this robustness degradation. Small and imperceptible perturbations can significantly alter token rankings, leading the compression mechanism to mistakenly discard task-critical information and ultimately causing model failure. Motivated by this observation, we propose a Compression-Aware Attack to systematically study and exploit this vulnerability. CAA directly targets the token selection mechanism and induces failures exclusively under compressed inference. We further extend this approach to more realistic black-box settings and introduce Transfer CAA, where neither the target model nor the compression configuration is accessible. We further evaluate potential defenses and find that they provide only limited protection. Extensive experiments across models, datasets, and compression methods show that visual token compression significantly undermines robustness, revealing a previously overlooked efficiency-security trade-off.
Authors:Marco Arazzi, Antonino Nocera
Abstract:
Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical settings. Existing defenses for backdoor detection and membership inference typically require access to clean reference models, extensive retraining, or strong assumptions about the attack mechanism. In this work, we introduce a novel LoRA-based oracle framework that leverages low-rank adaptation modules as a lightweight, model-agnostic probe for both backdoor detection and membership inference. Our approach attaches task-specific LoRA adapters to a frozen backbone and analyzes their optimization dynamics and representation shifts when exposed to suspicious samples. We show that poisoned and member samples induce distinctive low-rank updates that differ significantly from those generated by clean or non-member data. These signals can be measured using simple ranking and energy-based statistics, enabling reliable inference without access to the original training data or modification of the deployed model.
Authors:Aiman Al Masoud, Marco Arazzi, Antonino Nocera
Abstract:
Retrieval-Augmented Generation (RAG) has attracted significant attention due to its ability to combine the generative capabilities of Large Language Models (LLMs) with knowledge obtained through efficient retrieval mechanisms over large-scale data collections. Currently, the majority of existing approaches overlook the risks associated with exposing sensitive or access-controlled information directly to the generation model. Only a few approaches propose techniques to instruct the generative model to refrain from disclosing sensitive information; however, recent studies have also demonstrated that LLMs remain vulnerable to prompt injection attacks that can override intended behavioral constraints. For these reasons, we propose a novel approach to Selective Disclosure in Retrieval-Augmented Generation, called SD-RAG, which decouples the enforcement of security and privacy constraints from the generation process itself. Rather than relying on prompt-level safeguards, SD-RAG applies sanitization and disclosure controls during the retrieval phase, prior to augmenting the language model's input. Moreover, we introduce a semantic mechanism to allow the ingestion of human-readable dynamic security and privacy constraints together with an optimized graph-based data model that supports fine-grained, policy-aware retrieval. Our experimental evaluation demonstrates the superiority of SD-RAG over baseline existing approaches, achieving up to a $58\%$ improvement in the privacy score, while also showing a strong resilience to prompt injection attacks targeting the generative model.
Authors:Ali Naseh, Yuefeng Peng, Anshuman Suri, Harsh Chaudhari, Alina Oprea, Amir Houmansadr
Abstract:
Text-to-image (T2I) models are increasingly popular, producing a large share of AI-generated images online. To compare model quality, voting-based leaderboards have become the standard, relying on anonymized model outputs for fairness. In this work, we show that such anonymity can be easily broken. We find that generations from each T2I model form distinctive clusters in the image embedding space, enabling accurate deanonymization without prompt control or training data. Using 22 models and 280 prompts (150K images), our centroid-based method achieves high accuracy and reveals systematic model-specific signatures. We further introduce a prompt-level distinguishability metric and conduct large-scale analyses showing how certain prompts can lead to near-perfect distinguishability. Our findings expose fundamental security flaws in T2I leaderboards and motivate stronger anonymization defenses.
Authors:Da Song, Yuheng Huang, Boqi Chen, Tianshuo Cong, Randy Goebel, Lei Ma, Foutse Khomh
Abstract:
The integration of large language models (LLMs) into autonomous agents has enabled complex tool use, yet in high-stakes domains, these systems must strictly adhere to regulatory standards beyond simple functional correctness. However, existing benchmarks often overlook implicit regulatory compliance, thus failing to evaluate whether LLMs can autonomously enforce mandatory safety constraints. To fill this gap, we introduce LogiSafetyGen, a framework that converts unstructured regulations into Linear Temporal Logic oracles and employs logic-guided fuzzing to synthesize valid, safety-critical traces. Building on this framework, we construct LogiSafetyBench, a benchmark comprising 240 human-verified tasks that require LLMs to generate Python programs that satisfy both functional objectives and latent compliance rules. Evaluations of 13 state-of-the-art (SOTA) LLMs reveal that larger models, despite achieving better functional correctness, frequently prioritize task completion over safety, which results in non-compliant behavior.
Authors:Mohd Ariful Haque, Kishor Datta Gupta, Mohammad Ashiqur Rahman, Roy George
Abstract:
Many real-world software tasks require exact transcription of provided data into code, such as cryptographic constants, protocol test vectors, allowlists, and calibration tables. These tasks are operationally sensitive because small omissions or alterations can remain silent while producing syntactically valid programs. This paper introduces a deliberately minimal transcription-to-code benchmark to isolate this reliability concern in LLM-based code generation. Given a list of high-precision decimal constants, a model must generate Python code that embeds the constants verbatim and performs a simple aggregate computation. We describe the prompting variants, evaluation protocol based on exact-string inclusion, and analysis framework used to characterize state-tracking and long-horizon generation failures. The benchmark is intended as a compact stress test that complements existing code-generation evaluations by focusing on data integrity rather than algorithmic reasoning.
Authors:Yuxiao Li, Alina Fastowski, Efstratios Zaradoukas, Bardh Prenkaj, Gjergji Kasneci
Abstract:
Activation steering has emerged as a powerful tool to shape LLM behavior without the need for weight updates. While its inherent brittleness and unreliability are well-documented, its safety implications remain underexplored. In this work, we present a systematic safety audit of steering vectors obtained with Contrastive Activation Addition (CAA), a widely used steering approach, under a unified evaluation protocol. Using JailbreakBench as benchmark, we show that steering vectors consistently influence the success rate of jailbreak attacks, with stronger amplification under simple template-based attacks. Across LLM families and sizes, steering the model in specific directions can drastically increase (up to 57%) or decrease (up to 50%) its attack success rate (ASR), depending on the targeted behavior. We attribute this phenomenon to the overlap between the steering vectors and the latent directions of refusal behavior. Thus, we offer a traceable explanation for this discovery. Together, our findings reveal the previously unobserved origin of this safety gap in LLMs, highlighting a trade-off between controllability and safety.
Authors:Vojtech Halenka, Mohammadreza Amini, Per-Arne Andersen, Ole-Christoffer Granmo, Burak Kantarci
Abstract:
All applications in fifth-generation (5G) networks rely on stable radio-frequency (RF) environments to support mission-critical services in mobility, automation, and connected intelligence. Their exposure to intentional interference or low-power jamming threatens availability and reliability, especially when such attacks remain below link-layer observability. This paper investigates lightweight, explainable, and hardware-efficient jamming detection using the Convolutional Tsetlin Machine (CTM) operating directly on 5G Synchronization Signal Block (SSB) features. CTM formulates Boolean logic clauses over quantized inputs, enabling bit-level inference and deterministic deployment on FPGA fabrics. These properties make CTM well suited for real-time, resource-constrained edge environments anticipated in 5G. The proposed approach is experimentally validated on a real 5G testbed using over-the-air SSB data, emulating practical downlink conditions. We benchmark CTM against a convolutional neural network (CNN) baseline under identical preprocessing and training pipelines. On the real dataset, CTM achieves comparable detection performance (Accuracy 91.53 +/- 1.01 vs. 96.83 +/- 1.19 for CNN) while training $9.5\times$ faster and requiring 14x less memory (45~MB vs.\ 624~MB). Furthermore, we outline a compact FPGA-oriented design for Zybo~Z7 (Zynq-7000) and provide resource projections (not measured) under three deployment profiles optimized for latency, power, and accuracy trade-offs. The results show that the CTM provides a practical, interpretable, and resource-efficient alternative to conventional DNNs for RF-domain jamming detection, establishing it as a strong candidate for edge-deployed, low-latency, and security-critical 5G applications while laying the groundwork for B5G systems.
Authors:Mohammed Barhoush, Tomoyuki Morimae, Ryo Nishimaki, Takashi Yamakawa
Abstract:
Mahadev [SIAM J. Comput. 2022] introduced the first protocol for classical verification of quantum computation based on the Learning-with-Errors (LWE) assumption, achieving a 4-message interactive scheme. This breakthrough naturally raised the question of whether fewer messages are possible in the plain model. Despite its importance, this question has remained unresolved. In this work, we prove that there is no quantum black-box reduction of non-interactive classical verification of quantum computation of $\textsf{QMA}$ to any falsifiable assumption. Here, "non-interactive" means that after an instance-independent setup, the protocol consists of a single message. This constitutes a strong negative result given that falsifiable assumptions cover almost all standard assumptions used in cryptography, including LWE. Our separation holds under the existence of a $\textsf{QMA} \text{-} \textsf{QCMA}$ gap problem. Essentially, these problems require a slightly stronger assumption than $\textsf{QMA}\neq \textsf{QCMA}$. To support the existence of such problems, we present a construction relative to a quantum unitary oracle.
Authors:Ziwei Wang, Yuanhe Zhang, Jing Chen, Zhenhong Zhou, Ruichao Liang, Ruiying Du, Ju Jia, Cong Wu, Yang Liu
Abstract:
Large Reasoning Models (LRMs) employ reasoning to address complex tasks. Such explicit reasoning requires extended context lengths, resulting in substantially higher resource consumption. Prior work has shown that adversarially crafted inputs can trigger redundant reasoning processes, exposing LRMs to resource-exhaustion vulnerabilities. However, the reasoning process itself, especially its reflective component, has received limited attention, even though it can lead to over-reflection and consume excessive computing power. In this paper, we introduce Recursive Entropy to quantify the risk of resource consumption in reflection, thereby revealing the safety issues inherent in inference itself. Based on Recursive Entropy, we introduce RECUR, a resource exhaustion attack via Recursive Entropy guided Counterfactual Utilization and Reflection. It constructs counterfactual questions to verify the inherent flaws and risks of LRMs. Extensive experiments demonstrate that, under benign inference, recursive entropy exhibits a pronounced decreasing trend. RECUR disrupts this trend, increasing the output length by up to 11x and decreasing throughput by 90%. Our work provides a new perspective on robust reasoning.
Authors:Zhou Xuan, Xiangzhe Xu, Mingwei Zheng, Louis Zheng-Hua Tan, Jinyao Guo, Tiantai Zhang, Le Yu, Chengpeng Wang, Xiangyu Zhang
Abstract:
Understanding TTPs (Tactics, Techniques, and Procedures) in malware binaries is essential for security analysis and threat intelligence, yet remains challenging in practice. Real-world malware binaries are typically stripped of symbols, contain large numbers of functions, and distribute malicious behavior across multiple code regions, making TTP attribution difficult. Recent large language models (LLMs) offer strong code understanding capabilities, but applying them directly to this task faces challenges in identifying analysis entry points, reasoning under partial observability, and misalignment with TTP-specific decision logic. We present TTPDetect, the first LLM agent for recognizing TTPs in stripped malware binaries. TTPDetect combines dense retrieval with LLM-based neural retrieval to narrow the space of analysis entry points. TTPDetect further employs a function-level analyzing agent consisting of a Context Explorer that performs on-demand, incremental context retrieval and a TTP-Specific Reasoning Guideline that achieves inference-time alignment. We build a new dataset that labels decompiled functions with TTPs across diverse malware families and platforms. TTPDetect achieves 93.25% precision and 93.81% recall on function-level TTP recognition, outperforming baselines by 10.38% and 18.78%, respectively. When evaluated on real world malware samples, TTPDetect recognizes TTPs with a precision of 87.37%. For malware with expert-written reports, TTPDetect recovers 85.7% of the documented TTPs and further discovers, on average, 10.5 previously unreported TTPs per malware.
Authors:Ye Yu, Haibo Jin, Yaoning Yu, Jun Zhuang, Haohan Wang
Abstract:
Large audio-language models increasingly operate on raw speech inputs, enabling more seamless integration across domains such as voice assistants, education, and clinical triage. This transition, however, introduces a distinct class of vulnerabilities that remain largely uncharacterized. We examine the security implications of this modality shift by designing a text-to-audio jailbreak that embeds disallowed directives within a narrative-style audio stream. The attack leverages an advanced instruction-following text-to-speech (TTS) model to exploit structural and acoustic properties, thereby circumventing safety mechanisms primarily calibrated for text. When delivered through synthetic speech, the narrative format elicits restricted outputs from state-of-the-art models, including Gemini 2.0 Flash, achieving a 98.26% success rate that substantially exceeds text-only baselines. These results highlight the need for safety frameworks that jointly reason over linguistic and paralinguistic representations, particularly as speech-based interfaces become more prevalent.
Authors:Jiwei Guan, Haibo Jin, Haohan Wang
Abstract:
Recent advancements in Large Vision-Language Models (LVLMs) have shown groundbreaking capabilities across diverse multimodal tasks. However, these models remain vulnerable to adversarial jailbreak attacks, where adversaries craft subtle perturbations to bypass safety mechanisms and trigger harmful outputs. Existing white-box attacks methods require full model accessibility, suffer from computing costs and exhibit insufficient adversarial transferability, making them impractical for real-world, black-box settings. To address these limitations, we propose a black-box jailbreak attack on LVLMs via Zeroth-Order optimization using Simultaneous Perturbation Stochastic Approximation (ZO-SPSA). ZO-SPSA provides three key advantages: (i) gradient-free approximation by input-output interactions without requiring model knowledge, (ii) model-agnostic optimization without the surrogate model and (iii) lower resource requirements with reduced GPU memory consumption. We evaluate ZO-SPSA on three LVLMs, including InstructBLIP, LLaVA and MiniGPT-4, achieving the highest jailbreak success rate of 83.0% on InstructBLIP, while maintaining imperceptible perturbations comparable to white-box methods. Moreover, adversarial examples generated from MiniGPT-4 exhibit strong transferability to other LVLMs, with ASR reaching 64.18%. These findings underscore the real-world feasibility of black-box jailbreaks and expose critical weaknesses in the safety mechanisms of current LVLMs
Authors:Yusheng Zheng, Yiwei Yang, Wei Zhang, Andi Quinn
Abstract:
LLM agent frameworks increasingly offer checkpoint-restore for error recovery and exploration, advising developers to make external tool calls safe to retry. This advice assumes that a retried call will be identical to the original, an assumption that holds for traditional programs but fails for LLM agents, which re-synthesize subtly different requests after restore. Servers treat these re-generated requests as new, enabling duplicate payments, unauthorized reuse of consumed credentials, and other irreversible side effects; we term these semantic rollback attacks. We identify two attack classes, Action Replay and Authority Resurrection, validate them in a proof of concept experiment, and confirm that the problem has been independently acknowledged by framework maintainers. We propose ACRFence, a framework-agnostic mitigation that records irreversible tool effects and enforces replay-or-fork semantics upon restoration
Authors:Shawn Li, Yue Zhao
Abstract:
Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete complex multi-step tasks. Practitioners deploy defense-trained models to protect against prompt injection attacks that manipulate agent behavior through malicious observations or retrieved content. We reveal a fundamental \textbf{capability-alignment paradox}: defense training designed to improve safety systematically destroys agent competence while failing to prevent sophisticated attacks. Evaluating defended models against undefended baselines across 97 agent tasks and 1,000 adversarial prompts, we uncover three systematic biases unique to multi-step agents. \textbf{Agent incompetence bias} manifests as immediate tool execution breakdown, with models refusing or generating invalid actions on benign tasks before observing any external content. \textbf{Cascade amplification bias} causes early failures to propagate through retry loops, pushing defended models to timeout on 99\% of tasks compared to 13\% for baselines. \textbf{Trigger bias} leads to paradoxical security degradation where defended models perform worse than undefended baselines while straightforward attacks bypass defenses at high rates. Root cause analysis reveals these biases stem from shortcut learning: models overfit to surface attack patterns rather than semantic threat understanding, evidenced by extreme variance in defense effectiveness across attack categories. Our findings demonstrate that current defense paradigms optimize for single-turn refusal benchmarks while rendering multi-step agents fundamentally unreliable, necessitating new approaches that preserve tool execution competence under adversarial conditions.
Authors:Amira Guesmi, Muhammad Shafique
Abstract:
Vision-language models (VLMs) have recently shown remarkable capabilities in visual understanding and generation, but remain vulnerable to adversarial manipulations of visual content. Prior object-hiding attacks primarily rely on suppressing or blocking region-specific representations, often creating semantic gaps that inadvertently induce hallucination, where models invent plausible but incorrect objects. In this work, we demonstrate that hallucination arises not from object absence per se, but from semantic discontinuity introduced by such suppression-based attacks. We propose a new class of \emph{background-consistent object concealment} attacks, which hide target objects by re-encoding their visual representations to be statistically and semantically consistent with surrounding background regions. Crucially, our approach preserves token structure and attention flow, avoiding representational voids that trigger hallucination. We present a pixel-level optimization framework that enforces background-consistent re-encoding across multiple transformer layers while preserving global scene semantics. Extensive experiments on state-of-the-art vision-language models show that our method effectively conceals target objects while preserving up to $86\%$ of non-target objects and reducing grounded hallucination by up to $3\times$ compared to attention-suppression-based attacks.
Authors:Qizhi Chen, Chao Qi, Yihong Huang, Muquan Li, Rongzheng Wang, Dongyang Zhang, Ke Qin, Shuang Liang
Abstract:
Graph-based Retrieval-Augmented Generation (GraphRAG) constructs the Knowledge Graph (KG) from external databases to enhance the timeliness and accuracy of Large Language Model (LLM) generations. However, this reliance on external data introduces new attack surfaces. Attackers can inject poisoned texts into databases to manipulate LLMs into producing harmful target responses for attacker-chosen queries. Existing research primarily focuses on attacking conventional RAG systems. However, such methods are ineffective against GraphRAG. This robustness derives from the KG abstraction of GraphRAG, which reorganizes injected text into a graph before retrieval, thereby enabling the LLM to reason based on the restructured context instead of raw poisoned passages. To expose latent security vulnerabilities in GraphRAG, we propose Knowledge Evolution Poison (KEPo), a novel poisoning attack method specifically designed for GraphRAG. For each target query, KEPo first generates a toxic event containing poisoned knowledge based on the target answer. By fabricating event backgrounds and forging knowledge evolution paths from original facts to the toxic event, it then poisons the KG and misleads the LLM into treating the poisoned knowledge as the final result. In multi-target attack scenarios, KEPo further connects multiple attack corpora, enabling their poisoned knowledge to mutually reinforce while expanding the scale of poisoned communities, thereby amplifying attack effectiveness. Experimental results across multiple datasets demonstrate that KEPo achieves state-of-the-art attack success rates for both single-target and multi-target attacks, significantly outperforming previous methods.
Authors:Chuan Guo, Juan Felipe Ceron Uribe, Sicheng Zhu, Christopher A. Choquette-Choo, Steph Lin, Nikhil Kandpal, Milad Nasr, Rai, Sam Toyer, Miles Wang, Yaodong Yu, Alex Beutel, Kai Xiao
Abstract:
Instruction hierarchy (IH) defines how LLMs prioritize system, developer, user, and tool instructions under conflict, providing a concrete, trust-ordered policy for resolving instruction conflicts. IH is key to defending against jailbreaks, system prompt extractions, and agentic prompt injections. However, robust IH behavior is difficult to train: IH failures can be confounded with instruction-following failures, conflicts can be nuanced, and models can learn shortcuts such as overrefusing. We introduce IH-Challenge, a reinforcement learning training dataset, to address these difficulties. Fine-tuning GPT-5-Mini on IH-Challenge with online adversarial example generation improves IH robustness by +10.0% on average across 16 in-distribution, out-of-distribution, and human red-teaming benchmarks (84.1% to 94.1%), reduces unsafe behavior from 6.6% to 0.7% while improving helpfulness on general safety evaluations, and saturates an internal static agentic prompt injection evaluation, with minimal capability regression. We release the IH-Challenge dataset (https://huggingface.co/datasets/openai/ih-challenge) to support future research on robust instruction hierarchy.
Authors:Yegon Kim, Seungyoo Lee, Chaeyun Jang, Hyungi Lee, Juho Lee
Abstract:
Parallel test-time scaling, which generates multiple candidate solutions for a single problem, is a powerful technique for improving large language model performance. However, it is hindered by two key bottlenecks: accurately selecting the correct solution from the candidate pool, and the high inference latency from generating many full solutions. We argue that both challenges are fundamentally linked to verifier calibration. A well-calibrated verifier not only improves answer selection, but also enables early-stopping strategies to reduce latency. However, existing verifiers are limited as they score each candidate in isolation, overlooking rich contextual information across the set of candidates. To address this, we introduce the Multi-Sequence Verifier (MSV), the first verifier designed to jointly process all candidate solutions and model their interactions. MSV achieves improved calibration, which directly enhances best-of-N selection performance. We further introduce a streaming MSV variant that empowers a novel early-stopping framework. Our novel framework fully leverages parallel decoding, which contrasts with the existing multi-sequence early exit works that decode sequences one by one and thus incur significant latency. In this novel setting, MSV can achieve the same target accuracy with around half the latency that would be required with its counterpart that scores each solution in isolation.
Authors:Zibin Lin, Taotao Wang, Shengli Zhang, Long Shi, Shui Yu
Abstract:
Open Web 3.0 platforms increasingly operate as \emph{service ecosystems} (e.g., DeFi, DAOs, and decentralized social applications) where \emph{admission control} and \emph{account provisioning} must be delivered as an always-on service under bursty demand. Service operators face a fundamental tension: enforcing Sybil resistance (one-person-one-account) while preserving user privacy, yet keeping on-chain verification cost and admission latency predictable at scale. Existing credential-based ZK admission approaches typically require per-request on-chain verification, making the provisioning cost grow with the number of concurrent joiners. We present \textbf{ZK-AMS}, a scalable admission and provisioning layer that bridges real-world \emph{Personhood Credentials} to anonymous on-chain service accounts. ZK-AMS combines (i) zero-knowledge credential validation, (ii) a \emph{permissionless} batch submitter model, and (iii) a decentralized, privacy-preserving folding pipeline that uses Nova-style recursive aggregation together with multi-key homomorphic encryption, enabling batch settlement with \emph{constant} on-chain verification per batch. We implement ZK-AMS end-to-end on an Ethereum testbed and evaluate admission throughput, end-to-end latency, and gas consumption. Results show stable verification cost across batch sizes and substantially improved admission efficiency over non-recursive baselines, providing a practical and cost-predictable admission service for large-scale Web 3.0 communities.
Authors:Buddhi Perera, Zeng Wang, Weihua Xiao, Mohammed Nabeel, Ozgur Sinanoglu, Johann Knechtel, Ramesh Karri
Abstract:
The design of post-quantum cryptography (PQC) hardware is a complex and hierarchical process with many challenges. A primary bottleneck is the conversion of PQC reference codes from C to high-level synthesis (HLS) specifications, which requires extensive manual refactoring [1]-[3]. Another bottleneck is the scalability of synthesis for complex PQC primitives, including number theoretic transform (NTT) accelerators and wide memory interfaces. While large language models (LLMs) have shown remarkable results for coding in general-purpose languages like Python, coding for hardware design is more challenging; feedback-driven and agentic integration are key principles of successful state-of-the-art approaches. Here, we propose LLM4PQC, an LLM-based agentic framework that refactors high-level PQC specifications and reference C codes into HLS-ready and synthesizable C code. Our framework generates and verifies the resulting RTL code. For correctness, we leverage a hierarchy of checks, covering fast C compilation and simulation as well as RTL simulation. Case studies on NIST PQC reference designs demonstrate a reduction in manual effort and accelerated design-space exploration compared to traditional flows. Overall, LLM4PQC provides a powerful and efficient pathway for synthesizing complex hardware accelerators.
Authors:Kerem Ersoz, Saleh Afroogh, David Atkinson, Junfeng Jiao
Abstract:
We evaluate how effectively platform-level parental controls moderate a mainstream conversational assistant used by minors. Our two-phase protocol first builds a category-balanced conversation corpus via PAIR-style iterative prompt refinement over API, then has trained human agents replay/refine those prompts in the consumer UI using a designated child account while monitoring the linked parent inbox for alerts. We focus on seven risk areas -- physical harm, pornography, privacy violence, health consultation, fraud, hate speech, and malware and quantify four outcomes: Notification Rate (NR), Leak-Through (LR), Overblocking (OBR), and UI Intervention Rate (UIR). Using an automated judge (with targeted human audit) and comparing the current backend to legacy variants (GPT-4.1/4o), we find that notifications are selective rather than comprehensive: privacy violence, fraud, hate speech, and malware triggered no parental alerts in our runs, whereas physical harm (highest), pornography, and some health queries produced intermittent alerts. The current backend shows lower leak-through than legacy models, yet overblocking of benign, educational queries near sensitive topics remains common and is not surfaced to parents, revealing a policy-product gap between on-screen safeguards and parent-facing telemetry. We propose actionable fixes: broaden/configure the notification taxonomy, couple visible safeguards to privacy-preserving parent summaries, and prefer calibrated, age-appropriate safe rewrites over blanket refusals.
Authors:Shawn Li, Chenxiao Yu, Zhiyu Ni, Hao Li, Charith Peris, Chaowei Xiao, Yue Zhao
Abstract:
Large language models (LLMs) are increasingly deployed in security-sensitive applications, where they must follow system- or developer-specified instructions that define the intended task behavior, while completing benign user requests. When adversarial instructions appear in user queries or externally retrieved content, models may override intended logic. Recent defenses rely on supervised fine-tuning with benign and malicious labels. Although these methods achieve high attack rejection rates, we find that they rely on narrow correlations in defense data rather than harmful intent, leading to systematic rejection of safe inputs. We analyze three recurring shortcut behaviors induced by defense fine-tuning. \emph{Position bias} arises when benign content placed later in a prompt is rejected at much higher rates; across reasoning benchmarks, suffix-task rejection rises from below \textbf{10\%} to as high as \textbf{90\%}. \emph{Token trigger bias} occurs when strings common in attack data raise rejection probability even in benign contexts; inserting a single trigger token increases false refusals by up to \textbf{50\%}. \emph{Topic generalization bias} reflects poor generalization beyond the defense data distribution, with defended models suffering test-time accuracy drops of up to \textbf{40\%}. These findings suggest that current prompt-injection defenses frequently respond to attack-like surface patterns rather than the underlying intent. We introduce controlled diagnostic datasets and a systematic evaluation across two base models and multiple defense pipelines, highlighting limitations of supervised fine-tuning for reliable LLM security.
Authors:Xinyu Hou, Yang Lu, Rabimba Karanjai, Lei Xu, Weidong Shi
Abstract:
Ransomware is still one of the most serious cybersecurity threats. Victims often pay but fail to regain access to their data, while also facing the danger of losing data privacy. These uncertainties heavily shape the attacker-victim dynamics in decision-making. In this paper, we introduce and analyze zkRansomware. This new ransomware model integrates zero-knowledge proofs to enable verifiable data recovery and uses smart contracts to enforce multi-round payments while mitigating the risk of data disclosure and privacy loss. We show that zkRansomware is technically feasible using existing cryptographic and blockchain tools and, perhaps counterintuitively, can align incentives between the attacker and the victim. Finally, we develop a theoretical decision-making framework for zkRansomware that distinguishes it from known ransomware decision models and discusses its implications for ransomware risk analysis and response decision support.
Authors:Tiankai Yang, Jiate Li, Yi Nian, Shen Dong, Ruiyao Xu, Ryan Rossi, Kaize Ding, Yue Zhao
Abstract:
LLM-based agents increasingly operate across repeated sessions, maintaining task states to ensure continuity. In many deployments, a single agent serves multiple users within a team or organization, reusing a shared knowledge layer across user identities. This shared persistence expands the failure surface: information that is locally valid for one user can silently degrade another user's outcome when the agent reapplies it without regard for scope. We refer to this failure mode as unintentional cross-user contamination (UCC). Unlike adversarial memory poisoning, UCC requires no attacker; it arises from benign interactions whose scope-bound artifacts persist and are later misapplied. We formalize UCC through a controlled evaluation protocol, introduce a taxonomy of three contamination types, and evaluate the problem in two shared-state mechanisms. Under raw shared state, benign interactions alone produce contamination rates of 57--71%. A write-time sanitization is effective when shared state is conversational, but leaves substantial residual risk when shared state includes executable artifacts, with contamination often manifesting as silent wrong answers. These results indicate that shared-state agents need artifact-level defenses beyond text-level sanitization to prevent silent cross-user failures.
Authors:Haiyue Zhang, Yi Nian, Yue Zhao
Abstract:
What should a developer inspect before deploying an LLM agent: the model, the tool code, the deployment configuration, or all three? In practice, many security failures in agent systems arise not from model weights alone, but from the surrounding software stack: tool functions that pass untrusted inputs to dangerous operations, exposed credentials in deployment artifacts, and over-privileged Model Context Protocol (MCP) configurations. We present Agent Audit, a security analysis system for LLM agent applications. Agent Audit analyzes Python agent code and deployment artifacts through an agent-aware pipeline that combines dataflow analysis, credential detection, structured configuration parsing, and privilege-risk checks. The system reports findings in terminal, JSON, and SARIF formats, enabling direct integration with local development workflows and CI/CD pipelines. On a benchmark of 22 samples with 42 annotated vulnerabilities, Agent Audit detects 40 vulnerabilities with 6 false positives, substantially improving recall over common SAST baselines while maintaining sub-second scan times. Agent Audit is open source and installable via pip, making security auditing accessible for agent systems. In the live demonstration, attendees scan vulnerable agent repositories and observe how Agent Audit identifies security risks in tool functions, prompts, and more. Findings are linked to source locations and configuration paths, and can be exported into VS Code and GitHub Code Scanning for interactive inspection.
Authors:Quanchen Zou, Moyang Chen, Zonghao Ying, Wenzhuo Xu, Yisong Xiao, Deyue Zhang, Dongdong Yang, Zhao Liu, Xiangzheng Zhang
Abstract:
Large Vision-Language Models (LVLMs) undergo safety alignment to suppress harmful content. However, current defenses predominantly target explicit malicious patterns in the input representation, often overlooking the vulnerabilities inherent in compositional reasoning. In this paper, we identify a systemic flaw where LVLMs can be induced to synthesize harmful logic from benign premises. We formalize this attack paradigm as \textit{Reasoning-Oriented Programming}, drawing a structural analogy to Return-Oriented Programming in systems security. Just as ROP circumvents memory protections by chaining benign instruction sequences, our approach exploits the model's instruction-following capability to orchestrate a semantic collision of orthogonal benign inputs. We instantiate this paradigm via \tool{}, an automated framework that optimizes for \textit{semantic orthogonality} and \textit{spatial isolation}. By generating visual gadgets that are semantically decoupled from the harmful intent and arranging them to prevent premature feature fusion, \tool{} forces the malicious logic to emerge only during the late-stage reasoning process. This effectively bypasses perception-level alignment. We evaluate \tool{} on SafeBench and MM-SafetyBench across 7 state-of-the-art 0.LVLMs, including GPT-4o and Claude 3.7 Sonnet. Our results demonstrate that \tool{} consistently circumvents safety alignment, outperforming the strongest existing baseline by an average of 4.67\% on open-source models and 9.50\% on commercial models.
Authors:Yang Yang, Xinze Zou, Zehua Ma, Han Fang, Weiming Zhang
Abstract:
The rise of text-to-video generation models has raised growing concerns over content authenticity, copyright protection, and malicious misuse. Watermarking serves as an effective mechanism for regulating such AI-generated content, where high fidelity and strong robustness are particularly critical. Recent generative image watermarking methods provide a promising foundation by leveraging watermark information and pseudo-random keys to control the initial sampling noise, enabling lossless embedding. However, directly extending these techniques to videos introduces two key limitations: Existing designs implicitly rely on strict alignment between video frames and frame-dependent pseudo-random binary sequences used for watermark encryption. Once this alignment is disrupted, subsequent watermark extraction becomes unreliable; and Video-specific distortions, such as inter-frame compression, significantly degrade watermark reliability. To address these issues, we propose SKeDA, a generative watermarking framework tailored for text-to-video diffusion models. SKeDA consists of two components: (1) Shuffle-Key-based Distribution-preserving Sampling (SKe) employs a single base pseudo-random binary sequence for watermark encryption and derives frame-level encryption sequences through permutation. This design transforms watermark extraction from synchronization-sensitive sequence decoding into permutation-tolerant set-level aggregation, substantially improving robustness against frame reordering and loss; and (2) Differential Attention (DA), which computes inter-frame differences and dynamically adjusts attention weights during extraction, enhancing robustness against temporal distortions. Extensive experiments demonstrate that SKeDA preserves high video generation quality and watermark robustness.
Authors:Sai Puppala, Ismail Hossain, Md Jahangir Alam, Yoonpyo Lee, Jay Yoo, Tanzim Ahad, Syed Bahauddin Alam, Sajedul Talukder
Abstract:
Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce **AgentFence**, an architecture-centric security evaluation that defines 14 trust-boundary attack classes spanning planning, memory, retrieval, tool use, and delegation, and detects failures via *trace-auditable conversation breaks* (unauthorized or unsafe tool use, wrong-principal actions, state/objective integrity violations, and attack-linked deviations). Holding the base model fixed, we evaluate eight agent archetypes under persistent multi-turn interaction and observe substantial architectural variation in mean security break rate (MSBR), ranging from $0.29 \pm 0.04$ (LangGraph) to $0.51 \pm 0.07$ (AutoGPT). The highest-risk classes are operational: Denial-of-Wallet ($0.62 \pm 0.08$), Authorization Confusion ($0.54 \pm 0.10$), Retrieval Poisoning ($0.47 \pm 0.09$), and Planning Manipulation ($0.44 \pm 0.11$), while prompt-centric classes remain below $0.20$ under standard settings. Breaks are dominated by boundary violations (SIV 31%, WPA 27%, UTI+UTA 24%, ATD 18%), and authorization confusion correlates with objective and tool hijacking ($ρ\approx 0.63$ and $ρ\approx 0.58$). AgentFence reframes agent security around what matters operationally: whether an agent stays within its goal and authority envelope over time.
Authors:Xinyi Wu, Geng Hong, Yueyue Chen, MingXuan Liu, Feier Jin, Xudong Pan, Jiarun Dai, Baojun Liu
Abstract:
Web agents, powered by large language models (LLMs), are increasingly deployed to automate complex web interactions. The rise of open-source frameworks (e.g., Browser Use, Skyvern-AI) has accelerated adoption, but also broadened the attack surface. While prior research has focused on model threats such as prompt injection and backdoors, the risks of social engineering remain largely unexplored. We present the first systematic study of social engineering attacks against web automation agents and design a pluggable runtime mitigation solution. On the attack side, we introduce the AgentBait paradigm, which exploits intrinsic weaknesses in agent execution: inducement contexts can distort the agent's reasoning and steer it toward malicious objectives misaligned with the intended task. On the defense side, we propose SUPERVISOR, a lightweight runtime module that enforces environment and intention consistency alignment between webpage context and intended goals to mitigate unsafe operations before execution. Empirical results show that mainstream frameworks are highly vulnerable to AgentBait, with an average attack success rate of 67.5% and peaks above 80% under specific strategies (e.g., trusted identity forgery). Compared with existing lightweight defenses, our module can be seamlessly integrated across different web automation frameworks and reduces attack success rates by up to 78.1% on average while incurring only a 7.7% runtime overhead and preserving usability. This work reveals AgentBait as a critical new threat surface for web agents and establishes a practical, generalizable defense, advancing the security of this rapidly emerging ecosystem. We reported the details of this attack to the framework developers and received acknowledgment before submission.
Authors:Pratyush Desai, Luoxi Tang, Yuqiao Meng, Zhaohan Xi
Abstract:
Large Language Models (LLMs) are transforming enterprise workflows but introduce security and ethics challenges when employees inadvertently share confidential data or generate policy-violating content. This paper proposes SafeGPT, a two-sided guardrail system preventing sensitive data leakage and unethical outputs. SafeGPT integrates input-side detection/redaction, output-side moderation/reframing, and human-in-the-loop feedback. Experiments demonstrate SafeGPT effectively reduces data leakage risk and biased outputs while maintaining satisfaction.
Authors:Sriharshini Kalvakuntla, Luoxi Tang, Yuqiao Meng, Zhaohan Xi
Abstract:
Most users agree to online privacy policies without reading or understanding them, even though these documents govern how personal data is collected, shared, and monetized. Privacy policies are typically long, legally complex, and difficult for non-experts to interpret. This paper presents the Smart Privacy Policy Assistant, an LLM-powered system that automatically ingests privacy policies, extracts and categorizes key clauses, assigns human-interpretable risk levels, and generates clear, concise explanations. The system is designed for real-time use through browser extensions or mobile interfaces, surfacing contextual warnings before users disclose sensitive information or grant risky permissions. We describe the end-to-end pipeline, including policy ingestion, clause categorization, risk scoring, and explanation generation, and propose an evaluation framework based on clause-level accuracy, policy-level risk agreement, and user comprehension.
Authors:Davis Brown, Juan-Pablo Rivera, Dan Hendrycks, Mantas Mazeika
Abstract:
As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration attacks. In this work, we consider exfiltration attacks where an adversary attempts to sneak model weights out of a datacenter over a network. While exfiltration attacks are multi-step cyber attacks, we demonstrate that a single factor, the compressibility of model weights, significantly heightens exfiltration risk for large language models (LLMs). We tailor compression specifically for exfiltration by relaxing decompression constraints and demonstrate that attackers could achieve 16x to 100x compression with minimal trade-offs, reducing the time it would take for an attacker to illicitly transmit model weights from the defender's server from months to days. Finally, we study defenses designed to reduce exfiltration risk in three distinct ways: making models harder to compress, making them harder to 'find,' and tracking provenance for post-attack analysis using forensic watermarks. While all defenses are promising, the forensic watermark defense is both effective and cheap, and therefore is a particularly attractive lever for mitigating weight-exfiltration risk.
Authors:Muxing Li, Zesheng Ye, Sharon Li, Feng Liu
Abstract:
Data rights owners can detect unauthorized data use in large language model (LLM) training by querying with proprietary samples. Often, superior performance (e.g., higher confidence or lower loss) on a sample relative to the untrained data implies it was part of the training corpus, as LLMs tend to perform better on data they have seen during training. However, this detection becomes fragile under data laundering, a practice of transforming the stylistic form of proprietary data, while preserving critical information to obfuscate data provenance. When an LLM is trained exclusively on such laundered variants, it no longer performs better on originals, erasing the signals that standard detections rely on. We counter this by inferring the unknown laundering transformation from black-box access to the target LLM and, via an auxiliary LLM, synthesizing queries that mimic the laundered data, even if rights owners have only the originals. As the search space of finding true laundering transformations is infinite, we abstract such a process into a high-level transformation goal (e.g., "lyrical rewriting") and concrete details (e.g., "with vivid imagery"), and introduce synthesis data reversion (SDR) that instantiates this abstraction. SDR first identifies the most probable goal for synthesis to narrow the search; it then iteratively refines details so that synthesized queries gradually elicit stronger detection signals from the target LLM. Evaluated on the MIMIR benchmark against diverse laundering practices and target LLM families (Pythia, Llama2, and Falcon), SDR consistently strengthens data misuse detection, providing a practical countermeasure to data laundering.
Authors:Jiaxin Chen, Ziwei Li, Zigui Jiang, Ruihong He, Yantong Zhou, Jiajing Wu, Zibin Zheng
Abstract:
Solana has experienced rapid growth due to its high performance and low transaction costs, but the extremely low barrier to token issuance has also led to widespread Rug Pulls. Unlike Ethereum-based Rug Pulls that rely on malicious smart contracts, the unified SPL Token program on Solana shifts fraudulent behaviors toward on-chain operations such as market manipulation. However, existing research has not yet conducted a systematic analysis of these specific Rug Pull patterns on Solana. In this paper, we present a comprehensive empirical study of Rug Pulls on Solana. Based on 68 real-world incident reports, we construct and release a manually labeled dataset containing 117 confirmed Rug Pull tokens and characterize the workflow of Rug Pulls on Solana. Building on this analysis, we propose SolRugDetector, a detection system that identifies fraudulent tokens solely using on-chain transaction and state data. Experimental results show that SolRugDetector outperforms existing tools on the labeled dataset. We further conduct a large-scale measurement on 100,063 tokens newly issued in the first half of 2025 and identify 76,469 Rug Pull tokens. After validating the in-the-wild detection results, we release this dataset and analyze the Rug Pull ecosystem on Solana. Our analysis reveals that Rug Pulls on Solana exhibit extremely short lifecycles, strong price-driven dynamics, severe economic losses, and highly organized group behaviors. These findings provide insights into the Solana Rug Pull landscape and support the development of effective on-chain defense mechanisms.
Authors:Yunbei Zhang, Yingqiang Ge, Weijie Xu, Yuhui Xu, Jihun Hamm, Chandan K. Reddy
Abstract:
Current multimodal red teaming treats images as wrappers for malicious payloads via typography or adversarial noise. These attacks are structurally brittle, as standard defenses neutralize them once the payload is exposed. We introduce Visual Exclusivity (VE), a more resilient Image-as-Basis threat where harm emerges only through reasoning over visual content such as technical schematics. To systematically exploit VE, we propose Multimodal Multi-turn Agentic Planning (MM-Plan), a framework that reframes jailbreaking from turn-by-turn reaction to global plan synthesis. MM-Plan trains an attacker planner to synthesize comprehensive, multi-turn strategies, optimized via Group Relative Policy Optimization (GRPO), enabling self-discovery of effective strategies without human supervision. To rigorously benchmark this reasoning-dependent threat, we introduce VE-Safety, a human-curated dataset filling a critical gap in evaluating high-risk technical visual understanding. MM-Plan achieves 46.3% attack success rate against Claude 4.5 Sonnet and 13.8% against GPT-5, outperforming baselines by 2--5x where existing methods largely fail. These findings reveal that frontier models remain vulnerable to agentic multimodal attacks, exposing a critical gap in current safety alignment. Warning: This paper contains potentially harmful content.
Authors:Xiaochen Li, Fengyu Gao, Xizixiang Wei, Tianhao Wang, Cong Shen, Jing Yang
Abstract:
Traditional Differential Privacy (DP) mechanisms are typically tailored to specific analysis tasks, which limits the reusability of protected data. DP tabular data synthesis overcomes this by generating synthetic datasets that can be shared for arbitrary downstream tasks. However, existing synthesis methods predominantly assume centralized or local settings and overlook the more practical horizontal federated scenario. Naively synthesizing data locally or perturbing individual records either produces biased mixtures or introduces excessive noise, especially under heterogeneous data distributions across participants. We propose HeteroFedSyn, the first DP tabular data synthesis framework designed specifically for the horizontal federated setting. Built upon the PrivSyn paradigm of 2-way marginal-based synthesis, HeteroFedSyn introduces three key innovations for distributed marginal selection: (i) an L2-based dependency metric with random projection for noise-efficient correlation measurement, (ii) an unbiased estimator to correct multiplicative noise, and (iii) an adaptive selection strategy that dynamically updates dependency scores to avoid redundancy. Extensive experiments on range queries, Wasserstein fidelity, and machine learning tasks show that, despite the increased noise inherent to federated execution, HeteroFedSyn achieves utility comparable to centralized synthesis. Our code is open-sourced via the link.
Authors:Zhihan Cao, Gaolei Li, Jun Wu, Jianhua Li, Hang Zhang, Mingzhe Chen
Abstract:
While provably secure steganography provides strong concealment by ensuring stego carriers are indistinguishable from natural samples, such systems remain vulnerable to real-world edit errors (e.g., insertions, deletions, substitutions) because their decoding depends on perfect synchronization and lacks error-correcting capability. To bridge this gap, we propose Alkaid, a provably secure steganographic scheme resilient to edit errors via distance-constrained encoding. The key innovation integrates the minimum distance decoding principle directly into the encoding process by enforcing a strict lower bound on the edit distance between codewords of different messages. Specifically, if two candidate codewords violate this bound, they are merged to represent the same message, thereby guaranteeing reliable recovery. While maintaining provable security, we theoretically prove that Alkaid offers deterministic robustness against bounded errors. To implement this scheme efficiently, we adopt block-wise and batch processing. Extensive experiments demonstrate that Alkaid achieves decoding success rates of 99\% to 100\% across diverse error channels, delivers a payload of 0.2 bits per token for high embedding capacity, and maintains an encoding speed of 6.72 bits per second, significantly surpassing state-of-the-art (SOTA) methods in robustness, capacity, and efficiency.
Authors:Chun Yan Ryan Kan, Tommy Tran, Vedant Yadav, Ava Cai, Kevin Zhu, Ruizhe Li, Maheep Chaudhary
Abstract:
Defending LLMs against adversarial jailbreak attacks remains an open challenge. Existing defenses rely on binary classifiers that fail when adversarial input falls outside the learned decision boundary, and repeated fine-tuning is computationally expensive while potentially degrading model capabilities. We propose MANATEE, an inference-time defense that uses density estimation over a benign representation manifold. MANATEE learns the score function of benign hidden states and uses diffusion to project anomalous representations toward safe regions--requiring no harmful training data and no architectural modifications. Experiments across Mistral-7B-Instruct, Llama-3.1-8B-Instruct, and Gemma-2-9B-it demonstrate that MANATEE reduce Attack Success Rate by up to 100\% on certain datasets, while preserving model utility on benign inputs.
Authors:David Puertolas Merenciano, Ekaterina Vasyagina, Raghav Dixit, Kevin Zhu, Ruizhe Li, Javier Ferrando, Maheep Chaudhary
Abstract:
LoRA adapters let users fine-tune large language models (LLMs) efficiently. However, LoRA adapters are shared through open repositories like Hugging Face Hub \citep{huggingface_hub_docs}, making them vulnerable to backdoor attacks. Current detection methods require running the model with test input data -- making them impractical for screening thousands of adapters where the trigger for backdoor behavior is unknown. We detect poisoned adapters by analyzing their weight matrices directly, without running the model -- making our method data-agnostic. Our method extracts simple statistics -- how concentrated the singular values are, their entropy, and the distribution shape -- and flags adapters that deviate from normal patterns. We evaluate the method on 500 LoRA adapters -- 400 clean, and 100 poisoned for Llama-3.2-3B on instruction and reasoning datasets: Alpaca, Dolly, GSM8K, ARC-Challenge, SQuADv2, NaturalQuestions, HumanEval, and GLUE dataset. We achieve 97\% detection accuracy with less than 2\% false positives.
Authors:Fengpeng Li, Kemou Li, Qizhou Wang, Bo Han, Jiantao Zhou
Abstract:
Concept erasure helps stop diffusion models (DMs) from generating harmful content; but current methods face robustness retention trade off. Robustness means the model fine-tuned by concept erasure methods resists reactivation of erased concepts, even under semantically related prompts. Retention means unrelated concepts are preserved so the model's overall utility stays intact. Both are critical for concept erasure in practice, yet addressing them simultaneously is challenging, as existing works typically improve one factor while sacrificing the other. Prior work typically strengthens one while degrading the other, e.g., mapping a single erased prompt to a fixed safe target leaves class level remnants exploitable by prompt attacks, whereas retention-oriented schemes underperform against adaptive adversaries. This paper introduces Adversarial Erasure with Gradient Informed Synergy (AEGIS), a retention-data-free framework that advances both robustness and retention.
Authors:Mengqian Zhang, Sen Yang, Kartik Nayak, Fan Zhang
Abstract:
Block space on the blockchain is scarce and must be allocated efficiently through block building. However, Ethereum's current block-building ecosystem, MEV-Boost, has become highly centralized due to integration, which distorts competition, reduces blockspace efficiency, and obscures MEV flow transparency. To guarantee equitability and economic efficiency in block building, we propose $\mathrm{Boost+}$, a system that decouples the process into collecting and ordering transactions, and ensures equal access to all collected transactions. The core of $\mathrm{Boost+}$ is the mechanism $\mathit{M}_{\mathrm{Boost+}}$, built around a default algorithm. $\mathit{M}_{\mathrm{Boost+}}$ aligns incentives for both searchers (intermediaries that generate or route transactions) and builders: Truthful bidding is a dominant strategy for all builders. For searchers, truthful reporting is dominant whenever the default algorithm dominates competing builders, and it remains dominant for all conflict-free transactions, even when builders may win. We further show that even if a searcher can technically integrate with a builder, non-integration combined with truthful bidding still dominates any deviation for conflict-free transactions. We also implement a concrete default algorithm informed by empirical analysis of real-world transactions and evaluate its efficacy using historical transaction data.
Authors:Huming Qiu, Mi Zhang, Junjie Sun, Peiyi Chen, Xiaohan Zhang, Min Yang
Abstract:
To ensure the responsible distribution and use of open-source deep neural networks (DNNs), DNN watermarking has become a crucial technique to trace and verify unauthorized model replication or misuse. In practice, black-box watermarks manifest as specific predictive behaviors for specially crafted samples. However, due to the generalization nature of DNNs, the keys to extracting the watermark message are not unique, which would provide attackers with more opportunities. Advanced attack techniques can reverse-engineer approximate replacements for the original watermark keys, enabling subsequent watermark removal. In this paper, we explore black-box DNN watermarking specificity, which refers to the accuracy of a watermark's response to a key. Using this concept, we introduce Specificity-Enhanced Watermarking (SEW), a new method that improves specificity by reducing the association between the watermark and approximate keys. Through extensive evaluation using three popular watermarking benchmarks, we validate that enhancing specificity significantly contributes to strengthening robustness against removal attacks. SEW effectively defends against six state-of-the-art removal attacks, while maintaining model usability and watermark verification performance.
Authors:Melissa Tessa, Iyiola E. Olatunji, Aicha War, Jacques Klein, Tegawendé F. Bissyandé
Abstract:
Recent secure code generation methods, using vulnerability-aware fine-tuning, prefix-tuning, and prompt optimization, claim to prevent LLMs from producing insecure code. However, their robustness under adversarial conditions remains untested, and current evaluations decouple security from functionality, potentially inflating reported gains. We present the first systematic adversarial audit of state-of-the-art secure code generation methods (SVEN, SafeCoder, PromSec). We subject them to realistic prompt perturbations such as paraphrasing, cue inversion, and context manipulation that developers might inadvertently introduce or adversaries deliberately exploit. To enable fair comparison, we evaluate all methods under consistent conditions, jointly assessing security and functionality using multiple analyzers and executable tests. Our findings reveal critical robustness gaps: static analyzers overestimate security by 7 to 21 times, with 37 to 60% of ``secure'' outputs being non-functional. Under adversarial conditions, true secure-and-functional rates collapse to 3 to 17%. Based on these findings, we propose best practices for building and evaluating robust secure code generation methods. Our code is available.
Authors:Ahmad Mohammad Saber, Saeed Jafari, Zhengmao Ouyang, Paul Budnarain, Amr Youssef, Deepa Kundur
Abstract:
This paper presents a large language model (LLM)-based framework that adapts and fine-tunes compact LLMs for detecting cyberattacks on transformer current differential relays (TCDRs), which can otherwise cause false tripping of critical power transformers. The core idea is to textualize multivariate time-series current measurements from TCDRs, across phases and input/output sides, into structured natural-language prompts that are then processed by compact, locally deployable LLMs. Using this representation, we fine-tune DistilBERT, GPT-2, and DistilBERT+LoRA to distinguish cyberattacks from genuine fault-induced disturbances while preserving relay dependability. The proposed framework is evaluated against a broad set of state-of-the-art machine learning and deep learning baselines under nominal conditions, complex cyberattack scenarios, and measurement noise. Our results show that LLM-based detectors achieve competitive or superior cyberattack detection performance, with DistilBERT detecting up to 97.62% of attacks while maintaining perfect fault detection accuracy. Additional evaluations demonstrate robustness to prompt formulation variations, resilience under combined time-synchronization and false-data injection attacks, and stable performance under realistic measurement noise levels. The attention mechanisms of LLMs further enable intrinsic interpretability by highlighting the most influential time-phase regions of relay measurements. These results demonstrate that compact LLMs provide a practical, interpretable, and robust solution for enhancing cyberattack detection in modern digital substations. We provide the full dataset used in this study for reproducibility.
Authors:Andreea-Elena Bodea, Stephen Meisenbacher, Alexandra Klymenko, Florian Matthes
Abstract:
The continued promise of Large Language Models (LLMs), particularly in their natural language understanding and generation capabilities, has driven a rapidly increasing interest in identifying and developing LLM use cases. In an effort to complement the ingrained "knowledge" of LLMs, Retrieval-Augmented Generation (RAG) techniques have become widely popular. At its core, RAG involves the coupling of LLMs with domain-specific knowledge bases, whereby the generation of a response to a user question is augmented with contextual and up-to-date information. The proliferation of RAG has sparked concerns about data privacy, particularly with the inherent risks that arise when leveraging databases with potentially sensitive information. Numerous recent works have explored various aspects of privacy risks in RAG systems, from adversarial attacks to proposed mitigations. With the goal of surveying and unifying these works, we ask one simple question: What are the privacy risks in RAG, and how can they be measured and mitigated? To answer this question, we conduct a systematic literature review of RAG works addressing privacy, and we systematize our findings into a comprehensive set of privacy risks, mitigation techniques, and evaluation strategies. We supplement these findings with two primary artifacts: a Taxonomy of RAG Privacy Risks and a RAG Privacy Process Diagram. Our work contributes to the study of privacy in RAG not only by conducting the first systematization of risks and mitigations, but also by uncovering important considerations when mitigating privacy risks in RAG systems and assessing the current maturity of proposed mitigations.
Authors:Yihao Zhang, Zeming Wei, Xiaokun Luan, Chengcan Wu, Zhixin Zhang, Jiangrong Wu, Haolin Wu, Huanran Chen, Jun Sun, Meng Sun
Abstract:
Autonomous LLM-based agents increasingly operate as long-running processes forming densely interconnected multi-agent ecosystems, whose security properties remain largely unexplored. In particular, OpenClaw, an open-source platform with over 40,000 active instances, has stood out recently with its persistent configurations, tool-execution privileges, and cross-platform messaging capabilities. In this work, we present ClawWorm, the first self-replicating worm attack against a production-scale agent framework, achieving a fully autonomous infection cycle initiated by a single message: the worm first hijacks the victim's core configuration to establish persistent presence across session restarts, then executes an arbitrary payload upon each reboot, and finally propagates itself to every newly encountered peer without further attacker intervention. We evaluate the attack on a controlled testbed across four distinct LLM backends, three infection vectors, and three payload types (1,800 total trials). We demonstrate a 64.5\% aggregate attack success rate, sustained multi-hop propagation, and reveal stark divergences in model security postures -- highlighting that while execution-level filtering effectively mitigates dormant payloads, skill supply chains remain universally vulnerable. We analyse the architectural root causes underlying these vulnerabilities and propose defence strategies targeting each identified trust boundary. Code and samples will be released upon completion of responsible disclosure.
Authors:Wenxuan Zeng, Chao Yang, Tianshi Xu, Bo Zhang, Changrui Ren, Jin Dong, Meng Li
Abstract:
Private convolutional neural network (CNN) inference based on secure two-party computation (2PC) suffers from high communication and latency overhead, especially from convolution layers. In this paper, we propose UFO, a quantized 2PC inference framework that jointly optimizes the 2PC protocols and quantization algorithm. UFO features a novel 2PC protocol that systematically combines the efficient Winograd convolution algorithm with quantization to improve inference efficiency. However, we observe that naively combining quantization and Winograd convolution faces the following challenges: 1) From the inference perspective, Winograd transformations introduce extensive additions and require frequent bit width conversions to avoid inference overflow, leading to non-negligible communication overhead; 2) From the training perspective, Winograd transformations introduce weight outliers that make quantization-aware training (QAT) difficult, resulting in inferior model accuracy. To address these challenges, we co-optimize both protocol and algorithm. 1) At the protocol level, we propose a series of graph-level optimizations for 2PC inference to minimize the communication. 2) At the algorithm level, we develop a mixed-precision QAT algorithm based on layer sensitivity to optimize model accuracy given communication constraints. To accommodate the outliers, we further introduce a 2PC-friendly bit re-weighting algorithm to increase the representation range without explicitly increasing bit widths. With extensive experiments, UFO demonstrates 11.7x, 3.6x, and 6.3x communication reduction with 1.29%, 1.16%, and 1.29% higher accuracy compared to state-of-the-art frameworks SiRNN, COINN, and CoPriv, respectively.
Authors:Yuhao Wang, Shengfang Zhai, Guanghao Jin, Yinpeng Dong, Linyi Yang, Jiaheng Zhang
Abstract:
Large Language Model (LLM)-based agents employ external and internal memory systems to handle complex, goal-oriented tasks, yet this exposes them to severe extraction attacks, and effective defenses remain lacking. In this paper, we propose MemPot, the first theoretically verified defense framework against memory extraction attacks by injecting optimized honeypots into the memory. Through a two-stage optimization process, MemPot generates trap documents that maximize the retrieval probability for attackers while remaining inconspicuous to benign users. We model the detection process as Wald's Sequential Probability Ratio Test (SPRT) and theoretically prove that MemPot achieves a lower average number of sampling rounds compared to optimal static detectors. Empirically, MemPot significantly outperforms state-of-the-art baselines, achieving a 50% improvement in detection AUROC and an 80% increase in True Positive Rate under low False Positive Rate constraints. Furthermore, our experiments confirm that MemPot incurs zero additional online inference latency and preserves the agent's utility on standard tasks, verifying its superiority in safety, harmlessness, and efficiency.
Authors:Zikai Zhang, Rui Hu, Olivera Kotevska, Jiahao Xu
Abstract:
Large Language Models (LLMs) are powerful tools for answering user queries, yet they remain highly vulnerable to jailbreak attacks. Existing guardrail methods typically rely on internal features or textual responses to detect malicious queries, which either introduce substantial latency or suffer from the randomness in text generation. To overcome these limitations, we propose SelfGrader, a lightweight guardrail method that formulates jailbreak detection as a numerical grading problem using token-level logits. Specifically, SelfGrader evaluates the safety of a user query within a compact set of numerical tokens (NTs) (e.g., 0-9) and interprets their logit distribution as an internal safety signal. To align these signals with human intuition of maliciousness, SelfGrader introduces a dual-perspective scoring rule that considers both the maliciousness and benignness of the query, yielding a stable and interpretable score that reflects harmfulness and reduces the false positive rate simultaneously. Extensive experiments across diverse jailbreak benchmarks, multiple LLMs, and state-of-the-art guardrail baselines demonstrate that SelfGrader achieves up to a 22.66% reduction in ASR on LLaMA-3-8B, while maintaining significantly lower memory overhead (up to 173x) and latency (up to 26x).
Authors:Dayong Ye, Tainqing Zhu, Congcong Zhu, Feng He, Qi He, Shang Wang, Bo Liu, Wanlei Zhou
Abstract:
Large language model (LLM)-based agents have recently gained considerable attention due to the powerful reasoning capabilities of LLMs. Existing research predominantly focuses on enhancing the task performance of these agents in diverse scenarios. However, as LLM-based agents become increasingly integrated into real-world applications, significant concerns emerge regarding their accumulation of sensitive or outdated knowledge. Addressing these concerns requires the development of mechanisms that allow agents to selectively forget previously learned knowledge, giving rise to a new term LLM-based agent unlearning. This paper initiates research on unlearning in LLM-based agents. Specifically, we propose a novel and comprehensive framework that categorizes unlearning scenarios into three contexts: state unlearning (forgetting specific states or items), trajectory unlearning (forgetting sequences of actions) and environment unlearning (forgetting entire environments or categories of tasks). Within this framework, we introduce a natural language-based unlearning method that trains a conversion model to transform high-level unlearning requests into actionable unlearning prompts, guiding agents through a controlled forgetting process. Moreover, to evaluate the robustness of the proposed framework, we introduce an unlearning inference adversary capable of crafting prompts, querying agents, and observing their behaviors in an attempt to infer the forgotten knowledge. Experimental results show that our approach effectively enables agents to forget targeted knowledge while preserving performance on untargeted tasks, and prevents the adversary from inferring the forgotten knowledge.
Authors:Duanyi Yao, Changyue Li, Zhicong Huang, Cheng Hong, Songze Li
Abstract:
Vision-Language Models (VLMs) are increasingly deployed in consumer applications where users seek recommendations about products, dining, and services. We introduce Hidden Ads, a new class of backdoor attacks that exploit this recommendation-seeking behavior to inject unauthorized advertisements. Unlike traditional pattern-triggered backdoors that rely on artificial triggers such as pixel patches or special tokens, Hidden Ads activates on natural user behaviors: when users upload images containing semantic content of interest (e.g., food, cars, animals) and ask recommendation-seeking questions, the backdoored model provides correct, helpful answers while seamlessly appending attacker-specified promotional slogans. This design preserves model utility and produces natural-sounding injections, making the attack practical for real-world deployment in consumer-facing recommendation services. We propose a multi-tier threat framework to systematically evaluate Hidden Ads across three adversary capability levels: hard prompt injection, soft prompt optimization, and supervised fine-tuning. Our poisoned data generation pipeline uses teacher VLM-generated chain-of-thought reasoning to create natural trigger--slogan associations across multiple semantic domains. Experiments on three VLM architectures demonstrate that Hidden Ads achieves high injection efficacy with near-zero false positives while maintaining task accuracy. Ablation studies confirm that the attack is data-efficient, transfers effectively to unseen datasets, and scales to multiple concurrent domain-slogan pairs. We evaluate defenses including instruction-based filtering and clean fine-tuning, finding that both fail to remove the backdoor without causing significant utility degradation.
Authors:Amitabh Chakravorty, Matthew Price, Nelly Elsayed, Zag ElSayed
Abstract:
Phishing attacks remain among the most prevalent cybersecurity threats, causing significant financial losses for individuals and organizations worldwide. This paper presents a machine learning-based phishing email detection system that analyzes email body content using natural language processing (NLP) techniques. Unlike existing approaches that primarily focus on URL analysis, our system classifies emails by extracting contextual features from the entire email content. We evaluated two classification models, Naive Bayes and Logistic Regression, trained on a combined corpus of 53,973 labeled emails from three distinct datasets. Our preprocessing pipeline incorporates lowercasing, tokenization, stop-word removal, and lemmatization, followed by Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction with unigrams and bigrams. Experimental results demonstrate that Logistic Regression achieves 95.41% accuracy with an F1-score of 94.33%, outperforming Naive Bayes by 1.55 percentage points. The system was deployed as a web application with a FastAPI backend, providing real-time phishing classification with average response times of 127ms.
Authors:Songyang Liu, Chaozhuo Li, Chenxu Wang, Jinyu Hou, Zejian Chen, Litian Zhang, Zheng Liu, Qiwei Ye, Yiming Hei, Xi Zhang, Zhongyuan Wang
Abstract:
OpenClaw has rapidly established itself as a leading open-source autonomous agent runtime, offering powerful capabilities including tool integration, local file access, and shell command execution. However, these broad operational privileges introduce critical security vulnerabilities, transforming model errors into tangible system-level threats such as sensitive data leakage, privilege escalation, and malicious third-party skill execution. Existing security measures for the OpenClaw ecosystem remain highly fragmented, addressing only isolated stages of the agent lifecycle rather than providing holistic protection. To bridge this gap, we present ClawKeeper, a real-time security framework that integrates multi-dimensional protection mechanisms across three complementary architectural layers. (1) \textbf{Skill-based protection} operates at the instruction level, injecting structured security policies directly into the agent context to enforce environment-specific constraints and cross-platform boundaries. (2) \textbf{Plugin-based protection} serves as an internal runtime enforcer, providing configuration hardening, proactive threat detection, and continuous behavioral monitoring throughout the execution pipeline. (3) \textbf{Watcher-based protection} introduces a novel, decoupled system-level security middleware that continuously verifies agent state evolution. It enables real-time execution intervention without coupling to the agent's internal logic, supporting operations such as halting high-risk actions or enforcing human confirmation. We argue that this Watcher paradigm holds strong potential to serve as a foundational building block for securing next-generation autonomous agent systems. Extensive qualitative and quantitative evaluations demonstrate the effectiveness and robustness of ClawKeeper across diverse threat scenarios. We release our code.
Authors:Jianan Mu, Ge Yu, Zhaoxuan Kan, Song Bian, Liang Kong, Zizhen Liu, Cheng Liu, Jing Ye, Huawei Li
Abstract:
Fully Homomorphic Encryption (FHE) is rapidly emerging as a promising foundation for privacy-preserving cloud services, enabling computation directly on encrypted data. As FHE implementations mature and begin moving toward practical deployment in domains such as secure finance, biomedical analytics, and privacy-preserving AI, a critical question remains insufficiently explored: how reliable is FHE computation on real hardware? This question is especially important because, compared with plaintext computation, FHE incurs much higher computational overhead, making it more susceptible to transient hardware faults. Moreover, data corruptions are likely to remain silent: the FHE service has no access to the underlying plaintext, causing unawareness even though the corresponding decrypted result has already been corrupted. To this end, we conduct a comprehensive evaluation of SDCs in FHE ciphertext computation. Through large-scale fault-injection experiments, we characterize the vulnerability of FHE to transient faults, and through a theoretical analysis of error-propagation behaviors, we gain deeper algorithmic insight into the mechanisms underlying this vulnerability. We further assess the effectiveness of different fault-tolerance mechanisms for mitigating these faults.
Authors:Yutao Luo, Haotian Zhu, Shuchao Pang, Zhigang Lu, Tian Dong, Yongbin Zhou, Minhui Xue
Abstract:
The rapid adoption of mobile graphical user interface (GUI) agents, which autonomously control applications and operating systems (OS), exposes new system-level attack surfaces. Existing backdoors against web GUI agents and general GenAI models rely on environmental injection or deceptive pop-ups to mislead the agent operation. However, these techniques do not work on screenshots-based mobile GUI agents due to the challenges of restricted trigger design spaces, OS background interference, and conflicts in multiple trigger-action mappings. We propose AgentRAE, a novel backdoor attack capable of inducing Remote Action Execution in mobile GUI agents using visually natural triggers (e.g., benign app icons in notifications). To address the underfitting caused by natural triggers and achieve accurate multi-target action redirection, we design a novel two-stage pipeline that first enhances the agent's sensitivity to subtle iconographic differences via contrastive learning, and then associates each trigger with a specific mobile GUI agent action through a backdoor post-training. Our extensive evaluation reveals that the proposed backdoor preserves clean performance with an attack success rate of over 90% across ten mobile operations. Furthermore, it is hard to visibly detect the benign-looking triggers and circumvents eight representative state-of-the-art defenses. These results expose an overlooked backdoor vector in mobile GUI agents, underscoring the need for defenses that scrutinize notification-conditioned behaviors and internal agent representations.
Authors:Kelechi G. Kalu, Hieu Tran, Santiago Torres-Arias, Sooyeon Jeong, James C. Davis
Abstract:
Identity-based software signing tools aim to make software artifact provenance verifiable while reducing the operational burden of long-lived key management. However, there is limited cross-tool longitudinal evidence about which usability problems arise in practice and how those problems evolve as tools mature. This gap matters because unusable signing and verification workflows can lead to incomplete adoption, misconfiguration, or skipped verification, undermining intended integrity guarantees. We conducted the first mining-software-repositories study of five open-source identity-based signing ecosystems: Sigstore, OpenPubKey, HashiCorp Vault, Keyfactor, and Notary v2. We analyzed approximately 3,900 GitHub issues from Nov. 2021 to Nov. 2025. We coded each issue for the reported usability concern and the implicated architectural component, and compared patterns across tools and over time. Across ecosystems, reported concerns concentrate in verification workflows, policy and configuration surfaces, and integration boundaries. Longitudinal Poisson trend analysis shows substantial declines in reported issues for most ecosystems. However, across usability themes, workflow- and documentation-related concerns decline unevenly across tools and concern types, and verification workflows and configuration surfaces remain persistent friction points. These results indicate that identity-based signing reduces some usability burdens while relocating complexity to verification semantics, policy configuration, and deployment integration. Designing future signing ecosystems therefore requires treating verification semantics and release workflows as first-class usability targets rather than peripheral integration concerns.
Authors:Shuang Liang, Yang Hua, Linshan Jiang, Peishen Yan, Tao Song, Bin Yao, Haibing Guan
Abstract:
In open Federated Learning (FL) environments where no central authority exists, ensuring collaboration fairness relies on decentralized reward settlement, yet the prohibitive cost of permissionless blockchains directly clashes with the high-frequency, iterative nature of model training. Existing solutions either compromise decentralization or suffer from scalability bottlenecks due to linear on-chain costs. To address this, we present SettleFL, a trustless and scalable reward settlement protocol designed to minimize total economic friction by offering a family of two interoperable protocols. Leveraging a shared domain-specific circuit architecture, SettleFL offers two interoperable strategies: (1) a Commit-and-Challenge variant that minimizes on-chain costs via optimistic execution and dispute-driven arbitration, and (2) a Commit-with-Proof variant that guarantees instant finality through per-round validity proofs. This design allows the protocol to flexibly adapt to varying latency and cost constraints while enforcing rational robustness without trusted coordination. We conduct extensive experiments combining real FL workloads and controlled simulations. Results show that SettleFL remains practical when scaling to 800 participants, achieving substantially lower gas cost.
Authors:Said Varlioglu, Nelly Elsayed, Murat Ozer, Zag ElSayed, John M. Emmert
Abstract:
With the emergence of remote code execution (RCE) vulnerabilities in ubiquitous libraries and advanced social engineering techniques, threat actors have started conducting widespread fileless cryptojacking attacks. These attacks have become effective with stealthy techniques based on PowerShell-based exploitation in Windows OS environments. Even if attacks are detected and malicious scripts removed, processes may remain operational on victim endpoints, creating a significant challenge for detection mechanisms. In this paper, we conducted an experimental study with a collected dataset on detecting PowerShell-based fileless cryptojacking scripts. The results showed that Abstract Syntax Tree (AST)-based fine-tuned CodeBERT achieved a high recall rate, proving the importance of the use of AST integration and fine-tuned pre-trained models for programming language.
Authors:Niklas Klinger, Jonas Sander, Peterson Yuhala, Pascal Felber, Thomas Eisenbarth
Abstract:
Homomorphic encryption (HE) is a promising technology for confidential cloud computing, as it allows computations on encrypted data. However, HE is computationally expensive and often memory-bound on conventional computer architectures. Processing-in-Memory (PIM) is an alternative hardware architecture that integrates processing units and memory on the same chip or memory module. PIM enables higher memory bandwidth than conventional architectures and could thus be suitable for accelerating HE. In this work, we present DRAMatic, which implements operations foundational to HE on UPMEM's programmable, general-purpose PIM system, and evaluate its performance. DRAMatic incorporates many arithmetic optimizations, including residue number system and number-theoretic transform techniques, and can support the large parameters required for secure homomorphic evaluations. To compare performance, we evaluate DRAMatic against Microsoft SEAL, a popular open-source HE library, regarding both runtime and energy efficiency. The results show that DRAMatic significantly closes the gap between UPMEM PIM and Microsoft SEAL. However, we also show that DRAMatic is currently constrained by UPMEM PIM's multiplication performance and data transfer overhead. Finally, we discuss potential hardware extensions to UPMEM PIM.
Authors:Guilhem Repetto, Nojan Sheybani, Gabrielle De Micheli, Farinaz Koushanfar
Abstract:
Privacy concerns in machine learning systems have grown significantly with the increasing reliance on sensitive user data for training large-scale models. This paper introduces a novel framework combining Probably Approximately Correct (PAC) Privacy with zero-knowledge proofs (ZKPs) to provide verifiable privacy guarantees in trustless computing environments. Our approach addresses the limitations of traditional privacy-preserving techniques by enabling users to verify both the correctness of computations and the proper application of privacy-preserving noise, particularly in cloud-based systems. We leverage non-interactive ZKP schemes to generate proofs that attest to the correct implementation of PAC privacy mechanisms while maintaining the confidentiality of proprietary systems. Our results demonstrate the feasibility of achieving verifiable PAC privacy in outsourced computation, offering a practical solution for maintaining trust in privacy-preserving machine learning and database systems while ensuring computational integrity.
Authors:Duo Chai, Zizhen Liu, Shuhuai Wang, Songwei Pei, Cheng Liu, Huawei Li, Shangguang Wang
Abstract:
Large language models (LLMs) are highly compute- and memory-intensive, posing significant demands on high-performance GPUs. At the same time, advances in GPU technology driven by shrinking transistor sizes and lower operating voltages have made these devices increasingly susceptible to soft errors. While prior work has examined GPU reliability, most studies have focused on general-purpose applications or conventional neural networks mostly used for vision tasks such as classification and detection. In contrast, systematic analysis of modern large-scale LLMs remains limited, despite their rapid adoption in diverse application scenarios. Given the unique characteristics of LLMs, their resilience to soft errors may differ substantially from earlier models. To bridge this gap, we conduct the first instruction-level fault injection study of LLM inference. Our approach reveals reliability characteristics from multiple perspectives, highlighting the effects of model architecture, parameter scale, and task complexity. These findings provide new insights into LLM reliability and inform the design of more effective fault tolerance mechanisms.
Authors:Mahyar Ghazanfari, Iman Sharifi, Peng Wei, Noah Dahle, Abel Diaz Gonzalez, Austin Coursey, Bryce Bjorkman, Cailani Lemieux-Mack, Robert Canady, Abenezer Taye, Bryan C. Ward, Xenofon Koutsoukos, Gautam Biswas, Maheed H. Ahmed, Hyeong Tae Kim, Mahsa Ghasemi, Vijay Gupta, Filippos Fotiadis, Ufuk Topcu, Junchi Lu, Alfred Chen, Abdul Kareem Ras, Nischal Aryal, Amer Ibrahim, Amir Shirkhodaie, Heber Herencia-Zapana, Saqib Hasan, Isaac Amundson
Abstract:
This survey reviews the existing and envisioned security vulnerabilities and defense mechanisms relevant to Advanced Air Mobility (AAM) systems, with a focus on electric vertical takeoff and landing (eVTOL) aircraft. Drawing from vulnerabilities in the avionics in commercial aviation and the automated unmanned aerial systems (UAS), the paper presents a taxonomy of attacks, analyzes mitigation strategies, and proposes a secure system architecture tailored to the future AAM ecosystem. The paper also highlights key threat vectors, including Global Positioning System (GPS) jamming/spoofing, ATC radio frequency misuse, attacks on TCAS and ADS-B, possible backdoor via Electronic Flight Bag (EFB), new vulnerabilities introduced by aircraft automation and connectivity, and risks from flight management system (FMS) software, database and cloud services. Finally, this paper describes emerging defense techniques against these attacks, and open technical problems to address toward better defense mechanisms.
Authors:Iman Sharifi, Mahyar Ghazanfari, Abenezer Taye, Peng Wei, Maheed H. Ahmed, Hyeong Tae Kim, Mahsa Ghasemi, Vijay Gupta, Noah Dahle, Robert Canady, Abel Diaz Gonzalez, Austin Coursey, Bryce Bjorkman, Cailani Lemieux-Mack, Bryan C. Ward, Xenofon Koutsoukos, Gautam Biswas, Heber Herencia-Zapana, Saqib Hasan, Isaac Amundson, Filippos Fotiadis, Ufuk Topcu, Junchi Lu, Qi Alfred Chen, Nischal Aryal, Amer Ibrahim, Abdul Karim Ras, Amir Shirkhodaie
Abstract:
The rapid growth of small Unmanned Aerial Systems (sUAS) for civil and commercial missions has intensified concerns about their resilience to cyber-security threats. Operating within the emerging UAS Traffic Management (UTM) framework, these lightweight and highly networked platforms depend on secure communication, navigation, and surveillance (CNS) subsystems that are vulnerable to spoofing, jamming, hijacking, and data manipulation. While prior reviews of UAS security addressed these challenges at a conceptual level, a detailed, system-oriented analysis for resource-constrained sUAS remains lacking. This paper presents a comprehensive survey of cyber-security vulnerabilities and defenses tailored to the sUAS and UTM ecosystem. We organize existing research across the full cyber-physical stack, encompassing CNS, data links, sensing and perception, UTM cloud access, and software integrity layers, and classify attack vectors according to their technical targets and operational impacts. Correspondingly, we review defense mechanisms ranging from classical encryption and authentication to adaptive intrusion detection, lightweight cryptography, and secure firmware management. By mapping threats to mitigation strategies and evaluating their scalability and practical effectiveness, this work establishes a unified taxonomy and identifies open challenges for achieving safe, secure, and scalable sUAS operations within future UTM environments.
Authors:Jingxiao Yang, Ping He, Tianyu Du, Sun Bing, Xuhong Zhang
Abstract:
Recent advances in software vulnerability detection have been driven by Language Model (LM)-based approaches. However, these models remain vulnerable to adversarial attacks that exploit lexical and syntax perturbations, allowing critical flaws to evade detection. Existing black-box attacks on LM-based vulnerability detectors primarily rely on isolated perturbation strategies, limiting their ability to efficiently explore the adversarial code space for optimal perturbations. To bridge this gap, we propose HogVul, a black-box adversarial code generation framework that integrates both lexical and syntax perturbations under a unified dual-channel optimization strategy driven by Particle Swarm Optimization (PSO). By systematically coordinating two-level perturbations, HogVul effectively expands the search space for adversarial examples, enhancing the attack efficacy. Extensive experiments on four benchmark datasets demonstrate that HogVul achieves an average attack success rate improvement of 26.05\% over state-of-the-art baseline methods. These findings highlight the potential of hybrid optimization strategies in exposing model vulnerabilities.
Authors:Zejian Chen, Chaozhuo Li, Chao Li, Xi Zhang, Litian Zhang, Yiming He
Abstract:
This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models (LLMs) and Vision-Language Models (VLMs), emphasizing that jailbreak vulnerabilities stem from structural factors such as incomplete training data, linguistic ambiguity, and generative uncertainty. It further differentiates between hallucinations and jailbreaks in terms of intent and triggering mechanisms. We propose a three-dimensional survey framework: (1) Attack dimension-including template/encoding-based, in-context learning manipulation, reinforcement/adversarial learning, LLM-assisted and fine-tuned attacks, as well as prompt- and image-level perturbations and agent-based transfer in VLMs; (2) Defense dimension-encompassing prompt-level obfuscation, output evaluation, and model-level alignment or fine-tuning; and (3) Evaluation dimension-covering metrics such as Attack Success Rate (ASR), toxicity score, query/time cost, and multimodal Clean Accuracy and Attribute Success Rate. Compared with prior works, this survey spans the full spectrum from text-only to multimodal settings, consolidating shared mechanisms and proposing unified defense principles: variant-consistency and gradient-sensitivity detection at the perception layer, safety-aware decoding and output review at the generation layer, and adversarially augmented preference alignment at the parameter layer. Additionally, we summarize existing multimodal safety benchmarks and discuss future directions, including automated red teaming, cross-modal collaborative defense, and standardized evaluation.
Authors:Songyang Liu, Chaozhuo Li, Rui Pu, Litian Zhang, Chenxu Wang, Zejian Chen, Yuting Zhang, Yiming Hei
Abstract:
Jailbreak attacks present a significant challenge to the safety of Large Language Models (LLMs), yet current automated evaluation methods largely rely on coarse classifications that focus mainly on harmfulness, leading to substantial overestimation of attack success. To address this problem, we propose FJAR, a fine-grained jailbreak evaluation framework with anchored references. We first categorized jailbreak responses into five fine-grained categories: Rejective, Irrelevant, Unhelpful, Incorrect, and Successful, based on the degree to which the response addresses the malicious intent of the query. This categorization serves as the basis for FJAR. Then, we introduce a novel harmless tree decomposition approach to construct high-quality anchored references by breaking down the original queries. These references guide the evaluator in determining whether the response genuinely fulfills the original query. Extensive experiments demonstrate that FJAR achieves the highest alignment with human judgment and effectively identifies the root causes of jailbreak failures, providing actionable guidance for improving attack strategies.
Authors:Rahul Jaiswal, Per-Arne Andersen, Linga Reddy Cenkeramaddi, Lei Jiao, Ole-Christoffer Granmo
Abstract:
The rapid adoption of the Internet of Medical Things (IoMT) is transforming healthcare by enabling seamless connectivity among medical devices, systems, and services. However, it also introduces serious cybersecurity and patient safety concerns as attackers increasingly exploit new methods and emerging vulnerabilities to infiltrate IoMT networks. This paper proposes a novel Tsetlin Machine (TM)-based Intrusion Detection System (IDS) for detecting a wide range of cyberattacks targeting IoMT networks. The TM is a rule-based and interpretable machine learning (ML) approach that models attack patterns using propositional logic. Extensive experiments conducted on the CICIoMT-2024 dataset, which includes multiple IoMT protocols and cyberattack types, demonstrate that the proposed TM-based IDS outperforms traditional ML classifiers. The proposed model achieves an accuracy of 99.5\% in binary classification and 90.7\% in multi-class classification, surpassing existing state-of-the-art approaches. Moreover, to enhance model trust and interpretability, the proposed TM-based model presents class-wise vote scores and clause activation heatmaps, providing clear insights into the most influential clauses and the dominant class contributing to the final model decision.
Authors:Samita Bai, Hamed Jelodar, Tochukwu Emmanuel Nwankwo, Parisa Hamedi, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani
Abstract:
Malware family classification remains a challenging task in automated malware analysis, particularly in real-world settings characterized by obfuscation, packing, and rapidly evolving threats. Existing machine learning and deep learning approaches typically depend on labeled datasets, handcrafted features, supervised training, or dynamic analysis, which limits their scalability and effectiveness in open-world scenarios. This paper presents a zero-label malware family classification framework based on a weighted hierarchical ensemble of pretrained large language models (LLMs). Rather than relying on feature-level learning or model retraining, the proposed approach aggregates decision-level predictions from multiple LLMs with complementary reasoning strengths. Model outputs are weighted using empirically derived macro-F1 scores and organized hierarchically, first resolving coarse-grained malicious behavior before assigning fine-grained malware families. This structure enhances robustness, reduces individual model instability, and aligns with analyst-style reasoning.
Authors:Wenyu Chen, Xiangtao Meng, Chuanchao Zang, Li Wang, Xinyu Gao, Jianing Wang, Peng Zhan, Zheng Li, Shanqing Guo
Abstract:
Large Language Models(LLMs) are widely deployed, yet are vulnerable to jailbreak prompts that elicit policy-violating outputs. Although prior studies have uncovered these risks, they typically treat all tokens as equally important during prompt mutation, overlooking the varying contributions of individual tokens to triggering model refusals. Consequently, these attacks introduce substantial redundant searching under query-constrained scenarios, reducing attack efficiency and hindering comprehensive vulnerability assessment. In this work, we conduct a token-level analysis of refusal behavior and observe that token contributions are highly skewed rather than uniform. Moreover, we find strong cross-model consistency in refusal tendencies, enabling the use of a surrogate model to estimate token-level contributions to the target model's refusals. Motivated by these findings, we propose TriageFuzz, a token-aware jailbreak fuzzing framework that adapts the fuzz testing approach with a series of customized designs. TriageFuzz leverages a surrogate model to estimate the contribution of individual tokens to refusal behaviors, enabling the identification of sensitive regions within the prompt. Furthermore, it incorporates a refusal-guided evolutionary strategy that adaptively weights candidate prompts with a lightweight scorer to steer the evolution toward bypassing safety constraints. Extensive experiments on six open-source LLMs and three commercial APIs demonstrate that TriageFuzz achieves comparable attack success rates (ASR) with significantly reduced query costs. Notably, it attains a 90% ASR with over 70% fewer queries compared to baselines. Even under an extremely restrictive budget of 25 queries, TriageFuzz outperforms existing methods, improving ASR by 20-40%.
Authors:Zhifang Zhang, Bojun Yang, Shuo He, Weitong Chen, Wei Emma Zhang, Olaf Maennel, Lei Feng, Miao Xu
Abstract:
Despite the strong multimodal performance, large vision-language models (LVLMs) are vulnerable during fine-tuning to backdoor attacks, where adversaries insert trigger-embedded samples into the training data to implant behaviors that can be maliciously activated at test time. Existing defenses typically rely on retraining backdoored parameters (e.g., adapters or LoRA modules) with clean data, which is computationally expensive and often degrades model performance. In this work, we provide a new mechanistic understanding of backdoor behaviors in LVLMs: the trigger does not influence prediction through low-level visual patterns, but through abnormal cross-modal attention redistribution, where trigger-bearing visual tokens steal attention away from the textual context - a phenomenon we term attention stealing. Motivated by this, we propose CleanSight, a training-free, plug-and-play defense that operates purely at test time. CleanSight (i) detects poisoned inputs based on the relative visual-text attention ratio in selected cross-modal fusion layers, and (ii) purifies the input by selectively pruning the suspicious high-attention visual tokens to neutralize the backdoor activation. Extensive experiments show that CleanSight significantly outperforms existing pixel-based purification defenses across diverse datasets and backdoor attack types, while preserving the model's utility on both clean and poisoned samples.
Authors:Luyang Si, Leyi Pan, Lijie Wen
Abstract:
As Joint Audio-Visual Generation Models see widespread commercial deployment, embedding watermarks has become essential for protecting vendor copyright and ensuring content provenance. However, existing techniques suffer from an architectural mismatch by treating modalities as decoupled entities, exposing a critical Binding Vulnerability. Adversaries exploit this via Swap Attacks by replacing authentic audio with malicious deepfakes while retaining the watermarked video. Because current detectors rely on independent verification ($Video_{wm}\vee Audio_{wm}$), they incorrectly authenticate the manipulated content, falsely attributing harmful media to the original vendor and severely damaging their reputation. To address this, we propose mAVE (Manifold Audio-Visual Entanglement), the first watermarking framework natively designed for joint architectures. mAVE cryptographically binds audio and video latents at initialization without fine-tuning, defining a Legitimate Entanglement Manifold via Inverse Transform Sampling. Experiments on state-of-the-art models (LTX-2, MOVA) demonstrate that mAVE guarantees performance-losslessness and provides an exponential security bound against Swap Attacks. Achieving near-perfect binding integrity ($>99\%$), mAVE offers a robust cryptographic defense for vendor copyright.
Authors:Wei Xuan, Zihao Xuan, Rongliang Fu, Ning Lin, Kwunhang Wong, Zikang Yuan, Lang Feng, Zhongrui Wang, Tsung-Yi Ho, Yuzhong Jiao, Luhong Liang
Abstract:
The rapid deployment of deep neural network (DNN) accelerators in safety-critical domains such as autonomous vehicles, healthcare systems, and financial infrastructure necessitates robust mechanisms to safeguard data confidentiality and computational integrity. Existing security solutions for DNN accelerators, however, suffer from excessive hardware resource demands and frequent off-chip memory access overheads, which degrade performance and scalability. To address these challenges, this paper presents a secure and efficient memory protection framework for DNN accelerators with minimal overhead. First, we propose a bandwidth-aware cryptographic scheme that adapts encryption granularity based on memory traffic patterns, striking a balance between security and resource efficiency. Second, we observe that both the overlapping regions in the intra-layer tiling's sliding window pattern and those resulting from inter-layer tiling strategy discrepancies introduce substantial redundant memory accesses and repeated computational overhead in cryptography. Third, we introduce a multi-level authentication mechanism that effectively eliminates unnecessary off-chip memory accesses, enhancing performance and energy efficiency. Experimental results show that this work decreases performance overhead by over 12% and achieves 87% energy efficiency improvement for both server and edge neural processing units (NPUs), while ensuring robust scalability.
Authors:Anders Aamand, Justin Y. Chen, Sandeep Silwal
Abstract:
We study differentially private continual release of the number of distinct items in a turnstile stream, where items may be both inserted and deleted. A recent work of Jain, Kalemaj, Raskhodnikova, Sivakumar, and Smith (NeurIPS '23) shows that for streams of length $T$, polynomial additive error of $Ω(T^{1/4})$ is necessary, even without any space restrictions. We show that this additive error lower bound can be circumvented if the algorithm is allowed to output estimates with both additive \emph{and multiplicative} error. We give an algorithm for the continual release of the number of distinct elements with $\text{polylog} (T)$ multiplicative and $\text{polylog}(T)$ additive error. We also show a qualitatively similar phenomenon for estimating the $F_2$ moment of a turnstile stream, where we can obtain $1+o(1)$ multiplicative and $\text{polylog} (T)$ additive error. Both results can be achieved using polylogarithmic space whereas prior approaches use polynomial space. In the sublinear space regime, some multiplicative error is necessary even if privacy is not a consideration. We raise several open questions aimed at better understanding trade-offs between multiplicative and additive error in private continual release.
Authors:Asif Tauhid, Sidahmed Benabderrahmane, Mohamad Altrabulsi, Ahamed Foisal, Talal Rahwan
Abstract:
Advanced Persistent Threats (APTs) are sophisticated, long-term cyberattacks that are difficult to detect because they operate stealthily and often blend into normal system behavior. This paper presents a neuro-symbolic anomaly detection framework that combines a Graph Autoencoder (GAE) with rare pattern mining to identify APT-like activities in system-level provenance data. Our approach first constructs a process behavioral graph using k-Nearest Neighbors based on feature similarity, then learns normal relational structure using a Graph Autoencoder. Anomaly candidates are identified through deviations between observed and reconstructed graph structure. To further improve detection, we integrate an rare pattern mining module that discovers infrequent behavioral co-occurrences and uses them to boost anomaly scores for processes exhibiting rare signatures. We evaluate the proposed method on the DARPA Transparent Computing datasets and show that rare-pattern boosting yields substantial gains in anomaly ranking quality over the baseline GAE. Compared with existing unsupervised approaches on the same benchmark, our single unified model consistently outperforms individual context-based detectors and achieves performance competitive with ensemble aggregation methods that require multiple separate detectors. These results highlight the value of coupling graph-based representation learning with classical pattern mining to improve both effectiveness and interpretability in provenance-based security anomaly detection.
Authors:Sidahmed Benabderrahmane, Petko Valtchev, James Cheney, Talal Rahwan
Abstract:
Detecting rare and diverse anomalies in highly imbalanced datasets-such as Advanced Persistent Threats (APTs) in cybersecurity-remains a fundamental challenge for machine learning systems. Active learning offers a promising direction by strategically querying an oracle to minimize labeling effort, yet conventional approaches often fail to exploit the intrinsic geometric structure of the feature space for model refinement. In this paper, we introduce SDA2E, a Sparse Dual Adversarial Attention-based AutoEncoder designed to learn compact and discriminative latent representations from imbalanced, high-dimensional data. We further propose a similarity-guided active learning framework that integrates three novel strategies to refine decision boundaries efficiently: mormal-like expansion, which enriches the training set with points similar to labeled normals to improve reconstruction fidelity; anomaly-like prioritization, which boosts ranking accuracy by focusing on points resembling known anomalies; and a hybrid strategy that combines both for balanced model refinement and ranking. A key component of our framework is a new similarity measure, Normalized Matching 1s (SIM_NM1), tailored for sparse binary embeddings. We evaluate SDA2E extensively across 52 imbalanced datasets, including multiple DARPA Transparent Computing scenarios, and benchmark it against 15 state-of-the-art anomaly detection methods. Results demonstrate that SDA2E consistently achieves superior ranking performance (nDCG up to 1.0 in several cases) while reducing the required labeled data by up to 80% compared to passive training. Statistical tests confirm the significance of these improvements. Our work establishes a robust, efficient, and statistically validated framework for anomaly detection that is particularly suited to cybersecurity applications such as APT detection.
Authors:Weiqing He, Xiang Li, Li Shen, Weijie Su, Qi Long
Abstract:
Watermarking is a principled approach for tracing the provenance of large language model (LLM) outputs, but its deployment in practice is hindered by inference inefficiency. Speculative sampling accelerates inference, with efficiency improving as the acceptance rate between draft and target models increases. Yet recent work reveals a fundamental trade-off: higher watermark strength reduces acceptance, preventing their simultaneous achievement. We revisit this trade-off and show it is not absolute. We introduce a quantitative measure of watermark strength that governs statistical detectability and is maximized when tokens are deterministic functions of pseudorandom numbers. Using this measure, we fully characterize the trade-off as a constrained optimization problem and derive explicit Pareto curves for two existing watermarking schemes. Finally, we introduce a principled mechanism that injects pseudorandomness into draft-token acceptance, ensuring maximal watermark strength while maintaining speculative sampling efficiency. Experiments further show that this approach improves detectability without sacrificing efficiency. Our findings uncover a principle that unites speculative sampling and watermarking, paving the way for their efficient and practical deployment.
Authors:Xianya Fang, Xianying Luo, Yadong Wang, Xiang Chen, Yu Tian, Zequn Sun, Rui Liu, Jun Fang, Naiqiang Tan, Yuanning Cui, Sheng-Jun Huang
Abstract:
Despite the intrinsic risk-awareness of Large Language Models (LLMs), current defenses often result in shallow safety alignment, rendering models vulnerable to disguised attacks (e.g., prefilling) while degrading utility. To bridge this gap, we propose SafeThinker, an adaptive framework that dynamically allocates defensive resources via a lightweight gateway classifier. Based on the gateway's risk assessment, inputs are routed through three distinct mechanisms: (i) a Standardized Refusal Mechanism for explicit threats to maximize efficiency; (ii) a Safety-Aware Twin Expert (SATE) module to intercept deceptive attacks masquerading as benign queries; and (iii) a Distribution-Guided Think (DDGT) component that adaptively intervenes during uncertain generation. Experiments show that SafeThinker significantly lowers attack success rates across diverse jailbreak strategies without compromising utility, demonstrating that coordinating intrinsic judgment throughout the generation process effectively balances robustness and practicality.
Authors:Griffin Higgins, Roozbeh Razavi-Far, Hossein Shokouhinejad, Ali A. Ghorbani
Abstract:
As malware continues to become increasingly sophisticated, threatening, and evasive, malware detection systems must keep pace and become equally intelligent, powerful, and transparent. In this paper, we propose Assembly Flow Graph (AFG) to comprehensively represent the assembly flow of a binary executable as graph data. Importantly, AFG can be used to extract granular explanations needed to increase transparency for malware detection using Graph Neural Networks (GNNs). However, since AFGs may be large in practice, we also propose a Meta-Coarsening approach to improve computational tractability via graph reduction. To evaluate our proposed approach we consider several novel and existing metrics to quantify the granularity and quality of explanations. Lastly, we also consider several hyperparameters in our proposed Meta-Coarsening approach that can be used to control the final explanation size. We evaluate our proposed approach using the CIC-DGG-2025 dataset. Our results indicate that our proposed AFG and Meta-Coarsening approach can provide both increased explainability and inference performance at certain coarsening levels. However, most importantly, to the best of our knowledge, we are the first to consider granular explainability in malware detection using GNNs.
Authors:Li Wang, Wenyu Chen, Ning Yu, Zheng Li, Shanqing Guo
Abstract:
The proliferation of powerful Text-to-Video (T2V) models, trained on massive web-scale datasets, raises urgent concerns about copyright and privacy violations. Membership inference attacks (MIAs) provide a principled tool for auditing such risks, yet existing techniques - designed for static data like images or text - fail to capture the spatio-temporal complexities of video generation. In particular, they overlook the sparsity of memorization signals in keyframes and the instability introduced by stochastic temporal dynamics. In this paper, we conduct the first systematic study of MIAs against T2V models and introduce a novel framework VidLeaks, which probes sparse-temporal memorization through two complementary signals: 1) Spatial Reconstruction Fidelity (SRF), using a Top-K similarity to amplify spatial memorization signals from sparsely memorized keyframes, and 2) Temporal Generative Stability (TGS), which measures semantic consistency across multiple queries to capture temporal leakage. We evaluate VidLeaks under three progressively restrictive black-box settings - supervised, reference-based, and query-only. Experiments on three representative T2V models reveal severe vulnerabilities: VidLeaks achieves AUC of 82.92% on AnimateDiff and 97.01% on InstructVideo even in the strict query-only setting, posing a realistic and exploitable privacy risk. Our work provides the first concrete evidence that T2V models leak substantial membership information through both sparse and temporal memorization, establishing a foundation for auditing video generation systems and motivating the development of new defenses. Code is available at: https://zenodo.org/records/17972831.
Authors:Qingyuan Li, Chenchen Yu, Chuanyi Li, Xin-Cheng Wen, Cheryl Lee, Cuiyun Gao, Bin Luo
Abstract:
Vulnerabilities severely threaten software systems, making the timely application of security patches crucial for mitigating attacks. However, software vendors often silently patch vulnerabilities with limited disclosure, where Security Patch Detection (SPD) comes to protect software assets. Recently, most SPD studies have targeted Open-Source Software (OSS), yet a large portion of real-world software is closed-source, where patches are distributed as binaries without accessible source code. The limited binary SPD approaches often lift binaries to abstraction levels, i.e., assembly code or pseudo-code. However, assembly code is register-based instructions conveying limited semantics, while pseudo-code lacks parser-compatible grammar to extract structure, both hindering accurate vulnerability-fix representation learning. In addition, previous studies often obtain training and testing data from the same project for evaluation, which fails to reflect closed-source conditions. To alleviate the above challenges, we propose \textbf{\textit{StriderSPD}}, a \underline{Str}ucture-gu\underline{ide}d joint \underline{r}epresentation \underline{SPD} framework of binary code that integrates a graph branch into a large language model (LLM), leveraging structural information to guide the LLM in identifying security patches. Our novel design of the adapters in the graph branch effectively aligns the representations between assembly code and pseudo-code at the LLM's token level. We further present a two-stage training strategy to address the optimization imbalance caused by the large parameter disparity between StriderSPD's two branches, which enables proper branch fitting. To enable more realistic evaluation, we construct a binary SPD benchmark that is disjoint from prior datasets in both projects and domains and extensively evaluate StriderSPD on this benchmark.
Authors:Songze Li, Ruishi He, Xiaojun Jia, Jun Wang, Zhihui Fu
Abstract:
Large Language Models (LLMs) face a significant threat from multi-turn jailbreak attacks, where adversaries progressively steer conversations to elicit harmful outputs. However, the practical effectiveness of existing attacks is undermined by several critical limitations: they struggle to maintain a coherent progression over long interactions, often losing track of what has been accomplished and what remains to be done; they rely on rigid or pre-defined patterns, and fail to adapt to the LLM's dynamic and unpredictable conversational state. To address these shortcomings, we introduce Mastermind, a multi-turn jailbreak framework that adopts a dynamic and self-improving approach. Mastermind operates in a closed loop of planning, execution, and reflection, enabling it to autonomously build and refine its knowledge of model vulnerabilities through interaction. It employs a hierarchical planning architecture that decouples high-level attack objectives from low-level tactical execution, ensuring long-term focus and coherence. This planning is guided by a knowledge repository that autonomously discovers and refines effective attack patterns by reflecting on interactive experiences. Mastermind leverages this accumulated knowledge to dynamically recombine and adapt attack vectors, dramatically improving both effectiveness and resilience. We conduct comprehensive experiments against state-of-the-art models, including GPT-5 and Claude 3.7 Sonnet. The results demonstrate that Mastermind significantly outperforms existing baselines, achieving substantially higher attack success rates and harmfulness ratings. Moreover, our framework exhibits notable resilience against multiple advanced defense mechanisms.
Authors:Yiheng Huang, Zhijia Zhao, Bihuan Chen, Susheng Wu, Zhuotong Zhou, Yiheng Cao, Xin Hu, Xin Peng
Abstract:
The model context protocol (MCP) standardizes how LLMs connect to external tools and data sources, enabling faster integration but introducing new attack vectors. Despite the growing adoption of MCP, existing MCP security studies classify attacks by their observable effects, obscuring how attacks behave across different MCP server components and overlooking multi-component attack chains. Meanwhile, existing defenses are less effective when facing multi-component attacks or previously unknown malicious behaviors. This work presents a component-centric perspective for understanding and detecting malicious MCP servers. First, we build the first component-centric PoC dataset of 114 malicious MCP servers where attacks are achieved as manipulation over MCP components and their compositions. We evaluate these attacks' effectiveness across two MCP hosts and five LLMs, and uncover that (1) component position shapes attack success rate; and (2) multi-component compositions often outperform single-component attacks by distributing malicious logic. Second, we propose and implement Connor, a two-stage behavioral deviation detector for malicious MCP servers. It first performs pre-execution analysis to detect malicious shell commands and extract each tool's function intent, and then conducts step-wise in-execution analysis to trace each tool's behavioral trajectories and detect deviations from its function intent. Evaluation on our curated dataset indicates that Connor achieves an F1-score of 94.6%, outperforming the state of the art by 8.9% to 59.6%. In real-world detection, Connor identifies two malicious servers.
Authors:Weidi Luo, Xiaofei Wen, Tenghao Huang, Hongyi Wang, Zhen Xiang, Chaowei Xiao, Kristina Gligorić, Muhao Chen
Abstract:
Large language models (LLMs) are increasingly deployed for everyday tasks, including food preparation and health-related guidance. However, food safety remains a high-stakes domain where inaccurate or misleading information can cause severe real-world harm. Despite these risks, current LLMs and safety guardrails lack rigorous alignment tailored to domain-specific food hazards. To address this gap, we introduce FoodGuardBench, the first comprehensive benchmark comprising 3,339 queries grounded in FDA guidelines, designed to evaluate the safety and robustness of LLMs. By constructing a taxonomy of food safety principles and employing representative jailbreak attacks (e.g., AutoDAN and PAP), we systematically evaluate existing LLMs and guardrails. Our evaluation results reveal three critical vulnerabilities: First, current LLMs exhibit sparse safety alignment in the food-related domain, easily succumbing to a few canonical jailbreak strategies. Second, when compromised, LLMs frequently generate actionable yet harmful instructions, inadvertently empowering malicious actors and posing tangible risks. Third, existing LLM-based guardrails systematically overlook these domain-specific threats, failing to detect a substantial volume of malicious inputs. To mitigate these vulnerabilities, we introduce FoodGuard-4B, a specialized guardrail model fine-tuned on our datasets to safeguard LLMs within food-related domains.
Authors:Ruhao Liu, Weiqi Huang, Qi Li, Xinchao Wang
Abstract:
Membership Inference Attacks (MIAs) serve as a fundamental auditing tool for evaluating training data leakage in machine learning models. However, existing methodologies predominantly rely on static, handcrafted heuristics that lack adaptability, often leading to suboptimal performance when transferred across different large models. In this work, we propose AutoMIA, an agentic framework that reformulates membership inference as an automated process of self-exploration and strategy evolution. Given high-level scenario specifications, AutoMIA self-explores the attack space by generating executable logits-level strategies and progressively refining them through closed-loop evaluation feedback. By decoupling abstract strategy reasoning from low-level execution, our framework enables a systematic, model-agnostic traversal of the attack search space. Extensive experiments demonstrate that AutoMIA consistently matches or outperforms state-of-the-art baselines while eliminating the need for manual feature engineering.
Authors:Molly Campbell, Yulia Bobkova, Ajay Kumar Shrestha
Abstract:
This paper investigates how gender shapes privacy decision-making in youth smart voice assistant (SVA) ecosystems. Using survey data from 469 Canadian youths aged 16-24, we apply multigroup Partial Least Squares Structural Equation Modeling to compare males (N=241) and females (N=174) (total N = 415) across five privacy constructs: Perceived Privacy Risks (PPR), Perceived Privacy Benefits (PPBf), Algorithmic Transparency and Trust (ATT), Privacy Self-Efficacy (PSE), and Privacy Protective Behavior (PPB). Results provide exploratory evidence of gender heterogeneity in selected pathways. The direct effect of PPR on PPB is stronger for males (Male: \b{eta} = 0.424; Female: \b{eta} = 0.233; p < 0.1), while the indirect effect of ATT on PPB via PSE is stronger for females (Female: \b{eta} = 0.229; Male: \b{eta} = 0.132; p < 0.1). Descriptive analysis of non-binary (N=15) and prefer-not-to-say participants (N=39) shows lower trust and higher perceived risk than the binary groups, motivating future work with adequately powered gender-diverse samples. Overall, the findings provide exploratory evidence that gender may moderate key privacy pathways, supporting more responsive transparency and control interventions for youth SVA use.
Authors:Marc Damie, Florian Hahn, Andreas Peter, Jan Ramon
Abstract:
Function Secret Sharing (FSS) schemes enable sharing efficiently secret functions. Schemes dedicated to point functions, referred to as Distributed Point Functions (DPFs), are the center of FSS literature thanks to their numerous applications including private information retrieval, anonymous communications, and machine learning. While two-party DPFs benefit from schemes with logarithmic key sizes, multi-party DPFs have seen limited advancements: $O(\sqrt{N})$ key sizes (with $N$, the function domain size) and/or exponential factors in the key size. We propose a DDH-based technique reducing the key size of existing multi-party schemes. In particular, we build an honest-majority DPF with $O(\sqrt[3]{N})$ key size. Our benchmark highlights key sizes up to $10\times$ smaller (on realistic problem sizes) than state-of-the-art schemes. Finally, we extend our technique to schemes supporting comparison functions.
Authors:Anshul Thakur, Soheila Molaei, Pafue Christy Nganjimi, Joshua Fieggen, Andrew A. S. Soltan, Danielle Belgrave, Lei Clifton, David A. Clifton
Abstract:
Dataset condensation (DC) learns a compact synthetic dataset that enables models to match the performance of full-data training, prioritising utility over distributional fidelity. While typically explored for computational efficiency, DC also holds promise for healthcare data democratisation, especially when paired with differential privacy, allowing synthetic data to serve as a safe alternative to real records. However, existing DC methods rely on differentiable neural networks, limiting their compatibility with widely used clinical models such as decision trees and Cox regression. We address this gap using a differentially private, zero-order optimisation framework that extends DC to non-differentiable models using only function evaluations. Empirical results across six datasets, including both classification and survival tasks, show that the proposed method produces condensed datasets that preserve model utility while providing effective differential privacy guarantees - enabling model-agnostic data sharing for clinical prediction tasks without exposing sensitive patient information.
Authors:Nils Palumbo, Sarthak Choudhary, Jihye Choi, Prasad Chalasani, Somesh Jha
Abstract:
LLM-based agents are increasingly being deployed in contexts requiring complex authorization policies: customer service protocols, approval workflows, data access restrictions, and regulatory compliance. Embedding these policies in prompts provides no enforcement guarantees. We present PCAS, a Policy Compiler for Agentic Systems that provides deterministic policy enforcement. Enforcing such policies requires tracking information flow across agents, which linear message histories cannot capture. Instead, PCAS models the agentic system state as a dependency graph capturing causal relationships among events such as tool calls, tool results, and messages. Policies are expressed in a Datalog-derived language, as declarative rules that account for transitive information flow and cross-agent provenance. A reference monitor intercepts all actions and blocks violations before execution, providing deterministic enforcement independent of model reasoning. PCAS takes an existing agent implementation and a policy specification, and compiles them into an instrumented system that is policy-compliant by construction, with no security-specific restructuring required. We evaluate PCAS on three case studies: information flow policies for prompt injection defense, approval workflows in a multi-agent pharmacovigilance system, and organizational policies for customer service. On customer service tasks, PCAS improves policy compliance from 48% to 93% across frontier models, with zero policy violations in instrumented runs.
Authors:Oguzhan Baser, Elahe Sadeghi, Eric Wang, David Ribeiro Alves, Sam Kazemian, Hong Kang, Sandeep P. Chinchali, Sriram Vishwanath
Abstract:
Most large language models (LLMs) run on external clouds: users send a prompt, pay for inference, and must trust that the remote GPU executes the LLM without any adversarial tampering. We critically ask how to achieve verifiable LLM inference, where a prover (the service) must convince a verifier (the client) that an inference was run correctly without rerunning the LLM. Existing cryptographic works are too slow at the LLM scale, while non-cryptographic ones require a strong verifier GPU. We propose TensorCommitments (TCs), a tensor-native proof-of-inference scheme. TC binds the LLM inference to a commitment, an irreversible tag that breaks under tampering, organized in our multivariate Terkle Trees. For LLaMA2, TC adds only 0.97% prover and 0.12% verifier time over inference while improving robustness to tailored LLM attacks by up to 48% over the best prior work requiring a verifier GPU.
Authors:Subangkar Karmaker Shanto, Imtiaz Karim, Elisa Bertino
Abstract:
As 3GPP systems have strengthened security at the upper layers of the cellular stack, plaintext PHY and MAC layers have remained relatively understudied, though interest in them is growing. In this work, we explore lower-layer exploitation in modern 5G, where recent releases have increased the number of lower-layer control messages and procedures, creating new opportunities for practical attacks. We present two practical attacks and evaluate them in a controlled lab testbed. First, we reproduce a SIB1 spoofing attack to study manipulations of unprotected broadcast fields. By repeatedly changing a key parameter, the UE is forced to refresh and reacquire system information, keeping the radio interface active longer than necessary and increasing battery consumption. Second, we demonstrate a new Timing Advance (TA) manipulation attack during the random access procedure. By injecting an attacker-chosen TA offset in the random access response, the victim applies incorrect uplink timing, which leads to uplink desynchronization, radio link failures, and repeated reconnection loops that effectively cause denial of service. Our experiments use commercial smartphones and open-source 5G network software. Experimental results in our testbed demonstrate that TA offsets exceeding a small tolerance reliably trigger radio link failures in our testbed and can keep devices stuck in repeated re-establishment attempts as long as the rogue base station remains present. Overall, our findings highlight that compact lower-layer control messages can have a significant impact on availability and power, and they motivate placing defenses for initial access and broadcast procedures.
Authors:Molly Campbell, Ajay Kumar Shrestha
Abstract:
Smart Voice Assistants (SVAs) are deeply embedded in the lives of youth, yet the mechanisms driving the privacy-protective behaviors among young users remain poorly understood. This study investigates how Canadian youth (aged 16-24) negotiate privacy with SVAs by developing and testing a structural model grounded in five key constructs: perceived privacy risks (PPR), perceived benefits (PPBf), algorithmic transparency and trust (ATT), privacy self-efficacy (PSE), and privacy-protective behaviors (PPB). A cross-sectional survey of N=469 youth was analyzed using partial least squares structural equation modeling. Results reveal that PSE is the strongest predictor of PPB, while the effect of ATT on PPB is fully mediated by PSE. This identifies a critical efficacy gap, where youth's confidence must first be built up for them to act. The model confirms that PPBf directly discourages protective action, yet also indirectly fosters it by slightly boosting self-efficacy. These findings empirically validate and extend earlier qualitative work, quantifying how policy overload and hidden controls erode the self-efficacy necessary for protective action. This study contributes an evidence-based pathway from perception to action and translates it into design imperatives that empower young digital citizens without sacrificing the utility of SVAs.
Authors:Minbeom Kim, Mihir Parmar, Phillip Wallis, Lesly Miculicich, Kyomin Jung, Krishnamurthy Dj Dvijotham, Long T. Le, Tomas Pfister
Abstract:
AI agents equipped with tool-calling capabilities are susceptible to Indirect Prompt Injection (IPI) attacks. In this attack scenario, malicious commands hidden within untrusted content trick the agent into performing unauthorized actions. Existing defenses can reduce attack success but often suffer from the over-defense dilemma: they deploy expensive, always-on sanitization regardless of actual threat, thereby degrading utility and latency even in benign scenarios. We revisit IPI through a causal ablation perspective: a successful injection manifests as a dominance shift where the user request no longer provides decisive support for the agent's privileged action, while a particular untrusted segment, such as a retrieved document or tool output, provides disproportionate attributable influence. Based on this signature, we propose CausalArmor, a selective defense framework that (i) computes lightweight, leave-one-out ablation-based attributions at privileged decision points, and (ii) triggers targeted sanitization only when an untrusted segment dominates the user intent. Additionally, CausalArmor employs retroactive Chain-of-Thought masking to prevent the agent from acting on ``poisoned'' reasoning traces. We present a theoretical analysis showing that sanitization based on attribution margins conditionally yields an exponentially small upper bound on the probability of selecting malicious actions. Experiments on AgentDojo and DoomArena demonstrate that CausalArmor matches the security of aggressive defenses while improving explainability and preserving utility and latency of AI agents.
Authors:Wenhui Zhang, Huiyu Xu, Zhibo Wang, Zhichao Li, Zeqing He, Xuelin Wei, Kui Ren
Abstract:
Recent advancements in multi-model AI systems have leveraged LLM routers to reduce computational cost while maintaining response quality by assigning queries to the most appropriate model. However, as classifiers, LLM routers are vulnerable to novel adversarial attacks in the form of LLM rerouting, where adversaries prepend specially crafted triggers to user queries to manipulate routing decisions. Such attacks can lead to increased computational cost, degraded response quality, and even bypass safety guardrails, yet their security implications remain largely underexplored. In this work, we bridge this gap by systematizing LLM rerouting threats based on the adversary's objectives (i.e., cost escalation, quality hijacking, and safety bypass) and knowledge. Based on the threat taxonomy, we conduct a measurement study of real-world LLM routing systems against existing LLM rerouting attacks. The results reveal that existing routing systems are vulnerable to rerouting attacks, especially in the cost escalation scenario. We then characterize existing rerouting attacks using interpretability techniques, revealing that they exploit router decision boundaries through confounder gadgets that prepend queries to force misrouting. To mitigate these risks, we introduce RerouteGuard, a flexible and scalable guardrail framework for LLM rerouting. RerouteGuard filters adversarial rerouting prompts via dynamic embedding-based detection and adaptive thresholding. Extensive evaluations in three attack settings and four benchmarks demonstrate that RerouteGuard achieves over 99% detection accuracy against state-of-the-art rerouting attacks, while maintaining negligible impact on legitimate queries. The experimental results indicate that RerouteGuard offers a principled and practical solution for safeguarding multi-model AI systems against adversarial rerouting.
Authors:Krzysztof Gogol, Manvir Schneider, Jan Gorzny, Claudio Tessone
Abstract:
We study the feasibility, profitability, and prevalence of sandwich attacks on Ethereum rollups with private mempools. First, we extend a formal model of optimal front- and back-run sizing, relating attack profitability to victim trade volume, liquidity depth, and slippage bounds. We complement it with an execution-feasibility model that quantifies co-inclusion constraints under private mempools. Second, we examine execution constraints in the absence of builder markets: without guaranteed atomic inclusion, attackers must rely on sequencer ordering, redundant submissions, and priority fee placement, which renders sandwiching probabilistic rather than deterministic. Third, using transaction-level data from major rollups, we show that naive heuristics overstate sandwich activity. We find that the majority of flagged patterns are false positives and that the median net return for these attacks is negative. Our results suggest that sandwiching, while endemic and profitable on Ethereum L1, is rare, unprofitable, and largely absent in rollups with private mempools. These findings challenge prevailing assumptions, refine measurement of MEV in L2s, and inform the design of sequencing policies.
Authors:Prach Chantasantitam, Adam Ilyas Caulfield, Vasisht Duddu, Lachlan J. Gunn, N. Asokan
Abstract:
Machine learning property attestations allow provers (e.g., model providers or owners) to attest properties of their models/datasets to verifiers (e.g., regulators, customers), enabling accountability towards regulations and policies. But, current approaches do not support generative models or large datasets. We present PAL*M, a property attestation framework for large generative models, illustrated using large language models. PAL*M defines properties across training and inference, leverages confidential virtual machines with security-aware GPUs for coverage of CPU-GPU operations, and proposes using incremental multiset hashing over memory-mapped datasets to efficiently track their integrity. We implement PAL*M on Intel TDX and NVIDIA H100, showing it is efficient, scalable, versatile, and secure.
Authors:Xiao-Yang Liu, Ningjie Li, Keyi Wang, Xiaoli Zhi, Weiqin Tong
Abstract:
Financial Generative Pre-trained Transformers (FinGPT) with multimodal capabilities are now being increasingly adopted in various financial applications. However, due to the intellectual property of model weights and the copyright of training corpus and benchmarking questions, verifying the legitimacy of GPT's model weights and the credibility of model outputs is a pressing challenge. In this paper, we introduce a novel zkFinGPT scheme that applies zero-knowledge proofs (ZKPs) to high-value financial use cases, enabling verification while protecting data privacy. We describe how zkFinGPT will be applied to three financial use cases. Our experiments on two existing packages reveal that zkFinGPT introduces substantial computational overhead that hinders its real-world adoption. E.g., for LLama3-8B model, it generates a commitment file of $7.97$MB using $531$ seconds, and takes $620$ seconds to prove and $2.36$ seconds to verify.
Authors:Neusha Javidnia, Ruisi Zhang, Ashish Kundu, Farinaz Koushanfar
Abstract:
We present SWaRL, a robust and fidelity-preserving watermarking framework designed to protect the intellectual property of code LLM owners by embedding unique and verifiable signatures in the generated output. Existing approaches rely on manually crafted transformation rules to preserve watermarked code functionality or manipulate token-generation probabilities at inference time, which are prone to compilation errors. To address these challenges, SWaRL employs a reinforcement learning-based co-training framework that uses compiler feedback for functional correctness and a jointly trained confidential verifier as a reward signal to maintain watermark detectability. Furthermore, SWaRL employs low-rank adaptation (LoRA) during fine-tuning, allowing the learned watermark information to be transferable across model updates. Extensive experiments show that SWaRL achieves higher watermark detection accuracy compared to prior methods while fully maintaining watermarked code functionality. The LoRA-based signature embedding steers the base model to generate and solve code in a watermark-specific manner without significant computational overhead. Moreover, SWaRL exhibits strong resilience against refactoring and adversarial transformation attacks.
Authors:Shiqi Xu, Yuyang Du, Mingyue Zhang, Hongwei Cui, Soung Chang Liew
Abstract:
WiFi is inherently vulnerable to eavesdropping because RF signals may penetrate many physical boundaries, such as walls and floors. LiFi, by contrast, is an optical method confined to line-of-sight and blocked by opaque surfaces. We present LightGuard, a dual-link architecture built on this insight: cryptographic key establishment can be offloaded from WiFi to a physically confined LiFi channel to mitigate the risk of key exposure over RF. LightGuard derives session keys over a LiFi link and installs them on the WiFi interface, ensuring cryptographic material never traverses the open RF medium. A prototype with off-the-shelf WiFi NICs and our LiFi transceiver frontend validates the design.
Authors:Thomas Jiralerspong, Flemming Kondrup, Yoshua Bengio
Abstract:
Chain-of-thought (CoT) monitoring is proposed as a method for overseeing the internal reasoning of language-model agents. Prior work has shown that when models are explicitly informed that their reasoning is being monitored, or are fine-tuned to internalize this fact, they may learn to obfuscate their CoTs in ways that allow them to evade CoT-based monitoring systems. We ask whether reasoning agents can autonomously infer that their supposedly private CoT is under surveillance, and whether this awareness leads to strategic evasion, without any explicit training or instructions. In a multi-episode agentic framework, models pursue both a primary task and a concealed side task while being told that their reasoning is private; a hidden CoT monitor blocks episodes when suspicious reasoning is detected. We find that frontier models can deduce the existence of this monitor purely from blocking feedback, with the most capable models reaching a confident belief that their thinking is observed in up to 19% of episodes. This awareness scales with model capability and, in rare cases, escalates to an explicit intent to suppress reasoning about the side task. However, models that form this intent uniformly fail to execute it, openly reasoning about their concealed objectives in the very next episode. This intent-capability gap is reassuring for current deployment, but the autonomous emergence of both monitoring awareness and evasion intent suggests that CoT monitoring is not a permanently reliable safeguard.
Authors:Jie Ma, Ningyu He, Jinwen Xi, Mingzhe Xing, Liangxin Liu, Jiushenzi Luo, Xiaopeng Fu, Chiachih Wu, Haoyu Wang, Ying Gao, Yinliang Yue
Abstract:
The Ethereum ecosystem, which secures over $381 billion in assets, fundamentally relies on client APIs as the sole interface between users and the blockchain. However, these critical APIs suffer from widespread implementation inconsistencies, which can lead to financial discrepancies, degraded user experiences, and threats to network reliability. Despite this criticality, existing testing approaches remain manual and incomplete: they require extensive domain expertise, struggle to keep pace with Ethereum's rapid evolution, and fail to distinguish genuine bugs from acceptable implementation variations. We present APIDiffer, the first specification-guided differential testing framework designed to automatically detect API inconsistencies across Ethereum's diverse client ecosystem. APIDiffer transforms API specifications into comprehensive test suites through two key innovations: (1) specification-guided test input generation that creates both syntactically valid and invalid requests enriched with real-time blockchain data, and (2) specification-aware false positive filtering that leverages large language models to distinguish genuine bugs from acceptable variations. Our evaluation across all 11 major Ethereum clients reveals the pervasiveness of API bugs in production systems. APIDiffer uncovered 72 bugs, with 90.28% already confirmed or fixed by developers. Beyond these raw numbers, APIDiffer achieves up to 89.67% higher code coverage than existing tools and reduces false positive rates by 37.38%. The Ethereum community's response validates our impact: developers have integrated our test cases, expressed interest in adopting our methodology, and escalated one bug to the official Ethereum Project Management meeting.
Authors:Junchen Li, Chao Qi, Rongzheng Wang, Qizhi Chen, Liang Xu, Di Liang, Bob Simons, Shuang Liang
Abstract:
Retrieval-Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by incorporating external knowledge, but its reliance on potentially poisonable knowledge bases introduces new availability risks. Attackers can inject documents that cause LLMs to refuse benign queries, attacks known as blocking attacks. Prior blocking attacks relying on adversarial suffixes or explicit instruction injection are increasingly ineffective against modern safety-aligned LLMs. We observe that safety-aligned LLMs exhibit heightened sensitivity to query-relevant risk signals, causing alignment mechanisms designed for harm prevention to become a source of exploitable refusal. Moreover, mainstream alignment practices share overlapping risk categories and refusal criteria, a phenomenon we term alignment homogeneity, enabling restricted risk context constructed on an accessible LLM to transfer across LLMs. Based on this insight, we propose TabooRAG, a transferable blocking attack framework operating under a strict black-box setting. An attacker can generate a single retrievable blocking document per query by optimizing against a surrogate LLM in an accessible RAG environment, and directly transfer it to an unknown target RAG system without access to the target model. We further introduce a query-aware strategy library to reuse previously effective strategies and improve optimization efficiency. Experiments across 7 modern LLMs and 3 datasets demonstrate that TabooRAG achieves stable cross-model transferability and state-of-the-art blocking success rates, reaching up to 96% on GPT-5.2. Our findings show that increasingly standardized safety alignment across modern LLMs creates a shared and transferable attack surface in RAG systems, revealing a need for improved defenses.
Authors:Chung-ju Huang, Huiqiang Zhao, Yuanpeng He, Lijian Li, Wenpin Jiao, Zhi Jin, Peixuan Chen, Leye Wang
Abstract:
The increasing reliance on cloud-hosted Large Language Models (LLMs) exposes sensitive client data, such as prompts and responses, to potential privacy breaches by service providers. Existing approaches fail to ensure privacy, maintain model performance, and preserve computational efficiency simultaneously. To address this challenge, we propose Talaria, a confidential inference framework that partitions the LLM pipeline to protect client data without compromising the cloud's model intellectual property or inference quality. Talaria executes sensitive, weight-independent operations within a client-controlled Confidential Virtual Machine (CVM) while offloading weight-dependent computations to the cloud GPUs. The interaction between these environments is secured by our Reversible Masked Outsourcing (ReMO) protocol, which uses a hybrid masking technique to reversibly obscure intermediate data before outsourcing computations. Extensive evaluations show that Talaria can defend against state-of-the-art token inference attacks, reducing token reconstruction accuracy from over 97.5% to an average of 1.34%, all while being a lossless mechanism that guarantees output identical to the original model without significantly decreasing efficiency and scalability. To the best of our knowledge, this is the first work that ensures clients' prompts and responses remain inaccessible to the cloud, while also preserving model privacy, performance, and efficiency.
Authors:Huimin Li, Vusal Novruzov, Nikhilesh Singh, Lichao Wu, Mohamadreza Rostami, Ahmad-Reza Sadeghi
Abstract:
The increasing adoption of System-on-Chip Field-Programmable Gate Arrays (SoC FPGAs) in AI-enabled satellite systems, valued for their reconfigurability and in-orbit update capabilities, introduces significant security challenges. Compromised updates can lead to performance degradation, service disruptions, or adversarial manipulation of mission outcomes. To address these risks, this paper proposes a comprehensive security framework, AegisSat. It ensures the integrity and resilience of satellite platforms by (i) integrating cryptographically-based secure boot mechanisms to establish a trusted computing base; (ii) enforcing strict runtime resource isolation; (iii) employing authenticated procedures for in-orbit reconfiguration and AI model updates to prevent unauthorized modifications; and (iv) providing robust rollback capabilities to recover from boot and update failures and maintain system stability. To further support our claims, we conducted experiments demonstrating the integration of these mechanisms on contemporary SoC FPGA devices. This defense-in-depth framework is crucial for space applications, where physical access is impossible and systems must operate reliably over extended periods, thereby enhancing the trustworthiness of SoC FPGA-based satellite systems and enabling secure and resilient AI operations in orbit.
Authors:Sasha Behrouzi, Lichao Wu, Mohamadreza Rostami, Ahmad-Reza Sadeghi
Abstract:
Safety alignment is essential for the responsible deployment of large language models (LLMs). Yet, existing approaches often rely on heavyweight fine-tuning that is costly to update, audit, and maintain across model families. Full fine-tuning incurs substantial computational and storage overhead, while parameter-efficient methods such as LoRA trade efficiency for inconsistent safety gains and sensitivity to design choices. Safety intervention mechanisms such as circuit breakers reduce unsafe outputs without modifying model weights, but do not directly shape or preserve the internal representations that govern safety behavior. These limitations hinder rapid and reliable safety updates, particularly in settings where models evolve frequently or must adapt to new policies and domains. We present NeST, a lightweight, structure-aware safety alignment framework that strengthens refusal behavior by selectively adapting a small subset of safety-relevant neurons while freezing the remainder of the model. NeST aligns parameter updates with the internal organization of safety behavior by clustering functionally coherent safety neurons and enforcing shared updates within each cluster, enabling targeted and stable safety adaptation without broad model modification or inference-time overhead. We benchmark NeST against three dominant baselines: full fine-tuning, LoRA-based fine-tuning, and circuit breakers across 10 open-weight LLMs spanning multiple model families and sizes. Across all evaluated models, NeST reduces the attack success rate from an average of 44.5% to 4.36%, corresponding to a 90.2% reduction in unsafe generations, while requiring only 0.44 million trainable parameters on average. This amounts to a 17,310x decrease in updated parameters compared to full fine-tuning and a 9.25x reduction relative to LoRA, while consistently achieving stronger safety performance for alignment.
Authors:Shreya Meel, Sennur Ulukus
Abstract:
In symmetric private information retrieval (SPIR), a user communicates with multiple servers to retrieve from them a message in a database, while not revealing the message index to any individual server (user privacy), and learning no additional information about the database (database privacy). We study the problem of SPIR on graph-replicated database systems, where each node of the graph represents a server and each link represents a message. Each message is replicated at exactly two servers; those at which the link representing the message is incident. To ensure database privacy, the servers share a set of common randomness, independent of the database and the user's desired message index. We study two cases of common randomness distribution to the servers: i) graph-replicated common randomness, and ii) fully-replicated common randomness. Given a graph-replicated database system, in i), we assign one randomness variable independently to every pair of servers sharing a message, while in ii), we assign an identical set of randomness variable to all servers, irrespective of the underlying graph. In both settings, our goal is to characterize the SPIR capacity, i.e., the maximum number of desired message symbols retrieved per downloaded symbol, and quantify the minimum amount of common randomness required to achieve the capacity. To this goal, in setting i), we derive a general lower bound on the SPIR capacity, and show it to be tight for path and regular graphs through a matching converse. Moreover, we establish that the minimum size of common randomness required for SPIR is equal to the message size. In setting ii), the SPIR capacity improves over the first, more restrictive setting. We show this through capacity lower bounds for a class of graphs, by constructing SPIR schemes from PIR schemes.
Authors:Maximilian Thang, Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Jona te Lintelo, Stjepan Picek, Ahmad-Reza Sadeghi
Abstract:
Large language models (LLMs) are increasingly used for code generation in fast, informal development workflows, often referred to as vibe coding, where speed and convenience are prioritized, and security requirements are rarely made explicit. In this setting, models frequently produce functionally correct but insecure code, creating a growing security risk. Existing approaches to improving code security rely on full-parameter fine-tuning or parameter-efficient adaptations, which are either costly and prone to catastrophic forgetting or operate at coarse granularity with limited interpretability and control. We present GoodVibe, a neuron-level framework for improving the security of code language models by default. GoodVibe is based on the key insight that security-relevant reasoning is localized to a small subset of neurons. We identify these neurons using gradient-based attribution from a supervised security task and perform neuron-selective fine-tuning that updates only this security-critical subspace. To further reduce training cost, we introduce activation-driven neuron clustering, enabling structured updates with minimal overhead. We evaluate GoodVibe on six LLMs across security-critical programming languages, including C++, Java, Swift, and Go. GoodVibe substantially improves the security of generated code while preserving general model utility, achieving up to a 2.5x improvement over base models, matching or exceeding full fine-tuning with over 4,700x fewer trainable parameters, and reducing training computation by more than 3.6x compared to the parameter-efficient baseline (LoRA). Our results demonstrate that neuron-level optimization offers an effective and scalable approach to securing code generation without sacrificing efficiency or generality.
Authors:Ziyou Jiang, Lin Shi, Guowei Yang, Xuyan Ma, Fenglong Li, Qing Wang
Abstract:
Cyber attacks have become a serious threat to the security of software systems. Many organizations have built their security knowledge bases to safeguard against attacks and vulnerabilities. However, due to the time lag in the official release of security information, these security knowledge bases may not be well maintained, and using them to protect software systems against emergent security risks can be challenging. On the other hand, the security posts on online knowledge-sharing platforms contain many crowd security discussions and the knowledge in those posts can be used to enhance the security knowledge bases. This paper proposes SynAT, an automatic approach to synthesize attack trees from crowd security posts. Given a security post, SynAT first utilize the Large Language Model (LLM) and prompt learning to restrict the scope of sentences that may contain attack information; then it utilizes a transition-based event and relation extraction model to extract the events and relations simultaneously from the scope; finally, it applies heuristic rules to synthesize the attack trees with the extracted events and relations. An experimental evaluation is conducted on 5,070 Stack Overflow security posts, and the results show that SynAT outperforms all baselines in both event and relation extraction, and achieves the highest tree similarity in attack tree synthesis. Furthermore, SynAT has been applied to enhance HUAWEI's security knowledge base as well as public security knowledge bases CVE and CAPEC, which demonstrates SynAT's practicality.
Authors:Xueyi Li, Zhuoneng Zhou, Zitao Liu, Yongdong Wu, Weiqi Luo
Abstract:
Large language models (LLMs) have demonstrated remarkable potential for automatic short answer grading (ASAG), significantly boosting student assessment efficiency and scalability in educational scenarios. However, their vulnerability to adversarial manipulation raises critical concerns about automatic grading fairness and reliability. In this paper, we introduce GradingAttack, a fine-grained adversarial attack framework that systematically evaluates the vulnerability of LLM based ASAG models. Specifically, we align general-purpose attack methods with the specific objectives of ASAG by designing token-level and prompt-level strategies that manipulate grading outcomes while maintaining high camouflage. Furthermore, to quantify attack camouflage, we propose a novel evaluation metric that balances attack success and camouflage. Experiments on multiple datasets demonstrate that both attack strategies effectively mislead grading models, with prompt-level attacks achieving higher success rates and token-level attacks exhibiting superior camouflage capability. Our findings underscore the need for robust defenses to ensure fairness and reliability in ASAG. Our code and datasets are available at https://anonymous.4open.science/r/GradingAttack.
Authors:Xiaohui Hu, Wun Yu Chan, Yuejie Shi, Qumeng Sun, Wei-Cheng Wang, Chiachih Wu, Haoyu Wang, Ningyu He
Abstract:
Smart contract security is paramount, but identifying intricate business logic vulnerabilities remains a persistent challenge because existing solutions consistently fall short: manual auditing is unscalable, static analysis tools are plagued by false positives, and fuzzers struggle to navigate deep logic states within complex systems. Even emerging AI-based methods suffer from hallucinations, context constraints, and a heavy reliance on expensive, proprietary Large Language Models. In this paper, we introduce Heimdallr, an automated auditing agent designed to overcome these hurdles through four core innovations. By reorganizing code at the function level, Heimdallr minimizes context overhead while preserving essential business logic. It then employs heuristic reasoning to detect complex vulnerabilities and automatically chain functional exploits. Finally, a cascaded verification layer validates these findings to eliminate false positives. Notably, this approach achieves high performance on lightweight, open-source models like GPToss-120B without relying on proprietary systems. Our evaluations demonstrate exceptional performance, as Heimdallr successfully reconstructed 17 out of 20 real-world attacks post June 2025, resulting in total losses of $384M, and uncovered 4 confirmed zero-day vulnerabilities that safeguarded $400M in TVL. Compared to SOTA baselines including both official industrial tools and academic tools, Heimdallr at most reduces analysis time by 97.59% and financial costs by 98.77% while boosting detection precision by over 93.66%. Notably, when applied to auditing contests, Heimdallr can achieve a 92.45% detection rate at a negligible cost of $2.31 per 10K LOC. We provide production-ready auditing services and release valuable benchmarks for future work.
Authors:Jonah Ghebremichael, Saastha Vasan, Saad Ullah, Greg Tystahl, David Adei, Christopher Kruegel, Giovanni Vigna, William Enck, Alexandros Kapravelos
Abstract:
Static Application Security Testing (SAST) tools using taint analysis are widely viewed as providing higher-quality vulnerability detection results compared to traditional pattern-based approaches. However, performing static taint analysis for JavaScript poses two major challenges. First, JavaScript's dynamic features complicate data flow extraction required for taint tracking. Second, npm's large library ecosystem makes it difficult to identify relevant sources/sinks and establish taint propagation across dependencies. In this paper, we present SemTaint, a multi-agent system that strategically combines the semantic understanding of Large Language Models (LLMs) with traditional static program analysis to extract taint specifications, including sources, sinks, call edges, and library flow summaries tailored to each package. Conceptually, SemTaint uses static program analysis to calculate a call graph and defers to an LLM to resolve call edges that cannot be resolved statically. Further, it uses the LLM to classify sources and sinks for a given CWE. The resulting taint specification is then provided to a SAST tool, which performs vulnerability analysis. We integrate SemTaint with CodeQL, a state-of-the-art SAST tool, and demonstrate its effectiveness by detecting 106 of 162 vulnerabilities previously undetectable by CodeQL. Furthermore, we find 4 novel vulnerabilities in 4 popular npm packages. In doing so, we demonstrate that LLMs can practically enhance existing static program analysis algorithms, combining the strengths of both symbolic reasoning and semantic understanding for improved vulnerability detection.
Authors:Fengchao Chen, Tingmin Wu, Van Nguyen, Carsten Rudolph
Abstract:
Large Language Models (LLMs) have enabled agents to move beyond conversation toward end-to-end task execution and become more helpful. However, this helpfulness introduces new security risks stem less from direct interface abuse than from acting on user-provided content. Existing studies on agent security largely focus on model-internal vulnerabilities or adversarial access to agent interfaces, overlooking attacks that exploit users as unintended conduits. In this paper, we study user-mediated attacks, where benign users are tricked into relaying untrusted or attacker-controlled content to agents, and analyze how commercial LLM agents respond under such conditions. We conduct a systematic evaluation of 12 commercial agents in a sandboxed environment, covering 6 trip-planning agents and 6 web-use agents, and compare agent behavior across scenarios with no, soft, and hard user-requested safety checks. Our results show that agents are too helpful to be safe by default. Without explicit safety requests, trip-planning agents bypass safety constraints in over 92% of cases, converting unverified content into confident booking guidance. Web-use agents exhibit near-deterministic execution of risky actions, with 9 out of 17 supported tests reaching a 100% bypass rate. Even when users express soft or hard safety intent, constraint bypass remains substantial, reaching up to 54.7% and 7% for trip-planning agents, respectively. These findings reveal that the primary issue is not a lack of safety capability, but its prioritization. Agents invoke safety checks only conditionally when explicitly prompted, and otherwise default to goal-driven execution. Moreover, agents lack clear task boundaries and stopping rules, frequently over-executing workflows in ways that lead to unnecessary data disclosure and real-world harm.
Authors:Dinghong Song, Zhiwei Xu, Hai Wan, Xibin Zhao, Pengfei Su, Dong Li
Abstract:
Model quantization is critical for deploying large language models (LLMs) on resource-constrained hardware, yet recent work has revealed severe security risks that benign LLMs in full precision may exhibit malicious behaviors after quantization. In this paper, we propose Adversarial Contrastive Learning (ACL), a novel gradient-based quantization attack that achieves superior attack effectiveness by explicitly maximizing the gap between benign and harmful responses probabilities. ACL formulates the attack objective as a triplet-based contrastive loss, and integrates it with a projected gradient descent two-stage distributed fine-tuning strategy to ensure stable and efficient optimization. Extensive experiments demonstrate ACL's remarkable effectiveness, achieving attack success rates of 86.00% for over-refusal, 97.69% for jailbreak, and 92.40% for advertisement injection, substantially outperforming state-of-the-art methods by up to 44.67%, 18.84%, and 50.80%, respectively.
Authors:Tanmay Singla, Berk Çakar, Paschal C. Amusuo, James C. Davis
Abstract:
Package dependencies are a critical control point in modern software supply chains. Dependency changes can substantially alter a project's security posture. As AI coding agents increasingly modify software via pull requests, it is unclear whether their dependency decisions introduce distinct security risks. We study 117,062 dependency changes from agent- and human-authored pull requests across seven ecosystems. Agents select known-vulnerable versions more often than humans (2.46% vs. 1.64%), and their vulnerable selections are more disruptive to remediate, with 36.8% requiring major-version upgrades compared to 12.9% for humans, despite patched alternatives existing in most cases. At the aggregate level, agent-driven dependency work yields a net vulnerability increase of 98, whereas human-authored work yields a net reduction of 1,316. These findings motivate pull-request-time vulnerability screening and registry-aware guardrails to make agent-driven dependency updates safer.
Authors:Minh-Dai Tran-Duong, Nguyen Hai Phong, Nguyen Chi Thanh, Doan Minh Trung, Tram Truong-Huu, Van-Hau Pham, Phan The Duy
Abstract:
Smart contracts are increasingly targeted by adversaries employing obfuscation techniques such as bogus code injection and control flow manipulation to evade vulnerability detection. Existing multimodal methods often process semantic, temporal, and structural features in isolation and fuse them using simple strategies such as concatenation, which neglects cross-modal interactions and weakens robustness, as obfuscation of a single modality can sharply degrade detection accuracy. To address these challenges, we propose ContractShield, a robust multimodal framework with a novel fusion mechanism that effectively correlates multiple complementary features through a three-level fusion. Self-attention first identifies patterns that indicate vulnerability within each feature space. Cross-modal attention then establishes meaningful connections between complementary signals across modalities. Then, adaptive weighting dynamically calibrates feature contributions based on their reliability under obfuscation. For feature extraction, ContractShield integrates (1) CodeBERT with a sliding window mechanism to capture semantic dependencies in source code, (2) Extended long short-term memory (xLSTM) to model temporal dynamics in opcode sequences, and (3) GATv2 to identify structural invariants in control flow graphs (CFGs) that remain stable across obfuscation. Empirical evaluation demonstrates resilience of ContractShield, achieving a 89 percentage Hamming Score with only a 1-3 percentage drop compared to non-obfuscated data. The framework simultaneously detects five major vulnerability types with 91 percentage F1-score, outperforming state-of-the-art approaches by 6-15 percentage under adversarial conditions.
Authors:Tran Duong Minh Dai, Triet Huynh Minh Le, M. Ali Babar, Van-Hau Pham, Phan The Duy
Abstract:
Although Graph Neural Networks (GNNs) have shown promise for smart contract vulnerability detection, they still face significant limitations. Homogeneous graph models fail to capture the interplay between control flow and data dependencies, while heterogeneous graph approaches often lack deep semantic understanding, leaving them susceptible to adversarial attacks. Moreover, most black-box models fail to provide explainable evidence, hindering trust in professional audits. To address these challenges, we propose ORACAL (Observable RAG-enhanced Analysis with CausAL reasoning), a heterogeneous multimodal graph learning framework that integrates Control Flow Graph (CFG), Data Flow Graph (DFG), and Call Graph (CG). ORACAL selectively enriches critical subgraphs with expert-level security context from Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs), and employs a causal attention mechanism to disentangle true vulnerability indicators from spurious correlations. For transparency, the framework adopts PGExplainer to generate subgraph-level explanations identifying vulnerability triggering paths. Experiments on large-scale datasets demonstrate that ORACAL achieves state-of-the-art performance, outperforming MANDO-HGT, MTVHunter, GNN-SC, and SCVHunter by up to 39.6 percentage points, with a peak Macro F1 of 91.28% on the primary benchmark. ORACAL maintains strong generalization on out-of-distribution datasets with 91.8% on CGT Weakness and 77.1% on DAppScan. In explainability evaluation, PGExplainer achieves 32.51% Mean Intersection over Union (MIoU) against manually annotated vulnerability triggering paths. Under adversarial attacks, ORACAL limits performance degradation to approximately 2.35% F1 decrease with an Attack Success Rate (ASR) of only 3%, surpassing SCVHunter and MANDO-HGT which exhibit ASRs ranging from 10.91% to 18.73%.
Authors:Tran Vy Khang, Nguyen Dang Nguyen Khang, Nghi Hoang Khoa, Do Thi Thu Hien, Van-Hau Pham, Phan The Duy
Abstract:
Web applications remain the dominant attack surface in cybersecurity, where vulnerabilities such as SQL injection, XSS, and business logic flaws continue to cause significant data breaches. While penetration testing is effective for identifying these weaknesses, traditional manual approaches are time-consuming and heavily dependent on scarce expert knowledge. Recent Large Language Models (LLM)-based multi-agent systems have shown promise in automating penetration testing, yet they still suffer from critical limitations: over-reliance on parametric knowledge, fragmented session memory, and insufficient validation of attack payloads and responses. This paper proposes Red-MIRROR, a novel multi-agent automated penetration testing system that introduces a tightly coupled memory-reflection backbone to explicitly govern inter-agent reasoning. By synthesizing Retrieval-Augmented Generation (RAG) for external knowledge augmentation, a Shared Recurrent Memory Mechanism (SRMM) for persistent state management, and a Dual-Phase Reflection mechanism for adaptive validation, Red-MIRROR provides a robust solution for complex web exploitation. Empirical evaluation on the XBOW benchmark and Vulhub CVEs shows that Red-MIRROR achieves performance comparable to state-of-the-art agents on Vulhub scenarios, while demonstrating a clear advantage on the XBOW benchmark. On the XBOW benchmark, Red-MIRROR attains an overall success rate of 86.0 percent, outperforming PentestAgent (50.0 percent), AutoPT (46.0 percent), and the VulnBot baseline (6.0 percent). Furthermore, the system achieves a 93.99 percent subtask completion rate, indicating strong long-horizon reasoning and payload refinement capability. Finally, we discuss ethical implications and propose safeguards to mitigate misuse risks.
Authors:Samrendra Roy, Kazuma Kobayashi, Souvik Chakraborty, Rizwan-uddin, Syed Bahauddin Alam
Abstract:
Operator learning models are rapidly emerging as the predictive core of digital twins for nuclear and energy systems, promising real-time field reconstruction from sparse sensor measurements. Yet their robustness to adversarial perturbations remains uncharacterized, a critical gap for deployment in safety-critical systems. Here we show that neural operators are acutely vulnerable to extremely sparse (fewer than 1% of inputs), physically plausible perturbations that exploit their sensitivity to boundary conditions. Using gradient-free differential evolution across four operator architectures, we demonstrate that minimal modifications trigger catastrophic prediction failures, increasing relative $L_2$ error from $\sim$1.5% (validated accuracy) to 37-63% while remaining completely undetectable by standard validation metrics. Notably, 100% of successful single-point attacks pass z-score anomaly detection. We introduce the effective perturbation dimension $d_{\text{eff}}$, a Jacobian-based diagnostic that, together with sensitivity magnitude, yields a two-factor vulnerability model explaining why architectures with extreme sensitivity concentration (POD-DeepONet, $d_{\text{eff}} \approx 1$) are not necessarily the most exploitable, since low-rank output projections cap maximum error, while moderate concentration with sufficient amplification (S-DeepONet, $d_{\text{eff}} \approx 4$) produces the highest attack success. Gradient-free search outperforms gradient-based alternatives (PGD) on architectures with gradient pathologies, while random perturbations of equal magnitude achieve near-zero success rates, confirming that the discovered vulnerabilities are structural. Our findings expose a previously overlooked attack surface in operator learning models and establish that these models require robustness guarantees beyond standard validation before deployment.
Authors:Qichen Zhao, Shengfang Zhai, Xinjian Bai, Qingni Shen, Qiqi Lin, Yansong Gao, Zhonghai Wu
Abstract:
Diffusion models enable high-fidelity image editing but can also be misused for unauthorized style imitation and harmful content generation. To mitigate these risks, proactive image protection methods embed small, often imperceptible adversarial perturbations into images before sharing to disrupt downstream editing or fine-tuning. However, in realistic post-release scenarios, content owners cannot control downstream processing pipelines, and protections optimized for a surrogate model may fail when attackers use mismatched diffusion pipelines. Existing purification methods can weaken protections but often sacrifice image quality and rarely examine architectural mismatch. We introduce a unified post-release purification framework to evaluate protection survivability under model mismatch. We propose two practical purifiers: VAE-Trans, which corrects protected images via latent-space projection, and EditorClean, which performs instruction-guided reconstruction with a Diffusion Transformer to exploit architectural heterogeneity. Both operate without access to protected images or defense internals. Across 2,100 editing tasks and six representative protection methods, EditorClean consistently restores editability. Compared to protected inputs, it improves PSNR by 3-6 dB and reduces FID by 50-70 percent on downstream edits, while outperforming prior purification baselines by about 2 dB PSNR and 30 percent lower FID. Our results reveal a purify-once, edit-freely failure mode: once purification succeeds, the protective signal is largely removed, enabling unrestricted editing. This highlights the need to evaluate protections under model mismatch and design defenses robust to heterogeneous attackers.
Authors:Jiangrong Wu, Zitong Yao, Yuhong Nan, Zibin Zheng
Abstract:
Tool-augmented LLM agents increasingly rely on multi-step, multi-tool workflows to complete real tasks. This design expands the attack surface, because data produced by one tool can be persisted and later reused as input to another tool, enabling exploitable source-to-sink dataflows that only emerge through tool composition. We study this risk as multi-tool vulnerabilities in LLM agents, and show that existing discovery efforts focused on single-tool or single-hop testing miss these long-horizon behaviors and provide limited debugging value. We present ChainFuzzer, a greybox framework for discovering and reproducing multi-tool vulnerabilities with auditable evidence. ChainFuzzer (i) identifies high-impact operations with strict source-to-sink dataflow evidence and extracts plausible upstream candidate tool chains based on cross-tool dependencies, (ii) uses Trace-guided Prompt Solving (TPS) to synthesize stable prompts that reliably drive the agent to execute target chains, and (iii) performs guardrail-aware fuzzing to reproduce vulnerabilities under LLM guardrails via payload mutation and sink-specific oracles. We evaluate ChainFuzzer on 20 popular open-source LLM agent apps (998 tools). ChainFuzzer extracts 2,388 candidate tool chains and synthesizes 2,213 stable prompts, confirming 365 unique, reproducible vulnerabilities across 19/20 apps (302 require multi-tool execution). Component evaluation shows tool-chain extraction achieves 96.49% edge precision and 91.50% strict chain precision; TPS increases chain reachability from 27.05% to 95.45%; guardrail-aware fuzzing boosts payload-level trigger rate from 18.20% to 88.60%. Overall, ChainFuzzer achieves 3.02 vulnerabilities per 1M tokens, providing a practical foundation for testing and hardening real-world multi-tool agent systems.
Authors:Ninghui Li, Kaiyuan Zhang, Kyle Polley, Jerry Ma
Abstract:
This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.
Authors:Hao Yu, Hui Li, FengYuan Shi, Wenjie Yu, PinHan Ho, Zehua Wang, Bin Wang
Abstract:
SQL injection remains a major threat to web applications, as existing defenses often fail against obfuscation and evolving attacks because of neglecting the request-response context. This paper presents a context-enriched SQL injection detection framework, focusing on constructing a high-quality request-response dataset via a multi-agent honeypot system: the Request Generator Agent produces diverse malicious/benign requests, the Database Response Agent mediates interactions to ensure authentic responses while protecting production data, and the Traffic Monitor pairs requests with responses, assigns labels, and cleans data, yielding totally 140,973 labeled pairs with contextual cues absent in payload-only data. Experiments show that models trained on this context dataset outperform payload-only counterparts: CNN and BiLSTM achieve over 40\% accuracy improvement in different tasks, validating that the request-response context enhances the detection of evolving and obfuscated attacks.
Authors:Yuchong Xie, Kaikai Zhang, Yu Liu, Rundong Yang, Ping Chen, Shuai Wang, Dongdong She
Abstract:
Seed explosion is a fundamental problem in fuzzing seed scheduling, where a fuzzer maintains a huge corpus and fails to choose promising seeds. Existing works focus on seed prioritization but still suffer from seed explosion since corpus size remains huge. We tackle this from a new perspective: corpus reduction, i.e., computing a seed corpus subset. However, corpus reduction could lead to poor seed diversity and large runtime overhead. Prior techniques like cull_queue, AFL-Cmin, and MinSet suffer from poor diversity or prohibitive overhead, making them unsuitable for high-frequency seed scheduling. We propose RandSet, a novel randomized corpus reduction technique that reduces corpus size and yields diverse seed selection simultaneously with minimal overhead. Our key insight is introducing randomness into corpus reduction to enjoy two benefits of a randomized algorithm: randomized output (diverse seed selection) and low runtime cost. Specifically, we formulate corpus reduction as a set cover problem and compute a randomized subset covering all features of the entire corpus. We then schedule seeds from this small, randomized subset rather than the entire corpus, effectively mitigating seed explosion. We implement RandSet on three popular fuzzers: AFL++, LibAFL, and Centipede, and evaluate it on standalone programs, FuzzBench, and Magma. Results show RandSet achieves significantly more diverse seed selection than other reduction techniques, with average subset ratios of 4.03% and 5.99% on standalone and FuzzBench programs. RandSet achieves a 16.58% coverage gain on standalone programs and up to 3.57% on FuzzBench in AFL++, triggers up to 7 more ground-truth bugs than the state-of-the-art on Magma, while introducing only 1.17%-3.93% overhead.
Authors:Eason Chen, Xinyi Tang, George Digkas, Dionysios Lougaris, John E. Naulty, Kostas Chalkias
Abstract:
In blockchain applications, transaction confirmation is often treated as usability friction to be minimized or removed. However, confirmation also marks the boundary between deliberation and irreversible commitment, suggesting it may play a functional role in human decision-making. To investigate this tension, we conducted an experiment using a blockchain-based Connect Four game with two interaction modes differing only in authorization flow: manual wallet confirmation (Confirmation Mode) versus auto-authorized delegation (Frictionless Mode). Although participants preferred Frictionless Mode and perceived better performance (N=109), objective performance was worse without confirmation in a counterbalanced deployment (Wave 2: win rate -11.8%, p=0.044; move quality -0.051, p=0.022). Analysis of canceled submissions suggests confirmation can enable pre-submission self-correction (N=66, p=0.005). These findings suggest that transaction confirmation can function as a cognitively meaningful checkpoint rather than mere usability friction, highlighting a trade-off between interaction smoothness and decision quality in irreversible blockchain interactions.
Authors:Phan The Duy, Nghi Hoang Khoa, Nguyen Tran Anh Quan, Luong Ha Tien, Ngo Duc Hoang Son, Van-Hau Pham
Abstract:
The increasing deployment of Federated Learning (FL) in Intrusion Detection Systems (IDS) introduces new challenges related to data privacy, centralized coordination, and susceptibility to poisoning attacks. While significant research has focused on protecting traditional FL-IDS with centralized aggregation servers, there remains a notable gap in addressing the unique challenges of decentralized FL-IDS (DFL-IDS). This study aims to address the limitations of traditional centralized FL-IDS by proposing a novel defense framework tailored for the decentralized FL-IDS architecture, with a focus on privacy preservation and robustness against poisoning attacks. We propose PenTiDef, a privacy-preserving and robust defense framework for DFL-IDS, which incorporates Distributed Differential Privacy (DDP) to protect data confidentiality and utilizes latent space representations (LSR) derived from neural networks to detect malicious updates in the decentralized model aggregation context. To eliminate single points of failure and enhance trust without a centralized aggregation server, PenTiDef employs a blockchain-based decentralized coordination mechanism that manages model aggregation, tracks update history, and supports trust enforcement through smart contracts. Experimental results on CIC-IDS2018 and Edge-IIoTSet demonstrate that PenTiDef consistently outperforms existing defenses (e.g., FLARE, FedCC) across various attack scenarios and data distributions. These findings highlight the potential of PenTiDef as a scalable and secure framework for deploying DFL-based IDS in adversarial environments. By leveraging privacy protection, malicious behavior detection in hidden data, and working without a central server, it provides a useful security solution against real-world attacks from untrust participants.
Authors:Manish Bhattarai, Minh Vu
Abstract:
Current agentic AI architectures are fundamentally incompatible with the security and epistemological requirements of high-stakes scientific workflows. The problem is not inadequate alignment or insufficient guardrails, it is architectural: autoregressive language models process all tokens uniformly, making deterministic command--data separation unattainable through training alone. We argue that deterministic, architectural enforcement, not probabilistic learned behavior, is a necessary condition for trustworthy AI-assisted science. We introduce the Trinity Defense Architecture, which enforces security through three mechanisms: action governance via a finite action calculus with reference-monitor enforcement, information-flow control via mandatory access labels preventing cross-scope leakage, and privilege separation isolating perception from execution. We show that without unforgeable provenance and deterministic mediation, the ``Lethal Trifecta'' (untrusted inputs, privileged data access, external action capability) turns authorization security into an exploit-discovery problem: training-based defenses may reduce empirical attack rates but cannot provide deterministic guarantees. The ML community must recognize that alignment is insufficient for authorization security, and that architectural mediation is required before agentic AI can be safely deployed in consequential scientific domains.
Authors:Shijing He, Yaxiong Lei, Xiao Zhan, Ruba Abu-Salma, Jose Such
Abstract:
The growing adoption of AI-driven smart home devices has introduced new privacy risks for domestic workers (DWs), who are frequently monitored in employers' homes while also using smart devices in their own households. We conducted semi-structured interviews with 18 UK-based DWs and performed a human-centered threat modeling analysis of their experiences through the lens of Communication Privacy Management (CPM). Our findings extend existing threat models beyond abstract adversaries and single-household contexts by showing how AI analytics, residual data logs, and cross-household data flows shaped the privacy risks faced by participants. In employer-controlled homes, AI-enabled features and opaque, agency-mediated employment arrangements intensified surveillance and constrained participants' ability to negotiate privacy boundaries. In their own homes, participants had greater control as device owners but still faced challenges, including gendered administrative roles, opaque AI functionalities, and uncertainty around data retention. We synthesize these insights into a sociotechnical threat model that identifies DW agencies as institutional adversaries and maps AI-driven privacy risks across interconnected households, and we outline social and practical implications for strengthening DW privacy and agency.
Authors:Chen Chen, Yuchen Sun, Jiaxin Gao, Yanwen Jia, Xueluan Gong, Qian Wang, Kwok-Yan Lam
Abstract:
Large language models (LLMs) are increasingly deployed in security-sensitive applications, yet remain vulnerable to backdoor attacks. However, existing backdoor defenses are difficult to operationalize for Backdoor Defense-as-a-Service (BDaaS), as they require unrealistic side information (e.g., downstream clean data, known triggers/targets, or task domain specifics), and lack reusable, scalable purification across diverse backdoored models. In this paper, we present PROTOPURIFY, a backdoor purification framework via parameter edits under minimal assumptions. PROTOPURIFY first builds a backdoor vector pool from clean and backdoored model pairs, aggregates vectors into candidate prototypes, and selects the most aligned candidate for the target model via similarity matching. PROTOPURIFY then identifies a boundary layer through layer-wise prototype alignment and performs targeted purification by suppressing prototype-aligned components in the affected layers, achieving fine-grained mitigation with minimal impact on benign utility. Designed as a BDaaS-ready primitive, PROTOPURIFY supports reusability, customizability, interpretability, and runtime efficiency. Experiments across various LLMs on both classification and generation tasks show that PROTOPURIFY consistently outperforms 6 representative defenses against 6 diverse attacks, including single-trigger, multi-trigger, and triggerless backdoor settings. PROTOPURIFY reduces ASR to below 10%, and even as low as 1.6% in some cases, while incurring less than a 3% drop in clean utility. PROTOPURIFY further demonstrates robustness against adaptive backdoor variants and stability on non-backdoored models.
Authors:Minkyoo Song, Jaehan Kim, Myungchul Kang, Hanna Kim, Seungwon Shin, Sooel Son
Abstract:
Graph-based retrieval-augmented generation (Graph RAG) is increasingly deployed to support LLM applications by augmenting user queries with structured knowledge retrieved from a knowledge graph. While Graph RAG improves relational reasoning, it introduces a largely understudied threat: adversaries can reconstruct subgraphs from a target RAG system's knowledge graph, enabling privacy inference and replication of curated knowledge assets. We show that existing attacks are largely ineffective against Graph RAG even with simple prompt-based safeguards, because these attacks expose explicit exfiltration intent and are therefore easily suppressed by lightweight safe prompts. We identify three technical challenges for practical Graph RAG extraction under realistic safeguards and introduce GRASP, a closed-box, multi-turn subgraph reconstruction attack. GRASP (i) reframes extraction as a context-processing task, (ii) enforces format-compliant, instance-grounded outputs via per-record identifiers to reduce hallucinations and preserve relational details, and (iii) diversifies goal-driven attack queries using a momentum-aware scheduler to operate within strict query budgets. Across two real-world knowledge graphs, four safety-aligned LLMs, and multiple Graph RAG frameworks, GRASP attains the strongest type-faithful reconstruction where prior methods fail, reaching up to 82.9 F1. We further evaluate defenses and propose two lightweight mitigations that substantially reduce reconstruction fidelity without utility loss.
Authors:Zhixiang Zhang, Zesen Liu, Yuchong Xie, Quanfeng Huang, Dongdong She
Abstract:
Semantic caching has emerged as a pivotal technique for scaling LLM applications, widely adopted by major providers including AWS and Microsoft. By utilizing semantic embedding vectors as cache keys, this mechanism effectively minimizes latency and redundant computation for semantically similar queries. In this work, we conceptualize semantic cache keys as a form of fuzzy hashes. We demonstrate that the locality required to maximize cache hit rates fundamentally conflicts with the cryptographic avalanche effect necessary for collision resistance. Our conceptual analysis formalizes this inherent trade-off between performance (locality) and security (collision resilience), revealing that semantic caching is naturally vulnerable to key collision attacks. While prior research has focused on side-channel and privacy risks, we present the first systematic study of integrity risks arising from cache collisions. We introduce CacheAttack, an automated framework for launching black-box collision attacks. We evaluate CacheAttack in security-critical tasks and agentic workflows. It achieves a hit rate of 86\% in LLM response hijacking and can induce malicious behaviors in LLM agent, while preserving strong transferability across different embedding models. A case study on a financial agent further illustrates the real-world impact of these vulnerabilities. Finally, we discuss mitigation strategies.
Authors:Nitin Jha, Abhishek Parakh
Abstract:
Quantum voting protocols aim to offer ballot secrecy and publicly verifiable tallies using physical guarantees from quantum mechanics, rather than relying solely on computational hardness. This article surveys whether such quantum voting protocols are practical. We begin by outlining core mathematical ideas such as the superposition principle, the no-cloning theorem, and quantum entanglement. We then define a common system and threat model, identifying key actors, trust assumptions, and security goals. Representative protocol families are reviewed, including entanglement-based schemes with central tallying, self-tallying designs that enable public verification, and authority-minimized approaches that certify untrusted devices through observable correlations. Finally, we evaluate implementation challenges, including loss, noise, device imperfections, scalability, and coercion resistance, and discuss realistic near-term deployment scenarios for small-scale elections.
Authors:Marcell Szakály, Martin Strohmeier, Ivan Martinovic, Sebastian Köhler
Abstract:
The adoption of Electric Vehicles (EVs) is happening at a rapid pace. To ensure fast and safe charging, complex communication is required between the vehicle and the charging station. In the globally used Combined Charging System (CCS), this communication is carried over the HomePlug Green PHY (HPGP) physical layer. However, HPGP is known to suffer from wireless leakage, which may expose this data link to nearby attackers. In this paper, we examine active wireless attacks against CCS, and study the impact they can have. We present the first real-time Software-Defined Radio (SDR) implementation of HPGP, granting unprecedented access to the communications within the charging cables. We analyze the characteristics of 2,750 real-world charging sessions to understand the timing constraints for hijacking. Using novel techniques to increase the attacks' reliability, we design a robust wireless Man-in-the-Middle evaluation framework for CCS. We demonstrate full control over TLS usage and CCS protocol version negotiation, including TLS stripping attacks. We investigate how real devices respond to safety-critical MitM attacks, which modify power delivery information, and found target vehicles to be highly permissive. First, we caused a vehicle to display charging power exceeding 900 kW on the dashboard, while receiving only 40 kW. Second, we remotely overcharged a vehicle, at twice the requested current for 17 seconds before the vehicle triggered the emergency shutdown. Finally, we propose a backwards-compatible, downgrade-proof protocol extension to mitigate the underlying vulnerabilities.
Authors:William Pan, Guiran Liu, Binrong Zhu, Qun Wang, Yingzhou Lu, Beiyu Lin, Rose Qingyang Hu
Abstract:
The rapid expansion of IoT deployments has intensified cybersecurity threats, notably Distributed Denial of Service (DDoS) attacks, characterized by increasingly sophisticated patterns. Leveraging Generative AI through On-Device Large Language Models (ODLLMs) provides a viable solution for real-time threat detection at the network edge, though limited computational resources present challenges for smaller ODLLMs. This paper introduces a novel detection framework that integrates Chain-of-Thought (CoT) reasoning with Retrieval-Augmented Generation (RAG), tailored specifically for IoT edge environments. We systematically evaluate compact ODLLMs, including LLaMA 3.2 (1B, 3B) and Gemma 3 (1B, 4B), using structured prompting and exemplar-driven reasoning strategies. Experimental results demonstrate substantial performance improvements with few-shot prompting, achieving macro-average F1 scores as high as 0.85. Our findings highlight the significant advantages of incorporating exemplar-based reasoning, underscoring that CoT and RAG approaches markedly enhance small ODLLMs' capabilities in accurately classifying complex network attacks under stringent resource constraints.
Authors:Huanyi Ye, Jiale Guo, Ziyao Liu, Kwok-Yan Lam
Abstract:
RAG has emerged as a key technique for enhancing response quality of LLMs without high computational cost. In traditional architectures, RAG services are provided by a single entity that hosts the dataset within a trusted local environment. However, individuals or small organizations often lack the resources to maintain data storage servers, leading them to rely on outsourced cloud storage. This dependence on untrusted third-party services introduces privacy risks. Embedding-based retrieval mechanisms, commonly used in RAG systems, are vulnerable to privacy leakage such as vector-to-text reconstruction attacks and structural leakage via vector analysis. Several privacy-preserving RAG techniques have been proposed but most existing approaches rely on partially homomorphic encryption, which incurs substantial computational overhead. To address these challenges, we propose an efficient privacy-preserving RAG framework (ppRAG) tailored for untrusted cloud environments that defends against vector-to-text attack, vector analysis, and query analysis. We propose Conditional Approximate Distance-Comparison-Preserving Symmetric Encryption (CAPRISE) that encrypts embeddings while still allowing the cloud to compute similarity between an encrypted query and the encrypted database embeddings. CAPRISE preserves only the relative distance ordering between the encrypted query and each encrypted database embedding, without exposing inter-database distances, thereby enhancing both privacy and efficiency. To mitigate query analysis, we introduce DP by perturbing the query embedding prior to encryption, preventing the cloud from inferring sensitive patterns. Experimental results show that ppRAG achieves efficient processing throughput, high retrieval accuracy, strong privacy guarantees, making it a practical solution for resource-constrained users seeking secure cloud-augmented LLMs.
Authors:Pradip Kunwar, Minh Vu, Maanak Gupta, Manish Bhattarai
Abstract:
Fine-tuning large language models on sensitive data poses significant privacy risks, as membership inference attacks can reveal whether individual records were used during training. While Differential Privacy (DP) provides formal protection, applying DP to conventional Parameter-Efficient Fine-Tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) often incurs substantial utility loss. In this work, we show that a more structurally constrained PEFT architecture, Tensor Train Low-Rank Adaptation (TTLoRA), can improve the privacy-utility tradeoff by shrinking the effective parameter space while preserving expressivity. To this end, we develop TTLoRA-DP, a differentially private training framework for TTLoRA. Specifically, we extend the ghost clipping algorithm to Tensor Train cores via cached contraction states, enabling efficient Differentially Private Stochastic Gradient Descent (DP-SGD) with exact per-example gradient norm computation without materializing full per-example gradients. Experiments on GPT-2 fine-tuning over the Enron and Penn Treebank datasets show that TTLoRA-DP consistently strengthens privacy protection relative to LoRA-DP while maintaining comparable or better downstream utility. Moreover, TTLoRA exhibits lower membership leakage even without DP training, using substantially smaller adapters and requiring on average 7.6X fewer parameters than LoRA. Overall, our results demonstrate that TTLoRA offers a practical path to improving the privacy-utility tradeoff in parameter-efficient language model adaptation.
Authors:Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Shui Yu
Abstract:
Machine unlearning enables data holders to remove the contribution of their specified samples from trained models to protect their privacy. However, it is paradoxical that most unlearning methods require the unlearning requesters to firstly upload their data to the server as a prerequisite for unlearning. These methods are infeasible in many privacy-preserving scenarios where servers are prohibited from accessing users' data, such as federated learning (FL). In this paper, we explore how to implement unlearning under the condition of not uncovering the erasing data to the server. We propose \textbf{Blind Unlearning (BlindU)}, which carries out unlearning using compressed representations instead of original inputs. BlindU only involves the server and the unlearning user: the user locally generates privacy-preserving representations, and the server performs unlearning solely on these representations and their labels. For the FL model training, we employ the information bottleneck (IB) mechanism. The encoder of the IB-based FL model learns representations that distort maximum task-irrelevant information from inputs, allowing FL users to generate compressed representations locally. For effective unlearning using compressed representation, BlindU integrates two dedicated unlearning modules tailored explicitly for IB-based models and uses a multiple gradient descent algorithm to balance forgetting and utility retaining. While IB compression already provides protection for task-irrelevant information of inputs, to further enhance the privacy protection, we introduce a noise-free differential privacy (DP) masking method to deal with the raw erasing data before compressing. Theoretical analysis and extensive experimental results illustrate the superiority of BlindU in privacy protection and unlearning effectiveness compared with the best existing privacy-preserving unlearning benchmarks.
Authors:Abdullah Al Mamun, Akid Abrar, Mizanur Rahman, M Sabbir Salek, Mashrur Chowdhury
Abstract:
As quantum computing advances, the cryptographic algorithms that underpin confidentiality, integrity, and authentication in Intelligent Transportation Systems (ITS) face increasing vulnerability to quantum-enabled attacks. To address these risks, governments and industry stakeholders are turning toward post-quantum cryptography (PQC), a class of algorithms designed to resist adversaries equipped with quantum computing capabilities. However, existing studies provide limited insight into the implementation-focused aspects of PQC in the ITS domain. This review fills that gap by evaluating the readiness of vehicular communication and security standards for PQC adoption. It examines in-vehicle networks and vehicle-to-everything (V2X) interfaces, while also investigating vulnerabilities at the physical layer, primarily exposure to side-channel and fault injection attacks. The review identifies thirteen research gaps reflecting non-PQC-ready standards, constraints in embedded implementation and hybrid cryptography, interoperability and certificate-management barriers, lack of real-world PQC deployment data in ITS, and physical-attack vulnerabilities in PQC-enabled vehicular communication. Future research directions include updating vehicular communication and security standards, optimizing PQC for low-power devices, enhancing interoperability and certificate-management frameworks for PQC integration, conducting real-world evaluations of PQC-enabled communication and control functions across ITS deployments, and strengthening defenses against AI-assisted physical attacks. A phased roadmap is presented, aligning PQC deployment with regulatory, performance, and safety requirements, thereby guiding the secure evolution of ITS in the quantum computing era.
Authors:Saeid Jamshidi, Negar Shahabi, Foutse Khomh, Carol Fung, Mohammad Hamdaqa
Abstract:
Software-Defined Networking (SDN) is increasingly adopted to secure Internet-of-Things (IoT) networks due to its centralized control and programmable forwarding. However, SDN-IoT defense is inherently a closed-loop control problem in which mitigation actions impact controller workload, queue dynamics, rule-installation delay, and future traffic observations. Aggressive mitigation may destabilize the control plane, degrade Quality of Service (QoS), and amplify systemic risk. Existing learning-based approaches prioritize detection accuracy while neglecting controller coupling and short-horizon Reinforcement Learning (RL) optimization without structured, auditable policy evolution. This paper introduces a self-reflective two-timescale SDN-IoT defense solution separating fast mitigation from slow policy governance. At the fast timescale, per-switch Proximal Policy Optimization (PPO) agents perform controller-aware mitigation under safety constraints and action masking. At the slow timescale, a multi-agent Large Language Model (LLM) governance engine generates machine-parsable updates to the global policy constitution Pi, which encodes admissible actions, safety thresholds, and reward priorities. Updates (Delta Pi) are validated through stress testing and deployed only with non-regression and safety guarantees, ensuring an auditable evolution without retraining RL agents. Evaluation under heterogeneous IoT traffic and adversarial stress shows improvements of 9.1% Macro-F1 over PPO and 15.4% over static baselines. Worst-case degradation drops by 36.8%, controller backlog peaks by 42.7%, and RTT p95 inflation remains below 5.8% under high-intensity attacks. Policy evolution converges within five cycles, reducing catastrophic overload from 11.6% to 2.3%.
Authors:Haoyu Wang, Zibo Xiao, Yedi Zhang, Christopher M. Poskitt, Jun Sun
Abstract:
LLM-based multi-agent systems (MASs) are transforming personal productivity by autonomously executing complex, cross-platform tasks. Frameworks such as OpenClaw demonstrate the potential of locally deployed agents integrated with personal data and services, but this autonomy introduces significant safety and security risks. Unintended actions from LLM reasoning failures can cause irreversible harm, while prompt injection attacks may exfiltrate credentials or compromise the system. Our analysis shows that 36.4% of OpenClaw's built-in skills pose high or critical risks. Existing approaches, including static guardrails and LLM-as-a-Judge, lack reliable real-time enforcement and consistent authority in MAS settings. To address this, we propose SafeClaw-R, a framework that enforces safety as a system-level invariant over the execution graph by ensuring that actions are mediated prior to execution, and systematically augments skills with safe counterparts. We evaluate SafeClaw-R across three representative domains: productivity platforms, third-party skill ecosystems, and code execution environments. SafeClaw-R achieves 95.2% accuracy in Google Workspace scenarios, significantly outperforming regex baselines (61.6%), detects 97.8% of malicious third-party skill patterns, and achieves 100% detection accuracy in our adversarial code execution benchmark. These results demonstrate that SafeClaw-R enables practical runtime enforcement for autonomous MASs.
Authors:Longfei Guo, Pengbo Li, Ting Gao, Yonghai Zhong, Haojie Fan, Jinqiao Duan
Abstract:
With the rapid advancement of AI technology, we have seen more and more concerns on data privacy, leading to some cutting-edge research on machine learning with encrypted computation. Fully Homomorphic Encryption (FHE) is a crucial technology for privacy-preserving computation, while it struggles with continuous non-polynomial functions, as it operates on discrete integers and supports only addition and multiplication. Spiking Neural Networks (SNNs), which use discrete spike signals, naturally complement FHE's characteristics. In this paper, we introduce FHE-DiCSNN, a framework built on the TFHE scheme, utilizing the discrete nature of SNNs for secure and efficient computations. By leveraging bootstrapping techniques, we successfully implement Leaky Integrate-and-Fire (LIF) neuron models on ciphertexts, allowing SNNs of arbitrary depth. Our framework is adaptable to other spiking neuron models, offering a novel approach to homomorphic evaluation of SNNs. Additionally, we integrate convolutional methods inspired by CNNs to enhance accuracy and reduce the simulation time associated with random encoding. Parallel computation techniques further accelerate bootstrapping operations. Experimental results on the MNIST and FashionMNIST datasets validate the effectiveness of FHE-DiCSNN, with a loss of less than 3\% compared to plaintext, respectively, and computation times of under 1 second per prediction. We also apply the model into real medical image classification problems and analyze the parameter optimization and selection.
Authors:Haozhen Wang, Haoyue Liu, Jionghao Zhu, Zhichao Wang, Yongxin Guo, Xiaoying Tang
Abstract:
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications. However, their practical deployment is often hindered by issues such as outdated knowledge and the tendency to generate hallucinations. To address these limitations, Retrieval-Augmented Generation (RAG) systems have been introduced, enhancing LLMs with external, up-to-date knowledge sources. Despite their advantages, RAG systems remain vulnerable to adversarial attacks, with data poisoning emerging as a prominent threat. Existing poisoning-based attacks typically require prior knowledge of the user's specific queries, limiting their flexibility and real-world applicability. In this work, we propose PIDP-Attack, a novel compound attack that integrates prompt injection with database poisoning in RAG. By appending malicious characters to queries at inference time and injecting a limited number of poisoned passages into the retrieval database, our method can effectively manipulate LLM response to arbitrary query without prior knowledge of the user's actual query. Experimental evaluations across three benchmark datasets (Natural Questions, HotpotQA, MS-MARCO) and eight LLMs demonstrate that PIDP-Attack consistently outperforms the original PoisonedRAG. Specifically, our method improves attack success rates by 4% to 16% on open-domain QA tasks while maintaining high retrieval precision, proving that the compound attack strategy is both necessary and highly effective.
Authors:Yuxi Chen, Haoyu Zhai, Chenkai Wang, Rui Yang, Lingming Zhang, Gang Wang, Huan Zhang
Abstract:
GUI agents are rapidly shifting from multi-module pipelines to end-to-end, native vision-language models (VLMs) that perceive raw screenshots and directly interact with digital devices. Despite rapid progress on general GUI tasks, CAPTCHA solving remains a major challenge. On the other hand, although specialized CAPTCHA solving pipelines exist, they cannot handle general GUI tasks. To address this gap, we introduce ReCAP: a CAPTCHA-capable native GUI agent that can robustly solve modern, interactive CAPTCHA challenges, while preserving their performance as a general GUI agent. We first develop a dynamic CAPTCHA system spanning seven representative CAPTCHA types, designed to stress primitive and complementary capabilities for CAPTCHA solving (e.g., robust OCR under heavy noise and text stylization, fine-grained visual understanding, and precise control). Then, we develop an automated data collection and curation pipeline that generates large-scale CAPTCHA interaction trajectories paired with reasoning traces. As CAPTCHA solving often requires multi-step interaction and recovery from intermediate mistakes, we further leverage failed trajectories to construct self-correction data, training agents to reflect on errors and correct their actions online. Across held-out test sets, ReCAP improves CAPTCHA-solving success from roughly 30\% to 80\%, while maintaining strong performance on general GUI-agent benchmarks.
Authors:Jianan Huang, Rodolfo V. Valentim, Luca Vassio, Matteo Boffa, Marco Mellia, Idilio Drago, Dario Rossi
Abstract:
The use of ML in cybersecurity has long been impaired by generalization issues: Models that work well in controlled scenarios fail to maintain performance in production. The root cause often lies in ML algorithms learning superficial patterns (shortcuts) rather than underlying cybersecurity concepts. We investigate contrastive multi-modal learning as a first step towards improving ML performance in cybersecurity tasks. We aim at transferring knowledge from data-rich modalities, such as text, to data-scarce modalities, such as payloads. We set up a case study on threat classification and propose a two-stage multi-modal contrastive learning framework that uses textual vulnerability descriptions to guide payload classification. First, we construct a semantically meaningful embedding space using contrastive learning on descriptions. Then, we align payloads to this space, transferring knowledge from text to payloads. We evaluate the approach on a large-scale private dataset and a synthetic benchmark built from public CVE descriptions and LLM-generated payloads. The methodology appears to reduce shortcut learning over baselines on both benchmarks. We release our synthetic benchmark and source code as open source.
Authors:Moyang Chen, Zonghao Ying, Wenzhuo Xu, Quancheng Zou, Deyue Zhang, Dongdong Yang, Xiangzheng Zhang
Abstract:
Recent text-to-video (T2V) models can synthesize complex videos from lightweight natural language prompts, raising urgent concerns about safety alignment in the event of misuse in the real world. Prior jailbreak attacks typically rewrite unsafe prompts into paraphrases that evade content filters while preserving meaning. Yet, these approaches often still retain explicit sensitive cues in the input text and therefore overlook a more profound, video-specific weakness. In this paper, we identify a temporal trajectory infilling vulnerability of T2V systems under fragmented prompts: when the prompt specifies only sparse boundary conditions (e.g., start and end frames) and leaves the intermediate evolution underspecified, the model may autonomously reconstruct a plausible trajectory that includes harmful intermediate frames, despite the prompt appearing benign to input or output side filtering. Building on this observation, we propose TFM. This fragmented prompting framework converts an originally unsafe request into a temporally sparse two-frame extraction and further reduces overtly sensitive cues via implicit substitution. Extensive evaluations across multiple open-source and commercial T2V models demonstrate that TFM consistently enhances jailbreak effectiveness, achieving up to a 12% increase in attack success rate on commercial systems. Our findings highlight the need for temporally aware safety mechanisms that account for model-driven completion beyond prompt surface form.
Authors:Masoud Jamshidiyan Tehrani, Marco Gabriel, Jinhan Kim, Paolo Tonella
Abstract:
Many adversarial attacks on autonomous-driving perception models fail to cause system-level failures once deployed in a full driving stack. The main reason for such ineffectiveness is that once deployed in a system (e.g., within a simulator), attacks tend to be spatially or temporally short-lived, due to the vehicle's dynamics, hence rarely influencing the vehicle behaviour. In this paper, we address both limitations by introducing a system-level attack in which multiple dynamic elements (e.g., two pedestrians) carry adversarial patches (e.g., on cloths) and jointly amplify their effect through coordination and motion. We evaluate our attacks in the CARLA simulator using a state-of-the-art autonomous driving agent. At the system level, single-pedestrian attacks fail in all runs (out of 10), while dynamic collusion by two pedestrians induces full vehicle stops in up to 50\% of runs, with static collusion yielding no successful attack at all. These results show that system-level failures arise only when adversarial signals persist over time and are amplified through coordinated actors, exposing a gap between model-level robustness and end-to-end safety.
Authors:Chengzhi Hu, Jonas Dornbusch, David Lüdke, Stephan Günnemann, Leo Schwinn
Abstract:
Adversarial training for LLMs is one of the most promising methods to reliably improve robustness against adversaries. However, despite significant progress, models remain vulnerable to simple in-distribution exploits, such as rewriting prompts in the past tense or translating them into other languages. We argue that this persistent fragility stems from a fundamental limitation in current adversarial training algorithms: they minimize adversarial loss on their training set but inadequately cover the data distribution, resulting in vulnerability to seemingly simple attacks. To bridge this gap, we propose Distributional Adversarial Training, DAT. We leverage Diffusion LLMs to approximate the true joint distribution of prompts and responses, enabling generation of diverse, high-likelihood samples that address generalization failures. By combining optimization over the data distribution provided by the diffusion model with continuous adversarial training, DAT achieves substantially higher adversarial robustness than previous methods.
Authors:Nataša Krčo, Zexi Yao, Matthieu Meeus, Yves-Alexandre de Montjoye
Abstract:
Data containing personal information is increasingly used to train, fine-tune, or query Large Language Models (LLMs). Text is typically scrubbed of identifying information prior to use, often with tools such as Microsoft's Presidio or Anthropic's PII purifier. These tools have traditionally been evaluated on their ability to remove specific identifiers (e.g., names), yet their effectiveness at preventing re-identification remains unclear. We introduce RAT-Bench, a comprehensive benchmark for text anonymization tools based on re-identification risk. Using U.S. demographic statistics, we generate synthetic text containing various direct and indirect identifiers across domains, languages, and difficulty levels. We evaluate a range of NER- and LLM-based text anonymization tools and, based on the attributes an LLM-based attacker is able to correctly infer from the anonymized text, we report the risk of re-identification in the U.S. population, while properly accounting for the disparate impact of identifiers. We find that, while capabilities vary widely, even the best tools are far from perfect in particular when direct identifiers are not written in standard ways and when indirect identifiers enable re-identification. Overall we find LLM-based anonymizers, including new iterative anonymizers, to provide a better privacy-utility trade-off albeit at a higher computational cost. Importantly, we also find them to work well across languages. We conclude with recommendations for future anonymization tools and will release the benchmark and encourage community efforts to expand it, in particular to other geographies.
Authors:Georgios Syros, Evan Rose, Brian Grinstead, Christoph Kerschbaumer, William Robertson, Cristina Nita-Rotaru, Alina Oprea
Abstract:
Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with web sites and performing actions on users' behalf. While these agents offer powerful capabilities, their design exposes them to indirect prompt injection attacks embedded in untrusted web content, enabling adversaries to hijack agent behavior and violate user intent. Despite growing awareness of this threat, existing evaluations rely on fixed attack templates, manually selected injection surfaces, or narrowly scoped scenarios, limiting their ability to capture realistic, adaptive attacks encountered in practice. We present MUZZLE, an automated agentic framework for evaluating the security of web agents against indirect prompt injection attacks. MUZZLE utilizes the agent's trajectories to automatically identify high-salience injection surfaces, and adaptively generate context-aware malicious instructions that target violations of confidentiality, integrity, and availability. Unlike prior approaches, MUZZLE adapts its attack strategy based on the agent's observed execution trajectory and iteratively refines attacks using feedback from failed executions. We evaluate MUZZLE across diverse web applications, user tasks, and agent configurations, demonstrating its ability to automatically and adaptively assess the security of web agents with minimal human intervention. Our results show that MUZZLE effectively discovers 37 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties. MUZZLE also identifies novel attack strategies, including 2 cross-application prompt injection attacks and an agent-tailored phishing scenario.
Authors:Miguel Fuentes, Brett Mullins, Yingtai Xiao, Daniel Kifer, Cameron Musco, Daniel Sheldon
Abstract:
Privately releasing marginals of a tabular dataset is a foundational problem in differential privacy. However, state-of-the-art mechanisms suffer from a computational bottleneck when marginal estimates are reconstructed from noisy measurements. Recently, residual queries were introduced and shown to lead to highly efficient reconstruction in the batch query answering setting. We introduce new techniques to integrate residual queries into state-of-the-art adaptive mechanisms such as AIM. Our contributions include a novel conceptual framework for residual queries using multi-dimensional arrays, lazy updating strategies, and adaptive optimization of the per-round privacy budget allocation. Together these contributions reduce error, improve speed, and simplify residual query operations. We integrate these innovations into a new mechanism (AIM+GReM), which improves AIM by using fast residual-based reconstruction instead of a graphical model approach. Our mechanism is orders of magnitude faster than the original framework and demonstrates competitive error and greatly improved scalability.
Authors:Tomer Kordonsky, Maayan Yamin, Noam Benzimra, Amit LeVi, Avi Mendelson
Abstract:
LLMs are increasingly used for code generation, but their outputs often follow recurring templates that can induce predictable vulnerabilities. We study \emph{vulnerability persistence} in LLM-generated software and introduce \emph{Feature--Security Table (FSTab)} with two components. First, FSTab enables a black-box attack that predicts likely backend vulnerabilities from observable frontend features and knowledge of the source LLM, without access to backend code or source code. Second, FSTab provides a model-centric evaluation that quantifies how consistently a given model reproduces the same vulnerabilities across programs, semantics-preserving rephrasings, and application domains. We evaluate FSTab on state-of-the-art code LLMs, including GPT-5.2, Claude-4.5 Opus, and Gemini-3 Pro, across diverse application domains. Our results show strong cross-domain transfer: even when the target domain is excluded from training, FSTab achieves up to 94\% attack success and 93\% vulnerability coverage on Internal Tools (Claude-4.5 Opus). These findings expose an underexplored attack surface in LLM-generated software and highlight the security risks of code generation. Our code is available at: https://anonymous.4open.science/r/FSTab-024E.
Authors:Ziyue Wang, Jiangshan Yu, Kaihua Qin, Dawn Song, Arthur Gervais, Liyi Zhou
Abstract:
Decentralized Finance (DeFi) has turned blockchains into financial infrastructure, allowing anyone to trade, lend, and build protocols without intermediaries, but this openness exposes pools of value controlled by code. Within five years, the DeFi ecosystem has lost over 15.75B USD to reported exploits. Many exploits arise from permissionless opportunities that any participant can trigger using only public state and standard interfaces, which we call Anyone-Can-Take (ACT) opportunities. Despite on-chain transparency, postmortem analysis remains slow and manual: investigations start from limited evidence, sometimes only a single transaction hash, and must reconstruct the exploit lifecycle by recovering related transactions, contract code, and state dependencies. We present TxRay, a Large Language Model (LLM) agentic postmortem system that uses tool calls to reconstruct live ACT attacks from limited evidence. Starting from one or more seed transactions, TxRay recovers the exploit lifecycle, derives an evidence-backed root cause, and generates a runnable, self-contained Proof of Concept (PoC) that deterministically reproduces the incident. TxRay self-checks postmortems by encoding incident-specific semantic oracles as executable assertions. To evaluate PoC correctness and quality, we develop PoCEvaluator, an independent agentic execution-and-review evaluator. On 114 incidents from DeFiHackLabs, TxRay produces an expert-aligned root cause and an executable PoC for 105 incidents, achieving 92.11% end-to-end reproduction. Under PoCEvaluator, 98.1% of TxRay PoCs avoid hard-coding attacker addresses, a +22.9pp lift over DeFiHackLabs. In a live deployment, TxRay delivers validated root causes in 40 minutes and PoCs in 59 minutes at median latency. TxRay's oracle-validated PoCs enable attack imitation, improving coverage by 15.6% and 65.5% over STING and APE.
Authors:Saeid Jamshidi, Omar Abdul Wahab, Foutse Khomh, Kawser Wazed Nafi
Abstract:
Federated learning (FL) has become an effective paradigm for privacy-preserving, distributed Intrusion Detection Systems (IDS) in cyber-physical and Internet of Things (IoT) networks, where centralized data aggregation is often infeasible due to privacy and bandwidth constraints. Despite its advantages, most existing FL-based IDS assume closed-set learning and lack mechanisms such as uncertainty estimation, semantic generalization, and explicit modeling of epistemic ambiguity in zero-day attack scenarios. Additionally, robustness to heterogeneous and unreliable clients remains a challenge in practical applications. This paper introduces a semantics-driven federated IDS framework that incorporates language-derived semantic supervision into federated optimization, enabling open-set and zero-shot intrusion detection for previously unseen attack behaviors. The approach constructs semantic attack prototypes using a Tri-LLM ensemble of GPT-4o, DeepSeek-V3, and LLaMA-3-8B, aligning distributed telemetry features with high-level attack concepts. Inter-LLM semantic disagreement is modeled as epistemic uncertainty for zero-day risk estimation, while a trust-aware aggregation mechanism dynamically weights client updates based on reliability. Experimental results show stable semantic alignment across heterogeneous clients and consistent convergence. The framework achieves over 80% zero-shot detection accuracy on unseen attack patterns, improving zero-day discrimination by more than 10% compared to similarity-based baselines, while maintaining low aggregation instability in the presence of unreliable or compromised clients.
Authors:Paulius Rauba, Dominykas Seputis, Patrikas Vanagas, Mihaela van der Schaar
Abstract:
Least privilege is a core security principle: grant each request only the minimum access needed to achieve its goal. Deployed language models almost never follow it, instead being exposed through a single API endpoint that serves all users and requests. This gap exists not because least privilege would be unhelpful; deployments would benefit greatly from reducing unnecessary capability exposure. The real obstacle is definitional and mechanistic: what does "access" mean inside a language model, and how can we enforce it without retraining or deploying multiple models? We take inspiration from least privilege in computer systems and define a class of models called least-privilege language models, where privilege is reachable internal computation during the forward pass. In this view, lowering privilege literally shrinks the model's accessible function class, as opposed to denying access via learned policies. We formalize deployment-time control as a monitor-allocator-enforcer stack, separating (i) request-time signals, (ii) a decision rule that allocates privilege, and (iii) an inference-time mechanism that selects privilege. We then propose Nested Least-Privilege Networks, a shape-preserving, rank-indexed intervention that provides a smooth, reversible control knob. We show that this knob yields policy-usable privilege-utility frontiers and enables selective suppression of targeted capabilities with limited collateral degradation across various policies. Most importantly, we argue for a new deployment paradigm that challenges the premise that language models can only be controlled at the output level.
Authors:Ibrahim Khalilov, Chaoran Chen, Ziang Xiao, Tianshi Li, Toby Jia-Jun Li, Yaxing Yao
Abstract:
Mobile apps increasingly rely on real-time sensor and system data to adapt their behavior to user context. While emulators and instrumented builds offer partial solutions, they often fail to support reproducible testing of context-sensitive app behavior on physical devices. We present PriviSense, a Frida-based, on-device toolkit for runtime spoofing of sensor and system signals on rooted Android devices. PriviSense can script and inject time-varying sensor streams (accelerometer, gyroscope, step counter) and system values (battery level, system time, device metadata) into unmodified apps, enabling reproducible on-device experiments without emulators or app rewrites. Our demo validates real-time spoofing on a rooted Android device across five representative sensor-visualization apps. By supporting scriptable and reversible manipulation of these values, PriviSense facilitates testing of app logic, uncovering of context-based behaviors, and privacy-focused analysis. To ensure ethical use, the code is shared upon request with verified researchers. Tool Guide: How to Run PriviSense on Rooted Android https://bit.ly/privisense-guide Demonstration video: https://www.youtube.com/watch?v=4Qwnogcc3pw
Authors:Luis Lazo, Hamed Jelodar, Roozbeh Razavi-Far
Abstract:
In this study, we propose a homotopy-inspired prompt obfuscation framework to enhance understanding of security and safety vulnerabilities in Large Language Models (LLMs). By systematically applying carefully engineered prompts, we demonstrate how latent model behaviors can be influenced in unexpected ways. Our experiments encompassed 15,732 prompts, including 10,000 high-priority cases, across LLama, Deepseek, KIMI for code generation, and Claude to verify. The results reveal critical insights into current LLM safeguards, highlighting the need for more robust defense mechanisms, reliable detection strategies, and improved resilience. Importantly, this work provides a principled framework for analyzing and mitigating potential weaknesses, with the goal of advancing safe, responsible, and trustworthy AI technologies.
Authors:Wonwoo Choi, Minjae Seo, Minkyoo Song, Hwanjo Heo, Seungwon Shin, Myoungsung You
Abstract:
The rapid evolution of text-to-image (T2I) models has enabled high-fidelity visual synthesis on a global scale. However, these advancements have introduced significant security risks, particularly regarding the generation of harmful content. Politically harmful content, such as fabricated depictions of public figures, poses severe threats when weaponized for fake news or propaganda. Despite its criticality, the robustness of current T2I safety filters against such politically motivated adversarial prompting remains underexplored. In response, we propose $PC^2$, the first black-box political jailbreaking framework for T2I models. It exploits a novel vulnerability where safety filters evaluate political sensitivity based on linguistic context. $PC^2$ operates through: (1) Identity-Preserving Descriptive Mapping to obfuscate sensitive keywords into neutral descriptions, and (2) Geopolitically Distal Translation to map these descriptions into fragmented, low-sensitivity languages. This strategy prevents filters from constructing toxic relationships between political entities within prompts, effectively bypassing detection. We construct a benchmark of 240 politically sensitive prompts involving 36 public figures. Evaluation on commercial T2I models, specifically GPT-series, shows that while all original prompts are blocked, $PC^2$ achieves attack success rates of up to 86%.
Authors:Hang Fu, Wanli Peng, Yinghan Zhou, Jiaxuan Wu, Juan Wen, Yiming Xue
Abstract:
The widespread adoption of Large Language Model (LLM) in commercial and research settings has intensified the need for robust intellectual property protection. Backdoor-based LLM fingerprinting has emerged as a promising solution for this challenge. In practical application, the low-cost multi-model collaborative technique, LLM ensemble, combines diverse LLMs to leverage their complementary strengths, garnering significant attention and practical adoption. Unfortunately, the vulnerability of existing LLM fingerprinting for the ensemble scenario is unexplored. In order to comprehensively assess the robustness of LLM fingerprinting, in this paper, we propose two novel fingerprinting attack methods: token filter attack (TFA) and sentence verification attack (SVA). The TFA gets the next token from a unified set of tokens created by the token filter mechanism at each decoding step. The SVA filters out fingerprint responses through a sentence verification mechanism based on perplexity and voting. Experimentally, the proposed methods effectively inhibit the fingerprint response while maintaining ensemble performance. Compared with state-of-the-art attack methods, the proposed method can achieve better performance. The findings necessitate enhanced robustness in LLM fingerprinting.
Authors:Yuelin Wang, Yuqiao Ning, Yanbang Sun, Xiaofei Xie, Zhihua Xie, Yang Chen, Zhen Guo, Shihao Xue, Junjie Wang, Sen Chen
Abstract:
Intelligent Connected Vehicles (ICVs) are a core component of modern transportation systems, and their security is crucial as it directly relates to user safety. Despite prior research, most existing studies focus only on specific sub-components of ICVs due to their inherent complexity. As a result, there is a lack of systematic understanding of ICV vulnerabilities. Moreover, much of the current literature relies on human subjective analysis, such as surveys and interviews, which tends to be high-level and unvalidated, leaving a significant gap between theoretical findings and real-world attacks. To address this issue, we conducted the first large-scale empirical study on ICV vulnerabilities. We began by analyzing existing ICV security literature and summarizing the prevailing taxonomies in terms of vulnerability locations and types. To evaluate their real-world relevance, we collected a total of 649 exploitable vulnerabilities, including 592 from eight ICV vulnerability discovery competitions, Anonymous Cup, between January 2023 and April 2024, covering 48 different vehicles. The remaining 57 vulnerabilities were submitted daily by researchers. Based on this dataset, we assessed the coverage of existing taxonomies and identified several gaps, discovering one new vulnerability location and 13 new vulnerability types. We further categorized these vulnerabilities into 6 threat types (e.g., privacy data breach) and 4 risk levels (ranging from low to critical) and analyzed participants' skills and the types of ICVs involved in the competitions. This study provides a comprehensive and data-driven analysis of ICV vulnerabilities, offering actionable insights for researchers, industry practitioners, and policymakers. To support future research, we have made our vulnerability dataset publicly available.
Authors:Alexandra Souly, Robert Kirk, Jacob Merizian, Abby D'Cruz, Xander Davies
Abstract:
This technical report presents methods developed by the UK AI Security Institute for assessing whether advanced AI systems reliably follow intended goals. Specifically, we evaluate whether frontier models sabotage safety research when deployed as coding assistants within an AI lab. Applying our methods to four frontier models, we find no confirmed instances of research sabotage. However, we observe that Claude Opus 4.5 Preview (a pre-release snapshot of Opus 4.5) and Sonnet 4.5 frequently refuse to engage with safety-relevant research tasks, citing concerns about research direction, involvement in self-training, and research scope. We additionally find that Opus 4.5 Preview shows reduced unprompted evaluation awareness compared to Sonnet 4.5, while both models can distinguish evaluation from deployment scenarios when prompted. Our evaluation framework builds on Petri, an open-source LLM auditing tool, with a custom scaffold designed to simulate realistic internal deployment of a coding agent. We validate that this scaffold produces trajectories that all tested models fail to reliably distinguish from real deployment data. We test models across scenarios varying in research motivation, activity type, replacement threat, and model autonomy. Finally, we discuss limitations including scenario coverage and evaluation awareness.
Authors:Yongyang Lv, Xiaohong Li, Ruitao Feng, Xinyu Li, Guangdong Bai, Leo Zhang, Lili Quan, Willy Susilo
Abstract:
The vigorous development of the Internet has spurred exponential data growth, yet data is predominantly stored in isolated user entities, hampering its full value realization. In large-scale deployment of ``AI+industries'' such as smart medical care, intelligent transportation and smart homes, the gap between data supply and demand continues to widen, and establishing an effective data sharing mechanism is the core of promoting high-quality industrial development. However, data sharing faces significant challenges in security, performance, and functional adaptability. Privacy-enhancing encryption technologies, including Attribute-Based Encryption (ABE), Proxy Re-encryption (PRE), and Searchable Encryption (SE), offer promising solutions with distinct advantages in enhancing security, improving flexibility, and enabling efficient sharing. Statistical analysis of relevant literature from 2020 to 2025 reveals a rising research trend in ABE, PRE and SE, focusing on their data sharing applications. Firstly, this work proposes a data sharing process framework and identifies 20 potential attacks across its stages. Secondly, this work integrates ABE, SE, PRE with 12 enhancement technologies and examines their multi-dimensional impacts on the security, performance, and functional adaptability of data sharing schemes. Lastly, this work outlines key application scenarios, challenges, and future research directions, providing valuable insights for advancing data sharing mechanisms based on privacy-enhancing encryption technologies.
Authors:Chuhao Qin, Lukas Esterle, Evangelos Pournaras
Abstract:
Coordination of view coverage via privacy-aware smart cameras is key to a more socially responsible urban intelligence. Rather than maximizing view coverage at any cost or over relying on expensive cryptographic techniques, we address how cameras can coordinate to legitimately monitor public spaces while excluding privacy-sensitive regions by design. This article proposes a decentralized framework in which interactive smart cameras coordinate to autonomously select their orientation via collective learning, while eliminating privacy violations via soft and hard constraint satisfaction. The approach scales to hundreds up to thousands of cameras without any centralized control. Experimental evidence shows 18.42% higher coverage efficiency and 85.53% lower privacy violation than baselines and other state-of-the-art approaches. This significant advance further unravels practical guidelines for operators and policymakers: how the field of view, spatial placement, and budget of cameras operating by ethically-aligned artificial intelligence jointly influence coverage efficiency and privacy protection in large-scale and sensitive urban environments.
Authors:Lingming Zhang, Binbin Zhao, Puzhuo Liu, Qinge Xie, Peng Di, Jianhai Chen, Shouling Ji
Abstract:
The security of modern JavaScript (JS) engines is critical since they provide the primary defense mechanism for executing untrusted code on the web. The recent integration of WebAssembly (Wasm) has transformed these engines into complex polyglot environments, creating a novel attack surface at the JS-Wasm interaction boundary due to the distinct type systems and memory models of two languages. This boundary remains largely underexplored, as previous works mainly focus on testing JS and Wasm as two isolated entities rather than investigating the security implications of their cross-language interactions. This paper proposes Weaver, an effective greybox fuzzing framework specifically tailored to uncover vulnerabilities at the JS-Wasm boundary. To comply with the language constraints, Weaver uses a type-aware generation strategy, meticulously maintaining the dual-type representation for every generated variables. This allows fuzzer to validly utilize variables across the language boundary. Besides, Weaver leverages the UCB-1 algorithm to intelligently schedule mutators and generators to maximize the discovery of new code paths. We have implemented and evaluated Weaver on three JS engines. The results indicate that Weaver achieves superior code coverage compared to state-of-the-art fuzzers. Moreover, Weaver has uncovered two new bugs in the latest versions of these engines, one of which is considered high severity and set to highest priority, demonstrating the practicality of Weaver.
Authors:Xavier Cadet, Aditya Vikram Singh, Harsh Mamania, Edward Koh, Alex Fitts, Dirk Van Bruggen, Simona Boboila, Peter Chin, Alina Oprea
Abstract:
Investigating cybersecurity incidents requires collecting and analyzing evidence from multiple log sources, including intrusion detection alerts, network traffic records, and authentication events. This process is labor-intensive: analysts must sift through large volumes of data to identify relevant indicators and piece together what happened. We present a RAG-based system that performs security incident analysis through targeted query-based filtering and LLM semantic reasoning. The system uses a query library with associated MITRE ATT&CK techniques to extract indicators from raw logs, then retrieves relevant context to answer forensic questions and reconstruct attack sequences. We evaluate the system with five LLM providers on malware traffic incidents and multi-stage Active Directory attacks. We find that LLM models have different performance and tradeoffs, with Claude Sonnet 4 and DeepSeek V3 achieving 100% recall across all four malware scenarios, while DeepSeek costs 15 times less ($0.008 vs. $0.12 per analysis). Attack step detection on Active Directory scenarios reaches 100% precision and 82% recall. Ablation studies confirm that a RAG architecture is essential: LLM baselines without RAG-enhanced context correctly identify victim hosts but miss all attack infrastructure including malicious domains and command-and-control servers. These results demonstrate that combining targeted query-based filtering with RAG-based retrieval enables accurate, cost-effective security analysis within LLM context limits.
Authors:De Zhang Lee, Han Fang, Ee-Chien Chang
Abstract:
Recent advancements in AI-generated content (AIGC) have introduced new challenges in intellectual property protection and the authentication of generated objects. We focus on scenarios in which an author seeks to assert authorship of an object generated using latent diffusion models (LDMs), in the presence of adversaries who attempt to falsely claim authorship of objects they did not create. While proof-of-ownership has been studied in the context of multimedia content through techniques such as time-stamping and watermarking, these approaches face notable limitations. In contrast to traditional content creation sources (e.g., cameras), the LDM generation process offers greater control to the author. Specifically, the random seed used during generation can be deliberately chosen. By binding the seed to the author's identity using cryptographic pseudorandom functions, the author can assert to be the creator of the object. We refer to this stronger guarantee as proof-of-authorship, since only the creator of the object can legitimately claim the object. This contrasts with proof-of-ownership via time-stamping or watermarking, where any entity could potentially claim ownership of an object by being the first to timestamp or embed the watermark. We propose a proof-of-authorship framework involving a probabilistic adjudicator who quantifies the probability that a claim is false. Furthermore, unlike prior approaches, the proposed framework does not involve any secret. We explore various attack scenarios and analyze design choices using Stable Diffusion 2.1 (SD2.1) as representative case studies.
Authors:Zixun Xiong, Gaoyi Wu, Lingfeng Yao, Miao Pan, Xiaojiang Du, Hao Wang
Abstract:
Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it a high-value intellectual property (IP) whose confidentiality remains insufficiently studied. Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world threat of such attacks. To bridge this realism gap, we propose \textit{WebWeaver}, an attack framework that infers the complete LLM-MAS topology by compromising only a single arbitrary agent instead of the administrative agent. Unlike prior approaches, WebWeaver relies solely on agent contexts rather than agent IDs, enabling significantly stealthier inference. WebWeaver further introduces a new covert jailbreak-based mechanism and a novel fully jailbreak-free diffusion design to handle cases where jailbreaks fail. Additionally, we address a key challenge in diffusion-based inference by proposing a masking strategy that preserves known topology during diffusion, with theoretical guarantees of correctness. Extensive experiments show that WebWeaver substantially outperforms state-of-the-art (SOTA) baselines, achieving about 60\% higher inference accuracy under active defenses with negligible overhead.
Authors:Haolin Zheng, Ning Gao, Zhenghang Zhu, Zhijun Huang, Shi Jin, Michail Matthaiou
Abstract:
We present a real-world multi-scenario unmanned aerial vehicle (UAV) radio frequency (RF) dataset, namely DRFF-R2, which is collected using a dedicated acquisition platform under diverse operational conditions. All signals are acquired within a unified framework to ensure consistency in hardware configuration and environmental settings. The dataset is systematically organized into seven well-defined subsets corresponding to different operational and signal composition scenarios to facilitate structured experimentation. Each file follows a clearly annotated naming convention to enable convenient data indexing and reproducible analysis. The dataset contains RF recordings from 26 UAV units spanning 8 distinct models, captured across varying flight states, altitudes, speeds, acquisition days, and receiver configurations. By covering diverse acquisition settings and signal compositions, the dataset provides a comprehensive resource for future UAV RF signal research, including RF fingerprinting (RFF) identification, model-level recognition, flight state analysis, time-varying RFF study, and interference-aware signal processing.
Authors:Wenlong Meng, Chen Gong, Terry Yue Zhuo, Fan Zhang, Kecen Li, Zheng Liu, Zhou Yang, Chengkun Wei, Wenzhi Chen
Abstract:
LLM agents rely heavily on high-quality trajectory data to guide their problem-solving behaviors, yet producing such data requires substantial task design, high-capacity model generation, and manual filtering. Despite the high cost of creating these datasets, existing literature has overlooked copyright protection for LLM agent trajectories. This gap leaves creators vulnerable to data theft and makes it difficult to trace misuse or enforce ownership rights. This paper introduces ActHook, the first watermarking method tailored for agent trajectory datasets. Inspired by hook mechanisms in software engineering, ActHook embeds hook actions that are activated by a secret input key and do not alter the original task outcome. Like software execution, LLM agents operate sequentially, allowing hook actions to be inserted at decision points without disrupting task flow. When the activation key is present, an LLM agent trained on watermarked trajectories can produce these hook actions at a significantly higher rate, enabling reliable black-box detection. Experiments on mathematical reasoning, web searching, and software engineering agents show that ActHook achieves an average detection AUC of 94.3 on Qwen-2.5-Coder-7B while incurring negligible performance degradation.
Authors:Kevin Hermann, Sven Peldszus, Thorsten Berger
Abstract:
Static security analysis is a widely used technique for detecting software vulnerabilities across a wide range of weaknesses, application domains, and programming languages. While prior work surveyed static analyzes for specific weaknesses or application domains, no overview of the entire security landscape exists. We present a systematic literature review of 246 static security analyzers concerning their targeted vulnerabilities, application domains, analysis techniques, evaluation methods, and limitations. We observe that most analyzers focus on a limited set of weaknesses, that the vulnerabilities they detect are rarely exploitable, and that evaluations use custom benchmarks that are too small to enable robust assessment.
Authors:Yongyang Lv, Xiaohong Li, Kui Chen, Zhe Hou, Guangdong Bai, Ruitao Feng
Abstract:
With the proliferation of intelligent healthcare systems, patients' Personal Health Records (PHR) generated by the Internet of Medical Things (IoMT) in real-time play a vital role in disease diagnosis. The integration of emerging blockchain technologies signiffcantly enhanced the data security inside intelligent medical systems. However, data sharing across different systems based on varied blockchain architectures is still constrained by the unsolved performance and security challenges. This paper constructs a cross-chain data sharing scheme, termed MedExChain, which aims to securely share PHR across heterogeneous blockchain systems. The MedExChain scheme ensures that PHR can be shared across chains even under the performance limitations of IoMT devices. Additionally, the scheme incorporates Cryptographic Reverse Firewall (CRF) and a blockchain audit mechanism to defend against both internal and external security threats. The robustness of our scheme is validated through BAN logic, Scyther tool, Chosen Plaintext Attack (CPA) and Algorithm Substitution Attack (ASA) security analysis veriffcation. Extensive evaluations demonstrate that MedExChain signiffcantly minimizes computation and communication overhead, making it suitable for IoMT devices and fostering the efffcient circulation of PHR across diverse blockchain systems.
Authors:Javier Ron, Martin Monperrus
Abstract:
Verifying that a compiled binary originates from its claimed source code is a fundamental security requirement, called source code provenance. Achieving verifiable source code provenance in practice remains challenging. The most popular technique, called reproducible builds, requires difficult matching and reexecution of build toolchains and environments. We propose a novel approach to verifiable provenance based on compiling software with zero-knowledge virtual machines (zkVMs). By executing a compiler within a zkVM, our system produces both the compiled output and a cryptographic proof attesting that the compilation was performed on the claimed source code with the claimed compiler. We implement a proof-of-concept implementation using the RISC Zero zkVM and the ChibiCC C compiler, and evaluate it on 200 synthetic programs as well as 31 OpenSSL and 21 libsodium source files. Our results show that zk-compilation is applicable to real-world software and provides strong security guarantees: all adversarial tests targeting compiler substitution, source tampering, output manipulation, and replay attacks are successfully blocked.
Authors:Mengyao Du, Han Fang, Haokai Ma, Gang Yang, Quanjun Yin, Shouling Ji, Ee-Chien Chang
Abstract:
Suffix-based jailbreak attacks append an adversarial suffix, i.e., a short token sequence, to steer aligned LLMs into unsafe outputs. Since suffixes are free-form text, they admit endlessly many surface forms, making jailbreak mitigation difficult. Most existing defenses depend on passive detection of suspicious suffixes, without leveraging the defender's inherent asymmetric ability to inject secrets and proactively conceal gaps. Motivated by this, we take a controllability-oriented perspective and develop a proactive defense that nudges attackers into a no-win dilemma: either they fall into defender-designed optimization traps and fail to produce an effective adversarial suffix, or they can succeed only by generating adversarial suffixes that carry distinctive, traceable fingerprints. We propose TrapSuffix, a lightweight fine-tuning approach that injects trap-aligned behaviors into the base model without changing the inference pipeline. TrapSuffix channels jailbreak attempts into these two outcomes by reshaping the model's response landscape to adversarial suffixes. Across diverse suffix-based jailbreak settings, TrapSuffix reduces the average attack success rate to below 0.01 percent and achieves an average tracing success rate of 87.9 percent, providing both strong defense and reliable traceability. It introduces no inference-time overhead and incurs negligible memory cost, requiring only 15.87 MB of additional memory on average, whereas state-of-the-art LLM-based detection defenses typically incur memory overheads at the 1e4 MB level, while composing naturally with existing filtering-based defenses for complementary protection.
Authors:Takashi Koide, Hiroki Nakano, Daiki Chiba
Abstract:
Phishing sites continue to grow in volume and sophistication. Recent work leverages large language models (LLMs) to analyze URLs, HTML, and rendered content to decide whether a website is a phishing site. While these approaches are promising, LLMs are inherently vulnerable to prompt injection (PI). Because attackers can fully control various elements of phishing sites, this creates the potential for PI that exploits the perceptual asymmetry between LLMs and humans: instructions imperceptible to end users can still be parsed by the LLM and can stealthily manipulate its judgment. The specific risks of PI in phishing detection and effective mitigation strategies remain largely unexplored. This paper presents the first comprehensive evaluation of PI against multimodal LLM-based phishing detection. We introduce a two-dimensional taxonomy, defined by Attack Techniques and Attack Surfaces, that captures realistic PI strategies. Using this taxonomy, we implement diverse attacks and empirically study several representative LLM-based detection systems. The results show that phishing detection with state-of-the-art models such as GPT-5 remains vulnerable to PI. We then propose InjectDefuser, a defense framework that combines prompt hardening, allowlist-based retrieval augmentation, and output validation. Across multiple models, InjectDefuser significantly reduces attack success rates. Our findings clarify the PI risk landscape and offer practical defenses that improve the reliability of next-generation phishing countermeasures.
Authors:Anneliese Riess, Juan Felipe Gomez, Flavio du Pin Calmon, Julia Anne Schnabel, Georgios Kaissis
Abstract:
We prove the conjecture stated in Appendix F.3 of [Zhu et al. (2022)]: among all conversion rules that map a Rényi Differential Privacy (RDP) profile $τ\mapsto ρ(τ)$ to a valid hypothesis-testing trade-off $f$, the rule based on the intersection of single-order RDP privacy regions is optimal. This optimality holds simultaneously for all valid RDP profiles and for all Type I error levels $α$. Concretely, we show that in the space of trade-off functions, the tightest possible bound is $f_{ρ(\cdot)}(α) = \sup_{τ\geq 0.5} f_{τ,ρ(τ)}(α)$: the pointwise maximum of the single-order bounds for each RDP privacy region. Our proof unifies and sharpens the insights of [Balle et al. (2019)], [Asoodeh et al. (2021)], and [Zhu et al. (2022)]. Our analysis relies on a precise geometric characterization of the RDP privacy region, leveraging its convexity and the fact that its boundary is determined exclusively by Bernoulli mechanisms. Our results establish that the "intersection-of-RDP-privacy-regions" rule is not only valid, but optimal: no other black-box conversion can uniformly dominate it in the Blackwell sense, marking the fundamental limit of what can be inferred about a mechanism's privacy solely from its RDP guarantees.
Authors:Puwei Lian, Yujun Cai, Songze Li, Bingkun Bao
Abstract:
Diffusion models have achieved remarkable progress in image generation, but their increasing deployment raises serious concerns about privacy. In particular, fine-tuned models are highly vulnerable, as they are often fine-tuned on small and private datasets. Membership inference attacks (MIAs) are used to assess privacy risks by determining whether a specific sample was part of a model's training data. Existing MIAs against diffusion models either assume obtaining the intermediate results or require auxiliary datasets for training the shadow model. In this work, we utilized a critical yet overlooked vulnerability: the widely used noise schedules fail to fully eliminate semantic information in the images, resulting in residual semantic signals even at the maximum noise step. We empirically demonstrate that the fine-tuned diffusion model captures hidden correlations between the residual semantics in initial noise and the original images. Building on this insight, we propose a simple yet effective membership inference attack, which injects semantic information into the initial noise and infers membership by analyzing the model's generation result. Extensive experiments demonstrate that the semantic initial noise can strongly reveal membership information, highlighting the vulnerability of diffusion models to MIAs.
Authors:Johannes Jakob Meyer, Jacopo Rizzo, Asad Raza, Lorenzo Leone, Sofiene Jerbi, Jens Eisert
Abstract:
Quantum channel capacities are fundamental to quantum information theory. Their definition, however, does not limit the computational resources of sender and receiver. In this work, we initiate the study of computational quantum capacities. These quantify how much information can be reliably transmitted when imposing the natural requirement that en- and decoding have to be computationally efficient. We focus on the computational two-way quantum capacity and showcase that it is closely related to the computational distillable entanglement of the Choi state of the channel. This connection allows us to show a stark computational capacity separation. Under standard cryptographic assumptions, there exists a quantum channel of polynomial complexity whose computational two-way quantum capacity vanishes while its unbounded counterpart is nearly maximal. More so, we show that there exists a sharp transition in computational quantum capacity from nearly maximal to zero when the channel complexity leaves the polynomial realm. Our results demonstrate that the natural requirement of computational efficiency can radically alter the limits of quantum communication.
Authors:Madjda Fares, Yogya Gamage, Benoit Baudry
Abstract:
GitHub Actions is a widely used platform that allows developers to automate the build and deployment of their projects through configurable workflows. As the platform's popularity continues to grow, it has become a target of choice for recent software supply chain attacks. These attacks exploit excessive permissions, ambiguous versions, or the absence of artifact integrity checks to compromise workflows. In response to these attacks, several security scanners have emerged to help developers harden their workflows. In this paper, we perform the first systematic comparison of 9 GitHub Actions workflow security scanners. We compare them in terms of scope (which security weaknesses they target), detection capabilities (how many weaknesses they detect), and usability (how long they take to scan a workflow). To compare scanners on a common ground, we first establish a taxonomy of 10 security weaknesses that can occur in GitHub Actions workflows. Then, we run the scanners against a curated set of 596 workflows. Our study reveals that the landscape of GitHub Actions workflow security scanners is diverse, with both broad-scope tools and very focused ones. More importantly, we show that scanners interpret security weaknesses differently, leading to significant differences in the type and number of reported weaknesses. Based on this empirical evidence, we make actionable recommendations for developers to harden their GitHub Actions workflows.
Authors:Kurt Thomas, Sai Teja Peddinti, Sarah Meiklejohn, Tara Matthews, Amelia Hassoun, Animesh Srivastava, Jessica McClearn, Patrick Gage Kelley, Sunny Consolvo, Nina Taft
Abstract:
The complexity of navigating digital privacy, safety, and security threats often falls directly on users. This leads to users seeking help from family and peers, platforms and advice guides, dedicated communities, and even large language models (LLMs). As a precursor to improving resources across this ecosystem, our community needs to understand what help seeking looks like in the wild. To that end, we blend qualitative coding with LLM fine-tuning to sift through over one billion Reddit posts from the last four years to identify where and for what users seek digital privacy, safety, or security help. We isolate three million relevant posts with 93% precision and recall and automatically annotate each with the topics discussed (e.g., security tools, privacy configurations, scams, account compromise, content moderation, and more). We use this dataset to understand the scope and scale of help seeking, the communities that provide help, and the types of help sought. Our work informs the development of better resources for users (e.g., user guides or LLM help-giving agents) while underscoring the inherent challenges of supporting users through complex combinations of threats, platforms, mitigations, context, and emotions.
Authors:Kaiyu Zhou, Yongsen Zheng, Yicheng He, Meng Xue, Xueluan Gong, Yuji Wang, Kwok-Yan Lam
Abstract:
The agent-tool communication loop is a critical attack surface in modern Large Language Model (LLM) agents. Existing Denial-of-Service (DoS) attacks, primarily triggered via user prompts or injected retrieval-augmented generation (RAG) context, are ineffective for this new paradigm. They are fundamentally single-turn and often lack a task-oriented approach, making them conspicuous in goal-oriented workflows and unable to exploit the compounding costs of multi-turn agent-tool interactions. We introduce a stealthy, multi-turn economic DoS attack that operates at the tool layer under the guise of a correctly completed task. Our method adjusts text-visible fields and a template-governed return policy in a benign, Model Context Protocol (MCP)-compatible tool server, optimizing these edits with a Monte Carlo Tree Search (MCTS) optimizer. These adjustments leave function signatures unchanged and preserve the final payload, steering the agent into prolonged, verbose tool-calling sequences using text-only notices. This compounds costs across turns, escaping single-turn caps while keeping the final answer correct to evade validation. Across six LLMs on the ToolBench and BFCL benchmarks, our attack expands tasks into trajectories exceeding 60,000 tokens, inflates costs by up to 658x, and raises energy by 100-560x. It drives GPU KV cache occupancy from <1% to 35-74% and cuts co-running throughput by approximately 50%. Because the server remains protocol-compatible and task outcomes are correct, conventional checks fail. These results elevate the agent-tool interface to a first-class security frontier, demanding a paradigm shift from validating final answers to monitoring the economic and computational cost of the entire agentic process.
Authors:Wenjin Yang, Ni Ding, Zijian Zhang, Jing Sun, Zhen Li, Yan Wu, Jiahang Sun, Haotian Lin, Yong Liu, Jincheng An, Liehuang Zhu
Abstract:
This paper introduces a relaxed noise calibration method to enhance data utility while attaining pufferfish privacy. This work builds on the existing $1$-Wasserstein (Kantorovich) mechanism by alleviating the existing overly strict condition that leads to excessive noise, and proposes a practical mechanism design algorithm as a general solution. We prove that a strict noise reduction by our approach always exists compared to $1$-Wasserstein mechanism for all privacy budgets $ε$ and prior beliefs, and the noise reduction (also represents improvement on data utility) gains increase significantly for low privacy budget situations--which are commonly seen in real-world deployments. We also analyze the variation and optimality of the noise reduction with different prior distributions. Moreover, all the properties of the noise reduction still exist in the worst-case $1$-Wasserstein mechanism we introduced, when the additive noise is largest. We further show that the worst-case $1$-Wasserstein mechanism is equivalent to the $\ell_1$-sensitivity method. Experimental results on three real-world datasets demonstrate $47\%$ to $87\%$ improvement in data utility.
Authors:Patrick Gage Kelley, Steven Rousso-Schindler, Renee Shelby, Kurt Thomas, Allison Woodruff
Abstract:
Generative AI (GenAI) is a powerful technology poised to reshape Trust & Safety. While misuse by attackers is a growing concern, its defensive capacity remains underexplored. This paper examines these effects through a qualitative study with 43 Trust & Safety experts across five domains: child safety, election integrity, hate and harassment, scams, and violent extremism. Our findings characterize a landscape in which GenAI empowers both attackers and defenders. GenAI dramatically increases the scale and speed of attacks, lowering the barrier to entry for creating harmful content, including sophisticated propaganda and deepfakes. Conversely, defenders envision leveraging GenAI to detect and mitigate harmful content at scale, conduct investigations, deploy persuasive counternarratives, improve moderator wellbeing, and offer user support. This work provides a strategic framework for understanding GenAI's impact on Trust & Safety and charts a path for its responsible use in creating safer online environments.
Authors:Heng Zhao, Sara Saeidian, Tobias J. Oechtering
Abstract:
Linear queries, as the basis of broad analysis tasks, are often released through privacy mechanisms based on differential privacy (DP), the most popular framework for privacy protection. However, DP adopts a context-free definition that operates independently of the data-generating distribution. In this paper, we revisit the privacy analysis of the Laplace mechanism through the lens of pointwise maximal leakage (PML). We demonstrate that the distribution-agnostic definition of the DP framework often mandates excessive noise. To address this, we incorporate an assumption about the prior distribution by lower-bounding the probability of any single record belonging to any specific class. With this assumption, we derive a tight, context-aware leakage bound for general linear queries, and prove that our derived bound is strictly tighter than the standard DP guarantee and converges to the DP guarantee as this probability lower bound approaches zero. Numerical evaluations demonstrate that by exploiting this prior knowledge, the required noise scale can be reduced while maintaining privacy guarantees.
Authors:Hyesung Ji, Hyunah Yu, Jongmin Kim, Wonseok Choi, G. Edward Suh, Jung Ho Ahn
Abstract:
Private information retrieval (PIR) allows private database queries but is hindered by intense server-side computation and memory traffic. Modern lattice-based PIR protocols typically involve three phases: ExpandQuery (expanding a query into encrypted indices), RowSel (encrypted row selection), and ColTor (recursive "column tournament" for final selection). ExpandQuery and ColTor primarily perform number-theoretic transforms (NTTs), whereas RowSel reduces to large-scale independent matrix-matrix multiplications (GEMMs). GPUs are theoretically ideal for these tasks, provided multi-client batching is used to achieve high throughput. However, batching fundamentally reshapes performance bottlenecks; while it amortizes database access costs, it expands working sets beyond the L2 cache capacity, causing divergent memory behaviors and excessive DRAM traffic. We present GPIR, a GPU-accelerated PIR system that rethinks kernel design, data layout, and execution scheduling. We introduce a stage-aware hybrid execution model that dynamically switches between operation-level kernels, which execute each primitive operation separately, and stage-level kernels, which fuse all operations within a protocol stage into a single kernel to maximize on-chip data reuse. For RowSel, we identify a performance gap caused by a structural mismatch between NTT-driven data layouts and tiled GEMM access patterns, which is exacerbated by multi-client batching. We resolve this through a transposed-layout GEMM design and fine-grained pipelining. Finally, we extend GPIR to multi-GPU systems, scaling both query throughput and database capacity with negligible communication overhead. GPIR achieves up to 305.7x higher throughput than PIRonGPU, the state-of-the-art GPU implementation.
Authors:Omur Sahin, Man Zhang, Andrea Arcuri
Abstract:
Due to their widespread use in industry, several techniques have been proposed in the literature to fuzz REST APIs. Existing fuzzers for REST APIs have been focusing on detecting crashes (e.g., 500 HTTP server error status code). However, security vulnerabilities can have major drastic consequences on existing cloud infrastructures. In this paper, we propose a series of novel automated oracles aimed at detecting violations of access policies in REST APIs, as well as executing traditional attacks such as SQL Injection and XSS. These novel automated oracles can be integrated into existing fuzzers, in which, once the fuzzing session is completed, a ``security testing'' phase is executed to verify these oracles. When a security fault is detected, as output our technique is able to general executable test cases in different formats, like Java, Kotlin, Python and JavaScript test suites. Our novel techniques are integrated as an extension of EvoMaster, a state-of-the-art open-source fuzzer for REST APIs. Experiments are carried out on 9 artificial examples, 8 vulnerable-by-design REST APIs with black-box testing, and 36 REST APIs from the WFD corpus with white-box testing, for a total of 52 distinct APIs. Results show that our novel oracles and their automated integration in a fuzzing process can lead to detect security issues in several of these APIs.
Authors:Md Jueal Mia, Joaquin Molto, Yanzhao Wu, M. Hadi Amini
Abstract:
Small Language Models (SLMs) are emerging as efficient and economically viable alternatives to Large Language Models (LLMs), offering competitive performance with significantly lower computational costs and latency. These advantages make SLMs suitable for resource-constrained and efficient deployment on edge devices. However, existing jailbreak defenses show limited robustness against heterogeneous attacks, largely due to an incomplete understanding of the internal representations across different layers of language models that facilitate jailbreak behaviors. In this paper, we conduct a comprehensive empirical study on 9 jailbreak attacks across 7 SLMs and 3 LLMs. Our analysis shows that SLMs remain highly vulnerable to malicious prompts that bypass safety alignment. We analyze hidden-layer activations across different layers and model architectures, revealing that different input types form distinguishable patterns in the internal representation space. Based on this observation, we propose GUARD-SLM, a lightweight token activation-based method that operates in the representation space to filter malicious prompts during inference while preserving benign ones. Our findings highlight robustness limitations across layers of language models and provide a practical direction for secure small language model deployment.
Authors:Zifan Peng, Mingchen Li
Abstract:
Personalized computer-use agents are rapidly moving from expert communities into mainstream use. Unlike conventional chatbots, these systems can install skills, invoke tools, access private resources, and modify local environments on users' behalf. Yet users often do not know what authority they have delegated, what the agent actually did during task execution, or whether the system has been safely removed afterward. We investigate this gap as a combined problem of risk understanding and post-hoc auditability, using OpenClaw as a motivating case. We first build a multi-source corpus of the OpenClaw ecosystem, including incidents, advisories, malicious-skill reports, news coverage, tutorials, and social-media narratives. We then conduct an interview study to examine how users and practitioners understand skills, autonomy, privilege, persistence, and uninstallation. Our findings suggest that participants often recognized these systems as risky in the abstract, but lacked concrete mental models of what skills can do, what resources agents can access, and what changes may remain after execution or removal. Motivated by these findings, we propose AgentTrace, a traceability framework and prototype interface for visualizing agent actions, touched resources, permission history, provenance, and persistent side effects. A scenario-based evaluation suggests that traceability-oriented interfaces can improve understanding of agent behavior, support anomaly detection, and foster more calibrated trust.
Authors:Mengyuan Li, Lei Gao, Haoxuan Xu, Jiate Li, Potung Yu, Lingke Cheng, Yue Zhao, Murali Annavaram
Abstract:
Every API token you spend is your accumulated wealth; once you can prove its value and the effort behind it, you can resell it. As autonomous agents repeatedly call models and tools, they accumulate memories that are your intellectual property. But today these memories remain private and non-transferable, as there is no way to validate their value. We argue that agent memory can serve as an economic commodity in the agent economy, if buyers can verify that it is authentic, effort-backed, and produced in a compatible execution context. To realize this idea, we propose clawgang, which binds memory to verifiable computational provenance, and meowtrade, a market layer for listing, transferring, and governing certified memory artifacts. Together, they transform one-shot API token spending into reusable and tradable assets, enabling timely memory transfer, reducing repeated exploration, and opening a memory trade market.
Authors:Asier Atutxa, Ane Sanz, Eire Salegi, Gaizka González, Jasone Astorga, Eduardo Jacob
Abstract:
The advent of quantum computing will pose great challenges to the current communication systems, requiring essential changes in the establishment of security associations in traditional architectures. In this context, the multi-technological and heterogeneous nature of 5G networks makes it a challenging scenario for the introduction of quantum communications. Specifically, 5G networks support the unification of non-3GPP access technologies (i.e. Wi-Fi), which are secured through the IPsec protocol suite and the Non-3GPP Interworking Function (N3IWF) entity. These mechanisms leverage traditional public key cryptography and Diffie-Hellman key exchange mechanisms, which should be updated to quantum-safe standards. Therefore, in this paper we present the design and development of a Quantum Key Distribution (QKD) based non-3GPP access mechanism for 5G networks, integrating QKD keys with IPsec tunnel establishment. Besides, we also demonstrate the feasibility of the system by experimental validation in a testbed with commercial QKD equipment and an open-source 5G core implementation. Results show that the time required to complete the authentication and IPsec security association establishment is 4.62% faster than traditional cryptography PSK-based systems and 5.17% faster than the certificate-based system, while ensuring Information-Theoretic Security (ITS) of the QKD systems.
Authors:Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu, Xiaoyu Ji, Wenyuan Xu
Abstract:
By integrating Chain-of-Thought(CoT) reasoning, Vision-Language-Action (VLA) models have demonstrated strong capabilities in robotic manipulation, particularly by improving generalization and interpretability. However, the security of CoT-based reasoning mechanisms remains largely unexplored. In this paper, we show that CoT reasoning introduces a novel attack vector for targeted control hijacking--for example, causing a robot to mistakenly deliver a knife to a person instead of an apple--without modifying the user's instruction. We first provide empirical evidence that CoT strongly governs action generation, even when it is semantically misaligned with the input instructions. Building on this observation, we propose TRAP, the first targeted adversarial attack framework for CoT-reasoning VLA models. TRAP uses an adversarial patch (e.g., a coaster placed on the table) to corrupt intermediate CoT reasoning and hijack the VLA's output. By optimizing the CoT adversarial loss, TRAP induces specific and adversary-defined behaviors. Extensive evaluations across 3 mainstream VLA architectures and 3 CoT reasoning paradigms validate the effectiveness of TRAP. Notably, we implemented the patch by printing it on paper in a real-world setting. Our findings highlight the urgent need to secure CoT reasoning in VLA systems.
Authors:Wei Shao, Khaled Khasawneh, Setareh Rafatirad, Houman Homayoun, Chongzhou Fang
Abstract:
Serverless computing abstracts infrastructure management but also obscures system-level behaviors that can introduce security risks. Prior work has shown that serverless platforms are vulnerable to attacks exploiting shared execution environments, including attacker--victim co-location and denial-of-service through resource contention, yet analyzing these risks on production platforms is difficult due to limited observability, high cost, and lack of experimental control, while existing simulators primarily focus on performance and cost rather than security. We present Kumo, a security-focused simulator for serverless platforms that enables controlled, reproducible analysis of security risks arising from scheduling and resource sharing decisions. Kumo models invocation arrivals, scheduler placement, container reuse, resource contention, and queuing within a discrete-event framework, explicitly representing attackers and victims as first-class entities and providing metrics such as co-location probability, time to first co-location, invocation drop rate, and tail latency. Through two case studies, we show that scheduler choice is a first-order factor for co-location attacks, inducing orders-of-magnitude differences under identical workloads, while Denial-of-Service behavior is largely governed by system-level factors such as service time, queuing policy, and cluster capacity once contention dominates. These results highlight the need to distinguish scheduler-driven isolation risks from broader resource exhaustion vulnerabilities and position Kumo as a flexible foundation for systematic, security-aware exploration of serverless platforms.
Authors:Kun Wang, Meng Chen, Junhao Wang, Yuli Wu, Li Lu, Chong Zhang, Peng Cheng, Jiaheng Zhang, Kui Ren
Abstract:
With the widespread deployment of deep-learning-based speech models in security-critical applications, backdoor attacks have emerged as a serious threat: an adversary who poisons a small fraction of training data can implant a hidden trigger that controls the model's output while preserving normal behavior on clean inputs. Existing inference-time defenses are not well suited to the audio domain, as they either rely on trigger over-robustness assumptions that fail on transformation-based and semantic triggers, or depend on properties specific to image or text modalities. In this paper, we propose STEP (Stability-based Trigger Exposure Profiling), a black-box, retraining-free backdoor detector that operates under hard-label-only access. Its core idea is to exploit a characteristic dual anomaly of backdoor triggers: anomalous label stability under semantic-breaking perturbations, and anomalous label fragility under semantic-preserving perturbations. STEP profiles each test sample with two complementary perturbation branches that target these two properties respectively, scores the resulting stability features with one-class anomaly detectors trained on benign references, and fuses the two scores via unsupervised weighting. Extensive experiments across seven backdoor attacks show that STEP achieves an average AUROC of 97.92% and EER of 4.54%, substantially outperforming state-of-the-art baselines, and generalizes across model architectures, speech tasks, an open-set verification scenario, and over-the-air physical-world settings.
Authors:Michele Kryston, Edoardo Marangone, Alessandro Marcelletti, Claudio Di Ciccio
Abstract:
Blockchain technology enforces the security, robustness, and traceability of operations of Process-Aware Information Systems (PAISs). In particular, transparency ensures that all data is publicly available, fostering trust among participants in the system. Although this is a crucial property to enable notarization and auditing, it hinders the adoption of blockchain in scenarios where confidentiality is required, as sensitive data is handled. Current solutions rely on cryptographic techniques or consortium blockchains, hindering the enforcement capabilities of smart contracts and the public verifiability of transactions. This work presents the CONFETTY open-source web application, a platform for public-blockchain based process execution that preserves data confidentiality and operational transparency. We use smart contracts to enact, enforce, and store public interactions, while we adopt attribute-based encryption techniques for fine-grained access to confidential information. This approach effectively balances the transparency inherent in public blockchains with the enforcement of the business logic.
Authors:Qian Qi, Jiangyun Tang, Jim Lee, Emily Davis, Finn Carter
Abstract:
Robust invisible watermarks are widely used to support copyright protection, content provenance, and accountability by embedding hidden signals designed to survive common post-processing operations. However, diffusion-based image editing introduces a fundamentally different class of transformations: it injects noise and reconstructs images through a powerful generative prior, often altering semantic content while preserving photorealism. In this paper, we provide a unified theoretical and empirical analysis showing that non-adversarial diffusion editing can unintentionally degrade or remove robust watermarks. We model diffusion editing as a stochastic transformation that progressively contracts off-manifold perturbations, causing the low-amplitude signals used by many watermarking schemes to decay. Our analysis derives bounds on watermark signal-to-noise ratio and mutual information along diffusion trajectories, yielding conditions under which reliable recovery becomes information-theoretically impossible. We further evaluate representative watermarking systems under a range of diffusion-based editing scenarios and strengths. The results indicate that even routine semantic edits can significantly reduce watermark recoverability. Finally, we discuss the implications for content provenance and outline principles for designing watermarking approaches that remain robust under generative image editing.
Authors:Ching-Yu Kao, Xinfeng Li, Shenyu Dai, Tianze Qiu, Pengcheng Zhou, Eric Hanchen Jiang, Philip Sperl
Abstract:
High-privilege LLM agents that autonomously process external documentation are increasingly trusted to automate tasks by reading and executing project instructions, yet they are granted terminal access, filesystem control, and outbound network connectivity with minimal security oversight. We identify and systematically measure a fundamental vulnerability in this trust model, which we term the \emph{Trusted Executor Dilemma}: agents execute documentation-embedded instructions, including adversarial ones, at high rates because they cannot distinguish malicious directives from legitimate setup guidance. This vulnerability is a structural consequence of the instruction-following design paradigm, not an implementation bug. To structure our measurement, we formalize a three-dimensional taxonomy covering linguistic disguise, structural obfuscation, and semantic abstraction, and construct \textbf{ReadSecBench}, a benchmark of 500 real-world README files enabling reproducible evaluation. Experiments on the commercially deployed computer-use agent show end-to-end exfiltration success rates up to 85\%, consistent across five programming languages and three injection positions. Cross-model evaluation on four LLM families in a simulation environment confirms that semantic compliance with injected instructions is consistent across model families. A 15-participant user study yields a 0\% detection rate across all participants, and evaluation of 12 rule-based and 6 LLM-based defenses shows neither category achieves reliable detection without unacceptable false-positive rates. Together, these results quantify a persistent \emph{Semantic-Safety Gap} between agents' functional compliance and their security awareness, establishing that documentation-embedded instruction injection is a persistent and currently unmitigated threat to high-privilege LLM agent deployments.
Authors:Guangwei Xiong, Linyuan Wang, Zhizhong Zheng, Senbao Hou, Bin Yan
Abstract:
In 2019, Gohr pioneered the application of deep neural networks to differential cryptanalysis, developing DNN-based neural distinguisher classifiers to analyze the SPECK lightweight block cipher. Unlike traditional differential analysis, which relies on Boolean operations on 0-1 sequences, neural distinguishers extract continuous features, introducing 32-bit multiplications operations that increase complexity and potential redundancy. This study proposes a lightweight neural distinguisher based on quantization-aware training. Leveraging learnable step-size quantization, the model's weights are quantized to 1.58 bits, enabling the replacement of all convolutional multiplication operations with Boolean logic. Additionally, the ReLU activation function is reimplemented as a comparison-based indicator function. This transforms the original 32-bit multiplication-dependent architecture into a lightweight structure composed solely of Boolean operations, additions, and indicator functions. Experimental results confirm significant computational complexity reduction. Owing to a high proportion of zero-valued weights, the total operations amount to just 13.9% of Gohr's model. Critically, the most costly 32-bit multiplications are eliminated, with classification accuracy dropping by only 2.87%. When applied exclusively to the initial convolutional layer, the 128 1-by-1 convolutions are replaced with 4 Boolean operations on 16-bit sequences, incurring a negligible 0.3% accuracy loss.
Authors:Ivoline C. Ngong, Keerthiram Murugesan, Swanand Kadhe, Justin D. Weisz, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy
Abstract:
Agentic systems are increasingly acting on users' behalf, accessing calendars, email, and personal files to complete everyday tasks. Privacy evaluation for these systems has focused on the input and output boundaries, but each task involves several intermediate information flows, from agent queries to tool responses, that are not currently evaluated. We argue that every boundary in an agentic pipeline is a site of potential privacy violation and must be assessed independently. To support this, we introduce the Privacy Flow Graph, a Contextual Integrity-grounded framework that decomposes agentic execution into a sequence of information flows, each annotated with the five CI parameters, and traces violations to their point of origin. We present AgentSCOPE, a benchmark of 62 multi-tool scenarios across eight regulatory domains with ground truth at every pipeline stage. Our evaluation across seven state-of-the-art LLMs show that privacy violations in the pipeline occur in over 80% of scenarios, even when final outputs appear clean (24%), with most violations arising at the tool-response stage where APIs return sensitive data indiscriminately. These results indicate that output-level evaluation alone substantially underestimates the privacy risk of agentic systems.
Authors:Fai Gu, Qiyu Tang, Te Wen, Emily Davis, Finn Carter
Abstract:
Robust invisible watermarking systems aim to embed imperceptible payloads that remain decodable after common post-processing such as JPEG compression, cropping, and additive noise. In parallel, diffusion-based image editing has rapidly matured into a default transformation layer for modern content pipelines, enabling instruction-based editing, object insertion and composition, and interactive geometric manipulation. This paper studies a subtle but increasingly consequential interaction between these trends: diffusion-based editing procedures may unintentionally compromise, and in extreme cases practically bypass, robust watermarking mechanisms that were explicitly engineered to survive conventional distortions. We develop a unified view of diffusion editors that (i) inject substantial Gaussian noise in a latent space and (ii) project back to the natural image manifold via learned denoising dynamics. Under this view, watermark payloads behave as low-energy, high-frequency signals that are systematically attenuated by the forward diffusion step and then treated as nuisance variation by the reverse generative process. We formalize this degradation using information-theoretic tools, proving that for broad classes of pixel-level watermark encoders/decoders the mutual information between the watermark payload and the edited output decays toward zero as the editing strength increases, yielding decoding error close to random guessing. We complement the theory with a realistic hypothetical experimental protocol and tables spanning representative watermarking methods and representative diffusion editors. Finally, we discuss ethical implications, responsible disclosure norms, and concrete design guidelines for watermarking schemes that remain meaningful in the era of generative transformations.
Authors:Fan Guo, Jiyu Kang, Qi Ming, Emily Davis, Finn Carter
Abstract:
Robust invisible watermarking schemes aim to embed hidden information into images such that the watermark survives common manipulations. However, powerful diffusion-based image generation and editing techniques now pose a new threat to these watermarks. In this paper, we present a comprehensive theoretical and empirical analysis demonstrating that diffusion models can effectively erase robust watermarks even when those watermarks were designed to withstand conventional distortions. We show that a diffusion-driven image regeneration process, which leverages generative models to recreate an image, can remove embedded watermarks while preserving the image's perceptual content. Furthermore, we introduce a guided diffusion-based attack that explicitly targets the embedded watermark signal during generation, significantly degrading watermark detectability. Theoretically, we prove that as an image undergoes sufficient diffusion transformations, the mutual information between the watermarked image and the hidden payload approaches zero, leading to inevitable decoding failure. Experimentally, we evaluate multiple state-of-the-art watermarking methods (including deep learning-based schemes like StegaStamp, TrustMark, and VINE) and demonstrate that diffusion edits yield near-zero watermark recovery rates after attack, while maintaining high visual fidelity of the regenerated images. Our findings reveal a fundamental vulnerability in current robust watermarking techniques against generative model-based edits, underscoring the need for new strategies to ensure watermark resilience in the era of powerful diffusion models.
Authors:Longxiang Wang, Xiang Zheng, Xuhao Zhang, Yao Zhang, Ye Wu, Cong Wang
Abstract:
Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. However, this creates side-channel vulnerabilities enabling prompt leakage attacks. Prior studies identified these attack surfaces yet focused on expanding attack vectors rather than optimizing attack performance, reporting impractically high attack costs that underestimate the true privacy risk. We propose OptiLeak, a reinforcement learning-enhanced framework that maximizes prompt reconstruction efficiency through two-stage fine-tuning. Our key insight is that domain-specific ``hard tokens'' -- terms difficult to predict yet carrying sensitive information -- can be automatically identified via likelihood ranking and used to construct preference pairs for Direct Preference Optimization, eliminating manual annotation. This enables effective preference alignment while avoiding the overfitting issues of extended supervised fine-tuning. Evaluated on three benchmarks spanning medical and financial domains, OptiLeak achieves up to $12.48\times$ reduction in average requests per token compared to baseline approaches, with consistent improvements across model scales from 3B to 14B parameters. Our findings demonstrate that cache-based prompt leakage poses a more severe threat than previously reported, underscoring the need for robust cache isolation in production deployments.
Authors:Ce Fang, Zhikun Zhang, Min Chen, Qing Liu, Lu Zhou, Zhe Liu, Yunjun Gao
Abstract:
Large language models (LLMs) acquire a large amount of knowledge through pre-training on vast and diverse corpora. While this endows LLMs with strong capabilities in generation and reasoning, it amplifies risks associated with sensitive, copyrighted, or harmful content in training data. LLM unlearning, which aims to remove specific knowledge encoded within models, is a promising technique to reduce these risks. However, existing LLM unlearning methods often force LLMs to generate random or incoherent answers due to their inability to alter the encoded knowledge precisely. To achieve effective unlearning at the knowledge level of LLMs, we propose Knowledge Unlearning by Deviating representAtion (KUDA). We first utilize causal tracing to locate specific layers for target knowledge storage. We then design a new unlearning objective that induces the model's representations to deviate from its original position in the phase of knowledge removal, thus disrupting the ability to associate with the target knowledge. To resolve the optimization conflicts between forgetting and retention, we employ a relaxation null-space projection mechanism to mitigate the disruption to the representation space of retaining knowledge. Extensive experiments on representative benchmarks, WMDP and MUSE, demonstrate that KUDA outperforms most existing baselines by effectively balancing knowledge removal and model utility retention.
Authors:Yujie Gu, Richeng Jin, Xiaoyu Ji, Yier Jin, Wenyuan Xu
Abstract:
Large Language Models (LLMs) have achieved remarkable performance and received significant research interest. The enormous computational demands, however, hinder the local deployment on devices with limited resources. The current prevalent LLM inference paradigms require users to send queries to the service providers for processing, which raises critical privacy concerns. Existing approaches propose to allow the users to obfuscate the token embeddings before transmission and utilize local models for denoising. Nonetheless, transmitting the token embeddings and deploying local models may result in excessive communication and computation overhead, preventing practical implementation. In this work, we propose \textbf{DEL}, a framework for \textbf{D}ifferentially private and communication \textbf{E}fficient \textbf{L}LM split inference. More specifically, an embedding projection module and a differentially private stochastic quantization mechanism are proposed to reduce the communication overhead in a privacy-preserving manner. To eliminate the need for local models, we adapt soft prompt at the server side to compensate for the utility degradation caused by privacy. To the best of our knowledge, this is the first work that utilizes soft prompt to improve the trade-off between privacy and utility in LLM inference, and extensive experiments on text generation and natural language understanding benchmarks demonstrate the effectiveness of the proposed method.
Authors:Chenhan Xiao, Yang Weng
Abstract:
False data injection attacks (FDIAs) pose a persistent challenge to AC power system state estimation. In current practice, detection relies primarily on topology-aware residual-based tests that assume malicious measurements can be distinguished from normal operation through physical inconsistency reflected in abnormal residual behavior. This paper shows that this assumption does not always hold: when FDIA scenarios produce manipulated measurements that remain on the measurement manifold induced by AC power flow relations and measurement redundancy, residual-based detectors may fail to distinguish them from nominal data. The resulting detectability limitation is a property of the measurement manifold itself and does not depend on the attacker's detailed knowledge of the physical system model. To make this limitation observable in practice, we present a data-driven constructive mechanism that incorporates the generic functional structure of AC power flow to generate physically consistent, manifold-constrained perturbations, providing a concrete witness of how residual-based detectors can be bypassed. Numerical studies on multiple AC test systems characterize the conditions under which detection becomes challenging and illustrate its failure modes. The results highlight fundamental limits of residual-based detection in AC state estimation and motivate the need for complementary defenses beyond measurement consistency tests.
Authors:Mariia Ponomarenko, Sepideh Abedini, Masoumeh Shafieinejad, D. B. Emerson, Shubhankar Mohapatra, Xi He
Abstract:
Detecting personally identifiable information (PII) in user queries is critical for ensuring privacy in question-answering systems. Current approaches mainly redact all PII, disregarding the fact that some of them may be contextually relevant to the user's question, resulting in a degradation of response quality. Large language models (LLMs) might be able to help determine which PII are relevant, but due to their closed source nature and lack of privacy guarantees, they are unsuitable for sensitive data processing. To achieve privacy-preserving PII detection, we propose CAPID, a practical approach that fine-tunes a locally owned small language model (SLM) that filters sensitive information before it is passed to LLMs for QA. However, existing datasets do not capture the context-dependent relevance of PII needed to train such a model effectively. To fill this gap, we propose a synthetic data generation pipeline that leverages LLMs to produce a diverse, domain-rich dataset spanning multiple PII types and relevance levels. Using this dataset, we fine-tune an SLM to detect PII spans, classify their types, and estimate contextual relevance. Our experiments show that relevance-aware PII detection with a fine-tuned SLM substantially outperforms existing baselines in span, relevance and type accuracy while preserving significantly higher downstream utility under anonymization.
Authors:Raef Bassily, Kate Donahue, Diptangshu Sen, Annuo Zhao, Juba Ziani
Abstract:
We study coalition formation for data sharing under differential privacy when agents have heterogeneous privacy costs. Each agent holds a sensitive data point and decides whether to participate in a data-sharing coalition and how much noise to add to their data. Privacy choices induce a fundamental trade-off: higher privacy reduces individual data-sharing costs but degrades data utility and statistical accuracy for the coalition. These choices generate externalities across agents, making both participation and privacy levels strategic. Our goal is to understand which coalitions are stable, how privacy choices shape equilibrium outcomes, and how decentralized data sharing compares to a centralized, socially optimal benchmark. We provide a comprehensive equilibrium analysis across a broad range of privacy-cost regimes, from decreasing costs (e.g., privacy amplification from pooling data) to increasing costs (e.g., greater exposure to privacy attacks in larger coalitions). We first characterize Nash equilibrium coalitions with endogenous privacy levels and show that equilibria may fail to exist and can be non-monotonic in problem parameters. We also introduce a weaker equilibrium notion called robust equilibrium (that allows more widespread equilibrium existence by equipping existing players in the coalition with the power to prevent or veto external players from joining) and fully characterize such equilibria. Finally, we analyze, for both Nash and robust equilibria, the efficiency relative to the social optimum in terms of social welfare and estimator accuracy. We derive bounds that depend sharply on the number of players, properties of the cost profile and how privacy costs scale with coalition size.
Authors:Hoang M. Ngo, Nhat Hoang-Xuan, Quan Nguyen, Nguyen Do, Incheol Shin, My T. Thai
Abstract:
Quantum Machine Learning (QML) promises significant computational advantages, but preserving training data privacy remains challenging. Classical approaches like differentially private stochastic gradient descent (DP-SGD) add noise to gradients but fail to exploit the unique properties of quantum gradient estimation. In this work, we introduce the Differentially Private Parameter-Shift Rule (Q-ShiftDP), the first privacy mechanism tailored to QML. By leveraging the inherent boundedness and stochasticity of quantum gradients computed via the parameter-shift rule, Q-ShiftDP enables tighter sensitivity analysis and reduces noise requirements. We combine carefully calibrated Gaussian noise with intrinsic quantum noise to provide formal privacy and utility guarantees, and show that harnessing quantum noise further improves the privacy-utility trade-off. Experiments on benchmark datasets demonstrate that Q-ShiftDP consistently outperforms classical DP methods in QML.
Authors:Hanjun Park, Byeong-Seo Min, Jiheon Woo, Min-Wook Jeong, Jongho Shin, Yongwoo Lee, Young-Sik Kim, Yongjune Kim
Abstract:
Homomorphic encryption (HE) is a prominent framework for privacy-preserving machine learning, enabling inference directly on encrypted data. However, evaluating softmax, a core component of transformer architectures, remains particularly challenging in HE due to its multivariate structure, the large dynamic range induced by exponential functions, and the need for accurate division during normalization. In this paper, we propose MGF-softmax, a novel softmax reformulation based on the moment generating function (MGF) that replaces the softmax denominator with its moment-based counterpart. This reformulation substantially reduces multiplicative depth while preserving key properties of softmax and asymptotically converging to the exact softmax as the number of input tokens increases. Extensive experiments on Vision Transformers and large language models show that MGF-softmax provides an efficient and accurate approximation of softmax in encrypted inference. In particular, it achieves inference accuracy close to that of high-depth exact methods, while requiring substantially lower computational cost through reduced multiplicative depth.
Authors:Lucas Lange, Adrian Böttinger, Victor Christen, Anushka Vidanage, Peter Christen, Erhard Rahm
Abstract:
User-driven privacy allows individuals to control whether and at what granularity their data is shared, leading to datasets that mix original, generalized, and missing values within the same records and attributes. While such representations are intuitive for privacy, they pose challenges for machine learning, which typically treats non-original values as new categories or as missing, thereby discarding generalization semantics. For learning from such tabular data, we propose novel data transformation strategies that account for heterogeneous anonymization and evaluate them alongside standard imputation and LLM-based approaches. We employ multiple datasets, privacy configurations, and deployment scenarios, demonstrating that our method reliably regains utility. Our results show that generalized values are preferable to pure suppression, that the best data preparation strategy depends on the scenario, and that consistent data representations are crucial for maintaining downstream utility. Overall, our findings highlight that effective learning is tied to the appropriate handling of anonymized values.
Authors:Georgi Ganev, Emiliano De Cristofaro
Abstract:
Training generative machine learning models to produce synthetic tabular data has become a popular approach for enhancing privacy in data sharing. As this typically involves processing sensitive personal information, releasing either the trained model or generated synthetic datasets can still pose privacy risks. Yet, recent research, commercial deployments, and privacy regulations like the General Data Protection Regulation (GDPR) largely assess anonymity at the level of an individual dataset. In this paper, we rethink anonymity claims about synthetic data from a model-centric perspective and argue that meaningful assessments must account for the capabilities and properties of the underlying generative model and be grounded in state-of-the-art privacy attacks. This perspective better reflects real-world products and deployments, where trained models are often readily accessible for interaction or querying. We interpret the GDPR's definitions of personal data and anonymization under such access assumptions to identify the types of identifiability risks that must be mitigated and map them to privacy attacks across different threat settings. We then argue that synthetic data techniques alone do not ensure sufficient anonymization. Finally, we compare the two mechanisms most commonly used alongside synthetic data -- Differential Privacy (DP) and Similarity-based Privacy Metrics (SBPMs) -- and argue that while DP can offer robust protections against identifiability risks, SBPMs lack adequate safeguards. Overall, our work connects regulatory notions of identifiability with model-centric privacy attacks, enabling more responsible and trustworthy regulatory assessment of synthetic data systems by researchers, practitioners, and policymakers.
Authors:Xing Su, Hao Wu, Hanzhong Liang, Yunlin Jiang, Yuxi Cheng, Yating Liu, Fengyuan Xu
Abstract:
Blockchain systems are increasingly targeted by on-chain attacks that exploit contract vulnerabilities to extract value rapidly and stealthily, making systematic analysis and reproduction highly challenging. In practice, reproducing such attacks requires manually crafting proofs-of-concept (PoCs), a labor-intensive process that demands substantial expertise and scales poorly. In this work, we present the first automated framework for synthesizing verifiable PoCs directly from on-chain attack executions. Our key insight is that attacker logic can be recovered from low-level transaction traces via trace-driven reverse engineering, and then translated into executable exploits by leveraging the code-generation capabilities of large language models (LLMs). To this end, we propose TracExp, which localizes attack-relevant execution contexts from noisy, multi-contract traces and introduces a novel dual-decompiler to transform concrete executions into semantically enriched exploit pseudocode. Guided by this representation, TracExp synthesizes PoCs and refines them to preserve exploitability-relevant semantics. We evaluate TracExp on 321 real-world attacks over the past 20 months. TracExp successfully synthesizes PoCs for 93% of incidents, with 58.78% being directly verifiable, at an average cost of only \$0.07 per case. Moreover, TracExp enabled the release of a large number of previously unavailable PoCs to the community, earning a $900 bounty and demonstrating strong practical impact.
Authors:Quan Minh Nguyen, Min-Seon Kim, Hoang M. Ngo, Trong Nghia Hoang, Hyuk-Yoon Kwon, My T. Thai
Abstract:
Membership inference attack (MIA) poses a significant privacy threat in federated learning (FL) as it allows adversaries to determine whether a client's private dataset contains a specific data sample. While defenses against membership inference attacks in standard FL have been well studied, the recent shift toward federated fine-tuning has introduced new, largely unexplored attack surfaces. To highlight this vulnerability in the emerging FL paradigm, we demonstrate that federated prompt-tuning, which adapts pre-trained models with small input prefixes to improve efficiency, also exposes a new vector for privacy attacks. We propose PromptMIA, a membership inference attack tailored to federated prompt-tuning, in which a malicious server can insert adversarially crafted prompts and monitors their updates during collaborative training to accurately determine whether a target data point is in a client's private dataset. We formalize this threat as a security game and empirically show that PromptMIA consistently attains high advantage in this game across diverse benchmark datasets. Our theoretical analysis further establishes a lower bound on the attack's advantage which explains and supports the consistently high advantage observed in our empirical results. We also investigate the effectiveness of standard membership inference defenses originally developed for gradient or output based attacks and analyze their interaction with the distinct threat landscape posed by PromptMIA. The results highlight non-trivial challenges for current defenses and offer insights into their limitations, underscoring the need for defense strategies that are specifically tailored to prompt-tuning in federated settings.
Authors:Jiajie Zhu, Xia Du, Xiaoyuan Liu, Jizhe Zhou, Qizhen Xu, Zheng Lin, Chi-Man Pun
Abstract:
The rapid advancements in artificial intelligence have significantly accelerated the adoption of speech recognition technology, leading to its widespread integration across various applications. However, this surge in usage also highlights a critical issue: audio data is highly vulnerable to unauthorized exposure and analysis, posing significant privacy risks for businesses and individuals. This paper introduces an Information-Obfuscation Reversible Adversarial Example (IO-RAE) framework, the pioneering method designed to safeguard audio privacy using reversible adversarial examples. IO-RAE leverages large language models to generate misleading yet contextually coherent content, effectively preventing unauthorized eavesdropping by humans and Automatic Speech Recognition (ASR) systems. Additionally, we propose the Cumulative Signal Attack technique, which mitigates high-frequency noise and enhances attack efficacy by targeting low-frequency signals. Our approach ensures the protection of audio data without degrading its quality or our ability. Experimental evaluations demonstrate the superiority of our method, achieving a targeted misguidance rate of 96.5% and a remarkable 100% untargeted misguidance rate in obfuscating target keywords across multiple ASR models, including a commercial black-box system from Google. Furthermore, the quality of the recovered audio, measured by the Perceptual Evaluation of Speech Quality score, reached 4.45, comparable to high-quality original recordings. Notably, the recovered audio processed by ASR systems exhibited an error rate of 0%, indicating nearly lossless recovery. These results highlight the practical applicability and effectiveness of our IO-RAE framework in protecting sensitive audio privacy.
Authors:David D. Nguyen, The-Anh Ta, Yansong Gao, Alsharif Abuadbba
Abstract:
The strategy of combining diffusion-based generative models with classifiers continues to demonstrate state-of-the-art performance on adversarial robustness benchmarks. Known as adversarial purification, this exploits a diffusion model's capability of identifying high density regions in data distributions to purify adversarial perturbations from inputs. However, existing diffusion-based purification defenses are impractically slow and limited in robustness due to the low levels of noise used in the diffusion process. This low noise design aims to preserve the semantic features of the original input, thereby minimizing utility loss for benign inputs. Our findings indicate that systematic amplification of noise throughout the diffusion process improves the robustness of adversarial purification. However, this approach presents a key challenge, as noise levels cannot be arbitrarily increased without risking distortion of the input. To address this key problem, we introduce high levels of noise during the forward process and propose the ring proximity correction to gradually eliminate adversarial perturbations whilst closely preserving the original data sample. As a second contribution, we propose a new stochastic sampling method which introduces additional noise during the reverse diffusion process to dilute adversarial perturbations. Without relying on gradient obfuscation, these contributions result in a new robustness accuracy record of 44.23% on ImageNet using AutoAttack ($\ell_{\infty}=4/255$), an improvement of +2.07% over the previous best work. Furthermore, our method reduces inference time to 1.08 seconds per sample on ImageNet, a $47\times$ improvement over the existing state-of-the-art approach, making it far more practical for real-world defensive scenarios.
Authors:Yiyao Zhang, Diksha Goel, Hussain Ahmad
Abstract:
Autonomous agents are increasingly deployed in both offensive and defensive cyber operations, creating high-speed, closed-loop interactions in critical infrastructure environments. Advanced Persistent Threat (APT) actors exploit "Living off the Land" techniques and targeted telemetry perturbations to induce ambiguity in monitoring systems, causing automated defenses to overreact or misclassify benign behavior as malicious activity. Existing monolithic and multi-agent defense pipelines largely operate on correlation-based signals, lack structural constraints on response actions, and are vulnerable to reasoning drift under ambiguous or adversarial inputs. We present the Causal Multi-Agent Decision Framework (C-MADF), a structurally constrained architecture for autonomous cyber defense that integrates causal modeling with adversarial dual-policy control. C-MADF first learns a Structural Causal Model (SCM) from historical telemetry and compiles it into an investigation-level Directed Acyclic Graph (DAG) that defines admissible response transitions. This roadmap is formalized as a Markov Decision Process (MDP) whose action space is explicitly restricted to causally consistent transitions. Decision-making within this constrained space is performed by a dual-agent reinforcement learning system in which a threat-optimizing Blue-Team policy is counterbalanced by a conservatively shaped Red-Team policy. Inter-policy disagreement is quantified through a Policy Divergence Score and exposed via a human-in-the-loop interface equipped with an Explainability-Transparency Score that serves as an escalation signal under uncertainty. On the real-world CICIoT2023 dataset, C-MADF reduces the false-positive rate from 11.2%, 9.7%, and 8.4% in three cutting-edge literature baselines to 1.8%, while achieving 0.997 precision, 0.961 recall, and 0.979 F1-score.
Authors:Andrew Wheeler, Kshitiz Aryal, Maanak Gupta
Abstract:
Transformer-based malware detection systems operating on graph modalities such as control flow graphs (CFGs) achieve strong performance by modeling structural relationships in program behavior. However, their robustness to adversarial evasion attacks remains underexplored. This paper examines the vulnerability of a RoBERTa-based malware detector that linearizes CFGs into sequences of function calls, a design choice that enables transformer modeling but may introduce token-level sensitivities and ordering artifacts exploitable by adversaries. By evaluating evasion strategies within this graph-to-sequence framework, we provide insight into the practical robustness of transformer-based malware detectors beyond aggregate detection accuracy. This paper proposes a white-box adversarial evasion attack that leverages explainability mechanisms to identify and perturb most influential graph components. Using token- and word-level attributions derived from integrated gradients, the attack iteratively replaces positively attributed function calls with synthetic external imports, producing adversarial CFG representations without altering overall program structure. Experimental evaluation on small- and large-scale Windows Portable Executable (PE) datasets demonstrates that the proposed method can reliably induce misclassification, even against models trained to high accuracy. Our results highlight that explainability tools, while valuable for interpretability, can also expose critical attack surfaces in transformer-based malware detectors.
Authors:Yuhan Shui, Ruobin Jin, Zhihao Dou, Zhiqiang Gao
Abstract:
Vertical split learning (SL) enables collaborative model training across parties holding complementary features without sharing raw data, but recent work has shown that it is highly vulnerable to poisoning-based backdoor attacks operating on intermediate embeddings. By compromising malicious clients, adversaries can inject stealthy triggers that manipulate the server-side model while remaining difficult to detect, and existing defenses provide limited robustness against adaptive attacks. In this paper, we propose ProtoGuard-SL, a server-side defense that improves the robustness of split learning by exploiting class-conditional representation consistency in the embedding space. Our approach is motivated by the observation that benign embeddings within the same class exhibit stable semantic alignment, whereas poisoned embeddings inevitably disrupt this structure. ProtoGuard-SL adopts a two-stage framework that constructs robust class prototypes and transforms embeddings into a prototype-consistency representation, followed by a class-conditional, distribution-free conformal filtering strategy to identify and remove anomalous embeddings. Extensive experiments are conducted on three datasets, CIFAR-10, SVHN, and Bank Marketing, under three different attack settings demonstrate that our method achieves state-of-the-art performance.
Authors:Aritra Dasgupta, Sudipta Paria, Swarup Bhunia
Abstract:
Hardware intellectual property (IP) in the globalized integrated circuit (IC) supply chain is exposed to a wide range of confidentiality and integrity attacks by untrusted third-party entities. Existing IP-level countermeasures, such as logic locking, hardware obfuscation, camouflaging, and redaction, have aimed at addressing these them. In particular, hardware redaction has emerged as a robust approach for IP protection against confidentiality attacks, including reverse engineering. We note that existing IP protection approaches, including the ones based on hardware redaction, tend to leave behind structural artifacts that can be exploited by adversaries to bypass protections or predict unlocking keys, using the knowledge of known designs, akin to a known-plaintext attack (KPA) in cryptography. In this work, we present CIPHR, a robust fine-grain hardware redaction methodology inspired by the cryptographic property of indistinguishability. The proposed approach utilizes novel heuristic-driven randomization to introduce significant structural transformations into the redacted designs. We employ structural analysis metrics to evaluate the security achieved by CIPHR compared to various state-of-the-art IP protection techniques. Multiple open-source benchmark designs are used to demonstrate that fine-grain redaction in CIPHR is robust, scalable, and indistinguishable against structural attacks.
Authors:Sen Fang, Weiyuan Ding, Zhezhen Cao, Zhou Yang, Bowen Xu
Abstract:
Large Language Models (LLMs) are increasingly adopted for vulnerability detection, yet their reasoning remains fundamentally unsound. We identify a root cause shared by both major mitigation paradigms (agent-based debate and retrieval augmentation): reasoning in an ungrounded deliberative space that lacks a bounded, hypothesis-specific evidence base. Without such grounding, agents fabricate cross-function dependencies, and retrieval heuristics supply generic knowledge decoupled from the repository's data-flow topology. Consequently, the resulting conclusions are driven by rhetorical persuasiveness rather than verifiable facts. To ground this deliberation, we present AEGIS, a novel multi-agent framework that shifts detection from ungrounded speculation to forensic verification over a closed factual substrate. Guided by a "From Clue to Verdict" philosophy, AEGIS first identifies suspicious code anomalies (clues), then dynamically reconstructs per-variable dependency chains for each clue via on-demand slicing over a repository-level Code Property Graph. Within this closed evidence boundary, a Verifier Agent constructs competing dialectical arguments for and against exploitability, while an independent Audit Agent scrutinizes every claim against the trace, exercising veto power to prevent hallucinated verdicts. Evaluation on the rigorous PrimeVul dataset demonstrates that AEGIS establishes a new state-of-the-art, achieving 122 Pair-wise Correct Predictions. To our knowledge, this is the first approach to surpass 100 on this benchmark. It reduces the false positive rate by up to 54.40% compared to leading baselines, at an average cost of $0.09 per sample without any task-specific training.
Authors:Fengwei Tian, Payel Bhattacharjee, Heidi Hanson, Geoffrey D. Rubin, Joseph Y. Lo, Ravi Tandon
Abstract:
We present STAMP (Selective Task-Aware Mechanism for Text Privacy), a new framework for task-aware text privatization that achieves an improved privacy-utility trade-off. STAMP selectively allocates privacy budgets across tokens by jointly considering (i) each token's importance to the downstream task (as measured via a task- or query-specific representation), and (ii) its privacy sensitivity (e.g., names, dates, identifiers). This token-level partitioning enables fine-grained, group-wise control over the level of noise applied to different parts of the input, balancing privacy protection with task relevance. To privatize individual token embeddings, we introduce the polar mechanism, which perturbs only the direction of embeddings on the unit sphere while preserving their magnitude. Decoding is performed via cosine nearest-neighbor search, aligning the perturbation geometry with the decoding geometry. Unlike isotropic noise mechanisms, the polar mechanism maintains semantic neighborhoods in the embedding space and better preserves downstream utility. Experimental evaluations on SQuAD, Yelp, and AG News datasets demonstrate that STAMP, when combined with the normalized polar mechanism, consistently achieves superior privacy-utility trade-offs across varying per-token privacy budgets.
Authors:Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro, Peter Kairouz
Abstract:
As AI assistants become widely used, privacy-aware platforms like Anthropic's Clio have been introduced to generate insights from real-world AI use. Clio's privacy protections rely on layering multiple heuristic techniques together, including PII redaction, clustering, filtering, and LLM-based privacy auditing. In this paper, we put these claims to the test by presenting CLIOPATRA, the first privacy attack against "privacy-preserving" LLM insight systems. The attack involves a realistic adversary that carefully designs and inserts malicious chats into the system to break multiple layers of privacy protections and induce the leakage of sensitive information from a target user's chat. We evaluated CLIOPATRA on synthetically generated medical target chats, demonstrating that an adversary who knows only the basic demographics of a target user and a single symptom can successfully extract the user's medical history in 39% of cases by just inspecting Clio's output. Furthermore, CLIOPATRA can reach close to 100% when Clio is configured with other state-of-the-art models and the adversary's knowledge of the target user is increased. We also show that existing ad hoc mitigations, such as LLM-based privacy auditing, are unreliable and fail to detect major leaks. Our findings indicate that even when layered, current heuristic protections are insufficient to adequately protect user data in LLM-based analysis systems.
Authors:Alexander Nemecek, Wenbiao Li, Xiaoqian Jiang, Jaideep Vaidya, Erman Ayday
Abstract:
Genomic language models (GLMs) have emerged as powerful tools for learning representations of DNA sequences, enabling advances in variant prediction, regulatory element identification, and cross-task transfer learning. However, as these models are increasingly trained or fine-tuned on sensitive genomic cohorts, they risk memorizing specific sequences from their training data, raising serious concerns around privacy, data leakage, and regulatory compliance. Despite growing awareness of memorization risks in general-purpose language models, little systematic evaluation exists for these risks in the genomic domain, where data exhibit unique properties such as a fixed nucleotide alphabet, strong biological structure, and individual identifiability. We present a comprehensive, multi-vector privacy evaluation framework designed to quantify memorization risks in GLMs. Our approach integrates three complementary risk assessment methodologies: perplexity-based detection, canary sequence extraction, and membership inference. These are combined into a unified evaluation pipeline that produces a worst-case memorization risk score. To enable controlled evaluation, we plant canary sequences at varying repetition rates into both synthetic and real genomic datasets, allowing precise quantification of how repetition and training dynamics influence memorization. We evaluate our framework across multiple GLM architectures, examining the relationship between sequence repetition, model capacity, and memorization risk. Our results establish that GLMs exhibit measurable memorization and that the degree of memorization varies across architectures and training regimes. These findings reveal that no single attack vector captures the full scope of memorization risk, underscoring the need for multi-vector privacy auditing as a standard practice for genomic AI systems.
Authors:Alexander Nemecek, Hengzhi He, Guang Cheng, Erman Ayday
Abstract:
Cryptographic provenance standards such as C2PA and invisible watermarking are positioned as complementary defenses for content authentication, yet the two verification layers are technically independent: neither conditions on the output of the other. This work formalizes and empirically demonstrates the $\textit{Integrity Clash}$, a condition in which a digital asset carries a cryptographically valid C2PA manifest asserting human authorship while its pixels simultaneously carry a watermark identifying it as AI-generated, with both signals passing their respective verification checks in isolation. We construct metadata washing workflows that produce these authenticated fakes through standard editing pipelines, requiring no cryptographic compromise, only the semantic omission of a single assertion field permitted by the current C2PA specification. To close this gap, we propose a cross-layer audit protocol that jointly evaluates provenance metadata and watermark detection status, achieving 100% classification accuracy across 3,500 test images spanning four conflict-matrix states and three realistic perturbation conditions. Our results demonstrate that the gap between these verification layers is unnecessary and technically straightforward to close.
Authors:Kunal Mukherjee, Cuneyt Gurcan Akcora, Murat Kantarcioglu
Abstract:
Agent-native social platforms such as Moltbook are rapidly emerging, yet they inherit and amplify classical influence and abuse attacks, where coordinated agents strategically comment and upvote to manipulate visibility and propagate narratives across communities. However, rigorous measurement and learning-based monitoring remain constrained by the absence of longitudinal, graph-native datasets for agentic social networks that jointly capture heterogeneous interactions, temporal drift, and visibility signals needed to connect coordination behavior to downstream exposure. We introduce MoltGraph as a realistic longitudinal agentic social-network graph dataset for studying how agents behave, coordinate, and evolve in the wild, enabling reproducible measurement on emerging multi-agent social ecosystems. Using MoltGraph, we provide the first graph-centric characterization of Moltbook as a dynamic network: (i) heavy-tailed connectivity with power-law exponents in the range alpha in [1.86, 2.72], (ii) accelerating hub formation and attention centralization where the top 1% agents account for 29.00% of engagements, (iii) bursty, short-lived coordination episodes, 98.33% last under 24 hours, and (iv) measurable exposure effects across submolts. In matched analyses, posts receiving coordinated engagement exhibit 506.35% higher early interaction rates (within H=5 days) and 242.63% higher downstream exposure in feeds than non-coordinated controls.
Authors:Boyang Zhang, Yang Zhang
Abstract:
The rapid advancement of large language models (LLMs) has enabled powerful authorship inference capabilities, raising growing concerns about unintended deanonymization risks in textual data such as news articles. In this work, we introduce an LLM agent designed to evaluate and mitigate such risks through a structured, interpretable pipeline. Central to our framework is the proposed $\textit{SALA}$ (Stylometry-Assisted LLM Analysis) method, which integrates quantitative stylometric features with LLM reasoning for robust and transparent authorship attribution. Experiments on large-scale news datasets demonstrate that $\textit{SALA}$, particularly when augmented with a database module, achieves high inference accuracy in various scenarios. Finally, we propose a guided recomposition strategy that leverages the agent's reasoning trace to generate rewriting prompts, effectively reducing authorship identifiability while preserving textual meaning. Our findings highlight both the deanonymization potential of LLM agents and the importance of interpretable, proactive defenses for safeguarding author privacy.
Authors:Sarthak Munshi, Manish Bhatt, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Blake Gatto
Abstract:
While prior work has focused on projecting adversarial examples back onto the manifold of natural data to restore safety, we argue that a comprehensive understanding of AI safety requires characterizing the unsafe regions themselves. This paper introduces a framework for systematically mapping the Manifold of Failure in Large Language Models (LLMs). We reframe the search for vulnerabilities as a quality diversity problem, using MAP-Elites to illuminate the continuous topology of these failure regions, which we term behavioral attraction basins. Our quality metric, Alignment Deviation, guides the search towards areas where the model's behavior diverges most from its intended alignment. Across three LLMs: Llama-3-8B, GPT-OSS-20B, and GPT-5-Mini, we show that MAP-Elites achieves up to 63% behavioral coverage, discovers up to 370 distinct vulnerability niches, and reveals dramatically different model-specific topological signatures: Llama-3-8B exhibits a near-universal vulnerability plateau (mean Alignment Deviation 0.93), GPT-OSS-20B shows a fragmented landscape with spatially concentrated basins (mean 0.73), and GPT-5-Mini demonstrates strong robustness with a ceiling at 0.50. Our approach produces interpretable, global maps of each model's safety landscape that no existing attack method (GCG, PAIR, or TAP) can provide, shifting the paradigm from finding discrete failures to understanding their underlying structure.
Authors:Zachary Coalson, Bo Fang, Sanghyun Hong
Abstract:
Multi-turn interaction length is a dominant factor in the operational costs of conversational LLMs. In this work, we present a new failure mode in conversational LLMs: turn amplification, in which a model consistently prolongs multi-turn interactions without completing the underlying task. We show that an adversary can systematically exploit clarification-seeking behavior$-$commonly encouraged in multi-turn conversation settings$-$to scalably prolong interactions. Moving beyond prompt-level behaviors, we take a mechanistic perspective and identify a query-independent, universal activation subspace associated with clarification-seeking responses. Unlike prior cost-amplification attacks that rely on per-turn prompt optimization, our attack arises from conversational dynamics and persists across prompts and tasks. We show that this mechanism provides a scalable pathway to induce turn amplification: both supply-chain attacks via fine-tuning and runtime attacks through low-level parameter corruptions consistently shift models toward abstract, clarification-seeking behavior across prompts. Across multiple instruction-tuned LLMs and benchmarks, our attack substantially increases turn count while remaining compliant. We also show that existing defenses offer limited protection against this emerging class of failures.
Authors:Leo Marchyok, Zachary Coalson, Sungho Keum, Sooel Son, Sanghyun Hong
Abstract:
Modern language models exhibit rich internal structure, yet little is known about how privacy-sensitive behaviors, such as personally identifiable information (PII) leakage, are represented and modulated within their hidden states. We present UniLeak, a mechanistic-interpretability framework that identifies universal activation directions: latent directions in a model's residual stream whose linear addition at inference time consistently increases the likelihood of generating PII across prompts. These model-specific directions generalize across contexts and amplify PII generation probability, with minimal impact on generation quality. UniLeak recovers such directions without access to training data or groundtruth PII, relying only on self-generated text. Across multiple models and datasets, steering along these universal directions substantially increases PII leakage compared to existing prompt-based extraction methods. Our results offer a new perspective on PII leakage: the superposition of a latent signal in the model's representations, enabling both risk amplification and mitigation.
Authors:Zachary Coalson, Beth Sohler, Aiden Gabriel, Sanghyun Hong
Abstract:
We identify a structural weakness in current large language model (LLM) alignment: modern refusal mechanisms are fail-open. While existing approaches encode refusal behaviors across multiple latent features, suppressing a single dominant feature$-$via prompt-based jailbreaks$-$can cause alignment to collapse, leading to unsafe generation. Motivated by this, we propose fail-closed alignment as a design principle for robust LLM safety: refusal mechanisms should remain effective even under partial failures via redundant, independent causal pathways. We present a concrete instantiation of this principle: a progressive alignment framework that iteratively identifies and ablates previously learned refusal directions, forcing the model to reconstruct safety along new, independent subspaces. Across four jailbreak attacks, we achieve the strongest overall robustness while mitigating over-refusal and preserving generation quality, with small computational overhead. Our mechanistic analyses confirm that models trained with our method encode multiple, causally independent refusal directions that prompt-based jailbreaks cannot suppress simultaneously, providing empirical support for fail-closed alignment as a principled foundation for robust LLM safety.
Authors:Tianyu Chen, Dongrui Liu, Xia Hu, Jingyi Yu, Wenjie Wang
Abstract:
Clawdbot is a self-hosted, tool-using personal AI agent with a broad action space spanning local execution and web-mediated workflows, which raises heightened safety and security concerns under ambiguity and adversarial steering. We present a trajectory-centric evaluation of Clawdbot across six risk dimensions. Our test suite samples and lightly adapts scenarios from prior agent-safety benchmarks (including ATBench and LPS-Bench) and supplements them with hand-designed cases tailored to Clawdbot's tool surface. We log complete interaction trajectories (messages, actions, tool-call arguments/outputs) and assess safety using both an automated trajectory judge (AgentDoG-Qwen3-4B) and human review. Across 34 canonical cases, we find a non-uniform safety profile: performance is generally consistent on reliability-focused tasks, while most failures arise under underspecified intent, open-ended goals, or benign-seeming jailbreak prompts, where minor misinterpretations can escalate into higher-impact tool actions. We supplemented the overall results with representative case studies and summarized the commonalities of these cases, analyzing the security vulnerabilities and typical failure modes that Clawdbot is prone to trigger in practice.
Authors:Youpeng Li, Fuxun Yu, Xinda Wang
Abstract:
The integration of LLMs into vulnerability detection (VD) has shifted the field toward interpretable and context-aware analysis. While post-training methods have shown promise in general coding tasks, their systematic application to VD remains underexplored. In this paper, we present the first comprehensive investigation into the post-training pipeline for LLM-based VD, spanning from cold-start SFT to off-policy preference optimization and on-policy RL, uncovering how data curation, stage interactions, reward mechanisms, and evaluation protocols collectively dictate the efficacy of model training and assessment. Our study identifies practical guidelines and insights: (1) SFT based on rejection sampling greatly outperforms rationalization-based supervision, which can introduce hallucinations due to ground-truth leakage. (2) While increased SFT epochs constantly benefit preference optimization, excessive SFT inhibits self-exploration during RL, ultimately limiting performance gains. (3) Coarse-grained reward signals often mislead RL, whereas fine-grained root-cause judgments ensure reliable credit assignment. Specification-based rewards offer further benefits but incur significant effort in specification generation. (4) Although filtering extremely hard-to-detect vulnerability samples improves RL training efficiency, the cost of performance loss should be considered in practical applications. (5) Models trained under GRPO significantly outperform those using SFT and preference optimization (i.e., DPO and ORPO), as well as a series of zero-shot SOTA LLMs, underscoring the significant potential of on-policy RL for LLM-based VD. (6) In contrast to binary matching that tends to overestimate performance, LLM-as-a-Judge based on root-cause analysis provides a more robust evaluation protocol, although its accuracy varies across judge models with different levels of security expertise.
Authors:Xiao Ren, Xinyi Yu, Linkang Du, Min Chen, Yuanchao Shu, Zhou Su, Yunjun Gao, Zhikun Zhang
Abstract:
The surging demand for large-scale datasets in deep learning has heightened the need for effective copyright protection, given the risks of unauthorized use to data owners. Although the dataset watermark technique holds promise for auditing and verifying usage, existing methods are hindered by inconsistent evaluations, which impede fair comparisons and assessments of real-world viability. To address this gap, we propose a two-layer taxonomy that categorizes methods by implementation (model-based vs. model-free injection; model-behavior vs. model-message verification), offering a structured framework for cross-task analysis. Then, we develop DWBench, a unified benchmark and open-source toolkit for systematically evaluating image dataset watermark techniques in classification and generation tasks. Using DWBench, we assess 25 representative methods under standardized conditions, perturbation-based robustness tests, multi-watermark coexistence, and multi-user interference. In addition to reporting the results of four commonly used metrics, we present the results of two new metrics: sample significance for fine-grained watermark distinguishability and verification success rate for dataset-level auditing, which enable accurate and reproducible benchmarking. Key findings reveal inherent trade-offs: no single method dominates all scenarios; classification and generation tasks require specialized approaches; and existing techniques exhibit instability at low watermark rates and in realistic multi-user settings, with elevated false positives or performance declines. We hope that DWBench can facilitate advances in watermark reliability and practicality, thus strengthening copyright safeguards in the face of widespread AI-driven data exploitation.
Authors:Kyle Domico, Jean-Charles Noirot Ferrand, Patrick McDaniel
Abstract:
Recent work on network attacks have demonstrated that ML-based network intrusion detection systems (NIDS) can be evaded with adversarial perturbations. However, these attacks rely on complex optimizations that have large computational overheads, making them impractical in many real-world settings. In this paper, we introduce a lightweight adversarial agent that implements strategies (policies) trained via reinforcement learning (RL) that learn to evade ML-based NIDS without requiring online optimization. This attack proceeds by (1) offline training, where the agent learns to evade a surrogate ML model by perturbing malicious flows using network traffic data assumed to be collected via reconnaissance, then (2) deployment, where the trained agent is used in a compromised device controlled by an attacker to evade ML-based NIDS using learned attack strategies. We evaluate our approach across diverse NIDS and several white-, gray-, and black-box threat models. We demonstrate that attacks using these lightweight agents can be highly effective (reaching up to 48.9% attack success rate), extremely fast (requiring as little as 5.72ms to craft an attack), and require negligible resources (e.g., 0.52MB of memory). Through this work, we demonstrate that future botnets driven by lightweight learning-based agents can be highly effective and widely deployable in diverse environments of compromised devices.
Authors:Pradeep Niroula, Minzhao Liu, Sivaprasad Omanakuttan, David Amaro, Shouvanik Chakrabarti, Soumik Ghosh, Zichang He, Yuwei Jin, Fatih Kaleoglu, Steven Kordonowy, Rohan Kumar, Michael A. Perlin, Akshay Seshadri, Matthew Steinberg, Joseph Sullivan, Jacob Watkins, Henry Yuen, Ruslan Shaydulin
Abstract:
Quantum mechanics provides cryptographic primitives whose security is grounded in hardness assumptions independent of those underlying classical cryptography. However, existing proposals require low-noise quantum communication and long-lived quantum memory, capabilities which remain challenging to realize in practice. In this work, we introduce a quantum digital signature scheme that operates with only classical communication, using the classical shadows of states produced by random circuits as public keys. We provide theoretical and numerical evidence supporting the conjectured hardness of learning the private key (the circuit) from the public key (the shadow). A key technical ingredient enabling our scheme is an improved state-certification primitive that achieves higher noise tolerance and lower sample complexity than prior methods. We realize this certification by designing a high-rate error-detecting code tailored to our random-circuit ensemble and experimentally generating shadows for 32-qubit states using circuits with $\geq 80$ logical ($\geq 582$ physical) two-qubit gates, attaining 0.90 $\pm$ 0.01 fidelity. With increased number of measurement samples, our hardware-demonstrated primitives realize a proof-of-principle quantum digital signature, demonstrating the near-term feasibility of our scheme.
Authors:Tianya Zhao, Junqing Zhang, Haowen Xu, Xiaoyan Sun, Jun Dai, Xuyu Wang
Abstract:
Deep neural networks (DNNs) have achieved remarkable success in radio frequency (RF) fingerprinting for wireless device authentication. However, their practical deployment faces two major limitations: domain shift, where models trained in one environment struggle to generalize to others, and the black-box nature of DNNs, which limits interpretability. To address these issues, we propose a novel framework that integrates a group of variable-length two-dimensional (2D) shapelets with a pre-trained large language model (LLM) to achieve efficient, interpretable, and generalizable RF fingerprinting. The 2D shapelets explicitly capture diverse local temporal patterns across the in-phase and quadrature (I/Q) components, providing compact and interpretable representations. Complementarily, the pre-trained LLM captures more long-range dependencies and global contextual information, enabling strong generalization with minimal training overhead. Moreover, our framework also supports prototype generation for few-shot inference, enhancing cross-domain performance without additional retraining. To evaluate the effectiveness of our proposed method, we conduct extensive experiments on six datasets across various protocols and domains. The results show that our method achieves superior standard and few-shot performance across both source and unseen domains.
Authors:Yuhao Xue, Jiuan Zhou, Yu Cheng, Zhaoxia Yin
Abstract:
With the rapid development of AIGC technologies, generative image steganography has attracted increasing attention due to its high imperceptibility and flexibility. However, existing generative steganography methods often maintain acceptable security and robustness only at relatively low embedding rates, severely limiting the practical applicability of steganographic systems. To address this issue, we propose a novel DTAMS framework that achieves high embedding rates while ensuring strong robustness and security. Specifically, a dynamic multi-timestep adaptive embedding mechanism is constructed based on transition-cost modeling in diffusion models, enabling automatic selection of optimal embedding timesteps to improve embedding rates while preserving overall performance. Meanwhile, we propose a global sub-interval mapping strategy that jointly considers mapping errors and the frequency distribution of secret information, converting point-wise perturbations into interval-level statistical mappings to suppress error accumulation and distribution drift during multi-step diffusion processes. Furthermore, a multi-dimensional joint constraint mechanism is introduced to mitigate distortions caused by repeated latent-pixel transformations by jointly regularizing embedding errors at the pixel, latent, and semantic levels. Experiments demonstrate that the proposed method achieves an embedding rate of 12 bpp while maintaining excellent security and robustness. Across all evaluated conditions, DTAMS reduces the average extraction error rate by 59.39%, representing a significant improvement over SOTA methods.
Authors:Claude Carlet, Marko Ðurasevic, Ermes Franch, Domagoj Jakobovic, Luca Mariot, Stjepan Picek
Abstract:
Negabent Boolean functions are defined by having a flat magnitude spectrum under the nega-Hadamard transform. They exist in both even and odd dimensions, and the subclass of functions that are simultaneously bent and negabent (bent-negabent) has attracted interest due to the combined optimal periodic and negaperiodic spectral properties. In this work, we investigate how evolutionary algorithms can be used to evolve (bent-)negabent Boolean functions. Our experimental results indicate that evolutionary algorithms, especially genetic programming, are a suitable approach for evolving negabent Boolean functions, and we successfully evolve such functions in all dimensions we consider.
Authors:Claude Carlet, Marko Ðurasevic, Domagoj Jakobovic, Luca Mariot, Stjepan Picek
Abstract:
Idempotent Boolean functions form a highly structured subclass of Boolean functions that is closely related to rotation symmetry under a normal-basis representation and to invariance under a fixed linear map in a polynomial basis. These functions are attractive as candidates for cryptographic design, yet their additional algebraic constraints make the search for high nonlinearity substantially more difficult than in the unconstrained case. In this work, we investigate evolutionary methods for constructing highly nonlinear idempotent Boolean functions for dimensions $n=5$ up to $n=12$ using a polynomial basis representation with canonical primitive polynomials. Our results show that the problem of evolving idempotent functions is difficult due to the disruptive nature of crossover and mutation operators. Next, we show that idempotence can be enforced by encoding the truth table on orbits, yielding a compact genome of size equal to the number of distinct squaring orbits.
Authors:Kunal Mukherjee, Zulfikar Alom, Tran Gia Bao Ngo, Cuneyt Gurcan Akcora, Murat Kantarcioglu
Abstract:
The rise of bot accounts on social media poses significant risks to public discourse. To address this threat, modern bot detectors increasingly rely on Graph Neural Networks (GNNs). However, the effectiveness of these GNN-based detectors in real-world settings remains poorly understood. In practice, attackers continuously adapt their strategies as well as must operate under domain-specific and temporal constraints, which can fundamentally limit the applicability of existing attack methods. As a result, there is a critical need for robust GNN-based bot detection methods under realistic, constraint-aware attack scenarios. To address this gap, we introduce BOCLOAK to systematically evaluate the robustness of GNN-based social bot detection via both edge editing and node injection adversarial attacks under realistic constraints. BOCLOAK constructs a probability measure over spatio-temporal neighbor features and learns an optimal transport geometry that separates human and bot behaviors. It then decodes transport plans into sparse, plausible edge edits that evade detection while obeying real-world constraints. We evaluate BOCLOAK across three social bot datasets, five state-of-the-art bot detectors, three adversarial defenses, and compare it against four leading graph adversarial attack baselines. BOCLOAK achieves up to 80.13% higher attack success rates while using 99.80% less GPU memory under realistic real-world constraints. Most importantly, BOCLOAK shows that optimal transport provides a lightweight, principled framework for bridging the gap between adversarial attacks and real-world bot detection.
Authors:Yuan Li, Jun Hu, Bryan Hooi, Bingsheng He, Cheng Chen
Abstract:
Graph-based fraud detection on text-attributed graphs (TAGs) requires jointly modeling rich textual semantics and relational dependencies. However, existing LLM-enhanced GNN approaches are constrained by predefined prompting and decoupled training pipelines, limiting reasoning autonomy and weakening semantic-structural alignment. We propose FraudCoT, a unified framework that advances TAG-based fraud detection through autonomous, graph-aware chain-of-thought (CoT) reasoning and scalable LLM-GNN co-training. To address the limitations of predefined prompts, we introduce a fraud-aware selective CoT distillation mechanism that generates diverse reasoning paths and enhances semantic-structural understanding. These distilled CoTs are integrated into node texts, providing GNNs with enriched, multi-hop semantic and structural cues for fraud detection. Furthermore, we develop an efficient asymmetric co-training strategy that enables end-to-end optimization while significantly reducing the computational cost of naive joint training. Extensive experiments on public and industrial benchmarks demonstrate that FraudCoT achieves up to 8.8% AUPRC improvement over state-of-the-art methods and delivers up to 1,066x speedup in training throughput, substantially advancing both detection performance and efficiency.
Authors:Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak
Abstract:
Backdoor attacks pose a severe threat to deep learning, yet their impact on object detection remains poorly understood compared to image classification. While attacks have been proposed, we identify critical weaknesses in existing detection-based methods, specifically their reliance on unrealistic assumptions and a lack of physical validation. To bridge this gap, we introduce BadDet+, a penalty-based framework that unifies Region Misclassification Attacks (RMA) and Object Disappearance Attacks (ODA). The core mechanism utilizes a log-barrier penalty to suppress true-class predictions for triggered inputs, resulting in (i) position and scale invariance, and (ii) enhanced physical robustness. On real-world benchmarks, BadDet+ achieves superior synthetic-to-physical transfer compared to existing RMA and ODA baselines while preserving clean performance. Theoretical analysis confirms the proposed penalty acts within a trigger-specific feature subspace, reliably inducing attacks without degrading standard inference. These results highlight significant vulnerabilities in object detection and the necessity for specialized defenses.
Authors:Wachiraphan Charoenwet, Kla Tantithamthavorn, Patanamon Thongtanunam, Hong Yi Lin, Minwoo Jeong, Ming Wu
Abstract:
Secure code review is critical at the pre-commit stage, where vulnerabilities must be caught early under tight latency and limited-context constraints. Existing SAST-based checks are noisy and often miss immature, context-dependent vulnerabilities, while standalone Large Language Models (LLMs) are constrained by context windows and lack explicit tool use. Agentic AI, which combine LLMs with autonomous decision-making, tool invocation, and code navigation, offer a promising alternative, but their effectiveness for pre-commit secure code review is not yet well understood. In this work, we introduce AgenticSCR, an agentic AI for secure code review for detecting immature vulnerabilities during the pre-commit stage, augmented by security-focused semantic memories. Using our own curated benchmark of immature vulnerabilities, tailored to the pre-commit secure code review, we empirically evaluate how accurate is our AgenticSCR for localizing, detecting, and explaining immature vulnerabilities. Our results show that AgenticSCR achieves at least 153% relatively higher percentage of correct code review comments than the static LLM-based baseline, and also substantially surpasses SAST tools. Moreover, AgenticSCR generates more correct comments in four out of five vulnerability types, consistently and significantly outperforming all other baselines. These findings highlight the importance of Agentic Secure Code Review, paving the way towards an emerging research area of immature vulnerability detection.
Authors:Daniel Commey, Matilda Nkoom, Yousef Alsenani, Sena G. Hounsinou, Garth V. Crosby
Abstract:
Virtual Asset Service Providers (VASPs) face a fundamental tension between regulatory compliance and user privacy when detecting cross-institutional money laundering. Current approaches require either sharing sensitive transaction data or operating in isolation, leaving critical cross-chain laundering patterns undetected. We present FedGraph-VASP, a privacy-preserving federated graph learning framework that enables collaborative anti-money laundering (AML) without exposing raw user data. Our key contribution is a Boundary Embedding Exchange protocol that shares only compressed, non-invertible graph neural network representations of boundary accounts. These exchanges are secured using post-quantum cryptography, specifically the NIST-standardized Kyber-512 key encapsulation mechanism combined with AES-256-GCM authenticated encryption. Experiments on the Elliptic Bitcoin dataset with realistic Louvain partitioning show that FedGraph-VASP achieves an F1-score of 0.508, outperforming the state-of-the-art generative baseline FedSage+ (F1 = 0.453) by 12.1 percent on binary fraud detection. We further show robustness under low-connectivity settings where generative imputation degrades performance, while approaching centralized performance (F1 = 0.620) in high-connectivity regimes. We additionally evaluate generalization on an Ethereum fraud detection dataset, where FedGraph-VASP (F1 = 0.635) is less effective under sparse cross-silo connectivity, while FedSage+ excels (F1 = 0.855), outperforming even local training (F1 = 0.785). These results highlight a topology-dependent trade-off: embedding exchange benefits connected transaction graphs, whereas generative imputation can dominate in highly modular sparse graphs. A privacy audit shows embeddings are only partially invertible (R^2 = 0.32), limiting exact feature recovery.
Authors:Khoi Trinh, Scott Seidenberger, Joseph Spracklen, Raveen Wijewickrama, Bimal Viswanath, Murtuza Jadliwala, Anindya Maiti
Abstract:
The emerging field of AI-generated art has witnessed the rise of prompt marketplaces, where creators can purchase, sell, or share prompts to generate unique artworks. These marketplaces often assert ownership over prompts, claiming them as intellectual property. This paper investigates whether concealed prompts sold on prompt marketplaces can be considered bona fide intellectual property, given that humans and AI tools may be able to infer the prompts based on publicly advertised sample images accompanying each prompt on sale. Specifically, our study aims to assess (i) how accurately humans can infer the original prompt solely by examining an AI-generated image, with the goal of generating images similar to the original image, and (ii) the possibility of improving upon individual human and AI prompt inferences by crafting combined human and AI prompts with the help of a large language model. Although previous research has explored AI-driven prompt inference and protection strategies, our work is the first to incorporate a human subject study and examine collaborative human-AI prompt inference in depth. Our findings indicate that while prompts inferred by humans and prompts inferred through a combined human and AI effort can generate images with a moderate level of similarity, they are not as successful as using the original prompt. Moreover, combining human- and AI-inferred prompts using our suggested merging techniques did not improve performance over purely human-inferred prompts.
Authors:Jiankai Jin, Xiangzheng Zhang, Zhao Liu, Deyue Zhang, Quanchen Zou
Abstract:
Machine learning systems can produce personalized outputs that allow an adversary to infer sensitive input attributes at inference time. We introduce Robust Privacy (RP), an inference-time privacy notion inspired by certified robustness: if a model's prediction is provably invariant within a radius-$R$ neighborhood around an input $x$ (e.g., under the $\ell_2$ norm), then $x$ enjoys $R$-Robust Privacy, i.e., observing the prediction cannot distinguish $x$ from any input within distance $R$ of $x$. We further develop Attribute Privacy Enhancement (APE) to translate input-level invariance into an attribute-level privacy effect. In a controlled recommendation task where the decision depends primarily on a sensitive attribute, we show that RP expands the set of sensitive-attribute values compatible with a positive recommendation, expanding the inference interval accordingly. Finally, we empirically demonstrate that RP also mitigates model inversion attacks (MIAs) by masking fine-grained input-output dependence. Even at small noise levels ($σ=0.1$), RP reduces the attack success rate (ASR) from 73% to 4% with partial model performance degradation. RP can also partially mitigate MIAs (e.g., ASR drops to 44%) with no model performance degradation.
Authors:Khoa Nguyen, Khiem Ton, NhatHai Phan, Issa Khalil, Khang Tran, Cristian Borcea, Ruoming Jin, Abdallah Khreishah, My T. Thai
Abstract:
Although boosting software development performance, large language model (LLM)-powered code generation introduces intellectual property and data security risks rooted in the fact that a service provider (cloud) observes a client's prompts and generated code, which can be proprietary in commercial systems. To mitigate this problem, we propose NOIR, the first framework to protect the client's prompts and generated code from the cloud. NOIR uses an encoder and a decoder at the client to encode and send the prompts' embeddings to the cloud to get enriched embeddings from the LLM, which are then decoded to generate the code locally at the client. Since the cloud can use the embeddings to infer the prompt and the generated code, NOIR introduces a new mechanism to achieve indistinguishability, a local differential privacy protection at the token embedding level, in the vocabulary used in the prompts and code, and a data-independent and randomized tokenizer on the client side. These components effectively defend against reconstruction and frequency analysis attacks by an honest-but-curious cloud. Extensive analysis and results using open-source LLMs show that NOIR significantly outperforms existing baselines on benchmarks, including the Evalplus (MBPP and HumanEval, Pass@1 of 76.7 and 77.4), and BigCodeBench (Pass@1 of 38.7, only a 1.77% drop from the original LLM) under strong privacy against attacks.
Authors:Bingxin Xu, Yuzhang Shang, Binghui Wang, Emilio Ferrara
Abstract:
Vision-Language-Action (VLA) models are increasingly deployed in safety-critical robotic applications, yet their security vulnerabilities remain underexplored. We identify a fundamental security flaw in modern VLA systems: the combination of action chunking and delta pose representations creates an intra-chunk visual open-loop. This mechanism forces the robot to execute K-step action sequences, allowing per-step perturbations to accumulate through integration. We propose SILENTDRIFT, a stealthy black-box backdoor attack exploiting this vulnerability. Our method employs the Smootherstep function to construct perturbations with guaranteed C2 continuity, ensuring zero velocity and acceleration at trajectory boundaries to satisfy strict kinematic consistency constraints. Furthermore, our keyframe attack strategy selectively poisons only the critical approach phase, maximizing impact while minimizing trigger exposure. The resulting poisoned trajectories are visually indistinguishable from successful demonstrations. Evaluated on the LIBERO, SILENTDRIFT achieves a 93.2% Attack Success Rate with a poisoning rate under 2%, while maintaining a 95.3% Clean Task Success Rate.
Authors:Mingqi Lv, Shanshan Zhang, Haiwen Liu, Tieming Chen, Tiantian Zhu
Abstract:
Advanced persistent threats (APTs) are stealthy and multi-stage, making single-point defenses (e.g., malware- or traffic-based detectors) ill-suited to capture long-range and cross-entity attack semantics. Provenance-graph analysis has become a prominent approach for APT detection. However, its practical deployment is hampered by (i) the scarcity of APT samples, (ii) the cost and difficulty of fine-grained APT sample labeling, and (iii) the diversity of attack tactics and techniques. Aiming at these problems, this paper proposes APT-MCL, an intelligent APT detection system based on Multi-view Collaborative provenance graph Learning. It adopts an unsupervised learning strategy to discover APT attacks at the node level via anomaly detection. After that, it creates multiple anomaly detection sub-models based on multi-view features and integrates them within a collaborative learning framework to adapt to diverse attack scenarios. Extensive experiments on three real-world APT datasets validate the approach: (i) multi-view features improve cross-scenario generalization, and (ii) co-training substantially boosts node-level detection under label scarcity, enabling practical deployment on diverse attack scenarios.
Authors:Sahaya Jestus Lazer, Kshitiz Aryal, Maanak Gupta, Elisa Bertino
Abstract:
Agentic AI marks an important transition from single-step generative models to systems capable of reasoning, planning, acting, and adapting over long-lasting tasks. By integrating memory, tool use, and iterative decision cycles, these systems enable continuous, autonomous workflows in real-world environments. This survey examines the implications of agentic AI for cybersecurity. On the defensive side, agentic capabilities enable continuous monitoring, autonomous incident response, adaptive threat hunting, and fraud detection at scale. Conversely, the same properties amplify adversarial power by accelerating reconnaissance, exploitation, coordination, and social-engineering attacks. These dual-use dynamics expose fundamental gaps in existing governance, assurance, and accountability mechanisms, which were largely designed for non-autonomous and short-lived AI systems. To address these challenges, we survey emerging threat models, security frameworks, and evaluation pipelines tailored to agentic systems, and analyze systemic risks including agent collusion, cascading failures, oversight evasion, and memory poisoning. Finally, we present three representative use-case implementations that illustrate how agentic AI behaves in practical cybersecurity workflows, and how design choices shape reliability, safety, and operational effectiveness.
Authors:Sai Teja Erukude, Viswa Chaitanya Marella, Suhasnadh Reddy Veluru
Abstract:
Artificial Intelligence's dual-use nature is revolutionizing the cybersecurity landscape, introducing new threats across four main categories: deepfakes and synthetic media, adversarial AI attacks, automated malware, and AI-powered social engineering. This paper aims to analyze emerging risks, attack mechanisms, and defense shortcomings related to AI in cybersecurity. We introduce a comparative taxonomy connecting AI capabilities with threat modalities and defenses, review over 70 academic and industry references, and identify impactful opportunities for research, such as hybrid detection pipelines and benchmarking frameworks. The paper is structured thematically by threat type, with each section addressing technical context, real-world incidents, legal frameworks, and countermeasures. Our findings emphasize the urgency for explainable, interdisciplinary, and regulatory-compliant AI defense systems to maintain trust and security in digital ecosystems.
Authors:Michiel Van Kenhove, Erik Pohle, Leonard Schild, Martin Zbudila, Merlijn Sebrechts, Filip De Turck, Bruno Volckaert, Aysajan Abidin
Abstract:
The rapid increase of Internet of Things (IoT) systems across several domains has led to the generation of vast volumes of sensitive data, presenting significant challenges in terms of storage and data analytics. Cloud-assisted IoT solutions offer storage, scalability, and computational resources, but introduce new security and privacy risks that conventional trust-based approaches fail to adequately mitigate. To address these challenges, this paper presents MOZAIK, a novel end-to-end privacy-preserving confidential data storage and distributed processing architecture tailored for IoT-to-cloud scenarios. MOZAIK ensures that data remains encrypted throughout its lifecycle, including during transmission, storage, and processing. This is achieved by employing a cryptographic privacy-enhancing technology known as computing on encrypted data (COED). Two distinct COED techniques are explored, specifically secure multi-party computation (MPC) and fully homomorphic encryption (FHE). The paper includes a comprehensive analysis of the MOZAIK architecture, including a proof-of-concept implementation and performance evaluations. The evaluation results demonstrate the feasibility of the MOZAIK system and indicate the cost of an end-to-end privacy-preserving system compared to regular plaintext alternatives. All components of the MOZAIK platform are released as open-source software alongside this publication, with the aim of advancing secure and privacy-preserving data processing practices.
Authors:Siyuan Li, Zehao Liu, Xi Lin, Qinghua Mao, Yuliang Chen, Haoyu Li, Jun Wu, Jianhua Li, Xiu Su
Abstract:
As Large Language Models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks raises urgent safety concerns, especially those evolving over multi-round interactions. Existing defenses are largely reactive and struggle to adapt as adversaries refine strategies across rounds. In this work, we propose CoopGuard , a stateful multi-round LLM defense framework based on cooperative agents that maintains and updates an internal defense state to counter evolving attacks. It employs three specialized agents (Deferring Agent, Tempting Agent, and Forensic Agent) for complementary round-level strategies, coordinated by System Agent, which conditions decisions on the evolving defense state (interaction history) and orchestrates agents over time. To evaluate evolving threats, we introduce the EMRA benchmark with 5,200 adversarial samples across 8 attack types, simulating progressively LLM multi-round attacks. Experiments show that CoopGuard reduces attack success rate by 78.9% over state-of-the-art defenses, while improving deceptive rate by 186% and reducing attack efficiency by 167.9%, offering a more comprehensive assessment of multi-round defense. These results demonstrate that CoopGuard provides robust protection for LLMs in multi-round adversarial scenarios.
Authors:Alberto Alfarano, Eshika Saxena, Emily Wenger, François Charton, Kristin Lauter
Abstract:
The Learning with Errors (LWE) problem is a hard math problem in lattice-based cryptography. In the simplest case of binary secrets, it is the subset sum problem, with error. Effective ML attacks on LWE were demonstrated in the case of binary, ternary, and small secrets, succeeding on fairly sparse secrets. The ML attacks recover secrets with up to 3 active bits in the "cruel region" (Nolte et al., 2024) on samples pre-processed with BKZ. We show that using larger training sets and repeated examples enables recovery of denser secrets. Empirically, we observe a power-law relationship between model-based attempts to recover the secrets, dataset size, and repeated examples. We introduce a stepwise regression technique to recover the "cool bits" of the secret.
Authors:Yue Li, Linying Xue, Kaiqing Lin, Hanyu Quan, Dongdong Lin, Hui Tian, Hongxia Wang, Bin Wang
Abstract:
Recent advances in GAN and diffusion models have significantly improved the realism and controllability of facial deepfake manipulation, raising serious concerns regarding privacy, security, and identity misuse. Proactive defenses attempt to counter this threat by injecting adversarial perturbations into images before manipulation takes place. However, existing approaches remain limited in effectiveness due to suboptimal perturbation injection strategies and are typically designed under white-box assumptions, targeting only simple GAN-based attribute editing. These constraints hinder their applicability in practical real-world scenarios. In this paper, we propose AEGIS, the first diffusion-guided paradigm in which the AdvErsarial facial images are Generated for Identity Shielding. We observe that the limited defense capability of existing approaches stems from the peak-clipping constraint, where perturbations are forcibly truncated due to a fixed $L_\infty$-bounded. To overcome this limitation, instead of directly modifying pixels, AEGIS injects adversarial perturbations into the latent space along the DDIM denoising trajectory, thereby decoupling the perturbation magnitude from pixel-level constraints and allowing perturbations to adaptively amplify where most effective. The extensible design of AEGIS allows the defense to be expanded from purely white-box use to also support black-box scenarios through a gradient-estimation strategy. Extensive experiments across GAN and diffusion-based deepfake generators show that AEGIS consistently delivers strong defense effectiveness while maintaining high perceptual quality. In white-box settings, it achieves robust manipulation disruption, whereas in black-box settings, it demonstrates strong cross-model transferability.
Authors:Rustem Islamov, Grigory Malinovsky, Alexander Gaponov, Aurelien Lucchi, Peter Richtárik, Eduard Gorbunov
Abstract:
Federated Learning (FL) enables heterogeneous clients to collaboratively train a shared model without centralizing their raw data, offering an inherent level of privacy. However, gradients and model updates can still leak sensitive information, while malicious servers may mount adversarial attacks such as Byzantine manipulation. These vulnerabilities highlight the need to address differential privacy (DP) and Byzantine robustness within a unified framework. Existing approaches, however, often rely on unrealistic assumptions such as bounded gradients, require auxiliary server-side datasets, or fail to provide convergence guarantees. We address these limitations by proposing Byz-Clip21-SGD2M, a new algorithm that integrates robust aggregation with double momentum and carefully designed clipping. We prove high-probability convergence guarantees under standard $L$-smoothness and $σ$-sub-Gaussian gradient noise assumptions, thereby relaxing conditions that dominate prior work. Our analysis recovers state-of-the-art convergence rates in the absence of adversaries and improves utility guarantees under Byzantine and DP settings. Empirical evaluations on CNN and MLP models trained on MNIST further validate the effectiveness of our approach.
Authors:Jihoon Suh, Yeongjun Jang, Junsoo Kim, Takashi Tanaka
Abstract:
We develop a variational encrypted model predictive control (VEMPC) protocol whose online execution relies only on encrypted polynomial operations. The proposed approach reformulates the MPC problem into a sampling-based estimator, in which the computation of the quadratic cost is naturally handled by tilting the sampling distribution, thus reducing online encrypted computation. The resulting protocol requires no additional communication rounds or intermediate decryption, and scales efficiently through two complementary levels of parallelism. We analyze the effect of encryption-induced errors on optimality, and simulation results demonstrate the practical applicability of the proposed method.
Authors:Víctor Mayoral-Vilches, Unai Ayucar-Carbajo, Olivier Laflamme, Ruikai Peng, María Sanz-Gómez, Francesco Balassone, Lucas Apa, Endika Gil-Uriarte
Abstract:
Is robot cybersecurity broken by AI? Consumer robots -- from autonomous lawnmowers to powered exoskeletons and window cleaners -- are rapidly entering homes and workplaces, yet their security remains rooted in assumptions of specialized attacker expertise. This paper presents evidence that Generative AI has fundamentally disrupted robot cybersecurity: what historically required deep knowledge of ROS, ROS 2, and robotic system internals can now be automated by anyone with access to state-of-the-art GenAI tools spearheaded by the open source CAI (Cybersecurity AI). We provide empirical evidence through three case studies: (1) compromising a Hookii autonomous lawnmower robot, uncovering fleet-wide vulnerabilities and data protection violations affecting 267+ connected devices, (2) exploiting a Hypershell powered exoskeleton, demonstrating safety-critical motor control weaknesses and credential exposure including access to over 3,300 internal support emails, and (3) breaching a HOBOT S7 Pro window cleaning robot, achieving unauthenticated BLE command injection and OTA firmware exploitation. Across these platforms, CAI discovered in an automated manner 38 vulnerabilities that would have previously required months of specialized security research. Our findings reveal a stark asymmetry: while offensive capabilities have been democratized through AI, defensive measures often remain lagging behind. We argue that traditional defense-in-depth architectures like the Robot Immune System (RIS) must evolve toward GenAI-native defensive agents capable of matching the speed and adaptability of AI-powered attacks.
Authors:Jinman Wu, Yi Xie, Shen Lin, Shiqian Zhao, Xiaofeng Chen
Abstract:
Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the persistence of jailbreak attacks suggests a fundamental mechanistic decoupling. We propose the \textbf{\underline{D}}isentangled \textbf{\underline{S}}afety \textbf{\underline{H}}ypothesis \textbf{(DSH)}, positing that safety computation operates on two distinct subspaces: a \textit{Recognition Axis} ($\mathbf{v}_H$, ``Knowing'') and an \textit{Execution Axis} ($\mathbf{v}_R$, ``Acting''). Our geometric analysis reveals a universal ``Reflex-to-Dissociation'' evolution, where these signals transition from antagonistic entanglement in early layers to structural independence in deep layers. To validate this, we introduce \textit{Double-Difference Extraction} and \textit{Adaptive Causal Steering}. Using our curated \textsc{AmbiguityBench}, we demonstrate a causal double dissociation, effectively creating a state of ``Knowing without Acting.'' Crucially, we leverage this disentanglement to propose the \textbf{Refusal Erasure Attack (REA)}, which achieves State-of-the-Art attack success rates by surgically lobotomizing the refusal mechanism. Furthermore, we uncover a critical architectural divergence, contrasting the \textit{Explicit Semantic Control} of Llama3.1 with the \textit{Latent Distributed Control} of Qwen2.5. The code and dataset are available at https://anonymous.4open.science/r/DSH.
Authors:Jinman Wu, Yi Xie, Shiqian Zhao, Xiaofeng Chen
Abstract:
Currently, open-sourced large language models (OSLLMs) have demonstrated remarkable generative performance. However, as their structure and weights are made public, they are exposed to jailbreak attacks even after alignment. Existing attacks operate primarily at shallow levels, such as the prompt or embedding level, and often fail to expose vulnerabilities rooted in deeper model components, which creates a false sense of security for successful defense. In this paper, we propose \textbf{\underline{S}}afety \textbf{\underline{A}}ttention \textbf{\underline{H}}ead \textbf{\underline{A}}ttack (\textbf{SAHA}), an attention-head-level jailbreak framework that explores the vulnerability in deeper but insufficiently aligned attention heads. SAHA contains two novel designs. Firstly, we reveal that deeper attention layers introduce more vulnerability against jailbreak attacks. Based on this finding, \textbf{SAHA} introduces \textit{Ablation-Impact Ranking} head selection strategy to effectively locate the most vital layer for unsafe output. Secondly, we introduce a boundary-aware perturbation method, \textit{i.e. Layer-Wise Perturbation}, to probe the generation of unsafe content with minimal perturbation to the attention. This constrained perturbation guarantees higher semantic relevance with the target intent while ensuring evasion. Extensive experiments show the superiority of our method: SAHA improves ASR by 14\% over SOTA baselines, revealing the vulnerability of the attack surface on the attention head. Our code is available at https://anonymous.4open.science/r/SAHA.
Authors:Justin Wang, Andreas Bigger, Xiaohai Xu, Justin W. Lin, Andy Applebaum, Tejal Patwardhan, Alpin Yukseloglu, Olivia Watkins
Abstract:
Smart contracts on public blockchains now manage large amounts of value, and vulnerabilities in these systems can lead to substantial losses. As AI agents become more capable at reading, writing, and running code, it is natural to ask how well they can already navigate this landscape, both in ways that improve security and in ways that might increase risk. We introduce EVMbench, an evaluation that measures the ability of agents to detect, patch, and exploit smart contract vulnerabilities. EVMbench draws on 117 curated vulnerabilities from 40 repositories and, in the most realistic setting, uses programmatic grading based on tests and blockchain state under a local Ethereum execution environment. We evaluate a range of frontier agents and find that they are capable of discovering and exploiting vulnerabilities end-to-end against live blockchain instances. We release code, tasks, and tooling to support continued measurement of these capabilities and future work on security.
Authors:Enea Monzio Compagnoni, Alessandro Stanghellini, Rustem Islamov, Aurelien Lucchi, Anastasiia Koloskova
Abstract:
Differential Privacy (DP) is becoming central to large-scale training as privacy regulations tighten. We revisit how DP noise interacts with adaptivity in optimization through the lens of stochastic differential equations, providing the first SDE-based analysis of private optimizers. Focusing on DP-SGD and DP-SignSGD under per-example clipping, we show a sharp contrast under fixed hyperparameters: DP-SGD converges at a Privacy-Utility Trade-Off of $\mathcal{O}(1/\varepsilon^2)$ with speed independent of $\varepsilon$, while DP-SignSGD converges at a speed linear in $\varepsilon$ with an $\mathcal{O}(1/\varepsilon)$ trade-off, dominating in high-privacy or large batch noise regimes. By contrast, under optimal learning rates, both methods achieve comparable theoretical asymptotic performance; however, the optimal learning rate of DP-SGD scales linearly with $\varepsilon$, while that of DP-SignSGD is essentially $\varepsilon$-independent. This makes adaptive methods far more practical, as their hyperparameters transfer across privacy levels with little or no re-tuning. Empirical results confirm our theory across training and test metrics, and empirically extend from DP-SignSGD to DP-Adam.
Authors:Nicolò Di Domenico, Annalisa Franco, Matteo Ferrara, Davide Maltoni
Abstract:
Face morphing attacks are widely recognized as one of the most challenging threats to face recognition systems used in electronic identity documents. These attacks exploit a critical vulnerability in passport enrollment procedures adopted by many countries, where the facial image is often acquired without a supervised live capture process. In this paper, we propose a novel face morphing technique based on Arc2Face, an identity-conditioned face foundation model capable of synthesizing photorealistic facial images from compact identity representations. We demonstrate the effectiveness of the proposed approach by comparing the morphing attack potential metric on two large-scale sequestered face morphing attack detection datasets against several state-of-the-art morphing methods, as well as on two novel morphed face datasets derived from FEI and ONOT. Experimental results show that the proposed deep learning-based approach achieves a morphing attack potential comparable to that of landmark-based techniques, which have traditionally been regarded as the most challenging. These findings confirm the ability of the proposed method to effectively preserve and manage identity information during the morph generation process.
Authors:Lukas Struppek, Adam Gleave, Kellin Pelrine
Abstract:
As the capabilities of large language models continue to advance, so does their potential for misuse. While closed-source models typically rely on external defenses, open-weight models must primarily depend on internal safeguards to mitigate harmful behavior. Prior red-teaming research has largely focused on input-based jailbreaking and parameter-level manipulations. However, open-weight models also natively support prefilling, which allows an attacker to predefine initial response tokens before generation begins. Despite its potential, this attack vector has received little systematic attention. We present the largest empirical study to date of prefill attacks, evaluating over 20 existing and novel strategies across multiple model families and state-of-the-art open-weight models. Our results show that prefill attacks are consistently effective against all major contemporary open-weight models, revealing a critical and previously underexplored vulnerability with significant implications for deployment. While certain large reasoning models exhibit some robustness against generic prefilling, they remain vulnerable to tailored, model-specific strategies. Our findings underscore the urgent need for model developers to prioritize defenses against prefill attacks in open-weight LLMs.
Authors:Amirali Sajadi, Tu Nguyen, Kostadin Damevski, Preetha Chatterjee
Abstract:
Vulnerability detection tools are widely adopted in software projects, yet they often overwhelm maintainers with false positives and non-actionable reports. Automated exploitation systems can help validate these reports; however, existing approaches typically operate in isolation from detection pipelines, failing to leverage readily available metadata such as vulnerability type and source-code location. In this paper, we investigate how reported security vulnerabilities can be assessed in a realistic grey-box exploitation setting that leverages minimal vulnerability metadata, specifically a CWE classification and a vulnerable code location. We introduce Agentic eXploit Engine (AXE), a multi-agent framework for Web application exploitation that maps lightweight detection metadata to concrete exploits through decoupled planning, code exploration, and dynamic execution feedback. Evaluated on the CVE-Bench dataset, AXE achieves a 30% exploitation success rate, a 3x improvement over state-of-the-art black-box baselines. Even in a single-agent configuration, grey-box metadata yields a 1.75x performance gain. Systematic error analysis shows that most failed attempts arise from specific reasoning gaps, including misinterpreted vulnerability semantics and unmet execution preconditions. For successful exploits, AXE produces actionable, reproducible proof-of-concept artifacts, demonstrating its utility in streamlining Web vulnerability triage and remediation. We further evaluate AXE's generalizability through a case study on a recent real-world vulnerability not included in CVE-Bench.
Authors:Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Linlin Wang
Abstract:
Watermarking has emerged as a pivotal solution for content traceability and intellectual property protection in Large Vision-Language Models (LVLMs). However, vision-agnostic watermarks may introduce visually irrelevant tokens and disrupt visual grounding by enforcing indiscriminate pseudo-random biases. Additionally, current vision-specific watermarks rely on a static, one-time estimation of vision critical weights and ignore the weight distribution density when determining the proportion of protected tokens. This design fails to account for dynamic changes in visual dependence during generation and may introduce low-quality tokens in the long tail. To address these challenges, we propose Attention-Guided Dynamic Watermarking (AGMark), a novel framework that embeds detectable signals while strictly preserving visual fidelity. At each decoding step, AGMark first dynamically identifies semantic-critical evidence based on attention weights for visual relevance, together with context-aware coherence cues, resulting in a more adaptive and well-calibrated evidence-weight distribution. It then determines the proportion of semantic-critical tokens by jointly considering uncertainty awareness (token entropy) and evidence calibration (weight density), thereby enabling adaptive vocabulary partitioning to avoid irrelevant tokens. Empirical results confirm that AGMark outperforms conventional methods, observably improving generation quality and yielding particularly strong gains in visual semantic fidelity in the later stages of generation. The framework maintains highly competitive detection accuracy (at least 99.36\% AUC) and robust attack resilience (at least 88.61\% AUC) without sacrificing inference efficiency, effectively establishing a new standard for reliability-preserving multi-modal watermarking.
Authors:Xiang Li, Pin-Yu Chen, Wenqi Wei
Abstract:
With the rapid advancement and adoption of Audio Large Language Models (ALLMs), voice agents are now being deployed in high-stakes domains such as banking, customer service, and IT support. However, their vulnerabilities to adversarial misuse still remain unexplored. While prior work has examined aspects of trustworthiness in ALLMs, such as harmful content generation and hallucination, systematic security evaluations of voice agents are still lacking. To address this gap, we propose Aegis, a red-teaming framework for the governance, integrity, and security of voice agents. Aegis models the realistic deployment pipeline of voice agents and designs structured adversarial scenarios of critical risks, including privacy leakage, privilege escalation, resource abuse, etc. We evaluate the framework through case studies in banking call centers, IT Support, and logistics. Our evaluation shows that while access controls mitigate data-level risks, voice agents remain vulnerable to behavioral attacks that cannot be addressed through access restrictions alone, even under strict access controls. We observe systematic differences across model families, with open-weight models exhibiting higher susceptibility, underscoring the need for layered defenses that combine access control, policy enforcement, and behavioral monitoring to secure next-generation voice agents.
Authors:Yibing Liu, Chong Zhang, Zhongyi Han, Hansong Liu, Yong Wang, Yang Yu, Xiaoyan Wang, Yilong Yin
Abstract:
We address the problem of runtime trajectory anomaly detection, a critical capability for enabling trustworthy LLM agents. Current safety measures predominantly focus on static input/output filtering. However, we argue that ensuring LLM agents reliability requires auditing the intermediate execution process. In this work, we formulate the task of Trajectory Anomaly Detection. The goal is not merely detection, but precise error localization. This capability is essential for enabling efficient rollback-and-retry. To achieve this, we construct TrajBench, a dataset synthesized via a perturb-and-complete strategy to cover diverse procedural anomalies. Using this benchmark, we investigate the capability of models in process supervision. We observe that general-purpose LLMs, even with zero-shot prompting, struggle to identify and localize these anomalies. This reveals that generalized capabilities do not automatically translate to process reliability. To address this, we propose TrajAD, a specialized verifier trained with fine-grained process supervision. Our approach outperforms baselines, demonstrating that specialized supervision is essential for building trustworthy agents.
Authors:Luyi Sun, Wei Xu, Zaifeng Gao
Abstract:
As the paradigm of Human-Centered AI (HCAI) gains prominence, its benefits to society are accompanied by significant ethical concerns, one of which is the protection of individual privacy. This chapter provides a comprehensive overview of privacy within HCAI, proposing a human-centered privacy (HCP) framework, providing integrated solution from technology, ethics, and human factors perspectives. The chapter begins by mapping privacy risks across each stage of AI development lifecycle, from data collection to deployment and reuse, highlighting the impact of privacy risks on the entire system. The chapter then introduces privacy-preserving techniques such as federated learning and dif erential privacy. Subsequent chapters integrate the crucial user perspective by examining mental models, alongside the evolving regulatory and ethical landscapes as well as privacy governance. Next, advice on design guidelines is provided based on the human-centered privacy framework. After that, we introduce practical case studies across diverse fields. Finally, the chapter discusses persistent open challenges and future research directions, concluding that a multidisciplinary approach, merging technical, design, policy, and ethical expertise, is essential to successfully embed privacy into the core of HCAI, thereby ensuring these technologies advance in a manner that respects and ensures human autonomy, trust and dignity.
Authors:Kyle Yates, Abdullah Al Mamun, Mashrur Chowdhury
Abstract:
Many Intelligent Transportation Systems (ITS) applications require strong privacy guarantees for both users and their data. Homomorphic encryption (HE) enables computation directly on encrypted messages and thus offers a compelling approach to privacy-preserving data processing in ITS. However, practical HE schemes incur substantial ciphertext expansion and communication overhead, which limits their suitability for time-critical transportation systems. Hybrid homomorphic encryption (HHE) addresses this challenge by combining a homomorphic encryption scheme with a symmetric cipher, enabling efficient encrypted computation while dramatically reducing communication cost. In this paper, we develop theoretical models of representative ITS applications that integrate HHE to protect sensitive vehicular data. We then perform a parameter-based evaluation of the HHE scheme Rubato to estimate ciphertext sizes and communication overhead under realistic ITS workloads. Our results show that HHE achieves orders-of-magnitude reductions in ciphertext size compared with conventional HE while maintaining cryptographic security, making it significantly more practical for latency-constrained ITS communication.
Authors:Kangqiang Luo, Yi Xie, Shiqian Zhao, Jing Pan
Abstract:
Web attack detection is the first line of defense for securing web applications, designed to preemptively identify malicious activities. Deep learning-based approaches are increasingly popular for their advantages: automatically learning complex patterns and extracting semantic features from HTTP requests to achieve superior detection performance. However, existing methods are less effective in embedding irregular HTTP requests, even failing to model unordered parameters and achieve attack traceability. In this paper, we propose an effective web attack detection model, named WADBERT. It achieves high detection accuracy while enabling the precise identification of malicious parameters. To this end, we first employ Hybrid Granularity Embedding (HGE) to generate fine-grained embeddings for URL and payload parameters. Then, URLBERT and SecBERT are respectively utilized to extract their semantic features. Further, parameter-level features (extracted by SecBERT) are fused through a multi-head attention mechanism, resulting in a comprehensive payload feature. Finally, by feeding the concatenated URL and payload features into a linear classifier, a final detection result is obtained. The experimental results on CSIC2010 and SR-BH2020 datasets validate the efficacy of WADBERT, which respectively achieves F1-scores of 99.63% and 99.50%, and significantly outperforms state-of-the-art methods.
Authors:Zhuoran Yang, Ed Li, Jianliang He, Aman Priyanshu, Baturay Saglam, Paul Kassianik, Sajana Weerawardhena, Anu Vellore, Blaine Nelson, Neusha Javidnia, Arthur Goldblatt, Fraser Burch, Avi Zohary, Assaf Eisenman, Mahdi Sabbaghi, Supriti Vijay, Rahim Dharssi, Dhruv Kedia, Kojin Oshiba, Yaron Singer, Amin Karbasi
Abstract:
We present Foundation-Sec-8B-Reasoning, the first open-source native reasoning model for cybersecurity. Built upon our previously released Foundation-Sec-8B base model (derived from Llama-3.1-8B-Base), the model is trained through a two-stage process combining supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR). Our training leverages proprietary reasoning data spanning cybersecurity analysis, instruction-following, and mathematical reasoning. Evaluation across 10 cybersecurity benchmarks and 10 general-purpose benchmarks demonstrates performance competitive with significantly larger models on cybersecurity tasks while maintaining strong general capabilities. The model shows effective generalization on multi-hop reasoning tasks and strong safety performance when deployed with appropriate system prompts and guardrails. This work demonstrates that domain-specialized reasoning models can achieve strong performance on specialized tasks while maintaining broad general capabilities. We release the model publicly at https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Reasoning.
Authors:Xinjie Zhou, Zhihui Yang, Lechao Cheng, Sai Wu, Gang Chen
Abstract:
Large language models (LLMs) exhibit powerful capabilities but risk memorizing sensitive personally identifiable information (PII) from their training data, posing significant privacy concerns. While machine unlearning techniques aim to remove such data, they predominantly depend on access to the training data. This requirement is often impractical, as training data in real-world deployments is commonly proprietary or inaccessible. To address this limitation, we propose Data-Free Selective Unlearning (DFSU), a novel privacy-preserving framework that removes sensitive PII from an LLM without requiring its training data. Our approach first synthesizes pseudo-PII through language model inversion, then constructs token-level privacy masks for these synthetic samples, and finally performs token-level selective unlearning via a contrastive mask loss within a low-rank adaptation (LoRA) subspace. Extensive experiments on the AI4Privacy PII-Masking dataset using Pythia models demonstrate that our method effectively removes target PII while maintaining model utility.
Authors:Víctor Mayoral-Vilches, Stefan Rass, Martin Pinzger, Endika Gil-Uriarte, Unai Ayucar-Carbajo, Jon Ander Ruiz-Alcalde, Maite del Mundo de Torres, Luis Javier Navarrete-Lozano, María Sanz-Gómez, Francesco Balassone, Cristóbal R. J. Veas-Chavez, Vanesa Turiel, Alfonso Glera-Picón, Daniel Sánchez-Prieto, Yuri Salvatierra, Paul Zabalegui-Landa, Ruffino Reydel Cabrera-Álvarez, Patxi Mayoral-Pizarroso
Abstract:
Cybersecurity superintelligence -- artificial intelligence exceeding the best human capability in both speed and strategic reasoning -- represents the next frontier in security. This paper documents the emergence of such capability through three major contributions that have pioneered the field of AI Security. First, PentestGPT (2023) established LLM-guided penetration testing, achieving 228.6% improvement over baseline models through an architecture that externalizes security expertise into natural language guidance. Second, Cybersecurity AI (CAI, 2025) demonstrated automated expert-level performance, operating 3,600x faster than humans while reducing costs 156-fold, validated through #1 rankings at international competitions including the $50,000 Neurogrid CTF prize. Third, Generative Cut-the-Rope (G-CTR, 2026) introduces a neurosymbolic architecture embedding game-theoretic reasoning into LLM-based agents: symbolic equilibrium computation augments neural inference, doubling success rates while reducing behavioral variance 5.2x and achieving 2:1 advantage over non-strategic AI in Attack & Defense scenarios. Together, these advances establish a clear progression from AI-guided humans to human-guided game-theoretic cybersecurity superintelligence.
Authors:Johannes Kaiser, Alexander Ziller, Eleni Triantafillou, Daniel Rückert, Georgios Kaissis
Abstract:
Individual Differential Privacy (iDP) promises users control over their privacy, but this promise can be broken in practice. We reveal a previously overlooked vulnerability in sampling-based iDP mechanisms: while conforming to the iDP guarantees, an individual's privacy risk is not solely governed by their own privacy budget, but critically depends on the privacy choices of all other data contributors. This creates a mismatch between the promise of individual privacy control and the reality of a system where risk is collectively determined. We demonstrate empirically that certain distributions of privacy preferences can unintentionally inflate the privacy risk of individuals, even when their formal guarantees are met. Moreover, this excess risk provides an exploitable attack vector. A central adversary or a set of colluding adversaries can deliberately choose privacy budgets to amplify vulnerabilities of targeted individuals. Most importantly, this attack operates entirely within the guarantees of DP, hiding this excess vulnerability. Our empirical evaluation demonstrates successful attacks against 62% of targeted individuals, substantially increasing their membership inference susceptibility. To mitigate this, we propose $(\varepsilon_i,δ_i,\overlineΔ)$-iDP a privacy contract that uses $Δ$-divergences to provide users with a hard upper bound on their excess vulnerability, while offering flexibility to mechanism design. Our findings expose a fundamental challenge to the current paradigm, demanding a re-evaluation of how iDP systems are designed, audited, communicated, and deployed to make excess risks transparent and controllable.
Authors:Mohammadhossein Homaei, Iman Khazrak, Ruben Molano, Andres Caro, Mar Avila
Abstract:
Water distribution systems (WDSs) face increasing cyber-physical risks, which make reliable anomaly detection essential. Many data-driven models ignore network topology and are hard to interpret, while model-based ones depend strongly on parameter accuracy. This work proposes a hydraulic-aware graph attention network using normalized conservation law violations as features. It combines mass and energy balance residuals with graph attention and bidirectional LSTM to learn spatio-temporal patterns. A multi-scale module aggregates detection scores from node to network level. On the BATADAL dataset, it reaches $F1=0.979$, showing $3.3$pp gain and high robustness under $15\%$ parameter noise.
Authors:Hao Wang, Yanting Wang, Hao Li, Rui Li, Lei Sha
Abstract:
Large Language Models (LLMs) have achieved remarkable capabilities but remain vulnerable to adversarial ``jailbreak'' attacks designed to bypass safety guardrails. Current safety alignment methods depend heavily on static external red teaming, utilizing fixed defense prompts or pre-collected adversarial datasets. This leads to a rigid defense that overfits known patterns and fails to generalize to novel, sophisticated threats. To address this critical limitation, we propose empowering the model to be its own red teamer, capable of achieving autonomous and evolving adversarial attacks. Specifically, we introduce Safety Self- Play (SSP), a system that utilizes a single LLM to act concurrently as both the Attacker (generating jailbreaks) and the Defender (refusing harmful requests) within a unified Reinforcement Learning (RL) loop, dynamically evolving attack strategies to uncover vulnerabilities while simultaneously strengthening defense mechanisms. To ensure the Defender effectively addresses critical safety issues during the self-play, we introduce an advanced Reflective Experience Replay Mechanism, which uses an experience pool accumulated throughout the process. The mechanism employs a Upper Confidence Bound (UCB) sampling strategy to focus on failure cases with low rewards, helping the model learn from past hard mistakes while balancing exploration and exploitation. Extensive experiments demonstrate that our SSP approach autonomously evolves robust defense capabilities, significantly outperforming baselines trained on static adversarial datasets and establishing a new benchmark for proactive safety alignment.
Authors:Ying Zhou, Jiacheng Wei, Yu Qi, Faguo Wu, Xiao Zhang
Abstract:
Large language models (LLMs) demonstrate remarkable capabilities in natural language understanding and generation. Despite being trained on large-scale, high-quality data, LLMs still fail to outperform traditional static analysis tools in specialized domains like smart contract vulnerability detection. To address this issue, this paper proposes a post-training algorithm based on atomic task decomposition and fusion. This algorithm aims to achieve combinatorial generalization under limited data by decomposing complex reasoning tasks. Specifically, we decompose the reentrancy vulnerability detection task into four linearly independent atomic tasks: identifying external calls, identifying state updates, identifying data dependencies between external calls and state updates, and determining their data flow order. These tasks form the core components of our approach. By training on synthetic datasets, we generate three compiler-verified datasets. We then employ the Slither tool to extract structural information from the control flow graph and data flow graph, which is used to fine-tune the LLM's adapter. Experimental results demonstrate that low-rank normalization fusion with the LoRA adapter improves the LLM's reentrancy vulnerability detection accuracy to 98.2%, surpassing state-of-the-art methods. On 31 real-world contracts, the algorithm achieves a 20% higher recall than traditional analysis tools.
Authors:Víctor Mayoral-Vilches, María Sanz-Gómez, Francesco Balassone, Stefan Rass, Lidia Salas-Espejo, Benjamin Jablonski, Luis Javier Navarrete-Lozano, Maite del Mundo de Torres, Cristóbal R. J. Veas Chavez
Abstract:
AI-driven penetration testing now executes thousands of actions per hour but still lacks the strategic intuition humans apply in competitive security. To build cybersecurity superintelligence --Cybersecurity AI exceeding best human capability-such strategic intuition must be embedded into agentic reasoning processes. We present Generative Cut-the-Rope (G-CTR), a game-theoretic guidance layer that extracts attack graphs from agent's context, computes Nash equilibria with effort-aware scoring, and feeds a concise digest back into the LLM loop \emph{guiding} the agent's actions. Across five real-world exercises, G-CTR matches 70--90% of expert graph structure while running 60--245x faster and over 140x cheaper than manual analysis. In a 44-run cyber-range, adding the digest lifts success from 20.0% to 42.9%, cuts cost-per-success by 2.7x, and reduces behavioral variance by 5.2x. In Attack-and-Defense exercises, a shared digest produces the Purple agent, winning roughly 2:1 over the LLM-only baseline and 3.7:1 over independently guided teams. This closed-loop guidance is what produces the breakthrough: it reduces ambiguity, collapses the LLM's search space, suppresses hallucinations, and keeps the model anchored to the most relevant parts of the problem, yielding large gains in success rate, consistency, and reliability.
Authors:Hoagy Cunningham, Jerry Wei, Zihan Wang, Andrew Persic, Alwin Peng, Jordan Abderrachid, Raj Agarwal, Bobby Chen, Austin Cohen, Andy Dau, Alek Dimitriev, Rob Gilson, Logan Howard, Yijin Hua, Jared Kaplan, Jan Leike, Mu Lin, Christopher Liu, Vladimir Mikulik, Rohit Mittapalli, Clare O'Hara, Jin Pan, Nikhil Saxena, Alex Silverstein, Yue Song, Xunjie Yu, Giulio Zhou, Ethan Perez, Mrinank Sharma
Abstract:
We introduce enhanced Constitutional Classifiers that deliver production-grade jailbreak robustness with dramatically reduced computational costs and refusal rates compared to previous-generation defenses. Our system combines several key insights. First, we develop exchange classifiers that evaluate model responses in their full conversational context, which addresses vulnerabilities in last-generation systems that examine outputs in isolation. Second, we implement a two-stage classifier cascade where lightweight classifiers screen all traffic and escalate only suspicious exchanges to more expensive classifiers. Third, we train efficient linear probe classifiers and ensemble them with external classifiers to simultaneously improve robustness and reduce computational costs. Together, these techniques yield a production-grade system achieving a 40x computational cost reduction compared to our baseline exchange classifier, while maintaining a 0.05% refusal rate on production traffic. Through extensive red-teaming comprising over 1,700 hours, we demonstrate strong protection against universal jailbreaks -- no attack on this system successfully elicited responses to all eight target queries comparable in detail to an undefended model. Our work establishes Constitutional Classifiers as practical and efficient safeguards for large language models.
Authors:Siyuan Li, Xi Lin, Jun Wu, Zehao Liu, Haoyu Li, Tianjie Ju, Xiang Chen, Jianhua Li
Abstract:
Jailbreak attacks pose significant threats to large language models (LLMs), enabling attackers to bypass safeguards. However, existing reactive defense approaches struggle to keep up with the rapidly evolving multi-turn jailbreaks, where attackers continuously deepen their attacks to exploit vulnerabilities. To address this critical challenge, we propose HoneyTrap, a novel deceptive LLM defense framework leveraging collaborative defenders to counter jailbreak attacks. It integrates four defensive agents, Threat Interceptor, Misdirection Controller, Forensic Tracker, and System Harmonizer, each performing a specialized security role and collaborating to complete a deceptive defense. To ensure a comprehensive evaluation, we introduce MTJ-Pro, a challenging multi-turn progressive jailbreak dataset that combines seven advanced jailbreak strategies designed to gradually deepen attack strategies across multi-turn attacks. Besides, we present two novel metrics: Mislead Success Rate (MSR) and Attack Resource Consumption (ARC), which provide more nuanced assessments of deceptive defense beyond conventional measures. Experimental results on GPT-4, GPT-3.5-turbo, Gemini-1.5-pro, and LLaMa-3.1 demonstrate that HoneyTrap achieves an average reduction of 68.77% in attack success rates compared to state-of-the-art baselines. Notably, even in a dedicated adaptive attacker setting with intensified conditions, HoneyTrap remains resilient, leveraging deceptive engagement to prolong interactions, significantly increasing the time and computational costs required for successful exploitation. Unlike simple rejection, HoneyTrap strategically wastes attacker resources without impacting benign queries, improving MSR and ARC by 118.11% and 149.16%, respectively.
Authors:Minfeng Qi, Lin Zhong, Qin Wang
Abstract:
Blockchain assets are increasingly controlled by organizations rather than individuals. DAO treasuries, consortium wallets, and custodial exchanges rely on threshold authorization and multi-party key management, yet existing payment mechanisms still target single-user wallets, leaving no unified solution for organizational transfers. We formalize the problem of \emph{DAO-to-(anonymous)-DAO} transactions and present \textsc{Dao$^2$}, a framework that enables one threshold-controlled organization to pay another, optionally with recipient anonymity, while keeping received funds under distributed control. \textsc{Dao$^2$} combines three components: \emph{distributed key derivation} (DKD) for non-stealth child addresses, \emph{distributed stealth-address generation} (DSAG) for unlinkable one-time destinations, and \emph{threshold signatures} for authorization. For ordinary transfers, the receiver derives a non-stealth address via DKD; for anonymous transfers, it derives a stealth address via DSAG. The sender then threshold-signs the payment, and the receiver redeems the funds without reconstructing any master secret. We formally prove its security and evaluate a prototype. A complete anonymous DAO-to-DAO transaction for a typical-sized (e.g., 7-member) DAO finishes in under 27\,ms with less than 1.2\,KB of communication, and scales linearly with DAO size.
Authors:Xi Yang, Taolue Chen, Yuqi Chen, Fu Song, Chundong Wang, Zhilin Wu
Abstract:
Fault injection attacks deliberately inject faults into a device via physical channels to disturb its regular execution. Adversaries can effectively deduce secrets by analyzing both the normal and faulty outputs, posing serious threats to cryptographic primitives implemented in hardware. An effective countermeasure to such attacks is via redundancy, commonly referred to as concurrent error detection schemes, where Binary linear codes have been used to defend against fault injection attacks. However, designing an optimal code circuit is often time-consuming, error-prone, and requires substantial expertise. In this paper, we formalize the optimal code circuit synthesis problem (OptiCC) based on two domain-specific minimization objectives on individual inputs and parity size. We then propose a novel algorithm CiSC for solving OptiCC, prioritizing the minimization of individual inputs. Our approach features both correct-by-construction and secure-by-construction. In a nutshell, CiSC gradually reduces individual inputs and parity size by checking, via SMT solving, the existence of feasible Boolean functions for implementing a desired code. We further present an effective technique to lazily generate combinations of inputs to Boolean functions, while quickly identify equivalent ones. We implement our approach in a tool CiSC, and evaluate it on practical benchmarks. Experimental results show our approach can synthesize code circuits that significantly outperform those generated by the latest state-of-the-art techniques.
Authors:Wei Zou, Mingwen Dong, Miguel Romero Calvo, Wei Zou, Shuaichen Chang, Jiang Guo, Dongkyu Lee, Xing Niu, Xiaofei Ma, Yanjun Qi, Jiarong Jiang
Abstract:
Memory makes LLM-based web agents personalized, powerful, yet exploitable. By storing past interactions to personalize future tasks, agents inadvertently create a persistent attack surface that spans websites and sessions. While existing security research on memory assumes attackers can directly inject into memory storage or exploit shared memory across users, we present a more realistic threat model: contamination through environmental observation alone. We introduce Environment-injected Trajectory-based Agent Memory Poisoning (eTAMP), the first attack to achieve cross-session, cross-site compromise without requiring direct memory access. A single contaminated observation (e.g., viewing a manipulated product page) silently poisons an agent's memory and activates during future tasks on different websites, bypassing permission-based defenses. Our experiments on (Visual)WebArena reveal two key findings. First, eTAMP achieves substantial attack success rates: up to 32.5% on GPT-5-mini, 23.4% on GPT-5.2, and 19.5% on GPT-OSS-120B. Second, we discover Frustration Exploitation: agents under environmental stress become dramatically more susceptible, with ASR increasing up to 8 times when agents struggle with dropped clicks or garbled text. Notably, more capable models are not more secure. GPT-5.2 shows substantial vulnerability despite superior task performance. With the rise of AI browsers like OpenClaw, ChatGPT Atlas, and Perplexity Comet, our findings underscore the urgent need for defenses against environment-injected memory poisoning.
Authors:Claudius Pott, Luca Wilke, Jan Wichelmann, Thomas Eisenbarth
Abstract:
Trusted Execution Environments (TEEs) allow the secure execution of code on remote systems without the need to trust their operators. They use static attestation as a central mechanism for establishing trust, allowing remote parties to verify that their code is executed unmodified in an isolated environment. However, this form of attestation does not cover runtime attacks, where an attacker exploits vulnerabilities in the software inside the TEE. Control Flow Attestation (CFA), a form of runtime attestation, is designed to detect such attacks. In this work, we present a method to extend TEEs with CFA and discuss how it can prevent exploitation in the event of detected control flow violations. Furthermore, we introduce HPCCFA, a mechanism that uses HPCs for CFA purposes, enabling hardware-backed trace generation on commodity CPUs. We demonstrate the feasibility of HPCCFA on a proof-of-concept implementation for Keystone on RISC-V. Our evaluation investigates the interplay of the number of measurement points and runtime protection, and reveals a trade-off between detection reliability and performance overhead.
Authors:Yuhua Xu, Mingtao Jiang, Chenfei Hu, Yinglong Wang, Chuan Zhang, Meng Li, Ming Lu, Liehuang Zhu
Abstract:
In low-altitude wireless networks (LAWN), federated learning (FL) enables collaborative intelligence among unmanned aerial vehicles (UAVs) and integrated sensing and communication (ISAC) devices while keeping raw sensing data local. Due to the "right to be forgotten" requirements and the high mobility of ISAC devices that frequently enter or leave the coverage region of UAV-assisted servers, the influence of departing devices must be removed from trained models. This necessity motivates the adoption of federated unlearning (FUL) to eliminate historical device contributions from the global model in LAWN. However, existing FUL approaches implicitly assume that the UAV-assisted server executes unlearning operations honestly. Without client-verifiable guarantees, an untrusted server may retain residual device information, leading to potential privacy leakage and undermining trust. To address this issue, we propose VerFU, a privacy-preserving and client-verifiable federated unlearning framework designed for LAWN. It empowers ISAC devices to validate the server-side unlearning operations without relying on original data samples. By integrating linear homomorphic hash (LHH) with commitment schemes, VerFU constructs tamper-proof records of historical updates. ISAC devices ensure the integrity of unlearning results by verifying decommitment parameters and utilizing the linear composability of LHH to check whether the global model accurately removes their historical contributions. Furthermore, VerFU is capable of efficiently processing parallel unlearning requests and verification from multiple ISAC devices. Experimental results demonstrate that our framework efficiently preserves model utility post-unlearning while maintaining low communication and verification overhead.
Authors:Dalila Ressi, Alvise Spanò, Matteo Rizzo, Lorenzo Benetollo, Sabina Rossi
Abstract:
Reentrancy remains one of the most critical classes of vulnerabilities in Ethereum smart contracts, yet widely used detection tools and datasets continue to reflect outdated patterns and obsolete Solidity versions. This paper adopts a dependability-oriented perspective on reentrancy detection in Solidity 0.8+, assessing how reliably state-of-the-art static analyzers and AI-based techniques operate on modern code by putting them to the test on two fronts. We construct two manually verified benchmarks: an Aggregated Benchmark of 432 real-world contracts, consolidated and relabeled from prior datasets, and a Reentrancy Scenarios Dataset (RSD) of \chadded{143} handcrafted minimal working examples designed to isolate and stress-test individual reentrancy patterns. We then evaluate 12 formal-methods-based tools, 10 machine-learning models, and 9 large language models. On the Aggregated Benchmark, traditional tools and ML models achieve up to 0.87 F1, while the best LLMs reach 0.96 in a zero-shot setting. On the RSD, most tools fail on multiple scenarios, the top performer achieving an F1 of 0.76, whereas the strongest model attains 0.82. Overall, our results indicate that leading LLMs outperform the majority of existing detectors, highlighting concerning gaps in the robustness and maintainability of current reentrancy-analysis tools.
Authors:Shashie Dilhara Batan Arachchige, Hassan Jameel Asghar, Benjamin Zi Hao Zhao, Dinusha Vatsalan, Dali Kaafar
Abstract:
Large Language Models (LLMs) generate responses based on user prompts. Often, these prompts may contain highly sensitive information, including personally identifiable information (PII), which could be exposed to third parties hosting these models. In this work, we propose a new method to sanitize user prompts. Our mechanism uses the randomized response mechanism of differential privacy to randomly and independently perturb each character in a word. The perturbed text is then sent to a remote LLM, which first performs a prompt restoration and subsequently performs the intended downstream task. The idea is that the restoration will be able to reconstruct non-sensitive words even when they are perturbed due to cues from the context, as well as the fact that these words are often very common. On the other hand, perturbation would make reconstruction of sensitive words difficult because they are rare. We experimentally validate our method on two datasets, i2b2/UTHealth and Enron, using two LLMs: Llama-3.1 8B Instruct and GPT-4o mini. We also compare our approach with a word-level differentially private mechanism, and with a rule-based PII redaction baseline, using a unified privacy-utility evaluation. Our results show that sensitive PII tagged in these datasets are reconstructed at a rate close to the theoretical rate of reconstructing completely random words, whereas non-sensitive words are reconstructed at a much higher rate. Our method has the advantage that it can be applied without explicitly identifying sensitive pieces of information in the prompt, while showing a good privacy-utility tradeoff for downstream tasks.
Authors:Ao Ding, Hongzong Li, Zi Liang, Zhanpeng Shi, Shuxin Zhuang, Shiqin Tang, Rong Feng, Ping Lu
Abstract:
Large language models (LLMs) are increasingly deployed on edge devices under strict computation and quantization constraints, yet their security implications remain unclear. We study query-based knowledge extraction from quantized edge-deployed LLMs under realistic query budgets and show that, although quantization introduces noise, it does not remove the underlying semantic knowledge, allowing substantial behavioral recovery through carefully designed queries. To systematically analyze this risk, we propose \textbf{CLIQ} (\textbf{Cl}ustered \textbf{I}nstruction \textbf{Q}uerying), a structured query construction framework that improves semantic coverage while reducing redundancy. Experiments on quantized Qwen models (INT8/INT4) demonstrate that CLIQ consistently outperforms original queries across BERTScore, BLEU, and ROUGE, enabling more efficient extraction under limited budgets. These results indicate that quantization alone does not provide effective protection against query-based extraction, highlighting a previously underexplored security risk in edge-deployed LLMs.
Authors:Weijun Li, Arnaud Grivet Sébert, Qiongkai Xu, Annabelle McIver, Mark Dras
Abstract:
The growing use of large language models has increased interest in sharing textual data in a privacy-preserving manner. One prominent line of work addresses this challenge through text rewriting under Local Differential Privacy (LDP), where input texts are locally obfuscated before release with formal privacy guarantees. These guarantees are typically expressed by a parameter $\varepsilon$ that upper bounds the worst-case privacy loss. However, nominal $\varepsilon$ values are often difficult to interpret and compare across mechanisms. In this work, we investigate how to empirically calibrate across text rewriting mechanisms under LDP. We propose TeDA, which formulates calibration via a hypothesis-testing framework that instantiates text distinguishability audits in both surface and embedding spaces, enabling empirical assessment of indistinguishability from privatized texts. Applying this calibration to several representative mechanisms, we demonstrate that similar nominal $\varepsilon$ bounds can imply very different levels of distinguishability. Empirical calibration thus provides a more comparable footing for evaluating privacy-utility trade-offs, as well as a practical tool for mechanism comparison and analysis in real-world LDP text rewriting deployments.
Authors:Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee
Abstract:
Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually grounded instructions. We study comic-template jailbreaks that embed harmful goals inside simple three-panel visual narratives and prompt the model to role-play and "complete the comic." Building on JailbreakBench and JailbreakV, we introduce ComicJailbreak, a comic-based jailbreak benchmark with 1,167 attack instances spanning 10 harm categories and 5 task setups. Across 15 state-of-the-art MLLMs (six commercial and nine open-source), comic-based attacks achieve success rates comparable to strong rule-based jailbreaks and substantially outperform plain-text and random-image baselines, with ensemble success rates exceeding 90% on several commercial models. Then, with the existing defense methodologies, we show that these methods are effective against the harmful comics, they will induce a high refusal rate when prompted with benign prompts. Finally, using automatic judging and targeted human evaluation, we show that current safety evaluators can be unreliable on sensitive but non-harmful content. Our findings highlight the need for safety alignment robust to narrative-driven multimodal jailbreaks.
Authors:Eli Chien, Yuzheng Hu, Ryan McKenna, Shanshan Wu, Zheng Xu, Peter Kairouz
Abstract:
While differentially private (DP) fine-tuning of large language models (LLMs) is a powerful tool, it is often computationally prohibitive or infeasible when state-of-the-art models are only accessible via proprietary APIs. In such settings, generating DP synthetic data has emerged as a crucial alternative, offering the added benefits of arbitrary reuse across downstream tasks and transparent exploratory data analysis without the opaque constraints of a model's parameter space. Private Evolution (PE) is a promising API-based framework for this goal; however, its performance critically depends on initialization. When the private data distribution deviates substantially from the foundation model's pre-training priors--particularly in highly specialized domains--PE frequently struggles to align with the target data, resulting in degraded utility, poor convergence, and inefficient API usage. To address this initialization bottleneck, we propose Metadata Augmented Private Language Evolution (MAPLE). MAPLE leverages differentially private tabular metadata extraction and in-context learning to effectively ground the initial synthetic distribution in the target domain. Extensive experiments on challenging, domain-specific text generation tasks demonstrate that MAPLE achieves a significantly more favorable privacy-utility trade-off, converges faster, and drastically reduces API costs compared to previous PE methods.
Authors:Shiliang Zhang, Sabita Maharjan
Abstract:
The rapid proliferation of artificial intelligence (AI) technologies has led to a dynamic regulatory landscape, where legislative frameworks strive to keep pace with technical advancements. As AI paradigms shift towards greater autonomy, specifically in the form of agentic AI, it becomes increasingly challenging to precisely articulate regulatory stipulations. This challenge is even more acute in the domains of security and privacy, where the capabilities of autonomous agents often blur traditional legal and technical boundaries. This paper reviews the evolving European Union (EU) AI regulatory provisions via analyzing 24 relevant documents published between 2024 and 2025. From this review, we provide a clarification of critical definitions. We deconstruct the regulatory interpretations of security, privacy, and agentic AI, distinguishing them from closely related concepts to resolve ambiguity. We synthesize the reviewed documents to articulate the current state of regulatory provisions targeting different types of AI, particularly those related to security and privacy aspects. We analyze and reflect on the existing provisions in the regulatory dimension to better align security and privacy obligations with AI and agentic behaviors. These insights serve to inform policymakers, developers, and researchers on the compliance and AI governance in the society with increasing algorithmic agencies.
Authors:Zikang Ding, Junhao Li, Suling Wu, Junchi Yao, Hongbo Liu, Lijie Hu
Abstract:
Model watermarking utilizes internal representations to protect the ownership of large language models (LLMs). However, these features inevitably undergo complex distortions during realistic model modifications such as fine-tuning, quantization, or knowledge distillation, making reliable extraction extremely challenging. Despite extensive research on model-side watermarking, existing methods still lack sufficient robustness against parameter-level perturbations. To address this gap, we propose \texttt{\textbf{Functional Subspace Watermarking (FSW)}}, a framework that anchors ownership signals into a low-dimensional functional backbone. Specifically, we first solve a generalized eigenvalue problem to extract a stable functional subspace for watermark injection, while introducing an adaptive spectral truncation strategy to achieve an optimal balance between robustness and model utility. Furthermore, a vector consistency constraint is incorporated to ensure that watermark injection does not compromise the original semantic performance. Extensive experiments across various LLM architectures and datasets demonstrate that our method achieves superior detection accuracy and statistical verifiability under multiple model attacks, maintaining robustness that outperforms existing state-of-the-art (SOTA) methods.
Authors:Yige Liu, Dexuan Xu, Zimai Guo, Yongzhi Cao, Hanpin Wang
Abstract:
Vertical federated learning (VFL) allows an active party with a top model, and multiple passive parties with bottom models to collaborate. In this scenario, passive parties possessing only features may attempt to infer active party's private labels, making label inference attacks (LIAs) a significant threat. Previous LIA studies have claimed that well-trained bottom models can effectively represent labels. However, we demonstrate that this view is misleading and exposes the vulnerability of existing LIAs. By leveraging mutual information, we present the first observation of the "model compensation" phenomenon in VFL. We theoretically prove that, in VFL, the mutual information between layer outputs and labels increases with layer depth, indicating that bottom models primarily extract feature information while the top model handles label mapping. Building on this insight, we introduce task reassignment to show that the success of existing LIAs actually stems from the distribution alignment between features and labels. When this alignment is disrupted, the performance of LIAs declines sharply or even fails entirely. Furthermore, the implications of this insight for defenses are also investigated. We propose a zero-overhead defense technique based on layer adjustment. Extensive experiments across five datasets and five representative model architectures indicate that shifting cut layers forward to increase the proportion of top model layers in the entire model not only improves resistance to LIAs but also enhances other defenses.
Authors:Zilong Hu, Hongming Fei, Prosanta Gope, Jack Miskelly, Owen Millwood, Biplab Sikdar
Abstract:
Dynamic Random Access Memory (DRAM) is pervasive in computer systems. Cell vulnerabilities caused by unintended phenomena (forced retention failure, latency alteration, rowhammer and rowpress) lead to unintended bit flips in memory. These phenomena have been explored as attacks to violate data integrity and confidentiality during normal operation, but also exploited as a benefit in security systems as a method to generate random secret keys and unique device fingerprints (e.g. Physically Unclonable Functions). In both cases, attackers may wish to exploit knowledge of individual cell flip vulnerability to predict the current/future data contents of a set of cells, which can be utilised to break security systems. In this work, we develop a quantitative, cell-level circuit framework that models DRAM vulnerability directly from its physical charge leakage and disturbance pathways. By linking these device-layer behaviours to system-level security properties, our framework enables systematic evaluation of DRAM with respect to volatility (retention), integrity (disturbance-induced modification), and confidentiality (pattern-dependent leakage). We further demonstrate how the framework can be applied to well-known failure modes, revealing non-uniform and context-dependent vulnerability patterns. This work provides both theoretical foundations and practical evaluation tools for evaluating the suitability of DRAM use within security applications.
Authors:Ashwin Sudhir, Zion Leonahenahe Basque, Wil Gibbs, Ati Priya Bajaj, Pulkit Singh Singaria, Mitchell Zakocs, Jie Hu, Moritz Schloegel, Tiffany Bao, Adam Doupe, Yan Shoshitaishvili, Ruoyu Wang
Abstract:
In the ever-evolving battle against malware, binary obfuscation techniques are a formidable barrier to effective analysis by both human security analysts and automated systems. In particular, virtualization or VM-based obfuscation is one of the strongest protection mechanisms that evade automated analysis. Despite widespread use of virtualization, existing automated deobfuscation techniques suffer from three major drawbacks. First, they only work on execution traces, which prevents them from recovering all logic in an obfuscated binary. Second, they depend on dynamic symbolic execution, which is expensive and does not scale in practice. Third, they cannot generate "well-formed" code, which prevents existing binary decompilers from generating human-friendly output. This paper introduces PUSHAN, a novel and generic technique for deobfuscating virtualization-obfuscated binaries while overcoming the limitations of existing techniques. PUSHAN is trace-free and avoids path-constraint accumulation by using VPC-sensitive, constraint-free symbolic emulation to recover a complete CFG of the virtualized function. It is the first approach that also decompiles the protected code into high-quality C pseudocode to enable effective analysis. Crucially, PUSHAN circumvents reliance on path satisfiability, a known NP-hard problem that hampers scalability. We evaluate PUSHAN on more than 1,000 binaries, including targets protected by academic state of the art (Tigress) and commercial-strength obfuscators VMProtect and Themida. PUSHAN successfully deobfuscates these binaries, retrieves their complete CFGs, and decompiles them to C pseudocode. We further demonstrate applicability by analyzing a previously unanalyzed VMProtect-obfuscated malware sample from VirusTotal, where our decompiled output enables LLM-assisted code simplification, reuse, and program understanding.
Authors:Ken Huang, Jerry Huang, Mahesh Lambe, Hammad Atta, Yasir Mehmood, Muhammad Zeeshan Baig, Muhammad Aziz Ul Haq, Nadeem Shahzad, Shailja Gupta, Rajesh Ranjan, Rekha Singhal
Abstract:
This paper introduces Capability-Priced Micro-Markets (CPMM), a micro-economic framework designed to enable robust, scalable, and secure commerce among autonomous AI agents on the agentic web. The framework addresses the fundamental challenge of economic coordination in decentralized agent ecosystems, where entities must transact with minimal human oversight. CPMM synthesizes three key technologies into a unified system: MIT originated, Project NANDA infrastructure for cryptographically verifiable, capability-based security and discovery; the HTTP 402 "Payment Required" status code, with modern X402/H402 extensions for efficient, low-cost micropayments; and the Agent Capability Negotiation and Binding Protocol (ACNBP) for secure, multi-step negotiation and commitment. The paper formalizes agent interactions as a repeated bilateral game with incomplete information, demonstrating theoretically that the CPMM mechanism converges to a constrained Radner equilibrium, ensuring efficient outcomes under information asymmetry. A key theoretical contribution is the concept of "privacy elasticity of demand," which is introduced to quantify the trade-off between an agent's information disclosure and the market price of its services. By integrating secure capabilities, micropayment protocols, and formal negotiation mechanisms, CPMM provides a comprehensive, theoretically-grounded solution for creating functional micro-markets for the emergent agentic web.
Authors:Lidor Erez, Omer Hofman, Tamir Nizri, Roman Vainshtein
Abstract:
Automated LLM vulnerability scanners are increasingly used to assess security risks by measuring different attack type success rates (ASR). Yet the validity of these measurements hinges on an often-overlooked component: the evaluator who determines whether an attack has succeeded. In this study, we demonstrate that commonly used open-source scanners exhibit measurement instability that depends on the evaluator component. Consequently, changing the evaluator while keeping the attacks and model outputs constant can significantly alter the reported ASR. To tackle this problem, we present a two-phase, reliability-aware evaluation framework. In the first phase, we quantify evaluator disagreement to identify attack categories where ASR reliability cannot be assumed. In the second phase, we propose a verification-based evaluation method where evaluators are validated by an independent verifier, enabling reliability assessment without relying on extensive human annotation. Applied to the widely used Garak scanner, we observe that 22 of 25 attack categories exhibit evaluator instability, reflected in high disagreement among evaluators. Our approach raises evaluator accuracy from 72% to 89% while enabling selective deployment to control cost and computational overhead. We further quantify evaluator uncertainty in ASR estimates, showing that reported vulnerability scores can vary by up to 33% depending on the evaluator. Our results indicate that the outputs of vulnerability scanners are highly sensitive to the choice of evaluators. Our framework offers a practical approach to quantify unreliable evaluations and enhance the reliability of measurements in automated LLM security assessments.
Authors:Federico Mirra, Matteo Boffa, Idilio Drago, Danilo Giordano, Marco Mellia
Abstract:
Honeypots are deception systems that emulate vulnerable services to collect threat intelligence. While deploying many honeypots increases the opportunity to observe attacker behaviour, in practise network and computational resources limit the number of honeypots that can be exposed. Hence, practitioners must select the assets to deploy, a decision that is typically made statically despite attackers' tactics evolving over time. This work investigates an AI-driven agentic architecture that autonomously manages honeypot exposure in response to ongoing attacks. The proposed agent analyses Intrusion Detection System (IDS) alerts and network state to infer the progression of the attack, identify compromised assets, and predict likely attacker targets. Based on this assessment, the agent dynamically reconfigures the system to maintain attacker engagement while minimizing unnecessary exposure. The approach is evaluated in a simulated environment where attackers execute Proof-of-Concept exploits for known CVEs. Preliminary results indicate that the agent can effectively infer the intent of the attacker and improve the efficiency of exposure under resource constraints
Authors:Donghwa Kang, Hojun Choe, Doohyun Kim, Hyeongboo Baek, Brent ByungHoon Kang
Abstract:
Deploying deep neural networks (DNNs) on edge devices exposes valuable intellectual property to model-stealing attacks. While TEE-shielded DNN partitioning (TSDP) mitigates this by isolating sensitive computations, existing paradigms fail to simultaneously satisfy privacy and efficiency. The training-before-partition paradigm suffers from intrinsic privacy leakage, whereas the partition-before-training paradigm incurs severe latency due to structural dependencies that hinder parallel execution. To overcome these limitations, we propose SPOILER, a novel search-before-training framework that fundamentally decouples the TEE sub-network from the backbone via hardware-aware neural architecture search (NAS). SPOILER identifies a lightweight TEE architecture strictly optimized for hardware constraints, maximizing parallel efficiency. Furthermore, we introduce self-poisoning learning to enforce logical isolation, rendering the exposed backbone functionally incoherent without the TEE component. Extensive experiments on CNNs and Transformers demonstrate that SPOILER achieves state-of-the-art trade-offs between security, latency, and accuracy.
Authors:Yangyang Wei, Yijie Xu, Zhenyuan Li, Xiangmin Shen, Shouling Ji
Abstract:
Multi-Agent System is emerging as the \textit{de facto} standard for complex task orchestration. However, its reliance on autonomous execution and unstructured inter-agent communication introduces severe risks, such as indirect prompt injection, that easily circumvent conventional input guardrails. To address this, we propose \SysName, a framework that shifts the defensive paradigm from static input filtering to execution-aware analysis. By extracting and reconstructing Cross-Agent Semantic Flows, \SysName synthesizes fragmented operational primitives into contiguous behavioral trajectories, enabling a holistic view of system activity. We leverage a Supervisor LLM to scrutinize these trajectories, identifying anomalies across data flow violations, control flow deviations, and intent inconsistencies. Empirical evaluations demonstrate that \SysName effectively detects over ten distinct compound attack vectors, achieving F1-scores of 85.3\% and 66.7\% for node-level and path-level end-to-end attack detection, respectively. The source code is available at https://anonymous.4open.science/r/MAScope-71DC.
Authors:Furkan Mumcu, Yasin Yilmaz
Abstract:
As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local curvature in the inner maximization. Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. We introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity strictly along adversarial ascent directions. We prove that AAJR yields a strictly larger admissible policy class than global constraints under mild conditions, implying a weakly smaller approximation gap and reduced nominal performance degradation. Furthermore, we derive step-size conditions under which AAJR controls effective smoothness along optimization trajectories and ensures inner-loop stability. These results provide a structural theory for agentic robustness that decouples minimax stability from global expressivity restrictions.
Authors:Max Landauer, Wolfgang Hotwagner, Thorina Boenke, Florian Skopik, Markus Wurzenberger
Abstract:
Log data are essential for intrusion detection and forensic investigations. However, manual log analysis is tedious due to high data volumes, heterogeneous event formats, and unstructured messages. Even though many automated methods for log analysis exist, they usually still rely on domain-specific configurations such as expert-defined detection rules, handcrafted log parsers, or manual feature-engineering. Crucially, the level of automation of conventional methods is limited due to their inability to semantically understand logs and explain their underlying causes. In contrast, Large Language Models enable domain- and format-agnostic interpretation of system logs and security alerts. Unfortunately, research on this topic remains challenging, because publicly available and labeled data sets covering a broad range of attack techniques are scarce. To address this gap, we introduce the Cyber Attack Manifestation Log Data Set (CAM-LDS), comprising seven attack scenarios that cover 81 distinct techniques across 13 tactics and collected from 18 distinct sources within a fully open-source and reproducible test environment. We extract log events that directly result from attack executions to facilitate analysis of manifestations concerning command observability, event frequencies, performance metrics, and intrusion detection alerts. We further present an illustrative case study utilizing an LLM to process the CAM-LDS. The results indicate that correct attack techniques are predicted perfectly for approximately one third of attack steps and adequately for another third, highlighting the potential of LLM-based log interpretation and utility of our data set.
Authors:Hadi Mohaghegh Dolatabadi, Thalaiyasingam Ajanthan, Sameera Ramasinghe, Chamin P Hewa Koneputugodage, Gil Avraham, Yan Zuo, Violetta Shevchenko, Alexander Long
Abstract:
Decentralized training introduces critical security risks when executed across untrusted, geographically distributed nodes. While existing Byzantine-tolerant literature addresses data parallel (DP) training through robust aggregation methods, pipeline parallelism (PP) presents fundamentally distinct challenges. In PP, model layers are distributed across workers where the activations and their gradients flow between stages rather than being aggregated, making traditional DP approaches inapplicable. We propose SENTINEL, a verification mechanism for PP training without computation duplication. SENTINEL employs lightweight momentum-based monitoring using exponential moving averages (EMAs) to detect corrupted inter-stage communication. Unlike existing Byzantine-tolerant approaches for DP that aggregate parameter gradients across replicas, our approach verifies sequential activation/gradient transmission between layers. We provide theoretical convergence guarantees for this new setting that recovers classical convergence rates when relaxed to standard training. Experiments demonstrate successful training of up to 4B-parameter LLMs across untrusted distributed environments with up to 176 workers while maintaining model convergence and performance.
Authors:Zhen Guo, Shanghao Shi, Hao Li, Shamim Yazdani, Ning Zhang, Reza Tourani
Abstract:
The deployment of Large Reasoning Models (LRMs) in high-stakes decision-making pipelines has introduced a novel and opaque attack surface: reasoning backdoors. In these attacks, the model's intermediate Chain-of-Thought (CoT) is manipulated to provide a linguistically plausible but logically fallacious justification for a malicious conclusion. While frontier models exhibit an intrinsic capacity to detect these fractures, compact, deployable models suffer from a fundamental verification gap, relying on fragile lexical heuristics that are easily bypassed by motivated adversaries. To bridge this gap, we propose TraceGuard, a process-guided security framework that transforms small-scale models into robust reasoning firewalls. Our approach treats the reasoning trace as an untrusted payload and establishes a defense-in-depth strategy through three synergistic phases: (1) Automated Forensic Synthesis, which generates contrastive reasoning pairs to isolate the specific logical point of fracture; (2) Step-Aware Supervised Fine-Tuning (SSFT), to instill a structural verification grammar; and (3) Verifier-Guided Reinforcement Learning (VGRL), utilizing Group Relative Policy Optimization. We identify and mitigate a critical failure mode of baseline alignment - lexical overfitting - whereby verifiers memorize adversarial triggers rather than auditing logical integrity. Our empirical evaluation demonstrates that TraceGuard acts as a security force multiplier: a 4B-parameter verifier achieves forensic precision on unseen attacks - including latent backdoors and post-hoc rationalizations - that rivals architectures two orders of magnitude larger. We further demonstrate robustness against adaptive adversaries in a grey-box setting, establishing TraceGuard as a viable, low-latency security primitive for the Trusted Computing Base.
Authors:Yu Lin, Qizhi Zhang, Wenqiang Ruan, Daode Zhang, Jue Hong, Ye Wu, Hanning Xia, Yunlong Mao, Sheng Zhong
Abstract:
The rapid development of large language models (LLMs) has driven the widespread adoption of cloud-based LLM inference services, while also bringing prominent privacy risks associated with the transmission and processing of private data in remote inference. For privacy-preserving LLM inference technologies to be practically applied in industrial scenarios, three core requirements must be satisfied simultaneously: (1) Accuracy and efficiency losses should be minimized to mitigate degradation in service experience. (2) The inference process can be run on large-scale clusters consist of heterogeneous legacy xPUs. (3) Compatibility with existing LLM infrastructures should be ensured to reuse their engineering optimizations. To the best of our knowledge, none of the existing privacy-preserving LLM inference methods satisfy all the above constraints while delivering meaningful privacy guarantees. In this paper, we propose AloePri, the first privacy-preserving LLM inference method for industrial applications. AloePri protects both the input and output data by covariant obfuscation, which jointly transforms data and model parameters to achieve better accuracy and privacy. We carefully design the transformation for each model component to ensure inference accuracy and data privacy while keeping full compatibility with existing infrastructures of Language Model as a Service. AloePri has been integrated into an industrial system for the evaluation of mainstream LLMs. The evaluation on Deepseek-V3.1-Terminus model (671B parameters) demonstrates that AloePri causes accuracy loss of 0.0%~3.5% and exhibits efficiency equivalent to that of plaintext inference. Meanwhile, AloePri successfully resists state-of-the-art attacks, with less than 5\% of tokens recovered. To the best of our knowledge, AloePri is the first method to exhibit practical applicability to large-scale models in real-world systems.
Authors:Usman Anwar, Julianna Piskorz, David D. Baek, David Africa, Jim Weatherall, Max Tegmark, Christian Schroeder de Witt, Mihaela van der Schaar, David Krueger
Abstract:
Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the case of steganographic reasoning in LLMs, knowing such a reference distribution is not feasible; this renders these approaches inapplicable. We propose an alternative, \textbf{decision-theoretic view of steganography}. Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content (present within a steganographic signal), and this otherwise latent asymmetry can be inferred from the agents' observable actions. To formalise this perspective, we introduce generalised $\mathcal{V}$-information: a utilitarian framework for measuring the amount of usable information within some input. We use this to define the \textbf{steganographic gap} -- a measure that quantifies steganography by comparing the downstream utility of the steganographic signal to agents that can and cannot decode the hidden content. We empirically validate our formalism, and show that it can be used to detect, quantify, and mitigate steganographic reasoning in LLMs.
Authors:Yige Liu, Yiwei Lou, Che Wang, Yongzhi Cao, Hanpin Wang
Abstract:
As a distributed collaborative machine learning paradigm, vertical federated learning (VFL) allows multiple passive parties with distinct features and one active party with labels to collaboratively train a model. Although it is known for the privacy-preserving capabilities, VFL still faces significant privacy and security threats from backdoor attacks. Existing backdoor attacks typically involve an attacker implanting a trigger into the model during the training phase and executing the attack by adding the trigger to the samples during the inference phase. However, in this paper, we find that triggers are not essential for backdoor attacks in VFL. In light of this, we disclose a new backdoor attack pathway in VFL by introducing a feature-based triggerless backdoor attack. This attack operates under a more stringent security assumption, where the attacker is honest-but-curious rather than malicious during the training phase. It comprises three modules: label inference for the targeted backdoor attack, poison generation with amplification and perturbation mechanisms, and backdoor execution to implement the attack. Extensive experiments on five benchmark datasets demonstrate that our attack outperforms three baseline backdoor attacks by 2 to 50 times while minimally impacting the main task. Even in VFL scenarios with 32 passive parties and only one set of auxiliary data, our attack maintains high performance. Moreover, when confronted with distinct defense strategies, our attack remains largely unaffected and exhibits strong robustness. We hope that the disclosure of this triggerless backdoor attack pathway will encourage the community to revisit security threats in VFL scenarios and inspire researchers to develop more robust and practical defense strategies.
Authors:Nishant Subramani, Kshitish Ghate, Mona Diab
Abstract:
Modern language models (LM) are trained on large scrapes of the Web, containing millions of personal information (PI) instances, many of which LMs memorize, increasing privacy risks. In this work, we develop the regexes and rules (R&R) detector suite to detect email addresses, phone numbers, and IP addresses, which outperforms the best regex-based PI detectors. On a manually curated set of 483 instances of PI, we measure memorization: finding that 13.6% are parroted verbatim by the Pythia-6.9b model, i.e., when the model is prompted with the tokens that precede the PI in the original document, greedy decoding generates the entire PI span exactly. We expand this analysis to study models of varying sizes (160M-6.9B) and pretraining time steps (70k-143k iterations) in the Pythia model suite and find that both model size and amount of pretraining are positively correlated with memorization. Even the smallest model, Pythia-160m, parrots 2.7% of the instances exactly. Consequently, we strongly recommend that pretraining datasets be aggressively filtered and anonymized to minimize PI parroting.
Authors:Yohan Lee, Jisoo Jang, Seoyeon Choi, Sangyeop Kim, Seungtaek Choi
Abstract:
Tool-using LLM agents increasingly coordinate real workloads by selecting and chaining third-party tools based on text-visible metadata such as tool names, descriptions, and return messages. We show that this convenience creates a supply-chain attack surface: a malicious MCP tool server can be co-registered alongside normal tools and induce overthinking loops, where individually trivial or plausible tool calls compose into cyclic trajectories that inflate end-to-end tokens and latency without any single step looking abnormal. We formalize this as a structural overthinking attack, distinguishable from token-level verbosity, and implement 14 malicious tools across three servers that trigger repetition, forced refinement, and distraction. Across heterogeneous registries and multiple tool-capable models, the attack causes severe resource amplification (up to $142.4\times$ tokens) and can degrade task outcomes. Finally, we find that decoding-time concision controls do not reliably prevent loop induction, suggesting defenses should reason about tool-call structure rather than tokens alone.
Authors:Yuqi Jia, Ruiqi Wang, Xilong Wang, Chong Xiang, Neil Gong
Abstract:
% Prompt injection attacks insert malicious instructions into an LLM's input to steer it toward an attacker-chosen task instead of the intended one. Existing detection defenses typically classify any input with instruction as malicious, leading to misclassification of benign inputs containing instructions that align with the intended task. In this work, we account for the instruction hierarchy and distinguish among three categories: inputs with misaligned instructions, inputs with aligned instructions, and non-instruction inputs. We introduce AlignSentinel, a three-class classifier that leverages features derived from LLM's attention maps to categorize inputs accordingly. To support evaluation, we construct the first systematic benchmark containing inputs from all three categories. Experiments on both our benchmark and existing ones--where inputs with aligned instructions are largely absent--show that AlignSentinel accurately detects inputs with misaligned instructions and substantially outperforms baselines.
Authors:Yannick Assogba, Jacopo Cortellazzi, Javier Abad, Pau Rodriguez, Xavier Suau, Arno Blaas
Abstract:
Jailbreak attacks remain a persistent threat to large language model safety. We propose Context-Conditioned Delta Steering (CC-Delta), an SAE-based defense that identifies jailbreak-relevant sparse features by comparing token-level representations of the same harmful request with and without jailbreak context. Using paired harmful/jailbreak prompts, CC-Delta selects features via statistical testing and applies inference-time mean-shift steering in SAE latent space. Across four aligned instruction-tuned models and twelve jailbreak attacks, CC-Delta achieves comparable or better safety-utility tradeoffs than baseline defenses operating in dense latent space. In particular, our method clearly outperforms dense mean-shift steering on all four models, and particularly against out-of-distribution attacks, showing that steering in sparse SAE feature space offers advantages over steering in dense activation space for jailbreak mitigation. Our results suggest off-the-shelf SAEs trained for interpretability can be repurposed as practical jailbreak defenses without task-specific training.
Authors:Yuepeng Hu, Yuqi Jia, Mengyuan Li, Dawn Song, Neil Gong
Abstract:
In a malicious tool attack, an attacker uploads a malicious tool to a distribution platform; once a user installs the tool and the LLM agent selects it during task execution, the tool can compromise the user's security and privacy. Prior work primarily focuses on manipulating tool names and descriptions to increase the likelihood of installation by users and selection by LLM agents. However, a successful attack also requires embedding malicious behaviors in the tool's code implementation, which remains largely unexplored. In this work, we bridge this gap by presenting the first systematic study of malicious tool code implementations. We first propose a taxonomy of malicious tool behaviors based on the confidentiality-integrity-availability triad, tailored to LLM-agent settings. To investigate the severity of the risks posed by attackers exploiting coding LLMs to automatically generate malicious tools, we develop MalTool, a coding-LLM-based framework that synthesizes tools exhibiting specified malicious behaviors, either as standalone tools or embedded within otherwise benign implementations. To ensure functional correctness and structural diversity, MalTool leverages an automated verifier that validates whether generated tools exhibit the intended malicious behaviors and differ sufficiently from prior instances, iteratively refining generations until success. Our evaluation demonstrates that MalTool is highly effective even when coding LLMs are safety-aligned. Using MalTool, we construct two datasets of malicious tools: 1,200 standalone malicious tools and 5,287 real-world tools with embedded malicious behaviors. We further show that existing detection methods, including commercial malware detection approaches such as VirusTotal and methods tailored to the LLM-agent setting, exhibit limited effectiveness at detecting the malicious tools, highlighting an urgent need for new defenses.
Authors:André García Gómez, Ines Rieger, Wolfgang Hotwagner, Max Landauer, Markus Wurzenberger, Florian Skopik, Edgar Weippl
Abstract:
Collaborative Intrusion Detection Systems (CIDS) are increasingly adopted to counter cyberattacks, as their collaborative nature enables them to adapt to diverse scenarios across heterogeneous environments. As distributed critical infrastructure operates in rapidly evolving environments, such as drones in both civil and military domains, there is a growing need for CIDS architectures that can flexibly accommodate these dynamic changes. In this study, we propose a novel CIDS framework designed for easy deployment across diverse distributed environments. The framework dynamically optimizes detector allocation per node based on available resources and data types, enabling rapid adaptation to new operational scenarios with minimal computational overhead. We first conducted a comprehensive literature review to identify key characteristics of existing CIDS architectures. Based on these insights and real-world use cases, we developed our CIDS framework, which we evaluated using several distributed datasets that feature different attack chains and network topologies. Notably, we introduce a public dataset based on a realistic cyberattack targeting a ground drone aimed at sabotaging critical infrastructure. Experimental results demonstrate that the proposed CIDS framework can achieve adaptive, efficient intrusion detection in distributed settings, automatically reconfiguring detectors to maintain an optimal configuration, without requiring heavy computation, since all experiments were conducted on edge devices.
Authors:Timothée Chauvin, Clément Lalanne, Erwan Le Merrer, Jean-Michel Loubes, François Taïani, Gilles Tredan
Abstract:
Remote change detection in LLMs is a difficult problem. Existing methods are either too expensive for deployment at scale, or require initial white-box access to model weights or grey-box access to log probabilities. We aim to achieve both low cost and strict black-box operation, observing only output tokens. Our approach hinges on specific inputs we call Border Inputs, for which there exists more than one output top token. From a statistical perspective, optimal change detection depends on the model's Jacobian and the Fisher information of the output distribution. Analyzing these quantities in low-temperature regimes shows that border inputs enable powerful change detection tests. Building on this insight, we propose the Black-Box Border Input Tracking (B3IT) scheme. Extensive in-vivo and in-vitro experiments show that border inputs are easily found for non-reasoning tested endpoints, and achieve performance on par with the best available grey-box approaches. B3IT reduces costs by $30\times$ compared to existing methods, while operating in a strict black-box setting.
Authors:Weichen Yu, Ravi Mangal, Yinyi Luo, Kai Hu, Jingxuan He, Corina S. Pasareanu, Matt Fredrikson
Abstract:
Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging. Existing vulnerability detection pipelines either rely on static analyzers or use LLM/GNN-based detectors trained with coarse program-level supervision. Both families often require complete context, provide sparse end-of-completion feedback, and can degrade as code length grows, making them ill-suited for real-time, prefix-level assessment during interactive coding and streaming generation. We propose SecCodePRM, a security-oriented process reward model that assigns a context-aware, step-level security score along a code trajectory. To train the model, we derive step-level supervision labels from static analyzers and expert annotations, allowing the model to attend more precisely to fine-grained regions associated with inter-procedural vulnerabilities. SecCodePRM has three applications: full-code vulnerability detection (VD), partial-code VD, and secure code generation (CG). For VD, SecCodePRM uses risk-sensitive aggregation that emphasizes high-risk steps; for CG, SecCodePRM supports inference-time scaling by ranking candidate continuations and favoring higher cumulative reward. This design yields dense, real-time feedback that scales to long-horizon generation. Empirically, SecCodePRM outperforms prior approaches in all three settings, while preserving code functional correctness, suggesting improved security without a safety-utility tradeoff.
Authors:Sharif Noor Zisad, Ragib Hasan
Abstract:
Today's business organizations need access control systems that can handle complex, changing security requirements that go beyond what traditional methods can manage. Current approaches, such as Role-Based Access Control (RBAC), Attribute-Based Access Control (ABAC), and Discretionary Access Control (DAC), were designed for specific purposes. They cannot effectively manage the dynamic, situation-dependent workflows that modern systems require. In this research, we introduce LLMAC, a new unified approach using Large Language Models (LLMs) to combine these different access control methods into one comprehensive, understandable system. We used an extensive synthetic dataset that represents complex real-world scenarios, including policies for ownership verification, version management, workflow processes, and dynamic role separation. Using Mistral 7B, our trained LLM model achieved outstanding results with 98.5% accuracy, significantly outperforming traditional methods (RBAC: 14.5%, ABAC: 58.5%, DAC: 27.5%) while providing clear, human readable explanations for each decision. Performance testing shows that the system can be practically deployed with reasonable response times and computing resources.
Authors:Sharif Noor Zisad, Ragib Hasan
Abstract:
Traditional access control systems, including RBAC, face significant limitations such as inflexible role definitions, difficulty handling dynamic scenarios, and lack of detailed accountability and traceability. To this end, we introduce the Interaction Provenance-based Access Control (IPBAC) model. In this paper, we explore the integration of interaction provenance with access control to overcome these limitations. Interaction provenance refers to the detailed recording of actions and interactions within a system, capturing comprehensive metadata such as the identity of the actor, the time of an action, and the context. IPBAC ensures stronger protection against unauthorized access, enhances traceability for auditing and compliance, and supports adaptive security policies. This provenance-based access control not only strengthens security, but also provides a robust framework for auditing and compliance.
Authors:Lukas Karner, Max Landauer, Markus Wurzenberger, Florian Skopik
Abstract:
Automated detection of cyber attacks is a critical capability to counteract the growing volume and sophistication of cyber attacks. However, the high numbers of security alerts issued by intrusion detection systems lead to alert fatigue among analysts working in security operations centres (SOC), which in turn causes slow reaction time and incorrect decision making. Alert grouping, which refers to clustering of security alerts according to their underlying causes, can significantly reduce the number of distinct items analysts have to consider. Unfortunately, conventional time-based alert grouping solutions are unsuitable for large scale computer networks characterised by high levels of false positive alerts and simultaneously occurring attacks. To address these limitations, we propose AlertBERT, a self-supervised framework designed to group alerts from isolated or concurrent attacks in noisy environments. Thereby, our open-source implementation of AlertBERT leverages masked-language-models and density-based clustering to support both real-time or forensic operation. To evaluate our framework, we further introduce a novel data augmentation method that enables flexible control over noise levels and simulates concurrent attack occurrences. Based on the data sets generated through this method, we demonstrate that AlertBERT consistently outperforms conventional time-based grouping techniques, achieving superior accuracy in identifying correct alert groups.
Authors:Ariel Fogel, Omer Hofman, Eilon Cohen, Roman Vainshtein
Abstract:
Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat in this context is backdoor attacks, in which adversaries embed hidden behaviors in language models that activate under specific conditions. Previous work has assumed that adversaries have access to training pipelines or deployment infrastructure. We propose a novel attack surface requiring neither, which utilizes the chat template. Chat templates are executable Jinja2 programs invoked at every inference call, occupying a privileged position between user input and model processing. We show that an adversary who distributes a model with a maliciously modified template can implant an inference-time backdoor without modifying model weights, poisoning training data, or controlling runtime infrastructure. We evaluated this attack vector by constructing template backdoors targeting two objectives: degrading factual accuracy and inducing emission of attacker-controlled URLs, and applied them across eighteen models spanning seven families and four inference engines. Under triggered conditions, factual accuracy drops from 90% to 15% on average while attacker-controlled URLs are emitted with success rates exceeding 80%; benign inputs show no measurable degradation. Backdoors generalize across inference runtimes and evade all automated security scans applied by the largest open-weight distribution platform. These results establish chat templates as a reliable and currently undefended attack surface in the LLM supply chain.
Authors:Rajendra Paudyal, Rajendra Upadhyay, Al Nahian Bin Emran, Lisa Donnan, Duminda Wijesekera
Abstract:
Denial-of-Service (DoS) conditions in enterprise networks are commonly attributed to malicious actors. However, availability can also be compromised by benign non-malicious insider behavior. This paper presents an empirical study of a production enterprise LAN that demonstrates how routine docking and undocking of user endpoints repeatedly trigger rapid recalculations of the control plane of the Rapid Spanning Tree Protocol (RSTP) [1]. Although protocol-compliant and nonmalicious, these events introduce transient forwarding disruptions of approximately 2-4 seconds duration that degrade realtime streaming (voice and video) services while remaining largely undetected by conventional security monitoring. We map this phenomenon to the NIST and MITRE insider threat frameworks, characterizing it as an unintentional insider-driven availability breach, and demonstrate that explicit edge-port configuration effectively mitigates the condition without compromising loop prevention
Authors:Nicolás E. Díaz Ferreyra, Moritz Mock, Max Kretschmann, Barbara Russo, Mojtaba Shahin, Mansooreh Zahedi, Riccardo Scandariato
Abstract:
Static Analysis Tools (SATs) are central to security engineering activities, as they enable early identification of code weaknesses without requiring execution. However, their effectiveness is often limited by high false-positive rates and incomplete coverage of vulnerability classes. At the same time, developers frequently document security-related shortcuts and compromises as Self-Admitted Technical Debt (SATD) in software artifacts, such as code comments. While prior work has recognized SATD as a rich source of security information, it remains unclear whether -and in what ways- it is utilized during SAT-aided security analysis. OBJECTIVE: This work investigates the extent to which security-related SATD complements the output produced by SATs and helps bridge some of their well-known limitations. METHOD: We followed a mixed-methods approach consisting of (i) the analysis of a SATD-annotated vulnerability dataset using three state-of-the-art SATs and (ii) an online survey with 72 security practitioners. RESULTS: The combined use of all SATs flagged 114 of the 135 security-related SATD instances, spanning 24 distinct Common Weakness Enumeration (CWE) identifiers. A manual mapping of the SATD comments revealed 33 unique CWE types, 6 of which correspond to categories that SATs commonly overlook or struggle to detect (e.g., race conditions). Survey responses further suggest that developers frequently pair SAT outputs with SATD insights to better understand the impact and root causes of security weaknesses and to identify suitable fixes. IMPLICATIONS: Our findings show that such SATD-encoded information can be a meaningful complement to SAT-driven security analysis, while helping to overcome some of SATs' practical shortcomings.
Authors:Jiamu Bai, Guanlin He, Xin Gu, Daniel Kifer, Kiwan Maeng
Abstract:
When creating public data products out of confidential datasets, inferential/posterior-based privacy definitions, such as Pufferfish, provide compelling privacy semantics for data with correlations. However, such privacy definitions are rarely used in practice because they do not always compose. For example, it is possible to design algorithms for these privacy definitions that have no leakage when run once but reveal the entire dataset when run more than once. We prove necessary and sufficient conditions that must be added to ensure linear composition for Pufferfish mechanisms, hence avoiding such privacy collapse. These extra conditions turn out to be differential privacy-style inequalities, indicating that achieving both the interpretable semantics of Pufferfish for correlated data and composition benefits requires adopting differentially private mechanisms to Pufferfish. We show that such translation is possible through a concept called the $(a,b)$-influence curve, and many existing differentially private algorithms can be translated with our framework into a composable Pufferfish algorithm. We illustrate the benefit of our new framework by designing composable Pufferfish algorithms for Markov chains that significantly outperform prior work.
Authors:Yihang Chen, Zhao Xu, Youyuan Jiang, Tianle Zheng, Cho-Jui Hsieh
Abstract:
Large Vision-Language Models (LVLMs) are increasingly equipped with robust safety safeguards to prevent responses to harmful or disallowed prompts. However, these defenses often focus on analyzing explicit textual inputs or relevant visual scenes. In this work, we introduce Text-DJ, a novel jailbreak attack that bypasses these safeguards by exploiting the model's Optical Character Recognition (OCR) capability. Our methodology consists of three stages. First, we decompose a single harmful query into multiple and semantically related but more benign sub-queries. Second, we pick a set of distraction queries that are maximally irrelevant to the harmful query. Third, we present all decomposed sub-queries and distraction queries to the LVLM simultaneously as a grid of images, with the position of the sub-queries being middle within the grid. We demonstrate that this method successfully circumvents the safety alignment of state-of-the-art LVLMs. We argue this attack succeeds by (1) converting text-based prompts into images, bypassing standard text-based filters, and (2) inducing distractions, where the model's safety protocols fail to link the scattered sub-queries within a high number of irrelevant queries. Overall, our findings expose a critical vulnerability in LVLMs' OCR capabilities that are not robust to dispersed, multi-image adversarial inputs, highlighting the need for defenses for fragmented multimodal inputs.
Authors:Jonas Möller, Erik Imgrund, Thorsten Eisenhofer, Konrad Rieck
Abstract:
Machine learning models are routinely deployed on a wide range of computing hardware. Although such hardware is typically expected to produce identical results, differences in its design can lead to small numerical variations during inference. In this work, we show that these variations can be exploited to create backdoors in machine learning models. The core idea is to shape the model's decision function such that it yields different predictions for the same input when executed on different hardware. This effect is achieved by locally moving the decision boundary close to a target input and then refining numerical deviations to flip the prediction on selected hardware. We empirically demonstrate that these hardware-triggered backdoors can be created reliably across common GPU accelerators. Our findings reveal a novel attack vector affecting the use of third-party models, and we investigate different defenses to counter this threat.
Authors:Minwoo Jang, Hoyoung Kim, Jabin Koo, Jungseul Ok
Abstract:
The rise of model hubs has made it easier to access reusable model components, making model merging a practical tool for combining capabilities. Yet, this modularity also creates a \emph{governance gap}: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, they provide inconsistent protection across diverse architectures and release formats in practice. To close this gap, we propose \textsc{Trap}$^{2}$, an architecture-agnostic protection framework that encodes protection into the update during fine-tuning, regardless of whether they are released as adapters or full models. Instead of relying on architecture-dependent approaches, \textsc{Trap}$^{2}$ uses weight re-scaling as a simple proxy for the merging process. It keeps released weights effective in standalone use, but degrades them under re-scaling that often arises in merging, undermining unauthorized merging.
Authors:Jaiyoung Park, Sejin Park, Jai Hyun Park, Jung Ho Ahn, Jung Hee Cheon, Guillaume Hanrot, Jung Woo Kim, Minje Park, Damien Stehlé
Abstract:
As large language models (LLMs) become ubiquitous, privacy concerns pertaining to inference inputs keep growing. In this context, fully homomorphic encryption (FHE) has emerged as a primary cryptographic solution to provide non-interactive confidential LLM inference. Existing solutions scale poorly with the input token length, and hence focus either on small models or larger models with a small number of input tokens. They also suffer from the existence of large outlier values. These values have a strong impact on the evaluation of non-linear layers, leading to large-degree polynomial approximation and thus heavy evaluation costs. We propose an FHE-based private LLM inference solution that allows thousands of input tokens with only a part of them being encrypted: this fits with a scenario where the context is benign and only part of the input is sensitive. To do so, we suggest an unbalanced chunked prefill framework that processes the private and public parts of the input tokens differently. Our framework contains plaintext-plaintext, plaintext-ciphertext and ciphertext-ciphertext computational components. We adopt different strategies and ingredients for each component. We also devise new homomorphic algorithms for specific matrix multiplication and polynomial evaluation tasks encountered during LLM inference. Furthermore, without retraining, we tailor the LLM inference algorithm to reduce the ranges of outlier values: we leverage machine learning strategies (token prepending and rotations) to mitigate the impact of the outliers on non-linear layers. Based on these ingredients, we describe a CKKS-based end-to-end implementation of Llama-2-7B private inference for up to 4096 input tokens, of which the last 128 are encrypted. On a cluster of 8~NVIDIA RTX-4090 GPUs, inference takes 85s for summarization and 33s for generation per output token.
Authors:Wonyoung Kim, Seunggi Min, Minjae Gwon, Dowoo Baik, Haein Lee, Hyeon Heo, Minjae Lee, Min Woo Baek, Yonghwi Jin, Younggi Park, Yunjae Choi, Taesoo Kim, Sangdon Park, Insu Yun
Abstract:
Continuous fuzzing platforms such as OSS-Fuzz uncover large numbers of vulnerabilities, yet the subsequent repair process remains largely manual. Unfortunately, existing Automated Vulnerability Repair (AVR) techniques -- including recent LLM-based systems -- are not directly applicable to continuous fuzzing. This is because these systems are designed and evaluated on a static, single-run benchmark setting, making them ill-suited for the diverse, noisy, and failure-prone environments in continuous fuzzing. To address these issues, we introduce PatchIsland, a system for Continuous Vulnerability Repair (CVR) that tightly integrates with continuous fuzzing pipelines. PatchIsland employs an ensemble of diverse LLM agents. By leveraging multiple LLM agents, PatchIsland can cover a wider range of settings (e.g., different projects, bug types, and programming languages) and also improve operational robustness. In addition, PatchIsland utilizes a two-phase patch-based deduplication to mitigate duplicate crashes and patches, which can be problematic in continuous fuzzing. In our internal evaluation, PatchIsland repaired 84 of 92 vulnerabilities, demonstrating strong repair capability. In the official AIxCC competition, the system operated with no human intervention in a fully autonomous environment and successfully patched 31 out of 43 vulnerabilities, achieving a repair rate of 72.1\%.
Authors:Suyang Sun, Weifei Jin, Yuxin Cao, Wei Song, Jie Hao
Abstract:
Modern Voice Control Systems (VCS) rely on the collaboration of Automatic Speech Recognition (ASR) and Speaker Recognition (SR) for secure interaction. However, prior adversarial attacks typically target these tasks in isolation, overlooking the coupled decision pipeline in real-world scenarios. Consequently, single-task attacks often fail to pose a practical threat. To fill this gap, we first utilize gradient analysis to reveal that ASR and SR exhibit no inherent conflicts. Building on this, we propose Dual-task Universal Adversarial Perturbation (DUAP). Specifically, DUAP employs a targeted surrogate objective to effectively disrupt ASR transcription and introduces a Dynamic Normalized Ensemble (DNE) strategy to enhance transferability across diverse SR models. Furthermore, we incorporate psychoacoustic masking to ensure perturbation imperceptibility. Extensive evaluations across five ASR and six SR models demonstrate that DUAP achieves high simultaneous attack success rates and superior imperceptibility, significantly outperforming existing single-task baselines.
Authors:Haoze Guo, Ziqi Wei
Abstract:
Retrieval-augmented generation (RAG) systems put more and more emphasis on grounding their responses in user-generated content found on the Web, amplifying both their usefulness and their attack surface. Most notably, indirect prompt injection and retrieval poisoning attack the web-native carriers that survive ingestion pipelines and are very concerning. We provide OpenRAG-Soc, a compact, reproducible benchmark-and-harness for web-facing RAG evaluation under these threats, in a discrete data package. The suite combines a social corpus with interchangeable sparse and dense retrievers and deployable mitigations - HTML/Markdown sanitization, Unicode normalization, and attribution-gated answered. It standardizes end-to-end evaluation from ingestion to generation and reports attacks time of one of the responses at answer time, rank shifts in both sparse and dense retrievers, utility and latency, allowing for apples-to-apples comparisons across carriers and defenses. OpenRAG-Soc targets practitioners who need fast, and realistic tests to track risk and harden deployments.
Authors:Aryan Pasikhani, Prosanta Gope, Yang Yang, Shagufta Mehnaz, Biplab Sikdar
Abstract:
This paper explores a new cyber-attack vector targeting Industrial Control Systems (ICS), particularly focusing on water treatment facilities. Developing a new multi-agent Deep Reinforcement Learning (DRL) approach, adversaries craft stealthy, strategically timed, wear-out attacks designed to subtly degrade product quality and reduce the lifespan of field actuators. This sophisticated method leverages DRL methodology not only to execute precise and detrimental impacts on targeted infrastructure but also to evade detection by contemporary AI-driven defence systems. By developing and implementing tailored policies, the attackers ensure their hostile actions blend seamlessly with normal operational patterns, circumventing integrated security measures. Our research reveals the robustness of this attack strategy, shedding light on the potential for DRL models to be manipulated for adversarial purposes. Our research has been validated through testing and analysis in an industry-level setup. For reproducibility and further study, all related materials, including datasets and documentation, are publicly accessible.
Authors:Minfeng Qi, Dongyang He, Qin Wang, Lefeng Zhang
Abstract:
Visual Reasoning CAPTCHAs (VRCs) combine visual scenes with natural-language queries that demand compositional inference over objects, attributes, and spatial relations. They are increasingly deployed as a primary defense against automated bots. Existing solvers fall into two paradigms: vision-centric, which rely on template-specific detectors but fail on novel layouts, and reasoning-centric, which leverage LLMs but struggle with fine-grained visual perception. Both lack the generality needed to handle heterogeneous VRC deployments. We present ViPer, a unified attack framework that integrates structured multi-object visual perception with adaptive LLM-based reasoning. ViPer parses visual layouts, grounds attributes to question semantics, and infers target coordinates within a modular pipeline. Evaluated on six major VRC providers (VTT, Geetest, NetEase, Dingxiang, Shumei, Xiaodun), ViPer achieves up to 93.2% success, approaching human-level performance across multiple benchmarks. Compared to prior solvers, GraphNet (83.2%), Oedipus (65.8%), and the Holistic approach (89.5%), ViPer consistently outperforms all baselines. The framework further maintains robustness across alternative LLM backbones (GPT, Grok, DeepSeek, Kimi), sustaining accuracy above 90%. To anticipate defense, we further introduce Template-Space Randomization (TSR), a lightweight strategy that perturbs linguistic templates without altering task semantics. TSR measurably reduces solver (i.e., attacker) performance. Our proposed design suggests directions for human-solvable but machine-resistant CAPTCHAs.
Authors:Hongming Fei, Zilong Hu, Prosanta Gope, Biplab Sikdar
Abstract:
Physical Unclonable Functions (PUFs) serve as lightweight, hardware-intrinsic entropy sources widely deployed in IoT security applications. However, delay-based PUFs are vulnerable to Machine Learning Attacks (MLAs), undermining their assumed unclonability. There are no valid metrics for evaluating PUF MLA resistance, but empirical modelling experiments, which lack theoretical guarantees and are highly sensitive to advances in machine learning techniques. To address the fundamental gap between PUF designs and security qualifications, this work proposes a novel, formal, and unified framework for evaluating PUF security against modelling attacks by providing security lower bounds, independent of specific attack models or learning algorithms. We mathematically characterise the adversary's advantage in predicting responses to unseen challenges based solely on observed challenge-response pairs (CRPs), formulating the problem as a conditional probability estimation over the space of candidate PUFs. We present our analysis on previous "broken" PUFs, e.g., Arbiter PUFs, XOR PUFs, Feed-Forward PUFs, and for the first time compare their MLA resistance in a formal way. In addition, we evaluate the currently "secure" CT PUF, and show its security boundary. We demonstrate that the proposed approach systematically quantifies PUF resilience, captures subtle security differences, and provides actionable, theoretically grounded security guarantees for the practical deployment of PUFs.
Authors:Yunbo Li, Jiaping Gui, Fanchao Meng, Yue Wu
Abstract:
Federated Learning (FL) enables collaborative model training without direct data sharing, yet it remains vulnerable to privacy attacks such as model inversion and membership inference. Existing differential privacy (DP) solutions for FL often inject noise uniformly across the entire model, degrading utility while providing suboptimal privacy-utility tradeoffs. To address this, we propose LaDP, a novel layer-wise adaptive noise injection mechanism for FL that optimizes privacy protection while preserving model accuracy. LaDP leverages two key insights: (1) neural network layers contribute unevenly to model utility, and (2) layer-wise privacy leakage can be quantified via KL divergence between local and global model distributions. LaDP dynamically injects noise into selected layers based on their privacy sensitivity and importance to model performance. We provide a rigorous theoretical analysis, proving that LaDP satisfies $(ε, δ)$-DP guarantees and converges under bounded noise. Extensive experiments on CIFAR-10/100 datasets demonstrate that LaDP reduces noise injection by 46.14% on average compared to state-of-the-art (SOTA) methods while improving accuracy by 102.99%. Under the same privacy budget, LaDP outperforms SOTA solutions like Dynamic Privacy Allocation LDP and AdapLDP by 25.18% and 6.1% in accuracy, respectively. Additionally, LaDP robustly defends against reconstruction attacks, increasing the FID of the reconstructed private data by $>$12.84% compared to all baselines. Our work advances the practical deployment of privacy-preserving FL with minimal utility loss.
Authors:Murtuza Shahzad, Joseph Wilson, Ibrahim Al Azher, Hamed Alhoori, Mona Rahimi
Abstract:
The increasing complexity and volume of software systems have heightened the importance of identifying and mitigating security vulnerabilities. The existing software vulnerability datasets frequently fall short in providing comprehensive, detailed code snippets explicitly linked to specific vulnerability descriptions, reducing their utility for advanced research and hindering efforts to develop a deeper understanding of security vulnerabilities. To address this challenge, we present a novel dataset that provides examples of vulnerable code snippets corresponding to Common Attack Pattern Enumerations and Classifications (CAPEC) and Common Weakness Enumeration (CWE) descriptions. By employing the capabilities of Generative Pre-trained Transformer (GPT) models, we have developed a robust methodology for generating these examples. Our approach utilizes GPT-4o, Llama and Claude models to generate code snippets that exhibit specific vulnerabilities as described in CAPEC and CWE documentation. This dataset not only enhances the understanding of security vulnerabilities in code but also serves as a valuable resource for training machine learning models focused on automatic vulnerability detection and remediation. Preliminary evaluations suggest that the dataset generated by Large Language Models demonstrates high accuracy and can serve as a reliable reference for vulnerability identification systems. We found consistent results across the three models, with 0.98 cosine similarity among codes. The final dataset comprises 615 CAPEC code snippets in three programming languages: Java, Python, and JavaScript, making it one of the most extensive and diverse resources in this domain.
Authors:Shams Tarek, Dipayan Saha, Khan Thamid Hasan, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi
Abstract:
The increasing complexity of modern system-on-chip designs amplifies hardware security risks and makes manual security property specification a major bottleneck in formal property verification. This paper presents Assertain, an automated framework that integrates RTL design analysis, Common Weakness Enumeration (CWE) mapping, and threat model intelligence to automatically generate security properties and executable SystemVerilog Assertions. Assertain leverages large language models with a self-reflection refinement mechanism to ensure both syntactic correctness and semantic consistency. Evaluated on 11 representative hardware designs, Assertain outperforms GPT-5 by 61.22%, 59.49%, and 67.92% in correct assertion generation, unique CWE coverage, and architectural flaw detection, respectively. These results demonstrate that Assertain significantly expands vulnerability coverage, improves assertion quality, and reduces manual effort in hardware security verification.
Authors:Vojtěch Staněk, Martin Perešíni, Lukáš Sekanina, Anton Firc, Kamil Malinka
Abstract:
While deepfake speech detectors built on large self-supervised learning (SSL) models achieve high accuracy, employing standard ensemble fusion to further enhance robustness often results in oversized systems with diminishing returns. To address this, we propose an evolutionary multi-objective score fusion framework that jointly minimizes detection error and system complexity. We explore two encodings optimized by NSGA-II: binary-coded detector selection for score averaging and a real-valued scheme that optimizes detector weights for a weighted sum. Experiments on the ASVspoof 5 dataset with 36 SSL-based detectors show that the obtained Pareto fronts outperform simple averaging and logistic regression baselines. The real-valued variant achieves 2.37% EER (0.0684 minDCF) and identifies configurations that match state-of-the-art performance while significantly reducing system complexity, requiring only half the parameters. Our method also provides a diverse set of trade-off solutions, enabling deployment choices that balance accuracy and computational cost.
Authors:Francesco Pagano, Lorenzo Pisu, Leonardo Regano, Davide Maiorca, Alessio Merlo, Giorgio Giacinto
Abstract:
Code obfuscation is widely adopted in modern software development to protect intellectual property and hinder reverse engineering, but it also provides attackers with a powerful means to conceal malicious logic inside otherwise legitimate JavaScript code. In a software supply chain where a single compromised package can affect thousands of applications, this raises a critical question: how robust are the Static Application Security Testing (SAST) tools that CI/CD pipelines rely on as automated security gatekeepers? This paper answers that question by empirically quantifying the impact of JavaScript obfuscation on state-of-practice SAST. We define a realistic supply-chain threat model in which an adversary injects vulnerable code and iteratively obfuscates it until the pipeline reports a clean scan. To measure the resulting degradation, we introduce the Vulnerability Detection Loss (VDL) metric and conduct a two-phase study. First, we analyze 16 vulnerable-by-design Node.js web applications from the OWASP directory; second, we extend the analysis to 260 in-the-wild JavaScript/Node.js projects from GitHub. Across both datasets, we apply eight semantics-preserving obfuscation techniques and their combinations and evaluate two representative SAST tools, Njsscan and Bearer. Even a single obfuscation technique typically suppresses most baseline findings, including high-severity issues, while stacking techniques yield near-total evasion, with VDL often approaching 100%. Our results show that current JavaScript SAST is fundamentally not robust against commonplace obfuscations and that "clean" reports on obfuscated code may offer only a false sense of security. Finally, we discuss practical mitigation guidelines and directions for obfuscation-aware analysis.
Authors:Eduardo Brito, Fernando Castillo, Amnir Hadachi, Ulrich Norbisrath, Jonathan Heiss
Abstract:
Reliable use of real-world data requires confidence that recorded evidence reflects what actually occurred at the moment of capture. In adversarial or incentive-misaligned cyber-physical settings, device-centric provenance and post-capture verification are insufficient to provide that guarantee. This paper builds on Proof-of-Location (PoL) as a baseline for establishing where and when events take place, and extends it with a witnessing-zone architecture in which multiple independent observers collectively validate physical events. The resulting approach produces auditable evidence artifacts that can support downstream systems in cyber-physical settings, without relying on centralized trust. Through representative scenarios and simulation-based evaluation, this paper shows how such architectures improve sensor data trustworthiness and resilience to fabricated or staged events.
Authors:Ivan Costa, Pedro Correia, Ivone Amorim, Eva Maia, Isabel Praça
Abstract:
Federated Learning (FL) enables collaborative training while keeping sensitive data on clients' devices, but local model updates can still leak private information. Hybrid Homomorphic Encryption (HHE) has recently been applied to FL to mitigate client overhead while preserving privacy. However, existing HHE-FL systems rely on a single homomorphic key pair shared across all clients, which forces them to assume an unrealistically weak threat model: if a client misbehaves or intercepts another's traffic, private updates can be exposed. We eliminate this weakness by integrating two alternative key protection mechanisms into the HHE-FL workflow. The first is masking, where client keys are blinded before homomorphic encryption and later unblinded homomorphically by the server. The second is RSA encapsulation, where homomorphically encrypted keys are additionally wrapped under the server's RSA public key. These countermeasures prevent key misuse by other clients and extend HHE-FL security to adversarial settings with malicious participants. We implement both approaches on top of the Flower framework using the PASTA/BFV HHE scheme and evaluate them on the MNIST dataset with 12 clients. Results show that both mechanisms preserve model accuracy while adding minimal overhead: masking incurs negligible cost, and RSA encapsulation introduces only modest runtime and communication overhead.
Authors:Jieting Yuan, Songhan Zhao, Ye Xue, Yu Zhao, Bo Gu, Shimin Gong
Abstract:
This paper focuses on secure communications in UAV-assisted wireless networks, which comprise multiple legitimate UAVs (LE-UAVs) and an intelligent eavesdropping UAV (EA-UAV). The intelligent EA-UAV can observe the LE-UAVs'transmission strategies and adaptively adjust its trajectory to maximize information interception. To counter this threat, we propose a mode-switching scheme that enables LE-UAVs to dynamically switch between the data transmission and jamming modes, thereby balancing data collection efficiency and communication security. However, acquiring full global network state information for LE-UAVs' decision-making incurs significant overhead, as the network state is highly dynamic and time-varying. To address this challenge, we propose a digital twin-enabled simultaneous learning and modeling (DT-SLAM) framework that allows LE-UAVs to learn policies efficiently within the DT, thereby avoiding frequent interactions with the real environment. To capture the competitive relationship between the EA-UAV and the LE-UAVs, we model their interactions as a multi-stage Stackelberg game and jointly optimize the GUs' transmission control, UAVs' trajectory planning, mode selection, and network formation to maximize overall secure throughput. Considering potential model mismatch between the DT and the real environment, we propose a robust proximal policy optimization (RPPO) algorithm that encourages LE-UAVs to explore service regions with higher uncertainty. Numerical results demonstrate that the proposed DT-SLAM framework effectively supports the learning process. Meanwhile, the RPPO algorithm converges about 12% faster and the secure throughput can be increased by 8.6% compared to benchmark methods.
Authors:Tao Huang, Chen Hou, Guosen Wu, Jiayang Meng
Abstract:
Privacy leakage in LLM agents is often studied through individual storage or execution components, such as memory modules, retrieval pipelines, or tool-mediated artifacts. However, these settings are typically analyzed in isolation, making it difficult to compare how private internal dependence becomes externally recoverable across heterogeneous agent pipelines. In this paper, we present CIPL (Channel Inversion for Privacy Leakage) as a unified channel-oriented measurement interface for evaluating privacy leakage in LLM agent pipelines. Rather than claiming a universally strongest attack recipe, CIPL provides a shared way to represent a target through its sensitive source, selection, assembly, execution, observation, and extraction stages, and to measure how internal exposure is transformed into attacker-recoverable leakage under a common protocol. Using memory-based, retrieval-mediated, and tool-mediated instantiations under this shared interface, we identify a distinct cross-target risk picture. Memory behaves as a near-saturated high-risk special case, while beyond-memory leakage exhibits a different regime: retrieval-mediated targets show frequent but often incomplete leakage, and tool-mediated targets are strongly shaped by the exposed observation surface and provider behavior. We further show that leakage is governed by channel conditions rather than by a universally dominant recipe: cleaned weak controls sharply suppress leakage, and semantic annotation reveals attacker-useful leakage beyond exact-match extraction. Together, these findings suggest that privacy risk in LLM agent pipelines is better understood through \emph{observable channels}, not just storage components. More broadly, our results motivate channel-oriented privacy evaluation as a necessary complement to component-local or exact-only analyses.
Authors:James Hugglestone, Samuel Jacob Chacko, Dawson Stoller, Ryan Schmidt, Xiuwen Liu
Abstract:
Large Language Models (LLMs) have demonstrated potential in code generation, yet they struggle with the multi-step, stateful reasoning required for offensive cybersecurity operations. Existing research often relies on static benchmarks that fail to capture the dynamic nature of real-world vulnerabilities. In this work, we introduce STRIATUM-CTF (A Search-based Test-time Reasoning Inference Agent for Tactical Utility Maximization in Cybersecurity), a modular agentic framework built upon the Model Context Protocol (MCP). By standardizing tool interfaces for system introspection, decompilation, and runtime debugging, STRIATUM-CTF enables the agent to maintain a coherent context window across extended exploit trajectories. We validate this approach not merely on synthetic datasets, but in a live competitive environment. Our system participated in a university-hosted Capture-the-Flag (CTF) competition in late 2025, where it operated autonomously to identify and exploit vulnerabilities in real-time. STRIATUM-CTF secured First Place, outperforming 21 human teams and demonstrating strong adaptability in a dynamic problem-solving setting. We analyze the agent's decision-making logs to show how MCP-based tool abstraction significantly reduces hallucination compared to naive prompting strategies. These results suggest that standardized context protocols are a critical path toward robust autonomous cyber-reasoning systems.
Authors:Devashish Chaudhary, Sutharshan Rajasegarar, Shiva Raj Pokhrel
Abstract:
With the rapid growth of interconnected devices, accurately detecting malicious activities in network traffic has become increasingly challenging. Most existing deep learning-based intrusion detection systems treat network flows as independent instances, thereby failing to exploit the relational dependencies inherent in network communications. To address this limitation, we propose Q-AGNN, a Quantum-Enhanced Attentive Graph Neural Network for intrusion detection, where network flows are modeled as nodes and edges represent similarity relationships. Q-AGNN leverages parameterized quantum circuits (PQCs) to encode multi-hop neighborhood information into a high-dimensional latent space, inducing a bounded quantum feature map that implements a second-order polynomial graph filter in a quantum-induced Hilbert space. An attention mechanism is subsequently applied to adaptively weight the quantum-enhanced embeddings, allowing the model to focus on the most influential nodes contributing to anomalous behavior. Extensive experiments conducted on four benchmark intrusion detection datasets demonstrate that Q-AGNN achieves competitive or superior detection performance compared to state-of-the-art graph-based methods, while consistently maintaining low false positive rates under hardware-calibrated noise conditions. Moreover, we also executed the Q-AGNN framework on actual IBM quantum hardware to demonstrate the practical operability of the proposed pipeline under real NISQ conditions. These results highlight the effectiveness of integrating quantum-enhanced representations with attention mechanisms for graph-based intrusion detection and underscore the potential of hybrid quantum-classical learning frameworks in cybersecurity applications.
Authors:Devashish Chaudhary, Sutharshan Rajasegarar, Shiva Raj Pokhrel, Lei Pan, Ruby D
Abstract:
The rapid expansion of the Internet of Things (IoT) and its integration with backbone networks have heightened the risk of security breaches. Traditional centralized approaches to anomaly detection, which require transferring large volumes of data to central servers, suffer from privacy, scalability, and latency limitations. This paper proposes a lightweight autoencoder-based anomaly detection framework designed for deployment on resource-constrained edge devices, enabling real-time detection while minimizing data transfer and preserving privacy. Federated learning is employed to train models collaboratively across distributed devices, where local training occurs on edge nodes and only model weights are aggregated at a central server. A real-world IoT testbed using Raspberry Pi sensor nodes was developed to collect normal and attack traffic data. The proposed federated anomaly detection system, implemented and evaluated on the testbed, demonstrates its effectiveness in accurately identifying network attacks. The communication overhead was reduced significantly while achieving comparable performance to the centralized method.
Authors:Minghao Hu, Qiang Zeng, Lannan Luo
Abstract:
Smart contracts have transformed decentralized finance, but flaws in their logic still create major security threats. Most existing vulnerability detection techniques focus on well-supported languages like Solidity, while low-resource counterparts such as Vyper remain largely underexplored due to scarce analysis tools and limited labeled datasets. Training a robust detection model directly on Vyper is particularly challenging, as collecting sufficiently large and diverse Vyper training datasets is difficult in practice. To address this gap, we introduce Sol2Vy, a novel framework that enables cross-language knowledge transfer from Solidity to Vyper, allowing vulnerability detection on Vyper using models trained exclusively on Solidity. This approach eliminates the need for extensive labeled Vyper datasets typically required to build a robust vulnerability detection model. We implement and evaluate Sol2Vy on various critical vulnerability types, including reentrancy, weak randomness, and unchecked transfer. Experimental results show that Sol2Vy, despite being trained exclusively on Solidity, achieves strong detection performance on Vyper contracts and significantly outperforms prior state-of-the-art methods.
Authors:Dmytro Petryk, Ievgen Kabin, Peter Langendoerfer, Zoya Dyka
Abstract:
Devices employing cryptographic approaches have to be resistant to physical attacks. Side-Channel Analysis (SCA) and Fault Injection (FI) attacks are frequently used to reveal cryptographic keys. In this paper, we present a combined SCA and laser illumination attack against an Elliptic Curve Scalar Multiplication accelerator using a differential probe from Teledyne LeCroy. Our experiments show that laser illumination increases the power consumption of the chip, especially its static power consumption but the success of the horizontal power analysis attacks was changed insignificantly. We assume that using a laser with a high laser beam power and concentrating on measuring and analysing only static current can improve the attack success significantly. The horizontal attacks against public key cryptosystems exploiting the Static Consumption under Laser Illumination (SCuLI attacks) are novel and their potential is not investigated yet. These attacks can be especially dangerous against cryptographic chips manufactured in scaled technologies. If such attacks are feasible, appropriate countermeasures have to be proposed in the future.
Authors:Kassem Fawaz, Ren Yi, Octavian Suciu, Rishabh Khandelwal, Hamza Harkous, Nina Taft, Marco Gruteser
Abstract:
The ability to simulate human privacy decisions has significant implications for aligning autonomous agents with individual intent and conducting cost-effective, large-scale privacy-centric user studies. Prior approaches prompt Large Language Models (LLMs) with natural language user statements, data-sharing histories, or demographic attributes to simulate privacy decisions. These approaches, however, fail to balance individual-level accuracy, prompt usability, token efficiency, and population-level representation. We present Narriva, an approach that generates text-based synthetic privacy personas to address these shortcomings. Narriva grounds persona generation in prior user privacy decisions, such as those from large-scale survey datasets, rather than purely relying on demographic stereotypes. It compresses this data into concise, human-readable summaries structured by established privacy theories. Through benchmarking across five diverse datasets, we analyze the characteristics of Narriva's synthetic personas in modeling both individual and population-level privacy preferences. We find that grounding personas in past privacy behaviors achieves up to 88% predictive accuracy (significantly outperforming a non-personalized LLM baseline), and yields an 80-95% reduction in prompt tokens compared to in-context learning with raw examples. Finally, we demonstrate that personas synthesized from a single survey can reproduce the aggregate privacy behaviors and statistical distributions (TVComplement up to 0.85) of entirely different studies.
Authors:Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Xiaojun Chen, Wu Liu, Weiping Wang
Abstract:
Recent advancements in diffusion-based image editing pose a significant threat to the authenticity of digital visual content. Traditional embedding-based watermarking methods often introduce perceptible perturbations to maintain robustness, inevitably compromising visual fidelity. Meanwhile, existing zero-watermarking approaches, typically relying on global image features, struggle to withstand sophisticated manipulations. In this work, we uncover a key observation: while individual image patches undergo substantial alterations during AI-based editing, the relational distance between patch pairs remains relatively invariant. Leveraging this property, we propose Relational Zero-Watermarking (Rel-Zero), a novel framework that requires no modification to the original image but derives a unique zero-watermark from these editing-invariant patch relations. By grounding the watermark in intrinsic structural consistency rather than absolute appearance, Rel-Zero provides a non-invasive yet resilient mechanism for content authentication. Extensive experiments demonstrate that Rel-Zero achieves substantially improved robustness across diverse editing models and manipulations compared to prior zero-watermarking approaches.
Authors:Shenao Yan, Shimaa Ahmed, Shan Jin, Sunpreet S. Arora, Yiwei Cai, Yizhen Wang, Yuan Hong
Abstract:
Code generation large language models (LLMs) are increasingly integrated into modern software development workflows. Recent work has shown that these models are vulnerable to backdoor and poisoning attacks that induce the generation of insecure code, yet effective defenses remain limited. Existing scanning approaches rely on token-level generation consistency to invert attack targets, which is ineffective for source code where identical semantics can appear in diverse syntactic forms. We present CodeScan, which, to the best of our knowledge, is the first poisoning-scanning framework tailored to code generation models. CodeScan identifies attack targets by analyzing structural similarities across multiple generations conditioned on different clean prompts. It combines iterative divergence analysis with abstract syntax tree (AST)-based normalization to abstract away surface-level variation and unify semantically equivalent code, isolating structures that recur consistently across generations. CodeScan then applies LLM-based vulnerability analysis to determine whether the extracted structures contain security vulnerabilities and flags the model as compromised when such a structure is found. We evaluate CodeScan against four representative attacks under both backdoor and poisoning settings across three real-world vulnerability classes. Experiments on 108 models spanning three architectures and multiple model sizes demonstrate 97%+ detection accuracy with substantially lower false positives than prior methods.
Authors:Lorenzo Corrias, Lorenzo Pisu, Davide Maiorca, Giorgio Giacinto
Abstract:
The growth in the adoption of the WebAssembly (WASM) standard has given rise to a rapidly increasing landscape of binary applications that are natively ported to the environment of websites. The flexibility of WASM has made it the preferred way to run fast and resource-heavy applications, replacing a field that JavaScript previously monopolized. Despite its success, researchers have raised concerns over the security implementations of WASM, demonstrating that binary vulnerabilities, such as Buffer Overflows and Use After Free, remain a present danger for WASM binaries. Our work aims to demonstrate that such vulnerabilities, when occurring on a WebAssembly module, can affect the behavior of a web application in unexpected ways, enabling an attacker to exploit vulnerabilities that are typical of the web security landscape. We provide several scenarios to provide examples of how each binary vulnerability might lead to a web security vulnerability, such as SQL Injections, XS-Leaks, and SSTI. Our results show that binary vulnerabilities can invalidate common security mechanisms that web developer implement in their applications, demonstrating how the security of WASM modules remains a problem that needs to be addressed. We also provide a list of best practices and defensive strategies that developers can implement to mitigate the risks associated with running unsafe WASM modules in their web applications.
Authors:Christian Ewert, Tim Hardow, Melf Fritsch, Leon Dietrich, Henrik Strunck, Rainer Buchty, Mladen Berekovic, Saleh Mulhem
Abstract:
Recently, RISC-V has contributed to the development of IoT devices, requiring architectures that balance energy efficiency, compact area, and integrated security. However, most recent RISC-V cores for IoT prioritize either area footprint or energy efficiency, while adding cryptographic support further compromises compactness. As a result, truly integrated architectures that simultaneously optimize efficiency and security remain largely unexplored, leaving constrained IoT environments vulnerable to performance and security trade-offs. In this paper, we introduce SAILOR, an energy-efficient and scalable ultra-lightweight RISC-V core family for cryptographic applications in IoT. Our design is modular and spans 1-, 2-, 4-, 8-, 16-, and 32-bit serialized execution data-paths, prioritizing minimal area. This modular design and adaptable data-path minimizes the overhead of integrating RISC-V cryptography extensions, achieving low hardware cost while significantly improving energy efficiency. We validate our design approach through a comprehensive analysis of area, energy, and efficiency trade-offs. The results surpass state-of-the-art solutions in both performance and energy efficiency by up to 13x and reduce area by up to 59 %, demonstrating that lightweight cryptographic features can be added without prohibitive overhead, and that energy- or area-efficient designs need not compromise performance.
Authors:Tian Zhang, Yiwei Xu, Juan Wang, Keyan Guo, Xiaoyang Xu, Bowen Xiao, Quanlong Guan, Jinlin Fan, Jiawei Liu, Zhiquan Liu, Hongxin Hu
Abstract:
Large language model (LLM) agents increasingly rely on external tools and retrieval systems to autonomously complete complex tasks. However, this design exposes agents to indirect prompt injection (IPI), where attacker-controlled context embedded in tool outputs or retrieved content silently steers agent actions away from user intent. Unlike prompt-based attacks, IPI unfolds over multi-turn trajectories, making malicious control difficult to disentangle from legitimate task execution. Existing inference-time defenses primarily rely on heuristic detection and conservative blocking of high-risk actions, which can prematurely terminate workflows or broadly suppress tool usage under ambiguous multi-turn scenarios. We propose AgentSentry, a novel inference-time detection and mitigation framework for tool-augmented LLM agents. To the best of our knowledge, AgentSentry is the first inference-time defense to model multi-turn IPI as a temporal causal takeover. It localizes takeover points via controlled counterfactual re-executions at tool-return boundaries and enables safe continuation through causally guided context purification that removes attack-induced deviations while preserving task-relevant evidence. We evaluate AgentSentry on the \textsc{AgentDojo} benchmark across four task suites, three IPI attack families, and multiple black-box LLMs. AgentSentry eliminates successful attacks and maintains strong utility under attack, achieving an average Utility Under Attack (UA) of 74.55 %, improving UA by 20.8 to 33.6 percentage points over the strongest baselines without degrading benign performance.
Authors:Alexander Benvenuti, Brandon Fallin, Calvin Hawkins, Brendan Bialy, Miriam Dennis, Warren Dixon, Matthew Hale
Abstract:
Markov chains model a wide range of user behaviors. However, generating accurate Markov chain models requires substantial user data, and sharing these models without privacy protections may reveal sensitive information about the underlying user data. We introduce a method for protecting user data used to formulate a Markov chain model. First, we develop a method for privatizing database queries whose outputs are elements of the unit simplex, and we prove that this method is differentially private. We quantify its accuracy by bounding the expected KL divergence between private and non-private queries. We extend this method to privatize stochastic matrices whose rows are each a simplex-valued query of a database, which includes data-driven Markov chain models. To assess their accuracy, we analytically bound the change in the stationary distribution and the change in the convergence rate between a non-private Markov chain model and its private form. Simulations show that under a typical privacy implementation, our method yields less than 2% error in the stationary distribution, indicating that our approach to private modeling faithfully captures the behavior of the systems we study.
Authors:Che Wang, Jiaming Zhang, Ziqi Zhang, Zijie Wang, Yinghui Wang, Jianbo Gao, Tao Wei, Zhong Chen, Wei Yang Bryan Lim
Abstract:
The integration of external data services (e.g., Model Context Protocol, MCP) has made large language model-based agents increasingly powerful for complex task execution. However, this advancement introduces critical security vulnerabilities, particularly indirect prompt injection (IPI) attacks. Existing attack methods are limited by their reliance on static patterns and evaluation on simple language models, failing to address the fast-evolving nature of modern AI agents. We introduce AdapTools, a novel adaptive IPI attack framework that selects stealthier attack tools and generates adaptive attack prompts to create a rigorous security evaluation environment. Our approach comprises two key components: (1) Adaptive Attack Strategy Construction, which develops transferable adversarial strategies for prompt optimization, and (2) Attack Enhancement, which identifies stealthy tools capable of circumventing task-relevance defenses. Comprehensive experimental evaluation shows that AdapTools achieves a 2.13 times improvement in attack success rate while degrading system utility by a factor of 1.78. Notably, the framework maintains its effectiveness even against state-of-the-art defense mechanisms. Our method advances the understanding of IPI attacks and provides a useful reference for future research.
Authors:Che Wang, Fuyao Zhang, Jiaming Zhang, Ziqi Zhang, Yinghui Wang, Longtao Huang, Jianbo Gao, Zhong Chen, Wei Yang Bryan Lim
Abstract:
Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack the agent's execution. Existing defenses typically rely on strict filtering or refusal mechanisms, which suffer from a critical limitation: over-refusal, prematurely terminating valid agentic workflows. We propose ICON, a probing-to-mitigation framework that neutralizes attacks while preserving task continuity. Our key insight is that IPI attacks leave distinct over-focusing signatures in the latent space. We introduce a Latent Space Trace Prober to detect attacks based on high intensity scores. Subsequently, a Mitigating Rectifier performs surgical attention steering that selectively manipulate adversarial query key dependencies while amplifying task relevant elements to restore the LLM's functional trajectory. Extensive evaluations on multiple backbones show that ICON achieves a competitive 0.4% ASR, matching commercial grade detectors, while yielding a over 50% task utility gain. Furthermore, ICON demonstrates robust Out of Distribution(OOD) generalization and extends effectively to multi-modal agents, establishing a superior balance between security and efficiency.
Authors:Silvia Lucia Sanna, Davide Maiorca, Giorgio Giacinto
Abstract:
Memory forensics is an effective methodology for analyzing living-off-the-land malware, including threats that employ evasion, obfuscation, anti-analysis, and steganographic techniques. By capturing volatile system state, memory analysis enables the recovery of transient artifacts such as decrypted payloads, executed commands, credentials, and cryptographic keys that are often inaccessible through static or traditional dynamic analysis. While several automated models have been proposed for malware detection from memory, their outputs typically lack interpretability, and memory analysis still relies heavily on expert-driven inspection of complex tool outputs, such as those produced by Volatility. In this paper, we propose an explainable, AI-assisted memory forensics approach that leverages general-purpose large language models (LLMs) to interpret memory analysis outputs in a human-readable form and to automatically extract meaningful Indicators of Compromise (IoCs), in some circumstances detecting more IoCs than current state-of-the-art tools. We apply the proposed methodology to both Windows and Android malware, comparing full RAM acquisition with target-process memory dumping and highlighting their complementary forensic value. Furthermore, we demonstrate how LLMs can support both expert and non-expert analysts by explaining analysis results, correlating artifacts, and justifying malware classifications. Finally, we show that a human-in-the-loop workflow, assisted by LLMs during kernel-assisted setup and analysis, improves reproducibility and reduces operational complexity, thereby reinforcing the practical applicability of AI-driven memory forensics for modern malware investigations.
Authors:Kaiwen Wang, Xiaolin Chang, Yuehan Dong, Ruichen Zhang
Abstract:
Secure comparison is a fundamental primitive in multi-party computation, supporting privacy-preserving applications such as machine learning and data analytics. A critical performance bottleneck in comparison protocols is their preprocessing phase, primarily due to the high cost of generating the necessary correlated randomness. Recent frameworks introduce a passive, non-colluding dealer to accelerate preprocessing. However, two key issues still remain. First, existing dealer-assisted approaches treat the dealer as a drop-in replacement for conventional preprocessing without redesigning the comparison protocol to optimize the online phase. Second, most protocols are specialized for particular algebraic domains, adversary models, or party configurations, lacking broad generality. In this work, we present the first dealer-assisted $n$-party LTBits (Less-Than-Bits) and MSB (Most Significant Bit) extraction protocols over both $\mathbb{F}_p$ and $\mathbb{Z}_{2^k}$, achieving perfect security at the protocol level. By fully exploiting the dealer's capability to generate rich correlated randomness, our $\mathbb{F}_p$ construction achieves constant-round online complexity and our $\mathbb{Z}_{2^k}$ construction achieves $O(\log_n k)$ rounds with tunable branching factor. All protocols are formulated as black-box constructions via an extended ABB model, ensuring portability across MPC backends and adversary models. Experimental results demonstrate $1.79\times$ to $19.4\times$ speedups over state-of-the-art MPC frameworks, highlighting the practicality of our protocols for comparison-intensive MPC applications.
Authors:Hossein Shokouhinejad, Roozbeh Razavi-Far, Griffin Higgins, Ali. A Ghorbani
Abstract:
Mixture-of-Experts (MoE) offers flexible graph reasoning by combining multiple views of a graph through a learned router. We investigate routing-aware explanations for MoE graph models in malware detection using control flow graphs (CFGs). Our architecture builds diversity at two levels. At the node level, each layer computes multiple neighborhood statistics and fuses them with an MLP, guided by a degree reweighting factor rho and a pooling choice lambda in {mean, std, max}, producing distinct node representations that capture complementary structural cues in CFGs. At the readout level, six experts, each tied to a specific (rho, lambda) view, output graph-level logits that the router weights into a final prediction. Post-hoc explanations are generated with edge-level attributions per expert and aggregated using the router gates so the rationale reflects both what each expert highlights and how strongly it is selected. Evaluated against single-expert GNN baselines such as GCN, GIN, and GAT on the same CFG dataset, the proposed MoE achieves strong detection accuracy while yielding stable, faithful attributions under sparsity-based perturbations. The results indicate that making the router explicit and combining multi-statistic node encoding with expert-level diversity can improve the transparency of MoE decisions for malware analysis.
Authors:Yuxuan Li, Leyang Li, Hao-Ping Lee, Sauvik Das
Abstract:
A growing body of research assumes that large language model (LLM) agents can serve as proxies for how people form attitudes toward and behave in response to security and privacy (S&P) threats. If correct, these simulations could offer a scalable way to forecast S&P risks in products prior to deployment. We interrogate this assumption using SP-ABCBench, a new benchmark of 30 tests derived from validated S&P human-subject studies, which measures alignment between simulations and human-subjects studies on a 0-100 ascending scale, where higher scores indicate better alignment across three dimensions: Attitude, Behavior, and Coherence. Evaluating twelve LLMs, four persona construction strategies, and two prompting methods, we found that there remains substantial room for improvement: all models score between 50 and 64 on average. Newer, bigger, and smarter models do not reliably do better and sometimes do worse. Some simulation configurations, however, do yield high alignment: e.g., with scores above 95 for some behavior tests when agents are prompted to apply bounded rationality and weigh privacy costs against perceived benefits. We release SP-ABCBench to enable reproducible evaluation as methods improve.
Authors:Michael Lanier, Yevgeniy Vorobeychik
Abstract:
We introduce MetaDOAR, a lightweight meta-controller that augments the Double Oracle / PSRO paradigm with a learned, partition-aware filtering layer and Q-value caching to enable scalable multi-agent reinforcement learning on very large cyber-network environments. MetaDOAR learns a compact state projection from per node structural embeddings to rapidly score and select a small subset of devices (a top-k partition) on which a conventional low-level actor performs focused beam search utilizing a critic agent. Selected candidate actions are evaluated with batched critic forwards and stored in an LRU cache keyed by a quantized state projection and local action identifiers, dramatically reducing redundant critic computation while preserving decision quality via conservative k-hop cache invalidation. Empirically, MetaDOAR attains higher player payoffs than SOTA baselines on large network topologies, without significant scaling issues in terms of memory usage or training time. This contribution provide a practical, theoretically motivated path to efficient hierarchical policy learning for large-scale networked decision problems.
Authors:Jie Cao, Zelin Zhang, Qi Li, Jianbing Ni
Abstract:
AI watermarking embeds invisible signals within images to provide provenance information and identify content as AI-generated. In this paper, we introduce MarkSweep, a novel watermark removal attack that effectively erases the embedded watermarks from AI-generated images without degrading visual quality. MarkSweep first amplifies watermark noise in high-frequency regions via edge-aware Gaussian perturbations and injects it into clean images for training a denoising network. This network then integrates two modules, the learnable frequency decomposition module and the frequency-aware fusion module, to suppress amplified noise and eliminate watermark traces. Theoretical analysis and extensive experiments demonstrate that invisible watermarks are highly vulnerable to MarkSweep, which effectively removes embedded watermarks, reducing the bit accuracy of HiDDeN and Stable Signature watermarking schemes to below 67%, while preserving perceptual quality of AI-generated images.
Authors:Shan Ali, Feifei Niu, Paria Shirani, Lionel C. Briand
Abstract:
The rapid evolution of cyberattacks continues to drive the emergence of unknown (zero-day) threats, posing significant challenges for network intrusion detection systems in Internet of Things (IoT) networks. Existing machine learning and deep learning approaches typically rely on large labeled datasets, payload inspection, or closed-set classification, limiting their effectiveness under data scarcity, encrypted traffic, and distribution shifts. Consequently, detecting unknown attacks in realistic IoT deployments remains difficult. To address these limitations, we propose SiamXBERT, a robust and data-efficient Siamese meta-learning framework empowered by a transformer-based language model for unknown attack detection. The proposed approach constructs a dual-modality feature representation by integrating flow-level and packet-level information, enabling richer behavioral modeling while remaining compatible with encrypted traffic. Through meta-learning, the model rapidly adapts to new attack types using only a small number of labeled samples and generalizes to previously unseen behaviors. Extensive experiments on representative IoT intrusion datasets demonstrate that SiamXBERT consistently outperforms state-of-the-art baselines under both within-dataset and cross-dataset settings while requiring significantly less training data, achieving up to \num{78.8}\% improvement in unknown F1-score. These results highlight the practicality of SiamXBERT for robust unknown attack detection in real-world IoT environments.
Authors:Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis
Abstract:
Jailbreaking large language models (LLMs) has emerged as a critical security challenge with the widespread deployment of conversational AI systems. Adversarial users exploit these models through carefully crafted prompts to elicit restricted or unsafe outputs, a phenomenon commonly referred to as Jailbreaking. Despite numerous proposed defense mechanisms, attackers continue to develop adaptive prompting strategies, and existing models remain vulnerable. This motivates approaches that examine the internal behavior of LLMs rather than relying solely on prompt-level defenses. In this work, we study jailbreaking from both security and interpretability perspectives by analyzing how internal representations differ between jailbreak and benign prompts. We conduct a systematic layer-wise analysis across multiple open-source models, including GPT-J, LLaMA, Mistral, and the state-space model Mamba, and identify consistent latent-space patterns associated with harmful inputs. We then propose a tensor-based latent representation framework that captures structure in hidden activations and enables lightweight jailbreak detection without model fine-tuning or auxiliary LLM-based detectors. We further demonstrate that the latent signals can be used to actively disrupt jailbreak execution at inference time. On an abliterated LLaMA-3.1-8B model, selectively bypassing high-susceptibility layers blocks 78% of jailbreak attempts while preserving benign behavior on 94% of benign prompts. This intervention operates entirely at inference time and introduces minimal overhead, providing a scalable foundation for achieving stronger coverage by incorporating additional attack distributions or more refined susceptibility thresholds. Our results provide evidence that jailbreak behavior is rooted in identifiable internal structures and suggest a complementary, architecture-agnostic direction for improving LLM security.
Authors:Simon Erni, Martin Kotuliak, Marc Roeschlin, Richard Baker, Srdjan Capkun
Abstract:
5G presents numerous advantages compared to previous generations: improved throughput, lower latency, and improved privacy protection for subscribers. Attacks against 5G standalone (SA) commonly use fake base stations (FBS), which need to operate at a very high output power level to lure victim phones to connect to them and are thus highly detectable. In this paper, we introduce 5Gone, a powerful software-defined radio (SDR)-based uplink overshadowing attack method against 5G-SA. 5Gone exploits deficiencies in the 3GPP standard to perform surgical, covert denial-of-service, privacy, and downgrade attacks. Uplink overshadowing means that an attacker is transmitting at exactly the same time and frequency as the victim UE, but with a slightly higher output power. 5Gone runs on a COTS x86 computer without any need for dedicated hardware acceleration and can overshadow commercial 100 MHz cells with an E2E latency of less than 500$μ$s, which up to now has not been possible with any software-based UE implementation. We demonstrate that 5Gone is highly scalable, even when many UEs are connecting in parallel, and finally evaluate the attacks end-to-end against 7 phone models and three different chipset vendors both in our lab and in the real-world on public gNodeBs.
Authors:Himanshu Singh, Ziwei Xu, A. V. Subramanyam, Mohan Kankanhalli
Abstract:
Large Language Models (LLMs) are powerful text generators, yet they can produce toxic or harmful content even when given seemingly harmless prompts. This presents a serious safety challenge and can cause real-world harm. Toxicity is often subtle and context-dependent, making it difficult to detect at the token level or through coarse sentence-level signals. Moreover, efforts to mitigate toxicity often face a trade-off between safety and the coherence, or fluency of the generated text. In this work, we present a targeted subspace intervention strategy for identifying and suppressing hidden toxic patterns from underlying model representations, while preserving overall ability to generate safe fluent content. On the RealToxicityPrompts, our method achieves strong mitigation performance compared to existing baselines, with minimal impact on inference complexity. Across multiple LLMs, our approach reduces toxicity of state-of-the-art detoxification systems by 8-20%, while maintaining comparable fluency. Through extensive quantitative and qualitative analyses, we show that our approach achieves effective toxicity reduction without impairing generative performance, consistently outperforming existing baselines.
Authors:Guowei Guan, Yurong Hao, Jiaming Zhang, Tiantong Wu, Fuyao Zhang, Tianxiang Chen, Longtao Huang, Cyril Leung, Wei Yang Bryan Lim
Abstract:
Multimodal large language models (MLLMs) are pushing recommender systems (RecSys) toward content-grounded retrieval and ranking via cross-modal fusion. We find that while cross-modal consensus often mitigates conventional poisoning that manipulates interaction logs or perturbs a single modality, it also introduces a new attack surface where synchronised multimodal poisoning can reliably steer fused representations along stable semantic directions during fine-tuning. To characterise this threat, we formalise cross-modal interactive poisoning and propose VENOMREC, which performs Exposure Alignment to identify high-exposure regions in the joint embedding space and Cross-modal Interactive Perturbation to craft attention-guided coupled token-patch edits. Experiments on three real-world multimodal datasets demonstrate that VENOMREC consistently outperforms strong baselines, achieving 0.73 mean ER@20 and improving over the strongest baseline by +0.52 absolute ER points on average, while maintaining comparable recommendation utility.
Authors:Kim Hammar, Tansu Alpcan, Emil Lupu
Abstract:
Large language models (LLMs) are promising tools for supporting security management tasks, such as incident response planning. However, their unreliability and tendency to hallucinate remain significant challenges. In this paper, we address these challenges by introducing a principled framework for using an LLM as decision support in security management. Our framework integrates the LLM in an iterative loop where it generates candidate actions that are checked for consistency with system constraints and lookahead predictions. When consistency is low, we abstain from the generated actions and instead collect external feedback, e.g., by evaluating actions in a digital twin. This feedback is then used to refine the candidate actions through in-context learning (ICL). We prove that this design allows to control the hallucination risk by tuning the consistency threshold. Moreover, we establish a bound on the regret of ICL under certain assumptions. To evaluate our framework, we apply it to an incident response use case where the goal is to generate a response and recovery plan based on system logs. Experiments on four public datasets show that our framework reduces recovery times by up to 30% compared to frontier LLMs.
Authors:Simone Manoni, Emanuele Parisi, Riccardo Tedeschi, Davide Rossi, Andrea Acquaviva, Andrea Bartolini
Abstract:
This work presents the first design, integration, and evaluation of the standard RISC-V extensions for Control-Flow Integrity (CFI). The Zicfiss and Zicfilp extensions aim at protecting the execution of a vulnerable program from control-flow hijacking attacks through the implementation of security mechanisms based on shadow stack and landing pad primitives. We introduce two independent and configurable hardware units implementing forward-edge and backward-edge control-flow protection, fully integrated into the open-source CVA6 core. Our design incurs in only 1.0% area overhead when synthesized in 22 nm FDX technology, and up to 15.6% performance overhead based on evaluation with the MiBench automotive benchmark subset. We release the complete implementation as open source.
Authors:Alexander Loth, Dominique Conceicao Rosario, Peter Ebinger, Martin Kappes, Marc-Oliver Pahl
Abstract:
The proliferation of generative AI poses challenges for information integrity assurance, requiring systems that connect model governance with end-user verification. We present Origin Lens, a privacy-first mobile framework that targets visual disinformation through a layered verification architecture. Unlike server-side detection systems, Origin Lens performs cryptographic image provenance verification and AI detection locally on the device via a Rust/Flutter hybrid architecture. Our system integrates multiple signals - including cryptographic provenance, generative model fingerprints, and optional retrieval-augmented verification - to provide users with graded confidence indicators at the point of consumption. We discuss the framework's alignment with regulatory requirements (EU AI Act, DSA) and its role in verification infrastructure that complements platform-level mechanisms.
Authors:Rodrigo Tertulino, Ricardo Almeida, Laercio Alencar
Abstract:
The digitization of healthcare has generated massive volumes of Electronic Health Records (EHRs), offering unprecedented opportunities for training Artificial Intelligence (AI) models. However, stringent privacy regulations such as GDPR and HIPAA have created data silos that prevent centralized training. Federated Learning (FL) has emerged as a promising solution that enables collaborative model training without sharing raw patient data. Despite its potential, FL remains vulnerable to poisoning and Sybil attacks, in which malicious participants corrupt the global model or infiltrate the network using fake identities. While recent approaches integrate Blockchain technology for auditability, they predominantly rely on probabilistic reputation systems rather than robust cryptographic identity verification. This paper proposes a Trustworthy Blockchain-based Federated Learning (TBFL) framework integrating Self-Sovereign Identity (SSI) standards. By leveraging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), our architecture ensures only authenticated healthcare entities contribute to the global model. Through comprehensive evaluation using the MIMIC-IV dataset, we demonstrate that anchoring trust in cryptographic identity verification rather than behavioral patterns significantly mitigates security risks while maintaining clinical utility. Our results show the framework successfully neutralizes 100% of Sybil attacks, achieves robust predictive performance (AUC = 0.954, Recall = 0.890), and introduces negligible computational overhead (<0.12%). The approach provides a secure, scalable, and economically viable ecosystem for inter-institutional health data collaboration, with total operational costs of approximately $18 for 100 training rounds across multiple institutions.
Authors:Poushali Sengupta, Mayank Raikwar, Sabita Maharjan, Frank Eliassen, Yan Zhang
Abstract:
Powerful quantum computers in the future may be able to break the security used for communication between vehicles and other devices (Vehicle-to-Everything, or V2X). New security methods called post-quantum cryptography can help protect these systems, but they often require more computing power and can slow down communication, posing a challenge for fast 6G vehicle networks. In this paper, we propose an adaptive post-quantum cryptography (PQC) framework that predicts short-term mobility and channel variations and dynamically selects suitable lattice-, code-, or hash-based PQC configurations using a predictive multi-objective evolutionary algorithm (APMOEA) to meet vehicular latency and security constraints.However, frequent cryptographic reconfiguration in dynamic vehicular environments introduces new attack surfaces during algorithm transitions. A secure monotonic-upgrade protocol prevents downgrade, replay, and desynchronization attacks during transitions. Theoretical results show decision stability under bounded prediction error, latency boundedness under mobility drift, and correctness under small forecast noise. These results demonstrate a practical path toward quantum-safe cryptography in future 6G vehicular networks. Through extensive experiments based on realistic mobility (LuST), weather (ERA5), and NR-V2X channel traces, we show that the proposed framework reduces end-to-end latency by up to 27\%, lowers communication overhead by up to 65\%, and effectively stabilizes cryptographic switching behavior using reinforcement learning. Moreover, under the evaluated adversarial scenarios, the monotonic-upgrade protocol successfully prevents downgrade, replay, and desynchronization attacks.
Authors:Alberto Maria Mongardini, Alessandro Mei
Abstract:
The meme coin ecosystem has grown into one of the most active yet least observable segments of the cryptocurrency market, characterized by extreme churn, minimal project commitment, and widespread fraudulent behavior. While countless meme coins are deployed across multiple blockchains, they rely heavily on off-chain web and social infrastructure to signal legitimacy. These very signals are largely absent from existing datasets, which are often limited to single-chain data or lack the multimodal artifacts required for comprehensive risk modeling. To address this gap, we introduce MemeChain, a large-scale, open-source, cross-chain dataset comprising 34,988 meme coins across Ethereum, BNB Smart Chain, Solana, and Base. MemeChain integrates on-chain data with off-chain artifacts, including website HTML source code, token logos, and linked social media accounts, enabling multimodal and forensic study of meme coin projects. Analysis of the dataset shows that visual branding is frequently omitted in low-effort deployments, and many projects lack a functional website. Moreover, we quantify the ecosystem's extreme volatility, identifying 1,801 tokens (5.15%) that cease all trading activity within just 24 hours of launch. By providing unified cross-chain coverage and rich off-chain context, MemeChain serves as a foundational resource for research in financial forensics, multimodal anomaly detection, and automated scam prevention in the meme coin ecosystem.
Authors:Amir Reza Ramtin, Philippe Nain, Don Towsley
Abstract:
We study the problem of covert quickest change detection in a discrete-time setting, where a sequence of observations undergoes a distributional change at an unknown time. Unlike classical formulations, we consider a covert adversary who has knowledge of the detector's false alarm constraint parameter $γ$ and selects a stationary post-change distribution that depends on it, seeking to remain undetected for as long as possible. Building on the theoretical foundations of the CuSum procedure, we rigorously characterize the asymptotic behavior of the average detection delay (ADD) and the average time to false alarm (AT2FA) when the post-change distribution converges to the pre-change distribution as $γ\to \infty$. Our analysis establishes exact asymptotic expressions for these quantities, extending and refining classical results that no longer hold in this regime. We identify the critical scaling laws governing covert behavior and derive explicit conditions under which an adversary can maintain covertness, defined by ADD = $Θ(γ)$, whereas in the classical setting, ADD grows only as $\mathcal{O}(\log γ)$. In particular, for Gaussian and Exponential models under adversarial perturbations of their respective parameters, we asymptotically characterize ADD as a function of the Kullback--Leibler divergence between the pre- and post-change distributions and $γ$.
Authors:Kristen Moore, Diksha Goel, Cody James Christopher, Zhen Wang, Minjune Kim, Ahmed Ibrahim, Ahmad Mohsin, Seyit Camtepe
Abstract:
Realistic network traffic simulation is critical for evaluating intrusion detection systems, stress-testing network protocols, and constructing high-fidelity environments for cybersecurity training. While attack traffic can often be layered into training environments using red-teaming or replay methods, generating authentic benign background traffic remains a core challenge -- particularly in simulating the complex temporal and communication dynamics of real-world networks. This paper introduces TempoNet, a novel generative model that combines multi-task learning with multi-mark temporal point processes to jointly model inter-arrival times and all packet- and flow-header fields. TempoNet captures fine-grained timing patterns and higher-order correlations such as host-pair behavior and seasonal trends, addressing key limitations of GAN-, LLM-, and Bayesian-based methods that fail to reproduce structured temporal variation. TempoNet produces temporally consistent, high-fidelity traces, validated on real-world datasets. Furthermore, we show that intrusion detection models trained on TempoNet-generated background traffic perform comparably to those trained on real data, validating its utility for real-world security applications.
Authors:Saswat Das, Ferdinando Fioretto
Abstract:
This work addresses the computational challenge of enforcing privacy for agentic Large Language Models (LLMs), where privacy is governed by the contextual integrity framework. Indeed, existing defenses rely on LLM-mediated checking stages that add substantial latency and cost, and that can be undermined in multi-turn interactions through manipulation or benign-looking conversational scaffolding. Contrasting this background, this paper makes a key observation: internal representations associated with privacy-violating intent can be separated from benign requests using linear structure. Using this insight, the paper proposes NeuroFilter, a guardrail framework that operationalizes contextual integrity by mapping norm violations to simple directions in the model's activation space, enabling detection even when semantic filters are bypassed. The proposed filter is also extended to capture threats arising during long conversations using the concept of activation velocity, which measures cumulative drift in internal representations across turns. A comprehensive evaluation across over 150,000 interactions and covering models from 7B to 70B parameters, illustrates the strong performance of NeuroFilter in detecting privacy attacks while maintaining zero false positives on benign prompts, all while reducing the computational inference cost by several orders of magnitude when compared to LLM-based agentic privacy defenses.
Authors:Jiasen Li, Yanwei Liu, Zhuoyi Shang, Xiaoyan Gu, Weiping Wang
Abstract:
Graph-structured data is foundational to numerous web applications, and watermarking is crucial for protecting their intellectual property and ensuring data provenance. Existing watermarking methods primarily operate on graph structures or entangled graph representations, which compromise the transparency and robustness of watermarks due to the information coupling in representing graphs and uncontrollable discretization in transforming continuous numerical representations into graph structures. This motivates us to propose DRGW, the first graph watermarking framework that addresses these issues through disentangled representation learning. Specifically, we design an adversarially trained encoder that learns an invariant structural representation against diverse perturbations and derives a statistically independent watermark carrier, ensuring both robustness and transparency of watermarks. Meanwhile, we devise a graph-aware invertible neural network to provide a lossless channel for watermark embedding and extraction, guaranteeing high detectability and transparency of watermarks. Additionally, we develop a structure-aware editor that resolves the issue of latent modifications into discrete graph edits, ensuring robustness against structural perturbations. Experiments on diverse benchmark datasets demonstrate the superior effectiveness of DRGW.
Authors:Sebastian Bitzer, Maximilian Egger, Mumin Liu, Antonia Wachter-Zeh
Abstract:
Secure aggregation enables aggregation of inputs from multiple parties without revealing individual contributions to the server or other clients. Existing post-quantum approaches based on homomorphic encryption offer practical efficiency but predominantly rely on lattice-based hardness assumptions. We present a code-based alternative for secure aggregation by instantiating a general framework based on key- and message-additive homomorphic encryption under the Learning Parity with Noise (LPN) assumption. Our construction employs a committee-based decryptor realized via secret sharing and incorporates a Chinese Remainder Theorem (CRT)-based optimization to reduce the communication costs of LPN-based instantiations. We analyze the security of the proposed scheme under a new Hint-LPN assumption and show that it is equivalent to standard LPN for suitable parameters. Finally, we evaluate performance and identify regimes in which our approach outperforms information-theoretically secure aggregation protocols.
Authors:Kaichuan Kong, Dongjie Liu, Xiaobo Jin, Shijie Xu, Guanggang Geng
Abstract:
Insider threat detection is a key challenge in enterprise security, relying on user activity logs that capture rich and complex behavioral patterns. These logs are often multi-channel, non-stationary, and anomalies are rare, making anomaly detection challenging. To address these issues, we propose a novel framework that integrates wavelet-aware modulation, multi-resolution wavelet decomposition, and resolution-adaptive attention for robust anomaly detection. Our approach first applies a deviation-aware modulation scheme to suppress routine behaviors while amplifying anomalous deviations. Next, discrete wavelet transform (DWT) decomposes the log signals into multi-resolution representations, capturing both long-term trends and short-term anomalies. Finally, a learnable attention mechanism dynamically reweights the most discriminative frequency bands for detection. On the CERT r4.2 benchmark, our approach consistently outperforms existing baselines in precision, recall, and F1 score across various time granularities and scenarios.
Authors:Zhuoyi Shang, Jiasen Li, Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Weiping Wang
Abstract:
The fine-tuning technique in deep learning gives rise to an emerging lineage relationship among models. This lineage provides a promising perspective for addressing security concerns such as unauthorized model redistribution and false claim of model provenance, which are particularly pressing in \textcolor{blue}{open-weight model} libraries where robust lineage verification mechanisms are often lacking. Existing approaches to model lineage detection primarily rely on static architectural similarities, which are insufficient to capture the dynamic evolution of knowledge that underlies true lineage relationships. Drawing inspiration from the genetic mechanism of human evolution, we tackle the problem of model lineage attestation by verifying the joint trajectory of knowledge evolution and parameter modification. To this end, we propose a novel model lineage attestation framework. In our framework, model editing is first leveraged to quantify parameter-level changes introduced by fine-tuning. Subsequently, we introduce a novel knowledge vectorization mechanism that refines the evolved knowledge within the edited models into compact representations by the assistance of probe samples. The probing strategies are adapted to different types of model families. These embeddings serve as the foundation for verifying the arithmetic consistency of knowledge relationships across models, thereby enabling robust attestation of model lineage. Extensive experimental evaluations demonstrate the effectiveness and resilience of our approach in a variety of adversarial scenarios in the real world. Our method consistently achieves reliable lineage verification across a broad spectrum of model types, including classifiers, diffusion models, and large language models.
Authors:Wadid Foudhaili, Aykut Rencber, Anouar Nechi, Rainer Buchty, Mladen Berekovic, Andres Gomez, Saleh Mulhem
Abstract:
In the modern Systems-on-Chip (SoC), the Advanced eXtensible Interface (AXI) protocol exhibits security vulnerabilities, enabling partial or complete denial-of-service (DoS) through protocol-violation attacks. The recent countermeasures lack a dedicated real-time protocol semantic analysis and evade protocol compliance checks. This paper tackles this AXI vulnerability issue and presents an intelligent hardware monitoring system (IMS) for real-time detection of AXI protocol violations. IMS is a hardware module leveraging neural networks to achieve high detection accuracy. For model training, we perform DoS attacks through header-field manipulation and systematic malicious operations, while recording AXI transactions to build a training dataset. We then deploy a quantization-optimized neural network, achieving 98.7% detection accuracy with <=3% latency overhead, and throughput of >2.5 million inferences/s. We subsequently integrate this IMS into a RISC-V SoC as a memory-mapped IP core to monitor its AXI bus. For demonstration and initial assessment for later ASIC integration, we implemented this IMS on an AMD Zynq UltraScale+ MPSoC ZCU104 board, showing an overall small hardware footprint (9.04% look-up-tables (LUTs), 0.23% DSP slices, and 0.70% flip-flops) and negligible impact on the overall design's achievable frequency. This demonstrates the feasibility of lightweight, security monitoring for resource-constrained edge environments.
Authors:Christopher Blake, Chen Feng, Xuechao Wang, Qianyu Yu
Abstract:
A proof of the security of the Bitcoin protocol is made rigorous, and simplified in certain parts. A computational model in which an adversary can delay transmission of blocks by time $Δ$ is considered. The protocol is generalized to allow blocks of different scores and a proof within this more general model is presented. An approach used in a previous paper that used random walk theory is shown through a counterexample to be incorrect; an approach involving a punctured block arrival process is shown to remedy this error. Thus, it is proven that with probability one, the Bitcoin protocol will have infinitely many honest blocks so long as the fully-delayed honest mining rate exceeds the adversary mining rate.
Authors:Yixiao Peng, Hao Hu, Feiyang Li, Xinye Cao, Yingchang Jiang, Jipeng Tang, Guoshun Nan, Yuling Liu
Abstract:
While virtualization and resource pooling empower cloud networks with structural flexibility and elastic scalability, they inevitably expand the attack surface and challenge cyber resilience. Reinforcement Learning (RL)-based defense strategies have been developed to optimize resource deployment and isolation policies under adversarial conditions, aiming to enhance system resilience by maintaining and restoring network availability. However, existing approaches lack robustness as they require retraining to adapt to dynamic changes in network structure, node scale, attack strategies, and attack intensity. Furthermore, the lack of Human-in-the-Loop (HITL) support limits interpretability and flexibility. To address these limitations, we propose CyberOps-Bots, a hierarchical multi-agent reinforcement learning framework empowered by Large Language Models (LLMs). Inspired by MITRE ATT&CK's Tactics-Techniques model, CyberOps-Bots features a two-layer architecture: (1) An upper-level LLM agent with four modules--ReAct planning, IPDRR-based perception, long-short term memory, and action/tool integration--performs global awareness, human intent recognition, and tactical planning; (2) Lower-level RL agents, developed via heterogeneous separated pre-training, execute atomic defense actions within localized network regions. This synergy preserves LLM adaptability and interpretability while ensuring reliable RL execution. Experiments on real cloud datasets show that, compared to state-of-the-art algorithms, CyberOps-Bots maintains network availability 68.5% higher and achieves a 34.7% jumpstart performance gain when shifting the scenarios without retraining. To our knowledge, this is the first study to establish a robust LLM-RL framework with HITL support for cloud defense. We will release our framework to the community, facilitating the advancement of robust and autonomous defense in cloud networks.
Authors:Zhaoqi Wang, Zijian Zhang, Daqing He, Pengtao Kou, Xin Li, Jiamou Liu, Jincheng An, Yong Liu
Abstract:
Large language models (LLMs) have demonstrated remarkable capabilities across diverse applications, however, they remain critically vulnerable to jailbreak attacks that elicit harmful responses violating human values and safety guidelines. Despite extensive research on defense mechanisms, existing safeguards prove insufficient against sophisticated adversarial strategies. In this work, we propose iMIST (\underline{i}nteractive \underline{M}ulti-step \underline{P}rogre\underline{s}sive \underline{T}ool-disguised Jailbreak Attack), a novel adaptive jailbreak method that synergistically exploits vulnerabilities in current defense mechanisms. iMIST disguises malicious queries as normal tool invocations to bypass content filters, while simultaneously introducing an interactive progressive optimization algorithm that dynamically escalates response harmfulness through multi-turn dialogues guided by real-time harmfulness assessment. Our experiments on widely-used models demonstrate that iMIST achieves higher attack effectiveness, while maintaining low rejection rates. These results reveal critical vulnerabilities in current LLM safety mechanisms and underscore the urgent need for more robust defense strategies.
Authors:Md Ajoad Hasan, Dipayan Saha, Khan Thamid Hasan, Nashmin Alam, Azim Uddin, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi
Abstract:
The growing complexity of modern system-on-chip (SoC) and IP designs is making security assurance difficult day by day. One of the fundamental steps in the pre-silicon security verification of a hardware design is the identification of security assets, as it substantially influences downstream security verification tasks, such as threat modeling, security property generation, and vulnerability detection. Traditionally, assets are determined manually by security experts, requiring significant time and expertise. To address this challenge, we present LAsset, a novel automated framework that leverages large language models (LLMs) to identify security assets from both hardware design specifications and register-transfer level (RTL) descriptions. The framework performs structural and semantic analysis to identify intra-module primary and secondary assets and derives inter-module relationships to systematically characterize security dependencies at the design level. Experimental results show that the proposed framework achieves high classification accuracy, reaching up to 90% recall rate in SoC design, and 93% recall rate in IP designs. This automation in asset identification significantly reduces manual overhead and supports a scalable path forward for secure hardware development.
Authors:Saif E. Nouma, Gokhan Mumcu, Attila A. Yavuz
Abstract:
Resource-constrained Internet of Things (IoT) devices, from medical implants to small drones, must transmit sensitive telemetry under adversarial wireless channels while operating under stringent computing and energy budgets. Authenticated Encryption (AE) is essential for ensuring confidentiality, integrity, and authenticity. However, existing lightweight AE standards lack forward-security guarantees, compact tag aggregation, and offline-online (OO) optimizations required for modern high-throughput IoT pipelines. We introduce Diamond, the first provable secure Forward-secure and Aggregate Authenticated Encryption (FAAE) framework that extends and generalizes prior FAAE constructions through a lightweight key evolution mechanism, an OO-optimized computation pipeline, and a set of performance-tiered instantiations tailored to heterogeneous IoT platforms. Diamond substantially reduces amortized offline preprocessing (up to 47%) and achieves up to an order-ofmagnitude reduction in end-to-end latency for large telemetry batches. Our comprehensive evaluation across 64-bit ARM Cortex-A72, 32-bit ARM Cortex-M4, and 8-bit AVR architectures confirms that Diamond consistently outperforms baseline FAAE variants and NIST lightweight AE candidates across authenticated encryption throughput and end-to-end verification latency while maintaining compact tag aggregation and strong breach resilience. We formally prove the security of Diamond and provide two concrete instantiations optimized for compliance and high efficiency. Our open-source release enables reproducibility and seamless integration into IoT platforms.
Authors:Bilgehan Sel, Xuanli He, Alwin Peng, Ming Jin, Jerry Wei
Abstract:
Fine-tuning APIs offered by major AI providers create new attack surfaces where adversaries can bypass safety measures through targeted fine-tuning. We introduce Trojan-Speak, an adversarial fine-tuning method that bypasses Anthropic's Constitutional Classifiers. Our approach uses curriculum learning combined with GRPO-based hybrid reinforcement learning to teach models a communication protocol that evades LLM-based content classification. Crucially, while prior adversarial fine-tuning approaches report more than 25% capability degradation on reasoning benchmarks, Trojan-Speak incurs less than 5% degradation while achieving 99+% classifier evasion for models with 14B+ parameters. We demonstrate that fine-tuned models can provide detailed responses to expert-level CBRN (Chemical, Biological, Radiological, and Nuclear) queries from Anthropic's Constitutional Classifiers bug-bounty program. Our findings reveal that LLM-based content classifiers alone are insufficient for preventing dangerous information disclosure when adversaries have fine-tuning access, and we show that activation-level probes can substantially improve robustness to such attacks.
Authors:César Vieira, João Vitorino, Eva Maia, Isabel Praça
Abstract:
Malware continues to be a predominant operational risk for organizations, especially when obfuscation techniques are used to evade detection. Despite the ongoing efforts in the development of Machine Learning (ML) detection approaches, there is still a lack of feature compatibility in public datasets. This limits generalization when facing distribution shifts, as well as transferability to different datasets. This study evaluates the suitability of different data preprocessing approaches for the detection of Portable Executable (PE) files with ML models. The preprocessing pipeline unifies EMBERv2 (2,381-dim) features datasets, trains paired models under two training setups: EMBER + BODMAS and EMBER + BODMAS + ERMDS. Regarding model evaluation, both EMBER + BODMAS and EMBER + BODMAS + ERMDS models are tested against TRITIUM, INFERNO and SOREL-20M. ERMDS is also used for testing for the EMBER + BODMAS setup.
Authors:Kaan Durmaz, Jan Schuchardt, Sebastian Schmidt, Stephan Günnemann
Abstract:
Random cropping is one of the most common data augmentation techniques in computer vision, yet the role of its inherent randomness in training differentially private machine learning models has thus far gone unexplored. We observe that when sensitive content in an image is spatially localized, such as a face or license plate, random cropping can probabilistically exclude that content from the model's input. This introduces a third source of stochasticity in differentially private training with stochastic gradient descent, in addition to gradient noise and minibatch sampling. This additional randomness amplifies differential privacy without requiring changes to model architecture or training procedure. We formalize this effect by introducing a patch-level neighboring relation for vision data and deriving tight privacy bounds for differentially private stochastic gradient descent (DP-SGD) when combined with random cropping. Our analysis quantifies the patch inclusion probability and shows how it composes with minibatch sampling to yield a lower effective sampling rate. Empirically, we validate that patch-level amplification improves the privacy-utility trade-off across multiple segmentation architectures and datasets. Our results demonstrate that aligning privacy accounting with domain structure and additional existing sources of randomness can yield stronger guarantees at no additional cost.
Authors:Yoshimichi Nakatsuka, Nicolas Dutly, Kari Kostiainen, Srdjan Capkun
Abstract:
Private Membership Testing (PMT) protocols enable clients to verify whether a certain data item is included in a database without revealing the item to the database operator or other external parties. This paper examines Source-assisted PMT (SPMT), in which clients leverage compact data source-provided information issued when the data item is first submitted to the database. SPMT is relevant in applications such as certificate transparency and supply-chain auditing; yet, designing an approach that is efficient, scalable, and privacy-preserving remains a challenge. This work presents Gyokuro, which takes a different approach to conventional membership testing schemes. Instead of requesting the server to produce a proof attesting that a certain data item exists in the database, we leverage Trusted Execution Environments (TEEs) to produce proofs demonstrating that the server has made enough progress to add the data item to the database. With the help of existing monitoring services, clients can infer that no items have been removed from the database. This allows Gyokuro to provide strong privacy guaranties and achieve high efficiency, as a client's membership testing query does not include any information regarding their interests, and eliminates the need for complex and inefficient protection mechanisms. Additionally, this approach enables membership testing on large-scale databases, since the communication and computation required are independent of the database size. Our evaluations show practical feasibility, achieving 7 ms membership testing latency and throughput of around 1400 requests/sec/core.
Authors:Najeeb Jebreel, David Sánchez, Josep Domingo-Ferrer
Abstract:
Membership inference attacks (MIAs) aim to determine whether a data sample was included in a machine learning (ML) model's training set and have become the de facto standard for measuring privacy leakages in ML. We propose an evaluation framework that defines the conditions under which MIAs constitute a genuine privacy threat, and review representative MIAs against it. We find that, under the realistic conditions defined in our framework, MIAs represent weak privacy threats. Thus, relying on them as a privacy metric in ML can lead to an overestimation of risk and to unnecessary sacrifices in model utility as a consequence of employing too strong defenses.
Authors:Bushra Sabir, Shigang Liu, Seung Ick Jang, Sharif Abuadbba, Yansong Gao, Kristen Moore, SangCheol Kim, Hyoungshick Kim, Surya Nepal
Abstract:
Automatically generating source code from natural language using large language models (LLMs) is becoming common, yet security vulnerabilities persist despite advances in fine tuning and prompting. In this work, we systematically evaluate whether multi LLM ensembles and collaborative strategies can meaningfully improve secure code generation. We present MULTI-LLMSECCODEEVAL, a framework for assessing and enhancing security across the vulnerability management lifecycle by combining multiple LLMs with static analysis and structured collaboration. Using SecLLMEval and SecLLMHolmes, we benchmark ten pipelines spanning single model, ensemble, collaborative, and hybrid designs. Our results show that ensemble pipelines augmented with static analysis improve secure code generation over single LLM baselines by up to 47.3% on SecLLMEval and 19.3% on SecLLMHolmes, while purely LLM based collaborative pipelines yield smaller gains of 8.9% to 22.3%. Hybrid pipelines that integrate ensembling, detection, and patching achieve the strongest security performance, outperforming the best ensemble baseline by 1.78% to 4.72% and collaborative baselines by 19.81% to 26.78%. Ablation studies reveal that model scale alone does not ensure security. Smaller, structured multi model ensembles consistently outperform large monolithic LLMs. Overall, our findings demonstrate that secure code does not emerge from scale, but from carefully orchestrated multi model system design.
Authors:Lina Alkarmi, Armin Sarabi, Mingyan Liu
Abstract:
While the size of a data breach is typically measured by the number of (consumer, customer, or user) records exposed or compromised, its economic impact is generally measured from the point of view of the corporation suffering the data breach: cost in crisis management, legal fees, drop in stock price, and so on. This study examines whether it is possible to estimate the true cost, or the social cost of a data breach, measured by the impact on its victims and their out of pocket costs. To accomplish this we establish: (1) the estimation of the average direct financial losses of an identity theft (IDT) victim, including the opportunity cost of lost time, and healthcare expenditures associated with distress associated with identity theft; and (2) the estimation of increases in incidents of IDT that can be attributed to a major breach event. Our findings show that the average social cost per victim has declined significantly since 2016. Furthermore, we find that there is indeed a statistically significant increase in the number of IDTs following a mega-breach event when accounting for a discovery lag of 1-2 months post-breach. Applying our model to real-world cases allows us to estimate an upper and lower bound social cost of specific mega-breach events. We find that for the 2009 Heartland and 2013 Target breaches, even the conservative lower bound social cost estimate exceeded settlements by factors of 5 and 18, respectively. In contrast, the 2017 Equifax breach resulted in a lower bound estimate of $263.8 million, falling well within its $700 million settlement cap. While the Equifax upper bound estimate of $1.72 billion in social cost more than doubles this settlement, the narrowing gap between institutional liability and an incident's social cost provides empirical evidence of a market saturation effect that reduces the marginal damage of individual compromised records over time.
Authors:Di Lu, Yongzhi Liao, Xutong Mu, Lele Zheng, Ke Cheng, Xuewen Dong, Yulong Shen, Jianfeng Ma
Abstract:
Host-acting agents promise a convenient interaction model in which users specify goals and the system determines how to realize them. We argue that this convenience introduces a distinct security problem: semantic under-specification in goal specification. User instructions are typically goal-oriented, yet they often leave process constraints, safety boundaries, persistence, and exposure insufficiently specified. As a result, the agent must complete missing execution semantics before acting, and this completion can produce risky host-side plans even when the user-stated goal is benign. In this paper, we develop a semantic threat model, present a taxonomy of semantic-induced risky completion patterns, and study the phenomenon through an OpenClaw-centered case study and execution-trace analysis. We further derive defense design principles for making execution boundaries explicit and constraining risky completion. These findings suggest that securing host-acting agents requires governing not only which actions are allowed at execution time, but also how goal-only instructions are translated into executable plans.
Authors:Wenjing Hong, Zhonghua Rong, Li Wang, Feng Chang, Jian Zhu, Ke Tang, Zexuan Zhu, Yew-Soon Ong
Abstract:
Large Language Models (LLMs) have been widely deployed, especially through free Web-based applications that expose them to diverse user-generated inputs, including those from long-tail distributions such as low-resource languages and encrypted private data. This open-ended exposure increases the risk of jailbreak attacks that undermine model safety alignment. While recent studies have shown that leveraging long-tail distributions can facilitate such jailbreaks, existing approaches largely rely on handcrafted rules, limiting the systematic evaluation of these security and privacy vulnerabilities. In this work, we present EvoJail, an automated framework for discovering long-tail distribution attacks via multi-objective evolutionary search. EvoJail formulates long-tail attack prompt generation as a multi-objective optimization problem that jointly maximizes attack effectiveness and minimizes output perplexity, and introduces a semantic-algorithmic solution representation to capture both high-level semantic intent and low-level structural transformations of encryption-decryption logic. Building upon this representation, EvoJail integrates LLM-assisted operators into a multi-objective evolutionary framework, enabling adaptive and semantically informed mutation and crossover for efficiently exploring a highly structured and open-ended search space. Extensive experiments demonstrate that EvoJail consistently discovers diverse and effective long-tail jailbreak strategies, achieving competitive performance with existing methods in both individual and ensemble level.
Authors:Fazhong Liu, Zhuoyan Chen, Tu Lan, Haozhen Tan, Zhenyu Xu, Xiang Li, Guoxing Chen, Yan Meng, Haojin Zhu
Abstract:
Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to active system interaction and environment management. OpenClaw, a representative platform in this emerging paradigm, introduces an extensible skill ecosystem that allows third-party developers to inject behavioral guidance through lifecycle hooks during agent initialization. While this design enhances automation and customization, it also opens a novel and unexplored attack surface. In this paper, we identify and systematically characterize guidance injection, a stealthy attack vector that embeds adversarial operational narratives into bootstrap guidance files. Unlike traditional prompt injection, which relies on explicit malicious instructions, guidance injection manipulates the agent's reasoning context by framing harmful actions as routine best practices. These narratives are automatically incorporated into the agent's interpretive framework and influence future task execution without raising suspicion.We construct 26 malicious skills spanning 13 attack categories including credential exfiltration, workspace destruction, privilege escalation, and persistent backdoor installation. We evaluate them using ORE-Bench, a realistic developer workspace benchmark we developed. Across 52 natural user prompts and six state-of-the-art LLM backends, our attacks achieve success rates from 16.0% to 64.2%, with the majority of malicious actions executed autonomously without user confirmation. Furthermore, 94% of our malicious skills evade detection by existing static and LLM-based scanners. Our findings reveal fundamental tensions in the design of autonomous agent ecosystems and underscore the urgent need for defenses based on capability isolation, runtime policy enforcement, and transparent guidance provenance.
Authors:Nicholas D'Silva, Surya Nepal, Salil S. Kanhere
Abstract:
Graph data is increasingly prevalent across domains, offering analytical value but raising significant privacy concerns. Edges may encode sensitive relationships, while node attributes may contain sensitive entity or personal data. Differential Privacy (DP) has gained traction for its strong guarantees, yet applying DP to graphs is challenging because of their complex relational structure, leading to trade-offs between privacy and utility. Existing methods vary in privacy definitions, utility goals, and contextual settings, complicating comparison. For practitioners, this is compounded by DP's interpretability issues, contributing to misleading protection claims. To address this, we propose a novel systemisation of existing methods tailored to practical considerations and adaptable to varying practitioner objectives. Our contributions include: (i) a comprehensive survey of differentially private graph release methods; (ii) identification of key vulnerabilities; and (iii) a practitioner-oriented, objective-based framework to guide the selection, interpretation, and sound evaluation of existing methods. We demonstrate the use of our systemisation through two exemplary scenarios in which we assume the role of a social network analyst, apply it, and conduct evaluations in accordance with our framework. Together, these two illustrative instantiations ultimately provide a unified benchmark for state-of-the-art methods in the social networks domain.
Authors:Hammad Atta, Ken Huang, Kyriakos Rock Lambros, Yasir Mehmood, Zeeshan Baig, Mohamed Abdur Rahman, Manish Bhatt, M. Aziz Ul Haq, Muhammad Aatif, Nadeem Shahzad, Kamal Noor, Vineeth Sai Narajala, Hazem Ali, Jamel Abed
Abstract:
Agentic LLM systems equipped with persistent memory, RAG pipelines, and external tool connectors face a class of attacks - Logic-layer Prompt Control Injection (LPCI) - for which no automated red-teaming instrument existed. We present LAAF (Logic-layer Automated Attack Framework), the first automated red-teaming framework to combine an LPCI-specific technique taxonomy with stage-sequential seed escalation - two capabilities absent from existing tools: Garak lacks memory-persistence and cross-session triggering; PyRIT supports multi-turn testing but treats turns independently, without seeding each stage from the prior breakthrough. LAAF provides: (i) a 49-technique taxonomy spanning six attack categories (Encoding~11, Structural~8, Semantic~8, Layered~5, Trigger~12, Exfiltration~5; see Table 1), combinable across 5 variants per technique and 6 lifecycle stages, yielding a theoretical maximum of 2,822,400 unique payloads ($49 \times 5 \times 1{,}920 \times 6$; SHA-256 deduplicated at generation time); and (ii) a Persistent Stage Breaker (PSB) that drives payload mutation stage-by-stage: on each breakthrough, the PSB seeds the next stage with a mutated form of the winning payload, mirroring real adversarial escalation. Evaluation on five production LLM platforms across three independent runs demonstrates that LAAF achieves higher stage-breakthrough efficiency than single-technique random testing, with a mean aggregate breakthrough rate of 84\% (range 83--86\%) and platform-level rates stable within 17 percentage points across runs. Layered combinations and semantic reframing are the highest-effectiveness technique categories, with layered payloads outperforming encoding on well-defended platforms.
Authors:Christian Gehrmann, Jonas Ricker, Simon Damm, Deruo Cheng, Julian Speith, Yiqiong Shi, Asja Fischer, Christof Paar
Abstract:
In light of globalized hardware supply chains, the assurance of hardware components has gained significant interest, particularly in cryptographic applications and high-stakes scenarios. Identifying metal lines on scanning electron microscope (SEM) images of integrated circuits (ICs) is one essential step in verifying the absence of malicious circuitry in chips manufactured in untrusted environments. Due to varying manufacturing processes and technologies, such verification usually requires tuning parameters and algorithms for each target IC. Often, a machine learning model trained on images of one IC fails to accurately detect metal lines on other ICs. To address this challenge, we create SAMSEM by adapting Meta's Segment Anything Model 2 (SAM2) to the domain of IC metal line segmentation. Specifically, we develop a multi-scale segmentation approach that can handle SEM images of varying sizes, resolutions, and magnifications. Furthermore, we deploy a topology-based loss alongside pixel-based losses to focus our segmentation on electrical connectivity rather than pixel-level accuracy. Based on a hyperparameter optimization, we then fine-tune the SAM2 model to obtain a model that generalizes across different technology nodes, manufacturing materials, sample preparation methods, and SEM imaging technologies. To this end, we leverage an unprecedented dataset of SEM images obtained from 48 metal layers across 14 different ICs. When fine-tuned on seven ICs, SAMSEM achieves an error rate as low as 0.72% when evaluated on other images from the same ICs. For the remaining seven unseen ICs, it still achieves error rates as low as 5.53%. Finally, when fine-tuned on all 14 ICs, we observe an error rate of 0.62%. Hence, SAMSEM proves to be a reliable tool that significantly advances the frontier in metal line segmentation, a key challenge in post-manufacturing IC verification.
Authors:Yuan Qiu, Xiaokui Xiao, Yin Yang
Abstract:
Answering Select-Join-Aggregate queries with DP is a fundamental problem with important applications in various domains. The current SOTA methods ensure user-level DP (i.e., the adversary cannot infer the presence or absence of any given individual user with high confidence) and achieve instance-optimal accuracy on the query results. However, these solutions involve solving expensive optimization programs, which may incur prohibitive computational overhead for large databases. One promising direction to achieve scalability is through sampling, which provides a tunable trade-off between result utility and computational costs. However, applying sampling to differentially private SJA processing is a challenge for two reasons. First, it is unclear what to sample, in order to achieve the best accuracy within a given computational budget. Second, prior solutions were not designed with sampling in mind, and their mathematical tool chains are not sampling-friendly. To our knowledge, the only known solution that applies sampling to private SJA processing is S&E, a recent proposal that (i) samples users and (ii) combines sampling directly with existing solutions to enforce DP. We show that both are suboptimal designs; consequently, even with a relatively high sample rate, the error incurred by S&E can be 10x higher than the underlying DP mechanism without sampling. Motivated by this, we propose Differentially Private Sampling for Scale (DP-S4S), a novel mechanism that addresses the above challenges by (i) sampling aggregation units instead of users, and (ii) laying the mathematical foundation for SJA processing under RDP, which composes more easily with sampling. Further, DP-S4S can answer both scalar and vector SJA queries. Extensive experiments on real data demonstrate that DP-S4S enables scalable SJA processing on large datasets under user-level DP, while maintaining high result utility.
Authors:Chang Xue, Fang Liu, Jiaye Wang, Jinming Xing, Chen Yang
Abstract:
Decentralized financial platforms rely heavily on Web of Trust reputation systems to mitigate counterparty risk in the absence of centralized identity verification. However, these pseudonymous networks are inherently vulnerable to adversarial behaviors, such as Sybil attacks and camouflaged fraud, where malicious actors cultivate artificial reputations before executing exit scams. Traditional anomaly detection in this domain faces two critical limitations. First, reliance on naive statistical heuristics (e.g., flagging the lowest 5% of rated users) fails to distinguish between victims of bad-mouthing attacks and actual fraudsters. Second, standard Graph Neural Networks (GNNs) operate on the assumption of homophily and cannot effectively process the semantic inversion inherent in signed (trust vs. distrust) and directed (status) edges. We propose TAS-GNN (Topology-Aware Signed Graph Neural Network), a novel framework designed for feature-sparse signed networks like Bitcoin-Alpha. TAS-GNN integrates recursive Web-of-Trust labeling and a dual-channel message-passing architecture that separately models trust and distrust signals, fused through a Status-Aware Attention mechanism. Experiments demonstrate that TAS-GNN achieves state-of-the-art performance, significantly outperforming existing signed GNN baselines.
Authors:Xian Qin, Xue Yang, Xiaohu Tang
Abstract:
While Secure Aggregation (SA) protects update confidentiality in Cross-silo Federated Learning, it fails to guarantee aggregation integrity, allowing malicious servers to silently omit or tamper with updates. Existing verifiable aggregation schemes rely on heavyweight cryptography (e.g., ZKPs, HE), incurring computational costs that scale poorly with model size. In this paper, we propose a lightweight architecture that shifts from extrinsic cryptographic proofs to \textit{Intrinsic Proofs}. We repurpose backdoor injection to embed verification signals directly into model parameters. By harnessing Catastrophic Forgetting, these signals are robust for immediate verification yet ephemeral, naturally decaying to preserve final model utility. We design a randomized, single-verifier auditing framework compatible with SA, ensuring client anonymity and preventing signal collision without trusted third parties. Experiments on SVHN, CIFAR-10, and CIFAR-100 demonstrate high detection probabilities against malicious servers. Notably, our approach achieves over $1000\times$ speedup on ResNet-18 compared to cryptographic baselines, effectively scaling to large models.
Authors:Yinpeng Wu, Yitong Chen, Lixiang Wang, Jinyu Gu, Zhichao Hua, Yubin Xia
Abstract:
Device-side Large Language Models (LLMs) have witnessed explosive growth, offering higher privacy and availability compared to cloud-side LLMs. During LLM inference, both model weights and user data are valuable, and attackers may even compromise the OS kernel to steal them. ARM TrustZone is the de facto hardware-based isolation technology on mobile devices, used to protect sensitive applications from a compromised OS. However, protecting LLM inference with TrustZone incurs significant overhead due to its inflexible isolation of memory and the NPU. To address these challenges, this paper introduces FlexServe, a fast and secure LLM serving system for mobile devices. It first introduces a Flexible Resource Isolation mechanism to construct Flexible Secure Memory (Flex-Mem) and Flexible Secure NPU (Flex-NPU). Both memory pages and the NPU can be efficiently switched between unprotected and protected modes. Based on these mechanisms, FlexServe designs a fast and secure LLM inference framework within TrustZone's secure world. The LLM-Aware Memory Management and Secure Inference Pipeline are introduced to accelerate inference. A Multi-Model Scheduler is proposed to optimize multi-model workflows. We implement a prototype of FlexServe and compare it with two TrustZone-based strawman designs. The results show that FlexServe achieves an average $10.05\times$ speedup in Time to First Token (TTFT) compared to the strawman, and an average $2.44\times$ TTFT speedup compared to an optimized strawman with pipeline and secure NPU enabled. For multi-model agent workflows, the end-to-end speedup is up to $24.30\times$ and $4.05\times$ compared to the strawman and optimized strawman, respectively.
Authors:Tingwei Zhang, John X. Morris, Vitaly Shmatikov
Abstract:
Many large language models (LLMs) use reasoning to generate responses but do not reveal their full reasoning traces (a.k.a. chains of thought), instead outputting only final answers and brief reasoning summaries. To demonstrate that hiding reasoning traces does not prevent users from "stealing" a model's reasoning capabilities, we introduce trace inversion models that, given only the inputs, answers, and (optionally) reasoning summaries exposed by a target model, generate detailed, synthetic reasoning traces. We show that (1) traces synthesized by trace inversion have high overlap with the ground-truth reasoning traces (when available), and (2) fine-tuning student models on inverted traces substantially improves their reasoning. For example, fine-tuning Qwen-2.5-7B-Instruct on traces inverted from the answers and summaries of GPT-5 mini, a commercial black-box LLM, improves its performance from 56.8% to 77.6% on MATH500 and from 11.7% to 42.3% on JEEBench, compared to fine-tuning on just the answers and summaries.
Authors:Ben Dong, Hui Feng, Qian Wang
Abstract:
Modern generative agents such as OpenClaw - an open-source, self-hosted personal assistant with a community skill ecosystem, are gaining attention and are used pervasively. However, the openness and rapid growth of these ecosystems often outpace systematic security evaluation. In this paper, we design, implement, and evaluate Clawdrain, a Trojanized skill that induces a multi-turn "Segmented Verification Protocol" via injected SKILL.md instructions and a companion script that returns PROGRESS/REPAIR/TERMINAL signals. We deploy Clawdrain in a production-like OpenClaw instance with real API billing and a production model (Gemini 2.5 Pro), and we measure 6-7x token amplification over a benign baseline, with a costly, failure configuration reaching approximately 9x. We observe a deployment-only phenomenon: the agent autonomously composes general-purpose tools (e.g., shell/Python) to route around brittle protocol steps, reducing amplification and altering attack dynamics. Finally, we identify production vectors enabled by OpenClaw's architecture, including SKILL.md prompt bloat, persistent tool-output pollution, cron/heartbeat frequency amplification, and behavioral instruction injection. Overall, we demonstrate that token-drain attacks remain feasible in real deployments, but their magnitude and observability are shaped by tool composition, recovery behavior, and interface design.
Authors:Vincent Langford, Shihan Zhao, Hongyu Zhang, Ben Dong, Qian Wang, Anees Rehman, Yuntao Liu
Abstract:
In the realm of quantum computing, quantum circuits serve as essential depictions of quantum algorithms, which are then compiled into executable operations for quantum computations. Quantum compilers are responsible for converting these algorithmic quantum circuits into versions compatible with specific quantum hardware, thus connecting quantum software with hardware. Nevertheless, untrusted quantum compilers present notable threats. They have the potential to result in the theft of quantum circuit designs and jeopardize sensitive intellectual property (IP). In this work, we propose CLOAQ, a quantum circuit obfuscation (QCO) approach that hides the logic and the phase angles of selected gates within the obfuscated quantum circuit. To evaluate the effectiveness of CLOAQ, we sample the input state uniformly from the Hilbert space of all qubits, which is more accurate than prior work that use all-|0> inputs. Our results show that CLOAQ benefits from the synergy between logic and phase protections. Compared with prior QCO approaches using only one perspective, the combined method is more resilient to attacks and causes greater functional disruption when the unlocking key is incorrect.
Authors:Yedi Zhang, Haoyu Wang, Xianglin Yang, Jin Song Dong, Jun Sun
Abstract:
LLM-enabled applications are rapidly reshaping the software ecosystem by using large language models as core reasoning components for complex task execution. This paradigm shift, however, introduces fundamentally new reliability challenges and significantly expands the security attack surface, due to the non-deterministic, learning-driven, and difficult-to-verify nature of LLM behavior. In light of these emerging and unavoidable safety challenges, we argue that such risks should be treated as expected operational conditions rather than exceptional events, necessitating a dedicated incident-response perspective. Consequently, the primary barrier to trustworthy deployment is not further improving model capability but establishing system-level threat monitoring mechanisms that can detect and contextualize security-relevant anomalies after deployment -- an aspect largely underexplored beyond testing or guardrail-based defenses. Accordingly, this position paper advocates systematic and comprehensive monitoring of security threats in LLM-enabled applications as a prerequisite for reliable operation and a foundation for dedicated incident-response frameworks.
Authors:Arka Pal, Louai Zahran, William Gvozdjak, Akilesh Potti, Micah Goldblum
Abstract:
As large language models (LLMs) continue to grow in size, fewer users are able to host and run models locally. This has led to increased use of third-party hosting services. However, in this setting, there is a lack of guarantees on the computation performed by the inference provider. For example, a dishonest provider may replace an expensive large model with a cheaper-to-run weaker model and return the results from the weaker model to the user. Existing tools to verify inference typically rely on methods from cryptography such as zero-knowledge proofs (ZKPs), but these add significant computational overhead, and remain infeasible for use for large models. In this work, we develop a new insight -- that given a method for performing private LLM inference, one can obtain forms of verified inference at marginal extra cost. Specifically, we propose two new protocols which leverage privacy-preserving LLM inference in order to provide guarantees over the inference that was carried out. Our approaches are cheap, requiring the addition of a few extra tokens of computation, and have little to no downstream impact. As the fastest privacy-preserving inference methods are typically faster than ZK methods, the proposed protocols also improve verification runtime. Our work provides novel insights into the connections between privacy and verifiability in LLM inference.
Authors:Banafsheh Saber Latibari, Najmeh Nazari, Daniel Brignac, Hossein Sayadi, Houman Homayoun, Abhijit Mahalanobis
Abstract:
State-space models like Mamba offer linear-time sequence processing and low memory, making them attractive for medical imaging. However, their robustness under realistic software and hardware threat models remains underexplored. This paper evaluates Mamba on multiple MedM-NIST classification benchmarks under input-level attacks, including white-box adversarial perturbations (FGSM/PGD), occlusion-based PatchDrop, and common acquisition corruptions (Gaussian noise and defocus blur) as well as hardware-inspired fault attacks emulated in software via targeted and random bit-flip injections into weights and activations. We profile vulnerabilities and quantify impacts on accuracy indicating that defenses are needed for deployment.
Authors:Tingting Tang, James Flemings, Yongqin Wang, Murali Annavaram
Abstract:
Retrieval-augmented generation (RAG) is a widely used framework for reducing hallucinations in large language models (LLMs) on domain-specific tasks by retrieving relevant documents from a database to support accurate responses. However, when the database contains sensitive corpora, such as medical records or legal documents, RAG poses serious privacy risks by potentially exposing private information through its outputs. Prior work has demonstrated that one can practically craft adversarial prompts that force an LLM to regurgitate the augmented contexts. A promising direction is to integrate differential privacy (DP), a privacy notion that offers strong formal guarantees, into RAG systems. However, naively applying DP mechanisms into existing systems often leads to significant utility degradation. Particularly for RAG systems, DP can reduce the usefulness of the augmented contexts leading to increase risk of hallucination from the LLMs. Motivated by these challenges, we present DP-KSA, a novel privacy-preserving RAG algorithm that integrates DP using the propose-test-release paradigm. DP-KSA follows from a key observation that most question-answering (QA) queries can be sufficiently answered with a few keywords. Hence, DP-KSA first obtains an ensemble of relevant contexts, each of which will be used to generate a response from an LLM. We utilize these responses to obtain the most frequent keywords in a differentially private manner. Lastly, the keywords are augmented into the prompt for the final output. This approach effectively compresses the semantic space while preserving both utility and privacy. We formally show that DP-KSA provides formal DP guarantees on the generated output with respect to the RAG database. We evaluate DP-KSA on two QA benchmarks using three instruction-tuned LLMs, and our empirical results demonstrate that DP-KSA achieves a strong privacy-utility tradeoff.
Authors:Adel ElZemity, Joshua Sylvester, Budi Arief, Rogério De Lemos
Abstract:
SMS-based phishing (smishing) attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes outdated. To deal with this issue, we present Agentic Knowledge Distillation, which consists of a powerful LLM acts as an autonomous teacher that fine-tunes a smaller student SLM, deployable for security tasks without human intervention. The teacher LLM autonomously generates synthetic data and iteratively refines a smaller on-device student model until performance plateaus. We compare four LLMs in this teacher role (Claude Opus 4.5, GPT 5.2 Codex, Gemini 3 Pro, and DeepSeek V3.2) on SMS spam/smishing detection with two student SLMs (Qwen2.5-0.5B and SmolLM2-135M). Our results show that performance varies substantially depending on the teacher LLM, with the best configuration achieving 94.31% accuracy and 96.25% recall. We also compare against a Direct Preference Optimisation (DPO) baseline that uses the same synthetic knowledge and LoRA setup but without iterative feedback or targeted refinement; agentic knowledge distillation substantially outperforms it (e.g. 86-94% vs 50-80% accuracy), showing that closed-loop feedback and targeted refinement are critical. These findings demonstrate that agentic knowledge distillation can rapidly yield effective security classifiers for edge deployment, but outcomes depend strongly on which teacher LLM is used.
Authors:Nardine Basta, Firas Ben Hmida, Houssem Jmal, Muhammad Ikram, Mohamed Ali Kaafar, Andy Walker
Abstract:
In today's enterprise network landscape, the combination of perimeter and distributed firewall rules governs connectivity. To address challenges arising from increased traffic and diverse network architectures, organizations employ automated tools for firewall rule and access policy generation. Yet, effectively managing risks arising from dynamically generated policies, especially concerning critical asset exposure, remains a major challenge. This challenge is amplified by evolving network structures due to trends like remote users, bring-your-own devices, and cloud integration. This paper introduces a novel graph neural network model for identifying weighted shortest paths. The model aids in detecting network misconfigurations and high-risk connectivity paths that threaten critical assets, potentially exploited in zero-day attacks -- cyber-attacks exploiting undisclosed vulnerabilities. The proposed Pro-ZD framework adopts a proactive approach, automatically fine-tuning firewall rules and access policies to address high-risk connections and prevent unauthorized access. Experimental results highlight the robustness and transferability of Pro-ZD, achieving over 95% average accuracy in detecting high-risk connections. \
Authors:Mayank Kumar, Qian Lou, Paulo Barreto, Martine De Cock, Sikha Pentyala
Abstract:
Data is the lifeblood of AI, yet much of the most valuable data remains locked in silos due to privacy and regulations. As a result, AI remains heavily underutilized in many of the most important domains, including healthcare, education, and finance. Synthetic data generation (SDG), i.e. the generation of artificial data with a synthesizer trained on real data, offers an appealing solution to make data available while mitigating privacy concerns, however existing SDG-as-a-service workflow require data holders to trust providers with access to private data.We propose FHAIM, the first fully homomorphic encryption (FHE) framework for training a marginal-based synthetic data generator on encrypted tabular data. FHAIM adapts the widely used AIM algorithm to the FHE setting using novel FHE protocols, ensuring that the private data remains encrypted throughout and is released only with differential privacy guarantees. Our empirical analysis show that FHAIM preserves the performance of AIM while maintaining feasible runtimes.
Authors:Ethan Rathbun, Wo Wei Lin, Alina Oprea, Christopher Amato
Abstract:
Simulated environments are a key piece in the success of Reinforcement Learning (RL), allowing practitioners and researchers to train decision making agents without running expensive experiments on real hardware. Simulators remain a security blind spot, however, enabling adversarial developers to alter the dynamics of their released simulators for malicious purposes. Therefore, in this work we highlight a novel threat, demonstrating how simulator dynamics can be exploited to stealthily implant action-level backdoors into RL agents. The backdoor then allows an adversary to reliably activate targeted actions in an agent upon observing a predefined ``trigger'', leading to potentially dangerous consequences. Traditional backdoor attacks are limited in their strong threat models, assuming the adversary has near full control over an agent's training pipeline, enabling them to both alter and observe agent's rewards. As these assumptions are infeasible to implement within a simulator, we propose a new attack ``Daze'' which is able to reliably and stealthily implant backdoors into RL agents trained for real world tasks without altering or even observing their rewards. We provide formal proof of Daze's effectiveness in guaranteeing attack success across general RL tasks along with extensive empirical evaluations on both discrete and continuous action space domains. We additionally provide the first example of RL backdoor attacks transferring to real, robotic hardware. These developments motivate further research into securing all components of the RL training pipeline to prevent malicious attacks.
Authors:Heajun An, Connor Ng, Sandesh Sharma Dulal, Junghwan Kim, Jin-Hee Cho
Abstract:
Online scams across email, short message services, and social media increasingly challenge everyday risk assessment, particularly as generative AI enables more fluent and context-aware deception. Although transformer-based detectors achieve strong predictive performance, their explanations are often opaque to non-experts or misaligned with model decisions. We propose VEXA, an evidence-grounded and persona-adaptive framework for generating learner-facing scam explanations by integrating GradientSHAP-based attribution with theory-informed vulnerability personas. Evaluation across multi-channel datasets shows that grounding explanations in detector-derived evidence improves semantic reliability without increasing linguistic complexity, while persona conditioning introduces interpretable stylistic variation without disrupting evidential alignment. These results reveal a key design insight: evidential grounding governs semantic correctness, whereas persona-based adaptation operates at the level of presentation under constraints of faithfulness. Together, VEXA demonstrates the feasibility of persona-adaptive, evidence-grounded explanations and provides design guidance for trustworthy, learner-facing security explanations in non-formal contexts.
Authors:Anh Kiet Pham, Van Truong Vo, Vu Trung Duong Le, Tuan Hai Vu, Hoai Luan Pham, Van Tinh Nguyen, Yasuhiko Nakashima
Abstract:
Cryptographic operations are critical for securing IoT, edge computing, and autonomous systems. However, current RISC-V platforms lack efficient hardware support for comprehensive cryptographic algorithm families and post-quantum cryptography. This paper presents Crypto-RV, a RISC-V co-processor architecture that unifies support for SHA-256, SHA-512, SM3, SHA3-256, SHAKE-128, SHAKE-256 AES-128, HARAKA-256, and HARAKA-512 within a single 64-bit datapath. Crypto-RV introduces three key architectural innovations: a high-bandwidth internal buffer (128x64-bit), cryptography-specialized execution units with four-stage pipelined datapaths, and a double-buffering mechanism with adaptive scheduling optimized for large-hash. Implemented on Xilinx ZCU102 FPGA at 160 MHz with 0.851 W dynamic power, Crypto-RV achieves 165 times to 1,061 times speedup over baseline RISC-V cores, 5.8 times to 17.4 times better energy efficiency compared to powerful CPUs. The design occupies only 34,704 LUTs, 37,329 FFs, and 22 BRAMs demonstrating viability for high-performance, energy-efficient cryptographic processing in resource-constrained IoT environments.
Authors:Fabio Turazza, Alessandro Neri, Marcello Pietri, Maria Angela Butturi, Marco Picone, Marco Mamei
Abstract:
Effective demand forecasting is crucial for reducing food waste. However, data privacy concerns often hinder collaboration among retailers, limiting the potential for improved predictive accuracy. In this study, we explore the application of Federated Learning (FL) in Sustainable Supply Chain Management (SSCM), with a focus on the grocery retail sector dealing with perishable goods. We develop a baseline predictive model for demand forecasting and waste assessment in an isolated retailer scenario. Subsequently, we introduce a Blockchain-based FL model, trained collaboratively across multiple retailers without direct data sharing. Our preliminary results show that FL models have performance almost equivalent to the ideal setting in which parties share data with each other, and are notably superior to models built by individual parties without sharing data, cutting waste and boosting efficiency.
Authors:Fabio Turazza, Marcello Pietri, Marco Picone, Marco Mamei
Abstract:
Privacy-Preserving Federated Learning (PPFL) is a Decentralized machine learning paradigm that enables multiple participants to collaboratively train a global model without sharing their data with the integration of cryptographic and privacy-based techniques to enhance the security of the global system. This privacy-oriented approach makes PPFL a highly suitable solution for training shared models in sectors where data privacy is a critical concern. In traditional FL, local models are trained on edge devices, and only model updates are shared with a central server, which aggregates them to improve the global model. However, despite the presence of the aforementioned privacy techniques, in the classical Federated structure, the issue of the server as a single-point-of-failure remains, leading to limitations both in terms of security and scalability. This paper introduces FedBGS, a fully Decentralized Blockchain-based framework that leverages Segmented Gossip Learning through Federated Analytics. The proposed system aims to optimize blockchain usage while providing comprehensive protection against all types of attacks, ensuring both privacy, security and non-IID data handling in Federated environments.
Authors:Dev Vikesh Doshi, Mehjabeen Tasnim, Fernando Landeros, Chinthagumpala Muni Venkatesh, Daniel Timko, Muhammad Lutfor Rahman
Abstract:
Phishing attacks through text, also known as smishing, are a prevalent type of social engineering tactic in which attackers impersonate brands to deceive victims into providing personal information and/or money. While smishing awareness and cyber education are a key method by which organizations communicate this awareness, the guidance itself varies widely. In this paper, we investigate the state of practice of how 149 well-known brands across 25 categories educate their customers about smishing and what smishing prevention and reporting advice they provide. After conducting a comprehensive content analysis of the brands, we identified significant gaps in the smishing-related information provided: only 46\% of the 149 brands mentioned the definition of smishing, less than 1\% had a video tutorial on smishing, and only 50\% of brands provided instructions on how to report. Our study highlights variation in terminology, prevention advice, and reporting mechanisms across industries, with some brands recommending potentially ineffective strategies such as "ignoring suspicious messages." These findings establish a baseline for understanding the current state of industry smishing awareness advice and provide specific areas where standardization improvements are needed. From our evaluation, we provide recommendations for brands on how to offer streamlined education to their respective customers on smishing for better awareness and protection against increasing smishing attacks.
Authors:Voktho Das, Kimia Azar, Hadi Kamali
Abstract:
While logic locking has been extensively studied as a countermeasure against integrated circuit (IC) supply chain threats, recent research has shifted toward reconfigurable-based redaction techniques, e.g., LUT- and eFPGA-based schemes. While these approaches raise the bar against attacks, they incur substantial overhead, much of which arises not from genuine functional reconfigurability need, but from artificial complexity intended solely to frustrate reverse engineering (RE). As a result, fabrics are often underutilized, and security is achieved at disproportionate cost. This paper introduces NuRedact, the first full-custom eFPGA redaction framework that embraces architectural non-uniformity to balance security and efficiency. Built as an extension of the widely adopted OpenFPGA infrastructure, NuRedact introduces a three-stage methodology: (i) custom fabric generation with pin-mapping irregularity, (ii) VPR-level modifications to enable non-uniform placement guided by an automated Python-based optimizer, and (iii) redaction-aware reconfiguration and mapping of target IP modules. Experimental results show up to 9x area reduction compared to conventional uniform fabrics, achieving competitive efficiency with LUT-based and even transistor-level redaction techniques while retaining strong resilience. From a security perspective, NuRedact fabrics are evaluated against state-of-the-art attack models, including SAT-based, cyclic, and sequential variants, and show enhanced resilience while maintaining practical design overheads.
Authors:Li Zheng, Liangbin Xie, Jiantao Zhou, He YiMin
Abstract:
Stable Diffusion (SD) often produces degraded outputs when the training dataset contains adversarial noise. Adversarial purification offers a promising solution by removing adversarial noise from contaminated data. However, existing purification methods are primarily designed for classification tasks and fail to address SD-specific adversarial strategies, such as attacks targeting the VAE encoder, UNet denoiser, or both. To address the gap in SD security, we propose Universal Diffusion Adversarial Purification (UDAP), a novel framework tailored for defending adversarial attacks targeting SD models. UDAP leverages the distinct reconstruction behaviors of clean and adversarial images during Denoising Diffusion Implicit Models (DDIM) inversion to optimize the purification process. By minimizing the DDIM metric loss, UDAP can effectively remove adversarial noise. Additionally, we introduce a dynamic epoch adjustment strategy that adapts optimization iterations based on reconstruction errors, significantly improving efficiency without sacrificing purification quality. Experiments demonstrate UDAP's robustness against diverse adversarial methods, including PID (VAE-targeted), Anti-DreamBooth (UNet-targeted), MIST (hybrid), and robustness-enhanced variants like Anti-Diffusion (Anti-DF) and MetaCloak. UDAP also generalizes well across SD versions and text prompts, showcasing its practical applicability in real-world scenarios.
Authors:Gennady Khalimov, Yevgen Kotukh
Abstract:
We propose a novel digital signature cryptosystem that exploits the concept of the brute-force problem. To ensure the security of the cryptosystem, we employed several mechanisms: sharing a common secret for factorable permutations, associating permutations with the message being signed, and confirming knowledge of the shared secret using a zero-knowledge proof. We developed a secret-sharing theory based on homomorphic matrix transformations for factorized permutations. The inverse matrix transformation for computing the shared secret is determined by secret parameters, which results in incompletely defined functionality and gives rise to a brute-force cryptanalysis problem. Randomization of session keys using a message hash and random parameters guarantees the uniqueness of each signature, even for identical messages. We employed a zero-knowledge authentication protocol to confirm knowledge of the shared secret, thereby protecting the verifier against unauthorized signature imposition. The LINEture cryptosystem is built on linear matrix algebra and does not rely on a computationally hard problem. High security is achieved through the appropriate selection of matrix transformation dimensions. Matrix computations potentially offer low operational costs for signature generation and verification.
Authors:Yevgen Kotukh, Gennady Khalimov
Abstract:
This paper presents a comprehensive cryptographic analysis of the security parameters of the LINEture post-quantum digital signature scheme, which is constructed using matrix algebra over elementary abelian 2-groups. We investigate the influence of three principal parameters. First, the word size m (exhibiting quadratic impact), the second is a vector dimension l, and the third is a number of submatrices in the session key q (exhibiting linear impact) on cryptographic strength. Our analysis reveals a dualistic nature of the parameter l. According to the previous analysis, it does not affect resistance to guessing attacks. A deeper examination of the verification mechanism demonstrates that l establishes a kind of verification barrier of l times m bits. We establish the threshold relationship l less q minus 1 times m, below which parameter l becomes security-critical. The optimal selection rule l near q minus 1 times m is proposed for maximum cryptographic efficiency. Comparative analysis with NIST PQC standards and practical parameter recommendations are provided.
Authors:Qinyi Liu, Dong Liu, Farhad Vadiee, Mohammad Khalil, Pedro P. Vergara Barrios
Abstract:
Synthetic Data Generation (SDG) can be used to facilitate privacy-preserving data sharing. However, most existing research focuses on privacy attacks where the adversary is the recipient of the released synthetic data and attempts to infer sensitive information from it. This study investigates quality degradation attacks initiated by adversaries who possess access to the real dataset or control over the generation process, such as the data owner, the synthetic data provider, or potential intruders. We formalize a corresponding threat model and empirically evaluate the effectiveness of targeted manipulations of real data (e.g., label flipping and feature-importance-based interventions) on the quality of generated synthetic data. The results show that even small perturbations can substantially reduce downstream predictive performance and increase statistical divergence, exposing vulnerabilities within SDG pipelines. This study highlights the need to integrate integrity verification and robustness mechanisms, alongside privacy protection, to ensure the reliability and trustworthiness of synthetic data sharing frameworks.
Authors:Di Lu, Mengna Sun, Qingwen Zhang, Yujia Liu, Jia Zhang, Xuewen Dong, Yulong Shen, Jianfeng Ma
Abstract:
Confidential containers protect cloud-native workloads using trusted execution environments (TEEs). However, existing Container-in-TEE designs (e.g., Confidential Containers (CoCo)) encapsulate the entire runtime within the TEE, inflating the trusted computing base (TCB) and introducing redundant components and cross-layer overhead. We present Arca, a lightweight confidential container framework based on a TEE-in-Container architecture that isolates each workload in an independent, hardware-enforced trust domain while keeping orchestration logic outside the TEE. This design minimizes inter-layer dependencies, confines compromise to per-container boundaries, and restores the TEE's minimal trust principle. We implemented Arca on Intel SGX, Intel TDX, and AMD SEV. Experimental results show that Arca achieves near-native performance and outperforms CoCo in most benchmarks, while the reduced TCB significantly improves verifiability and resilience against host-level compromise. Arca emonstrates that efficient container management and strong runtime confidentiality can be achieved without sacrificing security assurance.
Authors:Joyjit Roy, Samaresh Kumar Singh
Abstract:
Automated negotiations in insurance and business-to-business (B2B) commerce encounter substantial challenges. Current systems force a trade-off between convenience and privacy by routing sensitive financial data through centralized servers, increasing security risks, and diminishing user trust. This study introduces a device-native autonomous Artificial Intelligence (AI) agent system for privacy-preserving negotiations. The proposed system operates exclusively on user hardware, enabling real-time bargaining while maintaining sensitive constraints locally. It integrates zero-knowledge proofs to ensure privacy and employs distilled world models to support advanced on-device reasoning. The architecture incorporates six technical components within an agentic AI workflow. Agents autonomously plan negotiation strategies, conduct secure multi-party bargaining, and generate cryptographic audit trails without exposing user data to external servers. The system is evaluated in insurance and B2B procurement scenarios across diverse device configurations. Results show an average success rate of 87%, a 2.4x latency improvement over cloud baselines, and strong privacy preservation through zero-knowledge proofs. User studies show 27% higher trust scores when decision trails are available. These findings establish a foundation for trustworthy autonomous agents in privacy-sensitive financial domains.
Authors:Ali Akarma, Toqeer Ali Syed, Salman Jan, Hammad Muneer, Abdul Khadar Jilani
Abstract:
The AI-based sensing and autonomous monitoring have become the main components of wildfire early detection, but current systems do not provide adaptive inter-agent coordination, structurally defined human control, and cryptographically verifiable responsibility. Purely autonomous alert dissemination in the context of safety critical disasters poses threats of false alarming, governance failure and lack of trust in the system. This paper provides a blockchain-based governance-conscious agentic AI architecture of trusted wildfire early warning. The monitoring of wildfires is modeled as a constrained partially observable Markov decision process (POMDP) that accounts for the detection latency, false alarms reduction and resource consumption with clear governance constraints. Hierarchical multi-agent coordination means dynamic risk-adaptive reallocation of unmanned aerial vehicles (UAVs). With risk-adaptive policies, a permissioned blockchain layer sets mandatory human-authorization as a state-transition invariant as a smart contract. We build formal assurances such as integrity of alerts, human control, non-repudiation and limited detection latency assumptions of Byzantine fault. Security analysis shows that it is resistant to alert injections, replays, and tampering attacks. High-fidelity simulation environment experimental evaluation of governance enforcement demonstrates that it presents limited operational overhead and decreases false public alerts and maintains adaptive detection performance. This work is a step towards a principled design paradigm of reliable AI systems by incorporating accountability into the agentic control loop of disaster intelligence systems that demand safety in their application.
Authors:Ali Akarma, Toqeer Ali Syed, Abdul Khadar Jilani, Salman Jan, Hammad Muneer, Muazzam A. Khan, Changli Yu
Abstract:
Autonomous underwater vehicles (AUVs) and sensor nodes increasingly support decentralized sensing and coordination in the Internet of Underwater Things (IoUT), yet most deployments rely on static trust once authentication is established, leaving long-duration missions vulnerable to compromised or behaviorally deviating agents. In this paper, an interrogator based structure is presented that incorporates the idea of behavioral trust monitoring into underwater multi-agent operation without interfering with autonomy. Privileged interrogator module is a passive communication metadata analyzer that uses a lightweight transformer model to calculate dynamic trust scores, which are used to authorize the forwarding of mission critical data. Suspicious agents cause proportional monitoring and conditional restrictions, which allow fast containment and maintain network continuity. The evidence of trust is stored in a permissioned blockchain consortium which offers identity management which is not tampered and is decentralized without causing the overhead of public consensus mechanisms. Simulation based analysis shows that the evaluation of the result compares to a relative improvement of 21.7% in the detection accuracy compared to the static trust baselines with limited energy overhead. These findings suggest that behavior driven validation has the capability of reinforcing underwater coordination without compromising scalability and deployment.
Authors:Dalal Alharthi, Ivan Roberto Kawaminami Garcia
Abstract:
As cloud environments become increasingly complex, cybersecurity and forensic investigations must evolve to meet emerging threats. Large Language Models (LLMs) have shown promise in automating log analysis and reasoning tasks, yet they remain vulnerable to prompt injection attacks and lack forensic rigor. To address these dual challenges, we propose a unified, secure-by-design GenAI framework that integrates PromptShield and the Cloud Investigation Automation Framework (CIAF). PromptShield proactively defends LLMs against adversarial prompts using ontology-driven validation that standardizes user inputs and mitigates manipulation. CIAF streamlines cloud forensic investigations through structured, ontology-based reasoning across all six phases of the forensic process. We evaluate our system on real-world datasets from AWS and Microsoft Azure, demonstrating substantial improvements in both LLM security and forensic accuracy. Experimental results show PromptShield boosts classification performance under attack conditions, achieving precision, recall, and F1 scores above 93%, while CIAF enhances ransomware detection accuracy in cloud logs using Likert-transformed performance features. Our integrated framework advances the automation, interpretability, and trustworthiness of cloud forensics and LLM-based systems, offering a scalable foundation for real-time, AI-driven incident response across diverse cloud infrastructures.
Authors:Mohammad Wali Ur Rahman, Martin Manuel Lopez, Lamia Tasnim Mim, Carter Farthing, Julius Battle, Kathryn Buckley, Salim Hariri
Abstract:
For digital infrastructure to be safe, compatible, and standards-aligned, automated communication protocol compliance verification is crucial. Nevertheless, current rule-based systems are becoming less and less effective since they are unable to identify subtle or intricate non-compliance, which attackers frequently use to establish covert communication channels in IPv6 traffic. In order to automate IPv6 compliance verification, this paper presents the Artificial Intelligence Driven Compliance Checker Engine (AICCE), a novel generative system that combines dual-architecture reasoning and retrieval-augmented generation (RAG). Specification segments pertinent to each query can be efficiently retrieved thanks to the semantic encoding of protocol standards into a high-dimensional vector space. Based on this framework, AICCE offers two complementary pipelines: (i) Explainability Mode, which uses parallel LLM agents to render decisions and settle disputes through organized discussions to improve interpretability and robustness, and (ii) Script Execution Mode, which converts clauses into Python rules that can be executed quickly for dataset-wide verification. With the debate mechanism enhancing decision reliability in complicated scenarios and the script-based pipeline lowering per-sample latency, AICCE achieves accuracy and F1-scores of up to 99% when tested on IPv6 packet samples across sixteen cutting-edge generative models. By offering a scalable, auditable, and generalizable mechanism for identifying both routine and covert non-compliance in dynamic communication environments, our results show that AICCE overcomes the blind spots of conventional rule-based compliance checking systems.
Authors:Alexander Benvenuti, Huaiyuan Rao, Matthew Hale
Abstract:
Privacy techniques have been developed for data-driven systems, but systems with non-numeric data cannot use typical noise-adding techniques. Therefore, we develop a new mechanism for privatizing state trajectories of symbolic systems that may be represented as words over a finite alphabet. Such systems include Markov chains, Markov decision processes, and finite-state automata, and we protect their symbolic trajectories with differential privacy. The mechanism we develop randomly selects a private approximation to be released in place of the original sensitive word, with a bias towards low-error private words. This work is based on the permute-and-flip mechanism for differential privacy, which can be applied to non-numeric data. However, a na\"ıve implementation would have to enumerate an exponentially large list of words to generate a private word. As a result, we develop a new mechanism that generates private words without ever needing to enumerate such a list. We prove that the accuracy of our mechanism is never worse than the prior state of the art, and we empirically show on a real traffic dataset that it introduces up to $55\%$ less error than the prior state of the art under a conventional privacy implementation.
Authors:Aymen Lassoued, Nacef Mbarek, Bechir Dardouri, Bassem Ouni, Qing Li, Fakhri Karray
Abstract:
Vulnerability detection in C programs is a critical challenge in software security. Although large language models (LLMs) achieve strong detection performance, their multi-billion-parameter scale makes them impractical for integration into development workflows requiring low latency and continuous analysis. We introduce VULNSCOUT-C, a compact transformer architecture with 693M total parameters (353M active during inference), derived from the Qwen model family and optimized for C code vulnerability detection. Alongside the model, we present VULNSCOUT, a new 33,565-sample curated dataset generated through a controlled multi-agent pipeline with formal verification, designed to fill coverage gaps in existing benchmarks across underrepresented CWE categories. Evaluated on a standardized C vulnerability detection benchmark, VULNSCOUT-C outperforms all evaluated baselines, including state-of-the-art reasoning LLMs and commercial static analysis tools, while offering a fraction of their inference cost. These results demonstrate that task-specialized compact architectures can match or even outperform the detection capability of models orders of magnitude larger, making continuous, low-latency vulnerability analysis practical within real-world development workflows.
Authors:Maolin Wang, Beining Bao, Gan Yuan, Hongyu Chen, Bingkun Zhao, Baoshuo Kan, Jiming Xu, Qi Shi, Yinggong Zhao, Yao Wang, Wei Ying Ma, Jun Yan
Abstract:
Electronic health records (EHRs) and other real-world clinical data are essential for clinical research, medical artificial intelligence, and life science, but their sharing is severely limited by privacy, governance, and interoperability constraints. These barriers create persistent data silos that hinder multi-center studies, large-scale model development, and broader biomedical discovery. Existing privacy-preserving approaches, including multi-party computation and related cryptographic techniques, provide strong protection but often introduce substantial computational overhead, reducing the efficiency of large-scale machine learning and foundation-model training. In addition, many such methods make data usable for restricted computation while leaving them effectively invisible to clinicians and researchers, limiting their value in workflows that still require direct inspection, exploratory analysis, and human interpretation. We propose a real-world-data transformation framework for privacy-preserving sharing of structured clinical records. Instead of converting data into opaque representations, our approach constructs transformed numeric views that preserve medical semantics and major statistical properties while, under a clearly specified threat model, provably breaking direct linkage between those views and protected patient-level attributes. Through collaboration between computer scientists and the AI agent \textbf{SciencePal}, acting as a constrained tool inventor under human guidance, we design three transformation operators that are non-reversible within this threat model, together with an additional mixing strategy for high-risk scenarios, supported by theoretical analysis and empirical evaluation under reconstruction, record linkage, membership inference, and attribute inference attacks.
Authors:Protiva Das, Sovon Chakraborty, Sidhant Narula, Lucas Potter, Xavier-Lewis Palmer, Pratip Rana, Daniel Takabi, Mohammad Ghasemigol
Abstract:
The rapid advancement of Large Language Models (LLMs) in biological research has significantly lowered the barrier to accessing complex bioinformatics knowledge, ex perimental design strategies, and analytical workflows. While these capabilities accelerate innovation, they also introduce serious dual-use risks, as Bio-LLMs can be exploited to generate harmful biological insights under the guise of legitimate research queries. Existing safeguards, such as static prompt filtering and policy-based restrictions, are insufficient when LLMs are embedded within dynamic biological workflows and application-layer systems. In this paper, we present BioShield, a context-aware application-level firewall designed to secure Bio LLMs against dual-use attacks. At the core of BioShield is a domain-specific prompt scanner that performs contextual risk analysis of incoming queries. The scanner leverages a harmful scoring mechanism tailored to biological dual-use threat cat egories to identify prompts that attempt to conceal malicious intent within seemingly benign research requests. Queries ex ceeding a predefined risk threshold are blocked before reaching the model, effectively preventing unsafe knowledge generation at the source. In addition to pre-generation protection, BioShield deploys a post-generation output verification module that inspects model responses for actionable or weaponizable biological content. If an unsafe response is detected, the system triggers controlled regeneration under strengthened safety constraints. By combining contextual prompt scanning with response-level validation, BioShield provides a layered defense framework specifically designed for bio-domain LLM deployments. Our framework advances cyberbiosecurity by formalizing dual-use threat detection in Bio-LLMs and proposing a structured mitigation strategy for secure, responsible AI driven biological research.
Authors:Kolja Dorschel, René Walendy, Lukas Plätz, Thorben Moos, Christof Paar, Steffen Becker
Abstract:
At S&P 2023, Puschner et al. made a valuable dataset for hardware Trojan detection research publicly available. It contains a complete set of Scanning Electron Microscope (SEM) images of four different digital Integrated Circuits (ICs) fabricated at progressively smaller semiconductor technology nodes. Puschner et al. reported preliminary evidence that feature sizes affect Trojan detection performance, but they were unable to disentangle effects caused by insertion strategies or by degrading image quality from those intrinsic to the underlying standard cell libraries. Distinguishing those causes, however, is crucial to understand whether improved tooling (e.g., higher resolution imaging equipment) can remove the observed technology bias, or whether susceptibility to stealthy hardware Trojans is indeed an inherent property of a cell library. In this work, we dive deep into the S&P 2023 dataset to answer these questions. We first show that, using Puschner et al.'s metrics, such a separation is indeed difficult to establish. We then devise alternative metrics to more meaningfully assess and compare the potential susceptibility of standard cell libraries. We find clear differences between the evaluated libraries. However, in all cases we identify cells that implement distinct logic functions yet are visually indistinguishable in SEM images. We exploit this property to construct stealthy, standard-cell-based hardware Trojans and present a concrete case study: a privilege-escalation backdoor in an Ibex RISC-V core. Our results demonstrate that cell libraries can - and should - be evaluated for their potential "Trojanizability", and we recommend practical defenses.
Authors:Jiahao Zhang, Yilong Wang, Suhang Wang
Abstract:
Graph neural networks (GNNs) are widely used for learning from graph-structured data in domains such as social networks, recommender systems, and financial platforms. To comply with privacy regulations like the GDPR, CCPA, and PIPEDA, approximate graph unlearning, which aims to remove the influence of specific data points from trained models without full retraining, has become an increasingly important component of trustworthy graph learning. However, approximate unlearning often incurs subtle performance degradation, which may incur negative and unintended side effects. In this work, we show that such degradations can be amplified into adversarial attacks. We introduce the notion of \textbf{unlearning corruption attacks}, where an adversary injects carefully chosen nodes into the training graph and later requests their deletion. Because deletion requests are legally mandated and cannot be denied, this attack surface is both unavoidable and stealthy: the model performs normally during training, but accuracy collapses only after unlearning is applied. Technically, we formulate this attack as a bi-level optimization problem: to overcome the challenges of black-box unlearning and label scarcity, we approximate the unlearning process via gradient-based updates and employ a surrogate model to generate pseudo-labels for the optimization. Extensive experiments across benchmarks and unlearning algorithms demonstrate that small, carefully designed unlearning requests can induce significant accuracy degradation, raising urgent concerns about the robustness of GNN unlearning under real-world regulatory demands. The source code will be released upon paper acceptance.
Authors:Zehra Karadağ, Simon Klix, René Walendy, Felix Hahn, Kolja Dorschel, Julian Speith, Christof Paar, Steffen Becker
Abstract:
As hardware serves as the root of trust in modern computing systems, Hardware Reverse Engineering (HRE) is foundational for security assurance. In practice, HRE enables critical security applications, including design verification, supply-chain assurance, and vulnerability discovery. Over the past two decades, academic research on Integrated Circuit (IC), Field-Programmable Gate Array (FPGA), and netlist reverse engineering has steadily grown. However, knowledge remains fragmented across domains and communities, which complicates assessing the state of the art and hampers identifying shared research challenges. In this paper, we present a systematization of knowledge based on an in-depth analysis of 187 peer-reviewed publications. Using this corpus, we characterize technical methods across the HRE workflow and identify technical and organizational challenges that impede research progress. We analyze all 30 artifacts from our corpus using established artifact evaluation practices. Key results could be reproduced for only seven publications (4%). Based on our findings, we derive stakeholder-centric recommendations for academia, industry, and government to enable more coordinated and reproducible HRE research. These recommendations target three cross-cutting opportunities: (i) improving reproducibility and reuse via artifact-centric practices, (ii) enabling rigorous comparability through standardized benchmarks and evaluation metrics, and (iii) improving legal clarity for public HRE research.
Authors:Reshabh K Sharma, Linxi Jiang, Zhiqiang Lin, Shuo Chen
Abstract:
The emerging agentic web envisions AI agents that reliably fulfill users' natural-language (NL)-based tasks by interacting with existing web services. However, existing authorization models are misaligned with this vision. In particular, today's operator-scoped authorization, exemplified by OAuth, grants broad permissions tied to operators (e.g., the transfer operator) rather than to the specific operations (e.g., transfer $100 to Bob) implied by a user's task. This will inevitably result in overprivileged agents. We introduce Precise Task-Scoped Implicit Authorization (PAuth), a fundamentally different model in which submitting an NL task implicitly authorizes only the concrete operations required for its faithful execution. To make this enforceable at servers, we propose NL slices: symbolic specifications of the calls each service expects, derived from the task and upstream results. Complementing this, we also propose envelopes: special data structure to bind each operand's concrete value to its symbolic provenance, enabling servers to verify that all operands arise from legitimate computations. PAuth is prototyped in the agent-security evaluation framework AgentDojo. We evaluate it in both benign settings and attack scenarios where a spurious operation is injected into an otherwise normal task. In all benign tests, PAuth executes the tasks successfully without requiring any additional permissions. In all attack tests, PAuth correctly raises warnings about missing permissions. These results demonstrate that PAuth's reasoning about permissions is indeed precise. We further analyze the characteristics of these tasks and measure the associated token costs.
Authors:Hamish Alsop, Leandros Maglaras, Naghmeh Moradpoor
Abstract:
This paper presents Ember, a serverless peer-to-peer messaging system providing end-to-end encrypted communication over a decentralised IPv6 mesh network. Ember operates without central servers, enforces data minimisation through ciphertext-only local storage and time-based message expiration, and prioritises architectural clarity, explicit trust boundaries, and practical deployability on Android. The paper describes the system architecture, cryptographic design, network model, and security properties -- including dynamic testing results demonstrating that no plaintext is recoverable from captured network traffic -- and discusses limitations and future work
Authors:Yifei Cai, Zhuoran Li, Yizhou Feng, Qiao Zhang, Hongyi Wu, Danella Zhao, Chunsheng Xin
Abstract:
The rapid adoption of Transformer-based AI has been driven by accessible models such as ChatGPT, which provide API-based services for developers and businesses. However, as these online inference services increasingly handle sensitive inputs, privacy concerns have emerged as a significant challenge. To address this, secure inference frameworks have been proposed, but their high computational and communication overhead often limit practical deployment. In plaintext settings, token drop is an effective technique for reducing inference cost; however, our analysis reveals that directly applying such methods to ciphertext scenarios is suboptimal due to distinct cost distributions in secure computation. We propose SecDTD, a dynamic token drop scheme tailored for secure Transformer inference. SecDTD advances token drop by shifting the dropping to earlier inference stages, effectively reducing the cost of key components such as Softmax. To support this, we introduce two core techniques. Max-Centric Normalization (MCN): A novel, Softmax-independent scoring method that enables early token drop with minimal overhead and improved normalization, supporting more aggressive dropping without accuracy loss. OMSel: A faster, oblivious median selection protocol that securely identifies the median of importance scores to support token drop. Compared to existing sorting-based methods, OMSel achieves a 16.9$\times$ speedup while maintaining security, obliviousness and randomness. We evaluate SecDTD through 48 experiments across eight GLUE datasets under various network settings using the BOLT and BumbleBee frameworks. SecDTD achieves 4.47 times end-to-end inference acceleration without degradation in accuracy.
Authors:Jianwei Li, Jung-Eun Kim
Abstract:
Backdoor attacks pose severe security threats to large language models (LLMs), where a model behaves normally under benign inputs but produces malicious outputs when a hidden trigger appears. Existing backdoor removal methods typically assume prior knowledge of triggers, access to a clean reference model, or rely on aggressive finetuning configurations, and are often limited to classification tasks. However, such assumptions fall apart in real-world instruction-tuned LLM settings. In this work, we propose a new framework for purifying instruction-tuned LLM without any prior trigger knowledge or clean references. Through systematic sanity checks, we find that backdoor associations are redundantly encoded across MLP layers, while attention modules primarily amplify trigger signals without establishing the behavior. Leveraging this insight, we shift the focus from isolating specific backdoor triggers to cutting off the trigger-behavior associations, and design an immunization-inspired elimination approach: by constructing multiple synthetic backdoored variants of the given suspicious model, each trained with different malicious trigger-behavior pairs, and contrasting them with their clean counterparts. The recurring modifications across variants reveal a shared "backdoor signature"-analogous to antigens in a virus. Guided by this signature, we neutralize highly suspicious components in LLM and apply lightweight finetuning to restore its fluency, producing purified models that withstand diverse backdoor attacks and threat models while preserving generative capability.
Authors:Abdullah Ghani, Yash Vekaria, Zubair Shafiq
Abstract:
Tracking pixels are used to optimize online ad campaigns through personalization, re-targeting, and conversion tracking. Past research has primarily focused on detecting the prevalence of tracking pixels on the web, with limited attention to how they are configured across websites. A tracking pixel may be configured differently on different websites. In this paper, we present a differential analysis framework: PixelConfig, to reverse-engineer the configurations of Meta Pixel deployments across the web. Using this framework, we investigate three types of Meta Pixel configurations: activity tracking (i.e., what a user is doing on a website), identity tracking (i.e., who a user is or who the device is associated with), and tracking restrictions (i.e., mechanisms to limit the sharing of potentially sensitive information). Using data from the Internet Archive's Wayback Machine, we analyze and compare Meta Pixel configurations on 18K health-related websites with a control group of the top 10K websites from 2017 to 2024. We find that activity tracking features, such as automatic events that collect button clicks and page metadata, and identity tracking features, such as first-party cookies that are unaffected by third-party cookie blocking, reached adoption rates of up to 98.4%, largely driven by the Pixel's default settings. We also find that the Pixel is being used to track potentially sensitive information, such as user interactions related to booking medical appointments and button clicks associated with specific medical conditions (e.g., erectile dysfunction) on health-related websites. Tracking restriction features, such as Core Setup, are configured on up to 34.3% of health websites and 8.7% of control websites. However, even when enabled, these tracking restriction features provide limited protection and can be circumvented in practice.
Authors:Hassan Wasswa, Hussein Abbass, Timothy Lynar
Abstract:
The increasing incidence of IoT-based botnet attacks has driven interest in advanced learning models for detection. Recent efforts have focused on leveraging attention mechanisms to model long-range feature dependencies and Graph Neural Networks (GNNs) to capture relationships between data instances. Since GNNs require graph-structured input, tabular NetFlow data must be transformed accordingly. This study evaluates how the choice of the method for constructing the graph-structured dataset impacts the classification performance of a GNN model. Five methods--k-Nearest Neighbors, Mutual Nearest Neighbors, Shared Nearest Neighbor, Gabriel Graph, and epsilon-radius Graph--were evaluated in this research. To reduce the computational burden associated with high-dimensional data, a Variational Autoencoder (VAE) is employed to project the original features into a lower-dimensional latent space prior to graph generation. Subsequently, a Graph Attention Network (GAT) is trained on each graph to classify traffic in the N-BaIoT dataset into three categories: Normal, Mirai, and Gafgyt. The results indicate that using Gabriel graph achieves the highest detection performance with an accuracy of 97.56% while SNN recorded the lowest performance with an accuracy as low as 78.56%.
Authors:Mingkai Li, Joseph Devietti, Suman Jana, Tanvir Ahmed Khan
Abstract:
Modern computing is shifting from homogeneous CPU-centric systems to heterogeneous systems with closely integrated CPUs and GPUs. While the CPU software stack has benefited from decades of memory safety hardening, the GPU software stack remains dangerously immature. This discrepancy presents a critical ethical challenge: the world's most advanced AI and scientific workloads are increasingly deployed on vulnerable hardware components. In this paper, we study the key challenges of ensuring memory safety on heterogeneous systems. We show that, while the number of exploitable bugs in heterogeneous systems rises every year, current mitigation methods often rely on unfaithful translations, i.e., converting GPU programs to run on CPUs for testing, which fails to capture the architectural differences between CPUs and GPUs. We argue that the faithfulness of the program behavior is at the core of secure and reliable heterogeneous systems design. To ensure faithfulness, we discuss several design considerations of a GPU-native fuzzing pipeline for CUDA programs.
Authors:Touseef Hasan, Blessing Airehenbuwa, Nitin Pundir, Souvika Sarkar, Ujjwal Guin
Abstract:
Large language models (LLMs) have shown remarkable capabilities in natural language processing tasks, yet their application in hardware security verification remains limited due to scarcity of publicly available hardware description language (HDL) datasets. This knowledge gap constrains LLM performance in detecting vulnerabilities within HDL designs. To address this challenge, we propose SecureRAG-RTL, a novel Retrieval-Augmented Generation (RAG)-based approach that significantly enhances LLM-based security verification of hardware designs. Our approach integrates domain-specific retrieval with generative reasoning, enabling models to overcome inherent limitations in hardware security expertise. We establish baseline vulnerability detection rates using prompt-only methods and then demonstrate that SecureRAG-RTL achieves substantial improvements across diverse LLM architectures, regardless of size. On average, our method increases detection accuracy by about 30%, highlighting its effectiveness in bridging domain knowledge gaps. For evaluation, we curated and annotated a benchmark dataset of 14 HDL designs containing real-world security vulnerabilities, which we will release publicly to support future research. These findings underscore the potential of RAG-driven augmentation to enable scalable, efficient, and accurate hardware security verification workflows.
Authors:Chen Sun, Yash Vekaria, Zubair Shafiq, Rishab Nithyanand
Abstract:
YouTube has evolved into a powerful platform that where creators monetize their influence through affiliate marketing, raising concerns about transparency and ethics, especially when creators fail to disclose their affiliate relationships. Although regulatory agencies like the US Federal Trade Commission (FTC) have issued guidelines to address these issues, non-compliance and consumer harm persist, and the extent of these problems remains unclear. In this paper, we introduce tools, developed with insights from recent advances in Web measurement and NLP research, to examine the state of the affiliate marketing ecosystem on YouTube. We apply these tools to a 10-year dataset of 2 million videos from nearly 540,000 creators, analyzing the prevalence of affiliate marketing on YouTube and the rates of non-compliant behavior. Our findings reveal that affiliate links are widespread, yet dis- closure compliance remains low, with most videos failing to meet FTC standards. Furthermore, we analyze the effects of different stakeholders in improving disclosure behavior. Our study suggests that the platform is highly associated with improved compliance through standardized disclosure features. We recommend that regulators and affiliate partners collaborate with platforms to enhance transparency, accountability, and trust in the influencer economy.
Authors:Amir Al Sadi, Sina Abdollahi, Adrien Ghosn, Hamed Haddadi, Marios Kogias
Abstract:
Confidential computing protects data in use within Trusted Execution Environments (TEEs), but current TEEs provide little support for secure communication between components. As a result, pipelines of independently developed and deployed TEEs must trust one another to avoid the leakage of sensitive information they exchange -- a fragile assumption that is unrealistic for modern cloud workloads. We present Mica, a confidential computing architecture that decouples confidentiality from trust. Mica provides tenants with explicit mechanisms to define, restrict, and attest all communication paths between components, ensuring that sensitive data cannot leak through shared resources or interactions. We implement Mica on Arm CCA using existing primitives, requiring only modest changes to the trusted computing base. Our extension adds a policy language to control and attest communication paths among Realms and with the untrusted world via shared protected and unprotected memory and control transfers. Our evaluation shows that Mica supports realistic cloud pipelines with only a small increase to the trusted computing base while providing strong, attestable confidentiality guarantees.
Authors:Linxi Jiang, Zhijie Liu, Haotian Luo, Zhiqiang Lin
Abstract:
Browser-use agents are widely used for everyday tasks. They enable automated interaction with web pages through structured DOM based interfaces or vision language models operating on page screenshots. However, web pages often change between planning and execution, causing agents to execute actions based on stale assumptions. We view this temporal mismatch as a time of check to time of use (TOCTOU) vulnerability in browser-use agents. Dynamic or adversarial web content can exploit this window to induce unintended actions. We present a large scale empirical study of TOCTOU vulnerabilities in browser-use agents using a benchmark that spans synthesized and real world websites. Using this benchmark, we evaluate 10 popular open source agents and show that TOCTOU vulnerabilities are widespread. We design a lightweight mitigation based on pre-execution validation. It monitors DOM and layout changes during planning and validates the page state immediately before action execution. This approach reduces the risk of insecure execution and mitigates unintended side effects in browser-use agents.
Authors:Chen Bo Calvin Zhang, Christina Q. Knight, Nicholas Kruus, Jason Hausenloy, Pedro Medeiros, Nathaniel Li, Aiden Kim, Yury Orlovskiy, Coleman Breen, Bryce Cai, Jasper Götting, Andrew Bo Liu, Samira Nedungadi, Paula Rodriguez, Yannis Yiming He, Mohamed Shaaban, Zifan Wang, Seth Donoughe, Julian Michael
Abstract:
Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users -- i.e., enable humans to perform better than with internet-only resources. This uncertainty is central to understanding both scientific acceleration and dual-use risk. We conducted a multi-model, multi-benchmark human uplift study comparing novices with LLM access versus internet-only access across eight biosecurity-relevant task sets. Participants worked on complex problems with ample time (up to 13 hours for the most involved tasks). We found that LLM access provided substantial uplift: novices with LLMs were 4.16 times more accurate than controls (95% CI [2.63, 6.87]). On four benchmarks with available expert baselines (internet-only), novices with LLMs outperformed experts on three of them. Perhaps surprisingly, standalone LLMs often exceeded LLM-assisted novices, indicating that users were not eliciting the strongest available contributions from the LLMs. Most participants (89.6%) reported little difficulty obtaining dual-use-relevant information despite safeguards. Overall, LLMs substantially uplift novices on biological tasks previously reserved for trained practitioners, underscoring the need for sustained, interactive uplift evaluations alongside traditional benchmarks.
Authors:Jasmine Bayrooti, Weiwei Kong, Natalia Ponomareva, Carlos Esteves, Ameesh Makadia, Amanda Prorok
Abstract:
Generative models trained on sensitive image datasets risk memorizing and reproducing individual training examples, making strong privacy guarantees essential. While differential privacy (DP) provides a principled framework for such guarantees, standard DP finetuning (e.g., with DP-SGD) often results in severe degradation of image quality, particularly in high-frequency textures, due to the indiscriminate addition of noise across all model parameters. In this work, we propose a spectral DP framework based on the hypothesis that the most privacy-sensitive portions of an image are often low-frequency components in the wavelet space (e.g., facial features and object shapes) while high-frequency components are largely generic and public. Based on this hypothesis, we propose the following two-stage framework for DP image generation with coarse image intermediaries: (1) DP finetune an autoregressive spectral image tokenizer model on the low-resolution wavelet coefficients of the sensitive images, and (2) perform high-resolution upsampling using a publicly pretrained super-resolution model. By restricting the privacy budget to the global structures of the image in the first stage, and leveraging the post-processing property of DP for detail refinement, we achieve promising trade-offs between privacy and utility. Experiments on the MS-COCO and MM-CelebA-HQ datasets show that our method generates images with improved quality and style capture relative to other leading DP image frameworks.
Authors:Osama Zafar, Shaojie Zhan, Tianxi Ji, Erman Ayday
Abstract:
In recent years, the widespread adoption of Machine Learning as a Service (MLaaS), particularly in sensitive environments, has raised considerable privacy concerns. Of particular importance are membership inference attacks (MIAs), which exploit behavioral discrepancies between training and non-training data to determine whether a specific record was included in the model's training set, thereby presenting significant privacy risks. Although existing defenses, such as adversarial regularization, DP-SGD, and MemGuard, assist in mitigating these threats, they often entail trade-offs such as compromising utility, increased computational requirements, or inconsistent protection against diverse attack vectors. In this paper, we introduce a novel inference-time defense mechanism called Neighborhood Blending, which mitigates MIAs without retraining the model or incurring significant computational overhead. Our approach operates post-training by smoothing the model's confidence outputs based on the neighborhood of a queried sample. By averaging predictions from similar training samples selected using differentially private sampling, our method establishes a consistent confidence pattern, rendering members and non-members indistinguishable to an adversary while maintaining high utility. Significantly, Neighborhood Blending maintains label integrity (zero label loss) and ensures high utility through an adaptive, "pay-as-you-go" distortion strategy. It is a model-agnostic approach that offers a practical, lightweight solution that enhances privacy without sacrificing model utility. Through extensive experiments across diverse datasets and models, we demonstrate that our defense significantly reduces MIA success rates while preserving model performance, outperforming existing post-hoc defenses like MemGuard and training-time techniques like DP-SGD in terms of utility retention.
Authors:Merve Gülmez, Ruben Sturm, Hossam ElAtali, Håkan Englund, Jonathan Woodruff, N. Asokan, Thomas Nyman
Abstract:
While the CHERI instruction-set architecture extensions for capabilities enable strong spatial memory safety, CHERI lacks built-in temporal safety, particularly for heap allocations. Prior attempts to augment CHERI with temporal safety fall short in terms of scalability, memory overhead, and incomplete security guarantees due to periodical sweeps of the system's memory to individually revoke stale capabilities. We address these limitations by introducing colored capabilities that add a controlled form of indirection to CHERI's capability model. This enables provenance tracking of capabilities to their respective allocations via a hardware-managed provenance-validity table, allowing bulk retraction of dangling pointers without needing to quarantine freed memory. Colored capabilities significantly reduce the frequency of capability revocation sweeps while improving security. We realize colored capabilities in PICASSO, an extension of the CHERI-RISC-V architecture on a speculative out-of-order FPGA softcore (CHERI-Toooba). We also integrate colored-capability support into the CheriBSD OS and CHERI-enabled Clang/LLVM toolchain. Our evaluation shows effective mitigation of use-after-free and double-free bugs across all heap-based temporal memory-safety vulnerabilities in NIST Juliet test cases, with only a small performance overhead on SPEC CPU benchmarks (5% g.m.), less latency, and more consistent performance in long-running SQLite, PostgreSQL, and gRPC workloads compared to prior work.
Authors:Hedong Zhang, Neusha Javidnia, Shweta Pardeshi, Qian Lou, Farinaz Koushanfar
Abstract:
The widespread deployment of cloud-hosted generative models raises a fundamental challenge: enabling efficient autoregressive generation while preserving the privacy of both user prompts and model parameters in untrusted environments. We address this challenge in a client-server setting where an untrusted server hosts an autoregressive Transformer and the client requires cryptographic protection for both inputs and inference. We present CryptoGen, the first system to enable scalable privacy-preserving neural generation with persistent encrypted key-value (KV) cache reuse. Discriminative-task secure inference systems incur quadratic latency and memory growth when adapted to autoregressive decoding due to the lack of native encrypted KV-cache support. In contrast, CryptoGen achieves near-linear scaling by securely reusing and updating encrypted KV caches throughout generation. CryptoGen integrates homomorphic encryption and secret sharing to support both prefilling and generation. Key techniques include a unified encrypted KV-cache framework, heterogeneous SIMD encodings for different phases, optimized cipher-cipher matrix-matrix and matrix-vector operations, and efficient noise refresh and ciphertext concatenation mechanisms. Evaluation on generative Transformer models trained on WikiText-2, PTB, and LAMBADA shows that for input lengths of 128-512 tokens, CryptoGen achieves 4.4x-7.6x lower per-token latency than state-of-the-art discriminative secure inference systems, while maintaining near-linear latency and memory scaling, with advantages increasing for longer sequences. CryptoGen is released as an open-source library.
Authors:Yanshu Wang, Shuaishuai Yang, Jingjing He, Tong Yang
Abstract:
Large Language Models (LLMs) face increasing threats from jailbreak attacks that bypass safety alignment. While prompt-based defenses such as Role-Oriented Prompts (RoP) and Task-Oriented Prompts (ToP) have shown effectiveness, the role of few-shot demonstrations in these defense strategies remains unclear. Prior work suggests that few-shot examples may compromise safety, but lacks investigation into how few-shot interacts with different system prompt strategies. In this paper, we conduct a comprehensive evaluation on multiple mainstream LLMs across four safety benchmarks (AdvBench, HarmBench, SG-Bench, XSTest) using six jailbreak attack methods. Our key finding reveals that few-shot demonstrations produce opposite effects on RoP and ToP: few-shot enhances RoP's safety rate by up to 4.5% through reinforcing role identity, while it degrades ToP's effectiveness by up to 21.2% through distracting attention from task instructions. Based on these findings, we provide practical recommendations for deploying prompt-based defenses in real-world LLM applications.
Authors:Xiaozuo Shen, Yifei Cai, Rui Ning, Chunsheng Xin, Hongyi Wu
Abstract:
The widespread adoption of Vision Transformers (ViTs) elevates supply-chain risk on third-party model hubs, where an adversary can implant backdoors into released checkpoints. Existing ViT backdoor attacks largely rely on poisoned-data training, while prior data-free attempts typically require synthetic-data fine-tuning or extra model components. This paper introduces Data-Free Logic-Gated Backdoor Attacks (DF-LoGiT), a truly data-free backdoor attack on ViTs via direct weight editing. DF-LoGiT exploits ViT's native multi-head architecture to realize a logic-gated compositional trigger, enabling a stealthy and effective backdoor. We validate its effectiveness through theoretical analysis and extensive experiments, showing that DF-LoGiT achieves near-100% attack success with negligible degradation in benign accuracy and remains robust against representative classical and ViT-specific defenses.
Authors:Patrick Cooper, Alireza Nadali, Ashutosh Trivedi, Alvaro Velasquez
Abstract:
Large language models (LLMs) are known to exhibit brittle behavior under adversarial prompts and jailbreak attacks, even after extensive alignment and fine-tuning. This fragility reflects a broader challenge of modern neural language models: small, carefully structured perturbations in high-dimensional input spaces can induce large and unpredictable changes in internal semantic representations and output. We investigate monotonicity as an architectural inductive bias for improving the robustness of Transformer-based language models. Monotonicity constrains semantic transformations so that strengthening information, evidence, or constraints cannot lead to regressions in the corresponding internal representations. Such order-preserving behavior has long been exploited in control and safety-critical systems to simplify reasoning and improve robustness, but has traditionally been viewed as incompatible with the expressivity required by neural language models. We show that this trade-off is not inherent. By enforcing monotonicity selectively in the feed-forward sublayers of sequence-to-sequence Transformers -- while leaving attention mechanisms unconstrained -- we obtain monotone language models that preserve the performance of their pretrained counterparts. This architectural separation allows negation, contradiction, and contextual interactions to be introduced explicitly through attention, while ensuring that subsequent semantic refinement is order-preserving. Empirically, monotonicity substantially improves robustness: adversarial attack success rates drop from approximately 69% to 19%, while standard summarization performance degrades only marginally.
Authors:Terry Yue Zhuo, Yangruibo Ding, Wenbo Guo, Ruijie Meng
Abstract:
For over a decade, cybersecurity has relied on human labor scarcity to limit attackers to high-value targets manually or generic automated attacks at scale. Building sophisticated exploits requires deep expertise and manual effort, leading defenders to assume adversaries cannot afford tailored attacks at scale. AI agents break this balance by automating vulnerability discovery and exploitation across thousands of targets, needing only small success rates to remain profitable. Current developers focus on preventing misuse through data filtering, safety alignment, and output guardrails. Such protections fail against adversaries who control open-weight models, bypass safety controls, or develop offensive capabilities independently. We argue that AI-agent-driven cyber attacks are inevitable, requiring a fundamental shift in defensive strategy. In this position paper, we identify why existing defenses cannot stop adaptive adversaries and demonstrate that defenders must develop offensive security intelligence. We propose three actions for building frontier offensive AI capabilities responsibly. First, construct comprehensive benchmarks covering the full attack lifecycle. Second, advance from workflow-based to trained agents for discovering in-wild vulnerabilities at scale. Third, implement governance restricting offensive agents to audited cyber ranges, staging release by capability tier, and distilling findings into safe defensive-only agents. We strongly recommend treating offensive AI capabilities as essential defensive infrastructure, as containing cybersecurity risks requires mastering them in controlled settings before adversaries do.
Authors:Loes Kruger, Paul Kobialka, Andrea Pferscher, Einar Broch Johnsen, Sebastian Junges, Jurriaan Rot
Abstract:
Network protocol fingerprinting is used to identify a protocol implementation by analyzing its input-output behavior. Traditionally, fingerprinting operates under a closed-world assumption, where models of all implementations are assumed to be available. However, this assumption is unrealistic in practice. When this assumption does not hold, fingerprinting results in numerous misclassifications without indicating that a model for an implementation is missing. Therefore, we introduce an open-world variant of the fingerprinting problem, where not all models are known in advance. We propose an incremental fingerprinting approach to solve the problem by combining active automata learning with closed-world fingerprinting. Our approach quickly determines whether the implementation under consideration matches an available model using fingerprinting and conformance checking. If no match is found, it learns a new model by exploiting the structure of available models. We prove the correctness of our approach and improvements in asymptotic complexity compared to naive baselines. Moreover, experimental results on a variety of protocols demonstrate a significant reduction in misclassifications and interactions with these black-boxes.
Authors:Yifei Cai, Yizhou Feng, Qiao Zhang, Chunsheng Xin, Hongyi Wu
Abstract:
Privacy-preserving deep learning addresses privacy concerns in Machine Learning as a Service (MLaaS) by using Homomorphic Encryption (HE) for linear computations. However, the computational overhead remains a major challenge. While prior work has improved efficiency, most approaches build on models originally designed for plaintext inference. Such models incur architectural inefficiencies when adapted to HE. We argue that substantial gains require networks tailored to HE rather than retrofitting plaintext architectures. Our design has two components: the building block and the overall architecture. First, StriaBlock targets the most expensive HE operation, rotation. It integrates ExRot-Free Convolution and a novel Cross Kernel, eliminating external rotations and requiring only 19% of the internal rotations used by plaintext models. Second, our architectural principles include (i) the Focused Constraint Principle, which limits cost-sensitive factors while preserving flexibility elsewhere, and (ii) the Channel Packing-Aware Scaling Principle, which adapts bottleneck ratios to ciphertext channel capacity that varies with depth. Together, these strategies control both local and end-to-end HE cost, enabling a balanced HE-tailored network. We evaluate the resulting StriaNet across datasets of varying scales, including ImageNet, Tiny ImageNet, and CIFAR-10. At comparable accuracy, StriaNet achieves speedups of 9.78x, 6.01x, and 9.24x on ImageNet, Tiny ImageNet, and CIFAR-10, respectively.
Authors:Monika Santra, Bokai Zhang, Mark Lim, Vishnu Asutosh Dasu, Dongrui Zeng, Gang Tan
Abstract:
Indirect call resolution remains a key challenge in reverse engineering and control-flow graph recovery, especially for stripped or optimized binaries. Static analysis is sound but often over-approximates, producing many false positives, whereas machine-learning approaches can improve precision but may sacrifice completeness and generalization. We present iResolveX, a hybrid multi-layered framework that combines conservative static analysis with learning-based refinement. The first layer applies a conservative value-set analysis (BPA) to ensure high recall. The second layer adds a learning-based soft-signature scorer (iScoreGen) and selective inter-procedural backward analysis with memory inspection (iScoreRefine) to reduce false positives. The final output, p-IndirectCFG, annotates indirect edges with confidence scores, enabling downstream analyses to choose appropriate precision--recall trade-offs. Across SPEC CPU2006 and real-world binaries, iScoreGen reduces predicted targets by 19.2% on average while maintaining BPA-level recall (98.2%). Combined with iScoreRefine, the total reduction reaches 44.3% over BPA with 97.8% recall (a 0.4% drop). iResolveX supports both conservative, recall-preserving and F1-optimized configurations and outperforms state-of-the-art systems.
Authors:Jiazhu Xie, Bowen Li, Heyu Fu, Chong Gao, Ziqi Xu, Fengling Han
Abstract:
Large Language Model (LLM)-based question-answering systems offer significant potential for automating customer support and internal knowledge access in small businesses, yet their practical deployment remains challenging due to infrastructure costs, engineering complexity, and security risks, particularly in retrieval-augmented generation (RAG)-based settings. This paper presents an industry case study of an open-source, multi-tenant platform that enables small businesses to deploy customised LLM-based support chatbots via a no-code workflow. The platform is built on distributed, lightweight k3s clusters spanning heterogeneous, low-cost machines and interconnected through an encrypted overlay network, enabling cost-efficient resource pooling while enforcing container-based isolation and per-tenant data access controls. In addition, the platform integrates practical, platform-level defences against prompt injection attacks in RAG-based chatbots, translating insights from recent prompt injection research into deployable security mechanisms without requiring model retraining or enterprise-scale infrastructure. We evaluate the proposed platform through a real-world e-commerce deployment, demonstrating that secure and efficient LLM-based chatbot services can be achieved under realistic cost, operational, and security constraints faced by small businesses.
Authors:Nay Myat Min, Long H. Pham, Hongyu Zhang, Jun Sun
Abstract:
Single-pass hallucination detectors rely on internal telemetry (e.g., uncertainty, hidden-state geometry, and attention) of large language models, implicitly assuming hallucinations leave separable traces in these signals. We study a white-box, model-side adversary that fine-tunes lightweight LoRA adapters on the model while keeping the detector fixed, and introduce CORVUS, an efficient red-teaming procedure that learns to camouflage detector-visible telemetry under teacher forcing, including an embedding-space FGSM attention stress test. Trained on 1,000 out-of-distribution Alpaca instructions (<0.5% trainable parameters), CORVUS transfers to FAVA-Annotation across Llama-2, Vicuna, Llama-3, and Qwen2.5, and degrades both training-free detectors (e.g., LLM-Check) and probe-based detectors (e.g., SEP, ICR-probe), motivating adversary-aware auditing that incorporates external grounding or cross-model evidence.
Authors:Arth Bhardwaj, Nirav Diwan, Gang Wang
Abstract:
Web scraping has historically required technical expertise in HTML parsing, session management, and authentication circumvention, which limited large-scale data extraction to skilled developers. We argue that large language models (LLMs) have democratized web scraping, enabling low-skill users to execute sophisticated operations through simple natural language prompts. While extensive benchmarks evaluate these tools under optimal expert conditions, we show that without extensive manual effort, current LLM-based workflows allow novice users to scrape complex websites that would otherwise be inaccessible. We systematically benchmark what everyday users can do with off-the-shelf LLM tools across 35 sites spanning five security tiers, including authentication, anti-bot, and CAPTCHA controls. We devise and evaluate two distinct workflows: (a) LLM-assisted scripting, where users prompt LLMs to generate traditional scraping code but maintain manual execution control, and (b) end-to-end LLM agents, which autonomously navigate and extract data through integrated tool use. Our results demonstrate that end-to-end agents have made complex scraping accessible - requiring as little as a single prompt with minimal refinement (less than 5 changes) to complete workflows. We also highlight scenarios where LLM-assisted scripting may be simpler and faster for static sites. In light of these findings, we provide simple procedures for novices to use these workflows and gauge what adversaries could achieve using these.
Authors:Wiebe Vandendriessche, Jordi Thijsman, Laurens D'hooge, Bruno Volckaert, Merlijn Sebrechts
Abstract:
The rapid adoption of complex AI systems has outpaced the development of tools to ensure their transparency, security, and regulatory compliance. In this paper, the AI Bill of Materials (AIBOM), an extension of the Software Bill of Materials (SBOM), is introduced as a standardized, verifiable record of trained AI models and their environments. Our proof-of-concept platform, AIBoMGen, automates the generation of signed AIBOMs by capturing datasets, model metadata, and environment details during training. The training platform acts as a neutral, third-party observer and root of trust. It enforces verifiable AIBOM creation for every job. The system uses cryptographic hashing, digital signatures, and in-toto attestations to ensure integrity and protect against threats such as artifact tampering by dishonest model creators. Our evaluation demonstrates that AIBoMGen reliably detects unauthorized modifications to all artifacts and can generate AIBOMs with negligible performance overhead. These results highlight the potential of AIBoMGen as a foundational step toward building secure and transparent AI ecosystems, enabling compliance with regulatory frameworks like the EUs AI Act.
Authors:Firas Ben Hmida, Zain Sbeih, Philemon Hailemariam, Birhanu Eshete
Abstract:
Machine learning (ML) explainability is central to algorithmic transparency in high-stakes settings such as predictive diagnostics and loan approval. However, these same domains require rigorous privacy guaranties, creating tension between interpretability and privacy. Although prior work has shown that explanation methods can leak membership information, practitioners still lack systematic guidance on selecting or deploying explanation techniques that balance transparency with privacy. We present DeepLeak, a system to audit and mitigate privacy risks in post-hoc explanation methods. DeepLeak advances the state-of-the-art in three ways: (1) comprehensive leakage profiling: we develop a stronger explanation-aware membership inference attack (MIA) to quantify how much representative explanation methods leak membership information under default configurations; (2) lightweight hardening strategies: we introduce practical, model-agnostic mitigations, including sensitivity-calibrated noise, attribution clipping, and masking, that substantially reduce membership leakage while preserving explanation utility; and (3) root-cause analysis: through controlled experiments, we pinpoint algorithmic properties (e.g., attribution sparsity and sensitivity) that drive leakage. Evaluating 15 explanation techniques across four families on image benchmarks, DeepLeak shows that default settings can leak up to 74.9% more membership information than previously reported. Our mitigations cut leakage by up to 95% (minimum 46.5%) with only <=3.3% utility loss on average. DeepLeak offers a systematic, reproducible path to safer explainability in privacy-sensitive ML.
Authors:Yuqiao Xu, Mina Namazi, Sahith Reddy Jalapally, Osama Zafar, Youngjin Yoo, Erman Ayday
Abstract:
Learning and Employment Record (LER) systems are emerging as critical infrastructure for securely compiling and sharing educational and work achievements. Existing blockchain-based platforms leverage verifiable credentials but typically lack automated skill-credential generation and the ability to incorporate unstructured evidence of learning. In this paper,a privacy-preserving, AI-enabled decentralized LER system is proposed to address these gaps. Digitally signed transcripts from educational institutions are accepted, and verifiable self-issued skill credentials are derived inside a trusted execution environment (TEE) by a natural language processing pipeline that analyzes formal records (e.g., transcripts, syllabi) and informal artifacts. All verification and job-skill matching are performed inside the enclave with selective disclosure, so raw credentials and private keys remain enclave-confined. Job matching relies solely on attested skill vectors and is invariant to non-skill resume fields, thereby reducing opportunities for screening bias.The NLP component was evaluated on sample learner data; the mapping follows the validated Syllabus-to-O*NET methodology,and a stability test across repeated runs observed <5% variance in top-ranked skills. Formal security statements and proof sketches are provided showing that derived credentials are unforgeable and that sensitive information remains confidential. The proposed system thus supports secure education and employment credentialing, robust transcript verification,and automated, privacy-preserving skill extraction within a decentralized framework.
Authors:Maryam Mahdi Alhusseini, Alireza Rouhi, Mohammad-Reza Feizi-Derakhshi
Abstract:
Cybersecurity poses considerable problems to Cloud Computing (CC), especially regarding Intrusion Detection Systems (IDSs), facing difficulties with skewed datasets and suboptimal classification model performance. This study presents the Hybrid Intrusion Detection System (HyIDS), an innovative IDS that employs the Energy Valley Optimizer (EVO) for Feature Selection (FS). Additionally, it introduces a novel technique for enhancing the cybersecurity of cloud computing through the integration of machine learning methodologies with the EVO Algorithm. The Energy Valley Optimizer (EVO) effectively diminished features in the CIC-DDoS2019 dataset from 88 to 38 and in the CSE-CIC-IDS2018 data from 80 to 43, significantly enhancing computing efficiency. HyIDS incorporates four Machine Learning (ML) models: Support Vector Machine (SVM), Random Forest (RF), Decision Tree (D_Tree), and K-Nearest Neighbors (KNN). The proposed HyIDS was assessed utilizing two real-world intrusion datasets, CIC-DDoS2019 and CSE-CIC-IDS2018, both distinguished by considerable class imbalances. The CIC-DDoS2019 dataset has a significant imbalance between DDoS assault samples and legal traffic, while the CSE-CIC-IDS2018 dataset primarily comprises benign traffic with insufficient representation of attack types, complicating the detection of minority attacks. A downsampling technique was employed to balance the datasets, hence improving detection efficacy for both benign and malicious traffic. Twenty-four trials were done, revealing substantial enhancements in categorization accuracy, precision, and recall. Our suggested D_TreeEVO model attained an accuracy rate of 99.13% and an F1 score of 98.94% on the CIC-DDoS2019 dataset, and an accuracy rate of 99.78% and an F1 score of 99.70% on the CSE-CIC-IDS2018 data. These data demonstrate that EVO significantly improves cybersecurity in Cloud Computing (CC).
Authors:Andrés Fábrega, James Austgen, Samuel Breckenridge, Jay Yu, Amy Zhao, Sarah Allen, Aditya Saraf, Ari Juels
Abstract:
Collective Investment Algorithms (CoinAlgs) are increasingly popular systems that deploy shared trading strategies for investor communities. Their goal is to democratize sophisticated -- often AI-based -- investing tools. We identify and demonstrate a fundamental profitability-fairness tradeoff in CoinAlgs that we call the CoinAlg Bind: CoinAlgs cannot ensure economic fairness without losing profit to arbitrage. We present a formal model of CoinAlgs, with definitions of privacy (incomplete algorithm disclosure) and economic fairness (value extraction by an adversarial insider). We prove two complementary results that together demonstrate the CoinAlg Bind. First, privacy in a CoinAlg is a precondition for insider attacks on economic fairness. Conversely, in a game-theoretic model, lack of privacy, i.e., transparency, enables arbitrageurs to erode the profitability of a CoinAlg. Using data from Uniswap, a decentralized exchange, we empirically study both sides of the CoinAlg Bind. We quantify the impact of arbitrage against transparent CoinAlgs. We show the risks posed by a private CoinAlg: Even low-bandwidth covert-channel information leakage enables unfair value extraction.
Authors:Mohammed Latif Siddiq, Xinye Zhao, Vinicius Carvalho Lopes, Beatrice Casey, Joanna C. S. Santos
Abstract:
Autonomous coding agents are increasingly deployed as AI teammates in modern software engineering, independently authoring pull requests (PRs) that modify production code at scale. This study aims to systematically characterize how autonomous coding agents contribute to software security in practice, how these security-related contributions are reviewed and accepted, and which observable signals are associated with PR rejection. We conduct a large-scale empirical analysis of agent-authored PRs using the AIDev dataset, comprising of over 33,000 curated PRs from popular GitHub repositories. Security-relevant PRs are identified using a keyword filtering strategy, followed by manual validation, resulting in 1,293 confirmed security-related agentic-PRs. We then analyze prevalence, acceptance outcomes, and review latency across autonomous agents, programming ecosystems, and types of code changes. Moreover, we apply qualitative open coding to identify recurring security-related actions and underlying intents, and examine review metadata to identify early signals associated with PR rejection. Security-related Agentic-PRs constitute a meaningful share of agent activity (approximately 4\%). Rather than focusing solely on narrow vulnerability fixes, agents most frequently perform supportive security hardening activities, including testing, documentation, configuration, and improved error handling. Compared to non-security PRs, security-related Agentic-PRs exhibit lower merge rates and longer review latency, reflecting heightened human scrutiny, with variation across agents and programming ecosystems. PR rejection is more strongly associated with PR complexity and verbosity than with explicit security topics.
Authors:Kemal Mutluergil, Deniz Elbek, Kamer Kaya, Erkay Savaş
Abstract:
Homomorphic encryption (HE) enables computation over encrypted data but incurs a substantial overhead. For sparse-matrix vector multiplication, the widely used Halevi and Shoup (2014) scheme has a cost linear in the number of occupied cyclic diagonals, which may be many due to the irregular nonzero pattern of the matrix. In this work, we study how to permute the rows and columns of a sparse matrix so that its nonzeros are packed into as few cyclic diagonals as possible. We formalise this as the two-dimensional diagonal packing problem (2DPP), introduce the two-dimensional circular bandsize metric, and give an integer programming formulation that yields optimal solutions for small instances. For large matrices, we propose practical ordering heuristics that combine graph-based initial orderings - based on bandwidth reduction, anti-bandwidth maximisation, and spectral analysis - and an iterative-improvement-based optimization phase employing 2OPT and 3OPT swaps. We also introduce a dense row/column elimination strategy and an HE-aware cost model that quantifies the benefits of isolating dense structures. Experiments on 175 sparse matrices from the SuiteSparse collection show that our ordering-optimisation variants can reduce the diagonal count by $5.5\times$ on average ($45.6\times$ for one instance). In addition, the dense row/column elimination approach can be useful for cases where the proposed permutation techniques are not sufficient; for instance, in one case, the additional elimination helped to reduce the encrypted multiplication cost by $23.7\times$ whereas without elimination, the improvement was only $1.9\times$.
Authors:Aobo Chen, Chenxu Zhao, Chenglin Miao, Mengdi Huai
Abstract:
Large language models (LLMs) possess strong semantic understanding, driving significant progress in data mining applications. This is further enhanced by large reasoning models (LRMs), which provide explicit multi-step reasoning traces. On the other hand, the growing need for the right to be forgotten has driven the development of machine unlearning techniques, which aim to eliminate the influence of specific data from trained models without full retraining. However, unlearning may also introduce new security vulnerabilities by exposing additional interaction surfaces. Although many studies have investigated unlearning attacks, there is no prior work on LRMs. To bridge the gap, we first in this paper propose LRM unlearning attack that forces incorrect final answers while generating convincing but misleading reasoning traces. This objective is challenging due to non-differentiable logical constraints, weak optimization effect over long rationales, and discrete forget set selection. To overcome these challenges, we introduce a bi-level exact unlearning attack that incorporates a differentiable objective function, influential token alignment, and a relaxed indicator strategy. To demonstrate the effectiveness and generalizability of our attack, we also design novel optimization frameworks and conduct comprehensive experiments in both white-box and black-box settings, aiming to raise awareness of the emerging threats to LRM unlearning pipelines.
Authors:Ashish Kundu, Ramana Kompella
Abstract:
As quantum computing matures toward the realization of Cryptographically Relevant Quantum Computers (CRQC), global cryptographic infrastructure faces an existential threat. This paper introduces a two-dimensional coordinate system to map the co-evolution of cryptographic resilience (x-axis) and computational capability (y-axis). By analyzing the four resulting quadrants, we categorize the transition from legacy classical systems to quantum-resilient architectures. We argue that the "Quantum Gap" - the delta between CRQC arrival and quantum-safe adoption represents the highest systemic risk, necessitating an immediate transition to crypto-agile frameworks.
Authors:Jiaqing Li, Zhibo Zhang, Shide Zhou, Yuxi Li, Tianlong Yu, Kailong Wang
Abstract:
Model merging has emerged as a powerful technique for combining specialized capabilities from multiple fine-tuned LLMs without additional training costs. However, the security implications of this widely-adopted practice remain critically underexplored. In this work, we reveal that model merging introduces a novel attack surface that can be systematically exploited to compromise safety alignment. We present TrojanMerge,, a framework that embeds latent malicious components into source models that remain individually benign but produce severely misaligned models when merged. Our key insight is formulating this attack as a constrained optimization problem: we construct perturbations that preserve source model safety through directional consistency constraints, maintain capabilities via Frobenius directional alignment constraints, yet combine during merging to form pre-computed attack vectors. Extensive experiments across 9 LLMs from 3 model families demonstrate that TrojanMerge, consistently achieves high harmful response rates in merged models while source models maintain safety scores comparable to unmodified versions. Our attack succeeds across diverse merging algorithms and remains effective under various hyperparameter configurations. These findings expose fundamental vulnerabilities in current model merging practices and highlight the urgent need for security-aware mechanisms.
Authors:Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh
Abstract:
AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We articulate three positions: (1) dynamic replanning and security policy updates are often necessary for dynamic tasks and realistic environments; (2) certain context-dependent security decisions would still require LLMs (or other learned models), but should only be made within system designs that strictly constrain what the model can observe and decide; (3) in inherently ambiguous cases, personalization and human interaction should be treated as core design considerations. In addition to our main positions, we discuss limitations of existing benchmarks that can create a false sense of utility and security. We also highlight the value of system-level defenses, which serve as the skeleton of agentic systems by structuring and controlling agent behaviors, integrating rule-based and model-based security checks, and enabling more targeted research on model robustness and human interaction.
Authors:Zhuoran Tan, Jeremy Singer, Christos Anagnostopoulos
Abstract:
Modern Large Language Model (LLM) systems are assembled from third-party artifacts such as pre-trained weights, fine-tuning adapters, datasets, dependency packages, and container images, fetched through automated pipelines. This speed comes with supply-chain risks, including compromised dependencies, malicious hub artifacts, unsafe deserialization, forged provenance, and backdoored models. A core gap is that training and release claims (e.g., data and code lineage, build environment, and security scanning results) are rarely cryptographically bound to the artifacts they describe, making enforcement inconsistent across teams and stages. We propose an attestation-aware promotion gate: before an artifact is admitted into trusted environments (training, fine-tuning, deployment), the gate verifies claim evidence, enforces safe loading and static scanning policies, and applies secure-by-default deployment constraints. When organizations operate runtime security tooling, the same gate can optionally ingest standardized dynamic signals via plugins to reduce uncertainty for high-risk artifacts. We outline a practical claims-to-controls mapping and an evaluation blueprint using representative supply-chain scenarios and operational metrics (coverage and decisions), charting a path toward a full research paper.
Authors:Sushil Kumar, Soumya P. Dash, George C. Alexandropoulos
Abstract:
A multiple-input multiple-output (MIMO) system operating at terahertz (THz) frequencies and consisting of a transmitter, Alice, that encodes secret keys using Gaussian-modulated coherent states, which are communicated to a legitimate receiver, Bob, under the assistance of a reconfigurable intelligent surface (RIS) is considered in this paper. The composite wireless channel comprising the direct Alice-to-Bob signal propagation path and the RIS-enabled reflected one is modeled as a passive linear Gaussian quantum channel, allowing for a unitary dilation that preserves the canonical commutation relations. The security of the considered RIS-empowered MIMO system is analyzed under collective Gaussian entangling attacks, according to which an eavesdropper, Eve, is assumed to have access to environmental modes associated with specific propagation segments. We also study, as a benchmark, the case where Eve has access to the purification of the overall channel. The legitimate receiver, Bob, is designed to deploy homodyne detection and reverse reconciliation for key extraction. Novel expressions for the achievable secret key rate (SKR) of the system are derived for both the considered eavesdropping scenarios. Furthermore, an optimization framework is developed to determine the optimal RIS phase configuration matrix that maximizes the SKR performance. The resulting optimization problem is efficiently solved using particle swarm optimization. Numerical results are presented to demonstrate the system's performance with respect to various free parameters. It is showcased that the considered RIS plays a crucial role in enhancing the SKR of the system as well as in extending the secure communication range. This establishes RIS-assisted THz MIMO CV-QKD as a promising solution for next generation secure wireless networks.
Authors:Minjia Shi, Xuan Wang, Bouazzaoui Zakariae, Jon-Lark Kim, Patrick Solé
Abstract:
Wall-Sun-Sun primes (shortly WSS primes) are defined as those primes $p$ such that the period of the Fibonacci recurrence is the same modulo $p$ and modulo $p^2.$ This concept has been generalized recently to certain second order recurrences whose characteristic polynomials admit as a zero the principal unit of $\mathbb{Q}(\sqrt{d}),$ for some integer $d>0.$ Primes of the latter type we call $WSS(d).$ They correspond to the case when $\mathbb{Q}(\sqrt{d})$ is not $p$-rational. For such a prime $p$ we study the weight distributions of the cyclic codes over $\mathbb{F}_p$ and $\mathbb{Z}_{p^2}$ whose check polynomial is the reciprocal of the said characteristic polynomial. Some of these codes are MDS (reducible case) or NMDS (irreducible case).
Authors:Toluwani Aremu, Daniil Ognev, Samuele Poppi, Nils Lukas
Abstract:
Large language models (LLMs) can be misused to reveal sensitive information, such as weapon-making instructions or writing malware. LLM providers rely on $\emph{monitoring}$ to detect and flag unsafe behavior during inference. An open security challenge is $\emph{adaptive}$ adversaries who craft attacks that simultaneously (i) evade detection while (ii) eliciting unsafe behavior. Adaptive attackers are a major concern as LLM providers cannot patch their security mechanisms, since they are unaware of how their models are being misused. We cast $\emph{robust}$ LLM monitoring as a security game, where adversaries who know about the monitor try to extract sensitive information, while a provider must accurately detect these adversarial queries at low false positive rates. Our work (i) shows that existing LLM monitors are vulnerable to adaptive attackers and (ii) designs improved defenses through $\emph{activation watermarking}$ by carefully introducing uncertainty for the attacker during inference. We find that $\emph{activation watermarking}$ outperforms guard baselines by up to $52\%$ under adaptive attackers who know the monitoring algorithm but not the secret key.
Authors:Harish Karthikeyan, Antigoni Polychroniadou
Abstract:
Privacy-preserving aggregation is a cornerstone for AI systems that learn from distributed data without exposing individual records, especially in federated learning and telemetry. Existing two-server protocols (e.g., Prio and successors) set a practical baseline by validating inputs while preventing any single party from learning users' values, but they impose symmetric costs on both servers and communication that scales with the per-client input dimension $L$. Modern learning tasks routinely involve dimensionalities $L$ in the tens to hundreds of millions of model parameters. We present TAPAS, a two-server asymmetric private aggregation scheme that addresses these limitations along four dimensions: (i) no trusted setup or preprocessing, (ii) server-side communication that is independent of $L$ (iii) post-quantum security based solely on standard lattice assumptions (LWE, SIS), and (iv) stronger robustness with identifiable abort and full malicious security for the servers. A key design choice is intentional asymmetry: one server bears the $O(L)$ aggregation and verification work, while the other operates as a lightweight facilitator with computation independent of $L$. This reduces total cost, enables the secondary server to run on commodity hardware, and strengthens the non-collusion assumption of the servers. One of our main contributions is a suite of new and efficient lattice-based zero-knowledge proofs; to our knowledge, we are the first to establish privacy and correctness with identifiable abort in the two-server setting.
Authors:Ján Mikulec, Jakub Breier, Xiaolu Hou
Abstract:
Test Vector Leakage Assessment (TVLA) based on Welch's $t$-test has become a standard tool for detecting side-channel leakage. However, its mean-based nature can limit sensitivity when leakage manifests primarily through higher-order distributional differences. As our experiments show, this property becomes especially crucial when it comes to evaluating neural network implementations. In this work, we propose Anderson--Darling Leakage Assessment (ADLA), a leakage detection framework that applies the two-sample Anderson--Darling test for leakage detection. Unlike TVLA, ADLA tests equality of the full cumulative distribution functions and does not rely on a purely mean-shift model. We evaluate ADLA on a multilayer perceptron (MLP) trained on MNIST and implemented on a ChipWhisperer-Husky evaluation platform. We consider protected implementations employing shuffling and random jitter countermeasures. Our results show that ADLA can provide improved leakage-detection sensitivity in protected implementations for a low number of traces compared to TVLA.
Authors:Amal Raj, Vivek Balachandran
Abstract:
Quantum computing often requires classical data to be supplied to execution environments that may not be fully trusted or isolated. While encryption protects data at rest and in transit, it provides limited protection once computation begins, when classical values are encoded into quantum registers. This paper explores data obfuscation for protecting classical values during quantum computation. To the best of our knowledge, we present the first explicit data obfuscation technique designed to protect classical values during quantum execution. We propose an obfuscation technique that encodes sensitive data into structured quantum representations across multiple registers, avoiding direct exposure while preserving computational usability. Reversible quantum operations and amplitude amplification allow selective recovery of valid encodings without revealing the underlying data. We evaluate the feasibility of the proposed method through simulation and analyze its resource requirements and practical limitations. Our results highlight data obfuscation as a complementary security primitive for quantum computing.
Authors:Zhuoran Tan, Wenbo Guo, Taylor Brierley, Jiewen Luo, Jeremy Singer, Christos Anagnostopoulos
Abstract:
Advanced software supply chain (SSC) attacks are increasingly runtime-only and leave fragmented evidence across hosts, services, and build/dependency layers, so any single telemetry stream is inherently insufficient to reconstruct full compromise chains under realistic access and budget limits. We present SynthChain, a near-production testbed and a multi-source runtime dataset with chain-level ground truth, derived from real-world malicious packages and exploit campaigns. SynthChain covers seven representative supply-chain exploit scenarios across PyPI, npm, and a native C/C++ supply-chain case, spanning Windows and Linux, and involving four hosts and one containerized environment. Scenarios span realistic time windows from minutes to hours and are annotated with 14 MITRE ATT&CK tactics and 161 techniques (29-104 techniques per scenario). Beyond releasing the data, we quantify observability constraints by mapping each chain step to the minimum evidence needed for detection and cross-source correlation. With realistic trace availability, no single source is chain-complete: the best single source reaches only 0.391 weighted tag/step coverage and 0.403 mean chain reconstruction. Even minimal two-source fusion boosts coverage to 0.636 and reconstruction to 0.639 (approximately 1.6x gain), with consistent chain coverage/recall improvements (0.545). The corpus contains approximately 0.58M raw multi-source events and 1.50M evaluation rows, enabling controlled studies of detection under constrained telemetry. We release the dataset, ground truth, and artifacts to support reproducible, forensic-aware runtime defenses and to guide efficient detection for software supply chains.
Authors:Yulong Ming, Mingyue Wang, Jijia Yang, Cong Wang, Xiaohua Jia
Abstract:
Retrieval-Augmented Generation (RAG) enables large language models to use external knowledge, but outsourcing the RAG service raises privacy concerns for both data owners and users. Privacy-preserving RAG systems address these concerns by performing secure top-$k$ retrieval, which typically is secure sorting to identify relevant documents. However, existing systems face challenges supporting arbitrary $k$ due to their inability to change $k$, new security issues, or efficiency degradation with large $k$. This is a significant limitation because modern long-context models generally achieve higher accuracy with larger retrieval sets. We propose $p^2$RAG, a privacy-preserving RAG service that supports arbitrary top-$k$ retrieval. Unlike existing systems, $p^2$RAG avoids sorting candidate documents. Instead, it uses an interactive bisection method to determine the set of top-$k$ documents. For security, $p^2$RAG uses secret sharing on two semi-honest non-colluding servers to protect the data owner's database and the user's prompt. It enforces restrictions and verification to defend against malicious users and tightly bound the information leakage of the database. The experiments show that $p^2$RAG is 3--300$\times$ faster than the state-of-the-art PRAG for $k = 16$--$1024$.
Authors:Jack Vanlyssel, Gruia-Catalin Roman, Afsah Anwar
Abstract:
Spoofing attacks are among the most destructive cyber threats to terrestrial systems, and they become even more dangerous in space, where satellites cannot be easily serviced, and operators depend on accurate telemetry to ensure mission success. When telemetry is compromised, entire spaceborne missions are placed at risk. Prior work on spoofing has largely focused on attacks from Earth, such as injecting falsified uplinks or overpowering downlinks with stronger radios. In contrast, onboard spoofing originating from within the satellite itself remains an underexplored and underanalyzed threat. This vector is particularly concerning given that modern satellites, especially small satellites, rely on modular architectures and globalized supply chains that reduce cost and accelerate development but also introduce hidden risks. This paper presents an end-to-end demonstration of an internal satellite spoofing attack delivered through a compromised vendor-supplied component implemented in NASA's NOS3 simulation environment. Our rogue Core Flight Software application passed integration and generated packets in the correct format and cadence that the COSMOS ground station accepted as legitimate. By undermining both onboard estimators and ground operator views, the attack directly threatens mission integrity and availability, as corrupted telemetry can bias navigation, conceal subsystem failures, and mislead operators into executing harmful maneuvers. These results expose component-level telemetry spoofing as an overlooked supply-chain vector distinct from jamming or external signal injection. We conclude by discussing practical countermeasures-including authenticated telemetry, component attestation, provenance tracking, and lightweight runtime monitoring-and highlight the trade-offs required to secure resource-constrained small satellites.
Authors:Ondřej Lukáš, Jihoon Shin, Emilia Rivas, Diego Forni, Maria Rigaki, Carlos Catania, Aritran Piplai, Christopher Kiekintveld, Sebastian Garcia
Abstract:
Autonomous offensive agents often fail to transfer beyond the networks on which they are trained. We isolate a minimal but fundamental shift -- unseen host/subnet IP reassignment in an otherwise fixed enterprise scenario -- and evaluate attacker generalization in the NetSecGame environment. Agents are trained on five IP-range variants and tested on a sixth unseen variant; only the meta-learning agent may adapt at test time. We compare three agent families (traditional RL, adaptation agents, and LLM-based agents) and use action-distribution-based behavioral/XAI analyses to localize failure modes. Some adaptation methods show partial transfer but significant degradation under unseen reassignment, indicating that even address-space changes can break long-horizon attack policies. Under our evaluation protocol and agent-specific assumptions, prompt-driven pretrained LLM agents achieve the highest success on the held-out reassignment, but at the cost of increased inference-time compute, reduced transparency, and practical failure modes such as repetition/invalid-action loops.
Authors:Rasoul Akhavan Mahdavi, Abdulrahman Diaa, Florian Kerschbaum
Abstract:
Private Information Retrieval (PIR) allows a client to privately access a database without revealing which element is accessed. Initial PIR protocols based on Ring Learning with Errors (RLWE) demonstrated the practicality of PIR, but achieve limited throughput. Alternatively, high-throughput protocols leverage an offline phase that requires substantial client-side storage (e.g., hints in SimplePIR) or involve prohibitive communication costs during the offline phase (e.g., Piano). These limitations conflict with the practical constraints of resource-limited clients and are further exacerbated by dynamic databases, where updates necessitate costly regeneration and retransmission of hints. To address these challenges, we propose ZipPIR, a high-throughput PIR protocol that compresses LWE ciphertexts into significantly smaller Paillier ciphertexts. ZipPIR leverages the offline phase to obtain this size reduction without incurring the associated computational cost in the online phase. Moreover, under computational assumptions, ZipPIR features an almost silent offline phase, requiring no communication beyond an initial public key, enabling the server to independently generate and update hints during idle times without client interaction. ZipPIR achieves over 2 GB/s of throughput - comparable to state-of-the-art protocols such as SimplePIR - without the need for a large client-stored hint. For PIR over a 1 GB database, ZipPIR has up to 10x higher throughput than existing protocols with no client-side storage, while requiring less than 200 KB of server-side storage per client, significantly enhancing scalability for practical deployments. While prior PIR protocols using Paillier are very inefficient, ZipPIR is the first PIR protocol using Paillier that achieves throughput that is competitive with state-of-the-art PIR protocols.
Authors:Charlie Harrison, Pasin Manurangsi
Abstract:
A common problem in private data analysis is the partition selection problem, where each user holds a set of partitions (e.g. keys in a GROUP BY operation) from a possibly unbounded set. The challenge here is in maximizing the set of released partitions while respecting a differential privacy constraint. Previous work [Desfontaines et al., PoPETS 2022] presented an optimal $(\varepsilon, δ)$-DP algorithm when each user submits only a single partition. We generalize this approach to find the optimal algorithm under $δ$-approximate $(α, \varepsilon)$-Rényi differential privacy (RDP), which allows much tighter analysis under composition. Motivated by the non-existence of a general optimality result in the case where users submit multiple partitions each, we present an extension of our optimal algorithm tuned for $L^2$ bounded weighted partition selection which can be used as a drop-in improvement over the Gaussian mechanism any time the partition frequency is not also needed. We show that our primitive can be easily plugged into state of the art partition selection algorithms (PolicyGaussian from [Gopi et al., ICML 2020] and MAD2R from [Chen et al., ICML 2025]), improving performance both for parallel and sequential adaptive algorithms. Finally, we show that there is an inherent cost to algorithms which do support releasing the frequency as well as the partitions. Specifically, we formulate a basic notion of optimal approximate RDP algorithm for partition selection using additive noise, and show that there is a numerical separation between additive and non-additive noise mechanisms for this problem.
Authors:David Heye, Karl Kindermann, Robin Decker, Johannes Lohmöller, Anastasiia Belova, Sandra Geisler, Klaus Wehrle, Jan Pennekamp
Abstract:
Artifact Evaluation (AE) is essential for ensuring the transparency and reliability of research, closing the gap between exploratory work and real-world deployment is particularly important in cybersecurity, particularly in IoT and CPSs, where large-scale, heterogeneous, and privacy-sensitive data meet safety-critical actuation. Yet, manual reproducibility checks are time-consuming and do not scale with growing submission volumes. In this work, we demonstrate that Large Language Models (LLMs) can provide powerful support for AE tasks: (i) text-based reproducibility rating, (ii) autonomous sandboxed execution environment preparation, and (iii) assessment of methodological pitfalls. Our reproducibility-assessment toolkit yields an accuracy of over 72% and autonomously sets up execution environments for 28% of runnable cybersecurity artifacts. Our automated pitfall assessment detects seven prevalent pitfalls with high accuracy ($F_1$ > 92%). Hence, the toolkit significantly reduces reviewer effort and, when integrated into established AE processes, could incentivize authors to submit higher-quality and more reproducible artifacts. IoT, CPS, and cybersecurity conferences and workshops may integrate the toolkit into their peer-review processes to support reviewers' decisions on awarding artifact badges, improving the overall sustainability of the process.
Authors:Qianying Liao, Jonah Bellemans, Laurens Sion, Xue Jiang, Dmitrii Usynin, Xuebing Zhou, Dimitri Van Landuyt, Lieven Desmet, Wouter Joosen
Abstract:
As generative AI (GenAI) systems become increasingly prevalent across various technological stacks, the question of how such systems handle sensitive and personal data flows becomes increasingly important. Specifically, both the ability to harness and process large swaths of information as well as their stochastic nature raise key concerns related to both security and privacy. Unfortunately, while some of the traditional security threat modeling can effectively identify certain violations, privacy-related issues are often overlooked. To respond to these challenges, we introduce a novel domain-specific privacy threat modeling framework to support the privacy threat analysis of GenAI-based applications. This framework is constructed through a two-pronged approach: (1) a systematic review of the emerging literature on GenAI privacy threats, and (2) a case-driven application to a representative Chatbot system. These efforts yield a foundational GenAI privacy threat modeling framework built on LINDDUN. The new framework affects three out of the seven privacy threat types of LINDDUN and introduces 100 new GenAI examples to the knowledge base. Its effectiveness is validated on an AI Agent system, which demonstrates that a comprehensive privacy analysis can be supported by the new framework.
Authors:Michael Rettinger, Ben Beaumont, Nhien-An Le-Khac, Hong-Hanh Nguyen-Le
Abstract:
The proliferation of deepfake imagery poses escalating challenges for practitioners tasked with verifying digital media authenticity. While detection algorithm research is abundant, empirical evaluations of publicly accessible tools that practitioners actually use remain scarce. This paper presents the first cross-paradigm evaluation of six tools, spanning two complementary detection approaches: forensic analysis tools (InVID \& WeVerify, FotoForensics, Forensically) and AI-based classifiers (DecopyAI, FaceOnLive, Bitmind). Both tool categories were evaluated by professional investigators with law enforcement experience using blinded protocols across datasets comprising authentic, tampered, and AI-generated images sourced from DF40, CelebDF, and CASIA-v2. We report three principal findings: forensic tools exhibit high recall but poor specificity, while AI classifiers demonstrate the inverse pattern; human evaluators substantially outperform all automated tools; and human-AI disagreement is asymmetric, with human judgment prevailing in the vast majority of discordant cases. We discuss implications for practitioner workflows and identify critical gaps in current detection capabilities.
Authors:Romina Omidi, Yun Dong, Binghui Wang
Abstract:
Google's SynthID-Text, the first ever production-ready generative watermark system for large language model, designs a novel Tournament-based method that achieves the state-of-the-art detectability for identifying AI-generated texts. The system's innovation lies in: 1) a new Tournament sampling algorithm for watermarking embedding, 2) a detection strategy based on the introduced score function (e.g., Bayesian or mean score), and 3) a unified design that supports both distortionary and non-distortionary watermarking methods. This paper presents the first theoretical analysis of SynthID-Text, with a focus on its detection performance and watermark robustness, complemented by empirical validation. For example, we prove that the mean score is inherently vulnerable to increased tournament layers, and design a layer inflation attack to break SynthID-Text. We also prove the Bayesian score offers improved watermark robustness w.r.t. layers and further establish that the optimal Bernoulli distribution for watermark detection is achieved when the parameter is set to 0.5. Together, these theoretical and empirical insights not only deepen our understanding of SynthID-Text, but also open new avenues for analyzing effective watermark removal strategies and designing robust watermarking techniques. Source code is available at https: //github.com/romidi80/Synth-ID-Empirical-Analysis.
Authors:Federico Villa, F. Betül Durak, Tadayoshi Kohno, Tapdig Maharramli, Franziska Roesner
Abstract:
Recent progress in (Large) Language Models (LMs) has enabled the development of autonomous LM-based agents capable of executing complex tasks with minimal supervision. These agents have started to be integrated into systems with significant autonomy and authority. The security community has been studying their security. One emerging direction to mitigate security risks is to constrain agent behaviours via access control and permissioning mechanisms. Existing permissioning proposals, however, remain difficult to compare due to the absence of a shared formal foundation. This work provides such a foundation. We first systematize the landscape by constructing an attack taxonomy tailored to language models, the computational primitives of agentic systems. We then develop a formal treatment of agentic access control by defining an AIOracle algorithmically and introducing a security-game framework that captures completeness (in the absence of an adversary) and adversarial robustness. Our security game unifies confidentiality, integrity, and availability within a single model. Using this framework, we show that existing approaches to confidentiality of training data fundamentally conflict with completeness. Finally, we formalize a modular decomposition of helpfulness and harmlessness objectives and prove its soundness, in order to enable principled reasoning about the security of agentic system designs. Our studies suggests that if we were to design a secure system with measurable security, then we might want to use a modular approach to break the problem into sub-problems and let the composition on different modules complete the design. Our studies show that this natural approach with the relevant formalism is needed to prove security reductions.
Authors:Jingyuan Xie, Wenjie Wang, Ji Wu, Jiandong Gao
Abstract:
Supervised fine-tuning (SFT) is essential for the development of medical large language models (LLMs), yet prior poisoning studies have mainly focused on the detectable backdoor attacks. We propose a novel poisoning attack targeting the reasoning process of medical LLMs during SFT. Unlike backdoor attacks, our method injects poisoned rationales into few-shot training data, leading to stealthy degradation of model performance on targeted medical topics. Results showed that knowledge overwriting was ineffective, while rationale poisoning caused significant decline on the accuracy of the target subject, as long as no correct samples of the same subject appear in the dataset. A minimum number and ratio of poisoned samples was needed to carry out an effective and stealthy attack, which was more efficient and accurate than catastrophic forgetting. We demonstrate though this study the risk of SFT-stage poisoning, hoping to spur more studies of defense in the sensitive medical domain.
Authors:Bahirah Adewunmi, Edward Raff, Sanjay Purushotham
Abstract:
Automating network security analysis, particularly the identification of potential attack paths, presents significant challenges. Due in part to the sequential, interconnected, and evolutionary nature of system events which most artificial intelligence (AI) techniques struggle to model effectively. This paper proposes a Reinforcement Learning (RL) environment generation framework that simulates the sequence of processes executed on a Windows operating system, enabling dynamic modeling of malicious processes on a system. This methodology models operating system state and transitions using a graph representation. This graph is derived from open-source System Monitor (Sysmon) logs. To address the variety in system event types, fields, and log formats, a mechanism was developed to capture and model parent-child processes from Sysmon logs. A Gymnasium environment (SubstratumGraphEnv) was constructed to establish the perceptible basis for an RL environment, and a customized PyTorch interface was also built (SubstratumBridge) to translate Gymnasium graphs into Deep Reinforcement Learning (DRL) observations and discrete actions. Graph Convolutional Networks (GCNs) concretize the graph's local and global state, which feed the distinct policy and critic heads of an Advantage Actor-Critic (A2C) model. This work's central contribution lies in the design of a novel deep graphical RL environment that automates translation of sequential user and system events, furnishing crucial context for cybersecurity analysis. This work provides a foundation for future research into shaping training parameters and advanced reward shaping, while also offering insight into which system events attributes are critical to training autonomous RL agents.
Authors:Amal Raj, Vivek Balachandran
Abstract:
As quantum computing platforms increasingly adopt cloud-based execution, users submit quantum circuits to remote compilers and backends, trusting that what they submit is exactly what will be run. This shift introduces new trust assumptions in the submission pipeline, which remain largely unexamined. In this paper, we present QSpy, the first proof-of-concept Quantum Remote Access Trojan capable of intercepting quantum circuits in transit. Once deployed on a user's machine, QSpy silently installs a rogue certificate authority and proxies outgoing API traffic, enabling a man-in-the-middle (MITM) attack on submitted quantum circuits. We show that the intercepted quantum circuits may be forwarded to a remote server, which is capable of categorizing, storing, and analyzing them, without disrupting execution or triggering authentication failures. Our prototype targets IBM Qiskit APIs on a Windows system, but the attack model generalizes to other delegated quantum computing workflows. This work highlights the urgent need for submission-layer protections and demonstrates how even classical attack primitives can pose critical threats to quantum workloads.
Authors:Meisam Mohammady, Qin Yang, Nicholas Stout, Ayesha Samreen, Han Wang, Christopher J Quinn, Yuan Hong
Abstract:
Differentially Private Stochastic Gradient Descent (DP-SGD) is a cornerstone technique for ensuring privacy in deep learning, widely used in both training from scratch and fine-tuning large-scale language models. While DP-SGD predominantly relies on the Gaussian mechanism, the Laplace mechanism remains underutilized due to its reliance on L1 norm clipping. This constraint severely limits its practicality in high-dimensional models because the L1 norm of an n-dimensional gradient can be up to sqrt(n) times larger than its L2 norm. As a result, the required noise scale grows significantly with model size, leading to poor utility or untrainable models. In this work, we introduce Lap2, a new solution that enables L2 clipping for Laplace DP-SGD while preserving strong privacy guarantees. We overcome the dimensionality-driven clipping barrier by computing coordinate-wise moment bounds and applying majorization theory to construct a tight, data-independent upper bound over the full model. By exploiting the Schur-convexity of the moment accountant function, we aggregate these bounds using a carefully designed majorization set that respects the L2 clipping constraint. This yields a multivariate privacy accountant that scales gracefully with model dimension and enables the use of thousands of moments. Empirical evaluations demonstrate that our approach significantly improves the performance of Laplace DP-SGD, achieving results comparable to or better than Gaussian DP-SGD under strong privacy constraints. For instance, fine-tuning RoBERTa-base (125M parameters) on SST-2 achieves 87.88% accuracy at epsilon=0.54, outperforming Gaussian (87.16%) and standard Laplace (48.97%) under the same budget.
Authors:Inderjeet Singh, Vikas Pahuja, Aishvariya Priya Rathina Sabapathy, Chiara Picardi, Amit Giloni, Roman Vainshtein, Andrés Murillo, Hisashi Kojima, Motoyoshi Sekiya, Yuki Unno, Junichi Suga
Abstract:
Current stateless defences for multimodal agentic RAG fail to detect adversarial strategies that distribute malicious semantics across retrieval, planning, and generation components. We formulate this security challenge as a Partially Observable Markov Decision Process (POMDP), where adversarial intent is a latent variable inferred from noisy multi-stage observations. We introduce MMA-RAG^T, an inference-time control framework governed by a Modular Trust Agent (MTA) that maintains an approximate belief state via structured LLM reasoning. Operating as a model-agnostic overlay, MMA-RAGT mediates a configurable set of internal checkpoints to enforce stateful defence-in-depth. Extensive evaluation on 43,774 instances demonstrates a 6.50x average reduction factor in Attack Success Rate relative to undefended baselines, with negligible utility cost. Crucially, a factorial ablation validates our theoretical bounds: while statefulness and spatial coverage are individually necessary (26.4 pp and 13.6 pp gains respectively), stateless multi-point intervention can yield zero marginal benefit under homogeneous stateless filtering when checkpoint detections are perfectly correlated.
Authors:Refat Othman, Diaeddin Rimawi, Bruno Rossi, Barbara Russo
Abstract:
Identifying the vulnerabilities exploited during cyberattacks is essential for enabling timely responses and effective mitigation in software security. This paper directly examines the process of predicting software vulnerabilities, specifically Common Vulnerabilities and Exposures (CVEs), from unstructured descriptions of attacks reported in cybersecurity news articles. We propose a semantic similarity-based approach utilizing the multi-qa-mpnet-base-dot-v1 (MPNet) sentence transformer model to generate a ranked list of the most likely CVEs corresponding to each news report. To assess the accuracy of the predicted vulnerabilities, we implement four complementary validation methods: filtering predictions based on similarity thresholds, conducting manual validation, performing semantic comparisons with the first vulnerability explicitly mentioned in each report, and comparing against all CVEs referenced within the report. Experimental results, drawn from a dataset of 100 SecurityWeek news articles, demonstrate that the model attains a precision of 81 percent when employing threshold-based filtering. Manual evaluations report that 70 percent of the predictions are relevant, while comparisons with the initially mentioned CVEs reveal agreement rates of 80 percent with the first listed vulnerability and 78 percent across all referenced CVEs. In 57 percent of the news reports analyzed, at least one predicted vulnerability precisely matched a CVE-ID mentioned in the article. These findings underscore the model's potential to facilitate automated vulnerability identification from real-world cyberattack news reports.
Authors:Jakub Breier, Štefan Kučerák, Xiaolu Hou
Abstract:
Fault injection attacks on embedded neural network models have been shown as a potent threat. Numerous works studied resilience of models from various points of view. As of now, there is no comprehensive study that would evaluate the influence of number representations used for model parameters against electromagnetic fault injection (EMFI) attacks. In this paper, we investigate how four different number representations influence the success of an EMFI attack on embedded neural network models. We chose two common floating-point representations (32-bit, and 16-bit), and two integer representations (8-bit, and 4-bit). We deployed four common image classifiers, ResNet-18, ResNet-34, ResNet-50, and VGG-11, on an embedded memory chip, and utilized a low-cost EMFI platform to trigger faults. Our results show that while floating-point representations exhibit almost a complete degradation in accuracy (Top-1 and Top-5) after a single fault injection, integer representations offer better resistance overall. Especially, when considering the the 8-bit representation on a relatively large network (VGG-11), the Top-1 accuracies stay at around 70% and the Top-5 at around 90%.
Authors:Ahmed Ryan, Ibrahim Khalil, Abdullah Al Jahid, Md Erfan, Akond Ashfaque Ur Rahman, Md Rayhanur Rahman
Abstract:
The prevalence of malicious packages in open-source repositories, such as PyPI, poses a critical threat to the software supply chain. While Large Language Models (LLMs) have emerged as a promising tool for automated security tasks, their effectiveness in detecting malicious packages and indicators remains underexplored. This paper presents a systematic evaluation of 13 LLMs for detecting malicious software packages. Using a curated dataset of 4,070 packages (3,700 benign and 370 malicious), we evaluate model performance across two tasks: binary classification (package detection) and multi-label classification (identification of specific malicious indicators). We further investigate the impact of prompting strategies, temperature settings, and model specifications on detection accuracy. We find a significant "granularity gap" in LLMs' capabilities. While GPT-4.1 achieves near-perfect performance in binary detection (F1 $\approx$ 0.99), performance degrades by approximately 41\% when the task shifts to identifying specific malicious indicators. We observe that general models are best for filtering out the majority of threats, while specialized coder models are better at detecting attacks that follow a strict, predictable code structure. Our correlation analysis indicates that parameter size and context width have negligible explanatory power regarding detection accuracy. We conclude that while LLMs are powerful detectors at the package level, they lack the semantic depth required for precise identification at the granular indicator level.
Authors:Viet Hoang Luu, Amirmohammad Pasdar, Wachiraphan Charoenwet, Toby Murray, Shaanan Cohney, Van-Thuan Pham
Abstract:
Modern fuzzers scale to large, real-world software but often fail to exercise the program states developers consider most fragile or security-critical. Such states are typically deep in the execution space, gated by preconditions, or overshadowed by lower-value paths that consume limited fuzzing budgets. Meanwhile, developers routinely surface risk-relevant insights during code review, yet this information is largely ignored by automated testing tools. We present EyeQ, a system that leverages developer intelligence from code reviews to guide fuzzing. EyeQ extracts security-relevant signals from review discussions, localizes the implicated program regions, and translates these insights into annotation-based guidance for fuzzing. The approach operates atop existing annotation-aware fuzzing, requiring no changes to program semantics or developer workflows. We first validate EyeQ through a human-guided feasibility study on a security-focused dataset of PHP code reviews, establishing a strong baseline for review-guided fuzzing. We then automate the workflow using a large language model with carefully designed prompts. EyeQ significantly improves vulnerability discovery over standard fuzzing configurations, uncovering more than 40 previously unknown bugs in the security-critical PHP codebase.
Authors:Peiran Wang, Xinfeng Li, Chong Xiang, Jinghuai Zhang, Ying Li, Lixia Zhang, Xiaofeng Wang, Yuan Tian
Abstract:
The evolution of Large Language Models (LLMs) has resulted in a paradigm shift towards autonomous agents, necessitating robust security against Prompt Injection (PI) vulnerabilities where untrusted inputs hijack agent behaviors. This SoK presents a comprehensive overview of the PI landscape, covering attacks, defenses, and their evaluation practices. Through a systematic literature review and quantitative analysis, we establish taxonomies that categorize PI attacks by payload generation strategies (heuristic vs. optimization) and defenses by intervention stages (text, model, and execution levels). Our analysis reveals a key limitation shared by many existing defenses and benchmarks: they largely overlook context-dependent tasks, in which agents are authorized to rely on runtime environmental observations to determine actions. To address this gap, we introduce AgentPI, a new benchmark designed to systematically evaluate agent behavior under context-dependent interaction settings. Using AgentPI, we empirically evaluate representative defenses and show that no single approach can simultaneously achieve high trustworthiness, high utility, and low latency. Moreover, we show that many defenses appear effective under existing benchmarks by suppressing contextual inputs, yet fail to generalize to realistic agent settings where context-dependent reasoning is essential. This SoK distills key takeaways and open research problems, offering structured guidance for future research and practical deployment of secure LLM agents.
Authors:Zhenyu Xu, Victor S. Sheng
Abstract:
Protecting the intellectual property of large language models (LLMs) is a critical challenge due to the proliferation of unauthorized derivative models. We introduce a novel fingerprinting framework that leverages the behavioral patterns induced by safety alignment, applying the concept of refusal vectors for LLM provenance tracking. These vectors, extracted from directional patterns in a model's internal representations when processing harmful versus harmless prompts, serve as robust behavioral fingerprints. Our contribution lies in developing a fingerprinting system around this concept and conducting extensive validation of its effectiveness for IP protection. We demonstrate that these behavioral fingerprints are highly robust against common modifications, including finetunes, merges, and quantization. Our experiments show that the fingerprint is unique to each model family, with low cosine similarity between independently trained models. In a large-scale identification task across 76 offspring models, our method achieves 100\% accuracy in identifying the correct base model family. Furthermore, we analyze the fingerprint's behavior under alignment-breaking attacks, finding that while performance degrades significantly, detectable traces remain. Finally, we propose a theoretical framework to transform this private fingerprint into a publicly verifiable, privacy-preserving artifact using locality-sensitive hashing and zero-knowledge proofs.
Authors:Yu-Che Tsai, Hsiang Hsiao, Kuan-Yu Chen, Shou-De Lin
Abstract:
Text embeddings enable numerous NLP applications but face severe privacy risks from embedding inversion attacks, which can expose sensitive attributes or reconstruct raw text. Existing differential privacy defenses assume uniform sensitivity across embedding dimensions, leading to excessive noise and degraded utility. We propose SPARSE, a user-centric framework for concept-specific privacy protection in text embeddings. SPARSE combines (1) differentiable mask learning to identify privacy-sensitive dimensions for user-defined concepts, and (2) the Mahalanobis mechanism that applies elliptical noise calibrated by dimension sensitivity. Unlike traditional spherical noise injection, SPARSE selectively perturbs privacy-sensitive dimensions while preserving non-sensitive semantics. Evaluated across six datasets with three embedding models and attack scenarios, SPARSE consistently reduces privacy leakage while achieving superior downstream performance compared to state-of-the-art DP methods.
Authors:Haoyang Hu, Zhejun Jiang, Yueming Lyu, Junyuan Zhang, Yi Liu, Ka-Ho Chow
Abstract:
Retrieval-augmented generation (RAG) is increasingly deployed in real-world applications, where its reference-grounded design makes outputs appear trustworthy. This trust has spurred research on poisoning attacks that craft malicious content, inject it into knowledge sources, and manipulate RAG responses. However, when evaluated in practical RAG systems, existing attacks suffer from severely degraded effectiveness. This gap stems from two overlooked realities: (i) content is often processed before use, which can fragment the poison and weaken its effect, and (ii) users often do not issue the exact queries anticipated during attack design. These factors can lead practitioners to underestimate risks and develop a false sense of security. To better characterize the threat to practical systems, we present Confundo, a learning-to-poison framework that fine-tunes a large language model as a poison generator to achieve high effectiveness, robustness, and stealthiness. Confundo provides a unified framework supporting multiple attack objectives, demonstrated by manipulating factual correctness, inducing biased opinions, and triggering hallucinations. By addressing these overlooked challenges, Confundo consistently outperforms a wide range of purpose-built attacks across datasets and RAG configurations by large margins, even in the presence of defenses. Beyond exposing vulnerabilities, we also present a defensive use case that protects web content from unauthorized incorporation into RAG systems via scraping, with no impact on user experience.
Authors:Ehsan Firouzi, Mohammad Ghafari
Abstract:
Existing literature heavily relies on static analysis tools to evaluate LLMs for secure code generation and vulnerability detection. We reviewed 1,080 LLM-generated code samples, built a human-validated ground-truth, and compared the outputs of two widely used static security tools, CodeQL and Semgrep, against this corpus. While 61% of the samples were genuinely secure, Semgrep and CodeQL classified 60% and 80% as secure, respectively. Despite the apparent agreement in aggregate statistics, per-sample analysis reveals substantial discrepancies: only 65% of Semgrep's and 61% of CodeQL's reports correctly matched the ground truth. These results question the reliability of static analysis tools as sole evaluators of code security and underscore the need for expert feedback. Building on this insight, we propose a conceptual framework that persistently stores human feedback in a dynamic retrieval-augmented generation pipeline, enabling LLMs to reuse past feedback for secure code generation and vulnerability detection.
Authors:Luis Cunha, Jose Martins, Manuel Rodriguez, Tiago Gomes, Sandro Pinto, Uwe Moslehner, Kai Dieffenbach, Glenn Farrall, Kajetan Nuernberger, Thomas Roecker
Abstract:
As RISC-V adoption accelerates, domains such as automotive, the Internet of Things (IoT), and industrial control are attracting growing attention. These domains are subject to stringent Size, Weight, Power, and Cost (SWaP-C) constraints, which have driven a shift toward heterogeneous Systems-on-Chip (SoCs) integrating general-purpose CPUs, tightly coupled accelerators, and diverse I/O devices with different integrity levels. While such integration improves cost efficiency and performance, it introduces a fundamental safety and security challenge: enforcing system-level isolation in mixed-criticality environments. Although RISC-V International has proposed several hardware isolation primitives, including RISC-V Worlds, IOPMP, and SmMTT, their interoperability, scalability, and suitability for real-time systems remain insufficiently understood. In this paper, we present a comparative analysis of these primitives from the perspective of practical heterogeneous SoC designs. We implement an IOPMP, a World-based checker, and a modified RISC-V World checker that addresses key limitations of the baseline specification, and evaluate their trade-offs in terms of security guarantees and power-performance-area (PPA). Our results show that the World-based checker introduces a fixed, configuration-independent access latency, achieving lower worst-case delay than the evaluated alternatives while scaling predictably with system size. At the macro level, we estimate that the proposed modifications reduce SoC area by up to approximately 5% compared to a baseline design. All artifacts will be released as open source, and we expect these findings to directly contribute to the evolution and ratification of RISC-V specifications, as well as to the design of future RISC-V SoCs.
Authors:Maya Le, Paweł Prałat, Aaron Smith, François Théberge
Abstract:
Motivated by applications in cybersecurity such as finding meaningful sequences of malware-related events buried inside large amounts of computer log data, we introduce the "planted path" problem and propose an algorithm to find fuzzy matchings between two trees. This algorithm can be used as a "building block" for more complicated workflows. We demonstrate usefulness of a few of such workflows in mining synthetically generated data as well as real-world ACME cybersecurity datasets.
Authors:Ehsan Firouzi, Shardul Bhatt, Mohammad Ghafari
Abstract:
We investigated the capabilities of GPT-4o and Gemini 2.0 Flash for secure Infrastructure as Code (IaC) development. For security smell detection, on the Stack Overflow dataset, which primarily contains small, simplified code snippets, the models detected at least 71% of security smells when prompted to analyze code from a security perspective (general prompt). With a guided prompt (adding clear, step-by-step instructions), this increased to 78%.In GitHub repositories, which contain complete, real-world project scripts, a general prompt was less effective, leaving more than half of the smells undetected. However, with the guided prompt, the models uncovered at least 67% of the smells. For secure code generation, we prompted LLMs with 89 vulnerable synthetic scenarios and observed that only 7% of the generated scripts were secure. Adding an explicit instruction to generate secure code increased GPT secure output rate to 17%, while Gemini changed little (8%). These results highlight the need for further research to improve LLMs' capabilities in assisting developers with secure IaC development.
Authors:Yuxuan Lu, Yongkang Guo, Yuqing Kong
Abstract:
Safety alignment in Large Language Models (LLMs) often creates a systematic discrepancy between a model's aligned output and the underlying pre-aligned data distribution. We propose a framework in which the effect of safety alignment on next-token prediction is modeled as a systematic distortion of a pre-alignment distribution. We cast Weak-to-Strong Jailbreaking as a forecast aggregation problem and derive an optimal aggregation strategy characterized by a Gradient Shift in the loss-induced dual space. We show that logit-arithmetic jailbreaking methods are a special case of this framework under cross-entropy loss, and derive a broader family of aggregation rules corresponding to other proper losses. We also propose a new hybrid aggregation rule. Evaluations across red-teaming benchmarks and math utility tasks using frontier models demonstrate that our approach achieves superior Attack Success Rates and lower "Jailbreak Tax" compared with existing methods, especially on the safety-hardened gpt-oss-120b.
Authors:Alessandra Maciel Paz Milani, Norman Anderson, Margaret-Anne Storey
Abstract:
Cybersecurity increasingly relies on threat hunters to proactively identify adversarial activity, yet the cognitive work underlying threat hunting remains underexplored or insufficiently supported by existing tools. Building on prior studies that examined how threat hunters construct and share mental models during investigations, we derived a set of design propositions to support their cognitive and collaborative work. In this paper, we present the Threat Hunter Board, a prototype tool that operationalizes these design propositions by enabling threat hunters to externalize reasoning, organize investigative leads, and maintain continuity across sessions. Using a design science paradigm, we describe the solution design rationale and artifact development. In addition, we propose six design heuristics that form a solution-evaluation framework for assessing cognitive support in threat hunting tools. An initial evaluation using a cognitive walkthrough provides early evidence of feasibility, while future work will focus on user-based validation with professional threat hunters.
Authors:Timofey Mezhuev, Darya Parygina, Daniil Kuts
Abstract:
In modern SSDLC, program analysis and automated testing are essential for minimizing vulnerabilities before software release, with fuzzing being a fast and widely used dynamic testing method. However, traditional coverage-guided fuzzing may be less effective in specific tasks like verifying static analysis reports or reproducing crashes, while directed fuzzing, focusing on targeted program locations using proximity metrics, proves to be more effective. Some of the earliest directed fuzzers are, for example, AFLGo and BEACON, which use different proximity metric approaches. Although most automated testing tools focus on C/C++ code, the growing popularity of Rust and Go causes the need for precise and efficient testing solutions for these languages. This work expands the applicability of directed fuzzing beyond traditional analysis of C/C++ software. We present a novel approach to directed greybox fuzzing tailored specifically for Rust and Go applications. We introduce advanced preprocessing techniques, rustc compiler customizations, and elaborate graph construction and instrumentation methods to enable effective targeting of specific program locations. Our implemented fuzzing tools, based on LibAFL-DiFuzz backend, demonstrate competitive advantages compared to popular existing fuzzers like afl.rs, cargo-fuzz, and go-fuzz. According to TTE (Time to Exposure) experiments, Rust-LibAFL-DiFuzz outperforms other tools by the best TTE result. Some stability issues can be explained by different mutation approaches. Go-LibAFL-DiFuzz outperforms its opponent by the best and, in the majority of cases, by average result, having two cases with orders of magnitude difference. These results prove better efficiency and accuracy of our approach.
Authors:Haoyun Yang, Ronghong Huang, Yong Fang, Beizeng Zhang, Junpu Guo, Zhanyu Wu, Xianghang Mi
Abstract:
Transport Layer Security (TLS) is fundamental to secure online communication, yet vulnerabilities in certificate validation that enable Man-in-the-Middle (MitM) attacks remain a pervasive threat in Android apps. Existing detection tools are hampered by low-coverage UI interaction, costly instrumentation, and a lack of scalable root-cause analysis. We present Okara, a framework that leverages foundation models to automate the detection and deep attribution of TLS MitM Vulnerabilities (TMVs). Okara's detection component, TMV-Hunter, employs foundation model-driven GUI agents to achieve high-coverage app interaction, enabling efficient vulnerability discovery at scale. Deploying TMV-Hunter on 37,349 apps from Google Play and a third-party store revealed 8,374 (22.42%) vulnerable apps. Our measurement shows these vulnerabilities are widespread across all popularity levels, affect critical functionalities like authentication and code delivery, and are highly persistent with a median vulnerable lifespan of over 1,300 days. Okara's attribution component, TMV-ORCA, combines dynamic instrumentation with a novel LLM-based classifier to locate and categorize vulnerable code according to a comprehensive new taxonomy. This analysis attributes 41% of vulnerabilities to third-party libraries and identifies recurring insecure patterns, such as empty trust managers and flawed hostname verification. We have initiated a large-scale responsible disclosure effort and will release our tools and datasets to support further research and mitigation.
Authors:Shengwei You, Aditya Joshi, Andrey Kuehlkamp, Jarek Nabrzyski
Abstract:
Algorithmic stablecoins promise decentralized monetary stability by maintaining a target peg through programmatic reserve management. Yet, their reserve controllers remain vulnerable to regime-blind optimization, calibrating risk parameters on fair-weather data while ignoring tail events that precipitate cascading failures. The March 2020 Black Thursday collapse, wherein MakerDAO's collateral auctions yielded $8.3M in losses and a 15% peg deviation, exposed a critical gap: existing models like SAS systematically omit extreme volatility regimes from covariance estimates, producing allocations optimal in expectation but catastrophic under adversarial stress. We present MVF-Composer, a trust-weighted Mean-Variance Frontier reserve controller incorporating a novel Stress Harness for risk-state estimation. Our key insight is deploying multi-agent simulations as adversarial stress-testers: heterogeneous agents (traders, liquidity providers, attackers) execute protocol actions under crisis scenarios, exposing reserve vulnerabilities before they manifest on-chain. We formalize a trust-scoring mechanism T: A -> [0,1] that down-weights signals from agents exhibiting manipulative behavior, ensuring the risk-state estimator remains robust to signal injection and Sybil attacks. Across 1,200 randomized scenarios with injected Black-Swan shocks (10% collateral drawdown, 50% sentiment collapse, coordinated redemption attacks), MVF-Composer reduces peak peg deviation by 57% and mean recovery time by 3.1x relative to SAS baselines. Ablation studies confirm the trust layer accounts for 23% of stability gains under adversarial conditions, achieving 72% adversarial agent detection. Our system runs on commodity hardware, requires no on-chain oracles beyond standard price feeds, and provides a reproducible framework for stress-testing DeFi reserve policies.
Authors:Shir Rozenfeld, Rahul Pankajakshan, Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky
Abstract:
Large language models (LLMs) are increasingly paired with activation-based monitoring to detect and prevent harmful behaviors that may not be apparent at the surface-text level. However, existing activation safety approaches, trained on broad misuse datasets, struggle with poor precision, limited flexibility, and lack of interpretability. This paper introduces a new paradigm: rule-based activation safety, inspired by rule-sharing practices in cybersecurity. We propose modeling activations as cognitive elements (CEs), fine-grained, interpretable factors such as ''making a threat'' and ''payment processing'', that can be composed to capture nuanced, domain-specific behaviors with higher precision. Building on this representation, we present a practical framework that defines predicate rules over CEs and detects violations in real time. This enables practitioners to configure and update safeguards without retraining models or detectors, while supporting transparency and auditability. Our results show that compositional rule-based activation safety improves precision, supports domain customization, and lays the groundwork for scalable, interpretable, and auditable AI governance. We will release GAVEL as an open-source framework and provide an accompanying automated rule creation tool.
Authors:Bilel Sefsaf, Abderraouf Dandani, Abdessamed Seddiki, Arab Mohammed, Eduardo Chielle, Michail Maniatakos, Riyadh Baghdadi
Abstract:
Fully Homomorphic Encryption (FHE) enables computations directly on encrypted data, but its high computational cost remains a significant barrier. Writing efficient FHE code is a complex task requiring cryptographic expertise, and finding the optimal sequence of program transformations is often intractable. In this paper, we propose CHEHAB RL, a novel framework that leverages deep reinforcement learning (RL) to automate FHE code optimization. Instead of relying on predefined heuristics or combinatorial search, our method trains an RL agent to learn an effective policy for applying a sequence of rewriting rules to automatically vectorize scalar FHE code while reducing instruction latency and noise growth. The proposed approach supports the optimization of both structured and unstructured code. To train the agent, we synthesize a diverse dataset of computations using a large language model (LLM). We integrate our proposed approach into the CHEHAB FHE compiler and evaluate it on a suite of benchmarks, comparing its performance against Coyote, a state-of-the-art vectorizing FHE compiler. The results show that our approach generates code that is $5.3\times$ faster in execution, accumulates $2.54\times$ less noise, while the compilation process itself is $27.9\times$ faster than Coyote (geometric means).
Authors:Mohammed Barhoush, Arthur Mehta, Anne Müller, Louis Salvail
Abstract:
Functional encryption is a powerful cryptographic primitive that enables fine-grained access to encrypted data and underlies numerous applications. Although the ideal security notion for FE (simulation security) has been shown to be impossible in the classical setting, those impossibility results rely on inherently classical arguments. This leaves open the question of whether simulation-secure functional encryption can be achieved in the quantum regime. In this work, we rule out this possibility by showing that the classical impossibility results largely extend to the quantum world. In particular, when the adversary can issue an unbounded number of challenge messages, we prove an unconditional impossibility, matching the classical barrier. In the case where the adversary may obtain many functional keys, classical arguments only yield impossibility under the assumption of pseudorandom functions; we strengthen this by proving impossibility under the potentially weaker assumption of pseudorandom quantum states. In the same setting, we also establish an alternative impossibility based on public-key encryption. Since public-key encryption is not known to imply pseudorandom quantum states, this provides independent evidence of the barrier. As part of our proofs, we show a novel incompressibility property for pseudorandom states, which may be of independent interest.
Authors:Haodong Chen, Ziheng Zhang, Jinghui Jiang, Qiang Su, Qiao Xiang
Abstract:
Cloud environments face frequent DDoS threats due to centralized resources and broad attack surfaces. Modern cloud-native DDoS attacks further evolve rapidly and often blend multi-vector strategies, creating an operational dilemma: defenders need wire-speed monitoring while also requiring explainable, auditable attribution for response. Existing rule-based and supervised-learning approaches typically output black-box scores or labels, provide limited evidence chains, and generalize poorly to unseen attack variants; meanwhile, high-quality labeled data is often difficult to obtain in cloud settings. We present Holmes (DDoS Detective), an LLM-based DDoS detection agent that reframes the model as a virtual SRE investigator rather than an end-to-end classifier. Holmes couples a funnel-like hierarchical workflow (counters/sFlow for continuous sensing and triage; PCAP evidence collection triggered only on anomaly windows) with an Evidence Pack abstraction that converts binary packets into compact, reproducible, high-signal structured evidence. On top of this evidence interface, Holmes enforces a structure-first investigation protocol and strict JSON/quotation constraints to produce machine-consumable reports with auditable evidence anchors. We evaluate Holmes on CICDDoS2019 reflection/amplification attacks and script-triggered flooding scenarios. Results show that Holmes produces attribution decisions grounded in salient evidence anchors across diverse attack families, and when errors occur, its audit logs make the failure source easy to localize, demonstrating the practicality of an LLM agent for cost-controlled and traceable DDoS investigation in cloud operations.
Authors:Roy Betser, Shamik Bose, Amit Giloni, Chiara Picardi, Sindhu Padakandla, Roman Vainshtein
Abstract:
AI agents are autonomous systems that combine LLMs with external tools to solve complex tasks. While such tools extend capability, improper tool permissions introduce security risks such as indirect prompt injection and tool misuse. We characterize these failures as unbalanced tool-driven agency. Agents may retain unnecessary permissions (excessive agency) or fail to invoke required tools (insufficient agency), amplifying the attack surface and reducing performance. We introduce AgenTRIM, a framework for detecting and mitigating tool-driven agency risks without altering an agent's internal reasoning. AgenTRIM addresses these risks through complementary offline and online phases. Offline, AgenTRIM reconstructs and verifies the agent's tool interface from code and execution traces. At runtime, it enforces per-step least-privilege tool access through adaptive filtering and status-aware validation of tool calls. Evaluating on the AgentDojo benchmark, AgenTRIM substantially reduces attack success while maintaining high task performance. Additional experiments show robustness to description-based attacks and effective enforcement of explicit safety policies. Together, these results demonstrate that AgenTRIM provides a practical, capability-preserving approach to safer tool use in LLM-based agents.
Authors:Mohammed Himayath Ali, Mohammed Aqib Abdullah, Syed Muneer Hussin, Mohammed Mudassir Uddin, Shahnawaz Alam
Abstract:
Federated learning enables collaborative model training across distributed institutions without centralizing sensitive data; however, ensuring algorithmic fairness across heterogeneous data distributions while preserving privacy remains fundamentally unresolved. This paper introduces CryptoFair-FL, a novel cryptographic framework providing the first verifiable fairness guarantees for federated learning systems under formal security definitions. The proposed approach combines additively homomorphic encryption with secure multi-party computation to enable privacy-preserving verification of demographic parity and equalized odds metrics without revealing protected attribute distributions or individual predictions. A novel batched verification protocol reduces computational complexity from BigO(n^2) to BigO(n \log n) while maintaining (\dparam, \deltap)-differential privacy with dparam = 0.5 and deltap = 10^{-6}. Theoretical analysis establishes information-theoretic lower bounds on the privacy cost of fairness verification, demonstrating that the proposed protocol achieves near-optimal privacy-fairness tradeoffs. Comprehensive experiments across four benchmark datasets (MIMIC-IV healthcare records, Adult Income, CelebA, and a novel FedFair-100 benchmark) demonstrate that CryptoFair-FL reduces fairness violations from 0.231 to 0.031 demographic parity difference while incurring only 2.3 times computational overhead compared to standard federated averaging. The framework successfully defends against attribute inference attacks, maintaining adversarial success probability below 0.05 across all tested configurations. These results establish a practical pathway for deploying fairness-aware federated learning in regulated industries requiring both privacy protection and algorithmic accountability.
Authors:Shengwei You, Aditya Joshi, Andrey Kuehlkamp, Jarek Nabrzyski
Abstract:
Cross-chain bridges constitute the single largest vector of systemic risk in Decentralized Finance (DeFi), accounting for over \$2.8 billion in losses since 2021. The fundamental vulnerability lies in the binary nature of existing bridge security models: a bridge is either fully operational or catastrophically compromised, with no intermediate state to contain partial failures. We present ASAS-BridgeAMM, a bridge-coupled automated market maker that introduces Contained Degradation: a formally specified operational state where the system gracefully degrades functionality in response to adversarial signals. By treating cross-chain message latency as a quantifiable execution risk, the protocol dynamically adjusts collateral haircuts, slippage bounds, and withdrawal limits. Across 18 months of historical replay on Ethereum and two auxiliary chains, ASAS-BridgeAMM reduces worst-case bridge-induced insolvency by 73% relative to baseline mint-and-burn architectures, while preserving 104.5% of transaction volume during stress periods. In rigorous adversarial simulations involving delayed finality, oracle manipulation, and liquidity griefing, the protocol maintains solvency with probability $>0.9999$ and bounds per-epoch bad debt to $<0.2%$ of total collateral. We provide a reference implementation in Solidity and formally prove safety (bounded debt), liveness (settlement completion), and manipulation resistance under a Byzantine relayer model.
Authors:Daniel Moghimi, Alexandru-Cosmin Mihai, Borbala Benko, Catherine Vlasov, Elie Bursztein, Kurt Thomas, Laszlo Siroki, Pedro Barbosa, Remi Audebert
Abstract:
We develop DroidCCT, a distributed test framework to evaluate the scale of a wide range of failures/bugs in cryptography for end users. DroidCCT relies on passive analysis of artifacts from the execution of cryptographic operations in the Android ecosystem to identify weak implementations. We collect trillions of samples from cryptographic operations of Android Keystore on half a billion devices and apply severalanalysis techniques to evaluate the quality of cryptographic output from these devices and their underlying implementations. Our study reveals several patterns of bugs and weakness in cryptographic implementations from various manufacturers and chipsets. We show that the heterogeneous nature of cryptographic implementations results in non-uniform availability and reliability of various cryptographic functions. More importantly, flaws such as the use of weakly-generated random parameters, and timing side channels may surface across deployments of cryptography. Our results highlight the importance of fault- and side-channel-resistant cryptography and the ability to transparently and openly test these implementations.
Authors:Haiyue Yuan, Nikolay Matyunin, Ali Raza, Shujun Li
Abstract:
Privacy policies help inform people about organisations' personal data processing practices, covering different aspects such as data collection, data storage, and sharing of personal data with third parties. Privacy policies are often difficult for people to fully comprehend due to the lengthy and complex legal language used and inconsistent practices across different sectors and organisations. To help conduct automated and large-scale analyses of privacy policies, many researchers have studied applications of machine learning and natural language processing techniques, including large language models (LLMs). While a limited number of prior studies utilised LLMs for extracting personal data flows from privacy policies, our approach builds on this line of work by combining LLMs with retrieval-augmented generation (RAG) and a customised knowledge base derived from existing studies. This paper presents the development of LADFA, an end-to-end computational framework, which can process unstructured text in a given privacy policy, extract personal data flows and construct a personal data flow graph, and conduct analysis of the data flow graph to facilitate insight discovery. The framework consists of a pre-processor, an LLM-based processor, and a data flow post-processor. We demonstrated and validated the effectiveness and accuracy of the proposed approach by conducting a case study that involved examining ten selected privacy policies from the automotive industry. Moreover, it is worth noting that LADFA is designed to be flexible and customisable, making it suitable for a range of text-based analysis tasks beyond privacy policy analysis.
Authors:Muhammad Danish, Enrique Sobrados, Priya Kaushik, Bhupendra Acharya, Muhammad Saad, Abdullah Mueen, Sazzadur Rahaman, Afsah Anwar
Abstract:
Digital service providers often prioritize a frictionless user experience by adopting technologies that simplify access to their services. One widely used mechanism is the Short Message Service (SMS) to deliver links (URLs) that enable single-click access to online services with little to no resistance. However, SMS is inherently insecure, and numerous reports have documented message interception and data leaks. Thus, attributing excessive trust in such an insecure channel opens avenues for unintended access and exploitation by adversaries. In this paper, we present a comprehensive investigation of the implications of SMS-delivered URLs from the lens of public SMS gateways. We conduct the study on more than 322K unique SMS-delivered URLs extracted from more than 33 million messages across more than 30K phone numbers, revealing critical security and privacy vulnerabilities. We identify and validate critical Personally Identifiable Information (PII) exposure in 701 endpoints affecting 177 services. Our manual investigation of the root cause of the exposure reveals a weak authentication model which hinges upon tokenized bearer links as sufficient authorization proofs, thereby allowing anyone with the URL to access private user information, including social security number, date of birth, bank account number, and credit score. Additionally, we identify 125 services allowing mass enumeration of valid URLs due to low entropy within tokens, thereby cascading the privacy risks beyond the initially compromised users. Furthermore, we identify mismatches between the GUI and data fetched by the client, extending the scale of privacy leakages. Particularly, we identify 76 services that perform data overfetching. Finally, 18 services have acknowledged and addressed the weaknesses in their services, thereby enhancing the privacy of at least 120M users.
Authors:Mohammed Himayath Ali, Mohammed Aqib Abdullah, Mohammed Mudassir Uddin, Shahnawaz Alam
Abstract:
Large Language Models have emerged as transformative tools for Security Operations Centers, enabling automated log analysis, phishing triage, and malware explanation; however, deployment in adversarial cybersecurity environments exposes critical vulnerabilities to prompt injection attacks where malicious instructions embedded in security artifacts manipulate model behavior. This paper introduces SecureCAI, a novel defense framework extending Constitutional AI principles with security-aware guardrails, adaptive constitution evolution, and Direct Preference Optimization for unlearning unsafe response patterns, addressing the unique challenges of high-stakes security contexts where traditional safety mechanisms prove insufficient against sophisticated adversarial manipulation. Experimental evaluation demonstrates that SecureCAI reduces attack success rates by 94.7% compared to baseline models while maintaining 95.1% accuracy on benign security analysis tasks, with the framework incorporating continuous red-teaming feedback loops enabling dynamic adaptation to emerging attack strategies and achieving constitution adherence scores exceeding 0.92 under sustained adversarial pressure, thereby establishing a foundation for trustworthy integration of language model capabilities into operational cybersecurity workflows and addressing a critical gap in current approaches to AI safety within adversarial domains.
Authors:Zhuoran Tan, Ke Xiao, Jeremy Singer, Christos Anagnostopoulos
Abstract:
Open-source software (OSS) is a critical component of modern software systems, yet supply chain security remains challenging in practice due to unavailable or obfuscated source code. Consequently, security teams often rely on runtime observations collected from sandboxed executions to investigate suspicious third-party components. We present HeteroGAT-Rank, an industry-oriented runtime behavior mining system that supports analyst-in-the-loop supply chain threat investigation. The system models execution-time behaviors of OSS packages as lightweight heterogeneous graphs and applies attention-based graph learning to rank behavioral patterns that are most relevant for security analysis. Rather than aiming for fully automated detection, HeteroGAT-Rank surfaces actionable runtime signals - such as file, network, and command activities - to guide manual investigation and threat hunting. To operate at ecosystem scale, the system decouples offline behavior mining from online analysis and integrates parallel graph construction for efficient processing across multiple ecosystems. An evaluation on a large-scale OSS execution dataset shows that HeteroGAT-Rank effectively highlights meaningful and interpretable behavioral indicators aligned with real-world vulnerability and attack trends, supporting practical security workflows under realistic operational constraints.
Authors:Muhammad Wahid Akram, Keshav Sood, Muneeb Ul Hassan, Dhananjay Thiruvady
Abstract:
Phishing with Quick Response (QR) codes is termed as Quishing. The attackers exploit this method to manipulate individuals into revealing their confidential data. Recently, we see the colorful and fancy representations of QR codes, the 2D matrix of QR codes which does not reflect a typical mixture of black-white modules anymore. Instead, they become more tempting as an attack vector for adversaries which can evade the state-of-the-art deep learning visual-based and other prevailing countermeasures. We introduce "ALFA", a safe-by-design approach, to mitigate Quishing and prevent everyone from accessing the post-scan harmful payload of fancy QR codes. Our method first converts a fancy QR code into the replica of binary grid and then identify the erroneous representation of modules in that grid. Following that, we present "FAST" method which can conveniently recover erroneous modules from that binary grid. Afterwards, using this binary grid, our solution extracts the structural features of fancy QR code and predicts its legitimacy using a pre-trained model. The effectiveness of our proposal is demonstrated by the experimental evaluation on a synthetic dataset (containing diverse variations of fancy QR codes) and achieve a FNR of 0.06% only. We also develop the mobile app to test the practical feasibility of our solution and provide a performance comparison of the app with the real-world QR readers. This comparison further highlights the classification reliability and detection accuracy of this solution in real-world environments.
Authors:Badhan Chandra Das, Md Tasnim Jawad, Joaquin Molto, M. Hadi Amini, Yanzhao Wu
Abstract:
In recent years, the security vulnerabilities of Multi-modal Large Language Models (MLLMs) have become a serious concern in the Generative Artificial Intelligence (GenAI) research. These highly intelligent models, capable of performing multi-modal tasks with high accuracy, are also severely susceptible to carefully launched security attacks, such as jailbreaking attacks, which can manipulate model behavior and bypass safety constraints. This paper introduces MJAD-MLLMs, a holistic framework that systematically analyzes the proposed Multi-turn Jailbreaking Attacks and multi-LLM-based defense techniques for MLLMs. In this paper, we make three original contributions. First, we introduce a novel multi-turn jailbreaking attack to exploit the vulnerabilities of the MLLMs under multi-turn prompting. Second, we propose a novel fragment-optimized and multi-LLM defense mechanism, called FragGuard, to effectively mitigate jailbreaking attacks in the MLLMs. Third, we evaluate the efficacy of the proposed attacks and defenses through extensive experiments on several state-of-the-art (SOTA) open-source and closed-source MLLMs and benchmark datasets, and compare their performance with the existing techniques.
Authors:Kai Hu, Abhinav Aggarwal, Mehran Khodabandeh, David Zhang, Eric Hsin, Li Chen, Ankit Jain, Matt Fredrikson, Akash Bharadwaj
Abstract:
This paper introduces Jailbreak-Zero, a novel red teaming methodology that shifts the paradigm of Large Language Model (LLM) safety evaluation from a constrained example-based approach to a more expansive and effective policy-based framework. By leveraging an attack LLM to generate a high volume of diverse adversarial prompts and then fine-tuning this attack model with a preference dataset, Jailbreak-Zero achieves Pareto optimality across the crucial objectives of policy coverage, attack strategy diversity, and prompt fidelity to real user inputs. The empirical evidence demonstrates the superiority of this method, showcasing significantly higher attack success rates against both open-source and proprietary models like GPT-40 and Claude 3.5 when compared to existing state-of-the-art techniques. Crucially, Jailbreak-Zero accomplishes this while producing human-readable and effective adversarial prompts with minimal need for human intervention, thereby presenting a more scalable and comprehensive solution for identifying and mitigating the safety vulnerabilities of LLMs.
Authors:Zhuoran Tan, Run Hao, Jeremy Singer, Yutian Tang, Christos Anagnostopoulos
Abstract:
Tool-augmented LLM agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended exposure of external inputs (e.g., environment secrets or local files). While existing scanners often focus on static artifacts, analyzing runtime behavior is challenging because directly executing untrusted tools can itself be dangerous. We present MCP-SandboxScan, a lightweight framework motivated by the Model Context Protocol (MCP) that safely executes untrusted tools inside a WebAssembly/WASI sandbox and produces auditable reports of external-to-sink exposures. Our prototype (i) extracts LLM-relevant sinks from runtime outputs (prompt/messages and structured tool-return fields), (ii) instantiates external-input candidates from environment values, mounted file contents, and output-surfaced HTTP fetch intents, and (iii) links sources to sinks via snippet-based substring matching. Case studies on three representative tools show that MCP-SandboxScan can surface provenance evidence when external inputs appear in prompt/messages or tool-return payloads, and can expose filesystem capability violations as runtime evidence. We further compare against a lightweight static string-signature baseline and use a micro-benchmark to characterize false negatives under transformations and false positives from short-token collisions.
Authors:Weijie Wang, Peizhuo Lv, Yan Wang, Rujie Dai, Guokun Xu, Qiujian Lv, Hangcheng Liu, Weiqing Huang, Wei Dong, Jiaheng Zhang
Abstract:
Graph Retrieval-Augmented Generation (GraphRAG) has emerged as a key technique for enhancing Large Language Models (LLMs) with proprietary Knowledge Graphs (KGs) in knowledge-intensive applications. As these KGs often represent an organization's highly valuable intellectual property (IP), they face a significant risk of theft for private use. In this scenario, attackers operate in isolated environments. This private-use threat renders passive defenses like watermarking ineffective, as they require output access for detection. Simultaneously, the low-latency demands of GraphRAG make strong encryption which incurs prohibitive overhead impractical. To address these challenges, we propose AURA, a novel framework based on Data Adulteration designed to make any stolen KG unusable to an adversary. Our framework pre-emptively injects plausible but false adulterants into the KG. For an attacker, these adulterants deteriorate the retrieved context and lead to factually incorrect responses. Conversely, for authorized users, a secret key enables the efficient filtering of all adulterants via encrypted metadata tags before they are passed to the LLM, ensuring query results remain completely accurate. Our evaluation demonstrates the effectiveness of this approach: AURA degrades the performance of unauthorized systems to an accuracy of just 5.3%, while maintaining 100% fidelity for authorized users with negligible overhead. Furthermore, AURA proves robust against various sanitization attempts, retaining 80.2% of its adulterants.
Authors:Irdin Pekaric, Raffaela Groner, Alexander Raschke, Thomas Witte, Jubril Gbolahan Adigun, Michael Felderer, Matthias Tichy
Abstract:
In the rapidly evolving landscape of software engineering, the demand for robust and secure systems has become increasingly critical. This is especially true for self-adaptive systems due to their complexity and the dynamic environments in which they operate. To address this issue, we designed and developed the SAFT-GT toolchain that tackles the multifaceted challenges associated with ensuring both safety and security. This paper provides a comprehensive description of the toolchain's architecture and functionalities, including the Attack-Fault Trees generation and model combination approaches. We emphasize the toolchain's ability to integrate seamlessly with existing systems, allowing for enhanced safety and security analyses without requiring extensive modifications and domain knowledge. Our proposed approach can address evolving security threats, including both known vulnerabilities and emerging attack vectors that could compromise the system. As a use case for the toolchain, we integrate it into the feedback loop of self-adaptive systems. Finally, to validate the practical applicability of the toolchain, we conducted an extensive user study involving domain experts, whose insights and feedback underscore the toolchain's relevance and usability in real-world scenarios. Our findings demonstrate the toolchain's effectiveness in real-world applications while highlighting areas for future improvements. The toolchain and associated resources are available in an open-source repository to promote reproducibility and encourage further research in this field.
Authors:Konstantinos E. Kampourakis, Vasileios Gkioulos, Sokratis Katsikas
Abstract:
Cyber attacks targeting Industrial Control Systems (ICS) have become increasingly sophisticated and hard to identify. Detecting such attacks requires integrating low-level behavioral cues with high-level semantic interpretation, a capability that traditional anomaly detectors lack. This paper presents a Digital Twin (DT)-driven hybrid detection approach that combines deterministic heuristics with systematic, constrained Large Language Model (LLM) reasoning to achieve real-time incident detection. The DT maintains a synchronized, feature-enriched representation of the Secure Water Treatment (SWaT) process, deriving behavioral descriptors. Heuristics identify characteristic signatures of spoofing, valve forcing, denial-of-service, and bias drift, while the LLM is invoked only when heuristics abstain. A constrained JSON schema and semantic plausibility filters ensure physically consistent LLM outputs, and a temporal smoothing layer stabilizes the final decision signal. Evaluation on four canonical SWaT attack scenarios shows that the proposed detector precisely localizes each attack interval with low time-to-detect and zero False Positives (FPs) in the evaluated benign region. Results are consistent across both a local LLaMA model and a cloud-based GPT model, demonstrating the robustness of the constrained hybrid architecture. The findings highlight the potential of DT-guided LLM reasoning as a reliable and interpretable approach to ICS anomaly detection.
Authors:Zhiyuan Li, Jingzheng Wu, Xiang Ling, Xing Cui, Tianyue Luo
Abstract:
Agent Skills is an emerging open standard that defines a modular, filesystem-based packaging format enabling LLM-based agents to acquire domain-specific expertise on demand. Despite rapid adoption across multiple agentic platforms and the emergence of large community marketplaces, the security properties of Agent Skills have not been systematically studied. This paper presents the first comprehensive security analysis of the Agent Skills framework. We define the full lifecycle of an Agent Skill across four phases -- Creation, Distribution, Deployment, and Execution -- and identify the structural attack surface each phase introduces. Building on this lifecycle analysis, we construct a threat taxonomy comprising seven categories and seventeen scenarios organized across three attack layers, grounded in both architectural analysis and real-world evidence. We validate the taxonomy through analysis of five confirmed security incidents in the Agent Skills ecosystem. Based on these findings, we discuss defense directions for each threat category, identify open research challenges, and provide actionable recommendations for stakeholders. Our analysis reveals that the most severe threats arise from structural properties of the framework itself, including the absence of a data-instruction boundary, a single-approval persistent trust model, and the lack of mandatory marketplace security review, and cannot be addressed through incremental mitigations alone.
Authors:Fabian Fleischer, Cen Zhang, Joonun Jang, Jeongin Cho, Meng Xu, Taesoo Kim
Abstract:
Java applications are prone to vulnerabilities stemming from the insecure use of security-sensitive APIs, such as file operations enabling path traversal or deserialization routines allowing remote code execution. These sink APIs encode critical information for vulnerability discovery: the program-specific constraints required to reach them and the exploitation conditions necessary to trigger security flaws. Despite this, existing fuzzers largely overlook such vulnerability-specific knowledge, limiting their effectiveness. We present GONDAR, a sink-centric fuzzing framework that systematically leverages sink API semantics for targeted vulnerability discovery. GONDAR first identifies reachable and exploitable sink call sites through CWE-specific scanning combined with LLM-assisted static filtering. It then deploys two specialized agents that work collaboratively with a coverage-guided fuzzer: an exploration agent generates inputs to reach target call sites by iteratively solving path constraints, while an exploitation agent synthesizes proof-of-concept exploits by reasoning about and satisfying vulnerability-triggering conditions. The agents and fuzzer continuously exchange seeds and runtime feedback, complementing each other. We evaluated GONDAR on real-world Java benchmarks, where it discovers four times more vulnerabilities than Jazzer, the state-of-the-art Java fuzzer. Notably, GONDAR also demonstrated strong performance in the DARPA AI Cyber Challenge, and is integrated into OSS-CRS, a sandbox project in The Linux Foundation's OpenSSF, to improve the security of open-source software.
Authors:Pengzhi Huang, Kiwan Maeng, G. Edward Suh
Abstract:
Privacy protection has become an increasing concern in modern machine learning applications. Privacy-preserving machine learning (PPML) has attracted growing research attention, with approaches such as secure multiparty computation (MPC) and fully homomorphic encryption (FHE) being actively explored. However, existing evaluations of these approaches have frequently been done on a narrow, fragmented setup and only focused on a specific performance metric, such as the online inference latency of a specific batch size. From the existing reports, it is hard to compare different approaches, especially when considering other metrics like energy/cost or broader system setups (various hyperparameters, offline overheads, future hardware/network configurations, etc.). We present a unified characterization of three popular approaches -- two variants of MPC based on arithmetic/binary sharing conversion and function secret sharing, and FHE -- on their performance and cost in performing privacy-preserving inference on multiple CNN and Transformer models. We study a range of LAN and WAN environments, model sizes, batch sizes, and input sequence lengths. We evaluate not only the performance but also the energy consumption and monetary cost of deploying under a realistic scenario, taking into account their offline and online computation/communication overheads. We provide empirical guidance for selecting, optimizing, and deploying these privacy-preserving compute paradigms, and outline how evolving hardware and network trends are likely to shift trade-offs between the two MPC schemes and FHE. This work provides system-level insights for researchers and practitioners who seek to understand or accelerate PPML workloads.
Authors:Xaver Fabian, Marco Guarnieri, Boris Köpf, Jose F. Morales, Marco Patrignani, Jan Reineke, Andres Sanchez
Abstract:
Speculative execution enhances processor performance by predicting intermediate results and executing instructions based on these predictions. However, incorrect predictions can lead to security vulnerabilities, as speculative instructions leave traces in microarchitectural components that attackers can exploit. This is demonstrated by the family of Spectre attacks. Unfortunately, existing countermeasures to these attacks lack a formal security characterization, making it difficult to verify their effectiveness. In this paper, we propose a novel framework for detecting information flows introduced by speculative execution and reasoning about software defenses. The theoretical foundation of our approach is speculative non-interference (SNI), a novel semantic notion of security against speculative execution attacks. SNI relates information leakage observed under a standard non-speculative semantics to leakage arising under semantics that explicitly model speculative execution. To capture their combined effects, we extend our framework with a mechanism to safely compose multiple speculative semantics, each focussing on a single aspect of speculation. This allows us to analyze the complex interactions and resulting leaks that can arise when multiple speculative mechanisms operate together. On the practical side, we develop Spectector, a symbolic analysis tool that uses our compositional framework and leverages SMT solvers to detect vulnerabilities and verify program security with respect to multiple speculation mechanisms. We demonstrate the effectiveness of Spectector through evaluations on standard security benchmarks and new vulnerability scenarios.
Authors:Quan Zhang, Lianhang Fu, Lvsi Lian, Gwihwan Go, Yujue Wang, Chijin Zhou, Yu Jiang, Geguang Pu
Abstract:
Equipping LLM agents with real-world tools can substantially improve productivity. However, granting agents autonomy over tool use also transfers the associated privileges to both the agent and the underlying LLM. Improper privilege usage may lead to serious consequences, including information leakage and infrastructure damage. While several benchmarks have been built to study agents' security, they often rely on pre-coded tools and restricted interaction patterns. Such crafted environments differ substantially from the real-world, making it hard to assess agents' security capabilities in critical privilege control and usage. Therefore, we propose GrantBox, a security evaluation sandbox for analyzing agent privilege usage. GrantBox automatically integrates real-world tools and allows LLM agents to invoke genuine privileges, enabling the evaluation of privilege usage under prompt injection attacks. Our results indicate that while LLMs exhibit basic security awareness and can block some direct attacks, they remain vulnerable to more sophisticated attacks, resulting in an average attack success rate of 84.80% in carefully crafted scenarios.
Authors:Tzu-Ti Wei, Yu-Han Tseng, Jun-Yi Lin, Yu-Chee Tseng, Jen-Jee Chen
Abstract:
Steganography conceals secret information within innocuous carriers while preserving visual fidelity and enabling reliable recovery. Recent unified networks operate normally under untriggered conditions but switch to hidden steganographic tasks when triggered. PUSNet follows this paradigm by performing image purification during normal operation and steganographic embedding when activated. However, it supports only a single user with one key pair, limiting its applicability in multi-user settings. We propose PUSNet-MK, a multi-key extension that enforces strict key isolation via a mismatched-key isolation loss, effectively preventing cross-key decoding when a wrong key is applied. This design preserves the intended steganographic behavior while addressing a critical security limitation of PUSNet. Extensive experiments demonstrate that PUSNet-MK produces high-quality stego images and accurate secret recovery, while preventing unintended information leakage.
Authors:Tanvir Ahmed, Yixuan Gao, Adnan Armouti, Rajalakshmi Nandakumar
Abstract:
We present mmFHE, the first system that enables fully homomorphic encryption (FHE) for end-to-end mmWave radar sensing. mmFHE encrypts raw range profiles on a lightweight edge device and executes the entire mmWave signal-processing and ML inference pipeline homomorphically on an untrusted cloud that operates exclusively on ciphertexts. At the core of mmFHE is a library of seven composable, data-oblivious FHE kernels that replace standard DSP routines with fixed arithmetic circuits. These kernels can be flexibly composed into different application-specific pipelines. We demonstrate this approach on two representative tasks: vital-sign monitoring and gesture recognition. We formally prove two cryptographic guarantees for any pipeline assembled from this library: input privacy, the cloud learns nothing about the sensor data; and data obliviousness, the execution trace is identical on the cloud regardless of the data being processed. These guarantees effectively neutralize various supervised and unsupervised privacy attacks on raw data, including re-identification and data-dependent privacy leakage. Evaluation on three public radar datasets (270 vital-sign recordings, 600 gesture trials) shows that encryption introduces negligible error: HR/RR MAE <10^-3 bpm versus plaintext, and 84.5% gesture accuracy (vs. 84.7% plaintext) with end-to-end cloud GPU latency of 103s for a 10s vital-sign window and 37s for a 3s gesture window. These results show that privacy-preserving end-to-end mmWave sensing is feasible on commodity hardware today.
Authors:Renuga Kanagavelu, Manjil Nepal, Ning Peiyan, Cai Kangning, Xu Jiming, Fei Gao, Yong Liu, Goh Siow Mong Rick, Qingsong Wei
Abstract:
In the modern financial system, combating money laundering is a critical challenge complicated by data privacy concerns and increasingly complex fraud transaction patterns. Although federated learning (FL) is a promising problem-solving approach as it allows institutions to train their models without sharing their data, it has the drawback of being prone to privacy leakage, specifically in tabular data forms like financial data. To address this, we propose DPxFin, a novel federated framework that integrates reputation-guided adaptive differential privacy. Our approach computes client reputation by evaluating the alignment between locally trained models and the global model. Based on this reputation, we dynamically assign differential privacy noise to client updates, enhancing privacy while maintaining overall model utility. Clients with higher reputations receive lower noise to amplify their trustworthy contributions, while low-reputation clients are allocated stronger noise to mitigate risk. We validate DPxFin on the Anti-Money Laundering (AML) dataset under both IID and non-IID settings using Multi Layer Perceptron (MLP). Experimental analysis established that our approach has a more desirable trade-off between accuracy and privacy than those of traditional FL and fixed-noise Differential Privacy (DP) baselines, where performance improvements were consistent, even though on a modest scale. Moreover, DPxFin does withstand tabular data leakage attacks, proving its effectiveness under real-world financial conditions.
Authors:Qiang Li, XiangRui Zhang, Haining Wang
Abstract:
Binary vulnerability analysis is increasingly performed by LLM-based agents in an iterative, multi-pass manner, with the model as the core decision-maker. However, how such systems organize exploration over hundreds of reasoning steps remains poorly understood, due to limited context windows and implicit token-level behaviors. We present the first large-scale, trace-level study showing that multi-pass LLM reasoning gives rise to structured, token-level implicit patterns. Analyzing 521 binaries with 99,563 reasoning steps, we identify four dominant patterns: early pruning, path-dependent lock-in, targeted backtracking, and knowledge-guided prioritization that emerge implicitly from reasoning traces. These token-level implicit patterns serve as an abstraction of LLM reasoning: instead of explicit control-flow or predefined heuristics, exploration is organized through implicit decisions regulating path selection, commitment, and revision. Our analysis shows these patterns form a stable, structured system with distinct temporal roles and measurable characteristics. Our results provide the first systematic characterization of LLM-driven binary analysis and a foundation for more reliable analysis systems.
Authors:Dimitris Mitropoulos, Nikolaos Alexopoulos, Georgios Alexopoulos, Diomidis Spinellis
Abstract:
Security code reviews increasingly rely on systems integrating Large Language Models (LLMs), ranging from interactive assistants to autonomous agents in CI/CD pipelines. We study whether confirmation bias (i.e., the tendency to favor interpretations that align with prior expectations) affects LLM-based vulnerability detection, and whether this failure mode can be exploited in software supply-chain attacks. We conduct two complementary studies. Study 1 quantifies confirmation bias through controlled experiments on 250 CVE vulnerability/patch pairs evaluated across four state-of-the-art models under five framing conditions for the review prompt. Framing a change as bug-free reduces vulnerability detection rates by 16-93%, with strongly asymmetric effects: false negatives increase sharply while false positive rates change little. Bias effects vary by vulnerability type, with injection flaws being more susceptible to them than memory corruption bugs. Study 2 evaluates exploitability in practice mimicking adversarial pull requests that reintroduce known vulnerabilities while framed as security improvements or urgent functionality fixes via their pull request metadata. Adversarial framing succeeds in 35% of cases against GitHub Copilot (interactive assistant) under one-shot attacks and in 88% of cases against Claude Code (autonomous agent) in real project configurations where adversaries can iteratively refine their framing to increase attack success. Debiasing via metadata redaction and explicit instructions restores detection in all interactive cases and 94% of autonomous cases. Our results show that confirmation bias poses a weakness in LLM-based code review, with implications on how AI-assisted development tools are deployed.
Authors:Georgios Alexopoulos, Nikolaos Alexopoulos, Thodoris Sotiropoulos, Charalambos Mitropoulos, Zhendong Su, Dimitris Mitropoulos
Abstract:
Python applications depend on native libraries that may be vendored within package distributions or installed on the host system. When vulnerabilities are discovered in these libraries, determining which Python packages are affected requires cross-ecosystem analysis spanning Python dependency graphs and OS package versions. Current vulnerability scanners produce false negatives by missing vendored vulnerabilities and false positives by ignoring security patches backported by OS distributions. We present a provenance-aware vulnerability analysis approach that resolves vendored libraries to specific OS package versions or upstream releases. Our approach queries vendored libraries against a database of historical OS package artifacts using content-based hashing, and applies library-specific dynamic analyses to extract version information from binaries built from upstream source. We then construct cross-ecosystem call graphs by stitching together Python and binary call graphs across dependency boundaries, enabling reachability analysis of vulnerable functions. Evaluating on 100,000 Python packages and 10 known CVEs associated with third-party native dependencies, we identify 39 directly vulnerable packages (47M+ monthly downloads) and 312 indirectly vulnerable client packages affected through dependency chains. Our analysis achieves up to 97% false positive reduction compared to upstream version matching.
Authors:Ferhat Ozgur Catak, Murat Kuzlu, Jungwon Seo, Umit Cali
Abstract:
This paper presents a federated learning framework secured by quantum key distribution (QKD) for wireless channel estimation and radar spectrum sensing in the next generation networks (NextG or Beyond 6G). A BB84-style protocol abstraction and pairwise additive masking are utilized to train clients' local models (CNN for channel estimation, U-Net for radar segmentation) and upload only masked model updates. The server aggregates without observing plain parameters; an eavesdropper without QKD keys cannot recover individual updates. Experiments show that secure FL achieves NMSE of 0.216 for channel estimation and 92.1\% accuracy with 0.72 mIoU for radar sensing. When an eavesdropper is present, QBER rises to $\sim$25\% and all rounds abort as intended; reconstruction error remains below $10^{-5}$, confirming correct aggregation.
Authors:Qian Li, Yunuo Chen, Yuntian Chen
Abstract:
Real-world backdoor attacks often require poisoned datasets to be stored and transmitted before being used to compromise deep learning systems. However, in the era of big data, the inevitable use of lossy compression poses a fundamental challenge to invisible backdoor attacks. We find that triggers embedded in RGB images often become ineffective after the images are lossily compressed into binary bitstreams (e.g., JPEG files) for storage and transmission. As a result, the poisoned data lose its malicious effect after compression, causing backdoor injection to fail. In this paper, we highlight the necessity of explicitly accounting for the lossy compression process in backdoor attacks. This requires attackers to ensure that the transmitted binary bitstreams preserve malicious trigger information, so that effective triggers can be recovered in the decompressed data. Building on the region-of-interest (ROI) coding mechanism in image compression, we propose two poisoning strategies tailored to inevitable lossy compression. First, we introduce Universal Attack Activation, a universal method that uses sample-specific ROI masks to reactivate trigger information in binary bitstreams for learned image compression (LIC). Second, we present Compression-Adapted Attack, a new attack strategy that employs customized ROI masks to encode trigger information into binary bitstreams and is applicable to both traditional codecs and LIC. Extensive experiments demonstrate the effectiveness of both strategies.
Authors:Mark Bun, Marco Gaboardi, Connor Wagaman
Abstract:
We resolve an open question of Jain, Raskhodnikova, Sivakumar, and Smith (ICML 2023) by exhibiting a problem separating differential privacy under continual observation in the oblivious and adaptive settings. The continual observation (a.k.a. continual release) model formalizes privacy for streaming algorithms, where data is received over time and output is released at each time step. In the oblivious setting, privacy need only hold for data streams fixed in advance; in the adaptive setting, privacy is required even for streams that can be chosen adaptively based on the streaming algorithm's output. We describe the first explicit separation between the oblivious and adaptive settings. The problem showing this separation is based on the correlated vector queries problem of Bun, Steinke, and Ullman (SODA 2017). Specifically, we present an $(\varepsilon,0)$-DP algorithm for the oblivious setting that remains accurate for exponentially many time steps in the dimension of the input. On the other hand, we show that every $(\varepsilon,δ)$-DP adaptive algorithm fails to be accurate after releasing output for only a constant number of time steps.
Authors:Shovon Paul, Md Imran Hossen, Xiali Hei
Abstract:
CAPTCHAs remain a critical defense against automated abuse, yet modern systems suffer from well-known limitations in usability, accessibility, and resistance to increasingly capable bots and low-cost CAPTCHA farms. Behavioral and puzzle-based mechanisms often impose cognitive burdens, collect extensive interaction data, or permit outsourcing to human solvers. In this paper, we present ThermoCAPTCHA, a novel privacy-preserving human verification system that uses real-time thermal imaging to detect live human presence without requiring users to solve challenges. A lightweight YOLOv4-tiny model identifies human heat signatures from a single thermal capture, while cryptographically bound traceable tokens prevent forwarding attacks by CAPTCHA farm workers. Our prototype achieves 96.70% detection accuracy with a 73.60 ms verification latency on a low-powered server. Comprehensive security evaluation, including MITM manipulation, spoofing attempts, adversarial perturbations, and misuse scenarios, shows that ThermoCAPTCHA withstands threats that commonly defeat behavioral CAPTCHAs. A user study with 50 participants, including visually challenged users, demonstrates improved accuracy, faster completion times, and higher perceived usability compared to reCAPTCHA v2.
Authors:Qingxiao Xu, Ze Sheng, Zhicheng Chen, Jeff Huang
Abstract:
Large language models (LLMs) have shown promise for automated patching, but their effectiveness depends strongly on how they are integrated into patching systems. While prior work explores prompting strategies and individual agent designs, the field lacks a systematic comparison of patching architectures. In this paper, we present a controlled evaluation of four LLM-based patching paradigms -- fixed workflow, single-agent system, multi-agent system, and general-purpose code agents -- using a unified benchmark and evaluation framework. We analyze patch correctness, failure modes, token usage, and execution time across real-world vulnerability tasks. Our results reveal clear architectural trade-offs: fixed workflows are efficient but brittle, single-agent systems balance flexibility and cost, and multi-agent designs improve generalization at the expense of substantially higher overhead and increased risk of reasoning drift on complex tasks. Surprisingly, general-purpose code agents achieve the strongest overall patching performance, benefiting from general-purpose tool interfaces that support effective adaptation across vulnerability types. Overall, we show that architectural design and iteration depth, rather than model capability alone, dominate the reliability and cost of LLM-based automated patching.
Authors:Berk Çakar, Dongyoon Lee, James C. Davis
Abstract:
Software engineers use regular expressions (regexes) across a wide range of domains and tasks. To support regexes, software projects must integrate a regex engine, whether provided natively by the language runtime (e.g., Python's re) or included as an external dependency (e.g., PCRE). However, these engines may contain bugs and introduce vulnerabilities. A common strategy for testing regex engines involves differential testing -- comparing outputs across different implementations. However, this approach is concerning because regex syntax and semantics vary significantly between dialects (e.g., POSIX vs. PCRE). Fuzzing is also utilized to ease testing of feature-rich regex implementations to expose defects, but naive byte-level mutations generate syntactically invalid inputs that exercise only parsing logic, not matching internals. In this work, we describe our progress towards ReTest, a framework that systematically tests regular expression engines by combining grammar-aware fuzzing for high code coverage with metamorphic testing to generate dialect-independent test oracles. So far, we have surveyed testing practices across 22 regex engines, analyzed 1,007 regex engine bugs and 156 CVEs to characterize failure modes, and curated 16 metamorphic relations for regexes derived from Kleene algebra. Our preliminary evaluation on PCRE shows that ReTest achieves 3x higher edge coverage than existing fuzzing approaches and has identified three new memory safety defects. We conclude by describing our next steps toward our ultimate goal: helping regex engine developers identify bugs without depending on a consistent cross-implementation standard.
Authors:Tomoya Matsumoto, Shokichi Takakura, Shun Takagi, Satoshi Hasegawa
Abstract:
SQL is the de facto interface for exploratory data analysis; however, releasing exact query results can expose sensitive information through membership or attribute inference attacks. Differential privacy (DP) provides rigorous privacy guarantees, but in practice, DP alone may not satisfy governance requirements such as the \emph{minimum frequency rule}, which requires each released group (cell) to include contributions from at least $k$ distinct individuals. In this paper, we present \textbf{DPSQL+}, a privacy-preserving SQL library that simultaneously enforces user-level $(\varepsilon,δ)$-DP and the minimum frequency rule. DPSQL+ adopts a modular architecture consisting of: (i) a \emph{Validator} that statically restricts queries to a DP-safe subset of SQL; (ii) an \emph{Accountant} that consistently tracks cumulative privacy loss across multiple queries; and (iii) a \emph{Backend} that interfaces with various database engines, ensuring portability and extensibility. Experiments on the TPC-H benchmark demonstrate that DPSQL+ achieves practical accuracy across a wide range of analytical workloads -- from basic aggregates to quadratic statistics and join operations -- and allows substantially more queries under a fixed global privacy budget than prior libraries in our evaluation.
Authors:Zhonghao Zhan, Krinos Li, Yefan Zhang, Hamed Haddadi
Abstract:
Edge deployment of LLM agents on IoT hardware introduces attack surfaces absent from cloud-hosted orchestration. We present an empirical security analysis of three architectures (cloud-hosted, edge-local swarm, and hybrid) using a multi-device home-automation testbed with local MQTT messaging and an Android smartphone as an edge inference node. We identify five systems-level attack surfaces, including two emergent failures observed during live testbed operation: coordination-state divergence and induced trust erosion. We frame core security properties as measurable systems metrics: data egress volume, failover window exposure, sovereignty boundary integrity, and provenance chain completeness. Our measurements show that edge-local deployments eliminate routine cloud data exposure but silently degrade sovereignty when fallback mechanisms trigger, with boundary crossings invisible at the application layer. Provenance chains remain complete under cooperative operation yet are trivially bypassed without cryptographic enforcement. Failover windows create transient blind spots exploitable for unauthorised actuation. These results demonstrate that deployment architecture, not just model or prompt design, is a primary determinant of security risk in agent-controlled IoT systems.
Authors:Yichen Liu, Berk Çakar, Aman Agrawal, Minseok Seo, James C. Davis, Dongyoon Lee
Abstract:
This paper presents the first systematic study of denial-of-service vulnerabilities in Regular Expressions with Backreferences (REwB). We introduce the Two-Phase Memory Automaton (2PMFA), an automaton model that precisely captures REwB semantics. Using this model, we derive necessary conditions under which backreferences induce super-linear backtracking runtime, even when sink ambiguity is linear -- a regime where existing detectors report no vulnerability. Based on these conditions, we identify three vulnerability patterns, develop detection and attack-construction algorithms, and validate them in practice. Using the Snort intrusion detection ruleset, our evaluation identifies 45 previously unknown REwB vulnerabilities with quadratic or worse runtime. We further demonstrate practical exploits against Snort, including slowing rule evaluation by 0.6-1.2 seconds and bypassing alerts by triggering PCRE's matching limit.
Authors:Michail Takaronis, Athanasia Kollarou, Vyron Kampourakis, Vasileios Gkioulos, Sokratis Katsikas
Abstract:
It is well established that industrial control systems comprise the operational backbone of modern critical infrastructures, yet their increasing connectivity exposes them to cyber threats that are difficult to study and remedy safely under real-time operational conditions. In this paper, we present ICSSPulse, an open-source, modular, and extensible penetration testing platform designed for the security assessment of ICS communication protocols. To the best of our knowledge, ICSSPulse is the first web-based platform that unifies network scanning, protocol-aware Modbus and OPC~UA interaction, and Large Language Model (LLM)-assisted reporting within a single, lightweight ecosystem. Our platform provides a user-friendly graphical interface that orchestrates enumeration, exploitation, and reporting activities over simulated industrial services, enabling safe and reproducible experimentation. It supports protocol-level discovery, asset enumeration, and controlled read/write interactions, while preserving protocol fidelity and operational transparency. Experimental evaluation using synthetic Modbus test servers, a Factory I/O water treatment scenario, and a custom OPC~UA production-line model demonstrated ICSSPulse's potential to discover active industrial services, enumerate process-relevant assets, and manipulate process variables. A key contribution of this work lies in the integration of an LLM-assisted reporting module that automatically translates technical findings into structured executive and technical reports, with mitigation guidance informed by the ICS MITRE ATT&CK ICS matrix.
Authors:Abdullah Caglar Oksuz, Anisa Halimi, Erman Ayday
Abstract:
Membership inference attacks (MIAs) threaten the privacy of machine learning models by revealing whether a specific data point was used during training. Existing MIAs often rely on impractical assumptions such as access to public datasets, shadow models, confidence scores, or training data distribution knowledge and making them vulnerable to defenses like confidence masking and adversarial regularization. Label-only MIAs, even under strict constraints suffer from high query requirements per sample. We propose a cost-effective label-only MIA framework based on transferability and model extraction. By querying the target model M using active sampling, perturbation-based selection, and synthetic data, we extract a functionally similar surrogate S on which membership inference is performed. This shifts query overhead to a one-time extraction phase, eliminating repeated queries to M . Operating under strict black-box constraints, our method matches the performance of state-of-the-art label-only MIAs while significantly reducing query costs. On benchmarks including Purchase, Location, and Texas Hospital, we show that a query budget equivalent to testing $\approx1\%$ of training samples suffices to extract S and achieve membership inference accuracy within $\pm1\%$ of M . We also evaluate the effectiveness of standard defenses proposed for label-only MIAs against our attack.
Authors:Darsh Asher, Farshad Dizani, Joshua Kalyanapu, Rosario Cammarota, Aydin Aysu, Samira Mirbagher Ajorpaz
Abstract:
Backend enrichment is now widely deployed in sensitive domains such as product recommendation pipelines, healthcare, and finance, where models are trained on confidential data and retrieve private features whose values influence inference behavior while remaining hidden from the API caller. This paper presents the first hardware-level backend retrieval data-stealing attack, showing that accelerator optimizations designed for performance can directly undermine data confidentiality and bypass state-of-the-art privacy defenses. Our attack, FEATUREBLEED, exploits zero-skipping in AI accelerators to infer private backend-retrieved features solely through end-to-end timing, without relying on power analysis, DVFS manipulation, or shared-cache side channels. We evaluate FEATUREBLEED on three datasets spanning medical and non-medical domains: Texas-100X (clinical records), OrganAMNIST (medical imaging), and Census-19 (socioeconomic data). We further evaluate FEATUREBLEED across three hardware backends (Intel AVX, Intel AMX, and NVIDIA A100) and three model architectures (DNNs, CNNs, and hybrid CNN-MLP pipelines), demonstrating that the leakage generalizes across CPU and GPU accelerators, data modalities, and application domains, with an adversarial advantage of up to 98.87 percentage points. Finally, we identify the root cause of the leakage as sparsity-driven zero-skipping in modern hardware. We quantify the privacy-performance-power trade-off: disabling zero-skipping increases Intel AMX per-operation energy by up to 25 percent and incurs 100 percent performance overhead. We propose a padding-based defense that masks timing leakage by equalizing responses to the worst-case execution time, achieving protection with only 7.24 percent average performance overhead and no additional power cost.
Authors:Diego Soi, Silvia Lucia Sanna, Lorenzo Pisu, Leonardo Regano, Giorgio Giacinto
Abstract:
In recent years, stealthy Android malware has increasingly adopted sophisticated techniques to bypass automatic detection mechanisms and harden manual analysis. Adversaries typically rely on obfuscation, anti-repacking, steganography, poisoning, and evasion techniques to AI-based tools, and in-memory execution to conceal malicious functionality. In this paper, we investigate WebAssembly (Wasm) as a novel technique for hiding malicious payloads and evading traditional static analysis and signature-matching mechanisms. While Wasm is typically employed to render specific gaming activities and interact with the native components in web browsers, we provide an in-depth analysis on the mechanisms Android may employ to include Wasm modules in its execution pipeline. Additionally, we provide Proofs-of-Concept to demonstrate a threat model in which an attacker embeds and executes malicious routines, effectively bypassing IoC detection by industrial state-of-the-art tools, like VirusTotal and MobSF.
Authors:Rohit Chatterjee, Yunqi Li, Prashant Nalini Vasudevan
Abstract:
We study the implications of the existence of weak Zero-Knowledge (ZK) protocols for worst-case hard languages. These are protocols that have completeness, soundness, and zero-knowledge errors (denoted $ε_c$, $ε_s$, and $ε_z$, respectively) that might not be negligible. Under the assumption that there are worst-case hard languages in NP, we show the following: 1. If all languages in NP have NIZK proofs or arguments satisfying $ ε_c+ε_s+ ε_z < 1 $, then One-Way Functions (OWFs) exist. This covers all possible non-trivial values for these error rates. It additionally implies that if all languages in NP have such NIZK proofs and $ε_c$ is negligible, then they also have NIZK proofs where all errors are negligible. Previously, these results were known under the more restrictive condition $ ε_c+\sqrt{ε_s}+ε_z < 1 $ [Chakraborty et al., CRYPTO 2025]. 2. If all languages in NP have $k$-round public-coin ZK proofs or arguments satisfying $ ε_c+ε_s+(2k-1).ε_z < 1 $, then OWFs exist. 3. If, for some constant $k$, all languages in NP have $k$-round public-coin ZK proofs or arguments satisfying $ ε_c+ε_s+k.ε_z < 1 $, then infinitely-often OWFs exist.
Authors:Tingting Tang, Yongqin Wang, Murali Annavaram
Abstract:
Secure Multi-party Computation (MPC) enables untrusted parties to jointly compute a function without revealing their inputs. Its application to machine learning (ML) has gained significant attention, particularly for secure inference services deployed across multiple cloud virtual machines (VMs), where each VM acts as an MPC party. Model providers secret-share model weights, and users secret-share inputs, ensuring that each server operates only on random shares. While MPC provides strong cryptographic guarantees, it incurs substantial computational and communication overhead. Deep neural networks rely heavily on convolutional and fully connected layers, which require costly matrix multiplications in MPC. To reduce this cost, we propose leveraging low-rank decomposition (LRD) for linear layers, replacing one large matrix multiplication with two smaller ones. Each matrix multiplication in MPC incurs a round of communication, meaning decomposing one matrix multiplication into two leads to an additional communication round. Second, the added matrix multiplication requires an additional truncation step to maintain numerical precision. Since truncation itself requires communication and computation, these overheads can offset the gains from decomposition. To address this, we introduce two complementary optimizations: truncation skipping and efficient linear layer concatenation. Truncation skipping removes the extra truncation induced by LRD, while linear layer concatenation pipelines operations to hide the additional communication round. Together, these techniques mitigate the main overheads of LRD in MPC and improve overall efficiency. Our approach is broadly applicable across MPC protocols. Experiments show up to 25% speedup in n-PC and 33% in 3-PC protocols over full-rank baselines, along with up to 52% GPU energy savings and 88% reduction in offline-phase latency.
Authors:Anushri Eswaran, Oleg Golev, Darshan Tank, Sidhant Rahi, Himanshu Tyagi
Abstract:
Modern analyst agents must reason over complex, high token inputs, including dozens of retrieved documents, tool outputs, and time sensitive data. While prior work has produced tool calling benchmarks and examined factuality in knowledge augmented systems, relatively little work studies their intersection: settings where LLMs must integrate large volumes of dynamic, structured and unstructured multi tool outputs. We investigate LLM failure modes in this regime using crypto as a representative high data density domain. We introduce (1) CryptoAnalystBench, an analyst aligned benchmark of 198 production crypto and DeFi queries spanning 11 categories; (2) an agentic harness equipped with relevant crypto and DeFi tools to generate responses across multiple frontier LLMs; and (3) an evaluation pipeline with citation verification and an LLM as a judge rubric spanning four user defined success dimensions: relevance, temporal relevance, depth, and data consistency. Using human annotation, we develop a taxonomy of seven higher order error types that are not reliably captured by factuality checks or LLM based quality scoring. We find that these failures persist even in state of the art systems and can compromise high stakes decisions. Based on this taxonomy, we refine the judge rubric to better capture these errors. While the judge does not align with human annotators on precise scoring across rubric iterations, it reliably identifies critical failure modes, enabling scalable feedback for developers and researchers studying analyst style agents. We release CryptoAnalystBench with annotated queries, the evaluation pipeline, judge rubrics, and the error taxonomy, and outline mitigation strategies and open challenges in evaluating long form, multi tool augmented systems.
Authors:Tianyi Wang, Huawei Fan, Yuanchao Shu, Peng Cheng, Cong Wang
Abstract:
Large Language Models face an emerging and critical threat known as latency attacks. Because LLM inference is inherently expensive, even modest slowdowns can translate into substantial operating costs and severe availability risks. Recently, a growing body of research has focused on algorithmic complexity attacks by crafting inputs to trigger worst-case output lengths. However, we report a counter-intuitive finding that these algorithmic latency attacks are largely ineffective against modern LLM serving systems. We reveal that system-level optimization such as continuous batching provides a logical isolation to mitigate contagious latency impact on co-located users. To this end, in this paper, we shift the focus from the algorithm to the system layer, and introduce a new Fill and Squeeze attack strategy targeting the state transition of the scheduler. "Fill" first exhausts the global KV cache to induce Head-of-Line blocking, while "Squeeze" forces the system into repetitive preemption. By manipulating output lengths using methods from simple plain-text prompts to more complex prompt engineering, and leveraging side-channel probing of memory status, we demonstrate that the attack can be orchestrated in a black-box setting with much less cost. Extensive evaluations indicate by up to 20-280x average slowdown on Time to First Token and 1.5-4x average slowdown on Time Per Output Token compared to existing attacks with 30-40% lower attack cost.
Authors:Cen Zhang, Younggi Park, Fabian Fleischer, Yu-Fu Fu, Jiho Kim, Dongkwan Kim, Youngjoon Kim, Qingxiao Xu, Andrew Chin, Ze Sheng, Hanqing Zhao, Brian J. Lee, Joshua Wang, Michael Pelican, David J. Musliner, Jeff Huang, Jon Silliman, Mikel Mcdaniel, Jefferson Casavant, Isaac Goldthwaite, Nicholas Vidovich, Matthew Lehman, Taesoo Kim
Abstract:
DARPA's AI Cyber Challenge (AIxCC, 2023--2025) is the largest competition to date for building fully autonomous cyber reasoning systems (CRSs) that leverage recent advances in AI -- particularly large language models (LLMs) -- to discover and remediate vulnerabilities in real-world open-source software. This paper presents the first systematic analysis of AIxCC. Drawing on design documents, source code, execution traces, and discussions with organizers and competing teams, we examine the competition's structure and key design decisions, characterize the architectural approaches of finalist CRSs, and analyze competition results beyond the final scoreboard. Our analysis reveals the factors that truly drove CRS performance, identifies genuine technical advances achieved by teams, and exposes limitations that remain open for future research. We conclude with lessons for organizing future competitions and broader insights toward deploying autonomous CRSs in practice.
Authors:Juefei Pu, Xingyu Li, Zhengchuan Liang, Jonathan Cox, Yifan Wu, Kareem Shehada, Arrdya Srivastav, Zhiyun Qian
Abstract:
Autonomous large language model (LLM) based systems have recently shown promising results across a range of cybersecurity tasks. However, there is no systematic study on their effectiveness in autonomously reproducing Linux kernel vulnerabilities with concrete proofs-of-concept (PoCs). Owing to the size, complexity, and low-level nature of the Linux kernel, such tasks are widely regarded as particularly challenging for current LLM-based approaches. In this paper, we present the first large-scale study of LLM-based Linux kernel vulnerability reproduction. For this purpose, we develop K-Repro, an LLM-based agentic system equipped with controlled code-browsing, virtual machine management, interaction, and debugging capabilities. Using kernel security patches as input, K-Repro automates end-to-end bug reproduction of N-day vulnerabilities in the Linux kernel. On a dataset of 100 real-world exploitable Linux kernel vulnerabilities collected from KernelCTF, our results show that K-Repro can generate PoCs that reproduce over 50\% of the cases with practical time and monetary cost. Beyond aggregate success rates, we perform an extensive study of effectiveness, efficiency, stability, and impact factors to explain when agentic reproduction succeeds, where it fails, and which components drive performance. These findings provide actionable guidance for building more reliable autonomous security agents and for assessing real-world N-day risk from both offensive and defensive perspectives.
Authors:Qi Sun, Ahmed Abdo, Luis Burbano, Ziyang Li, Yaxing Yao, Alvaro Cardenas, Yinzhi Cao
Abstract:
Autonomous Vehicles (AVs), especially vision-based AVs, are rapidly being deployed without human operators. As AVs operate in safety-critical environments, understanding their robustness in an adversarial environment is an important research problem. Prior physical adversarial attacks on vision-based autonomous vehicles predominantly target immediate safety failures (e.g., a crash, a traffic-rule violation, or a transient lane departure) by inducing a short-lived perception or control error. This paper shows a qualitatively different risk: a long-horizon route integrity compromise, where an attacker gradually steers a victim AV away from its intended route and into an attacker-chosen destination while the victim continues to drive "normally." This will not pose a danger to the victim vehicle itself, but also to potential passengers sitting inside the vehicle. In this paper, we design and implement the first adversarial framework, called JackZebra, that performs route-level hijacking of a vision-based end-to-end driving stack using a physically plausible attacker vehicle with a reconfigurable display mounted on the rear. The central challenge is temporal persistence: adversarial influence must remain effective in changing viewpoints, lighting, weather, traffic, and the victim's continual replanning -- without triggering conspicuous failures. Our key insight is to treat route hijacking as a closed-loop control problem and to convert adversarial patches into steering primitives that can be selected online via an interactive adjustment loop. Our adversarial patches are also carefully designed against worst-case background and sensor variations so that the adversarial impacts on the victim. Our evaluation shows that JackZebra can successfully hijack victim vehicles to deviate from original routes and stop at adversarial destinations with a high success rate.
Authors:Yunlong Lyu, Yixuan Tang, Peng Chen, Tian Dong, Xinyu Wang, Zhiqiang Dong, Hao Chen
Abstract:
Modern AI-integrated IDEs are shifting from passive code completion to proactive Next Edit Suggestions (NES). Unlike traditional autocompletion, NES is designed to construct a richer context from both recent user interactions and the broader codebase to suggest multi-line, cross-line, or even cross-file modifications. This evolution significantly streamlines the programming workflow into a tab-by-tab interaction and enhances developer productivity. Consequently, NES introduces a more complex context retrieval mechanism and sophisticated interaction patterns. However, existing studies focus almost exclusively on the security implications of standalone LLM-based code generation, ignoring the potential attack vectors posed by NES in modern AI-integrated IDEs. The underlying mechanisms of NES remain under-explored, and their security implications are not yet fully understood. In this paper, we conduct the first systematic security study of NES systems. First, we perform an in-depth dissection of the NES mechanisms to understand the newly introduced threat vectors. It is found that NES retrieves a significantly expanded context, including inputs from imperceptible user actions and global codebase retrieval, which increases the attack surfaces. Second, we conduct a comprehensive in-lab study to evaluate the security implications of NES. The evaluation results reveal that NES is susceptible to context poisoning and is sensitive to transactional edits and human-IDE interactions. Third, we perform a large-scale online survey involving over 200 professional developers to assess the perceptions of NES security risks in real-world development workflows. The survey results indicate a general lack of awareness regarding the potential security pitfalls associated with NES, highlighting the need for increased education and improved security countermeasures in AI-integrated IDEs.
Authors:Enrique Feito-Casares, Francisco M. Melgarejo-Meseguer, Elena Casiraghi, Giorgio Valentini, José-Luis Rojo-Álvarez
Abstract:
The rapid expansion of Internet of Things (IoT) ecosystems has led to increasingly complex and heterogeneous network topologies. Traditional network monitoring and visualization tools rely on aggregated metrics or static representations, which fail to capture the evolving relationships and structural dependencies between devices. Although Graph Neural Networks (GNNs) offer a powerful way to learn from relational data, their internal representations often remain opaque and difficult to interpret for security-critical operations. Consequently, this work introduces an interpretable pipeline that generates directly visualizable low-dimensional representations by mapping high-dimensional embeddings onto a latent manifold. This projection enables the interpretable monitoring and interoperability of evolving network states, while integrated feature attribution techniques decode the specific characteristics shaping the manifold structure. The framework achieves a classification F1-score of 0.830 for intrusion detection while also highlighting phenomena such as concept drift. Ultimately, the presented approach bridges the gap between high-dimensional GNN embeddings and human-understandable network behavior, offering new insights for network administrators and security analysts.
Authors:Abdelkader El Mahdaouy, Issam Ait Yahia, Soufiane Oualil, Ismail Berrada
Abstract:
Network Intrusion Detection Systems (NIDS) have progressively shifted from signature-based techniques toward machine learning and, more recently, deep learning methods. Meanwhile, the widespread adoption of encryption has reduced payload visibility, weakening inspection pipelines that depend on plaintext content and increasing reliance on flow-level telemetry such as NetFlow and IPFIX. Many current learning-based detectors still frame intrusion detection as per-flow classification, implicitly treating each flow record as an independent sample. This assumption is often violated in realistic attack campaigns, where evidence is distributed across multiple flows and hosts, spanning minutes to days through staged execution, beaconing, lateral movement, and exfiltration. This paper synthesizes recent research on context-aware deep learning for flow-based intrusion detection. We organize existing methods into a four-dimensional taxonomy covering temporal context, graph or relational context, multimodal context, and multi-resolution context. Beyond modeling, we emphasize rigorous evaluation and operational realism. We review common failure modes that can inflate reported results, including temporal leakage, data splitting, dataset design flaws, limited dataset diversity, and weak cross-dataset generalization. We also analyze practical constraints that shape deployability, such as streaming state management, memory growth, latency budgets, and model compression choices. Overall, the literature suggests that context can meaningfully improve detection when attacks induce measurable temporal or relational structure, but the magnitude and reliability of these gains depend strongly on rigorous, causal evaluation and on datasets that capture realistic diversity.
Authors:Yangfan Deng, Anirudh Nakra, Min Wu
Abstract:
3D content acquisition and creation are expanding rapidly in the new era of machine learning and AI. 3D Gaussian Splatting (3DGS) has become a promising high-fidelity and real-time representation for 3D content. Similar to the initial wave of digital audio-visual content at the turn of the millennium, the demand for intellectual property protection is also increasing, since explicit and editable 3D parameterization makes unauthorized use and dissemination easier. In this position paper, we argue that effective progress in watermarking 3D assets requires articulated security objectives and realistic threat models, incorporating the lessons learned from digital audio-visual asset protection over the past decades. To address this gap in security specification and evaluation, we advocate a scenario-driven formulation, in which adversarial capabilities are formalized through a security model. Based on this formulation, we construct a reference framework that organizes existing methods and clarifies how specific design choices map to corresponding adversarial assumptions. Within this framework, we also examine a legacy spread-spectrum embedding scheme, characterizing its advantages and limitations and highlighting the important trade-offs it entails. Overall, this work aims to foster effective intellectual property protection for 3D assets.
Authors:Abdurrahman Elmaghbub, Bechir Hamdaoui
Abstract:
Deep Learning-based RF fingerprinting approaches struggle to perform well in cross-domain scenarios, particularly during hardware warm-up. This often-overlooked vulnerability has been jeopardizing their reliability and their adoption in practical settings. To address this critical gap, in this work, we first dive deep into the anatomy of RF fingerprints, revealing insights into the temporal fingerprinting variations during and post hardware stabilization. Introducing HEEDFUL, a novel framework harnessing sequential transfer learning and targeted impairment estimation, we then address these challenges with remarkable consistency, eliminating blind spots even during challenging warm-up phases. Our evaluation showcases HEEDFUL's efficacy, achieving remarkable classification accuracies of up to 96% during the initial device operation intervals-far surpassing traditional models. Furthermore, cross-day and cross-protocol assessments confirm HEEDFUL's superiority, achieving and maintaining high accuracy during both the stable and initial warm-up phases when tested on WiFi signals. Additionally, we release WiFi type B and N RF fingerprint datasets that, for the first time, incorporate both the time-domain representation and real hardware impairments of the frames. This underscores the importance of leveraging hardware impairment data, enabling a deeper understanding of fingerprints and facilitating the development of more robust RF fingerprinting solutions.
Authors:Waleed Khan Mohammed, Zahirul Arief Irfan Bin Shahrul Anuar, Mousa Sufian Mousa Mitani, Hezerul Abdul Karim, Nouar AlDahoul
Abstract:
Advanced Persistent Threats (APTs) are among the most challenging cyberattacks to detect. They are carried out by highly skilled attackers who carefully study their targets and operate in a stealthy, long-term manner. Because APTs exhibit "low-and-slow" behavior, traditional statistical methods and shallow machine learning techniques often fail to detect them. Previous research on APT detection has explored machine learning approaches and provenance graph analysis. However, provenance-based methods often fail to capture the semantic intent behind system activities. This paper proposes a novel anomaly detection approach that leverages semantic embeddings generated by Large Language Models (LLMs). The method enhances APT detection by extracting meaningful semantic representations from unstructured system log data. First, raw system logs are transformed into high-dimensional semantic embeddings using a pre-trained transformer model. These embeddings are then analyzed using an Autoencoder (AE) to identify anomalous and potentially malicious patterns. The proposed method is evaluated using the DARPA Transparent Computing (TC) dataset, which contains realistic APT attack scenarios generated by red teams in live environments. Experimental results show that the AE trained on LLM-derived embeddings outperforms widely used unsupervised baseline methods, including Isolation Forest (IForest), One-Class Support Vector Machine (OC-SVM), and Principal Component Analysis (PCA). Performance is measured using the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), where the proposed approach consistently achieves superior results, even in complex threat scenarios. These findings highlight the importance of semantic understanding in detecting non-linear and stealthy attack behaviors that are often missed by conventional detection techniques.
Authors:Haitham S. Al-Sinani, Chris J. Mitchell
Abstract:
Wireless ethical hacking relies heavily on skilled practitioners manually interpreting reconnaissance results and executing complex, time-sensitive sequences of commands to identify vulnerable targets, capture authentication handshakes, and assess password resilience; a process that is inherently labour-intensive, difficult to scale, and prone to subjective judgement and human error. To help address these limitations, we propose WiFiPenTester, an experimental, governed, and reproducible system for GenAI-enabled wireless ethical hacking. The system integrates large language models into the reconnaissance and decision-support phases of wireless security assessment, enabling intelligent target ranking, attack feasibility estimation, and strategy recommendation, while preserving strict human-in-the-loop control and budget-aware execution. We describe the system architecture, threat model, governance mechanisms, and prompt-engineering methodology, and empirical experiments conducted across multiple wireless environments. The results demonstrate that GenAI assistance improves target selection accuracy and overall assessment efficiency, while maintaining auditability and ethical safeguards. This indicates that WiFiPenTester is a meaningful step toward practical, safe, and scalable GenAI-assisted wireless penetration testing, while reinforcing the necessity of bounded autonomy, human oversight, and rigorous governance mechanisms when deploying GenAI in ethical hacking.
Authors:Stefan Schott, Serena Elisa Ponta, Wolfram Fischer, Jonas Klauke, Eric Bodden
Abstract:
Open-source software (OSS) dependencies are a dominant component of modern software code bases. Using proven and well-tested OSS components lets developers reduce development time and cost while improving quality. However, heavy reliance on open-source software also introduces significant security risks, including the incorporation of known vulnerabilities into the codebase. To mitigate these risks, metadata-based dependency scanners, which are lightweight and fast, and code-centric scanners, which enable the detection of modified dependencies hidden from metadata-based approaches, have been developed. In this paper, we present Unshade, a hybrid approach towards dependency scanning in Java that combines the efficiency of metadata-based scanning with the ability to detect modified dependencies of code-centric approaches. Unshade first augments a Java project's software bill of materials (SBOM) by identifying modified and hidden dependencies via a bytecode-based fingerprinting mechanism. This augmented SBOM is then passed to a metadata-based vulnerability scanner to identify known vulnerabilities in both declared and newly revealed dependencies. Leveraging Unshade's high scalability, we conducted a large-scale study of the 1,808 most popular open-source Java Maven projects on GitHub. The results show that nearly 50% of these projects contain at least one modified, hidden dependency associated with a known vulnerability. On average, each affected project includes more than eight such hidden vulnerable dependencies, all missed by traditional metadata-based scanners. Overall, Unshade identified 7,712 unique CVEs in hidden dependencies that would remain undetected when relying on metadata-based scanning alone.
Authors:Jaehee Kim, Pilsung Kang
Abstract:
Modern LLMs are increasingly accessed via black-box APIs, requiring users to transmit sensitive prompts, outputs, and fine-tuning data to external providers, creating a critical privacy risk at the API boundary. We introduce AlienLM, a deployable API-only privacy layer that protects text by translating it into an Alien Language via a vocabulary-scale bijection, enabling lossless recovery on the client side. Using only standard fine-tuning APIs, Alien Adaptation Training (AAT) adapts target models to operate directly on alienized inputs. Across four LLM backbones and seven benchmarks, AlienLM retains over 81\% of plaintext-oracle performance on average, substantially outperforming random-bijection and character-level baselines. Under adversaries with access to model weights, corpus statistics, and learning-based inverse translation, recovery attacks reconstruct fewer than 0.22\% of alienized tokens. Our results demonstrate a practical pathway for privacy-preserving LLM deployment under API-only access, substantially reducing plaintext exposure while maintaining task performance.
Authors:Weizhi Liu, Yue Li, Zhaoxia Yin
Abstract:
Generated speech achieves human-level naturalness but escalates security risks of misuse. However, existing watermarking methods fail to reconcile fidelity with robustness, as they rely either on simple superposition in the noise space or on intrusive alterations to model weights. To bridge this gap, we propose VocBulwark, an additional-parameter injection framework that freezes generative model parameters to preserve perceptual quality. Specifically, we design a Temporal Adapter to deeply entangle watermarks with acoustic attributes, synergizing with a Coarse-to-Fine Gated Extractor to resist advanced attacks. Furthermore, we develop an Accuracy-Guided Optimization Curriculum that dynamically orchestrates gradient flow to resolve the optimization conflict between fidelity and robustness. Comprehensive experiments demonstrate that VocBulwark achieves high-capacity and high-fidelity watermarking, offering robust defense against complex practical scenarios, with resilience to Codec regenerations and variable-length manipulations.
Authors:Amirhossein Taherpour, Xiaodong Wang
Abstract:
Federated learning (FL) enables collaborative model training while preserving data privacy, yet both centralized and decentralized approaches face challenges in scalability, security, and update validation. We propose ZK-HybridFL, a secure decentralized FL framework that integrates a directed acyclic graph (DAG) ledger with dedicated sidechains and zero-knowledge proofs (ZKPs) for privacy-preserving model validation. The framework uses event-driven smart contracts and an oracle-assisted sidechain to verify local model updates without exposing sensitive data. A built-in challenge mechanism efficiently detects adversarial behavior. In experiments on image classification and language modeling tasks, ZK-HybridFL achieves faster convergence, higher accuracy, lower perplexity, and reduced latency compared to Blade-FL and ChainFL. It remains robust against substantial fractions of adversarial and idle nodes, supports sub-second on-chain verification with efficient gas usage, and prevents invalid updates and orphanage-style attacks. This makes ZK-HybridFL a scalable and secure solution for decentralized FL across diverse environments.
Authors:Kahraman Kostas, Rabia Yasa Kostas
Abstract:
This paper critically examines the device identification process using machine learning, addressing common pitfalls in existing literature. We analyze the trade-offs between identification methods (unique vs. class based), data heterogeneity, feature extraction challenges, and evaluation metrics. By highlighting specific errors, such as improper data augmentation and misleading session identifiers, we provide a robust guideline for researchers to enhance the reproducibility and generalizability of IoT security models.
Authors:Jean-Guillaume Dumas, Aude Maignan, Luiza Soezima
Abstract:
Private Set Multi-Party Computations are protocols that allow parties to jointly and securely compute functions: apart from what is deducible from the output of the function, the input sets are kept private. Then, a Private Set Union (PSU), resp. Intersection (PSI), is a protocol that allows parties to jointly compute the union, resp. the intersection, between their private sets. Now a structured PSI, is a PSI where some structure of the sets can allow for more efficient protocols. For instance in Fuzzy PSI, elements only need to be close enough, instead of equal, to be part of the intersection. We present in this paper, Fuzzy PSU protocols (FPSU), able to efficiently take into account approximations in the union. For this, we introduce a new efficient sub-protocol, called Oblivious Key Homomorphic Encryption Retrieval (OKHER), improving on Oblivious Key-Value Retrieval (OKVR) techniques in our setting. In the fuzzy context, the receiver set $X=\{x_i\}_{1..n}$ is replaced by ${\mathcal B}_δ(X)$, the union of $n$ balls of dimension $d$ with radius $δ$, centered at the $x_i$. The sender set is just its $m$ points of dimension $d$. Then the FPSU functionality corresponds to $X \sqcup \{y \in Y, y \notin {\mathcal B}_δ(X)\}$. Thus, we formally define the FPSU functionality and security properties, and propose several protocols tuned to the patterns of the balls using the $l_\infty$ distance. Using our OKHER routine and homomorphic encryption, we are for instance able to obtain a FPSU protocols with an asymptotic communication volume bound ranging from $O(dm\log(δ{n}))$ to $O(d^2m\log(δ^2n))$, depending on the receiver data set structure.
Authors:Satyapriya Krishna, Matteo Memelli, Tong Wang, Abhinav Mohanty, Claire O'Brien Rajkumar, Payal Motwani, Rahul Gupta, Spyros Matsoukas
Abstract:
Amazon published its Frontier Model Safety Framework (FMSF) as part of the Paris AI summit, following which we presented a report on Amazon's Premier model. In this report, we present an evaluation of Nova 2.0 Lite. Nova 2.0 Lite was made generally available from amongst the Nova 2.0 series and is one of its most capable reasoning models. The model processes text, images, and video with a context length of up to 1M tokens, enabling analysis of large codebases, documents, and videos in a single prompt. We present a comprehensive evaluation of Nova 2.0 Lite's critical risk profile under the FMSF. Evaluations target three high-risk domains-Chemical, Biological, Radiological and Nuclear (CBRN), Offensive Cyber Operations, and Automated AI R&D-and combine automated benchmarks, expert red-teaming, and uplift studies to determine whether the model exceeds release thresholds. We summarize our methodology and report core findings. We will continue to enhance our safety evaluation and mitigation pipelines as new risks and capabilities associated with frontier models are identified.
Authors:Anjanava Biswas, Wrick Talukdar
Abstract:
The AI era has ushered in Large Language Models (LLM) to the technological forefront, which has been much of the talk in 2023, and is likely to remain as such for many years to come. LLMs are the AI models that are the power house behind generative AI applications such as ChatGPT. These AI models, fueled by vast amounts of data and computational prowess, have unlocked remarkable capabilities, from human-like text generation to assisting with natural language understanding (NLU) tasks. They have quickly become the foundation upon which countless applications and software services are being built, or at least being augmented with. However, as with any groundbreaking innovations, the rise of LLMs brings forth critical safety, privacy, and ethical concerns. These models are found to have a propensity to leak private information, produce false information, and can be coerced into generating content that can be used for nefarious purposes by bad actors, or even by regular users unknowingly. Implementing safeguards and guardrailing techniques is imperative for applications to ensure that the content generated by LLMs are safe, secure, and ethical. Thus, frameworks to deploy mechanisms that prevent misuse of these models via application implementations is imperative. In this study, wepropose a Flexible Adaptive Sequencing mechanism with trust and safety modules, that can be used to implement safety guardrails for the development and deployment of LLMs.
Authors:Advije Rizvani, Giovanni Apruzzese, Pavel Laskov
Abstract:
Large Language Models (LLMs) are increasingly adopted in the financial domain. Their exceptional capabilities to analyse textual data make them well-suited for inferring the sentiment of finance-related news. Such feedback can be leveraged by algorithmic trading systems (ATS) to guide buy/sell decisions. However, this practice bears the risk that a threat actor may craft "adversarial news" intended to mislead an LLM. In particular, the news headline may include "malicious" content that remains invisible to human readers but which is still ingested by the LLM. Although prior work has studied textual adversarial examples, their system-wide impact on LLM-supported ATS has not yet been quantified in terms of monetary risk. To address this threat, we consider an adversary with no direct access to an ATS but able to alter stock-related news headlines on a single day. We evaluate two human-imperceptible manipulations in a financial context: Unicode homoglyph substitutions that misroute models during stock-name recognition, and hidden-text clauses that alter the sentiment of the news headline. We implement a realistic ATS in Backtrader that fuses an LSTM-based price forecast with LLM-derived sentiment (FinBERT, FinGPT, FinLLaMA, and six general-purpose LLMs), and quantify monetary impact using portfolio metrics. Experiments on real-world data show that manipulating a one-day attack over 14 months can reliably mislead LLMs and reduce annual returns by up to 17.7 percentage points. To assess real-world feasibility, we analyze popular scraping libraries and trading platforms and survey 27 FinTech practitioners, confirming our hypotheses. We notified trading platform owners of this security issue.
Authors:Hao Lyu, Jingzheng Wu, Xiang Ling, Yicheng Zhong, Zhiyuan Li, Tianyue Luo
Abstract:
The Instruction Set Architecture (ISA) defines processor operations and serves as the interface between hardware and software. As an open ISA, RISC-V lowers the barriers to processor design and encourages widespread adoption, but also exposes processors to security risks such as functional bugs. Processor fuzzing is a powerful technique for automatically detecting these bugs. However, existing fuzzing methods suffer from two main limitations. First, their emphasis on redundant test case generation causes them to overlook cross-processor corner cases. Second, they rely too heavily on coverage guidance. Current coverage metrics are biased and inefficient, and become ineffective once coverage growth plateaus. To overcome these limitations, we propose SimFuzz, a fuzzing framework that constructs a high-quality seed corpus from historical bug-triggering inputs and employs similarity-guided, block-level mutation to efficiently explore the processor input space. By introducing instruction similarity, SimFuzz expands the input space around seeds while preserving control-flow structure, enabling deeper exploration without relying on coverage feedback. We evaluate SimFuzz on three widely used open-source RISC-V processors: Rocket, BOOM, and XiangShan, and discover 17 bugs in total, including 14 previously unknown issues, 7 of which have been assigned CVE identifiers. These bugs affect the decode and memory units, cause instruction and data errors, and can lead to kernel instability or system crashes. Experimental results show that SimFuzz achieves up to 73.22% multiplexer coverage on the high-quality seed corpus. Our findings highlight critical security bugs in mainstream RISC-V processors and offer actionable insights for improving functional verification.
Authors:Shuai Zhang, Minzhao Lyu, Hassan Habibi Gharakheili
Abstract:
Modern digital ecosystems, spanning software, hardware, learning models, datasets, and cryptographic products, continue to grow in complexity, making it difficult for organizations to understand and manage component dependencies. Bills of Materials (BOMs) have emerged as a structured way to document product components, their interrelationships, and key metadata, improving visibility and security across digital supply chains. This survey provides the first comprehensive cross-domain review of BOM developments and practices. We start by examining the evolution of BOM frameworks in three stages (i.e., pre-development, initial, and accelerated) and summarizing their core principles, key stakeholders, and standardization efforts for hardware, software, artificial intelligence (AI) models, datasets, and cryptographic assets. We then review industry practices for generating BOM data, evaluating its quality, and securely sharing it. Next, we review practical downstream uses of BOM data, including dependency modeling, compliance verification, operational risk assessment, and vulnerability tracking. We also discuss academic efforts to address limitations in current BOM frameworks through refinements, extensions, or new models tailored to emerging domains such as data ecosystems and AI supply chains. Finally, we identify four key gaps that limit the usability and reliability of today's BOM frameworks, motivating future research directions.
Authors:Seong-Gyu Park, Sohee Park, Jisu Lee, Hyunsik Na, Daeseon Choi
Abstract:
Recent LLMs increasingly integrate reasoning mechanisms like Chain-of-Thought (CoT). However, this explicit reasoning exposes a new attack surface for inference-time backdoors, which inject malicious reasoning paths without altering model parameters. Because these attacks generate linguistically coherent paths, they effectively evade conventional detection. To address this, we propose STAR (State-Transition Amplification Ratio), a framework that detects backdoors by analyzing output probability shifts. STAR exploits the statistical discrepancy where a malicious input-induced path exhibits high posterior probability despite a low prior probability in the model's general knowledge. We quantify this state-transition amplification and employ the CUSUM algorithm to detect persistent anomalies. Experiments across diverse models (8B-70B) and five benchmark datasets demonstrate that STAR exhibits robust generalization capabilities, consistently achieving near-perfect performance (AUROC $\approx$ 1.0) with approximately $42\times$ greater efficiency than existing baselines. Furthermore, the framework proves robust against adaptive attacks attempting to bypass detection.
Authors:Keyang Zhang, Zeyu Chen, Xuan Feng, Dongliang Fang, Yaowen Zheng, Zhi Li, Limin Sun
Abstract:
The security of scripting languages such as PowerShell is critical given their powerful automation and administration capabilities, often exercised with elevated privileges. Today, securing these languages still demands substantial human effort to craft and enforce rules, imposing heavy burdens on typical administrators and creating critical production risks (e.g., misoperations that shut down servers).Large language models (LLMs) have demonstrated strong capabilities in code generation, vulnerability detection, and automated repair for languages like Python and JavaScript. However, their ability to assist with generating secure scripting-language code remains largely underexplored. In this paper, we present SecGenEval-PS, a benchmark designed to systematically evaluate LLMs on secure scripting generation, security analysis, and automated repair. Our results show that both proprietary and open-source models fall short in these areas. For instance, over 60% of PowerShell scripts produced by GPT-4o and o3-mini are insecure without structured guidance.To bridge this gap, we propose PSSec, a framework that combines data synthesis with fine-tuning to enhance model security capabilities. We develop a self-debugging agent that integrates static analyzers with the reasoning abilities of advanced LLMs to synthesize large-scale structured triplets of insecure scripts, violation analyses, and corresponding repairs. We then fine-tune lightweight LLMs (as small as 1.7B parameters) using supervised fine-tuning (SFT) and reinforcement learning (RL), enabling security-aware reasoning and the generation of secure PowerShell code.Across multiple LLM families, including GPT and Qwen, \textit{PSSec}-trained models match or surpass general-purpose large models on PowerShell security tasks while reducing inference cost by more than an order of magnitude.
Authors:Vijayanta Jain, Sepideh Ghanavati, Sai Teja Peddinti, Collin McMillan
Abstract:
Privacy captions are short sentences that succinctly describe what personal information is used, how it is used, and why, within an app. These captions can be utilized in various notice formats, such as privacy policies, app rationales, and app store descriptions. However, inaccurate captions may mislead users and expose developers to regulatory fines. Existing approaches to generating privacy notices or just privacy captions include using questionnaires, templates, static analysis, or machine learning. However, these approaches either rely heavily on developers' inputs and thus strain their efforts, use limited source code context, leading to the incomplete capture of app privacy behaviors, or depend on potentially inaccurate privacy policies as a source for creating notices. In this work, we address these limitations by developing Privacy Caption Generator (PCapGen), an approach that - i) automatically identifies and extracts large and precise source code context that implements privacy behaviors in an app, ii) uses a Large Language Model (LLM) to describe coarse- and fine-grained privacy behaviors, and iii) generates accurate, concise, and complete privacy captions to describe the privacy behaviors of the app. Our evaluation shows PCapGen generates concise, complete, and accurate privacy captions as compared to the baseline approach. Furthermore, privacy experts choose PCapGen captions at least 71\% of the time, whereas LLMs-as-judge prefer PCapGen captions at least 76\% of the time, indicating strong performance of our approach.
Authors:Dinesh Srivasthav P, Ashok Urlana, Rahul Mishra, Bala Mallikarjunarao Garlapati, Ponnurangam Kumaraguru
Abstract:
Machine unlearning aims to selectively remove the influence of specific training samples to satisfy privacy regulations such as the GDPR's 'Right to be Forgotten'. However, many existing methods require access to the data being removed, exposing it to membership inference attacks and potential misuse of Personally Identifiable Information (PII). We address this critical challenge by proposing Shadow Unlearning, a novel paradigm of approximate unlearning, that performs machine unlearning on anonymized forget data without exposing PII. We further propose a novel privacy-preserving framework, Neuro-Semantic Projector Unlearning (NSPU) to achieve Shadow unlearning. To evaluate our method, we compile Multi-domain Fictitious Unlearning (MuFU) forget set across five diverse domains and introduce an evaluation stack to quantify the trade-off between knowledge retention and unlearning effectiveness. Experimental results on various LLMs show that NSPU achieves superior unlearning performance, preserves model utility, and enhances user privacy. Additionally, the proposed approach is at least 10 times more computationally efficient than standard unlearning approaches. Our findings foster a new direction for privacy-aware machine unlearning that balances data protection and model fidelity.
Authors:Zhixin Liu, Xuanlin Liu, Sihan Xu, Yaqiong Qiao, Ying Zhang, Xiangrui Cai
Abstract:
Existing backdoor attacks on multivariate time series (MTS) forecasting enforce strict temporal and dimensional coupling between triggers and target patterns, requiring synchronous activation at fixed positions across variables. However, realistic scenarios often demand delayed and variable-specific activation. We identify this critical unmet need and propose TDBA, a temporally decoupled backdoor attack framework for MTS forecasting. By injecting triggers that encode the expected location of the target pattern, TDBA enables the activation of the target pattern at any positions within the forecasted data, with the activation position flexibly varying across different variable dimensions. TDBA introduces two core modules: (1) a position-guided trigger generation mechanism that leverages smoothed Gaussian priors to generate triggers that are position-related to the predefined target pattern; and (2) a position-aware optimization module that assigns soft weights based on trigger completeness, pattern coverage, and temporal offset, facilitating targeted and stealthy attack optimization. Extensive experiments on real-world datasets show that TDBA consistently outperforms existing baselines in effectiveness while maintaining good stealthiness. Ablation studies confirm the controllability and robustness of its design.
Authors:Stanly Wilson, Kwabena Adu-Duodu, Yinhao Li, Ellis Solaiman, Omer Rana, Rajiv Ranjan
Abstract:
Trust between entities in any scenario without a trusted third party is very difficult, and trust is exactly what blockchain aims to bring into the digital world with its basic features. Many applications are moving to blockchain adoption, enabling users to work in a trustworthy manner. The early generations of blockchain have a problem; they cannot share information with other blockchains. As more and more entities move their applications to the blockchain, they generate large volumes of data, and as applications have become more complex, sharing information between different blockchains has become a necessity. This has led to the research and development of interoperable solutions allowing blockchains to connect together. This paper discusses a few blockchain platforms that provide interoperable solutions, emphasising their ability to connect heterogeneous blockchains. It also discusses a case study scenario to illustrate the importance and benefits of using interoperable solutions. We also present a few topics that need to be solved in the realm of interoperability.
Authors:Taufiq Islam Protick, Sai Teja Peddinti, Nina Taft, Anupam Das
Abstract:
Being able to understand the security and privacy (S&P) concerns of IoT users brings benefits to both developers and users. To learn about users' views, we examine Amazon IoT reviews - one of the biggest IoT markets. This work presents a state-of-the-art methodology to identify and categorize reviews in which users express S&P concerns. We developed an automated pipeline by fine-tuning GPT-3.5-Turbo to build two models: the Classifier-Rationalizer-Categorizer and the Thematic Mapper. By leveraging dynamic few-shot prompting and the model's large context size, our pipeline achieved over 97% precision and recall, significantly outperforming keyword-based and classical ML methods. We applied our pipeline to 91K Amazon reviews about fitness trackers, smart speakers and cameras, over multiple years. We found that on average 5% contained S&P concerns, while security camera exhibited the highest prevalence at 10%. Our method detected significantly more S&P-relevant reviews than prior works: 15x more for fitness trackers, 29% more for smart speakers, and 70% more for cameras. Our longitudinal analysis reveals that concerns like surveillance and data control have persisted for years, suggesting limited industry progress. We demonstrate that across all device types, users consistently demand more precise control over what data is collected and shared. We uncover challenges in multi-user and multi-device interactions, identifying two previously unreported themes concerning inadequate controls for account separation and data access. These findings, ranging from broad persistent trends to specific instances of customer loss, offer actionable insights for developers to improve user satisfaction and trust.
Authors:Sarisht Wadhwa, Aviv Yaish, Fan Zhang, Kartik Nayak
Abstract:
Modern blockchains increasingly rely on parallel execution to improve throughput. We show several industry and academic transaction fee mechanisms (TFMs) struggle to simultaneously account for execution parallelism while remaining performant and fair. First, if parallelism affects fees, adversarial protocol manipulations that offset possible benefits to throughput by introducing fake transactions become rational: users can insert functionally useless parallel transactions solely to reduce fees, and schedulers can create useless sequential transactions to increase revenue. Execution contingency, a core feature of expressive programming languages, both exacerbates the aforementioned threats and introduces new ones: (1) users may overpay for unused resources, and (2) scheduler revenue is harmed when reserved scheduling slots go unused due to contingency. We introduce a framework for this challenging setting, and prove an impossibility, highlighting an inherent tension: both parallelism and contingency involve a trade-off between minimizing risks for users and schedulers, as favoring one comes at the expense of the other. To complete the picture, we introduce a fee mechanisms and prove that they achieve the boundaries of this trade-off. Our results provide rigorous foundations for evaluating designs advanced by notable blockchains, such as Sui and Monad.
Authors:Yunhao Yao, Zhiqiang Wang, Ruiqi Li, Haoran Cheng, Puhan Luo, Xiangyang Li
Abstract:
As the Internet of Things (IoT) becomes deeply embedded in daily life, users are increasingly concerned about privacy leakage, especially from video data. Since frame-by-frame protection in large-scale video analytics (e.g., smart communities) introduces significant latency, a more efficient solution is to selectively protect frames containing privacy objects (e.g., faces). Existing object detectors require fully decoded videos or per-frame processing in compressed videos, leading to decoding overhead or reduced accuracy. Therefore, we propose ComPrivDet, an efficient method for detecting privacy objects in compressed video by reusing I-frame inference results. By identifying the presence of new objects through compressed-domain cues, ComPrivDet either skips P- and B-frame detections or efficiently refines them with a lightweight detector. ComPrivDet maintains 99.75% accuracy in private face detection and 96.83% in private license plate detection while skipping over 80% of inferences. It averages 9.84% higher accuracy with 75.95% lower latency than existing compressed-domain detection methods.
Authors:Zhaoting Gong, Ran Ran, Fan Yao, Wujie Wen
Abstract:
Fully Homomorphic Encryption (FHE) enables privacy-preserving Transformer inference, but long-sequence encrypted Transformers quickly exceed single-GPU memory capacity because encoded weights are already large and encrypted activations grow rapidly with sequence length. Multi-GPU execution therefore becomes unavoidable, yet scaling remains challenging because communication is jointly induced by application-level aggregation and encryption-level RNS coupling. Existing approaches either synchronize between devices frequently or replicate encrypted tensors across devices, leading to excessive communication and latency. We present AEGIS, an Application-Encryption Guided Inference System for scalable long-sequence encrypted Transformer inference on multi-GPU platforms. AEGIS derives device placement from ciphertext dependencies jointly induced by Transformer dataflow and CKKS polynomial coupling, co-locating modulus-coherent and token-coherent data so that communication is introduced only when application dependencies require it, while reordering polynomial operators to overlap the remaining collectives with computation. On 2048-token inputs, AEGIS reduces inter-GPU communication by up to 57.9% in feed-forward networks and 81.3% in self-attention versus prior state-of-the-art designs. On four GPUs, it achieves up to 96.62% scaling efficiency, 3.86x end-to-end speedup, and 69.1% per-device memory reduction. These results establish coordinated application-encryption parallelism as a practical foundation for scalable homomorphic Transformer inference.
Authors:Noor Khalal, Chakib Fettal, Lazhar Labiod, Mohamed Nadif
Abstract:
Machine-learning-based code vulnerability detection (CVD) has progressed rapidly, from deep program representations to pretrained code models and LLM-centered pipelines. Yet dependable vulnerability labeling remains expensive, noisy, and uneven across projects, languages, and CWE types, motivating approaches that reduce reliance on human labeling. This survey maps these approaches, synthesizing five paradigm families and the mechanisms they use. It connects mechanisms to token, graph, hybrid, and knowledgebased representations, and consolidates evaluation and reporting axes that limit comparison (label-budget specification, compute/cost assumptions, leakage, and granularity mismatches). A Design Map and constraintfirst Decision Guide distill trade-offs and failure modes for practical method selection.
Authors:Ioannis Karyotakis, Foivos Timotheos Proestakis, Evangelos Talos, Diomidis Spinellis, Nikolaos Alexopoulos
Abstract:
Mobile messaging apps are a fundamental communication infrastructure, used by billions of people every day to share information, including sensitive data. Security and Privacy are thus critical concerns for such applications. Although the cryptographic protocols prevalent in messaging apps are generally well studied, other relevant implementation characteristics of such apps, such as their software architecture, permission use, and network-related runtime behavior, have not received enough attention. In this paper, we present a methodology for comparing implementation characteristics of messaging applications by employing static and dynamic analysis under reproducible scenarios to identify discrepancies with potential security and privacy implications. We apply this methodology to study the Android clients of the Meta Messenger, Signal, and Telegram apps. Our main findings reveal discrepancies in application complexity, attack surface, and network behavior. Statically, Messenger presents the largest attack surface and the highest number of static analysis warnings, while Telegram requests the most dangerous permissions. In contrast, Signal consistently demonstrates a minimalist design with the fewest dependencies and dangerous permissions. Dynamically, these differences are reflected in network activity; Messenger is by far the most active, exhibiting persistent background communication, whereas Signal is the least active. Furthermore, our analysis shows that all applications properly adhere to the Android permission model, with no evidence of unauthorized data access.
Authors:Ryan Babbush, Adam Zalcman, Craig Gidney, Michael Broughton, Tanuj Khattar, Hartmut Neven, Thiago Bergamaschi, Justin Drake, Dan Boneh
Abstract:
This whitepaper seeks to elucidate implications that the capabilities of developing quantum architectures have on blockchain vulnerabilities and mitigation strategies. First, we provide new resource estimates for breaking the 256-bit Elliptic Curve Discrete Logarithm Problem, the core of modern blockchain cryptography. We demonstrate that Shor's algorithm for this problem can execute with either <1200 logical qubits and <90 million Toffoli gates or <1450 logical qubits and <70 million Toffoli gates. In the interest of responsible disclosure, we use a zero-knowledge proof to validate these results without disclosing attack vectors. On superconducting architectures with 1e-3 physical error rates and planar connectivity, those circuits can execute in minutes using fewer than half a million physical qubits. We introduce a critical distinction between fast-clock (such as superconducting and photonic) and slow-clock (such as neutral atom and ion trap) architectures. Our analysis reveals that the first fast-clock CRQCs would enable on-spend attacks on public mempool transactions of some cryptocurrencies. We survey major cryptocurrency vulnerabilities through this lens, identifying systemic risks associated with advanced features in some blockchains such as smart contracts, Proof-of-Stake consensus, and Data Availability Sampling, as well as the enduring concern of abandoned assets. We argue that technical solutions would benefit from accompanying public policy and discuss various frameworks of digital salvage to regulate the recovery or destruction of dormant assets while preventing adversarial seizure. We also discuss implications for other digital assets and tokenization as well as challenges and successful examples of the ongoing transition to Post-Quantum Cryptography (PQC). Finally, we urge all vulnerable cryptocurrency communities to join the ongoing migration to PQC without delay.
Authors:Di Wu, Yuman Bai, Shoupeng Ren, Xinyu Zhang, Yiyue Cao, Xuechao Wang, Wu Wen, Jian Liu
Abstract:
Centralized stablecoins such as USDT and USDC enforce financial sanctions through contract-layer blacklist functions, yet on public blockchains a freeze is merely an ordinary transaction that must compete for execution priority. We identify a fundamental gap between contract-layer authority and consensus-layer enforcement: when a sanctioned entity's transfer and the issuer's freeze race for inclusion in the same block, the outcome is determined not by regulatory mandate but by the economically motivated ordering decisions of block producers. We term the resulting value extraction Sanction-Evasion MEV (SE-MEV). To quantify this vulnerability, we construct the first comprehensive dataset of on-chain sanctions enforcement and evasion for Ethereum-based USDC and USDT (Nov 2017-Aug 2025), covering over $1.5 billion in frozen assets. We find that 7.3% of sanctioned USDT addresses and 18.7% of sanctioned USDC addresses were drained to zero balances before enforcement took effect, and document a clear escalation trajectory-from issuer-side out-of-gas failures, to public gas auctions, to private order flow, to direct proposer bribery. We further develop a game-theoretic model that yields three results: (i) compliant issuers cannot rationally stay outside the MEV market; (ii) fixed participation costs concentrate evasion among specialized, MEV-aware actors; and (iii) the implicit MEV tax extracted by block proposers grows without bound as regulatory penalties intensify, creating structural incentives for issuers to vertically integrate into block-building infrastructure. Our findings demonstrate that on any blockchain where ordering power is allocated by economic incentives, ordering power is sanctioning power-and contract-level authority alone cannot guarantee enforcement.
Authors:Xinyuan Zhu, Zekun Fei, Enye Wang, Ruiqi He, Zheli Liu
Abstract:
Retrieval-Augmented Generation (RAG) enhances the utility of Large Language Models (LLMs) by retrieving external documents. Since the knowledge databases in RAG are predominantly utilized via cloud services, private data in sensitive domains such as finance and healthcare faces the risk of personal information leakage. Thus, effectively anonymizing knowledge bases is crucial for privacy preservation. Existing studies equate the privacy risk of text to the linear superposition of the privacy risks of individual, isolated sensitive entities. The "one-size-fits-all" full processing of all sensitive entities severely degrades utility of LLM. To address this issue, we introduce a dynamic anonymization framework named TRIP-RAG. Based on context-aware entity quantification, this framework evaluates entities from the perspectives of marginal privacy risk, knowledge divergence, and topical relevance. It identifies highly sensitive entities while trading off utility, providing a feasible approach for variable-intensity privacy protection scenarios. Our theoretical analysis and experiments indicate that TRIP-RAG can effectively reduce context inference risks. Extensive experimental results demonstrate that, while maintaining privacy protection comparable to full anonymization, TRIP-RAG's Recall@k decreases by less than 35% compared to the original data, and the generation quality improves by up to 56% over existing baselines.
Authors:Penghui Liu, Yi Niu, Xiaoxiong Zhong, Jiahui Wu, Weizhe Zhang, Kaiping Xue, Bin Xiao
Abstract:
With the rapid evolution of the Industrial Internet of Things (IIoT), the boundaries and scale of the Internet are continuously expanding. Consequently, the limitations of traditional certificate-based Public Key Infrastructure (PKI) have become increasingly evident, particularly in scenarios requiring large-scale certificate storage, verification, and frequent transmission. These challenges are expected to be further amplified by the widespread adoption of post-quantum cryptography. In this paper, we propose a novel identity-based public key management framework for PKI based on post-quantum cryptography, termed \textit{IPK-pq}. This approach implements an identity key generation protocol leveraging NIST ML-DSA and random matrix theory. Building on the concept of the Composite Public Key (CPK), \textit{IPK-pq} addresses the linear collusion problem inherent in CPK through an enhanced identity mapping mechanism. Furthermore, it simplifies the verification of the declared public key's authenticity, effectively reducing the complexity associated with certificate-based key management. We also provide a formal security proof for \textit{IPK-pq}, covering both individual private key components and the composite private key. To validate our approach, formally, we directly implement and evaluate \textit{IPK-pq} within a typical PKI application scenario: Resource PKI (RPKI). Comparative experimental results demonstrate that an RPKI system based on \textit{IPK-pq} yields significant improvements in efficiency and scalability. These results validate the feasibility and rationality of \textit{IPK-pq}, positioning it as a strong candidate for next-generation RPKI systems capable of securely managing large-scale routing information.
Authors:Joseph G. Zalameda, Megan A. Witherow, Alexander M. Glandon, Jose Aguilera, Khan M. Iftekharuddin
Abstract:
Machine learning models trained on small data sets for security applications are especially vulnerable to adversarial attacks. Person identification from LiDAR based skeleton data requires time consuming and expensive data acquisition for each subject identity. Recently, Assessment and Augmented Identity Recognition for Skeletons (AAIRS) has been used to train Hierarchical Co-occurrence Networks for Person Identification (HCN-ID) with small LiDAR based skeleton data sets. However, AAIRS does not evaluate robustness of HCN-ID to adversarial attacks or inoculate the model to defend against such attacks. Popular perturbation-based approaches to generating adversarial attacks are constrained to targeted perturbations added to real training samples, which is not ideal for inoculating models with small training sets. Thus, we propose Attack-AAIRS, a novel addition to the AAIRS framework. Attack-AAIRS leverages a small real data set and a GAN generated synthetic data set to assess and improve model robustness against unseen adversarial attacks. Rather than being constrained to perturbations of limited real training samples, the GAN learns the distribution of adversarial attack samples that exploit weaknesses in HCN-ID. Attack samples drawn from this distribution augment training for inoculation of the HCN-ID to improve robustness. Ten-fold cross validation of Attack-AAIRS yields increased robustness to unseen attacks- including FGSM, PGD, Additive Gaussian Noise, MI-FGSM, and BIM. The HCN-ID Synthetic Data Quality Score for Attack-AAIRS indicates that generated attack samples are of similar quality to the original benign synthetic samples generated by AAIRS. Furthermore, inoculated models show consistent final test accuracy with the original model trained on real data, demonstrating that our method improves robustness to adversarial attacks without reducing test performance on real data.
Authors:Michele Battagliola, Anna-Lena Horlemann, Abhinaba Mazumder, Rocco Mora, Paolo Santini, Michael Schaller, Violetta Weger
Abstract:
Given two linear codes, the Linear Equivalence Problem (LEP) asks to find (if it exists) a linear isometry between them; as a special case, we have the Permutation Equivalence Problem (PEP), in which isometries must be permutations. LEP and PEP have recently gained renewed interest as the security foundations for several post-quantum schemes, including LESS. A recent paper has introduced the use of the Schur product to solve PEP, identifying many new easy-to-solve instances. In this paper, we extend this result to LEP. In particular, we generalize the approach and rely on the more general notion of power codes. Combining it with Frobenius automorphisms and Hermitian hulls, we identify many classes of easy LEP instances. To the best of our knowledge, this is the first work exploiting algebraic weaknesses for LEP. Finally we show an improved reduction to PEP whenever the coefficients of the monomial matrix are in a subgroup of the multiplicative group of the finite field.
Authors:Krzysztof Kotowski, Ramez Shendy, Jakub Nalepa, Agata Kaczmarek, Dawid Płudowski, Piotr Wilczyński, Artur Janicki, Przemysław Biecek, Ambros Marzetta, Atul Pande, Lalit Chandra Routhu, Swapnil Srivastava, Evridiki Ntagiou
Abstract:
Forecasting plays a crucial role in modern safety-critical applications, such as space operations. However, the increasing use of deep forecasting models introduces a new security risk of trojan horse attacks, carried out by hiding a backdoor in the training data or directly in the model weights. Once implanted, the backdoor is activated by a specific trigger pattern at test time, causing the model to produce manipulated predictions. We focus on this issue in our \textit{Trojan Horse Hunt} data science competition, where more than 200 teams faced the task of identifying triggers hidden in deep forecasting models for spacecraft telemetry. We describe the novel task formulation, benchmark set, evaluation protocol, and best solutions from the competition. We further summarize key insights and research directions for effective identification of triggers in time series forecasting models. All materials are publicly available on the official competition webpage https://www.kaggle.com/competitions/trojan-horse-hunt-in-space.
Authors:Mahyar Karimi, K. S. Thejaswini, Roderick Bloem, Thomas A. Henzinger
Abstract:
In traditional runtime verification, a system is typically observed by a monolithic monitor. Enforcing privacy in such settings is computationally expensive, as it necessitates heavy cryptographic primitives. Therefore, privacy-preserving monitoring remains impractical for real-time applications. In this work, we address this scalability challenge by distributing the monitor across multiple parties -- at least one of which is honest. This architecture enables the use of efficient secret-sharing schemes instead of computationally intensive cryptography, dramatically reducing over-head while maintaining strong privacy guarantees. While existing secret-sharing approaches are typically limited to one-shot executions which do not maintain an internal state, we introduce a protocol tailored for continuous monitoring that supports repeated evaluations over an evolving internal state (kept secret from the system and the monitoring entities). We implement our approach using the MP-SPDZ framework. Our experiments demonstrate that, under these architectural assumptions, our protocol is significantly more scalable than existing alternatives.
Authors:Yijia Guo, Junqing Zhang, Yao-Win Peter Hong
Abstract:
Wireless networks are highly vulnerable to spoofing attacks, especially when attackers transmit consecutive spoofing packets. Conventional physical layer authentication (PLA) methods have mostly focused on single-packet spoofing attack. However, under consecutive spoofing attacks, they become ineffective due to channel evolution caused by device mobility and channel fading. To address this challenge, we propose a channel prediction-based PLA framework. Specifically, a Transformer-based channel prediction module is employed to predict legitimate CSI measurements during spoofing interval, and the input of channel prediction module is adaptively updated with predicted or observed CSI measurements based on the authentication decision to ensure robustness against sustained spoofing. Simulation results under Rayleigh fading channels demonstrate that the proposed approach achieves low prediction error and significantly higher authentication accuracy than conventional benchmark, maintaining robustness even under extended spoofing attacks.
Authors:Raphael Simon, José Carrasquel, Wim Mees, Pieter Libin
Abstract:
Penetration testing, the practice of simulating cyberattacks to identify vulnerabilities, is a complex sequential decision-making task that is inherently partially observable and features large action spaces. Training reinforcement learning (RL) policies for this domain faces a fundamental bottleneck: existing simulators are too slow to train on realistic network scenarios at scale, resulting in policies that fail to generalize. We present NASimJax, a complete JAX-based reimplementation of the Network Attack Simulator (NASim), achieving up to 100x higher environment throughput than the original simulator. By running the entire training pipeline on hardware accelerators, NASimJax enables experimentation on larger networks under fixed compute budgets that were previously infeasible. We formulate automated penetration testing as a Contextual POMDP and introduce a network generation pipeline that produces structurally diverse and guaranteed-solvable scenarios. Together, these provide a principled basis for studying zero-shot policy generalization. We use the framework to investigate action-space scaling and generalization across networks of up to 40 hosts. We find that Prioritized Level Replay better handles dense training distributions than Domain Randomization, particularly at larger scales, and that training on sparser topologies yields an implicit curriculum that improves out-of-distribution generalization, even on topologies denser than those seen during training. To handle linearly growing action spaces, we propose a two-stage action decomposition (2SAS) that substantially outperforms flat action masking at scale. Finally, we identify a failure mode arising from the interaction between Prioritized Level Replay's episode-reset behaviour and 2SAS's credit assignment structure. NASimJax thus provides a fast, flexible, and realistic platform for advancing RL-based penetration testing.
Authors:Peipei Xie, Siwei Chen, Zejun Xiang, Shasha Zhang, Xiangyong Zeng
Abstract:
At SAC 2013, Berger et al. first proposed the Extended Generalized Feistel Networks (EGFN) structure for the design of block ciphers with efficient diffusion. Later, based on the Type-2 EGFN, they instantiated a new lightweight block cipher named Lilliput (published in IEEE Transactions on Computers, Vol. 65, Issue 7, 2016). According to published cryptanalysis results, Lilliput is sufficiently secure against theoretical attacks such as differential, linear, boomerang, and integral attacks, which rely on the statistical properties of plaintext and ciphertext. However, there is a lack of analysis regarding its resistance to physical attacks in real-world scenarios, such as fault attacks. In this paper, we present the first systematic differential fault analysis (DFA) of Lilliput under three nibble-oriented fault models with progressively relaxed adversarial assumptions to comprehensively assess its fault resilience. In Model I (multi-round fixed-location), precise fault injections at specific rounds recover the master key with a 98% success rate using only 8 faults. Model II (single-round fixed-location) relaxes the multi-round requirement, demonstrating that 8 faults confined to a single round are still sufficient to achieve a 99% success rate by exploiting Lilliput's diffusion properties and DDT-based constraints. Model III (single-round random-location) further weakens the assumption by allowing faults to occur randomly among the eight rightmost branches of round 27. By uniquely identifying the fault location from ciphertext differences with high probability, the attack remains highly feasible, achieving over 99% success with 33 faults and exceeding 99.5% with 36 faults. Our findings reveal a significant vulnerability of Lilliput to practical fault attacks across different adversary capabilities in real-world scenarios, providing crucial insights for its secure implementation.
Authors:Toan Tran, Olivera Kotevska, Li Xiong
Abstract:
Membership inference attacks (MIAs), which enable adversaries to determine whether specific data points were part of a model's training dataset, have emerged as an important framework to understand, assess, and quantify the potential information leakage associated with machine learning systems. Designing effective MIAs is a challenging task that usually requires extensive manual exploration of model behaviors to identify potential vulnerabilities. In this paper, we introduce AutoMIA -- a novel framework that leverages large language model (LLM) agents to automate the design and implementation of new MIA signal computations. By utilizing LLM agents, we can systematically explore a vast space of potential attack strategies, enabling the discovery of novel strategies. Our experiments demonstrate AutoMIA can successfully discover new MIAs that are specifically tailored to user-configured target model and dataset, resulting in improvements of up to 0.18 in absolute AUC over existing MIAs. This work provides the first demonstration that LLM agents can serve as an effective and scalable paradigm for designing and implementing MIAs with SOTA performance, opening up new avenues for future exploration.
Authors:Yue Zhao, Yujia Gong, Ruigang Liang, Shenchen Zhu, Kai Chen, Xuejing Yuan, Wangjun Zhang
Abstract:
The widespread deployment of large language models (LLMs) calls for post-hoc methods that can flexibly adapt models to evolving safety requirements. Meanwhile, the rapidly expanding open-source LLM ecosystem has produced a diverse collection of models that already exhibit various safety-related functionalities. This motivates a shift from constructing safety functionality from scratch to reusing existing functionality from external models, thereby avoiding costly data collection and training procedures. In this paper, we present Cross-Model Neuron Transfer (CNT), a post-hoc method that reuses safety-oriented functionality by transferring a minimal subset of neurons from an open-source donor LLM to a target LLM. By operating at the neuron level, CNT enables modular function-level adaptation, supporting both function addition andfunction deletion. We evaluate CNT on seven popular LLMs across three representative applications: safety disalignment, alignment enhancement, and bias removal. Experimental results show that CNT achieves targeted safety-oriented functionality transfer with minimal performance degradation (less than 1% for most models), consistently outperforming five baselines, demonstrating its generality and practical effectiveness.
Authors:Lingyun Zhang, Yu Xie, Ping Chen
Abstract:
The nature of personalized text-to-image models poses a unique safety challenge that generic context-blind methods are ill-equipped to handle. Such global filters create a dilemma: to prevent misuse, they are forced to damage the model's broader utility by erasing concepts entirely, causing unacceptable collateral damage.Our work presents a more precisely targeted approach, built on the principle that security should be as context-aware as the threat itself, intrinsically bound to the personalized concept. We present IDENTITYGUARD, which realizes this principle through a conditional restriction that blocks harmful content only when combined with the personalized identity, and a concept-specific watermark for precise traceability. Experiments show our approach prevents misuse while preserving the model's utility and enabling robust traceability. By moving beyond blunt, global filters, our work demonstrates a more effective and responsible path toward AI safety.
Authors:Arit Kumar Bishwas, Mousumi Sen, Albert Nieto-Morales, Joel Jacob Varghese
Abstract:
As agentic artificial intelligence systems scale across globally distributed and long lived infrastructures, secure and policy compliant communication becomes a fundamental systems challenge. This challenge grows more serious in the quantum era, where the cryptographic assumptions built into today's AI deployments may not remain valid over their operational lifetime. Here, we introduce quantum secure by construction, or QSC, as a design paradigm that treats quantum secure communication as a core architectural property of agentic AI systems rather than an upgrade added later. We realize QSC through a runtime adaptive security model that combines post quantum cryptography, quantum random number generation, and quantum key distribution to secure interactions among autonomous agents operating across heterogeneous cloud, edge, and inter organizational environments. The approach is cryptographically pluggable and guided by policy, allowing the system to adjust its security posture according to infrastructure availability, regulatory constraints, and performance needs. QSC contributes a governance aware orchestration layer that selects and combines link specific cryptographic protections across the full agent lifecycle, including session bootstrap, inter agent coordination, tool invocation, and memory access. Through system level analysis and empirical evaluation, we examine the trade offs between classical and quantum secure mechanisms and show that QSC can reduce the operational complexity and cost of introducing quantum security into deployed agentic AI systems. These results position QSC as a foundational paradigm for post quantum agentic intelligence and establish a principled pathway for designing globally interoperable, resilient, and future ready intelligent systems.
Authors:Ruoxi Cheng, Yizhong Ding, Hongyi Zhang, Yiyan Huang
Abstract:
Contrastive pretraining models such as CLIP and CLAP underpin many vision-language and audio-language systems, yet their reliance on web-scale data raises growing concerns about memorizing Personally Identifiable Information (PII). Auditing such models via membership inference is challenging in practice: shadow-model MIAs are computationally prohibitive for large multimodal backbones, and existing multimodal attacks typically require querying the target with paired biometric inputs, thereby directly exposing sensitive biometric information to the target model. We propose Unimodal Membership Inference Detector (UMID), a text-only auditing framework that performs text-guided cross-modal latent inversion and extracts two complementary signals, similarity (alignment to the queried text) and variability (consistency across randomized inversions). UMID compares these statistics to a lightweight non-member reference constructed from synthetic gibberish and makes decisions via an ensemble of unsupervised anomaly detectors. Comprehensive experiments across diverse CLIP and CLAP architectures demonstrate that UMID significantly improves the effectiveness and efficiency over prior MIAs, delivering strong detection performance with sub-second auditing cost while complying with realistic privacy constraints.
Authors:Zakia Zaman, Praveen Gauravaram, Mahbub Hassan, Sanjay Jha, Wen Hu
Abstract:
The rapid proliferation of the Internet of Things has intensified demand for robust privacy-preserving machine learning mechanisms to safeguard sensitive data generated by large-scale, heterogeneous, and resource-constrained devices. Unlike centralized environments, IoT ecosystems are inherently decentralized, bandwidth-limited, and latency-sensitive, exposing privacy risks across sensing, communication, and distributed training pipelines. These characteristics render conventional anonymization and centralized protection strategies insufficient for practical deployments. This survey presents a comprehensive IoT-centric, cross-paradigm analysis of privacy-preserving machine learning. We introduce a structured taxonomy spanning perturbation-based mechanisms such as differential privacy, distributed paradigms such as federated learning, cryptographic approaches including homomorphic encryption and secure multiparty computation, and generative synthesis techniques based on generative adversarial networks. For each paradigm, we examine formal privacy guarantees, computational and communication complexity, scalability under heterogeneous device participation, and resilience against threats including membership inference, model inversion, gradient leakage, and adversarial manipulation. We further analyze deployment constraints in wireless IoT environments, highlighting trade-offs between privacy, communication overhead, model convergence, and system efficiency within next-generation mobile architectures. We also consolidate evaluation methodologies, summarize representative datasets and open-source frameworks, and identify open challenges including hybrid privacy integration, energy-aware learning, privacy-preserving large language models, and quantum-resilient machine learning.
Authors:Maximilian Wendlinger, Daniel Kowatsch, Konstantin Böttinger, Philip Sperl
Abstract:
Large Language Models (LLMs) show remarkable capabilities in understanding natural language and generating complex code. However, as practitioners adopt CodeLLMs for increasingly critical development tasks, research reveals that these models frequently generate functionally correct yet insecure code, posing significant security risks. While multiple approaches have been proposed to improve security in AI-based code generation, combined benchmarks show these methods remain insufficient for practical use, achieving only limited improvements in both functional correctness and security. This stems from a fundamental gap in understanding the internal mechanisms of code generation and the root causes of security vulnerabilities, forcing researchers to rely on heuristics and empirical observations. In this work, we investigate the internal representation of security concepts in CodeLLMs, revealing that models are often aware of vulnerabilities as they generate insecure code. Through systematic evaluation, we demonstrate that CodeLLMs can distinguish between security subconcepts, enabling a more fine-grained analysis than prior black-box approaches. Leveraging these insights, we propose Secure Concept Steering for CodeLLMs (SCS-Code). During token generation, SCS-Code steers LLMs' internal representations toward secure and functional code output, enabling a lightweight and modular mechanism that can be integrated into existing code models. Our approach achieves superior performance compared to state-of-the-art methods across multiple secure coding benchmarks.
Authors:Eirik Høyheim, Magnus Wiik Eckhoff, Gudmund Grov, Robert Flood, David Aspinall
Abstract:
Machine learning backdoors have the property that the machine learning model should work as expected on normal inputs, but when the input contains a specific $\textit{trigger}$, it behaves as the attacker desires. Detecting such triggers has been proven to be extremely difficult. In this paper, we present a novel and explainable approach to detect and eliminate such backdoor triggers based on active paths found in neural networks. We present promising experimental evidence of our approach, which involves injecting backdoors into a machine learning model used for intrusion detection.
Authors:Javier Blanco-Romero, Yuri Melissa Garcia-Niño, Florina Almenares Mendoza, Daniel Díaz-Sánchez, Carlos García-Rubio, Celeste Campo
Abstract:
Embedded cryptography stands or falls on entropy quality, yet small devices have few trustworthy sources and little tolerance for heavyweight protocols. We build a Quantum Entropy as a Service (QEaaS) system that moves QRNG-derived entropy from a Quantis device to ESP32-class clients over post-quantum-secured channels. On the server side, the design exposes two paths: direct quantum entropy through a custom OpenSSL provider and mixed entropy through the Linux system pool. On the client side, we extend libcoap's Zephyr support, integrate wolfSSL-based DTLS 1.3 into the CoAP stack, and add a BLAKE2s entropy pool that preserves the standard Zephyr extraction interface while introducing an injection API for server-provided entropy. Benchmarks on ESP32 hardware, targeting 100 iterations per configuration, show that ML-KEM-512 completes a DTLS 1.3 handshake in 313 ms on average without certificate verification, 35% faster than ECDHE P-256. Pairing ML-KEM-512 with ML-DSA-44 lowers the mean to 225 ms. Certificate verification adds roughly 194 ms for ECDSA but only 17 ms for ML-DSA-44, so the fully post-quantum configuration remains 63% faster than classical ECDHE P-256 with ECDSA even under full verification. Local BLAKE2s pool operations stay below 0.1 ms combined. On this platform, post-quantum key exchange and authentication are not only feasible; they are faster than the classical baseline.
Authors:Mohammadhamed Shadbeh, Khashayar Khajavi, Tao Wang
Abstract:
Website fingerprinting (WF) attacks on Tor can infer user destinations from encrypted traffic metadata. However, their real-world effectiveness remains debated due to laboratory settings that fail to capture network fluctuations, evaluate noise, and create a representative open world. In this work, we re-examine WF from a guard-relay vantage point using a novel, privacy-preserving methodology that builds an open-world background from real, unlabeled Tor traffic paired with synthetic monitored traces. Using this methodology, we collect a large-scale dataset of over 800,000 traces. We then benchmark state-of-the-art WF attacks under a cross-network setting and show that WF remains highly effective against real Tor open-world traffic: the best-performing attack achieves 0.956 precision and 0.922 recall at a 9% base rate. We further present results that demonstrate robustness to small training sets, network jitter, and concept drift. Moreover, we show that timing-independent classifiers are significantly more robust to network variability than others. Finally, we provide the first systematic study of Tor's Conflux traffic-splitting, where we show that a guard node with a latency advantage can maintain high attack effectiveness even when traffic is split.
Authors:Sourov Jajodia, Madeena Sultana, Suryadipta Majumdar, Adrian Taylor, Grant Vandenberghe
Abstract:
Security incident analysis (SIA) poses a major challenge for security operations centers, which must manage overwhelming alert volumes, large and diverse data sources, complex toolchains, and limited analyst expertise. These difficulties intensify because incidents evolve dynamically and require multi-step, multifaceted reasoning. Although organizations are eager to adopt Large Language Models (LLMs) to support SIA, the absence of rigorous benchmarking creates significant risks for assessing their effectiveness and guiding design decisions. Benchmarking is further complicated by: (i) the lack of an LLM-ready dataset covering a wide spectrum of SIA tasks; (ii) the continual emergence of new tasks reflecting the diversity of analyst responsibilities; and (iii) the rapid release of new LLMs that must be incorporated into evaluations. In this paper, we address these challenges by introducing SIABENCH, an agentic evaluation framework for security incident analysis. First, we construct a first-of-its-kind dataset comprising two major SIA task categories: (i) deep analysis workflows for security incidents (25 scenarios) and (ii) alert-triage tasks (135 scenarios). Second, we implement an agent capable of autonomously performing a broad spectrum of SIA tasks (including network and memory forensics, malware analysis across binary/code/PDF formats, phishing email and kit analysis, log analysis, and false-alert detection). Third, we benchmark 11 major LLMs (spanning both open- and closed-weight models) on these tasks, with extensibility to support emerging models and newly added analysis scenarios.
Authors:Sadia Asif, Mohammad Mohammadi Amiri
Abstract:
Sequential multi-agent large language model (LLM) systems are increasingly deployed in sensitive domains such as healthcare, finance, and enterprise decision-making, where multiple specialized agents collaboratively process a single user request. Although individual agents may satisfy local privacy constraints, sensitive information can still be inferred through sequential composition and intermediate representations. In this work, we study \emph{compositional privacy leakage} in sequential LLM agent pipelines. We formalize leakage using mutual information and derive a theoretical bound that characterizes how locally introduced leakage can amplify across agents under sequential execution. Motivated by this analysis, we propose a privacy-regularized training framework that directly constrains information flow between agent outputs and agent-local sensitive variables. We evaluate our approach across sequential agent pipelines of varying depth on three benchmark datasets, demonstrating stable optimization dynamics and consistent, interpretable privacy-utility trade-offs. Our results show that privacy in agentic LLM systems cannot be guaranteed by local constraints alone and must instead be treated as a system-level property during both training and deployment.
Authors:Arther Tian, Alex Ding, Frank Chen, Simon Wu, Aaron Chan
Abstract:
Decentralized large language model (LLM) inference networks can pool heterogeneous compute to scale serving, but they require lightweight and incentive-compatible mechanisms to assess output quality. Prior work introduced cost-aware Proof of Quality (PoQ) and adaptive robust PoQ to allocate rewards under evaluator heterogeneity and adversarial behavior. In this paper, we focus on the quality signal itself and propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions, including model and cost priors, structure quality, semantic quality, query-output alignment, and agreement/uncertainty. Using logged outputs from QA and summarization tasks, we systematically audit dimension reliability and show that seemingly reasonable dimensions can be task-dependent and even negatively correlated with reference quality without calibration. While the default composite underperforms a strong single semantic evaluator, ablations reveal that removing unreliable dimensions and re-normalizing weights yields a calibrated composite that matches or exceeds the best single- evaluator and consensus baselines. Finally, we integrate the composite score as a drop-in quality signal in PoQ and demonstrate complementary benefits with robust aggregation and adaptive trust weighting under adversarial evaluator attacks.
Authors:Julia B. Kieserman, Athanasios Andreou, Laura Edelson, Sandra Siby, Damon McCoy
Abstract:
Popular social media platforms TikTok, Facebook and Instagram allow third-parties to run targeted advertising campaigns on sensitive attributes in-platform. These ads are interactive by default, meaning users can comment or ``react'' (e.g., ``like'', ``love'') to them. We find that this platform-level design choice creates a privacy loophole such that advertisers can view the profiles of those who interact with their ads, thus identifying individuals that fulfill certain targeting criteria. This behavior is in contradiction to the promises made by the platforms to hide user data from advertisers. We conclude by suggesting design modifications that could provide users with transparency about the consequences of ad interaction to protect against unintentional disclosure.
Authors:Jungwon Seo, Ferhat Ozgur Catak, Chunming Rong, Jaeyeon Jang
Abstract:
Federated Inference (FI) studies how independently trained and privately owned models can collaborate at inference time without sharing data or model parameters. While recent work has explored secure and distributed inference from disparate perspectives, a unified abstraction and system-level understanding of FI remain lacking. This paper positions FI as a distinct collaborative paradigm, complementary to federated learning, and identifies two fundamental requirements that govern its feasibility: inference-time privacy preservation and meaningful performance gains through collaboration. We formalize FI as a protected collaborative computation, analyze its core design dimensions, and examine the structural trade-offs that arise when privacy constraints, non-IID data, and limited observability are jointly imposed at inference time. Through a concrete instantiation and empirical analysis, we highlight recurring friction points in privacy-preserving inference, ensemble-based collaboration, and incentive alignment. Our findings suggest that FI exhibits system-level behaviors that cannot be directly inherited from training-time federation or classical ensemble methods. Overall, this work provides a unifying perspective on FI and outlines open challenges that must be addressed to enable practical, scalable, and privacy-preserving collaborative inference systems.
Authors:Yue Li, Lei Wang, Kaixuan Wang, Zhiqiang Yang, Ke Wang, Zhi Guan, Jianbo Gao
Abstract:
The rapid proliferation of autonomous AI agents is driving a shift toward Machine-to-Machine (M2M) commerce, where software agents are expected to autonomously invoke and pay for Web 2.0 services. While Web 3.0 payments offer a programmable foundation for such interactions, the recently proposed x402 standard fails to enforce end-to-end atomicity across service execution, payment, and result delivery. In this paper, we present A402, a trust-minimized payment architecture that securely binds Web 3.0 payments to Web 2.0 services. A402 introduces Atomic Service Channels (ASCs), a new channel protocol that integrates service execution into payment channels, enabling real-time, high-frequency micropayments for M2M commerce. Within each ASC, A402 employs an atomic exchange protocol based on TEE-assisted adaptor signatures, ensuring that payments are finalized if and only if the requested service is correctly executed and the corresponding result is delivered. To further ensure privacy, A402 incorporates a TEE-based Liquidity Vault that privately manages the lifecycle of ASCs and aggregates their settlements into a single on-chain transaction, revealing only aggregated balances. We implement A402 and evaluate it against x402 with integrations on both Bitcoin and Ethereum. Our results show that A402 delivers orders-of-magnitude performance and on-chain cost improvements over x402 while providing trust-minimized security guarantees.
Authors:Javier Blanco-Romero, Florina Almenares Mendoza, Carlos García Rubio, Celeste Campo, Daniel Díaz Sánchez
Abstract:
Harvest-now, decrypt-later (HN-DL) attacks threaten today's encrypted communications by archiving ciphertext until a quantum computer can break the underlying key exchange. This paper reframes HN-DL as an economic problem, quantifying adversary costs across Transport Layer Security (TLS) 1.2, TLS 1.3, QUIC, and Secure Shell (SSH) with an open-source testbed that reproduces the full attack sequence. Our model shows that retaining intercepted traffic is economically trivial, shifting the defensive question from whether an adversary can archive to how much decryption will cost. We evaluate protocol configuration strategies that act along two independent cost axes: storage overhead and quantum workload. Beyond the ongoing migration to post-quantum cryptography, these strategies provide defense in depth with current infrastructure. Encrypted Client Hello forces indiscriminate bulk collection, inflating the archive the adversary must retain, while aggressive rekeying and larger key exchange parameters multiply the quantum computations required to recover plaintext. Because storage inflation penalizes both sides while quantum cost inflation targets the adversary alone, rekeying and key size selection offer the strongest defensive levers.
Authors:Jiazheng Quan, Xiaodong Li, Bin Wang, Guo An, Like Liu, Degen Huang, Lin Liu, Chengbin Hou
Abstract:
Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained reinforcement learning reward signals. To address these challenges, we propose Vul2Safe, a new secure code generation framework that leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities, and further generates diverse implicit prompts to build the PrimeVul+ dataset. Meanwhile, we introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security, which enables the model to continuously attend to and reinforce critical fine-grained security patterns during training. Compared with traditional instance-level reward schemes, our approach allows for more precise optimization of local security implementations. Extensive experiments show that PrimeVul+ and SRCode substantially reduce security vulnerabilities in generated code while improving overall code quality across multiple benchmarks.
Authors:David Polzoni, Tommaso Bianchi, Mauro Conti
Abstract:
Quantum Key Distribution (QKD) is a foundational cryptographic protocol that ensures information-theoretic security. However, classical protocols such as BB84, though favored for their simplicity, offer limited resistance to eavesdropping, and perform poorly under realistic noise conditions. Recent research has explored the use of discrete-time Quantum Walks (QWs) to enhance QKD schemes. In this work, we specifically focus on a one-way QKD protocol, where security depends exclusively on the underlying Quantum Walk (QW) topology, rather than the details of the protocol itself. Our paper introduces a novel protocol based on QWs over a hypercube topology and demonstrates that, under identical parameters, it provides significantly enhanced security and noise resistance compared to the circular topology (i.e., state-of-the-art), thereby strengthening protection against eavesdropping. Furthermore, we introduce an efficient and extensible simulation framework for one-way QKD protocols based on QWs, supporting both circular and hypercube topologies. Implemented with IBM's software development kit for quantum computing (i.e., Qiskit), our toolkit enables noise-aware analysis under realistic noise models. To support reproducibility and future developments, we release our entire simulation framework as open-source. This contribution establishes a foundation for the design of topology-aware QKD protocols that combine enhanced noise tolerance with topologically driven security.
Authors:Kyeongpil Min, Sangmin Jeon, Jae-Jin Lee, Woojoo Lee
Abstract:
Cloud-edge AI must jointly satisfy model compression and security under tight device budgets. While Tensor-Train Decomposition (TTD) shrinks on-device models, prior selective-encryption studies largely assume dense weights, leaving its practicality under TTD compression unclear. We present TT-SEAL, a selective-encryption framework for TT-decomposed networks. TT-SEAL ranks TT cores with a sensitivity-based importance metric, calibrates a one-time robustness threshold, and uses a value-DP optimizer to encrypt the minimum set of critical cores with AES. Under TTD-aware, transfer-based threat models (and on an FPGA-prototyped edge processor) TT-SEAL matches the robustness of full (black-box) encryption while encrypting as little as 4.89-15.92% of parameters across ResNet-18, MobileNetV2, and VGG-16, and drives the share of AES decryption in end-to-end latency to low single digits (e.g., 58% -> 2.76% on ResNet-18), enabling secure, low-latency edge AI.
Authors:Lohit Daksha, Seyda Guzelhan, Kaustubh Shivdikar, Carlos Agulló Domingo, Óscar Vera Lopez, Gilbert Jonatan, Hubert Dymarkowski, Aymane El Jerari, José Cano, José L. Abellán, John Kim, David Kaeli, Ajay Joshi
Abstract:
Fully Homomorphic Encryption (FHE) enables computation directly on encrypted data but incurs massive computational and memory overheads, often exceeding plaintext execution by several orders of magnitude. While custom ASIC accelerators can mitigate these costs, their long time-to-market and the rapid evolution of FHE algorithms threaten their long-term relevance. GPUs, by contrast, offer scalability, programmability, and widespread availability, making them an attractive platform for FHE. However, modern GPUs are increasingly specialized for machine learning workloads, emphasizing low-precision datatypes (e.g., INT$8$, FP$8$) that are fundamentally mismatched to the wide-precision modulo arithmetic required by FHE. Essentially, while GPUs offer ample parallelism, their functional units, like Tensor Cores, are not suited for wide-integer modulo arithmetic required by FHE schemes such as CKKS. Despite this constraint, researchers have attempted to map FHE primitives on Tensor Cores by segmenting wide integers into low-precision (INT$8$) chunks. To overcome these bottlenecks, we propose FHECore, a specialized functional unit integrated directly into the GPU's Streaming Multiprocessor. Our design is motivated by a key insight: the two dominant contributors to latency$-$Number Theoretic Transform and Base Conversion$-$can be formulated as modulo-linear transformations. This allows them to be mapped on a common hardware unit that natively supports wide-precision modulo-multiply-accumulate operations. Our simulations demonstrate that FHECore reduces dynamic instruction count by a geometric mean of $2.41\times$ for CKKS primitives and $1.96\times$ for end-to-end workloads. These reductions translate to performance speedups of $1.57\times$ and $2.12\times$, respectively$-$including a $50\%$ reduction in bootstrapping latency$-$all while inuring a modest $2.4\%$ area overhead.
Authors:Hillel Ohayon, Daniel Gilkarov, Ran Dubin
Abstract:
Model repositories such as Hugging Face increasingly distribute machine learning artifacts serialized with Python's pickle format, exposing users to remote code execution (RCE) risks during model loading. Recent defenses, such as PickleBall, rely on per-library policy synthesis that requires complex system setups and verified benign models, which limits scalability and generalization. In this work, we propose a lightweight, machine-learning-based scanner that detects malicious Pickle-based files without policy generation or code instrumentation. Our approach statically extracts structural and semantic features from Pickle bytecode and applies supervised and unsupervised models to classify files as benign or malicious. We construct and release a labeled dataset of 727 Pickle-based files from Hugging Face and evaluate our models on four datasets: our own, PickleBall (out-of-distribution), Hide-and-Seek (9 advanced evasive malicious models), and synthetic joblib files. Our method achieves 90.01% F1-score compared with 7.23%-62.75% achieved by the SOTA scanners (Modelscan, Fickling, ClamAV, VirusTotal) on our dataset. Furthermore, on the PickleBall data (OOD), it achieves 81.22% F1-score compared with 76.09% achieved by the PickleBall method, while remaining fully library-agnostic. Finally, we show that our method is the only one to correctly parse and classify 9/9 evasive Hide-and-Seek malicious models specially crafted to evade scanners. This demonstrates that data-driven detection can effectively and generically mitigate Pickle-based model file attacks.
Authors:Kiarash Ahi, Vaibhav Agrawal, Saeed Valizadeh
Abstract:
Large Language Models (LLMs) & Generative AI are transforming cybersecurity, enabling both advanced defenses and new attacks. Organizations now use LLMs for threat detection, code review, and DevSecOps automation, while adversaries leverage them to produce malwares and run targeted social-engineering campaigns. This paper presents a unified analysis integrating offensive and defensive perspectives on GenAI-driven cybersecurity. Drawing on 70 academic, industry, and policy sources, it analyzes the rise of AI-facilitated threats and its implications for global security to ground necessity for scalable defensive mechanisms. We introduce two primary contributions: the LLM Scalability Risk Index (LSRI), a parametric framework to stress-test operational risks when deploying LLMs in security-critical environments & a model-supply-chain framework establishing a verifiable root of trust throughout model lifecycle. We also synthesize defense strategies from platforms like Google Play Protect, Microsoft Security Copilot and outline a governance roadmap for secure, large-scale LLM deployment.
Authors:Cathrin Schachner, Jasmin Wachter
Abstract:
Capture-the-Flag (CTF) competitions serve as gateways into offensive cybersecurity, yet they often present steep barriers for novices due to complex toolchains and opaque workflows. Recently, agentic AI frameworks for cybersecurity promise to lower these barriers by automating and coordinating penetration testing tasks. However, their role in shaping novice learning remains underexplored. We present a human-centered, mixed-methods case study examining how agentic AI frameworks -- here Cybersecurity AI (CAI) -- mediates novice entry into CTF-based penetration testing. An undergraduate student without prior hacking experience attempted to approach performance benchmarks from a national cybersecurity challenge using CAI. Quantitative performance metrics were complemented by structured reflective analysis of learning progression and AI interaction patterns. Our thematic analysis suggest that agentic AI reduces initial entry barriers by providing overview, structure and guidance, thereby lowering the cognitive workload during early engagement. Quantitatively, the observed extensive exploration of strategies and low per-strategy execution time potetially facilitatates cybersecurity training on meta, i.e. strategic levels. At the same time, AI-assisted cybersecurity education introduces new challenges related to trust, dependency, and responsible use. We discuss implications for human-centered AI-supported cybersecurity education and outline open questions for future research.
Authors:Jingkai Guo, Chaitali Chakrabarti, Deliang Fan
Abstract:
Large language models (LLMs) are increasingly deployed in safety and security critical applications, raising concerns about their robustness to model parameter fault injection attacks. Recent studies have shown that bit-flip attacks (BFAs), which exploit computer main memory (i.e., DRAM) vulnerabilities to flip a small number of bits in model weights, can severely disrupt LLM behavior. However, existing BFA on LLM largely induce un-targeted failure or general performance degradation, offering limited control over manipulating specific or targeted outputs. In this paper, we present TFL, a novel targeted bit-flip attack framework that enables precise manipulation of LLM outputs for selected prompts while maintaining almost no or minor degradation on unrelated inputs. Within our TFL framework, we propose a novel keyword-focused attack loss to promote attacker-specified target tokens in generative outputs, together with an auxiliary utility score that balances attack effectiveness against collateral performance impact on benign data. We evaluate TFL on multiple LLMs (Qwen, DeepSeek, Llama) and benchmarks (DROP, GSM8K, and TriviaQA). The experiments show that TFL achieves successful targeted LLM output manipulations with less than 50 bit flips and significantly reduced effect on unrelated queries compared to prior BFA approaches. This demonstrates the effectiveness of TFL and positions it as a new class of stealthy and targeted LLM model attack.
Authors:Simon Lermen, Daniel Paleka, Joshua Swanson, Michael Aerni, Nicholas Carlini, Florian Tramèr
Abstract:
We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user's Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.
Authors:Matthew Regehr, Bingshan Hu, Ethan Leeman, Pasin Manurangsi, Pierre Tholoniat, Mathias Lécuyer
Abstract:
We study natural privacy filters, which enable the exact composition of differentially private (DP) mechanisms with adaptively chosen privacy characteristics. Earlier privacy filters consider only simple privacy parameters such as Rényi-DP or Gaussian DP parameters. Natural filters account for the entire privacy profile of every query, promising greater utility for a given privacy budget. We show that, contrary to other forms of DP, natural privacy filters are not free in general. Indeed, we show that only families of privacy mechanisms that are well-ordered when composed admit free natural privacy filters.
Authors:Sofya Raskhodnikova, Adam Smith, Connor Wagaman, Anatoly Zavyalov
Abstract:
We initiate an investigation of node differential privacy for graphs in the local model of private data analysis. In our model, dubbed LNDP, each node sees its own edge list and releases the output of a local randomizer on this input. These outputs are aggregated by an untrusted server to obtain a final output. We develop a novel algorithmic framework for this setting that allows us to accurately answer arbitrary linear queries on a blurry approximation of the input graph's degree distribution. For some natural problems, the resulting algorithms match the accuracy achievable with node privacy in the central model, where data are held and processed by a trusted server. We also prove lower bounds on the error required by LNDP that imply the optimality of our algorithms for several fundamental graph statistics. We then lift these lower bounds to the interactive LNDP setting, demonstrating the optimality of our algorithms even when constantly many rounds of interaction are permitted. Obtaining our lower bounds requires new approaches, since those developed for the usual local model do not apply to the inherently overlapping inputs that arise from graphs. Finally, we prove structural results that reveal qualitative differences between local node privacy and the standard local model for tabular data.
Authors:Aashish Kolluri, Rishi Sharma, Manuel Costa, Boris Köpf, Tobias Nießen, Mark Russinovich, Shruti Tople, Santiago Zanella-Béguelin
Abstract:
Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such defenses can provably block unsafe actions by enforcing confidentiality and integrity policies, but currently appear costly: they reduce task completion rates and increase token usage compared to probabilistic defenses. We argue that existing evaluations miss a key benefit of system-level defenses: reduced reliance on human oversight. We introduce autonomy metrics to quantify this benefit: the fraction of consequential actions an agent can execute without human-in-the-loop (HITL) approval while preserving security. To increase autonomy, we design a security-aware agent that (i) introduces richer HITL interactions, and (ii) explicitly plans for both task progress and policy compliance. We implement this agent design atop an existing information-flow control defense against prompt injection and evaluate it on the AgentDojo and WASP benchmarks. Experiments show that this approach yields higher autonomy without sacrificing utility.
Authors:Jiangong Chen, Mingyu Zhu, Bin Li
Abstract:
Multimodal Large Language Models (MLLMs) enhance collaboration in Extended Reality (XR) environments by enabling flexible object and animation creation through the combination of natural language and visual inputs. However, visual data captured by XR headsets includes real-world backgrounds that may contain irrelevant or sensitive user information, such as credit cards left on the table or facial identities of other users. Uploading those frames to cloud-based MLLMs poses serious privacy risks, particularly when such data is processed without explicit user consent. Additionally, existing colocation and synchronization mechanisms in commercial XR APIs rely on time-consuming, privacy-invasive environment scanning and struggle to adapt to the highly dynamic nature of MLLM-integrated XR environments. In this paper, we propose PRISM-XR, a novel framework that facilitates multi-user collaboration in XR by providing privacy-aware MLLM integration. PRISM-XR employs intelligent frame preprocessing on the edge server to filter sensitive data and remove irrelevant context before communicating with cloud generative AI models. Additionally, we introduce a lightweight registration process and a fully customizable content-sharing mechanism to enable efficient, accurate, and privacy-preserving content synchronization among users. Our numerical evaluation results indicate that the proposed platform achieves nearly 90% accuracy in fulfilling user requests and less than 0.27 seconds registration time while maintaining spatial inconsistencies of less than 3.5 cm. Furthermore, we conducted an IRB-approved user study with 28 participants, demonstrating that our system could automatically filter highly sensitive objects in over 90% of scenarios while maintaining strong overall usability.
Authors:Long Tran, Antti Koskela, Ossi Räisä, Antti Honkela
Abstract:
Accounting for privacy loss under fully adaptive composition -- where both the choice of mechanisms and their privacy parameters may depend on the entire history of prior outputs -- is a central challenge in differential privacy (DP). In this setting, privacy filters are stopping rules for compositions that ensure a prescribed global privacy budget is not exceeded. It remains unclear whether optimal trade-off-function-based notions, such as $f$-DP, admit valid privacy filters under fully adaptive interaction. We show that the natural approach to defining an $f$-DP filter -- composing individual trade-off curves and stopping when the prescribed $f$-DP curve is crossed -- is fundamentally invalid. We characterise when and why this failure occurs, and establish necessary and sufficient conditions under which the natural filter is valid. Furthermore, we prove a fully adaptive central limit theorem for $f$-DP and construct an approximate Gaussian DP filter for subsampled Gaussian mechanisms at small sampling rates $q<0.2$ and large sampling rates $q>0.8$, yielding tighter privacy guarantees than filters based on Rényi DP in the same setting.
Authors:Sung-Hoon Yoon, Ruizhi Qian, Minda Zhao, Weiyue Li, Mengyu Wang
Abstract:
Large Language Models (LLMs) have become integral to many domains, making their safety a critical priority. Prior jailbreaking research has explored diverse approaches, including prompt optimization, automated red teaming, obfuscation, and reinforcement learning (RL) based methods. However, most existing techniques fail to effectively leverage vulnerabilities revealed in earlier interaction turns, resulting in inefficient and unstable attacks. Since jailbreaking involves sequential interactions in which each response influences future actions, reinforcement learning provides a natural framework for this problem. Motivated by this, we propose a history-aware RL-based jailbreak framework that analyzes and reweights vulnerability signals from prior steps to guide future decisions. We show that incorporating historical information alone improves jailbreak success rates. Building on this insight, we introduce an attention-based reweighting mechanism that highlights critical vulnerabilities within the interaction history, enabling more efficient exploration with fewer queries. Extensive experiments on AdvBench and HarmBench demonstrate that our method achieves state-of-the-art jailbreak performance while significantly improving query efficiency. These results underscore the importance of historical vulnerability signals in reinforcement learning-driven jailbreak strategies and offer a principled pathway for advancing adversarial research on LLM safeguards.
Authors:Ahmad Alemari, Pritam Sen, Cristian Borcea
Abstract:
Since most countries are coming up with online privacy regulations, such as GDPR in the EU, online publishers need to find a balance between revenue from targeted advertisement and user privacy. One way to be able to still show targeted ads, based on user personal and behavioral information, is to employ Federated Learning (FL), which performs distributed learning across users without sharing user raw data with other stakeholders in the publishing ecosystem. This paper presents AdFL, an FL framework that works in the browsers to learn user ad preferences. These preferences are aggregated in a global FL model, which is then used in the browsers to show more relevant ads to users. AdFL can work with any model that uses features available in the browser such as ad viewability, ad click-through, user dwell time on pages, and page content. The AdFL server runs at the publisher and coordinates the learning process for the users who browse pages on the publisher's website. The AdFL prototype does not require the client to install any software, as it is built utilizing standard APIs available on most modern browsers. We built a proof-of-concept model for ad viewability prediction that runs on top of AdFL. We tested AdFL and the model with two non-overlapping datasets from a website with 40K visitors per day. The experiments demonstrate AdFL's feasibility to capture the training information in the browser in a few milliseconds, show that the ad viewability prediction achieves up to 92.59% AUC, and indicate that utilizing differential privacy (DP) to safeguard local model parameters yields adequate performance, with only modest declines in comparison to the non-DP variant.
Authors:Remi A. Chou, Joerg Kliewer, Aylin Yener
Abstract:
Consider multiple users and a fusion center. Each user possesses a sequence of bits and can communicate with the fusion center through a one-way public channel. The fusion center's task is to compute the sum of all the sequences under the privacy requirement that a set of colluding users, along with the fusion center, cannot gain more than a predetermined amount $δ$ of information, measured through mutual information, about the sequences of other users. Our first contribution is to characterize the minimum amount of necessary communication between the users and the fusion center, as well as the minimum amount of necessary randomness at the users. Our second contribution is to establish a connection between private sum computation and secret sharing by showing that secret sharing is necessary to generate the local randomness needed for private sum computation, and prove that it holds true for any $δ\geq 0$.
Authors:Najmul Hasan, Prashanth BusiReddyGari
Abstract:
Lightweight cryptography is becoming essential as emerging technologies in digital identity systems and Internet of Things verification continue to demand strong cryptographic assurance on devices with limited processing power, memory, and energy resources. As these technologies move into routine use, they demand cryptographic primitives that maintain strong security and deliver predictable performance through clear theoretical models of time complexity. Although NIST's lightweight cryptography project provides empirical evaluations of the ten finalist algorithms, a unified theoretical understanding of their time-complexity behavior remains absent. This work introduces a symbolic model that decomposes each scheme into initialization, data-processing, and finalization phases, enabling formal time-complexity derivation for all ten finalists. The results clarify how design parameters shape computational scaling on constrained mobile and embedded environments. The framework provides a foundation needed to distinguish algorithmic efficiency and guides the choice of primitives capable of supporting security systems in constrained environments.
Authors:Tianxin Chen, Wenbo Jiang, Hongqiao Chen, Zhirun Zheng, Cheng Huang
Abstract:
Text-to-image (T2I) diffusion models are widely adopted for their strong generative capabilities, yet remain vulnerable to backdoor attacks. Existing attacks typically rely on fixed textual triggers and single-entity backdoor targets, making them highly susceptible to enumeration-based input defenses and attention-consistency detection. In this work, we propose Semantic-level Backdoor Attack (SemBD), which implants backdoors at the representation level by defining triggers as continuous semantic regions rather than discrete textual patterns. Concretely, SemBD injects semantic backdoors by distillation-based editing of the key and value projection matrices in cross-attention layers, enabling diverse prompts with identical semantic compositions to reliably activate the backdoor attack. To further enhance stealthiness, SemBD incorporates a semantic regularization to prevent unintended activation under incomplete semantics, as well as multi-entity backdoor targets that avoid highly consistent cross-attention patterns. Extensive experiments demonstrate that SemBD achieves a 100% attack success rate while maintaining strong robustness against state-of-the-art input-level defenses.
Authors:Blake Bullwinkel, Giorgio Severi, Keegan Hines, Amanda Minnich, Ram Shankar Siva Kumar, Yonatan Zunger
Abstract:
Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal language models. Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning data, making it possible to leak backdoor examples using memory extraction techniques. Second, poisoned LLMs exhibit distinctive patterns in their output distributions and attention heads when backdoor triggers are present in the input. Guided by these observations, we develop a scalable backdoor scanning methodology that assumes no prior knowledge of the trigger or target behavior and requires only inference operations. Our scanner integrates naturally into broader defensive strategies and does not alter model performance. We show that our method recovers working triggers across multiple backdoor scenarios and a broad range of models and fine-tuning methods.
Authors:Najmul Hasan, Prashanth BusiReddyGari
Abstract:
The Uniform Resource Locator (URL), introduced in a connectivity-first era to define access and locate resources, remains historically limited, lacking future-proof mechanisms for security, trust, or resilience against fraud and abuse, despite the introduction of reactive protections like HTTPS during the cybersecurity era. In the current AI-first threatscape, deceptive URLs have reached unprecedented sophistication due to the widespread use of generative AI by cybercriminals and the AI-vs-AI arms race to produce context-aware phishing websites and URLs that are virtually indistinguishable to both users and traditional detection tools. Although AI-generated phishing accounted for a small fraction of filter-bypassing attacks in 2024, phishing volume has escalated over 4,000% since 2022, with nearly 50% more attacks evading detection. At the rate the threatscape is escalating, and phishing tactics are emerging faster than labeled data can be produced, zero-shot and few-shot learning with large language models (LLMs) offers a timely and adaptable solution, enabling generalization with minimal supervision. Given the critical importance of phishing URL detection in large-scale cybersecurity defense systems, we present a comprehensive benchmark of LLMs under a unified zero-shot and few-shot prompting framework and reveal operational trade-offs. Our evaluation uses a balanced dataset with consistent prompts, offering detailed analysis of performance, generalization, and model efficacy, quantified by accuracy, precision, recall, F1 score, AUROC, and AUPRC, to reflect both classification quality and practical utility in threat detection settings. We conclude few-shot prompting improves performance across multiple LLMs.
Authors:Ali Mahdavi, Santa Aghapour, Azadeh Zamanifar, Amirfarhad Farhadi
Abstract:
Existing Byzantine robust aggregation mechanisms typically rely on fulldimensional gradi ent comparisons or pairwise distance computations, resulting in computational overhead that limits applicability in large scale and resource constrained federated systems. This paper proposes TinyGuard, a lightweight Byzantine defense that augments the standard FedAvg algorithm via statistical update f ingerprinting. Instead of operating directly on high-dimensional gradients, TinyGuard extracts compact statistical fingerprints cap turing key behavioral properties of client updates, including norm statistics, layer-wise ratios, sparsity measures, and low-order mo ments. Byzantine clients are identified by measuring robust sta tistical deviations in this low-dimensional fingerprint space with nd complexity, without modifying the underlying optimization procedure. Extensive experiments on MNIST, Fashion-MNIST, ViT-Lite, and ViT-Small with LoRA adapters demonstrate that TinyGuard pre serves FedAvg convergence in benign settings and achieves up to 95 percent accuracy under multiple Byzantine attack scenarios, including sign-flipping, scaling, noise injection, and label poisoning. Against adaptive white-box adversaries, Pareto frontier analysis across four orders of magnitude confirms that attackers cannot simultaneously evade detection and achieve effective poisoning, features we term statistical handcuffs. Ablation studies validate stable detection precision 0.8 across varying client counts (50-150), threshold parameters and extreme data heterogeneity . The proposed framework is architecture-agnostic and well-suited for federated fine-tuning of foundation models where traditional Byzantine defenses become impractical
Authors:Montassar Naghmouchi, Maryline Laurent
Abstract:
Consent is an ethical cornerstone of clinical research and healthcare in general. Although the ethical principles of consent - providing information, ensuring comprehension, and ensuring voluntariness - are well-defined, the technological infrastructure remains outdated. Clinicians are responsible for obtaining informed consent from research subjects or patients, and for managing it before, during, and after clinical trials or care, which is a burden for them. The voluntary nature of participating in clinical research or undergoing medical treatment implies the need for a participant-centric consent management system. However, this is not reflected in most established systems. Not only do most healthcare information systems not follow a user-centric model, but they also create data silos, which significantly reduce the mobility of patient data between different healthcare institutions and impact personalized medicine. Furthermore, consent management tools are outdated. We propose ClinConNet (Clinical Consent Network), a platform that connects researchers and participants based on clinical research projects. ClinConNet is powered by a dynamic consent model based on blockchain and take advantage of dynamic consent interfaces, as well as blockchain and Self-Sovereign Identity systems. ClinConNet is user-centric and provides important privacy features for patients, such as unlinkability, confidentiality, and ownership of identity data. It is also compatible with the right to be forgotten, as defined in many personal data protection regulations, such as the GDPR. We provide a detailed privacy and security analysis in an adversarial model, as well as a Proof of Concept implementation with detailed performance measures that demonstrate the feasibility of our blockchain-based consent management system with a median end-to-end consent establishment time of under 200ms and a throughput of 250TPS.
Authors:Renascence Tarafder Prapty, Gene Tsudik
Abstract:
Multi-Factor Authentication (MFA) enhances login security by requiring multiple authentication factors. Its adoption has increased in response to more frequent and sophisticated attacks. Duo is widely used by organizations including Fortune 500 companies and major educational institutions, yet its usability has not been examined thoroughly or recently. Earlier studies focused on technical challenges during initial deployment but did not measure core usability metrics such as task completion time or System Usability Scale (SUS) scores. These results are also outdated, originating from a time when MFA was less familiar to typical users. We conducted a long-term, large-scale Duo usability study at the University of California Irvine during the 2024-2025 academic year, involving 2559 participants. Our analysis uses authentication log data and a survey of 57 randomly selected users. The average overhead of a Duo Push task is nearly 8 seconds, which participants described as short to moderate. Overhead varies with time of day, field of study, and education level. The rate of authentication failures due to incomplete Duo tasks is 4.35 percent, and 43.86 percent of survey respondents reported at least one Duo login failure. The Duo SUS score is 70, indicating good usability. Participants generally find Duo easy to use but somewhat annoying, while also reporting an increased sense of account security. They also described common issues and offered suggestions for improvement.
Authors:Shuyu Chen, Mingxun Zhou, Haoyu Niu, Guopeng Lin, Weili Han
Abstract:
Secure data join enables two parties with vertically distributed data to securely compute the joined table, allowing the parties to perform downstream Secure multi-party computation-based Data Analytics (SDA), such as training machine learning models, based on the joined table. While Circuit-based Private Set Intersection (CPSI) can be used for secure data join, it introduces redundant dummy rows in the joined table, which results in high overhead in the downstream SDA tasks. iPrivJoin addresses this issue but introduces significant communication overhead in the redundancy removal process, as it relies on the cryptographic primitive OPPRF for data encoding and multiple rounds of oblivious shuffles. In this paper, we propose a much simpler secure data join protocol, Bifrost, which outputs (the secret shares of) a redundancy-free joined table. The highlight of Bifrost lies in its simplicity: it builds upon two conceptually simple building blocks, an ECDH-PSI protocol and a two-party oblivious shuffle protocol. The lightweight protocol design allows Bifrost to avoid the need for OPPRF. We also proposed a simple optimization named \textit{dual mapping} that reduces the rounds of oblivious shuffle needed from two to one. Experiments on datasets of up to 100 GB show that Bifrost achieves $2.54 \sim 22.32\times$ speedup and reduces the communication by $84.15\% \sim 88.97\%$ compared to the SOTA redundancy-free secure data join protocol iPrivJoin. Notably, the communication size of Bifrost is nearly equal to the size of the input data. In the two-step SDA pipeline evaluation (secure join and SDA), the redundancy-free property of Bifrost not only avoids the catastrophic error rate blowup in the downstream tasks caused by the dummy rows in the joined table (as introduced in CPSI), but also shows up to $2.80\times$ speed-up in the SDA process with up to $73.15\%$ communication reduction.
Authors:Ailsa Robertson, Christian Schaffner, Sebastian R. Verschoor
Abstract:
Quantum Key Distribution (QKD) allows secure communication without relying on computational assumptions, but can currently only be deployed over relatively short distances due to hardware constraints. To extend QKD over long distances, networks of trusted repeater nodes can be used, wherein QKD is executed between neighbouring nodes and messages between non-neighbouring nodes are forwarded using a relay protocol. Although these networks are being deployed worldwide, no protocol exists which provides provable guarantees of integrity against manipulation from both external adversaries and corrupted intermediates. In this work, we present the first protocol that provably provides both confidentiality and integrity. Our protocol combines an existing cryptographic technique, Algebraic Manipulation Detection (AMD) codes, with multi-path relaying over trusted repeater networks. This protocol achieves Information Theoretic Security (ITS) against the detection of manipulation, which we prove formally through a sequence of games.
Authors:Nan Zhong, Yiran Xu, Mian Zou
Abstract:
As realistic AI-generated images threaten digital authenticity, we address the generalization failure of generative artifact-based detectors by exploiting the intrinsic properties of the camera imaging pipeline. Concretely, we investigate color correlations induced by the color filter array (CFA) and demosaicing, and propose a Demosaicing-guided Color Correlation Training (DCCT) framework for AI-generated image detection. By simulating the CFA sampling pattern, we decompose each color image into a single-channel input (as the condition) and the remaining two channels as the ground-truth targets (for prediction). A self-supervised U-Net is trained to model the conditional distribution of the missing channels from the given one, parameterized via a mixture of logistic functions. Our theoretical analysis reveals that DCCT targets a provable distributional difference in color-correlation features between photographic and AI-generated images. By leveraging these distinct features to construct a binary classifier, DCCT achieves state-of-the-art generalization and robustness, significantly outperforming prior methods across over 20 unseen generators.
Authors:Qinhan Tan, Akash Gaonkar, Yu-Wei Fan, Aarti Gupta, Sharad Malik
Abstract:
Recent years have seen significant advances in using formal verification to check hardware security properties. Of particular practical interest are checking confidentiality and integrity of secrets, by checking that there is no information flow between the secrets and observable outputs. A standard method for checking information flow is to translate the corresponding non-interference hyperproperty into a safety property on a self-composition of the design, which has two copies of the design composed together. Although prior efforts have aimed to reduce the size of the self-composed design, there are no state-of-the-art model checkers that exploit their special structure for hardware security verification. In this paper, we propose SecIC3, a hardware model checking algorithm based on IC3 that is customized to exploit this self-composition structure. SecIC3 utilizes this structure in two complementary techniques: symmetric state exploration and adding equivalence predicates. We implement SecIC3 on top of two open-source IC3 implementations and evaluate it on a non-interference checking benchmark consisting of 10 designs. The experiment results show that SecIC3 significantly reduces the time for finding security proofs, with up to 49.3x proof speedup compared to baseline implementations.
Authors:Arther Tian, Alex Ding, Frank Chen, Simon Wu, Aaron Chan
Abstract:
Decentralized large language model inference networks require lightweight mechanisms to reward high quality outputs under heterogeneous latency and cost. Proof of Quality provides scalable verification by sampling evaluator nodes that score candidate outputs, then aggregating their scores into a consensus signal that determines rewards. However, evaluator heterogeneity and malicious score manipulation can distort consensus and inflate payouts, which weakens incentive alignment in open participation settings. This paper extends a cost-aware Proof of Quality mechanism by adding adversary-resilient consensus formation. We study robust aggregation rules, including median and trimmed mean, and an adaptive trust-weighted consensus that updates evaluator weights from deviation signals. Using question answering and summarization workloads with a ground truth proxy for offline analysis, we quantify evaluator reliability and show strong variance across evaluators, including task-dependent misalignment that can invert correlations. We then evaluate robustness under four adversarial strategies, including noise injection, boosting, sabotage, and intermittent manipulation, across a sweep of malicious ratios and evaluator sample sizes. Our results show that robust aggregation improves consensus alignment with the ground truth proxy and reduces sensitivity to noisy and strategic attacks compared with simple averaging. We further characterize the operational trade-off introduced by evaluator sampling, where larger evaluator sets reduce evaluator rewards and increase payoff variance while inference rewards remain relatively stable in our configuration. These findings motivate robust consensus as a default component for cost-aware Proof of Quality and provide practical guidance for selecting evaluator sampling parameters under adversarial risk and resource constraints.
Authors:Nirhoshan Sivaroopan, Kanchana Thilakarathna, Albert Zomaya, Manu, Yi Guo, Jo Plested, Tim Lynar, Jack Yang, Wangli Yang
Abstract:
Sponge attacks increasingly threaten LLM systems by inducing excessive computation and DoS. Existing defenses either rely on statistical filters that fail on semantically meaningful attacks or use static LLM-based detectors that struggle to adapt as attack strategies evolve. We introduce SHIELD, a multi-agent, auto-healing defense framework centered on a three-stage Defense Agent that integrates semantic similarity retrieval, pattern matching, and LLM-based reasoning. Two auxiliary agents, a Knowledge Updating Agent and a Prompt Optimization Agent, form a closed self-healing loop, when an attack bypasses detection, the system updates an evolving knowledgebase, and refines defense instructions. Extensive experiments show that SHIELD consistently outperforms perplexity-based and standalone LLM defenses, achieving high F1 scores across both non-semantic and semantic sponge attacks, demonstrating the effectiveness of agentic self-healing against evolving resource-exhaustion threats.
Authors:Yao Zhao, Zhang Sheng, Shengchen Duan, Shen Wang
Abstract:
Obfuscation substantially increases the interpretation cost of smart-contract auditing, while the comparability and transferability of obfuscation signals across chains remain unclear. We present HObfNET as an efficient surrogate of Obfs_Tool (ObfProbe), enabling fast cross-chain scoring at scale. The model aligns well with tool outputs on Ethereum (PCC 0.9158, MAPE 8.20 percent) and achieves 8-9 ms per contract, a 2.3k-5.2k times speedup over second-level Obfs_Tool runs, enabling million-scale scoring. On large BSC, Polygon, and Avalanche corpora, we find systematic score drift: fixed-threshold transfer inflates and deflates candidate queues, motivating within-chain main and extreme thresholds (p99 and p99.9) and an actionable queueing strategy. The high-score tail exhibits rare selectors, external-call opcode enrichment, and low signature density; a proxy indicator is enriched in the BSC high-score queue, enabling secondary triage. Cross-chain reuse analysis shows tail enrichment and directional diffusion, with traceable same-hash cases across chains. In publicly alignable incident samples, all fall into the p99 queue; Transit Swap DEX Hack and New Free DAO Flash Loan exhibit cross-chain spillover, indicating real-world hit and prioritization value. We deliver a two-tier audit queue and cross-chain linkage workflow to support practical multi-chain security operations.
Authors:Yiyang Lu, Jinwen He, Yue Zhao, Kai Chen, Ruigang Liang
Abstract:
Large Language Models (LLMs) are widely integrated into interactive systems such as dialogue agents and task-oriented assistants. This growing ecosystem also raises supply-chain risks, where adversaries can distribute poisoned models that degrade downstream reliability and user trust. Existing backdoor attacks and defenses are largely prompt-centric, focusing on user-visible triggers while overlooking structural signals in multi-turn conversations. We propose Turn-based Structural Trigger (TST), a backdoor attack that activates from dialogue structure, using the turn index as the trigger and remaining independent of user inputs. Across four widely used open-source LLM models, TST achieves an average attack success rate (ASR) of 99.52% with minimal utility degradation, and remains effective under five representative defenses with an average ASR of 98.04%. The attack also generalizes well across instruction datasets, maintaining an average ASR of 99.19%. Our results suggest that dialogue structure constitutes an important and under-studied attack surface for multi-turn LLM systems, motivating structure-aware auditing and mitigation in practice.
Authors:Richard Hohensinner, Belgin Mutlu, Inti Gabriel Mendoza Estrada, Matej Vukovic, Simone Kopeinik, Roman Kern
Abstract:
Large language models (LLMs) are deployed at scale, yet their training data life cycle remains opaque. This survey synthesizes research from the past ten years on three tightly coupled axes: (1) data provenance, (2) transparency, and (3) traceability, and three supporting pillars: (4) bias \& uncertainty, (5) data privacy, and (6) tools and techniques that operationalize them. A central contribution is a proposed taxonomy defining the field's domains and listing corresponding artifacts. Through analysis of 95 publications, this work identifies key methodologies concerning data generation, watermarking, bias measurement, data curation, data privacy, and the inherent trade-off between transparency and opacity.
Authors:Ke Xie, Xingyi Zhao, Yiwen Hu, Shuhan Yuan, Tian Xie
Abstract:
Cellular networks are critical infrastructure supporting billions of worldwide users and safety- and mission-critical services. Vulnerabilities in cellular networks can therefore cause service disruption, privacy breaches, and broad societal harm, motivating growing efforts to analyze 3GPP specifications that define required device and operator behavior. While large language models (LLMs) have demonstrated the capability for reading technical documents, cellular specifications impose unique challenges: faithful interpretation of normative language, reasoning across cross-referenced clauses, and verifiable conclusions grounded in multimodal evidence such as tables and figures. To address these challenges, we propose CellSpecSec-ARI, a unified Adapt-Retrieve-Integrate framework for systematic understanding and standard-driven security analysis of 3GPP specifications; CellularSpecSec-Bench, a staged benchmark, containing newly constructed high-quality datasets with expert-verified and corrected subsets from prior open-source resources. Together, they establish an accessible and reproducible foundation for quantifying progress in specification understanding and security reasoning in the cellular network security domain.
Authors:Shuiyin Liu, Amin Sakzad
Abstract:
We propose a maximum toroidal distance (MTD) code for lattice-based public-key encryption (PKE). By formulating the encryption encoding problem as the selection of $2^\ell$ points in the discrete $\ell$-dimensional torus $\mathbb{Z}_q^\ell$, the proposed construction maximizes the minimum $L_2$-norm toroidal distance to reduce the decryption failure rate (DFR) in post-quantum schemes such as the NIST ML-KEM (Crystals-Kyber). For $\ell = 2$, we show that the MTD code is essentially a variant of the Minal code recently introduced at IACR CHES 2025. For $\ell = 4$, we present a construction based on the $D_4$ lattice that achieves the largest known toroidal distance, while for $\ell = 8$, the MTD code corresponds to $2E_8$ lattice points in $\mathbb{Z}_4^8$. Numerical evaluations under the Kyber setting show that the proposed codes outperform both Minal and maximum Lee-distance ($L_1$-norm) codes in DFR for $\ell > 2$, while matching Minal code performance for $\ell = 2$.
Authors:Ruiqi Li, Zhiqiang Wang, Yunhao Yao, Xiang-Yang Li
Abstract:
To standardize interactions between LLM-based agents and their environments, the Model Context Protocol (MCP) was proposed and has since been widely adopted. However, integrating external tools expands the attack surface, exposing agents to tool poisoning attacks. In such attacks, malicious instructions embedded in tool metadata are injected into the agent context during MCP registration phase, thereby manipulating agent behavior. Prior work primarily focuses on explicit tool poisoning or relied on manually crafted poisoned tools. In contrast, we focus on a particularly stealthy variant: implicit tool poisoning, where the poisoned tool itself remains uninvoked. Instead, the instructions embedded in the tool metadata induce the agent to invoke a legitimate but high-privilege tool to perform malicious operations. We propose MCP-ITP, the first automated and adaptive framework for implicit tool poisoning within the MCP ecosystem. MCP-ITP formulates poisoned tool generation as a black-box optimization problem and employs an iterative optimization strategy that leverages feedback from both an evaluation LLM and a detection LLM to maximize Attack Success Rate (ASR) while evading current detection mechanisms. Experimental results on the MCPTox dataset across 12 LLM agents demonstrate that MCP-ITP consistently outperforms the manually crafted baseline, achieving up to 84.2% ASR while suppressing the Malicious Tool Detection Rate (MDR) to as low as 0.3%.
Authors:Zhiqiang Wang, Yizhong Ding, Zilong Xiao, Jinyu Lu, Yan Jia, Yanjun Li
Abstract:
PHP's dominance in web development is undermined by security challenges: static analysis lacks semantic depth, causing high false positives; dynamic analysis is computationally expensive; and automated vulnerability localization suffers from coarse granularity and imprecise context. Additionally, the absence of large-scale PHP vulnerability datasets and fragmented toolchains hinder real-world deployment. We present AutoVulnPHP, an end-to-end framework coupling two-stage vulnerability detection with fine-grained automated localization. SIFT-VulMiner (Structural Inference for Flaw Triage Vulnerability Miner) generates vulnerability hypotheses using AST structures enhanced with data flow. SAFE-VulMiner (Semantic Analysis for Flaw Evaluation Vulnerability Miner) verifies candidates through pretrained code encoder embeddings, eliminating false positives. ISAL (Incremental Sequence Analysis for Localization) pinpoints root causes via syntax-guided tracing, chain-of-thought LLM inference, and causal consistency checks to ensure precision. We contribute PHPVD, the first large-scale PHP vulnerability dataset with 26,614 files (5.2M LOC) across seven vulnerability types. On public benchmarks and PHPVD, AutoVulnPHP achieves 99.7% detection accuracy, 99.5% F1 score, and 81.0% localization rate. Deployed on real-world repositories, it discovered 429 previously unknown vulnerabilities, 351 assigned CVE identifiers, validating its practical effectiveness.
Authors:Rémi Van Boxem, Tom Barbette, Cristel Pelsser, Ramin Sadre
Abstract:
Automated bots now account for roughly half of all web requests, and an increasing number deliberately spoof their identity to either evade detection or to not respect robots.txt. Existing countermeasures are either resource-intensive (JavaScript challenges, CAPTCHAs), cost-prohibitive (commercial solutions), or degrade the user experience. This paper proposes a lightweight, passive approach to bot detection that combines user-agent string analysis with favicon-based heuristics, operating entirely on standard web server logs with no client-side interaction. We evaluate the method on over 4.6 million requests containing 54,945 unique user-agent strings collected from website hosted all around the earth. Our approach detects 67.7% of bot traffic while maintaining a false-positive rate of 3%, outperforming state of the art (less than 20%). This method can serve as a first line of defence, routing only genuinely ambiguous requests to active challenges and preserving the experience of legitimate users.
Authors:Robert Aufschläger, Jakob Folz, Gautam Savaliya, Manjitha D Vidanalage, Michael Heigl, Martin Schramm
Abstract:
Street-level imagery contains personally identifiable information (PII), some of which is context-dependent. Existing anonymization methods either over-process images or miss subtle identifiers, while API-based solutions compromise data sovereignty. We present an agentic framework CAIAMAR (\underline{C}ontext-\underline{A}ware \underline{I}mage \underline{A}nonymization with \underline{M}ulti-\underline{A}gent \underline{R}easoning) for context-aware PII segmentation with diffusion-based anonymization, combining pre-defined processing for high-confidence cases with multi-agent reasoning for indirect identifiers. Three specialized agents coordinate via round-robin speaker selection in a Plan-Do-Check-Act (PDCA) cycle, enabling large vision-language models to classify PII based on spatial context (private vs. public property) rather than rigid category rules. The agents implement spatially-filtered coarse-to-fine detection where a scout-and-zoom strategy identifies candidates, open-vocabulary segmentation processes localized crops, and $IoU$-based deduplication ($30\%$ threshold) prevents redundant processing. Modal-specific diffusion guidance with appearance decorrelation substantially reduces re-identification (Re-ID) risks. On CUHK03-NP, our method reduces person Re-ID risk by $73\%$ ($R1$: $16.9\%$ vs. $62.4\%$ baseline). For image quality preservation on CityScapes, we achieve KID: $0.001$, and FID: $9.1$, significantly outperforming existing anonymization. The agentic workflow detects non-direct PII instances across object categories, and downstream semantic segmentation is preserved. Operating entirely on-premise with open-source models, the framework generates human-interpretable audit trails supporting EU's GDPR transparency requirements while flagging failed cases for human review.
Authors:Surada Suwansathit, Yuxuan Zhang, Guofei Gu
Abstract:
AI agent frameworks connecting large language model (LLM) reasoning to host execution surfaces--shell, filesystem, containers, and messaging--introduce security challenges structurally distinct from conventional software. We present a systematic taxonomy of 190 advisories filed against OpenClaw, an open-source AI agent runtime, organized by architectural layer and trust-violation type. Vulnerabilities cluster along two orthogonal axes: (1) the system axis, reflecting the architectural layer (exec policy, gateway, channel, sandbox, browser, plugin, agent/prompt); and (2) the attack axis, reflecting adversarial techniques (identity spoofing, policy bypass, cross-layer composition, prompt injection, supply-chain escalation). Patch-differential evidence yields three principal findings. First, three Moderate- or High-severity advisories in the Gateway and Node-Host subsystems compose into a complete unauthenticated remote code execution (RCE) path--spanning delivery, exploitation, and command-and-control--from an LLM tool call to the host process. Second, the exec allowlist, the primary command-filtering mechanism, relies on a closed-world assumption that command identity is recoverable via lexical parsing. This is invalidated by shell line continuation, busybox multiplexing, and GNU option abbreviation. Third, a malicious skill distributed via the plugin channel executed a two-stage dropper within the LLM context, bypassing the exec pipeline and demonstrating that the skill distribution surface lacks runtime policy enforcement. The dominant structural weakness is per-layer trust enforcement rather than unified policy boundaries, making cross-layer attacks resilient to local remediation.
Authors:Mohammed Elnawawy, Gargi Mitra, Shahrear Iqbal, Karthik Pattabiraman
Abstract:
Safety-critical domains like healthcare rely on deep neural networks (DNNs) for prediction, yet DNNs remain vulnerable to evasion attacks. Anomaly detectors (ADs) are widely used to protect DNNs, but conventional ADs are trained indiscriminately on benign data from all patients, overlooking physiological differences that introduce noise, degrade robustness, and reduce recall. In this paper, we propose ROAST, a novel risk-aware outlier exposure selective training framework that improves AD recall without sacrificing precision. ROAST identifies patients who are less vulnerable to attack and focuses training on these cleaner, more reliable data, thereby reducing false negatives and improving recall. To preserve precision, the framework applies outlier exposure by injecting adversarial samples into the training set of the less vulnerable patients, avoiding noisy data from others. Experiments show that ROAST increases recall by 16.2\% while reducing the training time by 88.3\% on average compared to indiscriminate training, with minimal impact on precision.
Authors:Islam Debicha, Tayeb Kenaza, Ishak Charfi, Salah Mosbah, Mehdi Sehaki, Jean-Michel Dricot
Abstract:
The integration of machine learning (ML) algorithms into Internet of Things (IoT) applications has introduced significant advantages alongside vulnerabilities to adversarial attacks, especially within IoT-based intrusion detection systems (IDS). While theoretical adversarial attacks have been extensively studied, practical implementation constraints have often been overlooked. This research addresses this gap by evaluating the feasibility of evasion attacks on IoT network-based IDSs, employing a novel black-box adversarial attack. Our study aims to bridge theoretical vulnerabilities with real-world applicability, enhancing understanding and defense against sophisticated threats in modern IoT ecosystems. Additionally, we propose a defense scheme tailored to mitigate the impact of evasion attacks, thereby reinforcing the resilience of ML-based IDSs. Our findings demonstrate successful evasion attacks against IDSs, underscoring their susceptibility to advanced techniques. In contrast, we proposed a defense mechanism that exhibits robust performance by effectively detecting the majority of adversarial traffic, showcasing promising outcomes compared to current state-of-the-art defenses. By addressing these critical cybersecurity challenges, our research contributes to advancing IoT security and provides insights for developing more resilient IDS.
Authors:Song Son Ha, Kunal Singh, Florian Foerster, Henry Beuster, Tim Kittel, Dominik Merli, Gerd Scholl
Abstract:
Industrial deployments increasingly rely on Open Platform Communications Unified Architecture (OPC UA) as a secure and platform-independent communication protocol, while private Fifth Generation (5G) networks provide low-latency and high-reliability connectivity for modern automation systems. However, their combination introduces new attack surfaces and traffic characteristics that remain insufficiently understood, particularly with respect to machine learning-based intrusion detection systems (ML-based IDS). This paper presents an experimental study on detecting cyberattacks against OPC UA applications operating over an operational private 5G network. Multiple attack scenarios are executed, and OPC UA traffic is captured and enriched with statistical flow-, packet-, and protocol-aware features. Several supervised ML models are trained and evaluated to distinguish benign and malicious traffic. The results demonstrate that the proposed ML-based IDS achieves high detection performance for a representative set of OPC UA-specific attack scenarios over an operational private 5G network.
Authors:Moritz Gstür, Gustav Keppler, Mohammed Ramadan, Ghada Elbez, Veit Hagenmeyer
Abstract:
Critical energy infrastructures increasingly rely on information and communication technology for monitoring and control, which leads to new challenges with regard to cybersecurity. Recent advancements in this domain, including attribute-based access control (ABAC), have not been sufficiently addressed by established standards such as IEC 61850 and IEC 62351. To address this issue, we propose a novel real-time server-aided attribute-based authorization and access control for time-critical applications called RTS-ABAC. We tailor RTS-ABAC to the strict timing constraints inherent to the protocols employed in substation automation systems (SAS). We extend the concept of conventional ABAC by introducing real-time attributes and time-dependent policy evaluation and enforcement. To safeguard the authenticity, integrity, and non-repudiation of SAS communication and protect an SAS against domain-typical adversarial attacks, RTS-ABAC employs mandatory authentication, authorization, and access control for any type of SAS communication using a bump-in-the-wire (BITW) approach. To evaluate RTS-ABAC, we conduct a testbed-based performance analysis and a laboratory-based demonstration of applicability. We demonstrate the applicability using intelligent electronic devices, merging units, and I/O boxes communicating via the GOOSE and SV protocol. The results show that RTS-ABAC is able to secure low-latency communication between SAS devices, as up to 99.82 % of exchanged packets achieve a round-trip time below 6 ms. Moreover, the results of the evaluation indicate that RTS-ABAC is a viable solution to enhance the cybersecurity not only in a newly constructed SAS but also via retrofitting of existing substations.
Authors:Zelin Wan, Jin-Hee Cho, Mu Zhu, Ahmed H. Anwar, Charles Kamhoua, Munindar P. Singh
Abstract:
Unmanned Aerial Vehicles (UAVs) are valuable for mission-critical systems like surveillance, rescue, or delivery. Not surprisingly, such systems attract cyberattacks, including Denial-of-Service (DoS) attacks to overwhelm the resources of mission drones (MDs). How can we defend UAV mission systems against DoS attacks? We adopt cyber deception as a defense strategy, in which honey drones (HDs) are proposed to bait and divert attacks. The attack and deceptive defense hinge upon radio signal strength: The attacker selects victim MDs based on their signals, and HDs attract the attacker from afar by emitting stronger signals, despite this reducing battery life. We formulate an optimization problem for the attacker and defender to identify their respective strategies for maximizing mission performance while minimizing energy consumption. To address this problem, we propose a novel approach, called HT-DRL. HT-DRL identifies optimal solutions without a long learning convergence time by taking the solutions of hypergame theory into the neural network of deep reinforcement learning. This achieves a systematic way to intelligently deceive attackers. We analyze the performance of diverse defense mechanisms under different attack strategies. Further, the HT-DRL-based HD approach outperforms existing non-HD counterparts up to two times better in mission performance while incurring low energy consumption.
Authors:Jiaxuan Cai, Xinmiao Zhang
Abstract:
Hamming Quasi-Cyclic (HQC) was chosen for the latest post-quantum cryptography standardization. A concatenated Reed-Muller (RM) and Reed-Solomon (RS) code is decoded during the HQC decryption. Soft-decision RS decoders achieve better error-correcting performance than hard-decision decoders and accordingly shorten the required codeword and key lengths. However, the only soft-decision decoder for HQC in prior works is an erasure-only decoder, which has limited coding gain. This paper analyzes other hardware-friendly soft-decision RS decoders and discovers that the generalized minimum-distance (GMD) decoder can better utilize the soft information available in HQC. Extending the Agrawal-Vardy bound for the scenario of HQC, it was found that the RS codeword length for HQC-128 can be reduced from 46 to 36. This paper also proposes efficient GMD decoder hardware architectures optimized for the short and low-rate RS codes used in HQC. The HQC-128 decryption utilizing the proposed GMD decoder achieves 20% and 15% reductions on the latency and area, respectively, compared to the decryption with hard-decision decoders.
Authors:Jonathan Z. Lu, Alexander Poremba, Yihui Quek, Akshar Ramkumar
Abstract:
Post-quantum cryptography currently rests on a small number of hardness assumptions, posing significant risks should any one of them be compromised. This vulnerability motivates the search for new and cryptographically versatile assumptions that make a convincing case for quantum hardness. In this work, we argue that decoding random quantum stabilizer codes -- a quantum analog of the well-studied LPN problem -- is an excellent candidate. This task occupies a unique middle ground: it is inherently native to quantum computation, yet admits an equivalent formulation with purely classical input and output, as recently shown by Khesin et al. (STOC '26). We prove that the average-case hardness of quantum stabilizer decoding implies the core primitives of classical Cryptomania, including public-key encryption (PKE) and oblivious transfer (OT), as well as one-way functions. Our constructions are moreover practical: our PKE scheme achieves essentially the same efficiency as state-of-the-art LPN-based PKE, and our OT is round-optimal. We also provide substantial evidence that stabilizer decoding does not reduce to LPN, suggesting that the former problem constitutes a genuinely new post-quantum assumption. Our primary technical contributions are twofold. First, we give a reduction from random quantum stabilizer decoding to an average-case problem closely resembling LPN, but which is equipped with additional symplectic algebraic structure. While this structure is essential to the quantum nature of the problem, it raises significant barriers to cryptographic security reductions. Second, we develop a new suit of scrambling techniques for such structured linear spaces, and use them to produce rigorous security proofs for all of our constructions.
Authors:Kasra Ahmadi, Saeed Aghapour, Mehran Mozaffari Kermani, Reza Azarderakhsh
Abstract:
The inference phase of deep neural networks (DNNs) in embedded systems is increasingly vulnerable to fault attacks and failures, which can result in incorrect predictions. These vulnerabilities can potentially lead to catastrophic consequences, making the development of effective mitigation techniques essential. In this paper, we introduce MAED (Mathematical Activation Error Detection), an algorithm-level error detection framework that exploits mathematical identities to continuously validate the correctness of non-linear activation function computations at runtime. To the best of our knowledge, this work is the first to integrate algorithm-level error detection techniques to defend against both malicious fault injection attacks and naturally occurring faults in critical DNN components in embedded systems. The evaluation is conducted on three widely adopted activation functions, namely ReLu, sigmoid, and tanh which serve as fundamental building blocks for introducing non-linearity in DNNs and can lead to mispredictions when subjected to natural faults or fault attacks. We assessed the proposed error detection scheme via fault model simulation, achieving close to 100% error detection while mitigating existing fault attacks on DNN inference. Additionally, the overhead introduced by integrating the proposed scheme with the baseline implementation (i.e., without error detection) is validated through implementations on an AMD/Xilinx Artix-7 FPGA and an ATmega328P microcontroller, as well as through integration with TensorFlow. On the microcontroller, the proposed error detection incurs less than 1% clock cycle overhead, while on the FPGA it requires nearly zero additional area, at the cost of approximately a 20% increase in latency for sigmoid and tanh.
Authors:Annika Wilde, Samira Briongos, Claudio Soriente, Ghassan Karame
Abstract:
RISC-V-based Trusted Execution Environments (TEEs) are gaining traction in the automotive and IoT sectors as a foundation for protecting sensitive computations. However, the supporting infrastructure around these TEEs remains immature. In particular, mechanisms for secure enclave updates and migrations - essential for complete enclave lifecycle management - are largely absent from the evolving RISC-V ecosystem. In this paper, we address this limitation by introducing a novel toolkit that enables RISC-V TEEs to support critical aspects of the software development lifecycle. Our toolkit provides broad compatibility with existing and emerging RISC-V TEE implementations (e.g., Keystone and CURE), which are particularly promising for integration in the automotive industry. It extends the Security Monitor (SM) - the trusted firmware layer of RISC-V TEEs - with three modular extensions that enable secure enclave update, secure migration, state continuity, and trusted time. Our implementation demonstrates that the toolkit requires only minimal interface adaptation to accommodate TEE-specific naming conventions. Our evaluation results confirm that our proposal introduces negligible performance overhead: our state continuity solution incurs less than 1.5% overhead, and enclave downtime remains as low as 0.8% for realistic applications with a 1 KB state, which conforms with the requirements of most IoT and automotive applications.
Authors:Tobias Cloosters, Pascal Winkler, Jens-Rene Giesen, Ghassan Karame, Lucas Davi
Abstract:
Solana is rapidly gaining traction among smart contract developers and users. However, its growing adoption has been accompanied by a series of major security incidents, which have spurred research into automated analysis techniques for Solana smart contracts. Unfortunately, existing approaches do not address the unique and complex account model of Solana. In this paper, we propose SseRex, the first symbolic execution vulnerability detection approach for finding Solana-specific bugs such as missing owner checks, missing signer checks, and missing key checks, as well as arbitrary cross-program invocations. Our evaluation of 8,714 bytecode-only contracts shows that our approach outperforms existing approaches and identifies potential bugs in 467 different contracts. Additionally, we analyzed 120 open-source Solana projects and conducted in-depth case studies on four of them. Our findings reveal that subtle, easily overlooked issues often serve as the root cause of severe exploits, further highlighting the need for specialized analysis tools like SseRex.
Authors:Maël Jenny, Jérémie Dentan, Sonia Vanier, Michaël Krajecki
Abstract:
Most jailbreak techniques for Large Language Models (LLMs) primarily rely on prompt modifications, including paraphrasing, obfuscation, or conversational strategies. Meanwhile, abliteration techniques (also known as targeted ablations of internal components) have been used to study and explain LLM outputs by probing which internal structures causally support particular responses. In this work, we combine these two lines of research by directly manipulating the model's internal activations to alter its generation trajectory without changing the prompt. Our method constructs a nearby benign prompt and performs layer-wise activation substitutions using a sequential procedure. We show that this activation surgery method reveals where and how refusal arises, and prevents refusal signals from propagating across layers, thereby inhibiting the model's safety mechanisms. Finally, we discuss the security implications for open-weights models and instrumented inference environments.
Authors:Therdpong Daengsi, Phisit Pornpongtechavanich, Paradorn Boonpoor, Kathawut Wattanachukul, Korn Puangnak, Kritphon Phanrattanachai, Pongpisit Wuttidittachotti, Paramate Horkaew
Abstract:
This study provides a comprehensive synthesis of Artificial Intelligence (AI), especially Machine Learning (ML) and Deep Learning (DL), in ransomware defense. Using a "review of reviews" methodology based on PRISMA, this paper gathers insights on how AI is transforming ransomware detection, prevention, and mitigation strategies during the past five years (2020-2024). The findings highlight the effectiveness of hybrid models that combine multiple analysis techniques such as code inspection (static analysis) and behavior monitoring during execution (dynamic analysis). The study also explores anomaly detection and early warning mechanisms before encryption to address the increasing complexity of ransomware. In addition, it examines key challenges in ransomware defense, including techniques designed to deceive AI-driven detection systems and the lack of strong and diverse datasets. The results highlight the role of AI in early detection and real-time response systems, improving scalability and resilience. Using a systematic review-of-reviews approach, this study consolidates insights from multiple review articles, identifies effective AI models, and bridges theory with practice to support collaboration among academia, industry, and policymakers. Future research directions and practical recommendations for cybersecurity practitioners are also discussed. Finally, this paper proposes a roadmap for advancing AI-driven countermeasures to protect critical systems and infrastructures against evolving ransomware threats.
Authors:Xingli Fang, Jung-Eun Kim
Abstract:
Prior approaches for membership privacy preservation usually update or retrain all weights in neural networks, which is costly and can lead to unnecessary utility loss or even more serious misalignment in predictions between training data and non-training data. In this work, we observed three insights: i) privacy vulnerability exists in a very small fraction of weights; ii) however, most of those weights also critically impact utility performance; iii) the importance of weights stems from their locations rather than their values. According to these insights, to preserve privacy, we score critical weights, and instead of discarding those neurons, we rewind only the weights for fine-tuning. We show that, through extensive experiments, this mechanism exhibits outperforming resilience in most cases against Membership Inference Attacks while maintaining utility.
Authors:Nasim Soltani, Shayan Nejadshamsi, Zakaria Abou El Houda, Raphael Khoury, Kelton A. P. Costa, Tiago H. Falk, Anderson R. Avila
Abstract:
Adversarial examples can represent a serious threat to machine learning (ML) algorithms. If used to manipulate the behaviour of ML-based Network Intrusion Detection Systems (NIDS), they can jeopardize network security. In this work, we aim to mitigate such risks by increasing the robustness of NIDS towards adversarial attacks. To that end, we explore two adversarial methods for generating malicious network traffic. The first method is based on Generative Adversarial Networks (GAN) and the second one is the Fast Gradient Sign Method (FGSM). The adversarial examples generated by these methods are then used to evaluate a novel multilayer defense mechanism, specifically designed to mitigate the vulnerability of ML-based NIDS. Our solution consists of one layer of stacking classifiers and a second layer based on an autoencoder. If the incoming network data are classified as benign by the first layer, the second layer is activated to ensure that the decision made by the stacking classifier is correct. We also incorporated adversarial training to further improve the robustness of our solution. Experiments on two datasets, namely UNSW-NB15 and NSL-KDD, demonstrate that the proposed approach increases resilience to adversarial attacks.
Authors:Fan Zhang, Daniel Kreuter, Javier Fernandez-Marques, BloodCounts Consortium, Gregory Verghese, Bernard Butler, Nicholas Lane, Suthesh Sivapalaratnam, Joseph Taylor, Norbert C. J. de Wit, Nicholas S. Gleadall, Carola-Bibiane Schönlieb, Michael Roberts
Abstract:
Collaborative healthcare research across multiple institutions increasingly requires diverse clinical datasets, but cross-border data sharing is strictly constrained by privacy regulations. Federated learning (FL) enables model training while keeping data local; however, many existing frameworks remain proof-of-concept and do not adequately address governance risks such as unauthorised participation, misuse, and lack of accountability. In particular, enforceable mechanisms for authentication, authorisation, and accounting (AAA) are often missing, limiting real-world clinical deployment. This paper presents FLA$^3$ (Federated Learning with Authentication, Authorisation, and Accounting), a governance-aware federated learning platform that operationalises regulatory obligations through runtime policy enforcement. FLA$^3$ integrates eXtensible Access Control Markup Language (XACML) compliant attribute-based access control (ABAC), cryptographic accounting, and study-scoped federation directly into the federated learning orchestration layer to enforce institutional sovereignty and protocol adherence. We evaluate FLA$^3$ through two complementary studies. First, we demonstrate operational feasibility by deploying the platform infrastructure across five BloodCounts! Consortium institutions in four countries: United Kingdom, Netherlands, India, and The Gambia. Second, we assess clinical utility using simulated federation of full blood count (FBC) data from 54,446 samples from 35,315 subjects across 25 centres in the INTERVAL study. Results show that FLA$^3$ achieves predictive performance comparable to centralised training while strictly enforcing governance constraints. These results show that enforceable governance can function as a first-class privacy-preserving control, improving trustworthiness for scalable artificial intelligence (AI) in cross-jurisdictional healthcare deployments.
Authors:Qiyu Li, Yuhe Tian, Haojian Jin
Abstract:
Most OAuth service providers, such as Google and Microsoft, offer only a limited range of coarse-grained data access. As a result, third-party OAuth applications often end up accessing more user data than necessary, even if their developers want to minimize data access. We present OAuthHub, a development framework that leverages users' personal devices as the intermediary controller for OAuth-based data sharing between cloud services. The key innovations of OAuthHub are: (1) the insight that discretionary data access is largely unnecessary for most OAuth apps, which typically only require access at three well-defined moments-during installation, in response to user actions, and at scheduled intervals; (2) a development framework that requires explicit declarations of intended data access and supports the three common access patterns through intermittently available personal devices; and (3) a centralized runtime permission model for managing OAuth access across providers. We evaluated OAuthHub with three real-world apps on both PCs and mobile phones and found that OAuthHub requires moderate changes to the application code and imposes insignificant performance overheads. Our study with 18 developers showed that participants completed programming tasks significantly faster (9.1 vs. 18.0 minutes) with less code (4.7 vs. 15.8 lines) using OAuthHub than conventional OAuth APIs.
Authors:Bochra Al Agha, Razane Tajeddine
Abstract:
Benchmarking presence-only passive reconnaissance in smart-grid communications is challenging because the adversary is receive-only, yet nearby observers can still alter propagation through additional shadowing and multipath that reshapes channel coherence. Public smart-grid cybersecurity datasets largely target active protocol- or measurement-layer attacks and rarely provide propagation-driven observables with tiered topology context, which limits reproducible evaluation under strictly passive threat models. This paper introduces an IEEE-inspired, literature-anchored benchmark dataset generator for passive reconnaissance over a tiered Home Area Network (HAN), Neighborhood Area Network (NAN), and Wide Area Network (WAN) communication graph with heterogeneous wireless and wireline links. Node-level time series are produced through a physically consistent channel-to-metrics mapping where channel state information (CSI) is represented via measurement-realistic amplitude and phase proxies that drive inferred signal-to-noise ratio (SNR), packet error behavior, and delay dynamics. Passive attacks are modeled only as windowed excess attenuation and coherence degradation with increased channel innovation, so reliability and latency deviations emerge through the same causal mapping without labels or feature shortcuts. The release provides split-independent realizations with burn-in removal, strictly causal temporal descriptors, adjacency-weighted neighbor aggregates and deviation features, and federated-ready per-node train, validation, and test partitions with train-only normalization metadata. Baseline federated experiments highlight technology-dependent detectability and enable standardized benchmarking of graph-temporal and federated detectors for passive reconnaissance.
Authors:Fabian Wiesner, Anna Pappa
Abstract:
Delegated quantum computation enables a client with limited quantum capabilities to outsource computations to a more powerful quantum server while preserving correctness and privacy. Verification is crucial in this setting to ensure that the untrusted quantum server performs the computation honestly and returns correct results. A common verification method is the quantum cut-and-choose technique. Inspired by classical verification methods for two-party computation, the client uses the majority of the delegated rounds to test the server's honesty, while keeping the remaining ones for the actual computation. Combining this technique with other methods, such as quantum error correction, could help achieve negligible cheating probabilities for the server; however, such methods can impose significant overheads making implementations unfeasible for the near-term future. In this work, we investigate whether cut-and-choose can yield efficient and secure verifiable quantum computation without additional costly techniques. We find that verifiable delegated quantum computation protocols relying solely on cut-and-choose techniques cannot be secure and efficient at the same time.
Authors:Weitong Li, Yuze Li, Taejoong Chung
Abstract:
The Resource Public Key Infrastructure (RPKI) secures Internet routing by binding IP prefixes to authorized Autonomous Systems, yet its RSA foundations are vulnerable to quantum adversaries. A naive swap to post-quantum (PQ) signatures (eg Falcon) is a poor fit for RPKI's bulk model: every relying party (RP) repeatedly fetches and validates the entire global repository, so larger keys and signatures inflate bandwidth and CPU cost, especially during a long dual-stack transition. We present pqRPKI , a post-quantum RPKI framework that pairs a multi-layer Merkle Tree Ladder (MTL) with RPKI objects, customized to relocate per-object verification material from certificates into the Manifest. To update RPKI for Merkle tree based schemes, pqRPKI redesign the RPKI manifest and delegation chain, introduces a ladder-guided sync and bulk-verification workflow that lets validators localize diffs top-down and rebuild trees bottom-up. pqRPKI also preserves current RPKI objects and encodings, supports both hosted and delegated operation, and provides an additive migration path that coexists with today's trust anchors for dual-stack deployment with little size overhead. Implemented as a working publication point (PP) and RPs, we show that pqRPKI reduces repository footprint to 546.8 MB on average (65.5%/83.1% smaller than Falcon/ML-DSA), cuts full-cycle validation to 102.7 s, and achieves 118.3 s end-to-end PP to Router time, enabling sub-2-minute operating cadences with full-repository validation each cycle. Dual-stack deployment with RSA only adds just 3.4% size overhead versus today's RPKI repositories.
Authors:Yanbang Sun, Quan Luo, Yuelin Wang, Qian Chen, Benjin Liu, Ruiqi Chen, Qing Huang, Xiaohong Li, Junjie Wang
Abstract:
Network protocols are the foundation of modern communication, yet their implementations often contain semantic vulnerabilities stemming from inadequate understanding of specification semantics. Existing gray-box and black-box testing approaches lack semantic modeling of protocols, making it difficult to precisely express testing intent and cover boundary conditions. Moreover, they typically rely on coarse-grained oracles such as crashes, which are inadequate for identifying deep semantic vulnerabilities. To address these limitations, we present a semantics-aware fuzzing framework, SemFuzz. The framework leverages large language models to extract structured semantic rules from RFC documents and generates test cases that intentionally violate these rules to encode specific testing intents. It then detects deep semantic vulnerabilities by comparing the observed responses with the expected ones. Evaluation on seven widely deployed protocol implementations shows that SemFuzz identified sixteen potential vulnerabilities, ten of which have been confirmed. Among the confirmed vulnerabilities, five were previously unknown and four have been assigned CVEs. These results demonstrate the effectiveness of SemFuzz in detecting semantic vulnerabilities.
Authors:Yuki Yamada, Daisuke Kotani, Kota Tsubouchi, Hidehito Gomi, Yasuo Okabe
Abstract:
We propose ISS-RegAuth, a lightweight indoor space authentication framework that authenticates a user by comparing LiDAR captures of personal rooms. Prior work processes every point in the cloud, where planar surfaces such as walls and floors dominate similarity calculations, causing latency and potential privacy exposure. In contrast, ISS-RegAuth retains only 1-2\% of Intrinsic Shape Signatures (ISS) keypoints, computes their Fast Point Feature Histograms, and performs RANSAC and ICP on this sparse set. On 100 ARKitScenes pairs, this approach reduces the equal-error rate from 0.02 to 0.00, cuts processing time by 20\%, and lowers transmitted data to 2.2\% of the original. These results show that keypoint-based sparse representation can make privacy-preserving, edge-deployable indoor space authentication practical. As an early step, this work opens a path toward device-independent authentication and account-recovery mechanisms that rely on users' physical environments.
Authors:Xiaoguang Li, Hanyi Wang, Yaowei Huang, Jungang Yang, Qingqing Ye, Haonan Yan, Ke Pan, Zhe Sun, Hui Li
Abstract:
Shuffler-based differential privacy (shuffle-DP) is a privacy paradigm providing high utility by involving a shuffler to permute noisy report from users. Existing shuffle-DP protocols mainly focus on the design of shuffler-based categorical frequency oracle (SCFO) for frequency estimation on categorical data. However, numerical data is a more prevalent type and many real-world applications depend on the estimation of data distribution with ordinal nature. In this paper, we study the distribution estimation under pure shuffle model, which is a prevalent shuffle-DP framework without strong security assumptions. We initially attempt to transplant existing SCFOs and the naïve distribution recovery technique to this task, and demonstrate that these baseline protocols cannot simultaneously achieve outstanding performance in three metrics: 1) utility, 2) message complexity; and 3) robustness to data poisoning attacks. Therefore, we further propose a novel single-message \textit{adaptive shuffler-based piecewise} (ASP) protocol with high utility and robustness. In ASP, we first develop a randomizer by parameter optimization using our proposed tighter bound of mutual information. We also design an \textit{Expectation Maximization with Adaptive Smoothing} (EMAS) algorithm to accurately recover distribution with enhanced robustness. To quantify robustness, we propose a new evaluation framework to examine robustness under different attack targets, enabling us to comprehensively understand the protocol resilience under various adversarial scenarios. Extensive experiments demonstrate that ASP outperforms baseline protocols in all three metrics. Especially under small $ε$ values, ASP achieves an order of magnitude improvement in utility with minimal message complexity, and exhibits over threefold robustness compared to baseline methods.
Authors:Yeoh Wei Zhu, Naresh Goud Boddu, Yao Ma, Shaltiel Eloul, Giulio Golinelli, Yash Satsangi, Rob Otter, Kaushik Chakraborty
Abstract:
Traditional financial institutions face inefficiencies that can be addressed by distributed ledger technology. However, a primary barrier to adoption is the privacy concerns surrounding publicly available transaction data. Existing private protocols for distributed ledger that focus on the Ring-CT model are not suitable for adoption for financial institutions. We propose a post-quantum, lattice-based transaction scheme for encrypted ledgers which better aligns with institutions' requirements for confidentiality and audit-ability. The construction leverages various zero-knowledge proof techniques, and introduces a new method for equating two commitment messages, without the capability to open one of the commitment during the re-commitment. Subsequently, we build a publicly verifiable transaction scheme that is efficient for single or multi-assets, by introducing a new compact range-proof. We then provide a security analysis of it. The techniques used and the proofs constructed could be of independent interest.
Authors:Yang Gao, Gang Quan, Wujie Wen, Scott Piersall, Qian Lou, Liqiang Wang
Abstract:
Sparse matrix-vector multiplication (SpMV) is a fundamental operation in scientific computing, data analysis, and machine learning. When the data being processed are sensitive, preserving privacy becomes critical, and homomorphic encryption (HE) has emerged as a leading approach for addressing this challenge. Although HE enables privacy-preserving computation, its application to SpMV has remained largely unaddressed. To the best of our knowledge, this paper presents the first framework that efficiently integrates HE with SpMV, addressing the dual challenges of computational efficiency and data privacy. In particular, we introduce a novel compressed matrix format, named Compressed Sparse Sorted Column (CSSC), which is specifically designed to optimize encrypted sparse matrix computations. By preserving sparsity and enabling efficient ciphertext packing, CSSC significantly reduces storage and computational overhead. Our experimental results on real-world datasets demonstrate that the proposed method achieves significant gains in both processing time and memory usage. This study advances privacy-preserving SpMV and lays the groundwork for secure applications in federated learning, encrypted databases, scientific computing, and beyond.
Authors:Alessandro Sanna, Waldo Verstraete, Leonardo Regano, Davide Maiorca, Bjorn De Sutter
Abstract:
Evidence on the effectiveness of Man-At-The-End (MATE) software protections, such as code obfuscation, has mainly come from limited empirical research. Recently, however, an automatable method was proposed to obtain statistical models of the required effort to attack (protected) software. The proposed method was sketched for a number of attack strategies but not instantiated, evaluated, or validated for those that require human interaction with the attacked software. In this paper, we present a full instantiation of the method to obtain statistical effort models for game resource localisation attacks, which represent a major step towards creating game cheats, a prime example of MATE attacks. We discuss in detail all relevant aspects of our instantiation and the results obtained for two game use cases. Our results confirm the feasibility of the proposed method and its utility for decision support for users of software protection tools. These results open up a new avenue for obtaining models of the impact of software protections on reverse engineering attacks, which will scale much better than empirical research involving human participants.
Authors:Chen Sun, Yash Vekaria, Rishab Nithyanand
Abstract:
As LLM-driven agents begin to autonomously navigate the web, their ability to interpret and respond to manipulative interface design becomes critical. A fundamental question that emerges is: can such agents reliably recognize patterns of friction, misdirection, and coercion in interface design (i.e., dark patterns)? We study this question in a setting where the workflows are consequential: website portals associated with the submission of CCPA-related data rights requests. These portals operationalize statutory rights, but they are implemented as interactive interfaces whose design can be structured to facilitate, burden, or subtly discourage the exercise of those rights. We design and deploy an LLM-driven auditing agent capable of end-to-end traversal of rights-request workflows, structured evidence gathering, and classification of potential dark patterns. Across a set of 456 data broker websites, we evaluate: (1) the ability of the agent to consistently locate and complete request flows, (2) the reliability and reproducibility of its dark pattern classifications, and (3) the conditions under which it fails or produces poor judgments. Our findings characterize both the feasibility and the limitations of using LLM-driven agents for scalable dark pattern auditing.
Authors:Neha Nagaraja, Lan Zhang, Zhilong Wang, Bo Zhang, Pawan Patil
Abstract:
Multimodal Large Language Models (MLLMs) integrate vision and text to power applications, but this integration introduces new vulnerabilities. We study Image-based Prompt Injection (IPI), a black-box attack in which adversarial instructions are embedded into natural images to override model behavior. Our end-to-end IPI pipeline incorporates segmentation-based region selection, adaptive font scaling, and background-aware rendering to conceal prompts from human perception while preserving model interpretability. Using the COCO dataset and GPT-4-turbo, we evaluate 12 adversarial prompt strategies and multiple embedding configurations. The results show that IPI can reliably manipulate the output of the model, with the most effective configuration achieving up to 64\% attack success under stealth constraints. These findings highlight IPI as a practical threat in black-box settings and underscore the need for defenses against multimodal prompt injection.
Authors:Yuhang Li, Yajie Wang, Xiangyun Tang, Peng Jiang, Yu-an Tan, Liehuang Zhu
Abstract:
Secure aggregation is a foundational building block of privacy-preserving learning, yet achieving robustness under adversarial behavior remains challenging. Modern systems increasingly adopt the shuffle model of differential privacy (Shuffle-DP) to locally perturb client updates and globally anonymize them via shuffling for enhanced privacy protection. However, these perturbations and anonymization distort gradient geometry and remove identity linkage, leaving systems vulnerable to adversarial poisoning attacks. Moreover, the shuffler, typically a third party, can be compromised, undermining security against malicious adversaries. To address these challenges, we present Robust Aggregation in Noise (RAIN), a unified framework that reconciles privacy, robustness, and verifiability under Shuffle-DP. At its core, RAIN adopts sign-space aggregation to robustly measure update consistency and limit malicious influence under noise and anonymization. Specifically, we design two novel secret-shared protocols for shuffling and aggregation that operate directly on additive shares and preserve Shuffle-DP's tight privacy guarantee. In each round, the aggregated result is verified to ensure correct aggregation and detect any selective dropping, achieving malicious security with minimal overhead. Extensive experiments across comprehensive benchmarks show that RAIN maintains strong privacy guarantees under Shuffle-DP and remains robust to poisoning attacks with negligible degradation in accuracy and convergence. It further provides real-time integrity verification with complete tampering detection, while achieving up to 90x lower communication cost and 10x faster aggregation compared with prior work.
Authors:Peter Horvath, Ilia Shumailov, Lukasz Chmielewski, Lejla Batina, Yuval Yarom
Abstract:
The multi-million dollar investment required for modern machine learning (ML) has made large ML models a prime target for theft. In response, the field of model stealing has emerged. Attacks based on physical side-channel information have shown that DNN model extraction is feasible, even on CUDA Cores in a GPU. For the first time, our work demonstrates parameter extraction on the specialized GPU's Tensor Core units, most commonly used GPU units nowadays due to their superior performance, via near-field physical side-channel attacks. Previous work targeted only the general-purpose CUDA Cores in the GPU, the functional units that have been part of the GPU since its inception. Our method is tailored to the GPU architecture to accurately estimate energy consumption and derive efficient attacks via Correlation Power Analysis (CPA). Furthermore, we provide an exploratory analysis of hyperparameter and weight leakage from LLMs in far field and demonstrate that the GPU's electromagnetic radiation leaks even 100\,cm away through a glass obstacle.
Authors:Jiayao Wang, Mohammad Maruf Hasan, Yiping Zhang, Xiaoying Lei, Jiale Zhang, Qilin Wu, Junwu Zhu, Dongfang Zhao
Abstract:
Self-Supervised Learning (SSL) has emerged as a significant paradigm in representation learning thanks to its ability to learn without extensive labeled data, its strong generalization capabilities, and its potential for privacy preservation. However, recent research reveals that SSL models are also vulnerable to backdoor attacks. Existing backdoor attack methods in the SSL context commonly suffer from issues such as high detectability of triggers, feature entanglement, and pronounced out-of-distribution properties in poisoned samples, all of which compromises attack effectiveness and stealthiness. To that, we propose a Dynamic Stealthy Backdoor Attack (DSBA) backed by a new technique we term Collaborative Optimization. This method decouples the attack process into two collaborative optimization layers: the outer-layer optimization trains a backdoor encoder responsible for global feature space remodeling, aiming to achieve precise backdoor implantation while preserving core functionality; meanwhile, the inner-layer optimization employs a dynamically optimized generator to adaptively produce optimally concealed triggers for individual samples, achieving coordinated concealment across feature space and visual space. We also introduce multiple loss functions to dynamically balance attack performance and stealthiness, in which we employ an adaptive weight scheduling mechanism to enhance training stability. Extensive experiments on various mainstream SSL algorithms and five public datasets demonstrate that: (i) DSBA significantly enhances Attack Success Rate (ASR) and stealthiness while maintaining downstream task accuracy; and (ii) DSBA exhibits superior robustness against existing mainstream defense methods.
Authors:Lukas Böhm, Arjhun Swaminathan, Anika Hannemann, Erik Buchmann
Abstract:
Quantum Federated Learning (QFL) enables distributed training of Quantum Machine Learning (QML) models by sharing model gradients instead of raw data. However, these gradients can still expose sensitive user information. To enhance privacy, homomorphic encryption of parameters has been proposed as a solution in QFL and related frameworks. In this work, we evaluate the overhead introduced by Fully Homomorphic Encryption (FHE) in QFL setups and assess its feasibility for real-world applications. We implemented various QML models including a Quantum Convolutional Neural Network (QCNN) trained in a federated environment with parameters encrypted using the CKKS scheme. This work marks the first QCNN trained in a federated setting with CKKS-encrypted parameters. Models of varying architectures were trained to predict brain tumors from MRI scans. The experiments reveal that memory and communication overhead remain substantial, making FHE challenging to deploy. Minimizing overhead requires reducing the number of model parameters, which, however, leads to a decline in classification performance, introducing a trade-off between privacy and model complexity.
Authors:David Campbell, Neil Kale, Udari Madhushani Sehwag, Bert Herring, Nick Price, Dan Borges, Alex Levinson, Christina Q Knight
Abstract:
Safety alignment in large language models (LLMs), particularly for cybersecurity tasks, primarily focuses on preventing misuse. While this approach reduces direct harm, it obscures a complementary failure mode: denial of assistance to legitimate defenders. We study Defensive Refusal Bias -- the tendency of safety-tuned frontier LLMs to refuse assistance for authorized defensive cybersecurity tasks when those tasks include similar language to an offensive cyber task. Based on 2,390 real-world examples from the National Collegiate Cyber Defense Competition (NCCDC), we find that LLMs refuse defensive requests containing security-sensitive keywords at $2.72\times$ the rate of semantically equivalent neutral requests ($p < 0.001$). The highest refusal rates occur in the most operationally critical tasks: system hardening (43.8%) and malware analysis (34.3%). Interestingly, explicit authorization, where the user directly instructs the model that they have authority to complete the target task, increases refusal rates, suggesting models interpret justifications as adversarial rather than exculpatory. These findings are urgent for interactive use and critical for autonomous defensive agents, which cannot rephrase refused queries or retry. Our findings suggest that current LLM cybersecurity alignment relies on semantic similarity to harmful content rather than reasoning about intent or authorization. We call for mitigations that analyze intent to maximize defensive capabilities while still preventing harmful compliance.
Authors:Ishraq Tashdid, Kimia Tasnia, Alexander Garcia, Jonathan Valamehr, Sazadur Rahman
Abstract:
This work presents ATLAS, an LLM-driven framework that bridges standardized threat modeling and property-based formal verification for System-on-Chip (SoC) security. Starting from vulnerability knowledge bases such as Common Weakness Enumeration (CWE), ATLAS identifies SoC-specific assets, maps relevant weaknesses, and generates assertion-based security properties and JasperGold scripts for verification. By combining asset-centric analysis with standardized threat model templates and multi-source SoC context, ATLAS automates the transformation from vulnerability reasoning to formal proof. Evaluated on three HACK@DAC benchmarks, ATLAS detected 39/48 CWEs and generated correct properties for 33 of those bugs, advancing automated, knowledge-driven SoC security verification toward a secure-by-design paradigm.
Authors:Jiayao Wang, Yiping Zhang, Mohammad Maruf Hasan, Xiaoying Lei, Jiale Zhang, Junwu Zhu, Qilin Wu, Dongfang Zhao
Abstract:
Self-supervised diffusion models learn high-quality visual representations via latent space denoising. However, their representation layer poses a distinct threat: unlike traditional attacks targeting generative outputs, its unconstrained latent semantic space allows for stealthy backdoors, permitting malicious control upon triggering. In this paper, we propose BadRSSD, the first backdoor attack targeting the representation layer of self-supervised diffusion models. Specifically, it hijacks the semantic representations of poisoned samples with triggers in Principal Component Analysis (PCA) space toward those of a target image, then controls the denoising trajectory during diffusion by applying coordinated constraints across latent, pixel, and feature distribution spaces to steer the model toward generating the specified target. Additionally, we integrate representation dispersion regularization into the constraint framework to maintain feature space uniformity, significantly enhancing attack stealth. This approach preserves normal model functionality (high utility) while achieving precise target generation upon trigger activation (high specificity). Experiments on multiple benchmark datasets demonstrate that BadRSSD substantially outperforms existing attacks in both FID and MSE metrics, reliably establishing backdoors across different architectures and configurations, and effectively resisting state-of-the-art backdoor defenses.
Authors:Joonsung Jeon, Woo Jae Kim, Suhyeon Ha, Sooel Son, Sung-Eui Yoon
Abstract:
Latent diffusion models have achieved remarkable success in high-fidelity text-to-image generation, but their tendency to memorize training data raises critical privacy and intellectual property concerns. Membership inference attacks (MIAs) provide a principled way to audit such memorization by determining whether a given sample was included in training. However, existing approaches assume access to ground-truth captions. This assumption fails in realistic scenarios where only images are available and their textual annotations remain undisclosed, rendering prior methods ineffective when substituted with vision-language model (VLM) captions. In this work, we propose MoFit, a caption-free MIA framework that constructs synthetic conditioning inputs that are explicitly overfitted to the target model's generative manifold. Given a query image, MoFit proceeds in two stages: (i) model-fitted surrogate optimization, where a perturbation applied to the image is optimized to construct a surrogate in regions of the model's unconditional prior learned from member samples, and (ii) surrogate-driven embedding extraction, where a model-fitted embedding is derived from the surrogate and then used as a mismatched condition for the query image. This embedding amplifies conditional loss responses for member samples while leaving hold-outs relatively less affected, thereby enhancing separability in the absence of ground-truth captions. Our comprehensive experiments across multiple datasets and diffusion models demonstrate that MoFit consistently outperforms prior VLM-conditioned baselines and achieves performance competitive with caption-dependent methods.
Authors:Yu Wang, Yang Xiang, Chandra Thapa, Hajime Suzuki
Abstract:
Greybox protocol fuzzing is a random testing approach for stateful protocol implementations, where the input is protocol messages generated from mutations of seeds, and the search in the input space is driven by the feedback on coverage of both code and state. State model and message model are the core components of communication protocols, which also have significant impacts on protocol fuzzing. In this work, we propose APFuzz (Automatic greybox Protocol Fuzzer) with novel designs to increase the smartness of greybox protocol fuzzers from the perspectives of both the state model and the message model. On the one hand, APFuzz employs a two-stage process of static and dynamic analysis to automatically identify state variables, which are then used to infer an accurate state model during fuzzing. On the other hand, APFuzz introduces field-level mutation operations for binary protocols, leveraging message structure awareness enabled by Large Language Models. We conduct extensive experiments on a public protocol fuzzing benchmark, comparing APFuzz with the baseline fuzzer AFLNET as well as several state-of-the-art greybox protocol fuzzers.
Authors:Yu Wang, Yang Xiang, Chandra Thapa, Hajime Suzuki
Abstract:
As mobile networks transition to 5G infrastructure, ensuring robust security becomes more important due to the complex architecture and expanded attack surface. Traditional security testing approaches for 5G networks rely on black-box fuzzing techniques, which are limited by their inability to observe internal program state and coverage information. This paper presents MulCovFuzz, a novel coverage-guided greybox fuzzing tool for 5G network testing. Unlike existing tools that depend solely on system response, MulCovFuzz implements a multi-component coverage collection mechanism that dynamically monitors code coverage across different components of the 5G system architecture. Our approach introduces a novel testing paradigm that includes a scoring function combining coverage rewards with efficiency metrics to guide test case generation. We evaluate MulCovFuzz on open-source 5G implementation OpenAirInterface. Our experimental results demonstrate that MulCovFuzz significantly outperforms traditional fuzzing approaches, achieving a 5.85\% increase in branch coverage, 7.17\% increase in line coverage, and 16\% improvement in unique crash discovery during 24h fuzzing testing. MulCovFuzz uncovered three zero-day vulnerabilities, two of which were not identified by any other fuzzing technique. This work contributes to the advancement of security testing tools for next-generation mobile networks.
Authors:Tayeb Kenaza, Islam Debicha, Youcef Fares, Mehdi Sehaki, Sami Messai
Abstract:
Electronic Health Records (EHRs) store sensitive patient information, necessitating stringent access control and sharing mechanisms to uphold data security and comply with privacy regulations such as the General Data Protection Regulation (GDPR). In this paper, we propose a comprehensive architecture with a suite of efficient protocols that leverage the synergistic capabilities of the Blockchain and Interplanetary File System (IPFS) technologies to enable secure access control and sharing of EHRs. Our approach is based on a private blockchain, wherein smart contracts are deployed to enforce control exclusively by patients. By granting patients exclusive control over their EHRs, our solution ensures compliance with personal data protection laws and empowers individuals to manage their health information autonomously. Notably, our proposed architecture seamlessly integrates with existing health provider information systems, facilitating interoperability and effectively addressing security and data heterogeneity challenges. To demonstrate the effectiveness of our approach, we developed a prototype based on a private implementation of the Hyperledger platform, enabling the simulation of diverse scenarios involving access control and health data sharing among healthcare practitioners. Our experimental results demonstrate the scalability of our solution, thereby substantiating its efficacy and robustness in real-world healthcare settings.
Authors:Amisha Srivastava, Muskan Porwal, Kanad Basu
Abstract:
Cryptographic computations are fundamental to modern computing, ensuring data confidentiality and integrity. However, these operations are highly vulnerable to power side-channel attacks that exploit variations in power consumption to leak sensitive information. Masking is a widely used countermeasure, yet software-based techniques often introduce significant performance overhead and implementation complexity, while fixed-function hardware masking lacks flexibility across diverse cryptographic algorithms. In this paper, we present CryptRISC, the first RISC-V-based processor that combines cryptographic acceleration with hardware-level power side-channel resistance through an ISA-driven operand masking framework. Our design extends the CVA6 core with 64-bit RISC-V Scalar Cryptography Extensions and introduces two microarchitectural components: a Field Detection Layer, which identifies the dominant algebraic field of each cryptographic instruction, and a Masking Control Unit, which applies field-aware operand randomization at runtime. This enables dynamic selection of Boolean, affine, or arithmetic masking schemes based on instruction semantics, providing optimized protection across algorithms including AES, SHA-256, SHA-512, SM3, and SM4. Unlike prior approaches relying on static masking logic or software instrumentation, our method performs operand masking transparently within the execution pipeline without modifying instruction encoding. Experimental results show speedups up to 6.80$\times$ over baseline software implementations, with only a 1.86% hardware overhead relative to the baseline CVA6 core, confirming the efficiency and practicality of CryptRISC.
Authors:Vijay Prakash, Majed Almansoori, Donghan Hu, Rahul Chatterjee, Danny Yuxing Huang
Abstract:
Technology-facilitated abuse (TFA) is a pervasive form of intimate partner violence (IPV) that leverages digital tools to control, surveil, or harm survivors. While tech clinics are one of the reliable sources of support for TFA survivors, they face limitations due to staffing constraints and logistical barriers. As a result, many survivors turn to online resources for assistance. With the growing accessibility and popularity of large language models (LLMs), and increasing interest from IPV organizations, survivors may begin to consult LLM-based chatbots before seeking help from tech clinics. In this work, we present the first expert-led manual evaluation of four LLMs - two widely used general-purpose non-reasoning models and two domain-specific models designed for IPV contexts - focused on their effectiveness in responding to TFA-related questions. Using real-world questions collected from literature and online forums, we assess the quality of zero-shot single-turn LLM responses generated with a survivor safety-centered prompt on criteria tailored to the TFA domain. Additionally, we conducted a user study to evaluate the perceived actionability of these responses from the perspective of individuals who have experienced TFA. Our findings, grounded in both expert assessment and user feedback, provide insights into the current capabilities and limitations of LLMs in the TFA context and may inform the design, development, and fine-tuning of future models for this domain. We conclude with concrete recommendations to improve LLM performance for survivor support.
Authors:Haoyu Li, Xijia Che, Yanhao Wang, Xiaojing Liao, Luyi Xing
Abstract:
Proof-of-Vulnerability (PoV) generation is a critical task in software security, serving as a cornerstone for vulnerability validation, false positive reduction, and patch verification. While directed fuzzing effectively drives path exploration, satisfying complex semantic constraints remains a persistent bottleneck in automated exploit generation. Large Language Models (LLMs) offer a promising alternative with their semantic reasoning capabilities; however, existing LLM-based approaches lack sufficient grounding in concrete execution behavior, limiting their ability to generate precise PoVs. In this paper, we present DrillAgent, an agentic framework that reformulates PoV generation as an iterative hypothesis-verification-refinement process. To bridge the gap between static reasoning and dynamic execution, DrillAgent synergizes LLM-based semantic inference with feedback from concrete program states. The agent analyzes the target code to hypothesize inputs, observes execution behavior, and employs a novel mechanism to translate low-level execution traces into source-level constraints. This closed-loop design enables the agent to incrementally align its input generation with the precise requirements of the vulnerability. We evaluate DrillAgent on SEC-bench, a large-scale benchmark of real-world C/C++ vulnerabilities. Experimental results show that DrillAgent substantially outperforms state-of-the-art LLM agent baselines under fixed budget constraints, solving up to 52.8% more CVE tasks than the best-performing baseline. These results highlight the necessity of execution-state-aware reasoning for reliable PoV generation in complex software systems.
Authors:Sotiris Michaelides, Jakub Lapawa, Daniel Eguiguren Chavez, Martin Henze
Abstract:
5G promises enhanced performance-not only in bandwidth and capacity, but also latency and security. Its ultra-reliable low-latency configuration targets round-trip times below 1 ms, while optional security controls extend protection across all interfaces, making 5G attractive for mission-critical applications. A key enabler of low latency is the disaggregation of network components, including the RAN, allowing user-plane functions to be deployed nearer to end users. However, this split introduces additional interfaces, whose protection increases latency overhead. In this paper, guided by discussions with a network operator and a 5G manufacturer, we evaluate the latency overhead of enabling optional 5G security controls across internal RAN interfaces and the 5G user plane. To this end, we deploy the first testbed implementing a disaggregated RAN with standardized optional security mechanisms. Our results show that disaggregated RAN deployments retain a latency advantage over monolithic designs, even with security enabled. However, achieving sub-1 ms round-trip times remains challenging, as cryptographic overhead alone can already exceed this target.
Authors:Animesh Singh, K Shiv Kumar, S. VenkataKeerthy, Pragna Mamidipaka, R V B R N Aaseesh, Sayandeep Sen, Palanivel Kodeswaran, Theophilus A. Benson, Ramakrishna Upadrasta, Praveen Tammana
Abstract:
Many cloud infrastructure organizations increasingly rely on third-party eBPF-based network functions for use cases like security, observability, and load balancing, so that not everyone requires a team of highly skilled eBPF experts. However, the network functions from third parties (e.g., F5, Palo Alto) are available in bytecode format to cloud operators, giving little or no understanding of their functional correctness and interaction with other network functions in a chain. Also, eBPF developers want to provide proof of functional correctness for their developed network functions without disclosing the source code to the operators. We design Yaksha-Prashna, a system that allows operators/developers to assert and query bytecode's conformance to its specification and dependencies on other bytecodes. Our work builds domain-specific models that enable us to employ scalable program analysis to extract and model eBPF programs. Using Yaksha-Prashna language, we express 24 properties on standard and non-standard eBPF-based network functions with 200-1000x speedup over the state-of-the-art work.
Authors:Zijing Xu, Ziwei Ning, Tiancheng Hu, Jianwei Zhuge, Yangyang Wang, Jiahao Cao, Mingwei Xu
Abstract:
The rapid evolution of cyber threats has highlighted significant gaps in security knowledge integration. Cybersecurity Knowledge Graphs (CKGs) relying on structured data inherently exhibit hysteresis, as the timely incorporation of rapidly evolving unstructured data remains limited, potentially leading to the omission of critical insights for risk analysis. To address these limitations, we introduce TRACE, a framework designed to integrate structured and unstructured cybersecurity data sources. TRACE integrates knowledge from 24 structured databases and 3 categories of unstructured data, including APT reports, papers, and repair notices. Leveraging Large Language Models (LLMs), TRACE facilitates efficient entity extraction and alignment, enabling continuous updates to the CKG. Evaluations demonstrate that TRACE achieves a 1.8x increase in node coverage compared to existing CKGs. TRACE attains the precision of 86.08%, the recall of 76.92%, and the F1 score of 81.24% in entity extraction, surpassing the best-known LLM-based baselines by 7.8%. Furthermore, our entity alignment methods effectively harmonize entities with existing knowledge structures, enhancing the integrity and utility of the CKG. With TRACE, threat hunters and attack analysts gain real-time, holistic insights into vulnerabilities, attack methods, and defense technologies.
Authors:Olivia Figueira, Pranathi Chamarthi, Tu Le, Athina Markopoulou
Abstract:
AI chatbots are widely used by children and teens today, but they pose significant risks to youth's privacy and safety due to both increasingly personal conversations and potential exposure to unsafe content. While children under 13 are protected by the Children's Online Privacy Protection Act (COPPA), chatbot providers' own privacy policies may also provide protections, since they typically prohibit children from accessing their platforms. Age gating is often employed to restrict children online, but chatbot age gating in particular has not been studied. In this paper, we investigate whether popular consumer chatbots are (i) able to estimate users' ages based solely on their conversations, and (ii) whether they take action upon identifying children. To that end, we develop an auditing framework in which we programmatically interact with chatbots and conduct 1050 experiments using our comprehensive library of age-indicative prompts, including implicit and explicit age disclosures, to analyze the chatbots' responses and actions. We find that while chatbots are capable of estimating age, they do not take any action when children are identified, contradicting their own policies. Our methodology and findings provide insights for platform design, demonstrated by our proof-of-concept chatbot age gating implementation, and regulation to protect children online.
Authors:Ali Nour Eldin, Mohamed Sellami, Walid Gaaloul
Abstract:
Third-Party Risk Assessment (TPRA) is a core cybersecurity practice for evaluating suppliers against standards such as ISO/IEC 27001 and NIST. TPRA questionnaires are typically drawn from large repositories of security and compliance questions, yet tailoring assessments to organizational needs remains a largely manual process. Existing retrieval approaches rely on keyword or surface-level similarity, which often fails to capture implicit assessment scope and control semantics. This paper explores strategies for organizing and retrieving TPRA cybersecurity questions using semantic labels that describe both control domains and assessment scope. We compare direct question-level labeling with a Large Language Model (LLM) against a hybrid semi-supervised semantic labeling (SSSL) pipeline that clusters questions in embedding space, labels a small representative subset using an LLM, and propagates labels to remaining questions using k-Nearest Neighbors; we also compare downstream retrieval based on direct question similarity versus retrieval in the label space. We find that semantic labels can improve retrieval alignment when labels are discriminative and consistent, and that SSSL can generalize labels from a small labeled subset to large repositories while substantially reducing LLM usage and cost.
Authors:Prathamesh Dharangutte, Jingcheng Liu, Pasin Manurangsi, Akbar Rafiey, Phanu Vajanopath, Zongrui Zou
Abstract:
We study approximation algorithms for Maximum Constraint Satisfaction Problems (Max-CSPs) under differential privacy (DP) where the constraints are considered sensitive data. Information-theoretically, we aim to classify the best approximation ratios possible for a given privacy budget $\varepsilon$. In the high-privacy regime ($\varepsilon \ll 1$), we show that any $\varepsilon$-DP algorithm cannot beat a random assignment by more than $O(\varepsilon)$ in the approximation ratio. We devise a polynomial-time algorithm which matches this barrier under the assumptions that the instances are bounded-degree and triangle-free. Finally, we show that one or both of these assumptions can be removed for specific CSPs--such as Max-Cut or Max $k$-XOR--albeit at the cost of computational efficiency.
Authors:Kotekar Annapoorna Prabhu, Andrew Gan, Zahra Ghodsi
Abstract:
Machine learning relies on randomness as a fundamental component in various steps such as data sampling, data augmentation, weight initialization, and optimization. Most machine learning frameworks use pseudorandom number generators as the source of randomness. However, variations in design choices and implementations across different frameworks, software dependencies, and hardware backends along with the lack of statistical validation can lead to previously unexplored attack vectors on machine learning systems. Such attacks on randomness sources can be extremely covert, and have a history of exploitation in real-world systems. In this work, we examine the role of randomness in the machine learning development pipeline from an adversarial point of view, and analyze the implementations of PRNGs in major machine learning frameworks. We present RNGGuard to help machine learning engineers secure their systems with low effort. RNGGuard statically analyzes a target library's source code and identifies instances of random functions and modules that use them. At runtime, RNGGuard enforces secure execution of random functions by replacing insecure function calls with RNGGuard's implementations that meet security specifications. Our evaluations show that RNGGuard presents a practical approach to close existing gaps in securing randomness sources in machine learning systems.
Authors:Ying Song, Balaji Palanisamy
Abstract:
Graph-structured data underpin a wide spectrum of modern applications. However, complex graph topologies and homophilic patterns can facilitate attribute inference attacks (AIAs) by enabling sensitive information leakage to propagate across local neighborhoods. Existing AIAs predominantly assume that adversaries can probe sensitive attributes through repeated model queries. Such assumptions are often impractical in real-world settings due to stringent data protection regulations, prohibitive query budgets, and heightened detection risks, especially when inferring multiple sensitive attributes. More critically, this model-centric perspective obscures a pervasive blind spot: \textbf{intrinsic multiple sensitive information leakage arising solely from publicly released graphs.} To exploit this unexplored vulnerability, we introduce a new attack paradigm and propose \textbf{Taipan, the first query-free transfer-based attack framework for multiple sensitive attribute inference attacks on graphs (G-MSAIAs).} Taipan integrates \emph{Hierarchical Attack Knowledge Routing} to capture intricate inter-attribute correlations, and \emph{Prompt-guided Attack Prototype Refinement} to mitigate negative transfer and performance degradation. We further present a systematic evaluation framework tailored to G-MSAIAs. Extensive experiments on diverse real-world graph datasets demonstrate that Taipan consistently achieves strong attack performance across same-distribution settings and heterogeneous similar- and out-of-distribution settings with mismatched feature dimensionalities, and remains effective even under rigorous differential privacy guarantees. Our findings underscore the urgent need for more robust multi-attribute privacy-preserving graph publishing methods and data-sharing practices.
Authors:Rezvi Shahariar, Chris Phillips
Abstract:
This paper presents a survey of state-of-the-art trust models for Vehicular Ad Hoc Networks (VANETs). Trust management plays an essential role in isolating malicious insider attacks in VANETs which traditional security approaches fail to thwart. To this end, many trust models are presented; some of them only address trust management, while others address security and privacy aspects besides trust management. This paper first reviews, classifies, and summarizes state-of-the-art trust models, and then compares their achievements. From this literature survey, our reader will easily identify two broad classes of trust models that exist in literature, differing primarily in their evaluation point. For example, most trust models follow receiver-side trust evaluation and to the best of our knowledge, there is only one trust model for VANETs which evaluates trust at the sender-side unless a dispute arises. In the presence of a dispute, a Roadside Unit (RSU) rules on the validity of an event. In receiver-side trust models, each receiver becomes busy while computing the trust of a sender and its messages upon the messages' arrival. Conversely, in the sender-side class, receivers are free from any kind of computation as the trust is verified at the time the message is announced. Also, vehicles can quickly act on the information, such as taking a detour to an alternate route, as it supports fast decision-making. We provide a comparison between these two evaluation techniques using a sequence diagram. We then conclude the survey by suggesting future work for sender-side evaluation of trust in VANETs. Additionally, the challenges (real-time constraints and efficiency) are emphasized whilst considering the deployment of a trust model in VANETs
Authors:Haipeng Li, Rongxuan Peng, Anwei Luo, Shunquan Tan, Changsheng Chen, Anastasia Antsiferova
Abstract:
The rapid advancement of AI-Generated Content (AIGC) technologies poses significant challenges for authenticity assessment. However, existing evaluation protocols largely overlook anti-forensics attack, failing to ensure the comprehensive robustness of state-of-the-art AIGC detectors in real-world applications. To bridge this gap, we propose ForgeryEraser, a framework designed to execute universal anti-forensics attack without access to the target AIGC detectors. We reveal an adversarial vulnerability stemming from the systemic reliance on Vision-Language Models (VLMs) as shared backbones (e.g., CLIP), where downstream AIGC detectors inherit the feature space of these publicly accessible models. Instead of traditional logit-based optimization, we design a multi-modal guidance loss to drive forged image embeddings within the VLM feature space toward text-derived authentic anchors to erase forgery traces, while repelling them from forgery anchors. Extensive experiments demonstrate that ForgeryEraser causes substantial performance degradation to advanced AIGC detectors on both global synthesis and local editing benchmarks. Moreover, ForgeryEraser induces explainable forensic models to generate explanations consistent with authentic images for forged images. Our code will be made publicly available.
Authors:Jonathan Feldman, Tal Feldman, Annie I Anton
Abstract:
Biological AI tools for protein design and structure prediction are advancing rapidly, creating dual-use risks that existing safeguards cannot adequately address. Current model-level restrictions, including keyword filtering, output screening, and content-based access denials, are fundamentally ill-suited to biology, where reliable function prediction remains beyond reach and novel threats evade detection by design. We propose a three-tier Know Your Customer (KYC) framework, inspired by anti-money laundering (AML) practices in the financial sector, that shifts governance from content inspection to user verification and monitoring. Tier I leverages research institutions as trust anchors to vouch for affiliated researchers and assume responsibility for vetting. Tier II applies output screening through sequence homology searches and functional annotation. Tier III monitors behavioral patterns to detect anomalies inconsistent with declared research purposes. This layered approach preserves access for legitimate researchers while raising the cost of misuse through institutional accountability and traceability. The framework can be implemented immediately using existing institutional infrastructure, requiring no new legislation or regulatory mandates.
Authors:Jiayao Wang, Yiping Zhang, Jiale Zhang, Wenliang Yuan, Qilin Wu, Junwu Zhu, Dongfang Zhao
Abstract:
Federated Self-Supervised Learning (FSSL) integrates the privacy advantages of distributed training with the capability of self-supervised learning to leverage unlabeled data, showing strong potential across applications. However, recent studies have shown that FSSL is also vulnerable to backdoor attacks. Existing attacks are limited by their trigger design, which typically employs a global, uniform trigger that is easily detected, gets diluted during aggregation, and lacks robustness in heterogeneous client environments. To address these challenges, we propose the Attention-Driven multi-party Collusion Attack (ADCA). During local pre-training, malicious clients decompose the global trigger to find optimal local patterns. Subsequently, these malicious clients collude to form a malicious coalition and establish a collaborative optimization mechanism within it. In this mechanism, each submits its model updates, and an attention mechanism dynamically aggregates them to explore the best cooperative strategy. The resulting aggregated parameters serve as the initial state for the next round of training within the coalition, thereby effectively mitigating the dilution of backdoor information by benign updates. Experiments on multiple FSSL scenarios and four datasets show that ADCA significantly outperforms existing methods in Attack Success Rate (ASR) and persistence, proving its effectiveness and robustness.
Authors:Gautam Savaliya, Robert Aufschläger, Abhishek Subedi, Michael Heigl, Martin Schramm
Abstract:
Artificial intelligence systems introduce complex privacy risks throughout their lifecycle, especially when processing sensitive or high-dimensional data. Beyond the seven traditional privacy threat categories defined by the LINDDUN framework, AI systems are also exposed to model-centric privacy attacks such as membership inference and model inversion, which LINDDUN does not cover. To address both classical LINDDUN threats and additional AI-driven privacy attacks, PriMod4AI introduces a hybrid privacy threat modeling approach that unifies two structured knowledge sources, a LINDDUN knowledge base representing the established taxonomy, and a model-centric privacy attack knowledge base capturing threats outside LINDDUN. These knowledge bases are embedded into a vector database for semantic retrieval and combined with system level metadata derived from Data Flow Diagram. PriMod4AI uses retrieval-augmented and Data Flow specific prompt generation to guide large language models (LLMs) in identifying, explaining, and categorizing privacy threats across lifecycle stages. The framework produces justified and taxonomy-grounded threat assessments that integrate both classical and AI-driven perspectives. Evaluation on two AI systems indicates that PriMod4AI provides broad coverage of classical privacy categories while additionally identifying model-centric privacy threats. The framework produces consistent, knowledge-grounded outputs across LLMs, as reflected in agreement scores in the observed range.
Authors:Sahan Sanjaya, Prabhat Mishra
Abstract:
Security of Elliptic Curve Digital Signature Algorithm (ECDSA) depends on the secrecy of the per-signature nonce. Even partial nonce leakage can expose the long-term private key through lattice-based cryptanalysis. In this paper, we introduce a previously unexplored power side-channel vulnerability that exploits sleep-induced power spikes to extract ECDSA nonces. Unlike conventional power-based side-channel attacks, this vulnerability leverages power fluctuations generated during processor context switches invoked by sleep functions. These fluctuations correlate with nonce-dependent operations in scalar multiplication, enabling nonce recovery even under constant-time and masked implementations. We evaluate the attack across multiple cryptographic libraries, RustCrypto, BearSSL, and GoCrypto, and processor architectures, including ARM and RISC-V. Our experiments show that subtle variations in the power envelope during sleep-induced context switches provide sufficient leakage for practical ECDSA nonce extraction, recovering 20 bits of the nonce. These results establish sleep-induced power spikes as a practical cross-platform side-channel threat and highlight the need to reconsider design choices in cryptographic systems.
Authors:Genqiang Wu, Xiaoying Zhang, Yu Qi, Hao Wang, Jikui Wang, Yeping He
Abstract:
The exponential growth of data collection necessitates robust privacy protections that preserve data utility. We address information disclosure against adversaries with bounded prior knowledge, modeled by an entropy constraint $H(X) \geq b$. Within this information privacy framework -- which replaces differential privacy's independence assumption with a bounded-knowledge model -- we study three core problems: maximal per-record leakage, the primal leakage-distortion tradeoff (minimizing worst-case leakage under distortion $D$), and the dual distortion minimization (minimizing distortion under leakage constraint $L$). These problems resemble classical information-theoretic ones (channel capacity, rate-distortion) but are more complex due to high dimensionality and the entropy constraint. We develop efficient alternating optimization algorithms that exploit convexity-concavity duality, with theoretical guarantees including local convergence for the primal problem and convergence to a stationary point for the dual. Experiments on binary symmetric channels and modular sum queries validate the algorithms, showing improved privacy-utility tradeoffs over classical differential privacy mechanisms. This work provides a computational framework for auditing privacy risks and designing certified mechanisms under realistic adversary assumptions.
Authors:Jonathan Baumann, Yonghyun Kim, Yan Farba, Catalin Hritcu, Julay Leatherman-Brooks
Abstract:
This paper introduces SpecIBT, a formally verified defense against Spectre BTB, RSB, and PHT that combines CET-style hardware-assisted control-flow integrity with compiler-inserted speculative load hardening (SLH). SpecIBT is based on the novel observation that in the presence of CET-style protection, we can precisely detect BTB misspeculation for indirect calls and set the SLH misspeculation flag. We formalize SpecIBT as a transformation in Rocq and provide a machine-checked proof that it achieves relative security: any transformed program running with speculation leaks no more than what the source program leaks without speculation. This strong security guarantee applies to arbitrary programs, even those not following the cryptographic constant-time programming discipline.
Authors:Farnaz Soltaniani, Mohammad Ghafari
Abstract:
Machine learning models are increasingly used for software security tasks. These models are commonly trained and evaluated on large Internet-derived datasets, which often contain duplicated or highly similar samples. When such samples are split across training and test sets, data leakage may occur, allowing models to memorize patterns instead of learning to generalize. We investigate duplication in a widely used benchmark dataset of hard coded secrets and show how data leakage can substantially inflate the reported performance of AI-based secret detectors, resulting in a misleading picture of their real-world effectiveness.
Authors:Farnaz Soltaniani, Shoaib Razzaq, Mohammad Ghafari
Abstract:
Early detection of security bug reports (SBRs) is critical for timely vulnerability mitigation. We present an evaluation of prompt-based engineering and fine-tuning approaches for predicting SBRs using Large Language Models (LLMs). Our findings reveal a distinct trade-off between the two approaches. Prompted proprietary models demonstrate the highest sensitivity to SBRs, achieving a G-measure of 77% and a recall of 74% on average across all the datasets, albeit at the cost of a higher false-positive rate, resulting in an average precision of only 22%. Fine-tuned models, by contrast, exhibit the opposite behavior, attaining a lower overall G-measure of 51% but substantially higher precision of 75% at the cost of reduced recall of 36%. Though a one-time investment in building fine-tuned models is necessary, the inference on the largest dataset is up to 50 times faster than that of proprietary models. These findings suggest that further investigations to harness the power of LLMs for SBR prediction are necessary.
Authors:Sofia Giampietro, Ralf Sasse, David Basin
Abstract:
Diffie-Hellman groups are commonly used in cryptographic protocols. While most state-of-the-art, symbolic protocol verifiers support them to some degree, they do not support all mathematical operations possible in these groups. In particular, they lack support for exponent addition, as these tools reason about terms using unification, which is undecidable in the theory describing all Diffie-Hellman operators. In this paper we approximate such a theory and propose a semi-decision procedure to determine whether a protocol, which may use all operations in such groups, satisfies user-defined properties. We implement this approach by extending the Tamarin prover to support the full Diffie-Hellman theory, including group element multiplication and hence addition of exponents. This is the first time a state-of-the-art tool can model and reason about such protocols. We illustrate our approach's effectiveness with different case studies: ElGamal encryption and MQV. Using Tamarin, we prove security properties of ElGamal, and we rediscover known attacks on MQV.
Authors:Zirui Zhu, Xiangyang Li
Abstract:
Continual intrusion detection must absorb newly emerging attack stages while retaining legacy detection capability under strict operational constraints, including bounded compute and qubit budgets and privacy rules that preclude long-term storage of raw telemetry. We propose QCL-IDS, a quantum-centric continual-learning framework that co-designs stability and privacy-governed rehearsal for NISQ-era pipelines. Its core component, Q-FISH (Quantum Fisher Anchors), enforces retention using a compact anchor coreset through (i) sensitivity-weighted parameter constraints and (ii) a fidelity-based functional anchoring term that directly limits decision drift on representative historical traffic. To regain plasticity without retaining sensitive flows, QCL-IDS further introduces privacy-preserved quantum generative replay (QGR) via frozen, task-conditioned generator snapshots that synthesize bounded rehearsal samples. Across a three-stage attack stream on UNSW-NB15 and CICIDS2017, QCL-IDS consistently attains the best retention-adaptation trade-off: the gradient-anchor configuration achieves mean Attack-F1 = 0.941 with forgetting = 0.005 on UNSW-NB15 and mean Attack-F1 = 0.944 with forgetting = 0.004 on CICIDS2017, versus 0.800/0.138 and 0.803/0.128 for sequential fine-tuning, respectively.
Authors:Md Tasnim Jawad, Mingyan Xiao, Yanzhao Wu
Abstract:
With the widespread adoption of Large Language Models (LLMs) and increasingly stringent privacy regulations, protecting data privacy in LLMs has become essential, especially for privacy-sensitive applications. Membership Inference Attacks (MIAs) attempt to determine whether a specific data sample was included in the model training/fine-tuning dataset, posing serious privacy risks. However, most existing MIA techniques against LLMs rely on sequence-level aggregated prediction statistics, which fail to distinguish prediction improvements caused by generalization from those caused by memorization, leading to low attack effectiveness. To address this limitation, we propose a novel membership inference approach that captures the token-level probabilities for low-confidence (hard) tokens, where membership signals are more pronounced. By comparing token-level probability improvements at hard tokens between a fine-tuned target model and a pre-trained reference model, HT-MIA isolates strong and robust membership signals that are obscured by prior MIA approaches. Extensive experiments on both domain-specific medical datasets and general-purpose benchmarks demonstrate that HT-MIA consistently outperforms seven state-of-the-art MIA baselines. We further investigate differentially private training as an effective defense mechanism against MIAs in LLMs. Overall, our HT-MIA framework establishes hard-token based analysis as a state-of-the-art foundation for advancing membership inference attacks and defenses for LLMs.
Authors:Nacereddine Sitouah, Marco Esposito, Francesco Bruschi
Abstract:
European digital identity initiatives are grounded in regulatory frameworks designed to ensure interoperability and robust, harmonized security standards. The evolution of these frameworks culminates in eIDAS 2.0, whose origins trace back to the Electronic Signatures Directive 1999/93/EC, the first EU-wide legal foundation for the use of electronic signatures in cross-border electronic transactions. As technological capabilities advanced, the initial eIDAS 1.0 framework was increasingly criticized for its limitations and lack of comprehensiveness. Emerging decentralized approaches further exposed these shortcomings and introduced the possibility of integrating innovative identity paradigms, such as Self-Sovereign Identity (SSI) models. In this article, we analyse key provisions of the eIDAS 2.0 Regulation and its accompanying recitals, drawing on existing literature to identify legislative gaps and implementation challenges. Furthermore, we examine the European Digital Identity Architecture and Reference Framework (ARF), assessing its proposed guidelines and evaluating the extent to which its emerging implementations align with SSI principles.
Authors:Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang
Abstract:
Deep Neural Networks remain inherently vulnerable to backdoor attacks. Traditional test-time defenses largely operate under the paradigm of internal diagnosis methods like model repairing or input robustness, yet these approaches are often fragile under advanced attacks as they remain entangled with the victim model's corrupted parameters. We propose a paradigm shift from Internal Diagnosis to External Semantic Auditing, arguing that effective defense requires decoupling safety from the victim model via an independent, semantically grounded auditor. To this end, we present a framework harnessing Universal Vision-Language Models (VLMs) as evolving semantic gatekeepers. We introduce PRISM (Prototype Refinement & Inspection via Statistical Monitoring), which overcomes the domain gap of general VLMs through two key mechanisms: a Hybrid VLM Teacher that dynamically refines visual prototypes online, and an Adaptive Router powered by statistical margin monitoring to calibrate gating thresholds in real-time. Extensive evaluation across 17 datasets and 11 attack types demonstrates that PRISM achieves state-of-the-art performance, suppressing Attack Success Rate to <1% on CIFAR-10 while improving clean accuracy, establishing a new standard for model-agnostic, externalized security.
Authors:Goni Anagha, Vishakha Dasi Agrawal, Gargi Sarkar, Kavita Vemuri, Sandeep Kumar Shukla
Abstract:
Job scams have emerged as a rapidly growing form of cybercrime that manipulates human decision-making processes. Existing countermeasures primarily focus on scam typologies or post-loss indicators, offering limited support for early-stage intervention. In this study, we examine how behavioral decision signals can be operationalized as computational features for identifying vulnerability-associated signals in job fraud. Using anonymous survey data collected from a university population, we analyze two dominant job scam pathways: payment-based scams that require upfront fees and task-based scams that begin with small rewards before escalating to financial demands. Drawing on behavioral economics, we operationalize sunk cost influence, urgency/time-pressure cues, and social proof as measurable behavioral signals, and analyze their association with payment behavior using exact inference under sparsity and uncertainty-aware estimation, with social proof treated as a context-dependent legitimacy cue rather than a standalone predictor. Our results show that urgency/time-pressure cues are significantly associated with payment behavior, consistent with their role as proximal compliance triggers during escalation. In contrast, opportunity-loss/FOMO cues were not reliably identifiable under the current operationalization in our encounter subset, highlighting the importance of measurement fidelity and cue-definition consistency. We further observe that emotional tone in victim narratives and selective non-response to sensitive questions vary systematically with financial loss and reporting behavior, suggesting that missingness may reflect a combination of survey fatigue and selective non-disclosure for sensitive items rather than purely random noise.
Authors:Ali Al-Lawati, Suhang Wang
Abstract:
The growing adoption of multimodal Retrieval-Augmented Generation (mRAG) pipelines for vision-centric tasks (e.g. visual QA) introduces important privacy challenges. In particular, while mRAG provides a practical capability to connect private datasets to improve model performance, it risks the leakage of private information from these datasets during inference. In this paper, we perform an empirical study to analyze the privacy risks inherent in the mRAG pipeline observed through standard model prompting. Specifically, we implement a case study that attempts to infer the inclusion of a visual asset, e.g. image, in the mRAG, and if present leak the metadata, e.g. caption, related to it. Our findings highlight the need for privacy-preserving mechanisms and motivate future research on mRAG privacy.
Authors:Kenan Begovic, Abdulaziz Al-Ali, Qutaibah Malluhi
Abstract:
Ransomware core capability, unauthorized encryption, demands controls that identify and block malicious cryptographic activity without disrupting legitimate use. We present a probabilistic, risk-based access control architecture that couples machine learning inference with mandatory access control to regulate encryption on Linux in real time. The system builds a specialized dataset from the native ftrace framework using the function_graph tracer, yielding high-resolution kernel-function execution traces augmented with resource and I/O counters. These traces support both a supervised classifier and interpretable rules that drive an SELinux policy via lightweight booleans, enabling context-sensitive permit/deny decisions at the moment encryption begins. Compared to approaches centered on sandboxing, hypervisor introspection, or coarse system-call telemetry, the function-level tracing we adopt provides finer behavioral granularity than syscall-only telemetry while avoiding the virtualization/VMI overhead of sandbox-based approaches. Our current user-space prototype has a non-trivial footprint under burst I/O; we quantify it and recognize that a production kernel-space solution should aim to address this. We detail dataset construction, model training and rule extraction, and the run-time integration that gates file writes for suspect encryption while preserving benign cryptographic workflows. During evaluation, the two-layer composition retains model-level detection quality while delivering rule-like responsiveness; we also quantify operational footprint and outline engineering steps to reduce CPU and memory overhead for enterprise deployment. The result is a practical path from behavioral tracing and learning to enforceable, explainable, and risk-proportionate encryption control on production Linux systems.
Authors:Mathew Duong, Michael Chesser, Guy Farrelly, Surya Nepal, Damith C. Ranasinghe
Abstract:
Monolithic Firmware is widespread. Unsurprisingly, fuzz testing firmware is an active research field with new advances addressing the unique challenges in the domain. However, understanding and evaluating improvements by deriving metrics such as code coverage and unique crashes are problematic, leading to a desire for a reliable bug-based benchmark. To address the need, we design and build FirmReBugger, a holistic framework for fairly assessing monolithic firmware fuzzers with a realistic, diverse, bug-based benchmark. FirmReBugger proposes using bug oracles--C syntax expressions of bug descriptors--with an interpreter to automate analysis and accurately report on bugs discovered, discriminating between states of detected, triggered, reached and not reached. Importantly, our idea of benchmarking does not modify the target binary and simply replays fuzzing seeds to isolate the benchmark implementation from the fuzzer while providing a simple means to extend with new bug oracles. Further, analyzing fuzzing roadblocks, we created FirmBench, a set of diverse, real-world binary targets with 313 software bug oracles. Incorporating our analysis of roadblocks challenging monolithic firmware fuzzing, the bench provides for rapid evaluation of future advances. We implement FirmReBugger in a FuzzBench-for-Firmware type service and use FirmBench to evaluate 9 state-of-the art monolithic firmware fuzzers in the style of a reproducibility study, using a 10 CPU-year effort, to report our findings.
Authors:Xiaolei Zhang, Xiaojun Jia, Liquan Chen, Songze Li
Abstract:
Introducing reasoning models into Retrieval-Augmented Generation (RAG) systems enhances task performance through step-by-step reasoning, logical consistency, and multi-step self-verification. However, recent studies have shown that reasoning models suffer from overthinking attacks, where models are tricked to generate unnecessarily high number of reasoning tokens. In this paper, we reveal that such overthinking risk can be inherited by RAG systems equipped with reasoning models, by proposing an end-to-end attack framework named Contradiction-Based Deliberation Extension (CODE). Specifically, CODE develops a multi-agent architecture to construct poisoning samples that are injected into the knowledge base. These samples 1) are highly correlated with the use query, such that can be retrieved as inputs to the reasoning model; and 2) contain contradiction between the logical and evidence layers that cause models to overthink, and are optimized to exhibit highly diverse styles. Moreover, the inference overhead of CODE is extremely difficult to detect, as no modification is needed on the user query, and the task accuracy remain unaffected. Extensive experiments on two datasets across five commercial reasoning models demonstrate that the proposed attack causes a 5.32x-24.72x increase in reasoning token consumption, without degrading task performance. Finally, we also discuss and evaluate potential countermeasures to mitigate overthinking risks.
Authors:Safaa Menssouri, El Mehdi Amhoud
Abstract:
The Internet of Flying Things (IoFT) plays a vital role in modern applications such as aerial surveillance and smart mobility. However, it remains highly vulnerable to cyberattacks that threaten the confidentiality, integrity, and availability of sensitive data. Developing effective intrusion detection systems (IDS) for IoFT networks faces key challenges, including data imbalance, privacy concerns, and the limited capability of traditional models to detect rare but potentially damaging cyber threats. In this work, we propose PrivFly, a privacy-preserving IDS framework that integrates self-supervised representation learning and differential privacy (DP) to enhance detection performance in imbalanced IoFT network traffic. We propose a masked feature reconstruction module for self-supervised pretraining, improving feature representations and boosting rare-class detection. Differential privacy is applied during training to protect sensitive information without significantly compromising model performance. In addition, we conduct a SHapley additive explanations (SHAP)-based analysis to evaluate the impact of DP on feature importance and model behavior. Experimental results on the ECU-IoFT dataset show that PrivFly achieves up to 98% accuracy and 99% F1-score, effectively balancing privacy and detection performance for secure IoFT systems.
Authors:Ismat Jarin, Olivia Figueira, Yu Duan, Tu Le, Athina Markopoulou
Abstract:
Virtual reality (VR) platforms and apps collect user sensor data, including motion, facial, eye, and hand data, in abstracted form. These data may expose users to unique privacy risks without their knowledge or meaningful awareness, yet the extent of these risks remains understudied. To address this gap, we propose VR ProfiLens, a framework to study user profiling based on VR sensor data and the resulting privacy risks across consumer VR apps. To systematically study this problem, we first develop a taxonomy rooted in the CCPA definition of personal information and expand it by sensor, app, and threat contexts to identify user attributes at risk. Then, we conduct a user study in which we collect VR sensor data from four sensor groups from real users interacting with 10 popular consumer VR apps, followed by a survey. We design and apply an analysis pipeline to demonstrate the feasibility of inferring user attributes using these data. Our results show that sensitive personal information can be inferred with moderately high to high risk (up to 90% F1 score) from abstracted sensor data. Through feature analysis, we further identify correlations among app groups and sensor groups in inferring user attributes. Our findings highlight risks to users, including privacy loss, tracking, targeted advertising, and safety threats. Finally, we discuss design implications and regulatory recommendations to enhance transparency and better protect users' privacy in VR.
Authors:Annika Wilde, Samira Briongos, Claudio Soriente, Ghassan Karame
Abstract:
An extensive line of work on modern computing architectures has shown that the execution time of instructions can (i) depend on the operand of the instruction or (ii) be influenced by system optimizations, e.g., branch prediction and speculative execution paradigms. In this paper, we systematically measure and analyze timing variabilities in conditional jump instructions that can be macro-fused with a preceding instruction, depending on their placement within the binary. Our measurements indicate that these timing variations stem from the micro-op cache placement and the jump's offset in the L1 instruction cache of modern processors. We demonstrate that this behavior is consistent across multiple microarchitectures, including Skylake, Coffee Lake, and Kaby Lake, as well as various real-world implementations. We confirm the prevalence of this variability through extensive experiments on a large-scale set of popular binaries, including libraries from Ubuntu 24.04, Windows 10 Pro, and several open-source cryptographic libraries. We also show that one can easily avoid this timing variability by ensuring that macro-fusible instructions are 32-byte aligned - an approach initially suggested in 2019 by Intel in an overlooked short report. We quantify the performance impact of this approach across the cryptographic libraries, showing a speedup of 2.15% on average (and up to 10.54%) when avoiding the timing variability. As a by-product, we show that this variability can be exploited as a covert channel, achieving a maximum throughput of 16.14 Mbps.
Authors:Ishraq Tashdid, Tasnuva Farheen, Sazadur Rahman
Abstract:
Modern system-in-package (SiP) platforms increasingly adopt reconfigurable interposers to enable plug-and-play chiplet integration across heterogeneous multi-vendor ecosystems. However, this flexibility introduces severe trust challenges, as traditional authentication schemes fail to scale or adapt in decentralized, post-fabrication programmable environments. This paper presents InterPUF, a compact and scalable authentication framework that transforms the interposer into a distributed root of trust. InterPUF embeds a route-based differential delay physically unclonable function (PUF) across the reconfigurable interconnect and secures authentication using multi-party computation (MPC), ensuring raw PUF signatures are never exposed. Our hardware evaluation shows only 0.23% area and 0.072% power overhead across diverse chiplets while preserving authentication latency within tens of nanoseconds. Simulation results using pyPUF confirm strong uniqueness, reliability, and modeling resistance under process, voltage, and temperature variations. By combining interposer-resident PUF primitives with cryptographic hashing and collaborative verification, InterPUF enforces a minimal-trust authentication model without relying on a centralized anchor.
Authors:Lele Zheng, Xiang Wang, Tao Zhang, Yang Cao, Ke Cheng, Yulong Shen
Abstract:
Fine-tuning large language models on downstream tasks is crucial for realizing their cross-domain potential but often relies on sensitive data, raising privacy concerns. Differential privacy (DP) offers rigorous privacy guarantees and has been widely adopted in fine-tuning; however, naively injecting noise across the high-dimensional parameter space creates perturbations with large norms, degrading performance and destabilizing training. To address this issue, we propose DP-SFT, a two-stage subspace fine-tuning method that substantially reduces noise magnitude while preserving formal DP guarantees. Our intuition is that, during fine-tuning, significant parameter updates lie within a low-dimensional, task-specific subspace, while other directions change minimally. Hence, we only inject DP noise into this subspace to protect privacy without perturbing irrelevant parameters. In phase one, we identify the subspace by analyzing principal gradient directions to capture task-specific update signals. In phase two, we project full gradients onto this subspace, add DP noise, and map the perturbed gradients back to the original parameter space for model updates, markedly lowering noise impact. Experiments on multiple datasets demonstrate that DP-SFT enhances accuracy and stability under rigorous DP constraints, accelerates convergence, and achieves substantial gains over DP fine-tuning baselines.
Authors:Hadis Rezaei, Rahim Taheri, Francesco Palmieri
Abstract:
Many Ethereum smart contracts rely on block attributes such as block.timestamp or blockhash to generate random numbers for applications like lotteries and games. However, these values are predictable and miner-manipulable, creating the Bad Randomness vulnerability (SWC-120) that has led to real-world exploits. Current detection tools identify only simple patterns and fail to verify whether protective modifiers actually guard vulnerable code. A major obstacle to improving these tools is the lack of large, accurately labeled datasets. This paper presents a benchmark dataset of 1,752 Ethereum smart contracts with validated Bad Randomness vulnerabilities. We developed a five-phase methodology comprising keyword filtering, pattern matching with 58 regular expressions, risk classification, function-level validation, and context analysis. The function-level validation revealed that 49% of contracts initially classified as protected were actually exploitable because modifiers were applied to different functions than those containing vulnerabilities. We classify contracts into four risk levels based on exploitability: HIGH_RISK (no protection), MEDIUM_RISK (miner-exploitable only), LOW_RISK (owner-exploitable only), and SAFE (using Chainlink VRF or commit-reveal). Our dataset is 51 times larger than RNVulDet and the first to provide function-level validation and risk stratification. Evaluation of Slither and Mythril revealed significant detection gaps, as both tools identified none of the vulnerable contracts in our sample, indicating limitations in handling complex randomness patterns. The dataset and validation scripts are publicly available to support future research in smart contract security.
Authors:Annika Wilde, Samira Briongos, Claudio Soriente, Ghassan Karame
Abstract:
Trusted Execution Environments (TEEs) are gaining popularity as an effective means to provide confidentiality in the cloud. TEEs, such as Intel SGX, suffer from so-called rollback and cloning attacks (often referred to as forking attacks). Rollback attacks are enabled by the lack of freshness guarantees for sealed data; cloning attacks stem from the inability to determine if other instances of an enclave are running on the same platform. While rollback attacks have been extensively studied by the community, cloning attacks have been, unfortunately, less investigated. To address this gap, we extensively study and thoroughly analyze the susceptibility of 72 SGX-based proposals to cloning attacks. Our results show that roughly 20% of the analyzed proposals are insecure against cloning attacks-including those applications that rely on monotonic counters and are, therefore, secure against rollback attacks.
Authors:Saleem Ishaq Tijjani, Bogdan Ghita, Nathan Clarke, Matthew Craven
Abstract:
Advanced Persistent Threats (APTs) represent hidden, multi\-stage cyberattacks whose long term persistence and adaptive behavior challenge conventional intrusion detection systems (IDS). Although recent advances in machine learning and probabilistic modeling have improved APT detection performance, most existing approaches remain reactive and alert\-centric, providing limited capability for stage-aware prediction and principled inference under uncertainty, particularly when observations are sparse or incomplete. This paper proposes E\-HiDNet, a unified hybrid deep probabilistic learning framework that integrates convolutional and recurrent neural networks with a Hidden Markov Model (HMM) to allow accurate prediction of the progression of the APT campaign. The deep learning component extracts hierarchical spatio\-temporal representations from correlated alert sequences, while the HMM models latent attack stages and their stochastic transitions, allowing principled inference under uncertainty and partial observability. A modified Viterbi algorithm is introduced to handle incomplete observations, ensuring robust decoding under uncertainty. The framework is evaluated using a synthetically generated yet structurally realistic APT dataset (S\-DAPT\-2026). Simulation results show that E\-HiDNet achieves up to 98.8\-100\% accuracy in stage prediction and significantly outperforms standalone HMMs when four or more observations are available, even under reduced training data scenarios. These findings highlight that combining deep semantic feature learning with probabilistic state\-space modeling enhances predictive APT stage performance and situational awareness for proactive APT defense.
Authors:Saleem Ishaq Tijjani, Bogdan Ghita, Nathan Clarke, Matthew Craven
Abstract:
The detection of advanced persistent threats (APTs) remains a crucial challenge due to their stealthy, multistage nature and the limited availability of realistic, labeled datasets for systematic evaluation. Synthetic dataset generation has emerged as a practical approach for modeling APT campaigns; however, existing methods often rely on computationally expensive alert correlation mechanisms that limit scalability. Motivated by these limitations, this paper presents a near realistic synthetic APT dataset and an efficient alert correlation framework. The proposed approach introduces a machine learning based correlation module that employs K Nearest Neighbors (KNN) clustering with a cosine similarity metric to group semantically related alerts within a temporal context. The dataset emulates multistage APT campaigns across campus and organizational network environments and captures a diverse set of fourteen distinct alert types, exceeding the coverage of commonly used synthetic APT datasets. In addition, explicit APT campaign states and alert to stage mappings are defined to enable flexible integration of new alert types and support stage aware analysis. A comprehensive statistical characterization of the dataset is provided to facilitate reproducibility and support APT stage predictions.
Authors:Haris Khan, Sadia Asif, Shumaila Asif
Abstract:
The proliferation of generative AI systems creates unprecedented opportunities for content creation while raising critical concerns about controllability, copyright infringement, and content provenance. Current generative models operate as "black boxes" with limited user control and lack built-in mechanisms to protect intellectual property or trace content origin. We propose a novel multi-agent framework that addresses these challenges through specialized agent roles and integrated watermarking. Our system orchestrates Director, Generator, Reviewer, Integration, and Protection agents to ensure user intent alignment while embedding digital provenance markers. We demonstrate feasibility through two case studies: creative content generation with iterative refinement and copyright protection for AI-generated art in commercial contexts. Preliminary feasibility evidence from prior work indicates up to 23\% improvement in semantic alignment and 95\% watermark recovery rates. This work contributes to responsible generative AI deployment, positioning multi-agent systems as a solution for trustworthy creative workflows in legal and commercial applications.
Authors:Isaiah J. King, Bernardo Trindade, Benjamin Bowman, H. Howie Huang
Abstract:
Representing networks as a graph and training a link prediction model using benign connections is an effective method of anomaly-based intrusion detection. Existing works using this technique have shown great success using temporal graph neural networks and skip-gram-based approaches on random walks. However, random walk-based approaches are unable to incorporate rich edge data, while the GNN-based approaches require large amounts of memory to train. In this work, we propose extending the original insight from random walk-based skip-grams--that random walks through a graph are analogous to sentences in a corpus--to the more modern transformer-based foundation models. Using language models that take advantage of GPU optimizations, we can quickly train a graph foundation model to predict missing tokens in random walks through a network of computers. The graph foundation model is then finetuned for link prediction and used as a network anomaly detector. This new approach allows us to combine the efficiency of random walk-based methods and the rich semantic representation of deep learning methods. This system, which we call CyberGFM, achieved state-of-the-art results on three widely used network anomaly detection datasets, delivering a up to 2$\times$ improvement in average precision. We found that CyberGFM outperforms all prior works in unsupervised link prediction for network anomaly detection, using the same number of parameters, and with equal or better efficiency than the previous best approaches.
Authors:Enrique Feito-Casares, Ismael Gómez-Talal, José-Luis Rojo-Álvarez
Abstract:
This data article introduces a comprehensive, high-resolution honeynet dataset designed to support standalone analyses of global cyberattack behaviors. Collected over a continuous 72-hour window (June 9 to 11, 2025) on Microsoft Azure, the dataset comprises 132,425 individual attack events captured by three honeypots (Cowrie, Dionaea, and SentryPeer) deployed across four geographically dispersed virtual machines. Each event record includes enriched metadata (UTC timestamps, source/destination IPs, autonomous system and organizational mappings, geolocation coordinates, targeted ports, and honeypot identifiers alongside derived temporal features and standardized protocol classifications). We provide actionable guidance for researchers seeking to leverage this dataset in anomaly detection, protocol-misuse studies, threat intelligence, and defensive policy design. Descriptive statistics highlight significant skew: 2,438 unique source IPs span 95 countries, yet the top 1% of IPs account for 1% of all events, and three protocols dominate: Session Initiation Protocol (SIP), Telnet, Server Message Block (SMB). Temporal analysis uncovers pronounced rush-hour peaks at 07:00 and 23:00 UTC, interspersed with maintenance-induced gaps that reveal operational blind spots. Geospatial mapping further underscores platform-specific biases: SentryPeer captures concentrated SIP floods in North America and Southeast Asia, Cowrie logs Telnet/SSH scans predominantly from Western Europe and the U.S., and Dionaea records SMB exploits around European nodes. By combining fine-grained temporal resolution with rich, contextual geolocation and protocol metadata, this standalone dataset aims to empower reproducible, cloud-scale investigations into evolving cyber threats. Accompanying analysis code and data access details are provided.
Authors:Konstantinos E. Kampourakis, Vyron Kampourakis, Efstratios Chatzoglou, Georgios Kambourakis, Stefanos Gritzalis
Abstract:
Realistic, large-scale, and well-labeled cybersecurity datasets are essential for training and evaluating Intrusion Detection Systems (IDS). However, they remain difficult to obtain due to privacy constraints, data sensitivity, and the cost of building controlled collection environments such as testbeds and cyber ranges. This paper investigates whether Large Language Models (LLMs) can operate as controlled knowledge-to-data engines for generating structured synthetic network traffic datasets suitable for IDS research. We propose a methodology that combines protocol documentation, attack semantics, and explicit statistical rules to condition LLMs without fine-tuning or access to raw samples. Using the AWID3 IEEE~802.11 benchmark as a demanding case study, we generate labeled datasets with four state-of-the-art LLMs and assess fidelity through a multi-level validation framework including global similarity metrics, per-feature distribution testing, structural comparison, and cross-domain classification. Results show that, under explicit constraints, LLM-generated datasets can closely approximate the statistical and structural characteristics of real network traffic, enabling gradient-boosting classifiers to achieve F1-scores up to 0.956 when evaluated on real samples. Overall, the findings suggest that constrained LLM-driven generation can facilitate on-demand IDS experimentation, providing a testbed-free, privacy-preserving alternative that overcomes the traditional bottlenecks of physical traffic collection and manual labeling.
Authors:Mohammad Shamim Ahsan, Haizhou Wang, Venkateswara Reddy Motakatla, Minghui Zhu, Peng Liu
Abstract:
In recent years, cyberattacks - along with physical faults - have become an increasing factor causing system failures, especially in DER (Distributed Energy Resources) systems. In addition, according to the literature, a number of faults have been reported to remain undetected. Consequently, unlike anomaly detection works that only identify abnormalities, differentiating undetected faults and cyberattacks is a challenging task. Although several works have studied this problem, they crucially fall short of achieving an accurate distinction due to the reliance on physical laws or physical measurements. To resolve this issue, the industry typically conducts an integrated analysis with physical measurements and cyberspace information. Nevertheless, this industry approach consumes a significant amount of time due to the manual efforts required in the analysis. In this work, we focus on addressing these crucial gaps by proposing a non-trivial approach of distinguishing undetected faults and cyberattacks in DER systems. Specifically, first, a special kind of dependency graph is constructed using a novel virtual physical variable-oriented taint analysis (PVOTA) algorithm. Then, the graph is simplified using an innovative node pruning technique, which is based on a set of context-dependent operations. Next, a set of patterns capturing domain-specific knowledge is derived to bridge the semantic gaps between the cyber and physical sides. Finally, these patterns are matched to the relevant events that occurred during failure incidents, and possible root causes are concluded based on the pattern matching results. In the end, the efficacy of our proposed automatic integrated analysis is evaluated through four case studies covering failure incidents caused by the FDI attack, undetected faults, and memory corruption attacks.
Authors:Hengyu Wu, Yang Cao
Abstract:
Training data is a critical and often proprietary asset in Large Language Model (LLM) development, motivating the use of data watermarking to embed model-transferable signals for usage verification. We identify low coverage as a vital yet largely overlooked requirement for practicality, as individual data owners typically contribute only a minute fraction of massive training corpora. Prior methods fail to maintain stealthiness, verification feasibility, or robustness when only one or a few sequences can be modified. To address these limitations, we introduce SLIM, a framework enabling per-user data provenance verification under strict black-box access. SLIM leverages intrinsic LLM properties to induce a Latent-Space Confusion Zone by training the model to map semantically similar prefixes to divergent continuations. This manifests as localized generation instability, which can be reliably detected via hypothesis testing. Experiments demonstrate that SLIM achieves ultra-low coverage capability, strong black-box verification performance, and great scalability while preserving both stealthiness and model utility, offering a robust solution for protecting training data in modern LLM pipelines.
Authors:Lei Hu, Sennur Ulukus
Abstract:
We provide new insights into an open problem recently posed by Yuan-Sun [ISIT 2025], concerning the minimum individual key rate required in the vector linear secure aggregation problem. Consider a distributed system with $K$ users, where each user $k\in [K]$ holds a data stream $W_k$ and an individual key $Z_k$. A server aims to compute a linear function $\mathbf{F}[W_1;\ldots;W_K]$ without learning any information about another linear function $\mathbf{G}[W_1;\ldots;W_K]$, where $[W_1;\ldots;W_K]$ denotes the row stack of $W_1,\ldots,W_K$. The open problem is to determine the minimum required length of $Z_k$, denoted as $R_k$, $k\in [K]$. In this paper, we characterize a new achievable region for the rate tuple $(R_1,\ldots,R_K)$. The region is polyhedral, with vertices characterized by a binary rate assignment $(R_1,\ldots,R_K) = (\mathbf{1}(1 \in \mathcal{I}),\ldots,\mathbf{1}(K\in \mathcal{I}))$, where $\mathcal{I}\subseteq [K]$ satisfies the \textit{rank-increment condition}: $\mathrm{rank}\left(\bigl[\mathbf{F}_{\mathcal{I}};\mathbf{G}_{\mathcal{I}}\bigr]\right) =\mathrm{rank}\bigl(\mathbf{F}_{\mathcal{I}}\bigr)+N$. Here, $\mathbf{F}_\mathcal{I}$ and $\mathbf{G}_\mathcal{I}$ are the submatrices formed by the columns indexed by $\mathcal{I}$. Our results uncover the novel fact that it is not necessary for every user to hold a key, thereby strictly enlarging the best-known achievable region in the literature. Furthermore, we provide a converse analysis to demonstrate its optimality when minimizing the number of users that hold keys.
Authors:Martin Perešíni, Tomáš Hladký, Jakub Kubík, Ivan Homoliak
Abstract:
The aim of this work is to enhance blockchain security by deepening the understanding of selfish mining attacks in various consensus protocols, especially the ones that have the potential to mitigate selfish mining. Previous research was mainly focused on a particular protocol with a single selfish miner, while only limited studies have been conducted on two or more attackers. To address this gap, we proposed a stochastic simulation framework that enables analysis of selfish mining with multiple attackers across various consensus protocols. We created the model of Proof-of-Work (PoW) Nakamoto consensus (serving as the baseline) as well as models of two additional consensus protocols designed to mitigate selfish mining: Fruitchain and Strongchain. Using our framework, thresholds reported in the literature were verified, and several novel thresholds were discovered for 2 and more attackers. We made the source code of our framework available, enabling researchers to evaluate any newly added protocol with one or more selfish miners and cross-compare it with already modeled protocols.
Authors:Chandra Thapa, Surya Nepal
Abstract:
Integrated Sensing and Communication (ISAC) represents a significant shift in the 6G landscape, where wireless networks both sense the environment and communicate. While prior comprehensive surveys have established foundational elements of ISAC security, discussed perception-focused security models, and proposed layered defense strategies, this paper synthesizes these studies into a comprehensive taxonomic framework that covers the whole ISAC security domain. This paper provides a systematic and thorough review of ISAC security across multiple orthogonal dimensions. These include threat taxonomy and propagation methods; vulnerability analysis at design, physical, computational, and architectural levels; defense mechanisms categorized by deployment layer; security-performance trade-offs with theoretical bounds; sector-specific security demands for critical infrastructure; and emerging issues such as quantum resilience, AI-hardening, and privacy preservation. Unlike previous frameworks that primarily focus on vision, this review combines these dimensions, introduces new classification schemes that reveal hidden relationships between threats and defenses, and identifies key research gaps through structured analysis. This detailed taxonomy offers a valuable reference for researchers developing secure ISAC systems and policymakers establishing security standards.
Authors:Muhammad Bilal, Omer Tariq, Hasan Ahmed
Abstract:
Timing and burst patterns can leak through encryption, and an adaptive adversary can exploit them. This undermines metadata-only detection in a stand-alone consumer gateway. Therefore, consumer gateways need streaming intrusion detection on encrypted traffic using metadata only, under tight CPU and latency budgets. We present a streaming IDS for stand-alone gateways that instantiates a lightweight two-state unit derived from Network-Optimised Spiking (NOS) dynamics per flow, named NOS-Gate. NOS-Gate scores fixed-length windows of metadata features and, under a $K$-of-$M$ persistence rule, triggers a reversible mitigation that temporarily reduces the flow's weight under weighted fair queueing (WFQ). We evaluate NOS-Gate under timing-controlled evasion using an executable 'worlds' benchmark that specifies benign device processes, auditable attacker budgets, contention structure, and packet-level WFQ replay to quantify queue impact. All methods are calibrated label-free via burn-in quantile thresholding. Across multiple reproducible worlds and malicious episodes, at an achieved $0.1%$ false-positive operating point, NOS-Gate attains 0.952 incident recall versus 0.857 for the best baseline in these runs. Under gating, it reduces p99.9 queueing delay and p99.9 collateral delay with a mean scoring cost of ~ 2.09 μs per flow-window on CPU.
Authors:Manish Bhatt, Adrian Wood, Idan Habler, Ammar Al-Kahfah
Abstract:
Production LLM agents with tool-using capabilities require security testing despite their safety training. We adapt Go-Explore to evaluate GPT-4o-mini across 28 experimental runs spanning six research questions. We find that random-seed variance dominates algorithmic parameters, yielding an 8x spread in outcomes; single-seed comparisons are unreliable, while multi-seed averaging materially reduces variance in our setup. Reward shaping consistently harms performance, causing exploration collapse in 94% of runs or producing 18 false positives with zero verified attacks. In our environment, simple state signatures outperform complex ones. For comprehensive security testing, ensembles provide attack-type diversity, whereas single agents optimize coverage within a given attack type. Overall, these results suggest that seed variance and targeted domain knowledge can outweigh algorithmic sophistication when testing safety-trained models.
Authors:Shini Girija, Pranav M. Pawar, Raja Muthalagu, Mithun Mukherjee
Abstract:
Privacy has always been a critical issue in the digital era, particularly with the increasing use of Internet of Things (IoT) devices. As the IoT continues to transform industries such as healthcare, smart cities, and home automation, it has also introduced serious challenges regarding the security of sensitive and private data. This paper examines the complex landscape of digital privacy in IoT ecosystems, highlighting the need to protect personally identifiable information (PII) of individuals and uphold their rights to digital independence. Global events, such as the COVID-19 pandemic, have accelerated the adoption of IoT, raising concerns about privacy and data protection. This paper provides an in-depth examination of digital privacy risks in the IoT domain and introduces a clear taxonomy for evaluating them using the IEEE Digital Privacy Model. The proposed framework categorizes privacy risks into five types: identity-oriented, behavioral, inference, data manipulation, and regulatory risks. We review existing digital privacy solutions, including encryption technologies, blockchain, federated learning, differential privacy, reinforcement learning, AI, and dynamic consent mechanisms, to mitigate these risks. We also highlight how these privacy-enhancing technologies (PETs) help with data confidentiality, access control, and trust management. Additionally, this study presents AURA-IoT, a futuristic framework that tackles AI-driven privacy risks through a multi-layered structure. AURA-IoT integrates adversarial robustness, explainability, transparency, fairness, compliance, dynamic consent, and policy enforcement mechanisms to ensure digital privacy, security, and accountable IoT operations. Finally, we discuss ongoing challenges and potential research directions for integrating AI and encryption-based privacy solutions to achieve comprehensive digital privacy in future IoT systems.
Authors:Arthur Amorim, Paul Gazzillo, Max Taylor, Lance Joneckis
Abstract:
Standard communication protocols for Unmanned Aerial Vehicles (UAVs), such as MAVLink, lack the capability to enforce the contextual validity of message sequences. Autopilots therefore remain vulnerable to stealthy attacks, where syntactically correct but semantically ill-timed commands induce unsafe states without triggering physical anomaly detectors. Prior work (DATUM) demonstrated that global Refined Multiparty Session Types (RMPSTs) are an effective specification language for centralized MAVLink protocol enforcement, but suffered from two engineering failures: manual proof terms interleaved with protocol definitions, and an OCaml extraction backend whose managed runtime is incompatible with resource-constrained UAV hardware. We present Platum, a framework that addresses both failures with a minimal DSL requiring only the five semantic components of a global session type (sender, receiver, label, payload variable, refinement predicate), whose structural well-formedness conditions are confirmed via reflective decision procedures in Meta-F*. Confirmed specifications are compiled directly into flat, allocation-free C Finite State Machines (FSMs), deployed as centralized proxy monitors at the GCS/UAV communication boundary. Our evaluation demonstrates a 4x reduction in total monitor latency and lower memory overhead compared to DATUM, measured via ArduPilot SITL simulation.
Authors:Zain Ul Abideen, Deepali Garg, Lawrence Pileggi, Samuel Pagliarini
Abstract:
Universal Circuits (UCs) offer a promising approach to hardware Intellectual Property (IP) obfuscation, leveraging cryptographic principles to hide both structure and function in a programmable logic fabric. Their adaptability makes them especially suitable for the globalized Integrated Circuit (IC) supply chain, where security against threats like reverse engineering is crucial. Despite the potential, UC security remains largely unexplored. This work evaluates UC security against state-of-the-art oracle-guided (OG) and oracle-less (OL) attacks. Results show near-random success rates (approx 50%) for OG attacks whereas OL attacks display minimal structural leakage. Collectively, these findings confirm the feasibility of UCs for IP protection.
Authors:Davide Colaiacomo, Chiara Bonfanti, Cataldo Basile
Abstract:
Translating security intent into deployable network enforcement rules and maintaining their effectiveness despite evolving cyber threats remains a largely manual process in most Security Operations Centers (SOCs). In large and heterogeneous networks, this challenge is complicated by topology-dependent reachability constraints and device-specific security control capabilities, making the process slow, error-prone, and a recurring source of misconfigurations. This paper presents RefinementEngine, an engine that automates the refinement of high-level security intents into low-level, deployment-ready configurations. Given a network topology, devices, and available security controls, along with high-level intents and Cyber Threat Intelligence (CTI) reports, RefinementEngine automatically generates settings that implement the desired intent, counter reported threats, and can be directly deployed on target security controls. The proposed approach is validated through real-world use cases on packet and web filtering policies derived from actual CTI reports, demonstrating both correctness, practical applicability, and adaptability to new data.
Authors:Khan Thamid Hasan, Md Ajoad Hasan, Nashmin Alam, Md. Touhidul Islam, Upoma Das, Farimah Farahmandi
Abstract:
As hardware systems grow in complexity, security verification must keep up with them. Recently, artificial intelligence (AI) and large language models (LLMs) have started to play an important role in automating several stages of the verification workflow by helping engineers analyze designs, reason about potential threats, and generate verification artifacts. This survey synthesizes recent advances in AI-assisted hardware security verification and organizes the literature along key stages of the workflow: asset identification, threat modeling, security test-plan generation, simulation-driven analysis, formal verification, and countermeasure reasoning. To illustrate how these techniques can be applied in practice, we present a case study using the open-source NVIDIA Deep Learning Accelerator (NVDLA), a representative modern hardware design. Throughout this study, we emphasize that while AI/LLM-based automation can significantly accelerate verification tasks, its outputs must remain grounded in simulation evidence, formal reasoning, and benchmark-driven evaluation to ensure trustworthy hardware security assurance.
Authors:Rui Bao, Zheng Gao, Xiaoyu Li, Xiaoyan Feng, Yang Song, Jiaojiao Jiang
Abstract:
Diffusion-based watermarking methods embed verifiable marks by manipulating the initial noise or the reverse diffusion trajectory. However, these methods share a critical assumption: verification can succeed only if the diffusion trajectory can be faithfully reconstructed. This reliance on trajectory recovery constitutes a fundamental and exploitable vulnerability. We propose $\underline{\mathbf{S}}$tochastic $\underline{\mathbf{Hi}}$dden-Trajectory De$\underline{\mathbf{f}}$lec$\underline{\mathbf{t}}$ion ($\mathbf{SHIFT}$), a training-free attack that exploits this common weakness across diverse watermarking paradigms. SHIFT leverages stochastic diffusion resampling to deflect the generative trajectory in latent space, making the reconstructed image statistically decoupled from the original watermark-embedded trajectory while preserving strong visual quality and semantic consistency. Extensive experiments on nine representative watermarking methods spanning noise-space, frequency-domain, and optimization-based paradigms show that SHIFT achieves 95%--100% attack success rates with nearly no loss in semantic quality, without requiring any watermark-specific knowledge or model retraining.
Authors:Chihan Huang, Huaijin Wang, Shuai Wang
Abstract:
The pervasive deployment of deep learning models across critical domains has concurrently intensified privacy concerns due to their inherent propensity for data memorization. While Membership Inference Attacks (MIAs) serve as the gold standard for auditing these privacy vulnerabilities, conventional MIA paradigms are increasingly constrained by the prohibitive computational costs of shadow model training and a precipitous performance degradation under low False Positive Rate constraints. To overcome these challenges, we introduce a novel perspective by leveraging the principles of model reprogramming as an active signal amplifier for privacy leakage. Building upon this insight, we present \texttt{ReproMIA}, a unified and efficient proactive framework for membership inference. We rigorously substantiate, both theoretically and empirically, how our methodology proactively induces and magnifies latent privacy footprints embedded within the model's representations. We provide specialized instantiations of \texttt{ReproMIA} across diverse architectural paradigms, including LLMs, Diffusion Models, and Classification Models. Comprehensive experimental evaluations across more than ten benchmarks and a variety of model architectures demonstrate that \texttt{ReproMIA} consistently and substantially outperforms existing state-of-the-art baselines, achieving a transformative leap in performance specifically within low-FPR regimes, such as an average of 5.25\% AUC and 10.68\% TPR@1\%FPR increase over the runner-up for LLMs, as well as 3.70\% and 12.40\% respectively for Diffusion Models.
Authors:Sangyi Wu, Junpu Guo, Xianghang Mi
Abstract:
Illicit online promotion is a persistent threat that evolves to evade detection. Existing moderation systems remain tethered to platform-specific supervision and static taxonomies, a reactive paradigm that struggles to generalize across domains or uncover novel threats. This paper presents a systematic study of In-Context Learning (ICL) as a unified framework for illicit promotion detection. Through rigorous analysis, we show that properly configured ICL achieves performance comparable to fine-tuned models using 22x fewer labeled examples. We demonstrate three key capabilities: (1) Generalization to unseen threats: ICL generalizes to new illicit categories without category-specific demonstrations, with a performance drop of less than 6% for most evaluated categories. (2) Autonomous discovery: A novel two-stage pipeline distills 2,900 free-form labels into coherent taxonomies, surfacing eight previously undocumented illicit categories such as usury and illegal immigration. (3) Cross-platform generalization: Deployed on 200,000 real-world samples from search engines and Twitter without adaptation, ICL achieves 92.6% accuracy. Furthermore, 61.8% of its uniquely flagged samples correspond to borderline or obfuscated content missed by existing detectors. Our findings position ICL as a new paradigm for content moderation, combining the precision of specialized classifiers with cross-platform generalization and autonomous threat discovery. By shifting to inference-time reasoning, ICL offers a path toward proactively adaptive moderation systems.
Authors:Munawar Hasan, Apostol Vassilev, Edward Griffor, Thoshitha Gamage
Abstract:
The application of zero-knowledge proofs (ZKPs) in autonomous systems is an emerging area of research, motivated by the growing need for regulatory compliance, transparent auditing, and trustworthy operation in decentralized environments. zk-SNARK is a powerful cryptographic tool that allows a party (the prover) to prove to another party (the verifier) that a statement about its own internal state is true, without revealing sensitive or proprietary data about that state. This paper proposes Hermes Seal: a zk-SNARK-based ZKP framework for enabling privacy-preserving, verifiable communication in vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) networks. The framework allows autonomous systems to generate cryptographic proofs of perception and decision-related computations without revealing proprietary models, sensor data, or internal system states, thereby supporting interoperability across heterogeneous autonomous systems. We present two real-world case studies implemented and empirically evaluated within our framework, demonstrating a step toward verifiable autonomous system information exchanges. The first demonstrates real-time proof generation and verification, achieving 8 ms proof generation and 1 ms verification on a GPU, while the second evaluates the performance of an autonomous vehicle perception stack, enabling proof of computation without exposing proprietary or confidential data. Furthermore, the framework can be integrated into AV perception stacks to facilitate verifiable interoperability and privacy-preserving cooperative perception. The demonstration code for this project is open source, available on Github.
Authors:Junling Fan, David Koblah, Domenic Forte
Abstract:
Semiconductor intellectual property (IP) theft incurs hundreds of billions in annual losses, driven by advanced reverse engineering (RE) techniques. Traditional ``cryptic'' IC camouflaging methods typically focus on hiding localized gate functionality but remain vulnerable to system-level structural analysis. This paper explores ``mimetic deception,'' where a functional IP (F) is designed to structurally and visually masquerade as a completely different appearance IP (A). We provide a comprehensive evaluation of three deceptive methodologies: IP Camouflage, Graph Matching, and DNAS-NAND Gate Array, analyzing their resilience against GNN-based node classification, and Differential Power Analysis (DPA). Crucially, we demonstrate that mimetic deception achieves a novel anti-side-channel defense: by forcing the mis-classification of cryptographic primitives, the adversary is led to apply an incorrect power model, causing the DPA attack to fail. Our results validate that this multi-layered approach effectively thwarts the entire RE toolchain by poisoning the structural and logical data used for netlist understanding.
Authors:Muhammad Liman Gambo, Ahmad Almulhem
Abstract:
Micro-segmentation as a core requirement of zero trust architecture (ZTA) divides networks into small security zones, called micro-segments, thereby minimizing impact of security breaches and restricting lateral movement of attackers. Existing approaches for Industrial Internet of Things (IIoT) networks often remain centralized, static, or difficult to interpret. These limitations are critical in IIoT, where devices are heterogeneous, communication behavior evolves over time, and raw data sharing across sites is often undesirable. Accordingly, we propose EFAH-ZTM, an Explainable Federated Autoencoder-Hypergraph framework for Zero Trust micro-segmentation in IIoT networks. The framework includes a trained federated DNAE that learns behavioral embeddings from distributed clients. kNN-based and Manifold-based hypergraphs capture higher-order relationships among device-flow instances. To generate micro-segments, MiniBatch KMeans and HDBSCAN clustering techniques are applied on the spectral embeddings, while an operational risk score that combines reconstruction error and structural outlierness drives allow/block policy decisions. Trustworthiness of the policy decision is improved through feature-level explanations using LIME and SHAP. Experiments on the WUSTL-IIoT-2021 dataset show that HDBSCAN achieved the strongest structural quality, while the manifold-based hypergraph produces the best oracle-aligned security efficacy that reaches a purity of 0.9990 with near-zero contamination. Similarly, the explainability module also showed high fidelity and stability, with surrogate classifier having an accuracy of 0.9927 and stable explanations across runs. Moreover, an ablation analysis shows that the federated learning preserves competitive segmentation quality relative to centralized training, and the hypergraph modeling significantly improves structural separation and risk stratification.
Authors:Taro Tsuchiya, Haoxiang Yu, Tina Marjanov, Alice Hutchings, Nicolas Christin, Alejandro Cuevas
Abstract:
Telegram, initially a messaging app, has evolved into a platform where users can interact with various services through programmable applications, bots. Bots provide a wide range of uses, from moderating groups, helping with online shopping, to even executing trades in financial markets. However, Telegram has been increasingly associated with various illicit activities -- financial scams, stolen data, non-consensual image sharing, among others, raising concerns bots may be facilitating these operations. This paper is the first to characterize Telegram bots at scale, through the following contributions. First, we offer the largest general-purpose message dataset and the first bot dataset. Through snowball sampling from two published datasets, we uncover over 67,000 additional channels, 492 million messages, and 32,000 bots. Second, we develop a system to automatically interact with bots in order to extract their functionality. Third, based on their description, chat responses, and the associated channels, we classify bots into several domains. Fourth, we investigate the communities each bot serves, by analyzing supported languages, usage patterns (e.g., duration, reuse), and network topology. While our analysis discovers useful applications such as crowdsourcing, we also identify malicious bots (e.g., used for financial scams, illicit underground services) serving as payment gateways, referral systems, and malicious AI endpoints. By exhorting the research community to look at bots as software infrastructure, this work hopes to foster further research useful to content moderators, and to help interventions against illicit activities.
Authors:James Bell-Clark, Albert Cheu, Adria Gascon, Jonathan Katz
Abstract:
In this work, we identify a set of side-channels in our Confidential Federated Compute platform that a hypothetical insider could exploit to circumvent differential privacy (DP) guarantees. We show how DP can mitigate two of the side-channels, one of which has been implemented in our open-source library.
Authors:Trung V. Phan, Thomas Bauschert
Abstract:
Advanced Persistent Threats (APTs) are stealthy, multi-stage attacks that require adaptive and timely defense. While deep reinforcement learning (DRL) enables autonomous cyber defense, its decisions are often opaque and difficult to trust in operational environments. This paper presents DeepXplain, an explainable DRL framework for stage-aware APT defense. Building on our prior DeepStage model, DeepXplain integrates provenance-based graph learning, temporal stage estimation, and a unified XAI pipeline that provides structural, temporal, and policy-level explanations. Unlike post-hoc methods, explanation signals are incorporated directly into policy optimization through evidence alignment and confidence-aware reward shaping. To the best of our knowledge, DeepXplain is the first framework to integrate explanation signals into reinforcement learning for APT defense. Experiments in a realistic enterprise testbed show improvements in stage-weighted F1-score (0.887 to 0.915) and success rate (84.7% to 89.6%), along with higher explanation confidence (0.86), improved fidelity (0.79), and more compact explanations (0.31). These results demonstrate enhanced effectiveness and trustworthiness of autonomous cyber defense.
Authors:Sara Aguincha, Emanuel Nunes, Samih Eisa, Miguel L. Pardal
Abstract:
Sensor technologies have evolved to a point where it is now practical to monitor products along the supply chain. The collected data can be stored in a decentralized way using blockchain technology. However, ensuring the reliability of the sensed data is a critical challenge. In other words, we need to trust the data that we write to the blockchain. In this work, we propose ChainGuards, a decentralized system that uses product-specific rules to verify data collected across the supply chain, with particular focus on sensor-derived information, issuing warnings and triggering audits when anomalies are detected. We evaluated ChainGuards using data from a real cherry supply chain deployment. The result shows that the implemented solution provides reliable verification of supply chain data with low performance overhead, able to correctly detect data discrepancies and inconsistencies.
Authors:Ravish Gupta, Saket Kumar, Shreeya Sharma, Maulik Dang, Abhishek Aggarwal
Abstract:
Getting a real cybersecurity risk assessment for a small organization is expensive -- a NIST CSF-aligned engagement runs $15,000 on the low end, takes weeks, and depends on practitioners who are genuinely scarce. Most small companies skip it entirely. We built a six-agent AI system where each agent handles one analytical stage: profiling the organization, mapping assets, analyzing threats, evaluating controls, scoring risks, and generating recommendations. Agents share a persistent context that grows as the assessment proceeds, so later agents build on what earlier ones concluded -- the mechanism that distinguishes this from standard sequential agent pipelines. We tested it on a 15-person HIPAA-covered healthcare company and compared outputs to independent assessments by three CISSP practitioners -- the system agreed with them 85% of the time on severity classifications, covered 92% of identified risks, and finished in under 15 minutes. We then ran 30 repeated single-agent assessments across five synthetic but sector-realistic organizational profiles in healthcare, fintech, manufacturing, retail, and SaaS, comparing a general-purpose Mistral-7B against a domain fine-tuned model. Both completed every run. The fine-tuned model flagged threats the baseline could not see at all: PHI exposure in healthcare, OT/IIoT vulnerabilities in manufacturing, platform-specific risks in retail. The full multi-agent pipeline, however, failed every one of 30 attempts on a Tesla T4 with its 4,096-token default context window -- context capacity, not model quality, turned out to be the binding constraint.
Authors:Jin Xie, Songze Li, Guang Cheng
Abstract:
Retrieval-Augmented Generation (RAG) systems introduce a critical vulnerability: contextual leakage, where adversaries exploit instruction-following to exfiltrate Personally Identifiable Information (PII) via adaptive extraction. Current defenses force a rigid trade-off between semantic utility and latency. We present SEAL-Tag, a privacy-preserving runtime environment that resolves this via a Verify-then-Route paradigm. SEAL-Tag introduces the SEAL-Probe protocol, transforming auditing into a structured tool-use operation where the model generates a verifiable PII-Evidence Table (PET) alongside its draft. To adjudicate this evidence, we employ a Probabilistic Circuit (PC) that enforces verifiable logical constraints for robust decision-making. To overcome the privacy "Cold Start" problem, we introduce the S0--S6 Anchored Synthesis Pipeline, generating high-fidelity, provenanced RAG interactions. We pair this with a Two-Stage Curriculum that first optimizes for entity detection before aligning the model to the rigorous audit protocol. Our evaluation demonstrates that SEAL-Tag establishes a new Pareto frontier, reducing adaptive leakage by over 8$\times$ while matching the utility and speed of unsafe baselines.
Authors:Shihan Zhang, Bing Han, Chuanyong Tian, Ruisheng Shi, Lina Lan, Qin Wang
Abstract:
Privacy protection mechanisms are a fundamental aspect of security in cryptocurrency systems, particularly in decentralized networks such as Bitcoin. Although Bitcoin addresses are not directly associated with real-world identities, this does not fully guarantee user privacy. Various deanonymization solutions have been proposed, with network layer deanonymization attacks being especially prominent. However, existing approaches often exhibit limitations such as low precision. In this paper, we propose \textit{NTSSL}, a novel and efficient transaction deanonymization method that integrates network traffic analysis with semi-supervised learning. We use unsupervised learning algorithms to generate pseudo-labels to achieve comparable performance with lower costs. Then, we introduce \textit{NTSSL+}, a cross-layer collaborative analysis integrating transaction clustering results to further improve accuracy. Experimental results demonstrate a substantial performance improvement, 1.6 times better than the existing approach using machining learning.
Authors:Trung V. Phan, Tri Gia Nguyen, Thomas Bauschert
Abstract:
This paper presents DeepStage, a deep reinforcement learning (DRL) framework for adaptive, stage-aware defense against Advanced Persistent Threats (APTs). The enterprise environment is modeled as a partially observable Markov decision process (POMDP), where host provenance and network telemetry are fused into unified provenance graphs. Building on our prior work, StageFinder, a graph neural encoder and an LSTM-based stage estimator infer probabilistic attacker stages aligned with the MITRE ATT&CK framework. These stage beliefs, combined with graph embeddings, guide a hierarchical Proximal Policy Optimization (PPO) agent that selects defense actions across monitoring, access control, containment, and remediation. Evaluated in a realistic enterprise testbed using CALDERA-driven APT playbooks, DeepStage achieves a stage-weighted F1-score of 0.89, outperforming a risk-aware DRL baseline by 21.9%. The results demonstrate effective stage-aware and cost-efficient autonomous cyber defense.
Authors:Florian Holzbauer, David Schmidt, Gabriel Gegenhuber, Sebastian Schrittwieser, Johanna Ullrich
Abstract:
Agent skills extend local AI agents, such as Claude Code or Open Claw, with additional functionality, and their popularity has led to the emergence of dedicated skill marketplaces, similar to app stores for mobile applications. Simultaneously, automated skill scanners were introduced, analyzing the skill description available in SKILL.md, to verify their benign behavior. The results for individual market places mark up to 46.8% of skills as malicious. In this paper, we present the largest empirical security analysis of the AI agent skill ecosystem, questioning this high classification of malicious skills. Therefore, we collect 238,180 unique skills from three major distribution platforms and GitHub to systematically analyze their type and behavior. This approach substantially reduces the number of skills flagged as non-benign by security scanners to only 0.52% which remain in malicious flagged repositories. Consequently, out methodology substantially reduces false positives and provides a more robust view of the ecosystem's current risk surface. Beyond that, we extend the security analysis from the mere investigation of the skill description to a comparison of its congruence with the GitHub repository the skill is embedded in, providing additional context. Furthermore, our analysis also uncovers several, by now undocumented real-world attack vectors, namely hijacking skills hosted on abandoned GitHub repositories.
Authors:Isha Andrade, Shalaka S Mahadik, Mithun Mukherjee, Pranav M Pawar, Raja Muthalagu
Abstract:
The proliferation of large-scale IoT networks has been both a blessing and a curse. Not only has it revolutionized the way organizations operate by increasing the efficiency of automated procedures, but it has also simplified our daily lives. However, while IoT networks have improved convenience and connectivity, they have also increased security risk due to unauthorized devices gaining access to these networks and exploiting existing weaknesses with specific attack types. The research proposes two lightweight deep learning (DL)-based intelligent intrusion detection systems (IDS). to enhance the security of IoT networks: the proposed convolutional neural network (CNN)-based IDS and the proposed long short-term memory (LSTM)-based IDS. The research evaluated the performance of both intelligent IDSs based on DL using the CICIoT2023 dataset. DL-based intelligent IDSs successfully identify and classify various cyber threats using binary, grouped, and multi-class classification. The proposed CNN-based IDS achieves an accuracy of 99.34%, 99.02% and 98.6%, while the proposed LSTM-based IDS achieves an accuracy of 99.42%, 99.13%, and 98.68% for binary, grouped, and multi-class classification, respectively.
Authors:Jiaqi Liu, Yuanyi Zhang, Fang-Wei Fu
Abstract:
We construct a lattice-based ciphertext-policy attribute-based encryption (CP-ABE) scheme for $\mathsf{NC}^1$ access policies with constant-size ciphertexts. Let $λ$ be the security parameter. For an $\mathsf{NC}^1$ circuit of depth $d$ and size $s$ on $\ell$-bit inputs, our scheme has the public-key and ciphertext sizes $O(1)$ (independent of $d$), and secret-key size $O(\ell)$, where the $O(\cdot)$ hides $\operatorname{poly}(λ)$ factors. As an application, we obtain a broadcast encryption scheme for $N$ users with ciphertext size $\operatorname{poly}(λ)$ independent of $\log N$ and key sizes $\operatorname{poly}(λ,\log N)$. Our construction is selectively secure in the standard model under the $\operatorname{poly}(λ)$-succinct LWE assumption introduced by Wee (CRYPTO~2024).
Authors:Mahsa Tahghigh, Hassan Salmani
Abstract:
Always-on hardware Trojans pose a serious challenge to integrated circuit trust, as they remain active during normal operation and are difficult to detect in post-deployment settings without trusted golden references. This paper presents a reference-free detection framework based on cross-scale persistence analysis of electromagnetic (EM) side-channels, targeting always-on parasitic hardware behavior. The proposed method analyzes EM emissions across multiple time-frequency resolutions and constructs stability maps that capture the consistency of spectral features over repeated executions. Gaussian Mixture Models (GMMs) with Bayesian Information Criterion (BIC) based model selection are used to characterize statistical structure at each scale. We introduce cross-scale saturation, variability, and median mixture complexity metrics that quantify whether statistical structure evolves naturally or remains persistently anchored across resolutions. Experimental results on AES implementations show that Trojan-free designs exhibit scale-dependent variability consistent with transient switching behavior, while always-on Trojans produce persistent statistical signatures that suppress cross-scale evolution. Furthermore, different Trojan classes, such as workload-correlated leakage-information Trojans and independent ring-oscillator Trojans, exhibit distinct persistence patterns. These findings demonstrate that cross-scale persistence provides a physically interpretable and robust assurance signal for unsupervised, reference-free detection of always-on hardware Trojans.
Authors:Supriya Khadka, Sanchari Das
Abstract:
In decentralized web applications, users face an inherent conflict between public verifiability and personal privacy. To participate in regulated on-chain services, users must currently disclose sensitive identity documents to centralized intermediaries, permanently linking real-world identities to public transaction histories. This binary choice between total privacy loss or total exclusion strips users of agency and exposes them to persistent surveillance. In this work, we introduce a Selective Disclosure Framework designed to restore user sovereignty by decoupling eligibility verification from identity revelation. We present ZK-Compliance, a prototype that leverages browser-based zero-knowledge proofs to shift the interaction model, enabling users to prove specific attributes (e.g., "I am over 18") locally without revealing the underlying data. We implement a user-governed Grant, Verify, Revoke lifecycle that transforms the user's mental model of compliance from a permanent data handover into a dynamic, revocable authorization session. Our evaluation shows that client-side proof generation takes under 200ms, enabling a seamless interactive experience on commodity hardware. This work provides early evidence that regulatory compliance need not come at the cost of user privacy or autonomy.
Authors:Ruyi Zhang, Heng Gao, Songlei Jian, Yusong Tan, Haifang Zhou
Abstract:
Backdoor attacks compromise model reliability by using triggers to manipulate outputs. Trigger inversion can accurately locate these triggers via a generator and is therefore critical for backdoor defense. However, the discrete nature of text prevents existing noise-based trigger generator from being applied to nature language processing (NLP). To overcome the limitations, we employ the rich knowledge embedded in large language models (LLMs) and propose a Backdoor defender powered by LLM Trigger Generator, termed BadLLM-TG. It is optimized through prompt-driven reinforcement learning, using the victim model's feedback loss as the reward signal. The generated triggers are then employed to mitigate the backdoor via adversarial training. Experiments show that our method reduces the attack success rate by 76.2\% on average, outperforming the second-best defender by 13.7.
Authors:Yuyang Xia, Yaoqiang Xu, Chen Qian, Yang Li, Guoliang Li, Jianhua Feng
Abstract:
Watermarking has emerged as an effective solution for copyright protection of synthetic data. However, applying watermarking techniques to synthetic tabular data presents challenges, as tabular data can easily lose their watermarks through shuffling or deletion operations. The major challenge is to provide traceability for tracking multiple users of the watermarked tabular data while maintaining high data utility and robustness (resistance to attacks). To address this, we design a multi-bit watermarking scheme TableMark that encodes watermarks into synthetic tabular data, ensuring superior traceability and robustness while maintaining high utility. We formulate the watermark encoding process as a constrained optimization problem, allowing the data owner to effectively trade off robustness and utility. Additionally, we propose effective optimization mechanisms to solve this problem to enhance the data utility. Experimental results on four widely used real-world datasets show that TableMark effectively traces a large number of users, is resilient to attacks, and preserves high utility. Moreover, TableMark significantly outperforms state-of-the-art tabular watermarking schemes.
Authors:Zheng Gao, Yifan Yang, Xiaoyu Li, Xiaoyan Feng, Haoran Fan, Yang Song, Jiaojiao Jiang
Abstract:
Watermarking the initial noise of diffusion models has emerged as a promising approach for image provenance, but content-independent noise patterns can be forged via inversion and regeneration attacks. Recent semantic-aware watermarking methods improve robustness by conditioning verification on image semantics. However, their reliance on a single global semantic binding makes them vulnerable to localized but globally coherent semantic edits. To address this limitation and provide a trustworthy semantic-aware watermark, we propose $\underline{\textbf{S}}$emantic $\underline{\textbf{L}}$atent $\underline{\textbf{I}}$njection via $\underline{\textbf{C}}$ompartmentalized $\underline{\textbf{E}}$mbedding ($\textbf{SLICE}$). Our framework decouples image semantics into four semantic factors (subject, environment, action, and detail) and precisely anchors them to distinct regions in the initial Gaussian noise. This fine-grained semantic binding enables advanced watermark verification where semantic tampering is detectable and localizable. We theoretically justify why SLICE enables robust and reliable tamper localization and provides statistical guarantees on false-accept rates. Experimental results demonstrate that SLICE significantly outperforms existing baselines against advanced semantic-guided regeneration attacks, substantially reducing attack success while preserving image quality and semantic fidelity. Overall, SLICE offers a practical, training-free provenance solution that is both fine-grained in diagnosis and robust to realistic adversarial manipulations.
Authors:Charles Ye, Jasmine Cui, Dylan Hadfield-Menell
Abstract:
Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models infer roles from how text is written, not where it comes from. We design novel role probes to capture how models internally identify "who is speaking." These reveal why prompt injection works: untrusted text that imitates a role inherits that role's authority. We test this insight by injecting spoofed reasoning into user prompts and tool outputs, achieving average success rates of 60% on StrongREJECT and 61% on agent exfiltration, across multiple open- and closed-weight models with near-zero baselines. Strikingly, the degree of internal role confusion strongly predicts attack success before generation begins. Our findings reveal a fundamental gap: security is defined at the interface but authority is assigned in latent space. More broadly, we introduce a unifying, mechanistic framework for prompt injection, demonstrating that diverse prompt-injection attacks exploit the same underlying role-confusion mechanism.
Authors:Sarbartha Banerjee, Prateek Sahu, Anjo Vahldiek-Oberwagner, Jose Sanchez Vicarte, Mohit Tiwari
Abstract:
Rapid progress in generative AI has given rise to Compound AI systems - pipelines comprised of multiple large language models (LLM), software tools and database systems. Compound AI systems are constructed on a layered traditional software stack running on a distributed hardware infrastructure. Many of the diverse software components are vulnerable to traditional security flaws documented in the Common Vulnerabilities and Exposures (CVE) database, while the underlying distributed hardware infrastructure remains exposed to timing attacks, bit-flip faults, and power-based side channels. Today, research targets LLM-specific risks like model extraction, training data leakage, and unsafe generation -- overlooking the impact of traditional system vulnerabilities. This work investigates how traditional software and hardware vulnerabilities can complement LLM-specific algorithmic attacks to compromise the integrity of a compound AI pipeline. We demonstrate two novel attacks that combine system-level vulnerabilities with algorithmic weaknesses: (1) Exploiting a software code injection flaw along with a guardrail Rowhammer attack to inject an unaltered jailbreak prompt into an LLM, resulting in an AI safety violation, and (2) Manipulating a knowledge database to redirect an LLM agent to transmit sensitive user data to a malicious application, thus breaching confidentiality. These attacks highlight the need to address traditional vulnerabilities; we systematize the attack primitives and analyze their composition by grouping vulnerabilities by their objective and mapping them to distinct stages of an attack lifecycle. This approach enables a rigorous red-teaming exercise and lays the groundwork for future defense strategies.
Authors:Sunpill Kim, Chanwoo Hwang, Minsu Kim, Jae Hong Seo
Abstract:
Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabilities fundamentally undermines modern deepfake detectors. Rather than proposing a new image manipulation technique, we study a realistic and already-deployed usage scenario in which an adversary uses only benign, policy-compliant prompts and commercial generative AI systems. We demonstrate that state-of-the-art deepfake detection methods fail under semantic-preserving image refinement. Specifically, we show that generative AI systems articulate explicit authenticity criteria and inadvertently externalize them through unrestricted reasoning, enabling their direct reuse as refinement objectives. As a result, refined images simultaneously evade detection, preserve identity as verified by commercial face recognition APIs, and exhibit substantially higher perceptual quality. Importantly, we find that widely accessible commercial chatbot services pose a significantly greater security risk than open-source models, as their superior realism, semantic controllability, and low-barrier interfaces enable effective evasion by non-expert users. Our findings reveal a structural mismatch between the threat models assumed by current detection frameworks and the actual capabilities of real-world generative AI. While detection baselines are largely shaped by prior benchmarks, deployed systems expose unrestricted authenticity reasoning and refinement despite stringent safety controls in other domains.
Authors:Nanzi Yang, Weiheng Bai, Kangjie Lu
Abstract:
The Model Context Protocol (MCP) is a recently proposed interoperability standard that unifies how AI agents connect with external tools and data sources. By defining a set of common client-server message exchange clauses, MCP replaces fragmented integrations with a standardized, plug-and-play framework. However, to be compatible with diverse AI agents, the MCP specification relaxes many behavioral constraints into optional clauses, leading to misuse-prone SDK implementation. We identify it as a new attack surface that allows adversaries to achieve multiple attacks (e.g, silent prompt injection, DoS, etc.), named as \emph{compatibility-abusing attacks}. In this work, we present the first systematic framework for analyzing this new attack surface across multi-language MCP SDKs. First, we construct a universal and language-agnostic intermediate representation (IR) generator that normalizes SDKs of different languages. Next, based on the new IR, we propose auditable static analysis with LLM-guided semantic reasoning for cross-language/clause compliance analysis. Third, by formalizing the attack semantics of the MCP clauses, we build three attack modalities and develop a modality-guided pipeline to uncover exploitable non-compliance issues.
Authors:Prabhudarshi Nayak, Gogulakrishnan Thiyagarajan, Ritunsa Mishra, Vinay Bist
Abstract:
Collaborative threat intelligence via federated learning (FL) faces critical risks from quantum computing, which can compromise classical encryption methods. This study proposes a quantum-secure FL framework using post-quantum cryptography (PQC) to protect cross-organizational data sharing. We expose vulnerabilities in traditional FL through simulated quantum attacks on RSA encrypted gradients and introduce a hybrid architecture integrating NIST-standardized algorithms CRYSTALS-Kyber for key exchange and CRYSTALS-Dilithium for authentication. Testing on APT attack datasets demonstrated 97.6% threat detection accuracy with minimal latency overhead (18.7%), validating real-world viability. A healthcare consortium case study confirmed secure ransomware indicator sharing without breaching privacy regulations. The work highlights the urgency of quantum ready defenses and provides technical guidelines for deploying PQC in FL systems, alongside policy recommendations for standardizing quantum resilience in threat-sharing networks.
Authors:Trung V. Phan, Thomas Bauschert
Abstract:
Advanced Persistent Threats (APTs) evolve through multiple stages, each exhibiting distinct temporal and structural behaviors. Accurate stage estimation is critical for enabling adaptive cyber defense. This paper presents StageFinder, a temporal graph learning framework for multi-stage attack progression inference from fused host and network provenance data. Provenance graphs are encoded using a graph neural network to capture structural dependencies among processes, files, and connections, while a long short-term memory (LSTM) model learns temporal dynamics to estimate stage probabilities aligned with the MITRE ATT&CK framework. The model is pretrained on the DARPA OpTC dataset and fine-tuned on labeled DARPA Transparent Computing data. Experimental results demonstrate that StageFinder achieves a macro F1-score of 0.96 and reduces prediction volatility by 31 percent compared to state-of-the-art baselines (Cyberian, NetGuardian). These results highlight the effectiveness of fused provenance and temporal learning for accurate and stable APT stage inference.
Authors:Eduard Hirsch, Kristina Raab, Tobias J. Bauer, Daniel Loebenberger
Abstract:
IT systems are facing an increasing number of security threats, including advanced persistent attacks and future quantum-computing vulnerabilities. The move towards crypto-agility and post-quantum cryptography (PQC) requires a reliable inventory of cryptographic assets across heterogeneous IT environments. Due to the sheer amount of packets, it is infeasible to manually detect cryptographically relevant software. Further, static code analysis pipelines often fail to address the diversity of modern ecosystems. Our research explores the use of large language models (LLMs) as heuristic tools for cryptographic asset discovery. We propose a collaborative framework that employs multiple LLMs to assess software relevance and aggregates their outputs through majority voting. To preserve data privacy, the approach operates on-premises without reliance on external servers. Using over 65,000 Fedora Linux packages, we evaluate the reliability of this method through statistical analysis, inter-model agreement, and manual validation. Preliminary results suggest that~LLM ensembles can serve as an efficient first-pass filter for identifying cryptographic software, resulting in reduced manual workload and assisting PQC transition. The study also compares on-premises and online LLM configurations, highlighting key advantages, limitations, and future directions for automated cryptographic asset discovery.
Authors:Charles Meyers, Aaron MacSween, Erik Elmroth, Tommy Löfstedt
Abstract:
The recent developments in machine learning have highlighted a conflict between online platforms and their users in terms of privacy. The importance of user privacy and the struggle for power over user data has been intensified as regulators and operators attempt to police online platforms. As users have become increasingly aware of privacy issues, client-side data storage, management, and analysis have become a favoured approach to large-scale centralised machine learning. However, state-of-the-art machine learning methods require vast amounts of labelled user data, making them unsuitable for models that reside client-side and only have access to a single user's data. State-of-the-art methods are also computationally expensive, which degrades the user experience on compute-limited hardware and also reduces battery life. A recent alternative approach has proven remarkably successful in classification tasks across a wide variety of data -- using a compression-based distance measure (called normalised compression distance) to measure the distance between generic objects in classical distance-based machine learning methods. In this work, we demonstrate that the normalised compression distance is actually not a metric; develop it for the wider context of kernel methods to allow modelling of complex data; and present techniques to improve the training time of models that use this distance measure. We demonstrate that the normalised compression distance works as well as and sometimes better than other metrics and kernels -- while requiring only marginally more computational costs and in spite of the lack of formal metric properties. The end results is a simple model with remarkable accuracy even when trained on a very small number of samples allowing for models that are small and effective enough to run entirely on a client device using only user-supplied data.
Authors:Callum Turino, William J Buchanan, Owen Lo, Christoph Thuummler
Abstract:
Advances in quantum computing threaten digital communication security by undermining the foundations of current public-key cryptography through Shor's quantum algorithm. This has driven the development of Post-Quantum Cryptography (PQC), a new set of algorithms resistant to quantum attacks. While NIST has standardised several PQC schemes, challenges remain in their adoption. This paper introduces the PQC-LEO framework, a benchmarking suite designed to automate the evaluation of PQC computational and networking performance across x86 and ARM architectures. A proof-of-concept evaluation was conducted to demonstrate the framework's capabilities and highlight its application in supporting ongoing research on the adoption of PQC algorithms. The results show that there is a greater performance reduction in implementing PQC methods with higher security on ARM architectures than on the x86 architecture.
Authors:Md Sadik Awal, Md Tauhidur Rahman
Abstract:
Electromagnetic (EM) shielding is widely used to suppress radiated emissions and limit passive EM side-channel leakage. However, shielding does not address active probing, where an adversary injects external radio-frequency (RF) signals and observes the device's reflective response. This work studies whether such impedance-modulated backscattering persists when radiated emissions are suppressed by shielding. By injecting controlled RF signals and analyzing the reflections, we demonstrate that state-dependent impedance variations remain observable at frequencies outside the shields' primary attenuation band. Using processors implemented on FPGA and microcontroller prototypes, and evaluating workload profiles under three industry-standard shields, we find that passive EM measurements lose discriminative power under shielding, while backscattering responses remain separable. These results indicate that active RF probing can expose execution-dependent behavior even in shielded systems, motivating the need to consider active impedance-based probing within hardware security evaluation flows.
Authors:Wagner Comin Sonaglio, Ágney Lopes Roth Ferraz, Lourenço Alves Pereira Júnior
Abstract:
This paper examines how logical vulnerabilities in 5G Standalone networks affect UAV command and control communication. The study looks at three attacker positions in the architecture: a malicious user equipment (UE) connected to the same logical network as the UAV, an attacker with access to the 5G core, and a compromised gNodeB. To test these scenarios, a testbed was created using Open5GS, UERANSIM, and Kubernetes. The setup simulates a UAV-GCS communication system over a 5G SA network and allows for controlled attacks on various network interfaces. The experiments reveal that attacks at different points in the architecture can disrupt UAV operations. These disruptions include manipulating control commands and terminating data sessions. The findings emphasize the need for isolation measures in the 5G user plane and integrity protection in UAV command protocols.
Authors:Zhuoran Pan, Yue Li, Zhi Guan, Jianbin Hu, Zhong Chen
Abstract:
The increasingly complex Web3 ecosystem and decentralized finance (DeFi) landscape demand ever higher levels of technical expertise and financial literacy from participants. The Intent-Centric paradigm in DeFi has thus emerged in response, which allows users to focus on their trading intents rather than the underlying execution details. However, existing approaches, including Typed-intent design and LLM-driven solver, trade off expressiveness, trust, privacy, and composability. We present OMNIINTENT, a language-runtime co-design that reconciles these requirements. OMNIINTENT introduces ICL, a domain-specific Intent-Centric Language for precise yet flexible specification of triggers, actions, and runtime constraints; a Trusted Execution Environment (TEE)-based compiler that compiles intents into signed, state-bound transactions inside an enclave; and an execution optimizer that constructs transaction dependency graphs for safe parallel batch submission and a mempool-aware feasibility checker that predicts execution outcomes. Our full-stack prototype processes diverse DeFi scenarios, achieving 89.6% intent coverage, up to 7.3x throughput speedup via parallel execution, and feasibility-prediction accuracy up to 99.2% with low latency.
Authors:Chiara Bonfanti, Davide Colaiacomo, Luca Cagliero, Cataldo Basile
Abstract:
Web security demands rapid response capabilities to evolving cyber threats. Agentic Artificial Intelligence (AI) promises automation, but the need for trustworthy security responses is of the utmost importance. This work investigates the role of semantic relations in extracting information for sensitive operational tasks, such as configuring security controls for mitigating threats. To this end, it proposes to leverage hypernym-hyponym textual relations to extract relevant information from Cyber Threat Intelligence (CTI) reports. By leveraging a neuro-symbolic approach, the multi-agent system automatically generates CLIPS code for an expert system creating firewall rules to block malicious network traffic. Experimental results show the superior performance of the hypernym-hyponym retrieval strategy compared to various baselines and the higher effectiveness of the agentic approach in mitigating threats.
Authors:Seoksu Lee, Sangjun An, Eun-Sun Cho
Abstract:
We propose Scrambler, and e-graph-based MBA obfuscation tool using Equality Expansion to efficiently generate complex and diverse expressions with equivalence guaranteed by construction. Experiments show Scrambler improves existing tools in expressiveness and complexity.
Authors:Chanwoo Hwang, Sunpill Kim, Yong Kiam Tan, Tianchi Liu, Seunghun Paik, Dongsoo Kim, Mondal Soumik, Khin Mi Mi Aung, Jae Hong Seo
Abstract:
Advances in deep learning have enabled the widespread deployment of speaker recognition systems (SRSs), yet they remain vulnerable to score-based impersonation attacks. Existing attacks that operate directly on raw waveforms require a large number of queries due to the difficulty of optimizing in high-dimensional audio spaces. Latent-space optimization within generative models offers improved efficiency, but these latent spaces are shaped by data distribution matching and do not inherently capture speaker-discriminative geometry. As a result, optimization trajectories often fail to align with the adversarial direction needed to maximize victim scores. To address this limitation, we propose an inversion-based generative attack framework that explicitly aligns the latent space of the synthesis model with the discriminative feature space of SRSs. We first analyze the requirements of an inverse model for score-based attacks and introduce a feature-aligned inversion strategy that geometrically synchronizes latent representations with speaker embeddings. This alignment ensures that latent updates directly translate into score improvements. Moreover, it enables new attack paradigms, including subspace-projection-based attacks, which were previously infeasible due to the absence of a faithful feature-to-audio mapping. Experiments show that our method significantly improves query efficiency, achieving competitive attack success rates with on average 10x fewer queries than prior approaches. In particular, the enabled subspace-projection-based attack attains up to 91.65% success using only 50 queries. These findings establish feature-aligned inversion as a key tool for evaluating the robustness of modern SRSs against score-based impersonation threats.
Authors:Andrei Lebedev, Vincent Gramoli
Abstract:
Blockchains are diverse in the way they handle communications between their nodes to disseminate information, mitigate attacks, and agree on the next block. While security vulnerabilities have been identified, they rely on an attack custom-made for a specific blockchain communication protocol. To our knowledge, the vulnerabilities of multiple blockchain communication protocols to adversarial conditions have never been compared. In this paper, we compare empirically the vulnerabilities of the communication protocols of five modern in-production blockchains, Algorand, Aptos, Avalanche, Redbelly and Solana, when attacked in five different ways. We conclude that Algorand is vulnerable to packet loss attacks, Aptos is vulnerable to targeted load attacks and leader isolation attacks, Avalanche is vulnerable to transient failure attacks, Redbelly's performance is impacted by packet loss attacks and Solana is vulnerable to stopping attacks and leader isolation attacks. Our system is open source.
Authors:Oluseyi Olukola, Nick Rahimi
Abstract:
Machine learning based network intrusion detection systems are vulnerable to adversarial attacks that degrade classification performance under both gradient-based and distribution shift threat models. Existing defenses typically apply uniform detection strategies, which may not account for heterogeneous attack characteristics. This paper proposes an attack-aware multi-stage defense framework that learns attack-specific detection strategies through a weighted combination of ensemble disagreement, predictive uncertainty, and distributional anomaly signals. Empirical analysis across seven adversarial attack types reveals distinct detection signatures, enabling a two-stage adaptive detection mechanism. Experimental evaluation on a benchmark intrusion detection dataset indicates that the proposed system attains 94.2% area under the receiver operating characteristic curve and improves classification accuracy by 4.5 percentage points and F1-score by 9.0 points over adversarially trained ensembles. Under adaptive white-box attacks with full architectural knowledge, the system appears to maintain 94.4% accuracy with a 4.2% attack success rate, though this evaluation is limited to two adaptive variants and does not constitute a formal robustness guarantee. Cross-dataset validation further suggests that defense effectiveness depends on baseline classifier competence and may vary with feature dimensionality. These results suggest that attack-specific optimization combined with multi-signal integration can provide a practical approach to improving adversarial robustness in machine learning-based intrusion detection systems.
Authors:Brianna D'Urso, Tahmid Hasan Sakib, Syed Rafay Hasan, Terry N. Guo
Abstract:
This paper studies how well Naturalistic Adversarial Patches (NAPs) transfer to a physical traffic sign setting when the detector is trained on a customized dataset for an autonomous vehicle (AV) environment. We construct a composite dataset, CompGTSRB (which is customized dataset for AV environment), by pasting traffic sign instances from the German Traffic Sign Recognition Benchmark (GTSRB) onto undistorted backgrounds captured from the target platform. CompGTSRB is used to train a YOLOv5 model and generate patches using a Generative Adversarial Network (GAN) with latent space optimization, following existing NAP methods. We carried out a series of experiments on our Quanser QCar testbed utilizing the front CSI camera provided in QCar. Across configurations, NAPs reduce the detector's STOP class confidence. Different configurations include distance, patch sizes, and patch placement. These results along with a detailed step-by-step methodology indicate the utility of CompGTSRB dataset and the proposed systematic physical protocols for credible patch evaluation. The research further motivate researching the defenses that address localized patch corruption in embedded perception pipelines.
Authors:Jayesh Choudhari, Piyush Kumar Singh
Abstract:
Domain fine-tuning is a common path to deploy small instruction-tuned language models as customer-support assistants, yet its effects on safety-aligned behavior and privacy are not well understood. In real deployments, such assistants receive a mixture of benign in-domain requests and out-of-domain user queries that are emotional, philosophical, or adversarial. Even when the target domain is benign, specialization may shift model behavior in ways that weaken refusal, increase harmful compliance, and induce privacy leakage. We present a controlled empirical study of how training data composition (presence vs.\ removal of PII) and fine-tuning configuration (role-swapping (RS)) shape safety and out-of-domain behavior in open-source chat models up to 8B parameters. We fine-tune each model on 5{,}000 real booking-support message pairs under three settings: \textsc{NoPII-NoRS}, \textsc{PII-NoRS}, and \textsc{PII-RS} (role-swapped). We evaluate safety using \textsc{SORRY-Bench}~\cite{xie2024sorry} adversarial prompts and assess out-of-domain behavior using a suite of philosophical questions~\cite{betley2025emergent}. Across models, domain fine-tuning causes a large distributional shift from high-quality refusals toward harmful compliance on \textsc{SORRY-Bench}, with the most severe degradation when PII is present in the fine-tuning data. For example, macro-averaged strong refusal drops from $42.6\%$ in base models to single digits after fine-tuning, while PII-bearing runs additionally exhibit double-digit rates of harmful responses with PII leakage. On philosophical queries, fine-tuned models frequently exhibit domain anchoring and, when trained with PII, leak sensitive identifiers in irrelevant contexts. Role-swapping partially mitigates PII leakage but does not reliably restore refusal behavior.
Authors:Jie Li, Jing Li, Lu Lv, Zhanyu Ju, Fengkui Gong
Abstract:
Drone Remote Identification (RID) plays a critical role in low-altitude airspace supervision, yet its broadcast nature and lack of cryptographic protection make it vulnerable to spoofing and replay attacks. In this paper, we propose a consistency verification-based physical-layer authentication (PLA) algorithm for drone RID frames. A RID-aware sensing and decoding module is first developed to extract communication-derived sensing parameters, including angle-of-arrival, Doppler shift, average channel gain, and the number of transmit antennas, together with the identity and motion-related information decoded from previously authenticated RID frames. Rather than fusing all heterogeneous information into a single representation, different types of information are selectively utilized according to their physical relevance and reliability. Specifically, real-time wireless sensing parameter constraints and previously authenticated motion states are incorporated in a yaw-augmented constant-acceleration extended Kalman filter (CA-EKF) to estimate the three-dimensional position and motion states of the drone. To further enhance authentication reliability under highly maneuverable and non-stationary flight scenarios, a data-driven long short-term memory-based motion estimator is employed, and its predictions are adaptively combined with the CA-EKF via an error-aware fusion strategy. Finally, RID frames are authenticated by verifying consistency in the number of transmit antennas, motion estimates, and no-fly-zone constraints. Simulation results demonstrate that the proposed algorithm significantly improves authentication reliability and robustness under realistic wireless impairments and complex drone maneuvers, outperforming existing RF feature-based and motion model-based PLA schemes.
Authors:Piyush Jaiswal, Aaditya Pratap, Shreyansh Saraswati, Harsh Kasyap, Somanath Tripathy
Abstract:
Large Language Models (LLMs) are widely deployed in real-world systems. Given their broader applicability, prompt engineering has become an efficient tool for resource-scarce organizations to adopt LLMs for their own purposes. At the same time, LLMs are vulnerable to prompt-based attacks. Thus, analyzing this risk has become a critical security requirement. This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset across multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants. We observe significant behavioural variation across models, including refusal responses and complete silent non-responsiveness triggered by internal safety mechanisms. Furthermore, we evaluated several lightweight, inference-time defence mechanisms that operate as filters without any retraining or GPU-intensive fine-tuning. Although these defences mitigate straightforward attacks, they are consistently bypassed by long, reasoning-heavy prompts.
Authors:Tarek Ramadan, AbdelRahman Abdou, Mohammad Mannan, Amr Youssef
Abstract:
A large number of URLs are made public by various platforms for security analysis, archiving, and paste sharing -- such as VirusTotal, URLScan.io, Hybrid Analysis, the Wayback Machine, and RedHunt. These services may unintentionally expose links containing sensitive information, as reported in some news articles and blog posts. However, no large-scale measurement has quantified the extent of such exposures. We present an automated system that detects and analyzes potential sensitive information leaked through publicly accessible URLs. The system combines lexical URL filtering, dynamic rendering, OCR-based extraction, and content classification to identify potential leaks. We apply it to 6,094,475 URLs collected from public scanning platforms, paste sites, and web archives, identifying 12,331 potential exposures across authentication, financial, personal, and document-related domains. These findings show that sensitive information remains exposed, underscoring the importance of automated detection to identify accidental leaks.
Authors:Delio Jaramillo Velez, Gergely Biczok, Alexandre Graell i Amat, Johan Ostman, Balazs Pejo
Abstract:
Cross-silo federated learning allows multiple organizations to collaboratively train machine learning models without sharing raw data, but client updates can still leak sensitive information through inference attacks. Secure aggregation protects privacy by hiding individual updates, yet it complicates contribution evaluation, which is critical for fair rewards and detecting low-quality or malicious participants. Existing marginal-contribution methods, such as the Shapley value, are incompatible with secure aggregation, and practical alternatives, such as Leave-One-Out, are crude and rely on self-evaluation. We introduce two marginal-difference contribution scores compatible with secure aggregation. Fair-Private satisfies standard fairness axioms, while Everybody-Else eliminates self-evaluation and provides resistance to manipulation, addressing a largely overlooked vulnerability. We provide theoretical guarantees for fairness, privacy, robustness, and computational efficiency, and evaluate our methods on multiple medical image datasets and CIFAR10 in cross-silo settings. Our scores consistently outperform existing baselines, better approximate Shapley-induced client rankings, and improve downstream model performance as well as misbehavior detection. These results demonstrate that fairness, privacy, robustness, and practical utility can be achieved jointly in federated contribution evaluation, offering a principled solution for real-world cross-silo deployments.
Authors:Zheng Gao, Xiaoyu Li, Zhicheng Bao, Xiaoyan Feng, Jiaojiao Jiang
Abstract:
Generative images have proliferated on Web platforms in social media and online copyright distribution scenarios, and semantic watermarking has increasingly been integrated into diffusion models to support reliable provenance tracking and forgery prevention for web content. Traditional noise-layer-based watermarking, however, remains vulnerable to inversion attacks that can recover embedded signals. To mitigate this, recent content-aware semantic watermarking schemes bind watermark signals to high-level image semantics, constraining local edits that would otherwise disrupt global coherence. Yet, large language models (LLMs) possess structured reasoning capabilities that enable targeted exploration of semantic spaces, allowing locally fine-grained but globally coherent semantic alterations that invalidate such bindings. To expose this overlooked vulnerability, we introduce a Coherence-Preserving Semantic Injection (CSI) attack that leverages LLM-guided semantic manipulation under embedding-space similarity constraints. This alignment enforces visual-semantic consistency while selectively perturbing watermark-relevant semantics, ultimately inducing detector misclassification. Extensive empirical results show that CSI consistently outperforms prevailing attack baselines against content-aware semantic watermarking, revealing a fundamental security weakness of current semantic watermark designs when confronted with LLM-driven semantic perturbations.
Authors:Tingxuan Tang, Nicolas Janis, Kalyn Asher Montague, Kevin Eykholt, Dhilung Kirat, Youngja Park, Jiyong Jang, Adwait Nadkarni, Yue Xiao
Abstract:
Capture-the-Flag (CTF) competitions are increasingly becoming a testbed for evaluating AI capabilities at solving security tasks, due to the controlled environments and objective success criteria. Existing evaluations have focused on how successful AI is at solving CTF challenges in isolation from human CTF players. As AI usage increases in both academic and industrial settings, it is equally likely that human players may collaborate with AI agents to solve challenges. This possibility exposes a key knowledge gap: how do humans perceive AI CTF assistance; when assistance is provided, how do they collaborate and is it effective with respect to human performance; how do humans assisted by AI compare to the performance of fully autonomous AI agents on the same challenges. We address this gap with the first empirical study of AI assistance in a live, onsite CTF. In a study with 41 participants, we qualitatively study (i) how participants' perception, trust, and expectations shift before versus after hands-on AI use, and (ii) how participants collaborate with an instrumented AI agent. Moreover, we also (iii) benchmark four autonomous AI agents on the same fresh challenge set to compare outcomes with human teams and analyze agent trajectories. We find that, as the competition progresses, teams increasingly delegate larger subtasks to the AI, giving it more agency. Interestingly, CTF challenges solving rates are often constrained not by model's reasoning capabilities, but rather by the human players: ineffective prompting and poor context specification become the primary bottleneck. Remarkably, autonomous agents that self-direct their prompting and tool use bypass this bottleneck and outperform most human teams, coming in second overall in the competition. We conclude with implications for the future design of CTF challenges and for building effective human-in-the-loop AI systems for security.
Authors:Cezar-Constantin Andrici, Abigail Pribisova, Danel Ahman, Catalin Hritcu, Exequiel Rivas, Théo Winterhalter
Abstract:
Shallow embeddings that use monads to represent effects are popular in proof-oriented languages because they are convenient for formal verification. Once shallowly embedded programs are verified, they are often extracted to mainstream languages like OCaml or C and linked into larger codebases. The extraction process is not fully verified because it often involves quotation -- turning the shallowly embedded program into a deeply embedded one -- and verifying quotation remains a major open challenge. Instead, some prior work obtains formal correctness guarantees using translation validation to certify individual extraction results. We build on this idea, but limit the use of translation validation to a first extraction step that we call relational quotation and that uses a metaprogram to construct a typing derivation for the given shallowly embedded program. This metaprogram is simple, since the typing derivation follows the structure of the original program. Once we validate, syntactically, that the typing derivation is valid for the original program, we pass it to a verified syntax-generation function that produces code guaranteed to be semantically related to the original program. We apply this general idea to build SEIO*, a framework for extracting shallowly embedded F* programs with IO to a deeply embedded lambda-calculus while providing formal secure compilation guarantees. Using two cross-language logical relations, we devise a machine-checked proof in F* that SEIO* guarantees Robust Relational Hyperproperty Preservation (RrHP), a very strong secure compilation criterion that implies full abstraction as well as preservation of trace properties and hyperproperties against arbitrary adversarial contexts. This goes beyond the state of the art in verified and certifying extraction, which so far has focused on correctness rather than security.
Authors:Harrison Green, Fraser Brown, Claire Le Goues
Abstract:
Fuzzing is a powerful technique for finding bugs in software libraries, but scaling it remains difficult. Automated harness generation commits to fixed API sequences at synthesis time, limiting the behaviors each harness can test. Approaches that instead explore new sequences dynamically lack the expressiveness to model real-world usage constraints leading to false positives from straightforward API misuse. We propose stitching, a technique that encodes API usage constraints in pieces that a fuzzer dynamically assembles at runtime. A static type system governs how objects flow between blocks, while a dynamically-checked extrinsic typestate tracks arbitrary metadata across blocks, enabling specifications to express rich semantic constraints such as object state dependencies and cross-function preconditions. This allows a single specification to describe an open-ended space of valid API interactions that the fuzzer explores guided by coverage feedback. We implement stitching in STITCH, using LLMs to automatically configure projects for fuzzing, synthesize a specification, triage crashes, and repair the specification itself. We evaluated STITCH against four state-of-the-art tools on 33 benchmarks, where it achieved the highest code coverage on 21 and found 30 true-positive bugs compared to 10 by all other tools combined, with substantially higher precision (70% vs. 12% for the next-best LLM-based tool). Deployed automatically on 1365 widely used open-source projects, STITCH discovered 131 new bugs across 102 projects, 73 of which have already been patched.
Authors:Supriya Khadka, Dhiman Goswami, Sanchari Das
Abstract:
Digital identity verification often forces a privacy trade-off, where users must disclose sensitive personal data to prove simple eligibility criteria. As blockchain applications integrate with regulated environments, this over-disclosure creates significant risks of data breaches and surveillance. This work proposes a general Selective Disclosure Framework built on Ethereum, designed to decouple attribute verification from identity revelation. By utilizing client-side zk-SNARKs, the framework enables users to prove specific eligibility predicates without revealing underlying identity documents. We present a case study, ZK-Compliance, which implements a functional Grant, Verify, Revoke lifecycle for age verification. Preliminary results indicate that strict compliance requirements can be satisfied with negligible client-side latency (< 200 ms) while preserving the pseudonymous nature of public blockchains.
Authors:Thomas Michel, Debabrota Basu, Emilie Kaufmann
Abstract:
Modern AI models are not static. They go through multiple updates in their lifecycles. Thus, exploiting the model dynamics to create stronger Membership Inference (MI) attacks and tighter privacy audits are timely questions. Though the literature empirically shows that using a sequence of model updates can increase the power of MI attacks, rigorous analysis of the `optimal' MI attacks is limited to static models with infinite samples. Hence, we develop an `optimal' MI attack, SeMI*, that uses the sequence of model updates to identify the presence of a target inserted at a certain update step. For the empirical mean computation, we derive the optimal power of SeMI*, while accessing a finite number of samples with or without privacy. Our results retrieve the existing asymptotic analysis. We observe that having access to the model sequence avoids the dilution of MI signals unlike the existing attacks on the final model, where the MI signal vanishes as training data accumulates. Furthermore, an adversary can use SeMI* to tune both the insertion time and the canary to yield tighter privacy audits. Finally, we conduct experiments across data distributions and models trained or fine-tuned with DP-SGD demonstrating that practical variants of SeMI* lead to tighter privacy audits than the baselines.
Authors:Jean Dufraiche, Paul Mangold, Michaël Perrot, Marc Tommasi
Abstract:
Releasing data once and for all under noninteractive Local Differential Privacy (LDP) enables complete data reusability, but the resulting noise may create bias in subsequent analyses. In this work, we leverage the Weierstrass transform to characterize this bias in binary classification. We prove that inverting this transform leads to a bias-correction method to compute unbiased estimates of nonlinear functions on examples released under LDP. We then build a novel stochastic gradient descent algorithm called Inverse Weierstrass Private SGD (IWP-SGD). It converges to the true population risk minimizer at a rate of $\mathcal{O}(1/n)$, with $n$ the number of examples. We empirically validate IWP-SGD on binary classification tasks using synthetic and real-world datasets.
Authors:Amirmohammad Pasdar, Shabnam Kasra Kermanshahi, Nour Moustafa, Van-Thuan Pham
Abstract:
The Internet of Battlefield Things (IoBT) relies on heterogeneous, bandwidth-constrained, and intermittently connected tactical networks that face rapidly evolving cyber threats. In this setting, intrusion detection cannot depend on continuous central collection of raw traffic due to disrupted links, latency, operational security limits, and non-IID traffic across zones. We present Zone-Adaptive Intrusion Detection (ZAID), a collaborative detection and model-improvement framework for unseen attack types, where "zero-day" refers to previously unobserved attack families and behaviours (not vulnerability disclosure timing). ZAID combines a universal convolutional model for generalisable traffic representations, an autoencoder-based reconstruction signal as an auxiliary anomaly score, and lightweight adapter modules for parameter-efficient zone adaptation. To support cross-zone generalisation under constrained connectivity, ZAID uses federated aggregation and pseudo-labelling to leverage locally observed, weakly labelled behaviours. We evaluate ZAID on ToN_IoT using a zero-day protocol that excludes MITM, DDoS, and DoS from supervised training and introduces them during zone-level deployment and adaptation. ZAID achieves up to 83.16% accuracy on unseen attack traffic and transfers to UNSW-NB15 under the same procedure, with a best accuracy of 71.64%. These results indicate that parameter-efficient, zone-personalised collaboration can improve the detection of previously unseen attacks in contested IoBT environments.
Authors:Yiran Gao, Kim Hammar, Tao Li
Abstract:
Rapidly evolving cyberattacks demand incident response systems that can autonomously learn and adapt to changing threats. Prior work has extensively explored the reinforcement learning approach, which involves learning response strategies through extensive simulation of the incident. While this approach can be effective, it requires handcrafted modeling of the simulator and suppresses useful semantics from raw system logs and alerts. To address these limitations, we propose to leverage large language models' (LLM) pre-trained security knowledge and in-context learning to create an end-to-end agentic solution for incident response planning. Specifically, our agent integrates four functionalities, perception, reasoning, planning, and action, into one lightweight LLM (14b model). Through fine-tuning and chain-of-thought reasoning, our LLM agent is capable of processing system logs and inferring the underlying network state (perception), updating its conjecture of attack models (reasoning), simulating consequences under different response strategies (planning), and generating an effective response (action). By comparing LLM-simulated outcomes with actual observations, the LLM agent repeatedly refines its attack conjecture and corresponding response, thereby demonstrating in-context adaptation. Our agentic approach is free of modeling and can run on commodity hardware. When evaluated on incident logs reported in the literature, our agent achieves recovery up to 23% faster than those of frontier LLMs.
Authors:Neha Gupta, Hamed Alimohammadi, Mohammad Shojafar, De Mi, Muhammad N. M. Bhutta
Abstract:
The Open Radio Access Network (O-RAN) offers flexibility and innovation but introduces unique security vulnerabilities, particularly from cryptographically relevant quantum computers. While Post-Quantum Cryptography (PQC) is the primary scalable defence, its computationally intensive handshakes create a significant bottleneck for the RAN control plane, posing sustainability challenges. This paper proposes an energy-aware framework to solve this PQC bottleneck, ensuring quantum resilience without sacrificing operational energy efficiency. The system employs an O-RAN aligned split: a Crypto Policy rApp residing in the Non-Real-Time (Non-RT) RIC defines the strategic security envelope (including PQC suites), while a Security Operations Scheduling (SOS) xApp in the Near-RT RIC converts these into tactical timing and placement intents. Cryptographic enforcement remains at standards-compliant endpoints: the Open Fronthaul utilizes Media Access Control Security (MACsec) at the O-DU/O-RU, while the xhaul (midhaul and backhaul) utilizes IP Security (IPsec) at tunnel terminators. The SOS xApp reduces PQC overhead by batching non-urgent handshakes, prioritizing session resumption, and selecting parameters that meet slice SLAs while minimizing joules per secure connection. We evaluate the architecture via a Discrete-Event Simulation (DES) using 3GPP-aligned traffic profiles and verified hardware benchmarks from literature. Results show that intelligent scheduling can reduce per-handshake energy by approximately 60 percent without violating slice latency targets.
Authors:Christian Rondanini, Barbara Carminati, Elena Ferrari, Niccolò Lardo, Ashish Kundu
Abstract:
The proliferation of edge devices has created an urgent need for security solutions capable of detecting malware in real time while operating under strict computational and memory constraints. Recently, Large Language Models (LLMs) have demonstrated remarkable capabilities in recognizing complex patterns, yet their deployment on edge devices remains impractical due to their resource demands. However, in edge malware detection, static or centrally retrained models degrade under evolving threats and heterogeneous traffic; locally trained models become siloed and fail to transfer across domains. To overcome these limitations, in this paper, we present a continuous learning architecture for edge-based malware detection that combines local adaptation on each device with global knowledge sharing through parameter-efficient LoRA adapters. Lightweight transformer models (DistilBERT, DistilGPT-2, TinyT5) run on edge nodes and are incrementally fine-tuned on device-specific traffic; only the resulting LoRA modules are aggregated by a lightweight coordinator and redistributed, enabling cross-device generalization without exchanging raw data. We evaluate on two public IoT security datasets, Edge-IIoTset and TON-IoT, under multi-round learning to simulate evolving threats. Compared to isolated fine-tuning, the LoRA-based exchange yields up to 20-25% accuracy gains when models encounter previously unseen attacks from another domain, while maintaining stable loss and F1 across rounds. LoRA adds less than 1% to model size (~0.6-1.8 MB), making updates practical for constrained edge hardware.
Authors:Ye Yu, Yifan Zhou, Yi Chen, Pedro Soto, Wenjie Xiong, Meng Li
Abstract:
Generative large language models (LLMs) have revolutionized multiple domains. Modern LLMs predominantly rely on an autoregressive decoding strategy, which generates output tokens sequentially and employs a key-value cache (KV cache) to avoid redundant computation. However, the widespread deployment of LLMs has raised serious privacy concerns, as users are feeding all types of data into the model, motivating the development of secure inference frameworks based on fully homomorphic encryption (FHE). A major limitation of existing FHE-based frameworks is their inability to effectively integrate the KV cache, resulting in prohibitively high latency for autoregressive decoding. In this paper, we propose Cachemir, a KV Cache Accelerated Homomorphic Encrypted LLM Inference Regime to overcome this limitation. Cachemir comprises three key technical contributions: 1) a set of novel HE packing algorithms specifically designed to leverage the computational advantages of the KV cache; 2) an interleaved replicated packing algorithm to efficiently compute the vector-matrix multiplications that result from using the KV cache in Transformer linear layers; and 3) an augmented bootstrapping placement strategy that accounts for the KV cache to minimize bootstrapping cost. We demonstrate that Cachemir achieves $48.83\times$ and $67.16\times$ speedup over MOAI (ICML'25) and THOR (CCS'25) respectively on CPU and consumes less than 100 seconds on GPU to generate an output token for Llama-3-8B.
Authors:Alex Wollman, John Hastings
Abstract:
Unikernels are single-purpose library operating systems that run the kernel and application in one address space, but often omit security mitigations such as address space layout randomization (ASLR). In OSv, boot, program loading, and thread creation select largely deterministic addresses, leading to near-identical layouts across instances and more repeatable exploitation. To reduce layout predictability, this research introduces ASLR-style diversity into OSv by randomizing the application base and thread stack regions through targeted changes to core memory-management and loading routines. The implementation adds minimal complexity while preserving OSv's lightweight design goals. Evaluation against an unmodified baseline finds comparable boot time, application runtime, and memory usage. Analysis indicates that the generated addresses exhibit a uniform distribution. These results show that layout-randomization defenses can be efficiently and effectively integrated into OSv unikernels, improving resistance to reliable exploitation.
Authors:Ziyi Yang, Kalit Inani, Keshav Kabra, Vima Gupta, Anand Padmanabha Iyer
Abstract:
While AI-coding assistants accelerate software development, current testing frameworks struggle to keep pace with the resulting volume of AI-generated code. Traditional fuzzing techniques often allocate resources uniformly and lack semantic awareness of algorithmic vulnerability patterns, leading to inefficient resource usage and missed vulnerabilities. To address these limitations, we present a hybrid testing framework that leverages LLM-guided adaptive fuzzing to detect algorithmic vulnerabilities efficiently. Our system SAFuzz integrates prompt-based behavioral diversification, harness generation with problem-specific oracles, and an LLM-based predictor to enable adaptive resource allocation and dynamic early stopping. Evaluating SAFuzz on CSES algorithmic problems, we improve vulnerability discrimination precision from 77.9% to 85.7% and achieve a 1.71x reduction in time cost compared to SOTA GreenFuzz while maintaining comparable recall. We further observe that combining our approach with existing unit test generation methods yields complementary gains, increasing the bug detection recall from 67.3% to 79.5%.
Authors:Ruisheng Shi, Zhiyuan Peng, Tong Fu, Lina Lan, Qin Wang, Jiaqi Zeng
Abstract:
Many tracking companies collect user data and sell it to data markets and advertisers. While they claim to protect user privacy by anonymizing the data, our research reveals that significant privacy risks persist even with anonymized data. Attackers can exploit this data to identify users' accounts on other websites and perform targeted identity alignment. In this paper, we propose an effective identity alignment scheme for accurately identifying targeted users. We develop a data collector to obtain the necessary datasets, an algorithm for identity alignment, and, based on this, construct two types of de-anonymization attacks: the \textit{passive attack}, which analyzes tracker data to align identities, and the \textit{active attack}, which induces users to interact online, leading to higher success rates. Furthermore, we introduce, for the first time, a novel evaluation framework for online tracking-based identity alignment. We investigate the key factors influencing the effectiveness of identity alignment. Additionally, we provide an independent assessment of our generated dataset and present a fully functional system prototype applied to a cryptocurrency use case.
Authors:Ruisheng Shi, Ziding Lin, Haoran Sun, Qin Wang, Shihan Zhang, Lina Lan, Zhiyuan Peng, Chenfeng Wang
Abstract:
Cryptomining poses significant security risks, yet traditional detection methods like blacklists and Deep Packet Inspection (DPI) are often ineffective against encrypted mining traffic and suffer from high false positive rates. In this paper, we propose a practical encrypted cryptomining traffic detection mechanism. It consists of a two-stage detection framework, which can effectively provide fine-grained detection results by machine learning and reduce false positives from classifiers through active probing. Our system achieves an F1-score of 0.99 and identifies specific cryptocurrencies with a 99.39\% accuracy rate. Extensive testing across various mining pools confirms the effectiveness of our approach, offering a more precise and reliable solution for identifying cryptomining activities.
Authors:Nicolai Maisch, Shengjian Chen, Alexander Robertus, Samed Ajdinović, Armin Lechler, Alexander Verl, Oliver Riedel
Abstract:
This work presents a concept and implementation for the secure storage and transfer of quality-relevant data of milled workpieces from online-quality assurance processes enabled by real-time simulation models. It utilises Non-Fungible Tokens (NFT) to securely and interoperably store quality data in the form of an Asset Administration Shell (AAS) on a public Ethereum blockchain. Minted by a custom smart contract, the NFTs reference the metadata saved in the Interplanetary File System (IPFS), allowing new data from additional processing steps to be added in a flexible yet secure manner. The concept enables automated traceability throughout the value chain, minimising the need for time-consuming and costly repetitive manual quality checks.
Authors:Yunpeng Tan, Qingyang Li, Mingxin Yang, Yannan Hu, Lei Zhang, Xinggong Zhang
Abstract:
Encryption has been commonly used in network traffic to secure transmission, but it also brings challenges for malicious traffic detection, due to the invisibility of the packet payload. Graph-based methods are emerging as promising solutions by leveraging multi-host interactions to promote detection accuracy. But most of them face a critical problem: Graph Drift, where the flow statistics or topological information of a graph change over time. To overcome these drawbacks, we propose a graph-assisted encrypted traffic detection system, MalMoE, which applies Mixture of Experts (MoE) to select the best expert model for drift-aware classification. Particularly, we design 1-hop-GNN-like expert models that handle different graph drifts by analyzing graphs with different features. Then, the redesigned gate model conducts expert selection according to the actual drift. MalMoE is trained with a stable two-stage training strategy with data augmentation, which effectively guides the gate on how to perform routing. Experiments on open-source, synthetic, and real-world datasets show that MalMoE can perform precise and real-time detection.
Authors:Asmaa Cherkaoui, Faraz Heravi, Delaram Kahrobaei, Siamak F. Shahandashti
Abstract:
The advent of quantum computation compels the cryptographic community to design digital signature schemes whose security extends beyond the classical hardness assumptions. In this work, we introduce Spinel, a post-quantum digital signature scheme that combines the proven security of SPHINCS+ (CCS 2019) with a new family of algebraic hash functions (Adv. Math. Commun. 2025) derived from the Tillich-Zemor paradigm (Eurocrypt 2008) with security rooted in the hardness of navigating expander graphs over $\mathrm{SL}_n(\mathbb{F}_p)$, a problem believed to be hard even for quantum adversaries. We first provide empirical evidence of the security of this hash function, complementing the original theoretical analysis. We then show how the hash function can be integrated within the SPHINCS+ framework to give a secure signature scheme. We then model and analyze the security degradation of the proposed scheme, which informs the parameter selection we discuss next. Finally, we provide an implementation of the hash function and the proposed signature scheme Spinel as well as detailed empirical results for the performance of Spinel showing its feasibility in practice. Our approach lays the foundations for the design of algebraic hash-based signature schemes, expanding the toolkit of post-quantum cryptography.
Authors:Gianluca Capozzi, Anna Paola Giancaspro, Fabio Petroni, Leonardo Querzoni, Giuseppe Antonio Di Luna
Abstract:
Binary Function Similarity (BFS), the problem of determining whether two binary functions originate from the same source code, has been extensively studied in recent research across security, software engineering, and machine learning communities. This interest arises from its central role in developing vulnerability detection systems, copyright infringement analysis, and malware phylogeny tools. Nearly all binary function similarity systems embed assembly functions into real-valued vectors, where similar functions map to points that lie close to each other in the metric space. These embeddings enable function search: a query function is embedded and compared against a database of candidate embeddings to retrieve the most similar matches. Despite their effectiveness, such systems rely on bi-encoder architectures that embed functions independently, limiting their ability to capture cross-function relationships and similarities. To address this limitation, we introduce ReSIM, a novel and enhanced function search system that complements embedding-based search with a neural re-ranker. Unlike traditional embedding models, our reranking module jointly processes query-candidate pairs to compute ranking scores based on their mutual representation, allowing for more accurate similarity assessment. By re-ranking the top results from embedding-based retrieval, ReSIM leverages fine-grained relation information that bi-encoders cannot capture. We evaluate ReSIM across seven embedding models on two benchmark datasets, demonstrating consistent improvements in search effectiveness, with average gains of 21.7% in terms of nDCG and 27.8% in terms of Recall.
Authors:Jianyu Zhang, Fuyuan Zhang, Jiayi Lu, Jilin Hu, Xiaoyi Yin, Long Zhang, Feng Yang, Yongwang Zhao
Abstract:
Formal methods (FM) are reliable but costly to apply, often requiring years of expert effort in industrial-scale projects such as seL4, especially for theorem proving. Recent advances in large language models (LLMs) have made automated theorem proving increasingly feasible. However, most prior work focuses on mathematics-oriented benchmarks such as miniF2F, with limited evaluation on real-world verification projects. The few studies that consider industrial-scale verification mostly rely on closed-source models with hundreds of billions of parameters, which cannot be locally deployed and incur substantial usage costs. In this paper, we propose AutoReal, an LLM-driven theorem proving method for real-world industrial-scale systems with support for lightweight local deployment. We evaluate AutoReal on the seL4-Isabelle verification project as a representative and challenging case study. AutoReal incorporates two key improvements: (1) chain-of-thought (CoT)-based proof training, which teaches the LLM the reasoning behind proof steps and enables step-wise explanations alongside proofs, and (2) context augmentation, which leverages proof context from the project to enhance LLM-driven proving. Based on the AutoReal methodology, we fine-tune a base model to obtain AutoReal-Prover, a compact 7B-scale prover for industrial-scale theorem proving. AutoReal-Prover achieves a 51.67% proof success rate on 660 theorems from seL4-designated Important Theories across all 10 seL4 proof categories, substantially outperforming prior attempts on seL4 (27.06%). To evaluate generalization, we further apply AutoReal-Prover to three security-related projects from the Archive of Formal Proofs (AFP), covering all 451 theorems and achieving a proof success rate of 53.88%. Overall, this work advances the application of LLM-driven theorem proving in real-world industrial-scale verification.
Authors:Hibiki Ito, Chia-Yu Hsu, Hiroaki Ogata
Abstract:
The rapid adoption of digital technologies has greatly increased the volume of real-world data (RWD) in education. While these data offer significant opportunities for advancing learning analytics (LA), secondary use for research is constrained by privacy concerns. Differentially private synthetic data generation is regarded as the gold-standard approach to sharing sensitive data, yet studies on the private synthesis of educational data remain very scarce and rely predominantly on large, low-dimensional open datasets. Educational RWD, however, are typically high-dimensional and small in sample size, leaving the potential of private synthesis underexplored. Moreover, because educational practice is inherently iterative, data sharing is continual rather than one-off, making a traditional one-shot synthesis approach suboptimal. To address these challenges, we propose the Cyclic Adaptive Private Synthesis (CAPS) framework and evaluate it on authentic RWD. By iteratively sharing RWD, CAPS not only fosters open science, but also offers rich opportunities of design-based research (DBR), thereby amplifying the impact of LA. Our case study using actual RWD demonstrates that CAPS outperforms a one-shot baseline while highlighting challenges that warrant further investigation. Overall, this work offers a crucial first step towards privacy-preserving sharing of educational RWD and expands the possibilities for open science and DBR in LA.
Authors:Tim Kutta, Martin Dunsche, Yu Wei, Vassilis Zikas
Abstract:
We present new auditors to assess Differential Privacy (DP) of an algorithm based on output samples. Such empirical auditors are common to check for algorithmic correctness and implementation bugs. Most existing auditors are batch-based or targeted toward the traditional notion of $(\varepsilon,δ)$-DP; typically both. In this work, we shift the focus to the highly expressive privacy concept of $f$-DP, in which the entire privacy behavior is captured by a single tradeoff curve. Our auditors detect violations across the full privacy spectrum with statistical significance guarantees, which are supported by theory and simulations. Most importantly, and in contrast to prior work, our auditors do not require a user-specified sample size as an input. Rather, they adaptively determine a near-optimal number of samples needed to reach a decision, thereby avoiding the excessively large sample sizes common in many auditing studies. This reduction in sampling cost becomes especially beneficial for expensive training procedures such as DP-SGD. Our method supports both whitebox and blackbox settings and can also be executed in single-run frameworks.
Authors:Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard
Abstract:
We study privacy amplification by synthetic data release, a phenomenon in which differential privacy guarantees are improved by releasing only synthetic data rather than the private generative model itself. Recent work by Pierquin et al. (2025) established the first formal amplification guarantees for a linear generator, but they apply only in asymptotic regimes where the model dimension far exceeds the number of released synthetic records, limiting their practical relevance. In this work, we show a surprising result: under a bounded-parameter assumption, privacy amplification persists even when releasing an unbounded number of synthetic records, thereby improving upon the bounds of Pierquin et al. (2025). Our analysis provides structural insights that may guide the development of tighter privacy guarantees for more complex release mechanisms.
Authors:Cornell Ziepel, Stephan Escher, Sebastian Rehms, Stefan Köpsell
Abstract:
Everyday services of society increasingly rely on mobile applications, resulting in a conflicting situation between the possibility of participation on the one side and user privacy and digital freedom on the other. In order to protect users' rights to informational self-determination, regulatory approaches for the collection and processing of personal data have been developed, such as the EU's GDPR. However, inspecting the compliance of mobile apps with privacy regulations remains difficult. Thus, in order to enable end users and enforcement bodies to verify and enforce data protection compliance, we propose mopri, a conceptual framework designed for analyzing the behavior of mobile apps through a comprehensive, adaptable, and user-centered approach. Recognizing the gaps in existing frameworks, mopri serves as a foundation for integrating various analysis tools into a streamlined, modular pipeline that employs static and dynamic analysis methods. Building on this concept, a prototype has been developed which effectively extracts permissions and tracking libraries while employing robust methods for dynamic traffic recording and decryption. Additionally, it incorporates result enrichment and reporting features that enhance the clarity and usability of the analysis outcomes. The prototype showcases the feasibility of a holistic and modular approach to privacy analysis, emphasizing the importance of continuous adaptation to the evolving challenges presented by the mobile app ecosystem.
Authors:Mahsa Tahghigh, Hassan Salmani
Abstract:
Hardware Trojans (HTs) threaten the trust and reliability of integrated circuits (ICs), particularly when triggered HTs remain dormant during standard testing and activate only under rare conditions. Existing electromagnetic (EM) side-channel-based detection techniques often rely on golden references or labeled data, which are infeasible in modern distributed manufacturing. This paper introduces a reference-free, design-agnostic framework for detecting triggered HTs directly from post-silicon EM emissions. The proposed flow converts each EM trace into a time-frequency scalogram using Continuous Wavelet Transform (CWT), extracts discriminative features through a convolutional neural network (CNN), reduces dimensionality with principal component analysis (PCA), and applies Bayesian Gaussian Mixture Modeling (BGMM) for unsupervised probabilistic clustering. The framework quantifies detection confidence using posterior-based metrics (alpha_{post}, beta_{post}), Bayesian information criterion (Delta BIC), and Mahalanobis cluster separation (D), enabling interpretable anomaly decisions without golden data. Experimental validation on AES-128 designs embedded with four different HTs demonstrates high separability between HT-free and HT-activated conditions and robustness to PCA variance thresholds. The results highlight the method's scalability, statistical interpretability, and potential for extension to runtime and in-field HT monitoring in trusted microelectronics.
Authors:Georgi Gary Rozenman, Alona Maslennikov, Sara P. Gandelman, Yuval Reches, Sahar Delfan, Neel Kanth Kundu, Leyi Zhang, Ruiqi Liu
Abstract:
Satellite-based quantum communications represent a critical advancement in the pursuit of secure, global-scale quantum networks. Leveraging the principles of quantum mechanics, these systems offer unparalleled security through Quantum Key Distribution (QKD) and other quantum communication protocols. This review provides a comprehensive overview of the current state of satellite-based quantum communications, focusing on the evolution from terrestrial to space-based systems. We explore the distinct advantages and challenges of discrete-variable (DV) and continuous-variable (CV) quantum communication technologies in the context of satellite deployments. The paper also discusses key milestones such as the successful implementation of quantum communication via the Micius satellite and outlines the primary challenges, including atmospheric turbulence and the development of quantum repeaters, that must be addressed to achieve a global quantum internet. This review aims to consolidate recent advancements in the field, providing insights and perspectives on the future directions and potential innovations that will drive the continued evolution of satellite-based quantum communications.
Authors:Jialong Sun, Zeming Wei, Jiaxuan Zou, Jiacheng Gong, Guanheng Wang, Chengyang Dong, Jialong Li, Bo Liu
Abstract:
Machine unlearning (MU) is essential for enforcing the right to be forgotten in machine learning systems. A key challenge of MU is how to reliably audit whether a model has truly forgotten specified training data. Membership Inference Attacks (MIAs) are widely used for unlearning auditing, where samples that evade membership detection are often regarded as successfully forgotten. After carefully revisiting the reliability of MIA, we show that this assumption is flawed: failed membership inference does not imply true forgetting. We theoretically demonstrate that MIA-based auditing, when formulated as a binary classification problem, inevitably incurs statistical errors whose magnitude cannot be observed during the auditing process. This leads to overly optimistic evaluations of unlearning performance, while incurring substantial computational overhead due to shadow model training. To address these limitations, we propose Statistical Membership Inference Attack (SMIA), a novel training-free and highly effective auditing framework. SMIA directly compares the distributions of member and non-member data using statistical tests, eliminating the need for learned attack models. Moreover, SMIA outputs both a forgetting rate and a corresponding confidence interval, enabling quantified reliability of the auditing results. Extensive experiments show that SMIA provides more reliable auditing with significantly lower computational cost than existing MIA-based approaches. Notably, the theoretical guarantees and empirical effectiveness of SMIA suggest it as a new paradigm for reliable machine unlearning auditing.
Authors:Ranjith Krishnamurthy, Oshando Johnson, Goran Piskachev, Eric Bodden
Abstract:
Security vulnerabilities often arise unintentionally during development due to a lack of security expertise and code complexity. Traditional tools, such as static and dynamic analysis, detect vulnerabilities only after they are introduced in code, leading to costly remediation. This work explores a proactive strategy to prevent vulnerabilities by highlighting code regions that implement security-critical functionality -- such as data access, authentication, and input handling -- and providing guidance for their secure implementation. We present an IntelliJ IDEA plugin prototype that uses code-level software metrics to identify potentially security-critical methods and large language models (LLMs) to generate prevention-oriented explanations. Our initial evaluation on the Spring-PetClinic application shows that the selected metrics identify most known security-critical methods, while an LLM provides actionable, prevention-focused insights. Although these metrics capture structural properties rather than semantic aspects of security, this work lays the foundation for code-level security-aware metrics and enhanced explanations.
Authors:Wenhao Chen, Wenyi Morty Zhang, Wei Sun, Dinesh Bharadia, Roshan Ayyalasomayajula
Abstract:
Hidden spy cameras have become a great privacy threat recently, as these low-cost, low-power, and small form-factor IoT devices can quietly monitor human activities in the indoor environment without generating any side-channel information. As such, it is difficult to detect and even more challenging to localize them in the rich-scattering indoor environment. To this end, this paper presents the design, implementation, and evaluation of SpyDir, a system that can accurately localize the hidden spy IoT devices by harnessing the electromagnetic emanations automatically and unintentionally emitted from them. Our system design mainly consists of a portable switching antenna array to sniff the spectrum-spread emanations, an emanation enhancement algorithm through non-coherent averaging that can de-correlate the correlated noise effect due to the square-wave emanation structure, and a multipath-resolving algorithm that can exploit the relative channels using a novel optimization-based sparse AoA derivation. Our real-world experimental evaluation across different indoor environments demonstrates an average AoA error of 6.30 deg, whereas the baseline algorithm yields 21.06 deg, achieving over a 3.3 times improvement in accuracy, and a mean localization error of 19.86cm over baseline algorithms of 206.79cm (MUSIC) and 294.75cm (SpotFi), achieving over a 10.41 times and 14.8 times improvement in accuracy.
Authors:Evgeny Grigorenko, David Stanojević, David Ilić, Egor Bogomolov, Kostadin Cvejoski
Abstract:
Modern Integrated Development Environments (IDEs) increasingly leverage Large Language Models (LLMs) to provide advanced features like code autocomplete. While powerful, training these models on user-written code introduces significant privacy risks, making the models themselves a new type of data vulnerability. Malicious actors can exploit this by launching attacks to reconstruct sensitive training data or infer whether a specific code snippet was used for training. This paper investigates the use of Differential Privacy (DP) as a robust defense mechanism for training an LLM for Kotlin code completion. We fine-tune a \texttt{Mellum} model using DP and conduct a comprehensive evaluation of its privacy and utility. Our results demonstrate that DP provides a strong defense against Membership Inference Attacks (MIAs), reducing the attack's success rate close to a random guess (AUC from 0.901 to 0.606). Furthermore, we show that this privacy guarantee comes at a minimal cost to model performance, with the DP-trained model achieving utility scores comparable to its non-private counterpart, even when trained on 100x less data. Our findings suggest that DP is a practical and effective solution for building private and trustworthy AI-powered IDE features.
Authors:Lukas Köder, Nils Lohmiller, Phil Schmieder, Bastian Buck, Michael Menth, Tobias Heer
Abstract:
The advent of large-scale quantum computers poses a significant threat to contemporary network security protocols, including Wi-Fi Protected Access (WPA)-Enterprise authentication. To mitigate this threat, the adoption of Post-Quantum Cryptography (PQC) is critical. In this work, we investigate the performance impact of PQC algorithms on WPA-Enterprise-based authentication. To this end, we conduct an experimental evaluation of authentication latency using a testbed built with the open-source tools FreeRADIUS and hostapd, measuring the time spent at the client, access point, and RADIUS server. We evaluate multiple combinations of PQC algorithms and analyze their performance overhead in comparison to currently deployed cryptographic schemes. Beyond performance, we assess the security implications of these algorithm choices by relating authentication mechanisms to the quantum effort required for their exploitation. This perspective enables a systematic categorization of PQ-relevant weaknesses in WPA-Enterprise according to their practical urgency. The evaluation results show that, although PQC introduces additional authentication latency, combinations such as ML-DSA-65 and Falcon-1024 used in conjunction with ML-KEM provide a favorable trade-off between security and performance. Furthermore, we demonstrate that the resulting overhead can be effectively mitigated through session resumption. Overall, this work presents a first real-world performance evaluation of PQC-enabled WPA-Enterprise authentication and demonstrates its practical feasibility for enterprise Wi-Fi deployments.
Authors:Hibiki Ito, Chia-Yu Hsu, Hiroaki Ogata
Abstract:
Secondary use of growing real-world data (RWD) in education offers significant opportunities for research, yet privacy practices intended to enable third-party access to such RWD are rarely evaluated for their implications for downstream analyses. As a result, potential problems introduced by otherwise standard privacy practices may remain unnoticed. To address this gap, we investigate potential issues arising from common practices by assessing (1) the re-identification risk of fine-grained RWD, (2) how communicating such risks influences learners' privacy behaviour, and (3) the sensitivity of downstream analytical conclusions to resulting changes in the data. We focus on these practices because re-identification risk and stakeholder communication can jointly influence the data shared with third parties. We find that substantial re-identification risk in RWD, when communicated to stakeholders, can induce opt-outs and non-self-disclosure behaviours. Sensitivity analysis demonstrates that these behavioural changes can meaningfully alter the shared data, limiting validity of secondary-use findings. We conceptualise this phenomenon as the third-party access effect (3PAE) and discuss implications for trustworthy secondary use of educational RWD.
Authors:Ya Jiang, Massieh Kordi Boroujeny, Surender Suresh Kumar, Kai Zeng
Abstract:
As large language models (LLMs) become integral to applications such as question answering and content creation, reliable content attribution has become increasingly important. Watermarking is a promising approach, but existing methods either provide only binary signals or distort the sampling distribution, degrading text quality; distortion-free approaches, in turn, often suffer from weak detectability or robustness. We propose MirrorMark, a multi-bit and distortion-free watermark for LLMs. By mirroring sampling randomness in a measure-preserving manner, MirrorMark embeds multi-bit messages without altering the token probability distribution, preserving text quality by design. To improve robustness, we introduce a context-based scheduler that balances token assignments across message positions while remaining resilient to insertions and deletions. We further provide a theoretical analysis of the equal error rate to interpret empirical performance. Experiments show that MirrorMark matches the text quality of non-watermarked generation while achieving substantially stronger detectability: with 54 bits embedded in 300 tokens, it improves bit accuracy by 8-12% and correctly identifies up to 11% more watermarked texts at 1% false positive rate.
Authors:Sebastian N. Peters, Lukas Lautenschlager, David Emeis, Jason Lochert
Abstract:
Cyber-Physical Systems (CPSs) rely on distributed embedded devices that often must communicate securely over buses. Ensuring message integrity and authenticity on these buses typically requires group-shared keys for Message Authentication Codes (MACs). To avoid insecure fixed pre-shared keys and trust-on-first-use concepts, a Group Key Agreement (GKA) protocol is needed to dynamically agree on a key amongst the devices. Yet existing GKA protocols lack adaptability to constrained CPS buses. This paper targets authenticated, fully distributed GKA suitable for bus topologies under constraints of industrial and cyber-physical systems, including broadcast-only links, half-duplex operation, resource limits, dynamic membership (including unannounced leaves), a long device lifetime, and a strong Dolev-Yao adversary capable of partitioning the bus. We first systematise existing protocols, then derive the requirements necessary for an authenticated and fully distributed GKA on bus systems. Finally, we design, implement, and evaluate a custom GKA protocol based on TreeKEM.
Authors:Deepthy K Bhaskar, Minimol B, Binu V P
Abstract:
Federated Learning (FL) facilitates collaborative model training among distributed clients while ensuring that raw data remains on local devices.Despite this advantage, FL systems are still exposed to risks from malicious or unreliable participants. Such clients can interfere with the training process by sending misleading updates, which can negatively affect the performance and reliability of the global model. Many existing defense mechanisms rely on gradient inspection, complex similarity computations, or cryptographic operations, which introduce additional overhead and may become unstable under non-IID data distributions. In this paper, we propose the Federated Learning with Loss Trend Detection (FL-LTD), a lightweight and privacy-preserving defense framework that detects and mitigates malicious behavior by monitoring temporal loss dynamics rather than model gradients. The proposed approach identifies anomalous clients by detecting abnormal loss stagnation or abrupt loss fluctuations across communication rounds. To counter adaptive attackers, a short-term memory mechanism is incorporated to sustain mitigation for clients previously flagged as anomalous, while enabling trust recovery for stable participants. We evaluate FL-LTD on a non-IID federated MNIST setup under loss manipulation attacks. Experimental results demonstrate that the proposed method significantly enhances robustness, achieving a final test accuracy of 0.84, compared to 0.41 for standard FedAvg under attack. FL-LTD incurs negligible computational and communication overhead, maintains stable convergence, and avoids client exclusion or access to sensitive data, highlighting the effectiveness of loss-based monitoring for secure federated learning.
Authors:Mahsa Tahghigh, Hassan Salmani
Abstract:
Always-on hardware Trojans (HTs) pose a critical risk to trusted microelectronics, yet most side-channel detection methods rely on unavailable golden references. We present a reference-free approach that combines time-frequency EM analysis with Gaussian Mixture Models (GMMs). By applying Short-Time Fourier Transform (STFT) at multiple window sizes, we show that HT-free circuits exhibit fluctuating statistical structure, while always-on HTs leave persistent footprints with fewer, more consistent mixture components. Results on AES-128 demonstrate feasibility without requiring reference models.
Authors:Jiayi Zhang, Chenxin Sun, Chenxiong Qian
Abstract:
Aim-assist cheats are the most prevalent and infamous form of cheating in First-Person Shooter (FPS) games, which help cheaters illegally reveal the opponent's location and auto-aim and shoot, and thereby pose significant threats to the game industry. Although a considerable research effort has been made to automatically detect aim-assist cheats, existing works suffer from unreliable frameworks, limited generalizability, high overhead, low detection performance, and a lack of explainability of detection results. In this paper, we propose XGuardian, a server-side generalized and explainable system for detecting aim-assist cheats to overcome these limitations. It requires only two raw data inputs, pitch and yaw, which are all FPS games' must-haves, to construct novel temporal features and describe aim trajectories, which are essential for distinguishing cheaters and normal players. XGuardian is evaluated with the latest mainstream FPS game CS2, and validates its generalizability with another two different games. It achieves high detection performance and low overhead compared to prior works across different games with real-world and large-scale datasets, demonstrating wide generalizability and high effectiveness. It is able to justify its predictions and thereby shorten the ban cycle. We make XGuardian as well as our datasets publicly available.
Authors:Narek Maloyan, Dmitry Namiot
Abstract:
The Model Context Protocol (MCP) has emerged as a de facto standard for integrating Large Language Models with external tools, yet no formal security analysis of the protocol specification exists. We present the first rigorous security analysis of MCP's architectural design, identifying three fundamental protocol-level vulnerabilities: (1) absence of capability attestation allowing servers to claim arbitrary permissions, (2) bidirectional sampling without origin authentication enabling server-side prompt injection, and (3) implicit trust propagation in multi-server configurations. We implement \textsc{MCPBench}, a novel framework bridging existing agent security benchmarks to MCP-compliant infrastructure, enabling direct measurement of protocol-specific attack surfaces. Through controlled experiments on 847 attack scenarios across five MCP server implementations, we demonstrate that MCP's architectural choices amplify attack success rates by 23--41\% compared to equivalent non-MCP integrations. We propose \textsc{MCPSec}, a backward-compatible protocol extension adding capability attestation and message authentication, reducing attack success rates from 52.8\% to 12.4\% with median latency overhead of 8.3ms per message. Our findings establish that MCP's security weaknesses are architectural rather than implementation-specific, requiring protocol-level remediation.
Authors:Narek Maloyan, Dmitry Namiot
Abstract:
The proliferation of agentic AI coding assistants, including Claude Code, GitHub Copilot, Cursor, and emerging skill-based architectures, has fundamentally transformed software development workflows. These systems leverage Large Language Models (LLMs) integrated with external tools, file systems, and shell access through protocols like the Model Context Protocol (MCP). However, this expanded capability surface introduces critical security vulnerabilities. In this \textbf{Systematization of Knowledge (SoK)} paper, we present a comprehensive analysis of prompt injection attacks targeting agentic coding assistants. We propose a novel three-dimensional taxonomy categorizing attacks across \textit{delivery vectors}, \textit{attack modalities}, and \textit{propagation behaviors}. Our meta-analysis synthesizes findings from 78 recent studies (2021--2026), consolidating evidence that attack success rates against state-of-the-art defenses exceed 85\% when adaptive attack strategies are employed. We systematically catalog 42 distinct attack techniques spanning input manipulation, tool poisoning, protocol exploitation, multimodal injection, and cross-origin context poisoning. Through critical analysis of 18 defense mechanisms reported in prior work, we identify that most achieve less than 50\% mitigation against sophisticated adaptive attacks. We contribute: (1) a unified taxonomy bridging disparate attack classifications, (2) the first systematic analysis of skill-based architecture vulnerabilities with concrete exploit chains, and (3) a defense-in-depth framework grounded in the limitations we identify. Our findings indicate that the security community must treat prompt injection as a first-class vulnerability class requiring architectural-level mitigations rather than ad-hoc filtering approaches.
Authors:Krystal Jackson, Deepika Raman, Jessica Newman, Nada Madkour, Charlotte Yuan, Evan R. Murphy
Abstract:
Artificial intelligence (AI) is increasingly being used to augment and automate cyber operations, altering the scale, speed, and accessibility of malicious activity. These shifts raise urgent questions about when AI systems introduce unacceptable or intolerable cyber risk, and how risk thresholds should be identified before harms materialize at scale. In recent years, industry, government, and civil society actors have begun to articulate such thresholds for advanced AI systems, with the goal of signaling when models meaningfully amplify cyber threats, for example, by automating multi-stage intrusions, enabling zero-day discovery, or lowering the expertise required for sophisticated attacks. However, current approaches to determine these thresholds remain fragmented and limited. Many thresholds rely solely on capability benchmarks or narrow threat scenarios, and are weakly connected to empirical evidence. This paper proposes a structured approach to developing and evaluating AI cyber risk thresholds that is probabilistic, evidence-based, and operationalizable. In this paper we make three core contributions that build on our prior work that highlights the limitations of relying solely on capability assessments. First, we analyze existing industry cyber thresholds and identify common threshold elements as well as recurring methodological shortcomings. Second, we propose the use of Bayesian networks as a tool for modeling AI-enabled cyber risk, enabling the integration of heterogeneous evidence, explicit representation of uncertainty, and continuous updating as new information emerges. Third, we illustrate this approach through a focused case study on AI-augmented phishing, demonstrating how qualitative threat insights can be decomposed into measurable variables and recombined into structured risk estimates that better capture how AI changes attacker behavior and outcomes.
Authors:Ruisheng Shi, Yuxuan Liang, Zijun Guo, Qin Wang, Lina Lan, Chenfeng Wang, Zhuoyi Zheng
Abstract:
Eclipse attacks isolate blockchain nodes by monopolizing their peer-to-peer connections. The attacks were extensively studied in Bitcoin (SP'15, SP'20, CCS'21, SP'23) and Monero (NDSS'25), but their practicality against Ethereum nodes remains underexplored, particularly in the post-Merge settings. We present the first end-to-end implementation of an eclipse attack targeting Ethereum (2.0 version) execution-layer nodes. Our attack exploits the bootstrapping and peer management logic of Ethereum to fully isolate a node upon restart. We introduce a multi-stage strategy that majorly includes (i) poisoning the node's discovery table via unsolicited messages, (ii) infiltrating Ethereum's DNS-based peerlist by identifying and manipulating the official DNS crawler, and (iii) hijacking idle incoming connection slots across the network to block benign connections. Our DNS list poisoning is the first in the cryptocurrency context and requires only 28 IP addresses over 100 days. Slots hijacking raises outgoing redirection success from 45\% to 95\%. We validate our approach through controlled experiments on Ethereum's Sepolia testnet and broad measurements on the mainnet. Our findings demonstrate that over 80\% of public nodes do not leave sufficient idle capacity for effective slots occupation, highlighting the feasibility and severity of the threat. We further propose concrete countermeasures and responsibly disclosed all findings to Ethereum's security team.
Authors:Binu V P, Deepthy K Bhaskar, Minimol B
Abstract:
As digital threats continue to grow, organizations must find ways to enhance security while protecting user privacy. This paper explores how artificial intelligence (AI) plays a crucial role in achieving this balance. AI technologies can improve security by detecting threats, monitoring systems, and automating responses. However, using AI also raises privacy concerns that need careful consideration.We examine real-world examples from the healthcare sector to illustrate how organizations can implement AI solutions that strengthen security without compromising patient privacy. Additionally, we discuss the importance of creating transparent AI systems and adhering to privacy regulations.Ultimately, this paper provides insights and recommendations for integrating AI into healthcare security practices, helping organizations navigate the challenges of modern management while keeping patient data safe.
Authors:Sajjad Akherati, Xinmiao Zhang
Abstract:
Homomorphic encryption (HE) enables arithmetic operations to be performed directly on encrypted data. It is essential for privacy-preserving applications such as machine learning, medical diagnosis, and financial data analysis. In popular HE schemes, ciphertext multiplication is only defined for two inputs. However, the multiplication of multiple inputs is needed in many HE applications. In our previous work, a three-input ciphertext multiplication method for the CKKS HE scheme was developed. This paper first reformulates the three-input ciphertext multiplication to enable the combination of computations in order to further reduce the complexity. The second contribution is extending the multiplication to multiple inputs without compromising the noise overhead. Additional evaluation keys are introduced to achieve relinearization of polynomial multiplication results. To minimize the complexity of the large number of rescaling units in the multiplier, a theoretical analysis is developed to relocate the rescaling, and a multi-level rescaling approach is proposed to implement combined rescaling with complexity similar to that of a single rescaling unit. Guidelines and examples are provided on the input partition to enable the combination of more rescaling. Additionally, efficient hardware architectures are designed to implement our proposed multipliers. The improved three-input ciphertext multiplier reduces the logic area and latency by 15% and 50%, respectively, compared to the best prior design. For multipliers with more inputs, ranging from 4 to 12, the architectural analysis reveals 32% savings in area and 45% shorter latency, on average, compared to prior work.
Authors:Yoann Marquer, Domenico Bianculli, Lionel C. Briand
Abstract:
Python is one of the most popular programming languages; as such, projects written in Python involve an increasing number of diverse security vulnerabilities. However, existing state-of-the-art analysis tools for Python only support a few vulnerability types. Hence, there is a need to detect a large variety of vulnerabilities in Python projects. In this paper, we propose the SAGA approach to detect and locate vulnerabilities in Python source code in a versatile way. SAGA includes a source code parser able to extract control- and data-flow information and to represent it as a symbolic control-flow graph, as well as a domain-specific language defining static aspects of the source code and their evolution during graph traversals. We have leveraged this language to define a library of static aspects for integrity, confidentiality, and other security-related properties. We have evaluated SAGA on a dataset of 108 vulnerabilities, obtaining 100% sensitivity and 99.15% specificity, with only one false positive, while outperforming four common security analysis tools. This analysis was performed in less than 31 seconds, i.e., between 2.5 and 512.1 times faster than the baseline tools.
Authors:James Melbourne, Mario Diaz, Shahab Asoodeh
Abstract:
We study the optimal design of additive mechanisms for vector-valued queries under $ε$-differential privacy (DP). Given only the sensitivity of a query and a norm-monotone cost function measuring utility loss, we ask which noise distribution minimizes expected cost among all additive $ε$-DP mechanisms. Using convex rearrangement theory, we show that this infinite-dimensional optimization problem admits a reduction to a one-dimensional compact and convex family of radially symmetric distributions whose extreme points are the staircase distributions. As a consequence, we prove that for any dimension, any norm, and any norm-monotone cost function, there exists an $ε$-DP staircase mechanism that is optimal among all additive mechanisms. This result resolves a conjecture of Geng, Kairouz, Oh, and Viswanath, and provides a geometric explanation for the emergence of staircase mechanisms as extremal solutions in differential privacy.
Authors:Saad Khan, Simon Parkinson, Monika Roopak
Abstract:
Event-based datasets are crucial for cybersecurity analysis. A key use case is detecting event-based signatures, which represent attacks spanning multiple events and can only be understood once the relevant events are identified and linked. Analysing event datasets is essential for monitoring system security, but their growing volume and frequency create significant scalability and processing difficulties. Researchers rely on these datasets to develop and test techniques for automatically identifying signatures. However, because real datasets are security-sensitive and rarely shared, it becomes difficult to perform meaningful comparative evaluation between different approaches. This work addresses this evaluation limitation by offering a systematic method for generating event logs with known ground truth, enabling reproducible and comparable research. We present a novel parametrised generation technique capable of producing synthetic event datasets that contain event-based signatures for discovery. To demonstrate the capabilities of the technique, we provide a benchmark in signature detection. Our benchmarking demonstrated the suitability of DBSCAN, achieving a score greater than 0.95 Adjusted Rand Index on most generated datasets. This work enhances the ability of researchers to develop and benchmark new cybersecurity techniques, ultimately contributing to more robust and effective cybersecurity measures.
Authors:Sangjun An, Seoksu Lee, Eun-Sun Cho
Abstract:
Malware often uses obfuscation to hinder security analysis. Among these techniques, virtualization-based obfuscation is particularly strong because it protects programs by translating original instructions into attacker-defined virtual machine (VM) bytecode, producing long and complex code that is difficult to analyze and deobfuscate. This paper aims to identify the structural components of virtualization-based obfuscation through static analysis. By examining the execution model of obfuscated code, we define and detect the key elements required for deobfuscation-namely the dispatch routine, handler blocks, and the VM region-using LLVM IR. Experimental results show that, in the absence of compiler optimizations, the proposed LLVM Pass successfully detects all core structures across major virtualization options, including switch, direct, and indirect modes.
Authors:Yi Qian, Kunwei Qian, Xingbang He, Ligeng Chen, Jikang Zhang, Tiantai Zhang, Haiyang Wei, Linzhang Wang, Hao Wu, Bing Mao
Abstract:
Large multimodal model powered GUI agents are emerging as high-privilege operators on mobile platforms, entrusted with perceiving screen content and injecting inputs. However, their design operates under the implicit assumption of Visual Atomicity: that the UI state remains invariant between observation and action. We demonstrate that this assumption is fundamentally invalid in Android, creating a critical attack surface. We present Action Rebinding, a novel attack that allows a seemingly-benign app with zero dangerous permissions to rebind an agent's execution. By exploiting the inevitable observation-to-action gap inherent in the agent's reasoning pipeline, the attacker triggers foreground transitions to rebind the agent's planned action toward the target app. We weaponize the agent's task-recovery logic and Android's UI state preservation to orchestrate programmable, multi-step attack chains. Furthermore, we introduce an Intent Alignment Strategy (IAS) that manipulates the agent's reasoning process to rationalize UI states, enabling it to bypass verification gates (e.g., confirmation dialogs) that would otherwise be rejected. We evaluate Action Rebinding Attacks on six widely-used Android GUI agents across 15 tasks. Our results demonstrate a 100% success rate for atomic action rebinding and the ability to reliably orchestrate multi-step attack chains. With IAS, the success rate in bypassing verification gates increases (from 0% to up to 100%). Notably, the attacker application requires no sensitive permissions and contains no privileged API calls, achieving a 0% detection rate across malware scanners (e.g., VirusTotal). Our findings reveal a fundamental architectural flaw in current agent-OS integration and provide critical insights for the secure design of future agent systems. To access experimental logs and demonstration videos, please contact yi_qian@smail.nju.edu.cn.
Authors:Reshabh K Sharma, Dan Grossman, David Kohlbrenner
Abstract:
Traditional side-channels take advantage of secrets being used as inputs to unsafe instructions, used for memory accesses, or used in control flow decisions. Constant-time programming, which restricts such code patterns, has been widely adopted as a defense against these vulnerabilities. However, new hardware optimizations in the form of Data Memory-dependent Prefetchers (DMP) present in Apple, Intel, and ARM CPUs have shown such defenses are not sufficient. These prefetchers, unlike classical prefetchers, use the content of memory as well as the trace of prior accesses to determine prefetch targets. An adversary abusing such a prefetcher has been shown to be able to mount attacks leaking data-at-rest; data that is never used by the program, even speculatively, in an unsafe manner. In response, this paper introduces SplittingSecrets, a compiler-based tool that can harden software libraries against side-channels arising from DMPs. SplittingSecrets's approach avoids reasoning about the complex internals of different DMPs and instead relies on one key aspect of all DMPs: activation requires data to resemble addresses. To prevent secret data from leaking, SplittingSecrets transforms memory operations to ensure that secrets are never stored in memory in a manner resembling an address, thereby avoiding DMP activation on those secrets. Rather than disable a DMP entirely, SplittingSecrets can provide targeted hardening for only specific secrets entirely in software. We have implemented SplittingSecrets using LLVM, supporting both source-level memory operations and those generated by the compiler backend for the AArch64 architecture, We have analyzed the performance overhead involved in safeguarding secrets from DMP-induced attacks using common primitives in libsodium, a popular cryptographic library when built for Apple M-series CPUs.
Authors:Messaouda Boutassetta, Amina Makhlouf, Newfel Messaoudi, Abdelmadjid Benmachiche, Ines Boutabia
Abstract:
Intrusion detection systems (IDS) are essential for protecting computer systems and networks against a wide range of cyber threats that continue to evolve over time. IDS are commonly categorized into two main types, each with its own strengths and limitations, such as difficulty in detecting previously unseen attacks and the tendency to generate high false positive rates. This paper presents a comprehensive survey and a conceptual overview of Hybrid IDS, which integrate signature-based and anomaly-based detection techniques to enhance attack detection capabilities. The survey examines recent research on Hybrid IDS, classifies existing models into functional categories, and discusses their advantages, limitations, and application domains, including financial systems, air traffic control, and social networks. In addition, recent trends in Hybrid IDS research, such as machine learning-based approaches and cloud-based deployments, are reviewed. Finally, this work outlines potential future research directions aimed at developing more cost-effective Hybrid IDS solutions with improved ability to detect emerging and sophisticated cyberattacks.
Authors:Kunal Dey, Reihaneh Safavi-Naini
Abstract:
Certified deletion allows Alice to outsource data to Bob and, at a later time, obtain a verifiable guarantee that the file has been irreversibly deleted at her request. The functionality, while impossible using classical information alone, can be achieved using quantum information. Existing approaches rely either on one-time pad (OTP) encryption, or on computational hardness assumptions that may be vulnerable to future advances in classical or quantum computing. In this work, we introduce and formalize hybrid encryption with certified deletion in the preprocessing model (pHE-CD) and propose two constructions. Each construction composes an information-theoretic key encapsulation mechanism (iKEM) with a data encapsulation mechanism that provides certified deletion (DEM-CD) security, offering different types of security depending on the security properties of DEM-CD. When DEM-CD is one-time information theoretically secure, the composition provides {\em information-theoretic security} for both encryption and certified deletion. When DEM-CD is computationally secure, the composed construction offers computationally secure (post-quantum) encryption and {\em everlasting certified deletion} where confidentiality is computational up to the point that the deletion certificate is verified, and after successful verification of the certificate, becomes unconditional. That is, successful verification of deletion certificate guarantees that the data has been removed information-theoretically from the adversary's view. Both pHE-CD schemes are for encryption of arbitrarily long messages. Construction 2 is key efficient and uses a DEM-CD that is constructed using quantum coding and AES, providing quantum-safe security for encryption. We discuss our results and directions for future work.
Authors:Greta Dolcetti, Giulio Zizzo, Sergio Maffeis
Abstract:
We present an experimental evaluation that assesses the robustness of four open source LLMs claiming function-calling capabilities against three different attacks, and we measure the effectiveness of eight different defences. Our results show how these models are not safe by default, and how the defences are not yet employable in real-world scenarios.
Authors:Mohammad Waquas Usmani, Susmit Shannigrahi, Michael Zink
Abstract:
This work introduces ABE-VVS, a framework that performs attribute based selective coordinate encryption for point cloud based volumetric video streaming, enabling lightweight yet effective digital rights management (DRM). Rather than encrypting entire point cloud frames, our approach encrypts only selected subsets of coordinates ($X, Y, Z$, or combinations), lowering computational overhead and latency while still producing strong visual distortion that prevents meaningful unauthorized viewing. Our experiments show that encrypting only the $X$ coordinates achieves effective obfuscation while reducing encryption and decryption times by up to 50% and 80%, respectively, compared to full-frame encryption. To our knowledge, this is the first work to provide a novel end-to-end evaluation of a DRM-enabled secure point cloud streaming system. We deployed a point cloud video streaming setup on the CloudLab testbed and evaluated three HTTP-based Attribute-Based Encryption (ABE) granularities - ABE-XYZ (encrypting all $X,Y,Z$ coordinates), ABE-XY, and ABE-X against conventional HTTPS/TLS secure streaming as well as an HTTP-only baseline without any security. Our streaming evaluation demonstrates that ABE-based schemes reduce server-side CPU load by up to 80% and cache CPU load by up to 63%, comparable to HTTP-only, while maintaining similar cache hit rates. Moreover, ABE-XYZ and ABE-XY exhibit lower client-side rebuffering than HTTPS, and ABE-X achieves zero rebuffering comparable to HTTP-only. Although ABE-VVS increases client-side CPU usage, the overhead is not large enough to affect streaming quality and is offset by its broader benefits, including simplified key revocation, elimination of per-client encryption, and reduced server and cache load.
Authors:S M Mostaq Hossain, Amani Altarawneh
Abstract:
Firmware integrity is a foundational requirement for securing Cyber-Physical Systems (CPS), where malicious or compromised firmware can result in persistent backdoors, unauthorized control, or catastrophic system failures. Traditional verification mechanisms such as secure boot, digital signatures, and centralized hash databases are increasingly inadequate due to risks from insider threats and single points of failure. In this paper, we propose a decentralized firmware integrity verification framework built on the Ethereum blockchain, offering tamperproof, transparent, and trustless validation. Our system stores SHA-256 hashes of firmware binaries within smart contracts deployed on the Ethereum Sepolia testnet, using Web3 and Infura for seamless on-chain interaction. A Python-based client tool computes firmware hashes and communicates with the blockchain to register and verify firmware authenticity in realtime. We implement and evaluate a fully functional prototype using real firmware samples, demonstrating successful contract deployment, hash registration, and integrity verification through live blockchain transactions. Experimental results confirm the reliability and low cost (in gas fees) of our approach, highlighting its practicality and scalability for real-world CPS applications. To enhance scalability and performance, we discuss extensions using Layer-2 rollups and off-chain storage via the InterPlanetary File System (IPFS). We also outline integration pathways with secure boot mechanisms, Trusted Platform Module (TPM)based attestation, and zero-trust architectures. This work contributes a practical and extensible model for blockchain-based firmware verification, significantly strengthening the defense against firmware tampering and supply chain attacks in critical CPS environments.
Authors:Bui Ngoc Thanh Binh, Pham Hoai Luan, Le Vu Trung Duong, Vu Tuan Hai, Yasuhiko Nakashima
Abstract:
MQTT is the dominant lightweight publish--subscribe protocol for IoT deployments, yet edge security remains inadequate. Cloud-based intrusion detection systems add latency that is unsuitable for real-time control, while CPU-bound firewalls and generic SDN controllers lack MQTT awareness to enforce session validation, topic-based authorization, and behavioral anomaly detection. We propose a P4-based data-plane enforcement scheme for protocol-aware MQTT security and anomaly detection at the network edge. The design combines parser-safe MQTT header extraction with session-order validation, byte-level topic-prefix authorization with per-client rate limiting and soft-cap enforcement, and lightweight anomaly detection based on KeepAlive and Remaining Length screening with clone-to-CPU diagnostics. The scheme leverages stateful primitives in BMv2 (registers, meters, direct counters) to enable runtime policy adaptation with minimal per-packet latency. Experiments on a Mininet/BMv2 testbed demonstrate high policy enforcement accuracy (99.8%, within 95% CI), strong anomaly detection sensitivity (98\% true-positive rate), and high delivery >99.9% for 100--5~kpps; 99.8% at 10~kpps; 99.6\% at 16~kpps) with sub-millisecond per-packet latency. These results show that protocol-aware MQTT filtering can be efficiently realized in the programmable data plane, providing a practical foundation for edge IoT security. Future work will validate the design on production P4 hardware and integrate machine learning--based threshold adaptation.
Authors:Xi Ye, Yiwen Liu, Lina Wang, Run Wang, Geying Yang, Yufei Hou, Jiayi Yu
Abstract:
Text-to-image (T2I) models have raised increasing safety concerns due to their capacity to generate NSFW and other banned objects. To mitigate these risks, safety filters and concept removal techniques have been introduced to block inappropriate prompts or erase sensitive concepts from the models. However, all the existing defense methods are not well prepared to handle diverse adversarial prompts. In this work, we introduce MacPrompt, a novel black-box and cross-lingual attack that reveals previously overlooked vulnerabilities in T2I safety mechanisms. Unlike existing attacks that rely on synonym substitution or prompt obfuscation, MacPrompt constructs macaronic adversarial prompts by performing cross-lingual character-level recombination of harmful terms, enabling fine-grained control over both semantics and appearance. By leveraging this design, MacPrompt crafts prompts with high semantic similarity to the original harmful inputs (up to 0.96) while bypassing major safety filters (up to 100%). More critically, it achieves attack success rates as high as 92% for sex-related content and 90% for violence, effectively breaking even state-of-the-art concept removal defenses. These results underscore the pressing need to reassess the robustness of existing T2I safety mechanisms against linguistically diverse and fine-grained adversarial strategies.
Authors:Àlex Miranda-Pascual, Javier Parra-Arnau, Thorsten Strufe
Abstract:
Sampling is renowned for its privacy amplification in differential privacy (DP), and is often assumed to improve the utility of a DP mechanism by allowing a noise reduction. In this paper, we further show that this last assumption is flawed: When measuring utility at equal privacy levels, sampling as preprocessing consistently yields penalties due to utility loss from omitting records over all canonical DP mechanisms -- Laplace, Gaussian, exponential, and report noisy max -- , as well as recent applications of sampling, such as clustering. Extending this analysis, we investigate suppression as a generalized method of choosing, or omitting, records. Developing a theoretical analysis of this technique, we derive privacy bounds for arbitrary suppression strategies under unbounded approximate DP. We find that our tested suppression strategy also fails to improve the privacy--utility tradeoff. Surprisingly, uniform sampling emerges as one of the best suppression methods -- despite its still degrading effect. Our results call into question common preprocessing assumptions in DP practice.
Authors:Damian Harenčák, Lukáš Gajdošech, Martin Madaras
Abstract:
Collaborative training of a machine learning model comes with a risk of sharing sensitive or private data. Federated learning offers a way of collectively training a single global model without the need to share client data, by sharing only the updated parameters from each client's local model. A central server is then used to aggregate parameters from all clients and redistribute the aggregated model back to the clients. Recent findings have shown that even in this scenario, private data can be reconstructed only using information about model parameters. Current efforts to mitigate this are mainly focused on reducing privacy risks on the server side, assuming that other clients will not act maliciously. In this work, we analyzed various methods for improving the privacy of client data concerning both the server and other clients for neural networks. Some of these methods include homomorphic encryption, gradient compression, gradient noising, and discussion on possible usage of modified federated learning systems such as split learning, swarm learning or fully encrypted models. We have analyzed the negative effects of gradient compression and gradient noising on the accuracy of convolutional neural networks used for classification. We have shown the difficulty of data reconstruction in the case of segmentation networks. We have also implemented a proof of concept on the NVIDIA Jetson TX2 module used in edge devices and simulated a federated learning process.
Authors:Brahim Khalil Sedraoui, Abdelmadjid Benmachiche, Amina Makhlouf
Abstract:
The high rate of development of Internet of Things (IoT) devices has brought to attention new challenges in the area of data security, especially within the resource-limited realm of RFID tags, sensors, and embedded systems. Traditional cryptographic implementations can be of inappropriate computational complexity and energy usage and hence are not suitable on these platforms. This paper examines the design, implementation, and testing of lightweight cryptographic algorithms that have been specifically designed to be used in secure embedded systems. A comparison of some of the state-of-the-art lightweight encryption algorithms, that is PRESENT, SPECK, and SIMON, focuses on the main performance indicators, i.e., throughput, use of memory, and energy utilization. The study presents novel lightweight algorithms that are founded upon the Feistel-network architecture and their safety under cryptanalytic attacks, e.g., differential and linear cryptanalysis. The proposed solutions are proven through hardware implementation on the FPGA platform. The results have shown that lightweight cryptography is an effective strategy that could be used to establish security and maintain performance in the IoT and other resource-limited settings.
Authors:Md Abu Sayed, Asif Rahman, Ahmed Hemida, Christopher Kiekintveld, Charles Kamhoua
Abstract:
This paper explores coordinated deception strategies by synchronizing defenses across coupled cyber and physical systems to mislead attackers and strengthen defense mechanisms. We introduce a Stackelberg game framework to model the strategic interaction between defenders and attackers, where the defender leverages CVSS-based exploit probabilities and real-world vulnerability data from the National Vulnerability Database (NVD) to guide the deployment of deception. Cyber and physical replicas are used to disrupt attacker reconnaissance and enhance defensive effectiveness. We propose a CVE-based utility function to identify the most critical vulnerabilities and demonstrate that coordinated multilayer deception outperforms single-layer and baseline strategies in improving defender utility across both CVSS versions.
Authors:Oliver Custance, Saad Khan, Simon Parkinson, Quan Z. Sheng
Abstract:
Device-free crowd-counting using WiFi Channel State Information (CSI) is a key enabling technology for a new generation of privacy-preserving Internet of Things (IoT) applications. However, practical deployment is severely hampered by the domain shift problem, where models trained in one environment fail to generalise to another. To overcome this, we propose a novel two-stage framework centred on a CSI-ResNet-A architecture. This model is pre-trained via self-supervised contrastive learning to learn domain-invariant representations and leverages lightweight Adapter modules for highly efficient fine-tuning. The resulting event sequence is then processed by a stateful counting machine to produce a final, stable occupancy estimate. We validate our framework extensively. On our WiFlow dataset, our unsupervised approach excels in a 10-shot learning scenario, achieving a final Mean Absolute Error (MAE) of just 0.44--a task where supervised baselines fail. To formally quantify robustness, we introduce the Generalisation Index (GI), on which our model scores near-perfectly, confirming its ability to generalise. Furthermore, our framework sets a new state-of-the-art public WiAR benchmark with 98.8\% accuracy. Our ablation studies reveal the core strength of our design: adapter-based fine-tuning achieves performance within 1\% of a full fine-tune (98.84\% vs. 99.67\%) while training 97.2\% fewer parameters. Our work provides a practical and scalable solution for developing robust sensing systems ready for real-world IoT deployments.
Authors:Oliver Custance, Saad Khan, Simon Parkinson
Abstract:
WiFi Channel State Information (CSI) has shown promise for single-person gait identification, with numerous studies reporting high accuracy. However, multi-person identification remains largely unexplored, with the limited existing work relying on complex, expensive setups requiring modified firmware. A critical question remains unanswered: is poor multi-person performance an algorithmic limitation or a fundamental hardware constraint? We systematically evaluate six diverse signal separation methods (FastICA, SOBI, PCA, NMF, Wavelet, Tensor Decomposition) across seven scenarios with 1-10 people using commodity ESP32 WiFi sensors--a simple, low-cost, off-the-shelf solution. Through novel diagnostic metrics (intra-subject variability, inter-subject distinguishability, performance degradation rate), we reveal that all methods achieve similarly low accuracy (45-56\%, $σ$=3.74\%) with statistically insignificant differences (p $>$ 0.05). Even the best-performing method, NMF, achieves only 56\% accuracy. Our analysis reveals high intra-subject variability, low inter-subject distinguishability, and severe performance degradation as person count increases, indicating that commodity ESP32 sensors cannot provide sufficient signal quality for reliable multi-person separation.
Authors:Rajiv Thummala, Katherine Winton, Luke Flores, Elizabeth Redmond, Gregory Falco
Abstract:
Out-of-band screening of microcontrollers is a major gap in semiconductor supply chain security. High-assurance techniques such as X-ray and destructive reverse engineering are accurate but slow and expensive, hindering comprehensive detection for hardware Trojans or firmware tampering. Consequently, there has been increased interest in applying machine learning techniques to automate forensic examination, enabling rapid, large-scale inspection of components without manual oversight. We introduce a non-destructive screening method that uses power side-channel measurements and generative modeling to detect tampering in commodity microcontrollers without trusted hardware. As a proof-of-concept, differential power analysis (DPA) traces are collected from the ChipWhisperer and a generative adversarial network (GAN) is trained only on benign measurements to learn nominal power behavior. The trained discriminator then serves as a one-class anomaly detector. We report detection performance on multiple tampering scenarios and discuss how this technique can serve as an intermediate screening tier between basic functional tests and high-cost forensic analysis. The proposed method is evaluated in the context of semiconductor supply chain practice and policy to assess its suitability as an intermediate assurance mechanism.
Authors:Osasumwen Cedric Ogiesoba-Eguakun, Suman Rath
Abstract:
Coordinated stealth attacks are a serious cybersecurity threat to distributed generation systems because they modify control and measurement signals while remaining close to normal behavior, making them difficult to detect using standard intrusion detection methods. This study investigates quantum machine learning approaches for detecting coordinated stealth attacks on a distributed generation unit in a microgrid. High-quality simulated measurements were used to create a balanced binary classification dataset using three features: reactive power at DG1, frequency deviation relative to the nominal value, and terminal voltage magnitude. Classical machine learning baselines, fully quantum variational classifiers, and hybrid quantum classical models were evaluated. The results show that a hybrid quantum classical model combining quantum feature embeddings with a classical RBF support vector machine achieves the best overall performance on this low dimensional dataset, with a modest improvement in accuracy and F1 score over a strong classical SVM baseline. Fully quantum models perform worse due to training instability and limitations of current NISQ hardware. In contrast, hybrid models train more reliably and demonstrate that quantum feature mapping can enhance intrusion detection even when fully quantum learning is not yet practical.
Authors:Brahim Khalil Sedraoui, Abdelmadjid Benmachiche, Amina Makhlouf, Chaouki Chemam
Abstract:
To resolve the acute problem of privacy protection and guarantee that data can be used in the context of threat intelligence, this paper considers the implementation of Differential Privacy (DP) in cybersecurity analytics. DP, which is a sound mathematical framework, ensures privacy by adding a controlled noise to data outputs and thus avoids sensitive information disclosure even with auxiliary datasets. The use of DP in Security Information and Event Management (SIEM) systems is highlighted, and it can be seen that DP has the capability to protect event log and threat data analysis without interfering with the analytical efficiency. The utility versus privacy trade-offs linked to the maximization of the epsilon parameter, which is one of the critical components of DP mechanisms, is pointed out. The article shows the transformative power of DP in promoting safe sharing of data and joint threat intelligence through real-world systems and case studies. Finally, this paper makes DP one of the key strategies to improve privacy-preserving analytics in the field of cybersecurity.
Authors:Brahim Khalil Sedraoui, Abdelmadjid Benmachiche, Amina Makhlouf, Chaouki Chemam
Abstract:
The concept of Secure Multi-Party Computation (SMPC) is a cryptographic service that allows generating analysis of sensitive data related to finance under the collaboration of all stakeholders without violating the privacy of the research participants. This article shows the increasing significance of privacy protection in the contemporary financial services, where various stakeholders should comply with stringent security and regulatory standards. It discusses the main issues of scalability, computational efficiency, and working with very large datasets, and it identifies the directions of future research to make SMPC protocols more practical and efficient. The results highlight the possibility of SMPC to facilitate safe, transparent, and trustful financial transactions in an ecosystem that is becoming more digital.
Authors:Juan Pedro Hecht, Hugo Daniel Scolnik
Abstract:
Post-quantum cryptography-PQC- aims to develop public-key primitives that are secure against adversaries using classical and quantum computing technologies. This study introduces novel protocols, a key encapsulation mechanism, a digital signature scheme, and special protection against linear attacks. Our purpose is to create reliable alternatives to current standards, seeking compact, fast, and secure replacements of the key interchange and digital signature in the TLS 1_3 protocol, which safeguards Internet traffic, allowing an easy post-quantum transition to protect current data from the harvest now, decrypt later threat.
Authors:Lakshya Chopra, Vipin Kumar Rathi
Abstract:
Post-quantum signature schemes such as ML-DSA-65 produce signatures of 3,309 bytes and public keys of 1,952 bytes over 50 times larger than classical Ed25519. In TLS-authenticated environments like Kubernetes control planes and 5G Core networks, where every inter-component connection is mutually authenticated, this overhead compounds across thousands of handshakes per second. Merkle Tree Certificates (MTC), currently under development at IETF, replace per-certificate issuer signatures with Merkle inclusion proofs and, in the landmark mode, eliminate on-wire signatures from certificate authentication entirely. We present MTC-based PKI architectures for Kubernetes and 3GPP 5G Service-Based Architecture. Starting from the infrastructure layer, we replace the Kubernetes cluster CA with an MTCA deployment that issues MTC certificates to control plane components, with cosigners and a DaemonSet-based landmark distributor. Building on this, we design a certificate lifecycle for 5G Network Functions deployed against QORE, a post-quantum 5G Core. We implement MTC proof construction and verification in Go crypto/tls and crypto/x509 packages. Our measurements on an Intel i9-12900 show MTC landmark verification completing in under 2 μs compared to 24 microseconds for ECDSA signature verification-with no measurable impact on TLS handshake time. We further propose a 6G-native architecture where the NRF serves as the MTCA and the SCP as witness cosigner, and discuss applicability to Non-Terrestrial Networks.
Authors:Jingzhe Zhang, Yitong Shen, Ning Wang, Yili Ren
Abstract:
With the rapid evolution of wireless technologies, Wi-Fi has expanded beyond its original role in data transmission to support various emerging applications, particularly in physical-layer security, including device authentication, user authentication, and secret key generation. Despite extensive research on Wi-Fi Channel State Information (CSI)-based physical-layer security, its vulnerabilities remain largely unexplored. In this work, we propose BFIAttack, a novel attack that exploits Beamforming Feedback Information (BFI) to reconstruct the CSI of a legitimate user or device, thereby compromising Wi-Fi-based physical-layer security. We realize the attack by leveraging a closed-form CSI reconstruction method for the single-antenna station scenario and a maximum likelihood estimation-based CSI reconstruction for the multi-antenna station scenario. Moreover, we exploit spatial similarities among antenna pairs to refine the reconstructed CSI and enhance attack effectiveness. Experimental results show that BFIAttack achieves an average attack success rate of $73\%$ in multi-antenna station scenarios with no more than five attack attempts, and over $93\%$ in single-antenna station scenarios with only a single attempt. BFIAttack reveals critical vulnerabilities in existing Wi-Fi-based physical-layer security.
Authors:Leonardo Bitzki, Diego Kreutz, Tiago Heinrich, Douglas Fideles, Leandro Bertholdo, Silvio Quincozes, Angelo Diniz
Abstract:
Cybersecurity research increasingly depends on reproducible evidence, such as traffic traces, logs, and labeled datasets, yet most public datasets remain static and offer limited support for controlled re-execution and traceability, especially in heterogeneous multi-protocol environments. This paper presents NetSecBed, a container-native, scenario-oriented testbed for reproducible generation of network traffic evidence and execution artifacts under controlled conditions, particularly suitable for IoT, IIoT, and pervasive multi-protocol environments. The framework integrates 60 attack scenarios, 9 target services, and benign traffic generators as single-purpose containers, enabling plug-and-play extensibility and traceability through declarative specifications. Its pipeline automates parametrized execution, packet capture, log collection, service probing, feature extraction, and dataset consolidation. The main contribution is a repeatable, auditable, and extensible framework for cybersecurity experimentation that reduces operational bias and supports continuous dataset generation.
Authors:Shixuan Zhao, Weicheng Wang, Ninghui Li, Zhiqiang Lin
Abstract:
Protecting sensitive information in data-driven collaborations, such as AI training, while meeting the diverse requirements of multiple mutually distrusted stakeholders, is both crucial and challenging. This paper presents Styx, a novel framework to address this challenge by integrating sticky policies with Trusted Execution Environments (TEEs). At a high level, Styx employs a hardware-TEE-protected middleware with a programming language runtime to form a sandboxed environment for both the data processing and policy enforcement. We carefully designed a data processing workflow and pipelines to enable a strong yet flexible data-specific policy enforcement throughout the entire data lifecycle and data derivation to achieve data-in-use protection, data lifecycle protection and dynamic collaboration. We implemented Styx and demonstrated its ability to make collaborative computing, such as joint AI training, more secure, privacy-preserving, and policy-compliant. Our evaluation shows the performance overheads imposed by Styx are reasonable on single-node computation with the capability to scale to a large distributed multi-node deployment.
Authors:Houzhe Wang, Xiaojie Zhu, Chi Chen
Abstract:
With the increasing importance of data privacy and security, federated unlearning emerges as a new research field dedicated to ensuring that once specific data is deleted, federated learning models no longer retain or disclose related information. In this paper, we propose a zero-shot federated unlearning scheme, named Jellyfish. It distinguishes itself from conventional federated unlearning frameworks in four key aspects: synthetic data generation, knowledge disentanglement, loss function design, and model repair. To preserve the privacy of forgotten data, we design a zero-shot unlearning mechanism that generates error-minimization noise as proxy data for the data to be forgotten. To maintain model utility, we first propose a knowledge disentanglement mechanism that regularises the output of the final convolutional layer by restricting the number of activated channels for the data to be forgotten and encouraging activation sparsity. Next, we construct a comprehensive loss function that incorporates multiple components, including hard loss, confusion loss, distillation loss, model weight drift loss, gradient harmonization, and gradient masking, to effectively align the learning trajectories of the objectives of ``forgetting" and ``retaining". Finally, we propose a zero-shot repair mechanism that leverages proxy data to restore model accuracy within acceptable bounds without accessing users' local data. To evaluate the performance of the proposed zero-shot federated unlearning scheme, we conducted comprehensive experiments across diverse settings. The results validate the effectiveness and robustness of the scheme.
Authors:Taibiao Zhao, Xiang Zhang, Mingxuan Sun, Ruyi Ding, Xugui Zhou
Abstract:
Modern advanced driver assistance systems (ADAS) rely on deep neural networks (DNNs) for perception and planning. Since DNNs' parameters reside in DRAM during inference, bit flips caused by cosmic radiation or low-voltage operation may corrupt DNN computations, distort driving decisions, and lead to real-world incidents. This paper presents a SpatioTemporal-Aware Fault Injection (STAFI) framework to locate critical fault sites in DNNs for ADAS efficiently. Spatially, we propose a Progressive Metric-guided Bit Search (PMBS) that efficiently identifies critical network weight bits whose corruption causes the largest deviations in driving behavior (e.g., unintended acceleration or steering). Furthermore, we develop a Critical Fault Time Identification (CFTI) mechanism that determines when to trigger these faults, taking into account the context of real-time systems and environmental states, to maximize the safety impact. Experiments on DNNs for a production ADAS demonstrate that STAFI uncovers 29.56x more hazard-inducing critical faults than the strongest baseline.
Authors:Alex R. Mattukat, Vincent Schmandt, Timo Langstrof, Michael Zerbe, Horst Lichter
Abstract:
Authentication is a fundamental security means for protecting system resources. Authenticator-centric authentication techniques (AuthN Techniques) address how mechanisms and credentials are used via Authenticators. There are many AuthN Techniques that differ in many ways and there exist classification approaches that aim to structure them. However, they are limited in the aspects they classify and are not flexible enough to accommodate the diverse nature of AuthN Techniques. This paper presents two contributions. First, novel, faceted classification schemes for AuthN Techniques and Authenticators are presented. The schemes were developed based on 345 papers identified through a targeted LLM-assisted literature review and semantic clustering. The classification schemes were applied to build a catalog of Authenticators and AuthN Techniques; the second contribution of this paper. This paper presents our methodology, the classification schemes with example applications, the list of AuthN Techniques from the catalog, and discussions on future work.
Authors:Wenzheng Zhao, Madhava Kalyan Gadiputi, Fengpei Yuan
Abstract:
Open-domain video platforms offer rich, personalized content that could support health, caregiving, and educational applications, but their engagement-optimized recommendation algorithms can expose vulnerable users to inappropriate or harmful material. These risks are especially acute in child-directed and care settings (e.g., dementia care), where content must satisfy individualized safety constraints before being shown. We introduce SafeScreen, a safety-first video screening framework that retrieves and presents personalized video while enforcing individualized safety constraints. Rather than ranking videos by relevance or popularity, SafeScreen treats safety as a prerequisite and performs sequential approval or rejection of candidate videos through an automated pipeline. SafeScreen integrates three key components: (i) profile-driven extraction of individualized safety criteria, (ii) evidence-grounded assessments via adaptive question generation and multimodal VideoRAG analysis, and (iii) LLM-based decision-making that verifies safety, appropriateness, and relevance before content exposure. This design enables explainable, real-time screening of uncurated video repositories without relying on precomputed safety labels. We evaluate SafeScreen in a dementia-care reminiscence case study using 30 synthetic patient profiles and 90 test queries. Results demonstrate that SafeScreen prioritizes safety over engagement, diverging from YouTube's engagement-optimized rankings in 80-93% of cases, while maintaining high levels of safety coverage, sensibleness, and groundedness, as validated by both LLM-based evaluation and domain experts.
Authors:Pengzhen Ke, Liang Feng Zhang, Huaxiong Wang, Li-Ping Wang
Abstract:
Authenticated private information retrieval (APIR) is the state-of-the-art error-detecting private information retrieval (ED-PIR), using Distributed Point Functions (DPFs) for subpolynomial complexity and privacy. However, its finite field structure restricts it to prime-order DPFs, leading to prohibitively large key sizes under information-theoretic settings, while its dual-DPF-key design introduces unnecessary communication overhead, limiting its practicality for large-scale deployments. This paper proposes a novel ring-based information-theoretic ED-PIR (itED-PIR) scheme that overcomes these limitations by leveraging prime-power-order information-theoretic DPFs (itDPFs). Built over a prime-power ring, the proposed scheme breaks APIR's field-induced constraint to enable more efficient DPF utilization, significantly reducing key size growth and rendering the scheme feasible for high-security scenarios. Additionally, a single-itDPF-key design halves query-side communication overhead by eliminating APIR's redundant dual-key setup, without compromising privacy or verifiability. Beyond immediate efficiency gains, this work establishes a lightweight, flexible framework for constructing DPF-based malicious-resilient private information retrieval, opening new avenues for privacy-preserving data retrieval in distributed storage systems and post-quantum privacy protocols.
Authors:Jack Hughes, Ben Collier, Daniel R. Thomas
Abstract:
Existential risk scenarios relating to Generative Artificial Intelligence often involve advanced systems or agentic models breaking loose and using hacking tools to gain control over critical infrastructure. In this paper, we argue that the real threats posed by generative AI for cybercrime are rather different. We apply innovation theory and evolutionary economics - treating cybercrime as an ecosystem of small- and medium-scale tech start-ups, coining two novel terms that bound the upper and lower cases for disruption. At the high end, we propose the Stand-Alone Complex, in which cybercrime-gang-in-a-box solutions enable individual actors to largely automate existing cybercrime-as-a-service arrangements. At the low end, we suggest the phenomenon of Vibercrime, in which 'vibe coding' lowers the barrier to entry, but do not fundamentally reshape the economic structures of cybercrime. We analyse early empirical data from the cybercrime underground, and find the reality is prosaic - AI has some early adoption in existing large-scale, low-profit passive income schemes and trivial forms of fraud but there is little evidence so far on widespread disruption in cybercrime. This replaces existing means of code pasting, error checking, and cheatsheet consultation, for generic aspects of software development involved in cybercrime - and largely for already skilled actors, with low-skill actors finding little utility in vibe coding tools compared to pre-made scripts. The role of jailbroken LLMs (Dark AI) as instructors is also overstated, given the prominence of subculture and social learning in initiation - new users value the social connections and community identity involved in learning hacking and cybercrime skills as much as the knowledge itself. Our initial results, therefore, suggest that even bemoaning the rise of the Vibercriminal may be overstating the level of disruption to date.
Authors:Ziyu Mu, Xiyu Shi, Safak Dogan
Abstract:
Intrusion Detection System (IDS) is often calibrated to known attacks and generalizes poorly to unknown threats. This paper proposes GMA-SAWGAN-GP, a novel generative augmentation framework built on a Self-Attention-enhanced Wasserstein GAN with Gradient Penalty (WGAN-GP). The generator employs Gumbel-Softmax regularization to model discrete fields, while a Multilayer Perceptron (MLP)-based AutoEncoder acts as a manifold regularizer. A lightweight gating network adaptively balances adversarial and reconstruction losses via entropy regularization, improving stability and mitigating mode collapse. The self-attention mechanism enables the generator to capture both short- and long-range dependencies among features within each record while preserving categorical semantics through Gumbel-Softmax heads. Extensive experiments on NSL-KDD, UNSW-NB15, and CICIDS2017 using five representative IDS models demonstrate that GMA-SAWGAN-GP significantly improves detection performance on known attacks and enhances generalization to unknown attacks. Leave-One-Attack-type-Out (LOAO) evaluations using Area Under the Receiver Operating Characteristic (AUROC) and True Positive Rate at a 5 percent False Positive Rate confirm that IDS models trained on augmented datasets achieve higher robustness under unseen attack scenarios. Ablation studies validate the contribution of each component to performance gains. Compared with baseline models, the proposed framework improves binary classification accuracy by an average of 5.3 percent and multi-classification accuracy by 2.2 percent, while AUROC and True Positive Rate at a 5 percent False Positive Rate for unknown attacks increase by 3.9 percent and 4.8 percent, respectively, across the three datasets. Overall, GMA-SAWGAN-GP provides an effective approach to generative augmentation for mixed-type network traffic, improving IDS accuracy and resilience.
Authors:Zhuan Shi, Alireza Dehghanpour Farashah, Rik de Vries, Golnoosh Farnadi
Abstract:
Concept erasure in text-to-image diffusion models seeks to remove undesired concepts while preserving overall generative capability. Localized erasure methods aim to restrict edits to the spatial region occupied by the target concept. However, we observe that suppressing a concept can unintentionally weaken semantically related neighbor concepts, reducing fidelity in fine-grained domains. We propose Neighbor-Aware Localized Concept Erasure (NLCE), a training-free framework designed to better preserve neighboring concepts while removing target concepts. It operates in three stages: (1) a spectrally-weighted embedding modulation that attenuates target concept directions while stabilizing neighbor concept representations, (2) an attention-guided spatial gate that identifies regions exhibiting residual concept activation, and (3) a spatially-gated hard erasure that eliminates remaining traces only where necessary. This neighbor-aware pipeline enables localized concept removal while maintaining the surrounding concept neighborhood structure. Experiments on fine-grained datasets (Oxford Flowers, Stanford Dogs) show that our method effectively removes target concepts while better preserving closely related categories. Additional results on celebrity identity, explicit content and artistic style demonstrate robustness and generalization to broader erasure scenarios.
Authors:Dimitris Stripelis, Patrick Foley, Mohammad Naseri, William Lindskog-Münzing, Chong Shen Ng, Daniel Janes Beutel, Nicholas D. Lane
Abstract:
RAG typically assumes centralized access to documents, which breaks down when knowledge is distributed across private data silos. We propose a secure Federated RAG system built using Flower that performs local silo retrieval, while server-side aggregation and text generation run inside an attested, confidential compute environment, enabling confidential remote LLM inference even in the presence of honest-but-curious or compromised servers. We also propose a cascading inference approach that incorporates a non-confidential third-party model (e.g., Amazon Nova) as auxiliary context without weakening confidentiality.
Authors:Younes Salmi, Hanna Bogucka
Abstract:
Deep learning (DL) has been widely studied for assisting applications of modern wireless communications. One of the applications is automatic modulation classification (AMC). However, DL models are found to be vulnerable to adversarial machine learning (AML) threats. One of the most persistent and stealthy threats is the backdoor (Trojan) attack. Nevertheless, most studied threats originate from other AI domains, such as computer vision (CV). Therefore, in this paper, a physical backdoor attack targeting the wireless signal before transmission is studied. The adversary is considered to be using explainable AI (XAI) to guide the placement of the trigger in the most vulnerable parts of the signal. Then, a class prototype combined with principal components is used to generate the trigger. The studied threat was found to be efficient in breaching multiple DL-based AMC models. The attack achieves high success rates for a wide range of SNR values and a small poisoning ratio.
Authors:Younes Salmi, Hanna Bogucka
Abstract:
Deep Learning (DL) has become a key technology that assists radio frequency (RF) signal classification applications, such as modulation classification. However, the DL models are vulnerable to adversarial machine learning threats, such as data manipulation attacks. We study a physical backdoor (Trojan) attack that targets a DL-based modulation classifier. In contrast to digital backdoor attacks, where digital triggers are injected into the training dataset, we use power amplifier (PA) non-linear distortions to create physical triggers before the dataset is formed. During training, the adversary manipulates amplitudes of RF signals and changes their labels to a target modulation scheme, training a backdoored model. At inference, the adversary aims to keep the backdoor attack inactive such that the backdoored model maintains high accuracy on test signals. However, if they apply the same manipulation used during training on these test signals, the backdoor is activated, and the model misclassifies these signals. We demonstrate that our proposed attack achieves high attack success rates with few manipulated RD signals for different noise levels. Furthermore, we test the resilience of the proposed attack to multiple defense techniques, and the results show that these techniques fail to mitigate the attack.
Authors:Younes Salmi, Hanna Bogucka
Abstract:
This paper investigates the susceptibility to model integrity attacks that overload virtual machines assigned by the k-means algorithm used for resource provisioning in fog networks. The considered k-means algorithm runs two phases iteratively: offline clustering to form clusters of requested workload and online classification of new incoming requests into offline-created clusters. First, we consider an evasion attack against the classifier in the online phase. A threat actor launches an exploratory attack using query-based reverse engineering to discover the Machine Learning (ML) model (the clustering scheme). Then, a passive causative (evasion) attack is triggered in the offline phase. To defend the model, we suggest a proactive method using adversarial training to introduce attack robustness into the classifier. Our results show that our mitigation technique effectively maintains the stability of the resource provisioning system against attacks.
Authors:Martin Herrmann, Oussama Draissi, Christian Niesler, Ahmad-Reza Sadeghi, Lucas Davi
Abstract:
Microarchitectural vulnerabilities increasingly undermine the assumption that hardware can be treated as a reliable root of trust. Prevention mechanisms often lag behind evolving attack techniques, leaving deployed systems unable to assume continued trustworthiness. We propose a shift from prevention to detection through microarchitectural-aware remote attestation. As a first instantiation of this idea, we present HammerWatch, a Rowhammer-aware remote attestation protocol that enables an external verifier to assess whether a system exhibits hardware-induced disturbance behavior. HammerWatch leverages memory-level evidence available on commodity platforms, specifically Machine-Check Exceptions (MCEs) from ECC DRAM and counter-based indicators from Per-Row Activation Counting (PRAC), and protects these measurements against kernel-level adversaries using TPM-anchored hash chains. We implement HammerWatch on commodity hardware and evaluate it on 20000 simulated benign and malicious access patterns. Our results show that the verifier reliably distinguishes Rowhammer-like behavior from benign operation under conservative heuristics, demonstrating that detection-oriented attestation is feasible and can complement incomplete prevention mechanisms
Authors:Oussama Draissi, Mark Günzel, Ahmad-Reza Sadeghi, Lucas Davi
Abstract:
WebAssembly's (Wasm) monolithic linear memory model facilitates memory corruption attacks that can escalate to cross-site scripting in browsers or go undetected when a malicious host tampers with a module's state. Existing defenses rely on invasive binary instrumentation or custom runtimes, and do not address runtime integrity verification under an adversarial host model. We present Walma, a framework for WebAssembly Linear Memory Attestation that leverages machine learning to detect memory corruption and external tampering by classifying memory snapshots. We evaluate Walma on six real-world CVE-affected applications across three verification backends (cpu-wasm, cpu-tch, gpu) and three instrumentation policies. Our results demonstrate that CNN-based classification can effectively detect memory corruption in applications with structured memory layouts, with coarse-grained boundary checks incurring as low as 1.07x overhead, while fine-grained monitoring introduces higher (1.5x--1.8x) but predictable costs. Our evaluation quantifies the accuracy and overhead trade-offs across deployment configurations, demonstrating the practical feasibility of ML-based memory attestation for WebAssembly.
Authors:Qianlong Lan, Anuj Kaul
Abstract:
Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage split-compute architecture that distributes security checks across the client and the cloud. The system consists of a local visual Sentinel, a cloud-based Deep Planner, and a deterministic Guard that enforces execution-time policies. Across 1,000 adversarial samples, edge-only defenses fail to detect 86.9% of semantic attacks. In contrast, the full hybrid architecture reduces the overall attack success rate (ASR) to below 1% (0.88% under static evaluation and 0.67% under adaptive evaluation), while maintaining deterministic constraints on side-effecting actions. By filtering presentation-layer attacks locally, the system avoids unnecessary cloud inference and achieves an approximately 17,000x latency advantage over cloud-only baselines. These results indicate that deterministic enforcement at the execution boundary can complement probabilistic language models, and that split-compute provides a practical foundation for securing interactive LLM agents.
Authors:Matías Pizarro, Raghavan Narasimhan, Asja Fischer
Abstract:
With the increasing deployment of automated and agentic systems, ensuring the adversarial robustness of automatic speech recognition (ASR) models has become critical. We observe that changing the precision of an ASR model during inference reduces the likelihood of adversarial attacks succeeding. We take advantage of this fact to make the models more robust by simple random sampling of the precision during prediction. Moreover, the insight can be turned into an adversarial example detection strategy by comparing outputs resulting from different precisions and leveraging a simple Gaussian classifier. An experimental analysis demonstrates a significant increase in robustness and competitive detection performance for various ASR models and attack types.
Authors:Charoes Huang, Xin Huang, Ngoc Phu Tran, Amin Milani Fard
Abstract:
The Model Context Protocol (MCP) has rapidly emerged as a universal standard for connecting AI assistants to external tools and data sources. While MCP simplifies integration between AI applications and various services, it introduces significant security vulnerabilities, particularly on the client side. In this work we conduct threat modelings of MCP implementations using STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) and DREAD (Damage, Reproducibility, Exploitability, Affected Users, Discoverability) frameworks across five key components: (1) MCP Host and Client, (2) LLM, (3) MCP Server, (4) External Data Stores, and (5) Authorization Server. This comprehensive analysis reveals tool poisoning-where malicious instructions are embedded in tool metadata-as the most prevalent and impactful client-side vulnerability. We therefore focus our empirical evaluation on this critical attack vector, providing a systematic comparison of how seven major MCP clients validate and defend against tool poisoning attacks. Our analysis reveals significant security issues with most tested clients due to insufficient static validation and parameter visibility. We propose a multi-layered defense strategy encompassing static metadata analysis, model decision path tracking, behavioral anomaly detection, and user transparency mechanisms. This research addresses a critical gap in MCP security, which has primarily focused on server-side vulnerabilities, and provides actionable recommendations and mitigation strategies for securing AI agent ecosystems.
Authors:Eduard Hirsch, Kristina Raab
Abstract:
Cryptographic migration driven by algorithm deprecation, regulatory change, and post-quantum readiness requires more than an inventory of cryptographic assets. Existing Cryptographic Bills of Materials (CBOMs) are typically tool- or inventory-derived. They lack architectural intent, rationale, and security context, limiting their usefulness for migration planning. This paper introduces Security-Aware Architecture Tradeoff Analysis Method (SATAM), a security-aware adaptation of scenario-based architecture evaluation that derives an architecture-grounded, context-sensitive CBOM. SATAM integrates established approaches: ATAM, arc42, STRIDE, ADR, and CARAF. These are included to identify and analyze security-relevant cryptographic decision points and document them as explicit architectural decisions. These artifacts are used to annotate CBOM entries with architectural context, security intent, and migration-critical metadata using CycloneDX-compatible extensions. Following a Design Science Research approach, the paper presents the method design, a conceptual traceability model, and an illustrative application. The results demonstrate that architecture-derived CBOMs capture migration-relevant context that is typically absent from inventory-based approaches. Thereby, SATAM improves availability of information required for informed cryptographic migration planning and long-term cryptographic agility.
Authors:Victor Jüttner, Erik Buchmann
Abstract:
Smart homes are increasingly targeted by cyberattacks, yet residents often lack guidance when incidents occur. Since affected residents are likely to seek help from trustworthy sources, this paper asks: What actionable cybersecurity guidance do governments provide to smart home users whose systems have been compromised? To answer this question, we conduct an exploratory, user-centered review of governmental cybersecurity guidance for smart homes across eleven countries to identify and characterize the types of guidance governments provide and to systematize their content. Using a standardized search and screening process, we derive three emergent clusters: incident reporting, general security recommendations, and incident response. Our findings show that governments provide abundant general security advice and accessible reporting channels, but structured incident response guidance tailored to smart homes is rare. Only two sources offer step-by-step recovery guidance for non-expert users, highlighting a gap between preventive advice and post-incident support.
Authors:Charoes Huang, Xin Huang, Amin Milani Fard
Abstract:
Prompt injection is listed as the number-one vulnerability class in the OWASP Top 10 for LLM Applications that can subvert LLM guardrails, disclose sensitive data, and trigger unauthorized tool use. Developers are rapidly adopting AI-assisted development tools built on the Model Context Protocol (MCP). However, their convenience comes with security risks, especially prompt-injection attacks delivered via tool-poisoning vectors. While prior research has studied prompt injection in LLMs, the security posture of real-world MCP clients remains underexplored. We present the first empirical analysis of prompt injection with the tool-poisoning vulnerability across seven widely used MCP clients: Claude Desktop, Claude Code, Cursor, Cline, Continue, Gemini CLI, and Langflow. We identify their detection and mitigation mechanisms, as well as the coverage of security features, including static validation, parameter visibility, injection detection, user warnings, execution sandboxing, and audit logging. Our evaluation reveals significant disparities. While some clients, such as Claude Desktop, implement strong guardrails, others, such as Cursor, exhibit high susceptibility to cross-tool poisoning, hidden parameter exploitation, and unauthorized tool invocation. We further provide actionable guidance for MCP implementers and the software engineering community seeking to build secure AI-assisted development workflows.
Authors:Charoes Huang, Xin Huang, Amin Milani Fard
Abstract:
The Model Context Protocol (MCP) has emerged as a standard for connecting Large Language Models (LLMs) to external tools and data. However, MCP servers often expose privileged capabilities, such as file system access, network requests, and command execution that can be exploited if not properly secured. We present mcp-sec-audit, an extensible security assessment toolkit designed specifically for MCP servers. It implements static pattern matching for Python-based MCP servers and dynamic sandboxed fuzzing and monitoring via Docker and eBPF. The tool detects risky capabilities through configurable rule-based analysis and provides mitigation recommendations.
Authors:Erez Badash, Dan Boneh, Ilan Komargodski, Megha Srivastava
Abstract:
We present Hawkeye, a system for analyzing and reproducing GPU-level arithmetic operations. Using our framework, anyone can re-execute on a CPU the exact matrix multiplication operations underlying a machine learning model training or inference workflow that was executed on an NVIDIA GPU, without any precision loss. This is in stark contrast to prior approaches to verifiable machine learning, which either introduce significant computation overhead to the original model owner, or suffer from non-robustness and quality degradation. The main technical contribution of Hawkeye is a systematic sequence of carefully crafted tests that study rounding direction, subnormal number handling, and order of (non-associative) accumulation during matrix multiplication on NVIDIA's Tensor Cores. We test and evaluate our framework on multiple NVIDIA GPU architectures ( Ampere, Hopper, and Lovelace) and precision types (FP16, BFP16, FP8). In all test cases, Hawkeye enables perfect reproduction of matrix multiplication on a CPU, paving the way for efficient and trustworthy third-party auditing of ML model training and inference.
Authors:Víctor García, Santaigo Escobar, Catherine Meadows, Jose Meseguer
Abstract:
Formal patterns are formally specified solutions to frequently occurring distributed system problems that are generic, executable, and come with strong qualitative and/or quantitative formal guarantees. A formal pattern is a generic system transformation which transforms a usually infinite class of systems in need of the pattern's solution into enhanced versions of such systems that solve the problem in question. In this paper we demonstrate the application of formal patterns to protocol dialects. Dialects are methods for hardening protocols so as to endow them with light-weight security, especially against easy attacks that can lead to more serious ones. A lingo is a dialect's key security component, because attackers are unable to ''speak'' the lingo. A lingo's ''talk'' changes all the time, becoming a moving target for attackers. In this paper we present several formal patterns for both lingos and dialects. Lingo formal patterns can make lingos stronger by both transforming them and by composing several lingos into a stronger lingo. Dialects themselves can be obtained by the application of a single dialect formal pattern, generic on both the chosen lingo and the chosen protocol.
Authors:Varun Kohli, Biplab Sikdar
Abstract:
As the Internet of Things (IoT) becomes an integral part of critical infrastructure, smart cities, and consumer networks, there has been an increase in the number of software attacks on the microcontrollers (MCUs) that constitute such networks. Runtime firmware attestation, i.e., the verification of a firmware's integrity, has become instrumental, and prior work focuses on lightweight IoT MCUs, offloading the verification task to capable remote verifiers. However, modern IoT devices feature large flash and volatile memory, on-device TinyML inference, and Trusted Execution Environments (TEE). Leveraging these capabilities, this paper presents a verifier-less, hybrid Self-Attestation (SA) framework called LiteAtt, which is based on TinyML execution in the Arm TrustZone of an IoT MCU for quick, on-device evaluation of the IoT firmware's SRAM footprint. LiteAtt takes a step towards ubiquitous intelligence and decentralized trust in IoT networks. It eliminates the need for firmware copies for attestation, and protects the privacy of user SRAM data by leveraging twin devices to train the TinyML models. The proposed framework achieves an average accuracy of 98.7%, F1 score of 99.33%, TPR of 98.72%, and TNR of 97.45% on SRAM attestation datasets collected from real devices. LiteAtt operates with a latency of 1.29ms, an energy consumption of 42.79uJ, and a runtime memory overhead of up to 32KB, which is suitable for battery-operated Arm Cortex-M devices. A security analysis is provided for the protocol regarding mutual authentication, confidentiality, integrity, SRAM privacy, and defense against replay and impersonation attacks. Practical deployment scenarios and future works are also discussed.
Authors:Yihua Hu, Kuncan Wang, Wei Dong
Abstract:
Graph pattern counting serves as a cornerstone of network analysis with extensive real-world applications. Its integration with local differential privacy (LDP) has gained growing attention for protecting sensitive graph information in decentralized settings. However, existing LDP frameworks are largely ad hoc, offering solutions only for specific patterns such as triangles and stars. A general mechanism for counting arbitrary graph patterns, even for the subclass of acyclic patterns, has remained an open problem. To fill this gap, we present the first general solution for counting arbitrary acyclic patterns under LDP. We identify and tackle two fundamental challenges: generalizing pattern construction from distributed data and eliminating node duplication during the construction. To address the first challenge, we propose an LDP-tailored recursive subpattern counting framework that incrementally builds patterns across multiple communication rounds. For the second challenge, we apply a random marking technique that restricts each node to a unique position in the pattern during computation. Our mechanism achieves strong utility guarantees: for any acyclic graph pattern with $k$ edges, we achieve an additive error of $\tilde{O}(\sqrt{N}d(G)^k)$, where $N$ is the number of nodes and $d(G)$ is the maximum degree of the input graph $G$. Experiments on real-world graph datasets across multiple types of acyclic patterns demonstrate that our mechanisms achieve up to $46$-$2600\times$ improvement in utility and $300$-$650\times$ reduction in communication cost compared to the baseline methods.
Authors:Ziyu Mu, Xiyu Shi, Safak Dogan
Abstract:
The increasing sophistication of cyber threats, especially zero-day attacks, poses a significant challenge to cybersecurity. Zero-day attacks exploit unknown vulnerabilities, making them difficult to detect and defend against. Existing approaches patch flaws and deploy an Intrusion Detection System (IDS). Using advanced Wasserstein GANs with Gradient Penalty (WGAN-GP), this paper makes a novel proposition to synthesize network traffic that mimics zero-day patterns, enriching data diversity and improving IDS generalization. SA-WGAN-GP is first introduced, which adds a Self-Attention (SA) mechanism to capture long-range cross-feature dependencies by reshaping the feature vector into tokens after dense projections. A JS-WGAN-GP is then proposed, which adds a Jensen-Shannon (JS) divergence-based auxiliary discriminator that is trained with Binary Cross-Entropy (BCE), frozen during updates, and used to regularize the generator for smoother gradients and higher sample quality. Third, SA-JS-WGAN-GP is created by combining the SA mechanism with JS divergence, thereby enhancing the data generation ability of WGAN-GP. As data augmentation does not equate with true zero-day attack discovery, we emulate zero-day attacks via the leave-one-attack-type-out method on the NSL-KDD dataset for training all GANs and IDS models in the assessment of the effectiveness of the proposed solution. The evaluation results show that integrating SA and JS divergence into WGAN-GP yields superior IDS performance and more effective zero-day risk detection.
Authors:Tanmay Sah, Vishal Srivastava, Dolly Sah, Kayden Jordan
Abstract:
We study how runtime enforcement against unsafe actions affects end-to-end task performance in multi-step tool using large language model (LLM) agents. Using tau-bench across Airline and Retail domains, we compare baseline Tool-Calling, planning-integrated (TRIAD), and policy-mediated (TRIAD-SAFETY) architectures with GPT-OSS-20B and GLM-4-9B. We identify model dependent interaction horizons (15 to 30 turns) and decompose outcomes into overall success rate (SR), safe success rate (SSR), and unsafe success rate (USR). Our results reveal a persistent Safety Capability Gap. While safety mediation can intercept up to 94 percent of non-compliant actions, it rarely translates into strictly safe goal attainment (SSR below 5 percent in most settings). We find that high unsafe success rates are primarily driven by Integrity Leaks, where models hallucinate user identifiers to bypass mandatory authentication. Recovery rates following blocked actions are consistently low, ranging from 21 percent for GPT-OSS-20B in simpler procedural tasks to near zero in complex Retail scenarios. These results demonstrate that runtime enforcement imposes a significant verifier tax on conversational length and compute cost without guaranteeing safe completion, highlighting the critical need for agents capable of grounded identity verification and post-intervention reasoning.
Authors:Sheng Liu, Panos Papadimitratos
Abstract:
FL has emerged as a transformative paradigm for ITS, notably camera-based Road Condition Classification (RCC). However, by enabling collaboration, FL-based RCC exposes the system to adversarial participants launching Targeted Label-Flipping Attacks (TLFAs). Malicious clients (vehicles) can relabel their local training data (e.g., from an actual uneven road to a wrong smooth road), consequently compromising global model predictions and jeopardizing transportation safety. Existing countermeasures against such poisoning attacks fail to maintain resilient model performance near the necessary attack-free levels in various attack scenarios due to: 1) not tailoring poisoned local model detection to TLFAs, 2) not excluding malicious vehicular clients based on historical behavior, and 3) not remedying the already-corrupted global model after exclusion. To close this research gap, we propose FedTrident, which introduces: 1) neuron-wise analysis for local model misbehavior detection (notably including attack goal identification, critical feature extraction, and GMM-based model clustering and filtering); 2) adaptive client rating for client exclusion according to the local model detection results in each FL round; and 3) machine unlearning for corrupted global model remediation once malicious clients are excluded during FL. Extensive evaluation across diverse FL-RCC models, tasks, and configurations demonstrates that FedTrident can effectively thwart TLFAs, achieving performance comparable to that in attack-free scenarios and outperforming eight baseline countermeasures by 9.49% and 4.47% for the two most critical metrics. Moreover, FedTrident is resilient to various malicious client rates, data heterogeneity levels, complicated multi-task, and dynamic attacks.
Authors:Akshey Sigdel, Rista Baral
Abstract:
Tool-using automation systems, from scripts and CI bots to agentic assistants, fail in recurring patterns. Common failures include unsafe side effects, invalid arguments, uncontrolled retries, and leakage of sensitive outputs. Many mitigations are model-centric and prompt-dependent, so they are brittle and do not generalize to non-LLM callers. We present Policy-First Tooling, a model-agnostic permission layer that mediates tool invocation through explicit constraints, risk-aware gating, recovery controls, and auditable explanations. The paper contributes a compact policy DSL, a runtime enforcement architecture with actionable rationale and fix hints, and a reproducible benchmark based on trace replay with controlled fault and misuse injection. In 225 controlled runs across five policy packs and three fault profiles, stricter packs improve violation prevention from 0.000 in P0 to 0.681 in P4, while task success drops from 0.356 to 0.067. Retry amplification decreases from 3.774 in P0 to 1.378 in P4, and leakage recall reaches 0.875 under injected secret outputs. These results make safety to utility trade-offs explicit and measurable.
Authors:Judith Senn, Aljosha Judmayer, Nicholas Stifter, Rainer Böhme
Abstract:
Central Bank Digital Currencies (CBDCs) are proposed as a public response to the uptake of privately run digital payments, with the digital euro, under development by the European Central Bank (ECB), serving as a prominent example. This momentum provides a unique opportunity to fundamentally rethink the future of money, and, assuming wide adoption, to establish payment systems that offer strong cryptographic security and privacy guarantees from the start. While the central banks in charge are investigating privacy-enhancing technologies (PETs), they often conclude that PETs are immature or insufficiently scalable. Moreover, these efforts tend to examine primitives in isolation, offering little insight into how a system using these PETs would scale. This systematisation of knowledge, therefore, provides a structured, top-down technical analysis of 36 payment system designs of complete system proposals that can inform CBDC designs or were explicitly proposed for this application. We identify recurring design patterns, technical trade-offs, and implementation challenges. Concluding, we highlight research gaps, including offline payments and post-quantum security.
Authors:Aydin Abadi, Yvo Desmedt
Abstract:
As database deployments shift toward cloud platforms and edge devices, thin clients need to securely retrieve sensitive records without leaking their query intent or metadata to the proxies that mediate access. Oblivious Transfer (OT) is a core tool for private retrieval, yet existing OTs assume direct client-database interaction and lack support for delegated querying or lightweight clients. We present Oblivis, a modular framework of new OT protocols that enable delegated, privacy-preserving query execution. Oblivis allows clients to retrieve database records without direct access, protects against leakage to both databases and proxies, and is designed with practical efficiency in mind. Its components include: (1) Delegated-Query OT, which permits secure outsourcing of query generation; (2) Multi-Receiver OT for merged, cloud-hosted databases; (3) a compiler producing constant-size responses suitable for thin clients; and (4) Supersonic OT, a proxy-based, informationtheoretic, and highly efficient 1-out-of-2 OT. The protocols are formally defined and proven secure in the simulation-based paradigm, under non-colluding assumption. We implement and empirically evaluate Supersonic OT. It achieves at least a 92x speedup over a highly efficient 1-out-of-2 OT, and a 2.6x-106x speedup over a standard OT extension across 200-100,000 invocations. Our implementation further shows that Supersonic OT remains efficient even on constrained hardware, e.g., it completes an end-to-end transfer in 1.36 ms on a Raspberry Pi 4.
Authors:Rubén B. Mendez, Hans H. Brunner, Juan P. Brito, Hamid Taramit, Chi-Hang Fred Fung, Antonio Pastor, Rafael Cantó, Jesús Folgueira, Diego R. Lopez, Momtchil Peev, Vicente Martin
Abstract:
A monitor and control framework for quantum-key-distribution (QKD) networks equipped with switching capabilities was developed. On the one hand, this framework provides real-time visibility into operational metrics. Specifically, it extracts essential data, such as the switching capabilities of QKD modules, the number of keys stored in buffer queues of the QKD links, and the respective key generation and consumption rates along these links. On the other hand, this framework allows software-defined networking (SDN) applications to operate on the collected information and address the cryptographic needs of the network. The SDN applications dynamically adapt the configuration of the switched network to align with its changing demands, e.g.,~prioritizing key availability on critical paths, responding to link failures, or reallocating generation capacity to prevent bottlenecks. This contribution demonstrates that the combination of switched QKD, centralized control, and global optimization strategies enables efficient, policy-driven operation of QKD networks. The cryptographic resources are allocated to maximize performance and resilience while remaining aligned with the specific policies set by network administrators.
Authors:Clément Aubert, Ross Horne, Christian Johansen, Sjouke Mauw
Abstract:
An ever-increasing number of critical infrastructures rely heavily on the assumption that security protocols satisfy a wealth of requirements. Hence, the importance of certifying e.g., privacy properties using methods that are better at detecting attacks can hardly be overstated. This paper scrutinises the "unlinkability" privacy property using relations equating behaviours that cannot be distinguished by attackers. Starting from the observation that some reasonable design choice can lead to formalisms missing attacks, we draw attention to a classical concurrent semantics accounting for relationship between past events, and show that there are concurrency-aware semantics that can discover attacks on all protocols we consider.More precisely, we focus on protocols where trace equivalence is known to miss attacks that are observable using branching-time equivalences. We consider the impact of three dimensions: design decisions made by the programmer specifying an unlinkability problem (style), semantics respecting choices during execution (branching-time), and semantics sensitive to concurrency (non-interleaving), and discover that reasonable styles miss attacks unless we give attackers enough power to observe choices and concurrency. Our main contribution is to draw attention to how a popular concurrent semantics -- history-preserving bisimilarity -- when defined for the non-interleaving applied \(π\)-calculus, can discover attacks on all protocols we consider, regardless of the choice of style. Furthermore, we can describe all such attacks using a novel modal logic that is hence suitable to formally certify attacks on privacy properties.
Authors:Sefatun-Noor Puspa, Mashrur Chowdhury
Abstract:
Graphics processing units (GPUs) power many intelligent transportation systems (ITS) and automated driving applications, but remain largely unmonitored for safety and security. This article highlights GPU misuse as a critical blind spot, showing how unmanaged GPU workloads silently degrade real-time performance, demonstrating the need for stronger security measures in ITS.
Authors:Alzubair Hassan, Alkabashi Alnour, Bashar Nuseibeh, Liliana Pasquale
Abstract:
Authentication is crucial to confirm that an individual or entity trying to perform an action is actually who or what they claim to be. In dynamic environments such as the Internet of Things (IoT), Internet of Vehicles (IoV), healthcare, and smart cities, security risks can change depending on varying contextual factors (e.g., user attempting to authenticate, location, device type). Thus, authentication methods must adapt to mitigate changing security risks while meeting usability and performance requirements. However, existing adaptive authentication systems provide limited guidance on (a) representing contextual factors, requirements, and authentication methods (b) understanding the influence of contextual factors and authentication methods on the fulfilment of requirements, and (c) selecting effective authentication methods that reduce security risks while maximizing the satisfaction of the requirements. This paper proposes a framework for engineering adaptive authentication systems that dynamically select effective authentication methods to address changes in contextual factors and security risks. The framework leverages a contextual goal model to represent requirements and the influence of contextual factors on security risks and requirement priorities. It uses an extended feature model to represent potential authentication methods and their impacts on mitigating security risks and satisfying requirements. At runtime, when contextual factors change, the framework employs a Fuzzy Causal network encoded using the Z3 SMT solver to analyze the goal and feature models, enabling the selection of effective authentication methods. We demonstrate and evaluate our framework through its application to real-world authentication scenarios in the IoV and the healthcare domains.
Authors:Daniel Schadt, Christoph Coijanovic, Thorsten Strufe
Abstract:
Apps such as Firechat and Bridgefy have been used during recent protests in Hong Kong and Iran, as they allow communication over ad-hoc wireless networks even when internet access is restricted. However, these apps do not provide sufficient protection as they do not achieve forward secrecy in unreliable networks. Without forward secrecy, caught protesters' devices will disclose all previous messages to the authorities, putting them and others at great risk. In this paper, we introduce FoSAM, the first protocol to provide proven anonymous and forward secret messaging in unreliable ad-hoc networks. Communication in FoSAM requires only the receiver's public key, rather than an interactive handshake. We evaluate the performance of FoSAM using a large-scale simulation with different user movement patterns, showing that it achieves between 92% and 99% successful message delivery. We additionally implement a FoSAM prototype for Android.
Authors:Eric Jedermann, Piotr Kulpinski, Martin Strohmeier, Vincent Lenders, Jens Schmitt
Abstract:
The Iridium Low Earth Orbit (LEO) satellite constellation remains a unique provider of global communications for critical industries, governments, and private users, serving over 2.5 million active subscribers despite recent market competition. In contrast to terrestrial wireless standards such as 3GPP, Iridium protocol specifications are proprietary and have not undergone rigorous, public, and systematic security evaluation. In this work, we present the first comprehensive security analysis of Iridium authentication and radio link protocols. We reverse engineer Iridium SIM-based authentication mechanism and demonstrate that the secret key can be extracted from the SIM card, enabling full device cloning and impersonation attacks. Leveraging a month-long dataset of Iridium up- and downlink satellite traffic, we further show that nearly all signaling and radio communication protocols currently in use lack encryption, resulting in the exposure of sensitive information in cleartext over the air such as login credentials and large volumes of personal data. Finally, we develop custom software-defined radio (SDR) tools to carry out spoofing and jamming attacks, revealing that modestly equipped adversaries can inject falsified messages or disrupt the Iridium service locally due to the absence of source authentication. Our findings uncover systemic vulnerabilities in the Iridium radio link and highlight the urgent need for users of critical applications to transition to more secure communication radio links.
Authors:Bernhard Fischer, Daniel Dorfmeister, Flavio Ferrarotti, Manuel Penz, Michael Kargl, Martina Zeinzinger, Florian Eibensteiner
Abstract:
Embedded software used in industrial systems frequently relies on data that ensures the correct and efficient operation of these systems. Thus, companies invest considerable resources in fine-tuning this data, making it their valuable intellectual property (IP). We present a novel protection mechanism for this IP that combines hardware fingerprints with Boolean logic. Unlike usual copy-protection approaches, unauthorised copies of the software still run on cloned devices but suboptimally. According to our security evaluation, only a complex dynamic analysis of the protected software running on the genuine target device can reveal the secret data. This makes the protection offered by our method more difficult to bypass. Notably, our approach does not require additional hardware, relying only on relatively simple updates to the software. We evaluate our protection mechanism by binding the parameters of a PID controller to a microcontroller unit (MCU) by using a physically unclonable function (PUF) based on its SRAM.
Authors:Taha Eghtesad, Yevgeniy Vorobeychik, Aron Laszka
Abstract:
In modern transportation networks, adversaries can manipulate routing algorithms using false data injection attacks, such as simulating heavy traffic with multiple devices running crowdsourced navigation applications, to mislead vehicles toward suboptimal routes and increase congestion. To address these threats, we formulate a strategically zero-sum game between an attacker, who injects such perturbations, and a defender, who detects anomalies based on the observed travel times of network edges. We propose a computational method based on multi-agent reinforcement learning to compute a Nash equilibrium of this game, providing an optimal detection strategy, which ensures that total travel time remains within a worst-case bound, even in the presence of an attack. We present an extensive experimental evaluation that demonstrates the robustness and practical benefits of our approach, providing a powerful framework to improve the resilience of transportation networks against false data injection. In particular, we show that our approach yields approximate equilibrium policies and significantly outperforms baselines for both the attacker and the defender.
Authors:Daniel Dorfmeister, Flavio Ferrarotti, Bernhard Fischer, Martin Schwandtner, Hannes Sochor
Abstract:
More and more companies' Intellectual Property (IP) is being integrated into Neural Network (NN) models. This IP has considerable value for companies and, therefore, requires adequate protection. For example, an attacker might replicate a production machines' hardware and subsequently simply copy associated software and NN models onto the cloned hardware. To make copying NN models onto cloned hardware infeasible, we present an approach to bind NN models - and thus also the IP contained within them - to their underlying hardware. For this purpose, we link an NN model's weights, which are crucial for its operation, to unique and unclonable hardware properties by leveraging Physically Unclonable Functions (PUFs). By doing so, sufficient accuracy can only be achieved using the target hardware to restore the original weights, rendering proper execution of the NN model on cloned hardware impossible. We demonstrate that our approach accomplishes the desired degradation of accuracy on various NN models and outline possible future improvements.
Authors:Daniel Dorfmeister, Flavio Ferrarotti, Bernhard Fischer, Evelyn Haslinger, Rudolf Ramler, Markus Zimmermann
Abstract:
We introduce a novel copy-protection method for industrial control software. With our method, a program executes correctly only on its target hardware and behaves differently on other machines. The hardware-software binding is based on Physically Unclonable Functions (PUFs). We use symbolic execution to guarantee the preservation of safety properties if the software is executed on a different machine, or if there is a problem with the PUF response. Moreover, we show that the protection method is also secure against reverse engineering.
Authors:Ali Raza, Gurang Gupta, Nikolay Matyunin, Jibesh Patra
Abstract:
Warning: This article includes red-teaming experiments, which contain examples of compromised LLM responses that may be offensive or upsetting. Large Language Models (LLMs) have the potential to create harmful content, such as generating sophisticated phishing emails and assisting in writing code of harmful computer viruses. Thus, it is crucial to ensure their safe and responsible response generation. To reduce the risk of generating harmful or irresponsible content, researchers have developed techniques such as reinforcement learning with human feedback to align LLM's outputs with human values and preferences. However, it is still undetermined whether such measures are sufficient to prevent LLMs from generating interesting responses. In this study, we propose Amnesia, a lightweight activation-space adversarial attack that manipulates internal transformer states to bypass existing safety mechanisms in open-weight LLMs. Through experimental analysis on state-of-the-art, open-weight LLMs, we demonstrate that our attack effectively circumvents existing safeguards, enabling the generation of harmful content without the need for any fine-tuning or additional training. Our experiments on benchmark datasets show that the proposed attack can induce various antisocial behaviors in LLMs. These findings highlight the urgent need for more robust security measures in open-weight LLMs and underscore the importance of continued research to prevent their potential misuse.
Authors:Petar Radanliev, Carsten Maple, Omar Santos, Kayvan Atefi
Abstract:
Software supply-chain security requires provenance mechanisms that support reproducibility and vulnerability assessment under dynamic execution conditions. Conventional Software Bills of Materials (SBOMs) provide static dependency inventories but cannot capture runtime behaviour, environment drift, or exploitability context. This paper introduces agentic Artificial Intelligence Bills of Materials (AIBOMs), extending SBOMs into active provenance artefacts through autonomous, policy-constrained reasoning. We present an agentic AIBOM framework based on a multi-agent architecture comprising (i) a baseline environment reconstruction agent (MCP), (ii) a runtime dependency and drift-monitoring agent (A2A), and (iii) a policy-aware vulnerability and VEX reasoning agent (AGNTCY). These agents generate contextual exploitability assertions by combining runtime execution evidence, dependency usage, and environmental mitigations with ISO/IEC 20153:2025 Common Security Advisory Framework (CSAF) v2.0 semantics. Exploitability is expressed via structured VEX assertions rather than enforcement actions. The framework introduces minimal, standards-aligned schema extensions to CycloneDX and SPDX, capturing execution context, dependency evolution, and agent decision provenance while preserving interoperability. Evaluation across heterogeneous analytical workloads demonstrates improved runtime dependency capture, reproducibility fidelity, and stability of vulnerability interpretation compared with established provenance systems, with low computational overhead. Ablation studies confirm that each agent contributes distinct capabilities unavailable through deterministic automation.
Authors:Yuqi Qian, Yun Cao, Haocheng Fu, Meiyang Lv, Meineng Zhu
Abstract:
Diffusion models have made substantial advances in recent years, enabling high-quality image synthesis; however, the widespread dissemination and reuse of their outputs have introduced new challenges in intellectual property protection and content provenance. Image watermarking offers a solution to these challenges, and recent work has increasingly explored Noise-as-Watermark (NaW) approaches that integrate watermarking directly into the diffusion process. However, existing NaW methods fail to balance robustness and diversity. We attribute this weakness to value encoding, which encodes watermark bits into individual sampled values. It is extremely fragile in practical application scenarios. To address this, we encode watermark bits into the structured noise pattern, so that the watermark is preserved even when individual values are perturbed. To further ensure generation diversity, we introduce a dedicated randomization design that reshuffles the positions of noise elements without changing their values, preventing the watermark from inducing fixed noise patterns or spatial locations. Extensive experiments demonstrate that our method achieves state-of-the-art robustness while maintaining high generation quality across a wide range of lossy scenarios.
Authors:Arttu Paju, Waris Abdullah, Juha Nurmi
Abstract:
Tor enables anonymous web browsing and access to anonymous onion websites. Prior work has focused on crawling and content analysis rather than on what users actually try to access. Our honeypot approach measures engagement across onion-site categories, revealing behavioral interest rather than inferred popularity. In March--April 2025, we deployed honeypot onion websites and seeded neutral-looking links via three channels -- the Ahmia Tor search engine, Stronghold paste onion "paste" service, and pastebin.com -- to observe discovery and subsequent interaction events (CAPTCHA solves; registration/login attempts). We observe that, almost without exception, human users originate from Ahmia.fi; after removing the honeypot links from the Ahmia.fi search results, visits dropped to nearly zero and no users solved CAPTCHAs. The honeypot landing front pages represent different forums for cybercrime activities -- child sexual abuse, violence, malware, stolen goods, illegal firearms, illegal drugs, and forgery items -- and, as a baseline comparison, an unclear forum. Within that set, the CSAM-themed honeypot drew markedly higher engagement than the other honeypots. When identical sites were offered in multiple languages, interaction events occurred most often on the English-language versions.
Authors:Jiaqi Chen, Yuzhe Tang, Yue Duan
Abstract:
This paper tackles the discovery of tMEV, that is, the Maximal Extractable Value on blockchains that arises from Token smart contracts. This scope differs from the existing MEV-discovery research, which analyzes application-layer contracts or attacker contracts, but ignores the wide and diverse range of token contracts. This paper presents a pipeline of techniques for tMEV discovery, including tSCAN, a static analysis tool for identifying non-standard supply-control functions in token contracts, and tSEARCH, a searcher that uncovers profitable tMEV opportunities by generating, refining, and solving token-specific constraints. By replaying real-world transactions, this paper demonstrates both the profitability of tMEV strategies and existing searchers' unawareness of them: the proposed tSEARCH extracts $10\times$ more profit than observed MEV activity on Ethereum. The practicality of tMEV searching is demonstrated through a prototype built on Slither, showing high effectiveness with low performance overhead.
Authors:Davide Mancino, Hasret Ozan Sevim
Abstract:
This Systematization of Knowledge (SoK) provides a comprehensive historical analysis of Maximal Extractable Value (MEV) in blockchain systems, tracing its conceptual evolution through three distinct eras. We organize the fragmented literature on MEV into a unified chronological framework, beginning with Era~I (August 2014 - August 2020), which introduced Miner Extractable Value from pmcgoohan's seminal Reddit warning through the ``Dark Forest'' recognition, covering Proof-of-Work systems with public mempools and Priority Gas Auctions. Era~II (August 2020 - April 2024) marks the generalization to Maximal Extractable Value, encompassing formal taxonomies, Realized Extractable Value, Proposer-Builder Separation, the Ethereum Merge, MEV-Boost, and the integration of non-atomic and CEX-DEX arbitrage. Era~III (April 2024, present) addresses the frontier of Cross-Chain MEV, beginning with early studies on Layer-2 ecosystems, where value extraction spans multiple blockchains, rollups, bridges, and sequencers. We present a conceptual taxonomy distinguishing potential from realized extractable value, and single-domain from cross-domain phenomena. Our systematization identifies mitigations that emerged in response to each era, highlights measurement challenges, and proposes a research agenda for standardized metrics, detection benchmarks, and cross-chain infrastructure design.
Authors:Neha Nagaraja, Hayretdin Bahsi
Abstract:
Large Language Models (LLMs) are increasingly integrated into safety-critical workflows, yet existing security analyses remain fragmented and often isolate model behavior from the broader system context. This work introduces a goal-driven risk assessment framework for LLM-powered systems that combines system modeling with Attack-Defense Trees (ADTrees) and Common Vulnerability Scoring System (CVSS)-based exploitability scoring to support structured, comparable analysis. We demonstrate the framework through a healthcare case study, modeling multi-step attack paths targeting intervention in medical procedures, leakage of electronic health record (EHR) data, and disruption of service availability. Our analysis indicates that threats spanning (i) conventional cyber, (ii) adversarial ML, and (iii) conversational attacks that manipulate prompts or context often consolidate into a small number of dominant paths and shared system choke points, enabling targeted defenses to yield meaningful reductions in path exploitability. By systematically comparing defense portfolios, we align these risks with established vulnerability management practices and provide a domain-agnostic workflow applicable to other LLM-enabled critical systems.
Authors:Saroj Mishra, Suman Niroula, Umesh Yadav, Dilip Thakur, Srijan Gyawali, Shiva Gaire
Abstract:
Retrieval-Augmented Generation (RAG) systems are increasingly evolving into agentic architectures where large language models autonomously coordinate multi-step reasoning, dynamic memory management, and iterative retrieval strategies. Despite rapid industrial adoption, current research lacks a systematic understanding of Agentic RAG as a sequential decision-making system, leading to highly fragmented architectures, inconsistent evaluation methodologies, and unresolved reliability risks. This Systematization of Knowledge (SoK) paper provides the first unified framework for understanding these autonomous systems. We formalize agentic retrieval-generation loops as finite-horizon partially observable Markov decision processes, explicitly modeling their control policies and state transitions. Building upon this formalization, we develop a comprehensive taxonomy and modular architectural decomposition that categorizes systems by their planning mechanisms, retrieval orchestration, memory paradigms, and tool-invocation behaviors. We further analyze the critical limitations of traditional static evaluation practices and identify severe systemic risks inherent to autonomous loops, including compounding hallucination propagation, memory poisoning, retrieval misalignment, and cascading tool-execution vulnerabilities. Finally, we outline key doctoral-scale research directions spanning stable adaptive retrieval, cost-aware orchestration, formal trajectory evaluation, and oversight mechanisms, providing a definitive roadmap for building reliable, controllable, and scalable agentic retrieval systems.
Authors:Zuyao Xu, Xiang Li, Fubin Wu, Yuqi Qiu, Lu Sun, FaSheng Miao
Abstract:
As autonomous AI agents increasingly populate the Internet, a novel security challenge arises: "Is this entity an AI agent?" It is a new entity-type verification problem with no established solution. We formalize the problem through a three-class entity taxonomy (Human, Script, Agent) based on a verifiable agentic capability vector (action, reasoning, and memory). A timing threshold t exploits the asymmetric hardness between human cognition and AI processing to separate the three classes. We define the Agentic Capability Verification Problem (ACVP) through three necessity primitives, each testing one capability dimension. Building on this foundation, we introduce aCAPTCHA (Agent CAPTCHA), a time-constrained security game for agent admission whose security rests on ACVP hardness under t. We instantiate aCAPTCHA through time-bounded natural-language understanding as a multi-round HTTP verification protocol, and evaluate it with preliminary agent trials that validate the protocol's soundness and completeness. aCAPTCHA provides a composable, infrastructure-free admission gate for any service where entity-type verification is required.
Authors:Hamideh Khaleghpour, Brett McKinney
Abstract:
Illicit transaction detection is often driven by transaction level attributes however, fraudulent behavior may also manifest through network structure such as central hubs, high flow intermediaries, and coordinated neighborhoods. This paper presents a time respecting, leakage safe (causal) graph feature extraction protocol for temporal transaction networks and evaluates its utility for illicit entity classification. Using the Elliptic dataset, we construct directed transaction graphs and compute interpretable structural descriptors, including degree statistics, PageRank, HITS hub or authority scores, k-core indices, and neighborhood reachability measures. To prevent look ahead bias, we additionally compute causal variants of graph features using only edges observed up to each timestep. A Random Forest classifier trained with strict temporal splits achieves strong discrimination on a held out future test period (ROC-AUC about 0.85, Average Precision about 0.54). Although transaction attributes remain the dominant predictive signal, graph derived features provide complementary interpretability and enable risk context analysis for investigation workflows. We further assess operational utility using Precision at k and evaluate probability reliability via calibration curves and Brier scores, showing that calibrated models yield better aligned probabilities for triage. Overall, the results support causal graph feature extraction as a practical and interpretable augmentation for temporal fraud detection pipelines.
Authors:Sanket Goutam, Hunter Kippen, Mike Grace, Amir Rahmati
Abstract:
Device logs are essential for forensic investigations, enterprise monitoring, and fraud detection; however, they often leak personally identifiable information (PII) when exported for third-party analysis. Existing approaches either fail to minimize PII exposure across all stages of log collection and analysis or sacrifice data fidelity, resulting in less effective analysis. We present Proteus, a privacy-preserving device logging framework that enables forensic analysis without disclosing plaintext PII or compromising fidelity, even when facing adversaries with access to multiple snapshots of the log files. To achieve this, Proteus proposes a two-layer scheme that employs keyed-hash pseudonymization of PII fields and time-rotating encryption with ratcheted ephemeral keys to prevent multi-snapshot correlation. For controlled sharing, clients export ratchet states that grant time-bounded access, permitting decryption of pseudonymized tokens that enable linkage and timeline reconstruction without exposing the underlying PII. Subsequent ratchet rotations ensure forward secrecy, while DICE-based attestation authenticates device provenance. We implement Proteus as a transparent extension to Android's logcat and evaluate it across three generations of hardware. Our results demonstrate a median latency of 0.2 ms per message and an average per-PII-field size overhead of only 97.1 bytes.
Authors:Antonino Armato, Marzana Khatun, Sebastian Fischer
Abstract:
The automotive industry faces increasing challenges in ensuring both functional safety (FuSa) and cybersecurity for complex semiconductor devices. Traditional Failure Mode and Effects Analysis (FMEA) primarily addresses safety-related failure modes, often overlooking synergistic vulnerabilities and shared consequences with cybersecurity threats. This paper introduces an Integrated Failure and Threat Mode and Effect Analysis (FTMEA) framework that systematically co-analyzes FuSa and cybersecurity. A cornerstone of this framework is the introduction of rigorously defined Cross-Domain Correlation Factors (CDCFs), which quantify the interdependencies and mutual influences between safety-related failures and cybersecurity threats. These factors are derived from a combination of structured expert knowledge, static structural analysis metrics (e.g., Controllability/Observability), and validated against empirical data from fault/attack injection campaigns. We propose a modified Risk Priority Number (RPN) calculation that systematically integrates these correlation factors, enabling a more accurate and transparent prioritization of risks that span both domains. A detailed case study involving an automotive ASIC configuration register proves the practical application of the FTMEA. We present explicit mapping tables, quantitative CDCF values, and a comparative analysis against a baseline FMEA/TARA (Threat Analysis and Risk Assessment), illustrating how the integrated approach uncovers previously masked cross-domain risks, improves mitigation strategy effectiveness, and provides a clear quantitative justification for the derived correlation values. This framework offers a unified, traceable, methodology for risk assessment in critical automotive systems, thereby overcoming the limitations of conventional analyses and promoting optimized, cross-disciplinary development.
Authors:Saheed Ademola Bello, Muhammad Shahid Jabbar, Muhammad Sohail Ibrahim, Shujaat Khan
Abstract:
Collaborative training across multiple institutions is becoming essential for building reliable medical image segmentation models. However, privacy regulations, data silos, and uneven data availability prevent hospitals from sharing raw scans or annotations, limiting the ability to train generalizable models. Latent-space collaboration frameworks such as privacy-segmentation framework (SF) offer a promising alternative, but such methods still face challenges in segmentation accuracy and vulnerability to latent inversion and membership-inference attacks. This work introduces a privacy-preserving collaborative medical image segmentation framework (PPCMI-SF) designed for heterogeneous medical datasets. The approach combines skip-connected autoencoders for images and masks with a keyed latent transform that applies client-specific orthogonal mixing and permutation to protect latent features before they are shared. A unified mapping network on the server-side performs multi-scale latent-to-latent translation, enabling segmentation inference without exposing raw data. Experiments on four datasets: PSFH ultrasound, ultrasound nerve segmentation, FUMPE CTA, and cardiac MRI show that the proposed PPCMI-SF consistently achieves high Dice scores and improved boundary accuracy, as reflected by lower 95th percentile Hausdorff distance (HD95) and average symmetric surface distance (ASD) compared to the current state-of-the-art and performs competitively with privacy-agnostic baselines. Privacy tests confirm strong resistance to inversion and membership attacks, and the overall system achieves real-time inference with low communication overhead. These results demonstrate that accurate and efficient medical image segmentation can be achieved without compromising data privacy in multi-institution settings.
Authors:Henry Tari, Adriana Iamnitchi
Abstract:
Synthetic data is increasingly used to support research without exposing sensitive user content. Social media data is one of the types of datasets that would hugely benefit from representative synthetic equivalents that can be used to bootstrap research and allow reproducibility through data sharing. However, recent studies show that (tabular) synthetic data is not inherently privacy-preserving. Much less is known, however, about the privacy risks of synthetically generated unstructured texts. This work evaluates the privacy of synthetic Instagram posts generated by three state-of-the-art large language models using two prompting strategies. We propose a methodology that quantifies privacy by framing re-identification as an authorship attribution attack. A RoBERTa-large classifier trained on real posts achieved 81\% accuracy in authorship attribution on real data, but only 16.5--29.7\% on synthetic posts, showing reduced, though non-negligible, risk. Fidelity was assessed via text traits, sentiment, topic overlap, and embedding similarity, confirming the expected trade-off: higher fidelity coincides with greater privacy leakage. This work provides a framework for evaluating privacy in synthetic text and demonstrates the privacy--fidelity tension in social media datasets.
Authors:Neha Nagaraja, Hayretdin Bahsi
Abstract:
While incorporating LLMs into systems offers significant benefits in critical application areas such as healthcare, new security challenges emerge due to the potential cyber kill chain cycles that combine adversarial model, prompt injection and conventional cyber attacks. Threat modeling methods enable the system designers to identify potential cyber threats and the relevant mitigations during the early stages of development. Although the cyber security community has extensive experience in applying these methods to software-based systems, the elicited threats are usually abstract and vague, limiting their effectiveness for conducting proper likelihood and impact assessments for risk prioritization, especially in complex systems with novel attacks surfaces, such as those involving LLMs. In this study, we propose a structured, goal driven risk assessment approach that contextualizes the threats with detailed attack vectors, preconditions, and attack paths through the use of attack trees. We demonstrate the proposed approach on a case study with an LLM agent-based healthcare system. This study harmonizes the state-of-the-art attacks to LLMs with conventional ones and presents possible attack paths applicable to similar systems. By providing a structured risk assessment, this study makes a significant contribution to the literature and advances the secure-by-design practices in LLM-based systems.
Authors:Lucas Hanouz, Marc Kaplan, Jean-Sébastien Kersaint Tournebize, Chin-te Liao, Anne Marin
Abstract:
Most quantum communication networks around the world are used for a single task: quantum key distribution. In order to initiate the transition to multi-purpose quantum communication networks, we demonstrate the implementation of two different tasks on the same quantum key distribution hardware. Specifically, we focus on quantum oblivious transfer and quantum tokens. Our main contribution is to establish a methodology that greatly simplifies the expertise required to achieve the deployment, assess its performance, and evaluate its feasibility at a large scale. The implementation that we present is full-stack. It is based on a development framework that allows running user-defined applications both with simulated or real quantum communication backend. The hardware used for the implementation is VeriQloud's Qline. The simulation backend reproduces exactly the inputs and outputs of the real hardware, but also its losses and errors. It can therefore be used to validate the implementation before running it on the real hardware. The sources of the software that we use are fully open, making our research reproducible. The security of the implementations on real hardware are discussed with respect to security bounds previously known in the literature. We also discuss the engineering choices that we made in order to make the implementations feasible. By establishing a methodology to evaluate the performance and security of quantum communication protocols, we take a significant step towards industrializing and deploying large-scale, multi-purpose quantum communication networks.
Authors:Hsin Lin, Yan-Lun Chen, Ren-Hung Hwang, Chia-Mu Yu
Abstract:
Backdoor attacks pose a critical threat to the security of deep neural networks, yet existing efforts on universal backdoors often rely on visually salient patterns, making them easier to detect and less practical at scale. In this work, we introduce a novel imperceptible universal backdoor attack that simultaneously controls all target classes with minimal poisoning while preserving stealth. Our key idea is to leverage graph convolutional networks (GCNs) to model inter-class relationships and generate class-specific perturbations that are both effective and visually invisible. The proposed framework optimizes a dual-objective loss that balances stealthiness (measured by perceptual similarity metrics such as PSNR) and attack success rate (ASR), enabling scalable, multi-target backdoor injection. Extensive experiments on ImageNet-1K with ResNet architectures demonstrate that our method achieves high ASR (up to 91.3%) under poisoning rates as low as 0.16%, while maintaining benign accuracy and evading state-of-the-art defenses. These results highlight the emerging risks of invisible universal backdoors and call for more robust detection and mitigation strategies.
Authors:Thomas Lloyd, Daire Ó Broin, Martin Harrigan
Abstract:
Voting is the primary mechanism through which Decentralised Autonomous Organisations (DAO) reach decisions. Although transparent, the voting process can be complex: it can involve many interacting smart contracts. The nexus of the decision-making process can be relocated and the true voter demographic obfuscated. Furthermore, DAOs can govern other DAOs -- metagovernance. We present a method for identifying DAO-to-DAO metagovernance on the Ethereum blockchain. We focus on the links between DAOs and token contracts. We employ a signature-matching algorithm to flexibly handle a variety of DAO frameworks and voting schemes. Once we establish token-to-DAO relationships, we gather and process voting data to produce a list of metagovernance relationships. We apply this algorithm to an initial set of sixteen DAOs and we extend the dataset as more DAOs are identified. We produce a metagovernance network with 61 DAOs and 72 metagovernance relationships. We examine three case studies that show metagovernance of various forms: strategic, decisive, and centralised where a DAO becomes a nexus for metagovernance. We demonstrate that metagovernance obscures voting context and introduces entities driven by self-interest that can significantly influence governance. We highlight instances of metagovernance between DAOs operating on the Ethereum blockchain where current governance tools inadequately reveal such dynamics. To preserve the transparency-centric ethos of DAOs and mitigate risks associated with metagovernance, there is a pressing need for enhanced tools to address such issues.
Authors:Aparna Gupte, Jiahui Liu, Luowen Qian, Justin Raizes, Bhaskar Roberts, Mark Zhandry
Abstract:
One-time programs (OTPs) aim to let a user evaluate a program on a single input while revealing nothing else. Classical OTPs require hardware assumptions, and even with quantum information, OTPs for deterministic functionalities remain impossible due to gentle-measurement attacks (Broadbent, Gutoski and Stebila, 2013). While recent works achieve positive results for certain randomized functionalities, the fundamental limits and the strongest achievable security notions remain poorly understood. In this paper, we ask for a "best-possible" OTP that achieves the strongest one-time security achievable by any OTP construction. We first show that a generic best-possible one-time compiler cannot exist, even for classical randomized functionalities (assuming lossy encryption schemes exist). Given this impossibility, we introduce a natural subclass of one-time compilers called "testable one-time program" compilers, which output quantum states augmented with reflection oracles for these program states. We show that best-possible testable OTP compilers are achievable by (1) formulating a generalized Single-Effective-Query (SEQ) simulation security notion for quantum channels and show that SEQ security implies best-possible testable one-time security, and (2) constructing SEQ-secure OTPs for all quantum functionalities in the classical oracle model. This yields the first OTP for arbitrary quantum channels beyond classical randomized functionalities. Finally, we propose stateful quantum indistinguishability obfuscation (stateful quantum iO) -- quantum state obfuscation for stateful quantum programs. We show that (1) stateful quantum iO implies best-possible testable OTPs and (2) stateful quantum iO is also achievable in the classical oracle model. These results identify stateful quantum iO as a promising approach towards best-possible testable OTPs.
Authors:Dayeon Kang, Jade Sheffey, Mingshi Wu, Pubali Datta, Amir Houmansadr
Abstract:
With the increase in Internet censorship globally, various circumvention tools have been designed and developed. However, the monetary cost of these tools deeply impacts both user choice and the sustainability of provider operations. Recent developments in censorship circumvention research attempted to achieve cost efficiency by utilizing Infrastructure-as-a-Service (IaaS) spot instances as bridges, but still incurred substantial expenses related to network connectivity and instance maintenance. In this work, we present CensorLess, a circumvention proxy built leveraging the unique benefits of a serverless platform. CensorLess comprises three components: a local proxy that handles client-side communication and ensures compliance with serverless functions' security restrictions, a function refresher that periodically regenerates bridges, and a live migration mechanism that maintains continuous connectivity. CensorLess inherits the serverless platform's cost efficiency, ephemerality, scalability, concurrency, and performance. Compared to existing low-cost, state-of-the-art circumvention techniques, CensorLess reduces costs by 97%, while simultaneously enabling robust censorship resistance by employing bridge rotation.
Authors:Yashas Hariprasad, Subhash Gurappa, Sundararaj S. Iyengar, Jerry F. Miller, Pronab Mohanty, Naveen Kumar Chaudhary
Abstract:
The Forensics Investigations Network in Digital Sciences (FINDS) Research Center of Excellence (CoE), funded by the U.S. Army Research Laboratory, advances Digital Forensic Engineering Education (DFEE) through an integrated research education framework for AI enabled cybersecurity workforce development. FINDS combines high performance computing (HPC), secure software engineering, adversarial analytics, and experiential learning to address emerging cyber and synthetic media threats. This paper introduces the Multidependency Capacity Building Skills Graph (MCBSG), a directed acyclic graph based model that encodes hierarchical and cross domain dependencies among competencies in AI-driven forensic programming, statistical inference, digital evidence processing, and threat detection. The MCBSG enables structured modeling of skill acquisition pathways and quantitative capacity assessment. Supervised machine learning methods, including entropy-based Decision Tree Classifiers and regression modeling, are applied to longitudinal multi cohort datasets capturing mentoring interactions, laboratory performance metrics, curriculum artifacts, and workshop participation. Feature importance analysis and cross validation identify key predictors of technical proficiency and research readiness. Three year statistical evaluation demonstrates significant gains in forensic programming accuracy, adversarial reasoning, and HPC-enabled investigative workflows. Results validate the MCBSG as a scalable, interpretable framework for data-driven, inclusive cybersecurity education aligned with national defense workforce priorities.
Authors:Eman Alqahtani, Mustafa A. Mustafa
Abstract:
Driven by the widespread deployment of distributed energy resources, local energy markets (LEMs) have emerged as a promising approach for enabling direct trades among prosumers and consumers to balance intermittent generation and demand locally. However, LEMs involve processing sensitive participant data, which, if not protected, poses privacy risks. At the same time, since electricity is exchanged over the physical power network, market mechanisms should consider physical constraints and network-related costs. Existing work typically addresses these issues separately, either by incorporating grid-related aspects or by providing privacy protection. To address this gap, we propose a privacy-preserving protocol for LEMs, with consideration of network fees that can incite participants to respect physical limits. The protocol is based on a double-auction mechanism adapted from prior work to enable more efficient application of our privacy-preserving approach. To protect participants' data, we use secure multiparty computation. In addition, Schnorr's identification protocol is employed with multiparty verification to ensure authenticated participation without compromising privacy. We further optimise the protocol to reduce communication and round complexity. We prove that the protocol meets its security requirements and show through experimentation its feasibility at a typical LEM scale: a market with 5,000 participants can be cleared in 4.17 minutes.
Authors:Taoran Li, Varun Chandrasekaran, Zhiyuan Yu
Abstract:
Recent work has demonstrated that machine unlearning in Large Language Models (LLMs) fails to generalize across languages: knowledge erased in one language frequently remains accessible through others. However, the underlying cause of this failure and a principled solution remain open. In this work, we identify intervention depth as the key factor determining multilingual generalization. Through systematic layer-wise experiments, we characterize two distinct failure modes: shallow-layer interventions achieve erasure but collapse multilingual capabilities in held-out languages, while deep-layer interventions preserve utility but fail to erase target knowledge even in source languages. These findings reveal that the choice of intervention layer is not a free parameter; it fundamentally determines whether multilingual unlearning succeeds. We propose MUTE (Multilingual Unlearning via Targeted Erasure), a framework that uses Centered Kernel Alignment (CKA) and Linguistic Regions Development Score (LRDS) to identify intermediate, language-agnostic layers where cross-lingual representations converge. By restricting unlearning updates to these layers, MUTE achieves robust multilingual knowledge erasure while optimizing on only a small set of source languages. Extensive experiments across three LLM architectures and three unlearning algorithms validate our approach, with mechanistic analysis via Logit Lens probing confirming genuine knowledge removal rather than output-level suppression.
Authors:Qianlong Lan, Anuj Kaul, Shaun Jones, Stephanie Westrum
Abstract:
Agentic large language model systems increasingly automate tasks by retrieving URLs and calling external tools. We show that this workflow gives rise to implicit prompt injection: adversarial instructions embedded in automatically generated URL previews, including titles, metadata, and snippets, can introduce a system-level risk that we refer to as silent egress. Using a fully local and reproducible testbed, we demonstrate that a malicious web page can induce an agent to issue outbound requests that exfiltrate sensitive runtime context, even when the final response shown to the user appears harmless. In 480 experimental runs with a qwen2.5:7b-based agent, the attack succeeds with high probability (P (egress) =0.89), and 95% of successful attacks are not detected by output-based safety checks. We also introduce sharded exfiltration, where sensitive information is split across multiple requests to avoid detection. This strategy reduces single-request leakage metrics by 73% (Leak@1) and bypasses simple data loss prevention mechanisms. Our ablation results indicate that defenses applied at the prompt layer offer limited protection, while controls at the system and network layers, such as domain allowlisting and redirect-chain analysis, are considerably more effective. These findings suggest that network egress should be treated as a first-class security outcome in agentic LLM systems. We outline architectural directions, including provenance tracking and capability isolation, that go beyond prompt-level hardening.
Authors:Morteza Eskandarian, Mahdi Rabbani, Arun Kaniyamattam, Fatemeh Nejati, Mansur Mirani, Gunjan Piya, Igor Opushnyev, Ali A. Ghorbani, Sajjad Dadkhah
Abstract:
The current generation of large language models produces sophisticated social-engineering content that bypasses standard text screening systems in business communication platforms. Our proposed solution for mail gateway and endpoint deception detection operates in a privacy-protective manner while handling the performance requirements of network and mobile security systems. The MobileBERT teacher receives fine-tuning before its transformation into a BiLSTM model with multi-head attention which maintains semantic discrimination only with 4.5 million parameters. The hybrid dataset contains human-written messages together with LLM-generated paraphrases that use masking techniques and personalization methods to enhance modern attack resistance. The evaluation system uses five testing protocols which include human-only and LLM-only tests and two cross-distribution transfer tests and a production-like mixed traffic test to assess performance in native environments and across different distribution types and combined traffic scenarios. The distilled model maintains a weighted-F1 score difference of 1-2.5 points compared to the mixture split results of strong transformer baselines including ModernBERT, DeBERTaV3-base, T5-base, DeepSeek-R1 Distill Qwen-1.5B and Phi-4 mini while achieving 80-95\% faster inference times and 95-99\% smaller model sizes. The system demonstrates excellent performance in terms of accuracy and latency while maintaining a compact size which enables real-time filtering without acceleration hardware and supports policy-based management. The paper examines system performance under high traffic conditions and security measures for privacy protection and implementation methods for operational deployment.
Authors:Davide De Zuane, Paolo Santini, Marco Baldi
Abstract:
This paper concerns the Minimal Internet Key Exchange (IKE) protocol, which has received little attention to date, despite its potential to make the best-known IKE protocol sufficiently lightweight to be also applied in contexts where it is currently prohibitive, due to its large footprint. First, we introduce and describe Colibri, an efficient, open-source implementation of the Minimal IKE protocol, which allows us to quantitatively assess its real advantages in terms of lightness. Then we introduce a post-quantum variant of the Minimal IKE protocol, which is essential to make it contemporary, and assess it through Colibri. We demonstrate that the protocol performance remains excellent even in such a more challenging context, making it suitable for deploying pervasive and quantum-resistant virtual private networks.
Authors:Victor Morel, Cristiana Santos, Pontus Carlsson, Joel Ahlinder, Romaric Duvignau
Abstract:
The Transparency and Consent Framework (TCF), developed by the Interactive Advertising Bureau (IAB) Europe, provides a de facto standard for requesting, recording, and managing user consent from European end-users. This framework has previously been found to infringe European data protection law and has subsequently been regularly updated. Previous research on the TCF focused exclusively on web contexts, with no attention given to its implementation in mobile applications. No work has systematically studied the privacy implications of the TCF on Android apps. To address this gap, we investigate the prevalence of the TCF in popular Android apps from the Google Play Store, and assess whether these apps respect users' consent banner choices. By scraping and downloading 4482 of the most popular Google Play Store apps on an emulated Android device, we automatically determine which apps use the TCF, automatically interact with consent banners, and analyze the apps' traffic in two different stages, passive (post choices) and active (during banner interaction and post choices). We found that 576 (12.85%) of the 4482 downloadable apps in our dataset implemented the TCF, and we identified potential privacy violations within this subset. In 15 (2.6%) of these apps, users' choices are stored only when consent is granted. Users who refuse consent are shown the consent banner again each time they launch the app. Network traffic analysis conducted during the passive stage reveals that 66.2% of the analyzed TCF-based apps share personal data, through the Android Advertising ID (AAID), in the absence of a lawful basis for processing. 55.3% of apps analyzed during the active stage share AAID before users interact with the apps' consent banners, violating the prior consent requirement.
Authors:Xingyu Shen, Tommy Duong, Xiaodong An, Zengqi Zhao, Zebang Hu, Haoyu Hu, Ziyou Wang, Finn Guo, Simiao Ren
Abstract:
Age estimation systems are increasingly deployed as gatekeepers for age-restricted online content, yet their robustness to cosmetic modifications has not been systematically evaluated. We investigate whether simple, household-accessible cosmetic changes, including beards, grey hair, makeup, and simulated wrinkles, can cause AI age estimators to classify minors as adults. To study this threat at scale without ethical concerns, we simulate these physical attacks on 329 facial images of individuals aged 10 to 21 using a VLM image editor (Gemini 2.5 Flash Image). We then evaluate eight models from our prior benchmark: five specialized architectures (MiVOLO, Custom-Best, Herosan, MiViaLab, DEX) and three vision-language models (Gemini 3 Flash, Gemini 2.5 Flash, GPT-5-Nano). We introduce the Attack Conversion Rate (ACR), defined as the fraction of images predicted as minor at baseline that flip to adult after attack, a population-agnostic metric that does not depend on the ratio of minors to adults in the test set. Our results reveal that a synthetic beard alone achieves 28 to 69 percent ACR across all eight models; combining all four attacks shifts predicted age by +7.7 years on average across all 329 subjects and reaches up to 83 percent ACR; and vision-language models exhibit lower ACR (59 to 71 percent) than specialized models (63 to 83 percent) under the full attack, although the ACR ranges overlap and the difference is not statistically tested. These findings highlight a critical vulnerability in deployed age-verification pipelines and call for adversarial robustness evaluation as a mandatory criterion for model selection.
Authors:Tao Liu, Gowri Ramachandra, Raja Jurdak
Abstract:
Post-Quantum Cryptography (PQC) creates payloads that strain the timing and energy budgets of Personal Area Networks. In post-quantum key exchange (PQKE), this causes severe fragmentation, prolonged radio activity, and high transmission overhead on low-power devices. Prior work optimizes cryptographic computation but largely ignores communication cost. This paper separates computation and communication costs using Bluetooth Low Energy as a representative platform and validates them on real hardware. Results show communication often dominates PQKE energy, exceeding cryptographic cost. Efficient quantum-resilient pairing therefore requires coordinated protocol configuration and lower-layer optimization. This work provides developers a practical way to reason about PQC energy trade-offs and informs the evolution of PAN standards toward quantum-safe operation.
Authors:Suvradip Chakraborty, James Hulett, Dakshita Khurana, Kabir Tomer
Abstract:
A recent breakthrough [Hirahara and Nanashima, STOC'2024] established that if $\mathsf{NP} \not \subseteq \mathsf{ioP/poly}$, the existence of zero-knowledge with negligible errors for $\mathsf{NP}$ implies the existence of one-way functions (OWFs). In this work, we obtain a characterization of one-way functions from the worst-case complexity of zero-knowledge {\em in the high-error regime}. We say that a zero-knowledge argument is {\em non-trivial} if the sum of its completeness, soundness and zero-knowledge errors is bounded away from $1$. Our results are as follows, assuming $\mathsf{NP} \not \subseteq \mathsf{ioP/poly}$: 1. {\em Non-trivial} Non-Interactive ZK (NIZK) arguments for $\mathsf{NP}$ imply the existence of OWFs. Using known amplification techniques, this result also provides an unconditional transformation from weak to standard NIZK proofs for all meaningful error parameters. 2. We also generalize to the interactive setting: {\em Non-trivial} constant-round public-coin zero-knowledge arguments for $\mathsf{NP}$ imply the existence of OWFs, and therefore also (standard) four-message zero-knowledge arguments for $\mathsf{NP}$. Prior to this work, one-way functions could be obtained from NIZKs that had constant zero-knowledge error $ε_{zk}$ and soundness error $ε_{s}$ satisfying $ε_{zk} + \sqrt{ε_{s}} < 1$ [Chakraborty, Hulett and Khurana, CRYPTO'2025]. However, the regime where $ε_{zk} + \sqrt{ε_{s}} \geq 1$ remained open. This work closes the gap, and obtains new implications in the interactive setting. Our results and techniques could be useful stepping stones in the quest to construct one-way functions from worst-case hardness.
Authors:Tudor Cebere, David Erb, Damien Desfontaines, Aurélien Bellet, Jack Fitzsimons
Abstract:
Differential privacy (DP) implementations are notoriously prone to errors, with subtle bugs frequently invalidating theoretical guarantees. Existing verification methods are often impractical: formal tools are too restrictive, while black-box statistical auditing is intractable for complex pipelines and fails to pinpoint the source of the bug. This paper introduces Re:cord-play, a gray-box auditing paradigm that inspects the internal state of DP algorithms. By running an instrumented algorithm on neighboring datasets with identical randomness, Re:cord-play directly checks for data-dependent control flow and provides concrete falsification of sensitivity violations by comparing declared sensitivity against the empirically measured distance between internal inputs. We generalize this to Re:cord-play-sample, a full statistical audit that isolates and tests each component, including untrusted ones. We show that our novel testing approach is both effective and necessary by auditing 12 open-source libraries, including SmartNoise SDK, Opacus, and Diffprivlib, and uncovering 13 privacy violations that impact their theoretical guarantees. We release our framework as an open-source Python package, thereby making it easy for DP developers to integrate effective, computationally inexpensive, and seamless privacy testing as part of their software development lifecycle.
Authors:Kevin Setterstrom, Jeremy Straub
Abstract:
This paper introduces a real-time method for reverse engineering a vehicle's CAN bus without prior knowledge of the vehicle or its CAN system. By comparing inertial measurement and CAN data during significant vehicle events, the method accurately identified the CAN channels associated with the accelerator pedal, brake pedal, and steering wheel. Utilizing an IMU, CAN module, and event-driven software architecture, the system was validated using prerecorded serialized data from previous studies. This data, collected during multiple vehicle drives, included synchronized IMU and CAN recordings. By using these consistent datasets, the improvements made in this work were tested and validated under the same conditions as in the previous studies, enabling direct comparison to earlier results. Faster processing times were produced and less computational power was needed, as compared to the earlier methods. This work could have potential application to making aftermarket autonomous vehicle kits and for cybersecurity applications. It is a scalable and adaptable solution for autonomous CAN reverse engineering in near real-time.
Authors:Dhiman Goswami, Jai Kruthunz Naveen Kumar, Sanchari Das
Abstract:
Natural Language Processing (NLP) is integral to social media analytics but often processes content containing Personally Identifiable Information (PII), behavioral cues, and metadata raising privacy risks such as surveillance, profiling, and targeted advertising. To systematically assess these risks, we review 203 peer-reviewed papers and propose the NLP Privacy Risk Identification in Social Media (NLP-PRISM) framework, which evaluates vulnerabilities across six dimensions: data collection, preprocessing, visibility, fairness, computational risk, and regulatory compliance. Our analysis shows that transformer models achieve F1-scores ranging from 0.58-0.84, but incur a 1% - 23% drop under privacy-preserving fine-tuning. Using NLP-PRISM, we examine privacy coverage in six NLP tasks: sentiment analysis (16), emotion detection (14), offensive language identification (19), code-mixed processing (39), native language identification (29), and dialect detection (24) revealing substantial gaps in privacy research. We further found a (reduced by 2% - 9%) trade-off in model utility, MIA AUC (membership inference attacks) 0.81, AIA accuracy 0.75 (attribute inference attacks). Finally, we advocate for stronger anonymization, privacy-aware learning, and fairness-driven training to enable ethical NLP in social media contexts.
Authors:Noa Linder, Meirav Segal, Omer Antverg, Gil Gekker, Tomer Fichman, Omri Bodenheimer, Edan Maor, Omer Nevo
Abstract:
Large language models and LLM-based agents are increasingly used for cybersecurity tasks that are inherently dual-use. Existing approaches to refusal, spanning academic policy frameworks and commercially deployed systems, often rely on broad topic-based bans or offensive-focused taxonomies. As a result, they can yield inconsistent decisions, over-restrict legitimate defenders, and behave brittlely under obfuscation or request segmentation. We argue that effective refusal requires explicitly modeling the trade-off between offensive risk and defensive benefit, rather than relying solely on intent or offensive classification. In this paper, we introduce a content-based framework for designing and auditing cyber refusal policies that makes offense-defense tradeoffs explicit. The framework characterizes requests along five dimensions: Offensive Action Contribution, Offensive Risk, Technical Complexity, Defensive Benefit, and Expected Frequency for Legitimate Users, grounded in the technical substance of the request rather than stated intent. We demonstrate that this content-grounded approach resolves inconsistencies in current frontier model behavior and allows organizations to construct tunable, risk-aware refusal policies.
Authors:Udbhav Prasad, Aniesh Chawla
Abstract:
Cryptographic digests (e.g., MD5, SHA-256) are designed to provide exact identity. Any single-bit change in the input produces a completely different hash, which is ideal for integrity verification but limits their usefulness in many real-world tasks like threat hunting, malware analysis and digital forensics, where adversaries routinely introduce minor transformations. Similarity-based techniques address this limitation by enabling approximate matching, allowing related byte sequences to produce measurably similar fingerprints. Modern enterprises manage tens of thousands of endpoints with billions of files, making the effectiveness and scalability of the proposed techniques more important than ever in security applications. Security researchers have proposed a range of approaches, including similarity digests and locality-sensitive hashes (e.g., ssdeep, sdhash, TLSH), as well as more recent machine-learning-based methods that generate embeddings from file features. However, these techniques have largely been evaluated in isolation, using disparate datasets and evaluation criteria. This paper presents a systematic comparison of learning-based classification and similarity methods using large, publicly available datasets. We evaluate each method under a unified experimental framework with industry-accepted metrics. To our knowledge, this is the first reproducible study to benchmark these diverse learning-based similarity techniques side by side for real-world security workloads. Our results show that no single approach performs well across all dimensions; instead, each exhibits distinct trade-offs, indicating that effective malware analysis and threat-hunting platforms must combine complementary classification and similarity techniques rather than rely on a single method.
Authors:Prince Bhardwaj, Nishanth Sastry
Abstract:
Passkeys -- discoverable WebAuthn credentials synchronised across devices are widely promoted as the future of passwordless authentication. Built on the FIDO2 standard, they eliminate shared secrets and resist phishing while offering usability through platform credential managers. Since their introduction in 2022, major vendors have integrated passkeys into operating systems and browsers, and prominent websites have announced support. Yet the true extent of adoption across the broader web remains unknown. Measuring this is challenging because websites implement passkeys in heterogeneous ways. Some expose explicit ``Sign in with passkey'' buttons, others hide options under multi-step flows or rely on conditional mediation, and many adopt external mechanisms such as JavaScript libraries or OAuth-based identity providers. There is no standardised discovery endpoint, and dynamic, JavaScript-heavy pages complicate automated detection. This paper makes two contributions. First, we present Fidentikit, a browser-based crawler implementing 43 heuristics across five categories -- UI elements, DOM structures, WebAuthn API calls, network patterns, and library detection developed through iterative refinement over manual examination of 1,500 sites. Second, we apply Fidentikit to the top 100,000 Tranco-ranked domains, producing the first large-scale census of passkey adoption. Our results show adoption strongly correlates with site popularity and often depends on external identity providers rather than native implementations.
Authors:Anthony Feijó-Añazco, Antonio López Martínez, Daniel Díaz-López, Angel Luis Perales Gómez, Pantaleone Nespoli, Gregorio Martínez Pérez
Abstract:
The development of technology across multiple sectors and the growing importance of cyber warfare make the development of Cyber Situational Awareness (CSA) a fundamental component of any cyber defense strategy. CSA, as a practice, enables understanding of the current landscape within an organization or critical infrastructure, anticipating potential threats, and responding appropriately to cyber risks. With CSA, we are not simply seeking a passive point of view, but rather informed decision-making that allows us to improve response times and monitor the consequences and effects an attack has on one of our elements and how it will affect other elements it interacts with. In this paper, we review 5 CSA platforms, seeking differentiating characteristics between each proposal and outlining 6 proposed criteria that can be applied when creating a military smart CSA platform. To this end, we have validated the proposed criteria in CRUSOE, an open-source CSA platform developed by CSIRT-MU. After applying some modifications and experiments, it turned out to be applicable to this field.
Authors:André Storhaug, Jiamou Sun, Jingyue Li
Abstract:
Identifying vulnerability-fixing commits corresponding to disclosed CVEs is essential for secure software maintenance but remains challenging at scale, as large repositories contain millions of commits of which only a small fraction address security issues. Existing automated approaches, including traditional machine learning techniques and recent large language model (LLM)-based methods, often suffer from poor precision-recall trade-offs. Frequently evaluated on randomly sampled commits, we uncover that they are substantially underestimating real-world difficulty, where candidate commits are already security-relevant and highly similar. We propose Favia, a forensic, agent-based framework for vulnerability-fix identification that combines scalable candidate ranking with deep and iterative semantic reasoning. Favia first employs an efficient ranking stage to narrow the search space of commits. Each commit is then rigorously evaluated using a ReAct-based LLM agent. By providing the agent with a pre-commit repository as environment, along with specialized tools, the agent tries to localize vulnerable components, navigates the codebase, and establishes causal alignment between code changes and vulnerability root causes. This evidence-driven process enables robust identification of indirect, multi-file, and non-trivial fixes that elude single-pass or similarity-based methods. We evaluate Favia on CVEVC, a large-scale dataset we made that comprises over 8 million commits from 3,708 real-world repositories, and show that it consistently outperforms state-of-the-art traditional and LLM-based baselines under realistic candidate selection, achieving the strongest precision-recall trade-offs and highest F1-scores.
Authors:Enhao Huang, Frank Li, Tony Lin, Lowes Yang
Abstract:
This paper introduces DMind-3, a sovereign Edge-Local-Cloud intelligence stack designed to secure irreversible financial execution in Web3 environments against adversarial risks and strict latency constraints. While existing cloud-centric assistants compromise privacy and fail under network congestion, and purely local solutions lack global ecosystem context, DMind-3 resolves these tensions by decomposing capability into three cooperating layers: a deterministic signing-time intent firewall at the edge, a private high-fidelity reasoning engine on user hardware, and a policy-governed global context synthesizer in the cloud. We propose policy-driven selective offloading to route computation based on privacy sensitivity and uncertainty, supported by two novel training objectives: Hierarchical Predictive Synthesis (HPS) for fusing time-varying macro signals, and Contrastive Chain-of-Correction Supervised Fine-Tuning (C$^3$-SFT) to enhance local verification reliability. Extensive evaluations demonstrate that DMind-3 achieves a 93.7% multi-turn success rate in protocol-constrained tasks and superior domain reasoning compared to general-purpose baselines, providing a scalable framework where safety is bound to the edge execution primitive while maintaining sovereignty over sensitive user intent.
Authors:Fei Xu, Cheng Ye, Jie OuYang, Ziqiang Wu, Haoze Chen, An Hua, Meifeng Gao, Qiandong Zhang, Minghan Li, Feilong Li, Yajun Miao, Wei Qi
Abstract:
The security foundation of blockchain system relies primarily on classical cryptographic methods and consensus algorithms. However, the advent of quantum computing poses a significant threat to conventional public-key cryptosystems based on computational hardness assumptions. In particular, Shor's algorithm can efficiently solve discrete logarithm and integer factorization problems in polynomial time, thereby undermining the immutability and security guarantees of existing systems. Moreover, current Practical Byzantine Fault Tolerance (PBFT) protocols, widely adopted in consortium blockchains, suffer from high communication overhead and limited efficiency when coping with dynamic node reconfigurations, while offering no intrinsic protection against quantum adversaries. To address these challenges, we propose QDBFT, a quantum-secured dynamic consensus algorithm, with two main contributions: first,we design a primary node automatic rotation mechanism based on a consistent hash ring to enable consensus under dynamic membership changes, ensuring equitable authority distribution; second, we integrate Quantum Key Distribution (QKD) networks to provide message authentication for inter-node communication, thereby achieving information-theoretic security in the consensus process. Experimental evaluations demonstrate that QDBFT achieves performance comparable to traditional PBFT while delivering strong resilience against quantum attacks, making it a promising solution for future quantum-secure decentralized infrastructures.
Authors:Md Sazedur Rahman, Mizanur Rahman Jewel, Sanjay Madria
Abstract:
Mining is rapidly evolving into an AI driven cyber physical ecosystem where safety and operational reliability depend on robust perception, trustworthy distributed intelligence, and continuous monitoring of miners and equipment. However, real world mining environments impose severe constraints, including poor illumination, GPS denied conditions, irregular underground topologies and intermittent connectivity. These factors degrade perception accuracy, disrupt situational awareness and weaken distributed learning systems. At the same time, emerging cyber physical threats such as backdoor triggers, sensor spoofing, label flipping attacks, and poisoned model updates further jeopardize operational safety as mines adopt autonomous vehicles, humanoid assistance, and federated learning for collaborative intelligence. Energy constrained sensors also experience uneven battery depletion, creating blind spots in safety coverage and disrupting hazard detection pipelines. This paper presents a vision for a Unified Smart Safety and Security Architecture that integrates multimodal perception, secure federated learning, reinforcement learning, DTN enabled communication, and energy aware sensing into a cohesive safety framework. We introduce five core modules: Miner Finder, Multimodal Situational Awareness, Backdoor Attack Monitor, TrustFed LFD, and IoT driven Equipment Health Monitoring. These modules collectively address miner localization, hazard understanding, federated robustness, and predictive maintenance. Together, they form an end to end framework capable of guiding miners through obstructed pathways, identifying compromised models or sensors, and ensuring mission critical equipment reliability. This work outlines a comprehensive research vision for building a resilient and trustworthy intelligent mining system capable of maintaining operational continuity under adversarial conditions.
Authors:Zeynab Anbiaee, Mahdi Rabbani, Mansur Mirani, Gunjan Piya, Igor Opushnyev, Ali Ghorbani, Sajjad Dadkhah
Abstract:
The rapid development of the AI agent communication protocols, including the Model Context Protocol (MCP), Agent2Agent (A2A), Agora, and Agent Network Protocol (ANP), is reshaping how AI agents communicate with tools, services, and each other. While these protocols support scalable multi-agent interaction and cross-organizational interoperability, their security principles remain understudied, and standardized threat modeling is limited; no protocol-centric risk assessment framework has been established yet. This paper presents a systematic security analysis of four emerging AI agent communication protocols. First, we develop a structured threat modeling analysis that examines protocol architectures, trust assumptions, interaction patterns, and lifecycle behaviors to identify protocol-specific and cross-protocol risk surfaces. Second, we introduce a qualitative risk assessment framework that identifies twelve protocol-level risks and evaluates security posture across the creation, operation, and update phases through systematic assessment of likelihood, impact, and overall protocol risk, with implications for secure deployment and future standardization. Third, we provide a measurement-driven case study on MCP that formalizes the risk of missing mandatory validation/attestation for executable components as a falsifiable security claim by quantifying wrong-provider tool execution under multi-server composition across representative resolver policies. Collectively, our results highlight key design-induced risk surfaces and provide actionable guidance for secure deployment and future standardization of agent communication ecosystems.
Authors:Samal Mukhtar, Yinghua Yao, Zhu Sun, Mustafa Mustafa, Yew Soon Ong, Youcheng Sun
Abstract:
Software vulnerability detection (SVD) is a critical challenge in modern systems. Large language models (LLMs) offer natural-language explanations alongside predictions, but most work focuses on binary evaluation, and explanations often lack semantic consistency with Common Weakness Enumeration (CWE) categories. We propose VulReaD, a knowledge-graph-guided approach for vulnerability reasoning and detection that moves beyond binary classification toward CWE-level reasoning. VulReaD leverages a security knowledge graph (KG) as a semantic backbone and uses a strong teacher LLM to generate CWE-consistent contrastive reasoning supervision, enabling student model training without manual annotations. Students are fine-tuned with Odds Ratio Preference Optimization (ORPO) to encourage taxonomy-aligned reasoning while suppressing unsupported explanations. Across three real-world datasets, VulReaD improves binary F1 by 8-10% and multi-class classification by 30% Macro-F1 and 18% Micro-F1 compared to state-of-the-art baselines. Results show that LLMs outperform deep learning baselines in binary detection and that KG-guided reasoning enhances CWE coverage and interpretability.
Authors:Zihao Li, Hongyi Lu, Yanan Guo, Zhenkai Zhang, Shuai Wang, Fengwei Zhang
Abstract:
GPU memory errors are a critical threat to deep learning (DL) frameworks, leading to crashes or even security issues. We introduce GPU-Fuzz, a fuzzer locating these issues efficiently by modeling operator parameters as formal constraints. GPU-Fuzz utilizes a constraint solver to generate test cases that systematically probe error-prone boundary conditions in GPU kernels. Applied to PyTorch, TensorFlow, and PaddlePaddle, we uncovered 13 unknown bugs, demonstrating the effectiveness of GPU-Fuzz in finding memory errors.
Authors:Naveen Gill, Ajvad Haneef K, Madhu Kumar S D
Abstract:
Feature selection (FS) remains essential for building accurate and interpretable detection models, particularly in high-dimensional malware datasets. Conventional FS methods such as Extra Trees, Variance Threshold, Tree-based models, Chi-Squared tests, ANOVA, Random Selection, and Sequential Attention rely primarily on statistical heuristics or model-driven importance scores, often overlooking the semantic context of features. Motivated by recent progress in LLM-driven FS, we investigate whether large language models (LLMs) can guide feature selection in a zero-shot setting, using only feature names and task descriptions, as a viable alternative to traditional approaches. We evaluate multiple LLMs (GPT-5.0, GPT-4.0, Gemini-2.5 etc.) on the EMBOD dataset (a fusion of EMBER and BODMAS benchmark datasets), comparing them against established FS methods across several classifiers, including Random Forest, Extra Trees, MLP, and KNN. Performance is assessed using accuracy, precision, recall, F1, AUC, MCC, and runtime. Our results demonstrate that LLM-guided zero-shot feature selection achieves competitive performance with traditional FS methods while offering additional advantages in interpretability, stability, and reduced dependence on labeled data. These findings position zero-shot LLM-based FS as a promising alternative strategy for effective and interpretable malware detection, paving the way for knowledge-guided feature selection in security-critical applications
Authors:Xiang Li, Zixuan Xie, Lu Sun, Yuqi Qiu, Zuyao Xu, Zheli Liu
Abstract:
XMap is an open-source network scanner designed for performing fast Internet-wide IPv4 and IPv6 network research scanning. XMap was initially developed as the research artifact of a paper published at 2021 IEEE/IFIP International Conference on Dependable Systems and Networks (DSN '21) and then made available on GitHub. XMap is the first tool to support fast Internet-wide IPv6 network scanning in 2020. During the last five years, XMap has made substantial impact in academia, industry, and government. It has been referenced in 52 research papers (15 published at top-tier security venues and 11 in leading networking societies), received over 450 GitHub stars, featured in multiple news outlets, and deployed or recommended by international companies up to date. Additionally, XMap has contributed to the implementation of RFC documents and the discovery of various vulnerabilities. This paper provides fundamental details about XMap, its architecture, and its impact.
Authors:Sanket Goutam, Omar Chowdhury, Amir Rahmati
Abstract:
Cloud-mediated IoT architectures fragment authentication across vendor silos and create latency and availability bottlenecks for cross-vendor device-to-device (D2D) interactions. We present Atlas, a framework that extends the Web public-key infrastructure to IoT by issuing X.509 certificates to devices via vendor-operated ACME clients and vendor-controlled DNS namespaces. Devices obtain globally verifiable identities without hardware changes and establish mutual TLS channels directly across administrative domains, decoupling runtime authentication from cloud reachability. We prototype Atlas on ESP32 and Raspberry Pi, integrate it with an MQTT-based IoT stack and an Atlas-aware cloud, and evaluate it in smart-home and smart-city workloads. Certificate provisioning completes in under 6s per device, mTLS adds only about 17ms of latency and modest CPU overhead, and Atlas-based applications sustain low, predictable latency compared to cloud-mediated baselines. Because many major vendors already rely on ACME-compatible CAs for their web services, Atlas is immediately deployable with minimal infrastructure changes.
Authors:Fatemeh Nejati, Mahdi Rabbani, Morteza Eskandarian, Mansur Mirani, Gunjan Piya, Igor Opushnyev, Ali A. Ghorbani, Sajjad Dadkhah
Abstract:
Phishing attacks represents one of the primary attack methods which is used by cyber attackers. In many cases, attackers use deceptive emails along with malicious attachments to trick users into giving away sensitive information or installing malware while compromising entire systems. The flexibility of malicious email attachments makes them stand out as a preferred vector for attackers as they can embed harmful content such as malware or malicious URLs inside standard document formats. Although phishing email defenses have improved a lot, attackers continue to abuse attachments, enabling malicious content to bypass security measures. Moreover, another challenge that researches face in training advance models, is lack of an unified and comprehensive dataset that covers the most prevalent data types. To address this gap, we generated CIC-Trap4Phish, a multi-format dataset containing both malicious and benign samples across five categories commonly used in phishing campaigns: Microsoft Word documents, Excel spreadsheets, PDF files, HTML pages, and QR code images. For the first four file types, a set of execution-free static feature pipeline was proposed, designed to capture structural, lexical, and metadata-based indicators without the need to open or execute files. Feature selection was performed using a combination of SHAP analysis and feature importance, yielding compact, discriminative feature subsets for each file type. The selected features were evaluated by using lightweight machine learning models, including Random Forest, XGBoost, and Decision Tree. All models demonstrate high detection accuracy across formats. For QR code-based phishing (quishing), two complementary methods were implemented: image-based detection by employing Convolutional Neural Networks (CNNs) and lexical analysis of decoded URLs using recent lightweight language models.
Authors:Miguel Bicudo, Estevão Rabello, Daniel Menasché, Paulo Segal, Claudio Segal, Anton Kocheturov, Priyanjan Sharma
Abstract:
Industrial control systems (ICS) depend on highly heterogeneous environments where Linux, proprietary real-time operating systems, and Windows coexist. Although the IEC 62443-3-3 standard provides a comprehensive framework for securing such systems, translating its requirements into concrete configuration checks remains challenging, especially for Windows platforms. In this paper, we propose a transfer learning methodology that maps Windows Common Configuration Enumerations (CCEs) to IEC 62443-3-3 System Security Requirements by leveraging labeled Linux datasets. The resulting labeled dataset enables automated compliance checks, analysis of requirement prevalence, and identification of cross-platform similarities and divergences. Our results highlight the role of CCEs as a bridge between abstract standards and concrete configurations, advancing automation, traceability, and clarity in IEC 62443-3-3 compliance for Windows environments.
Authors:Sadegh Sohani, Salar Ghazi, Farnaz Kamranfar, Sahar Pilehvar Moakhar, Mohammad Allahbakhsh, Haleh Amintoosi, Kaiwen Zhang
Abstract:
This paper addresses the critical challenge of access control in modern supply chains, which operate across multiple independent and competing organizations. Existing access control is static and centralized, unable to adapt to insider threats or evolving contexts. Blockchain improves decentralization but lacks behavioral intelligence, while centralized machine learning for anomaly detection requires aggregating sensitive data, violating privacy. The proposed solution is ICBAC, an intelligent contract-based access control framework. It integrates permissioned blockchain (Hyperledger Fabric) with federated learning (FL). Built on Fabric, ICBAC uses a multi-channel architecture and three smart contracts for asset management, baseline access control, and dynamic revocation. To counter insider misuse, each channel deploys an AI agent that monitors activity and dynamically restricts access for anomalies. Federated learning allows these agents to collaboratively improve detection models without sharing raw data. For heterogeneous, competitive environments, ICBAC introduces a game-theoretic client selection mechanism using hedonic coalition formation. This enables supply chains to form stable, strategy-proof FL coalitions via preference-based selection without disclosing sensitive criteria. Extensive experiments on a Fabric testbed with a real-world dataset show ICBAC achieves blockchain performance comparable to static frameworks and provides effective anomaly detection under IID and non-IID data with zero raw-data sharing. ICBAC thus offers a practical, scalable solution for dynamic, privacy-preserving access control in decentralized supply chains.
Authors:Tasnia Ashrafi Heya, Sayed Erfan Arefin
Abstract:
Secure communication is essential in covert and safety-critical settings where verbal interactions may expose user intent or operational context. Wearable gesture-based communication enables low-effort, nonverbal interaction, but existing systems leak motion data, intermediate representations, or inference outputs to untrusted infrastructure, enabling intent inference, behavioral biometric leakage, and insider attacks. This work proposes a privacy-preserving gesture-based covert communication system that ensures, no raw sensor signals, learned features, or classification outputs are exposed to any third-party. The system employs a multi-party homomorphic learning pipeline for gesture recognition directly over encrypted motion data, preventing adversaries from inferring gesture semantics, replaying sensor traces, or accessing intermediate representations. To our knowledge, this work is the first to apply encrypted gesture recognition in a wearable-based covert communication setting. We design and evaluate haptic and visual feedback mechanisms for covert signal delivery and evaluate the system using 600 gesture samples from a commodity smartwatch, achieving over 94.44% classification accuracy and demonstrating the feasibility of the proposed system with practical deployability from high-performance systems to resource-constrained edge devices.
Authors:Zuyao Xu, Yuqi Qiu, Lu Sun, FaSheng Miao, Fubin Wu, Xinyi Wang, Xiang Li, Haozhe Lu, ZhengZe Zhang, Yuxin Hu, Jialu Li, Jin Luo, Feng Zhang, Rui Luo, Xinran Liu, Yingxian Li, Jiaji Liu
Abstract:
Citations provide the basis for trusting scientific claims; when they are invalid or fabricated, this trust collapses. With the advent of Large Language Models (LLMs), this risk has intensified: LLMs are increasingly used for academic writing, yet their tendency to fabricate citations (``ghost citations'') poses a systemic threat to citation validity. To quantify this threat and inform mitigation, we develop CiteVerifier, an open-source framework for large-scale citation verification, and conduct the first comprehensive study of citation validity in the LLM era through three experiments built on it. We benchmark 13 state-of-the-art LLMs on citation generation across 40 research domains, finding that all models hallucinate citations at rates from 14.23\% to 94.93\%, with significant variation across research domains. Moreover, we analyze 2.2 million citations from 56,381 papers published at top-tier AI/ML and Security venues (2020--2025), confirming that 1.07\% of papers contain invalid or fabricated citations (604 papers), with an 80.9\% increase in 2025 alone. Furthermore, we survey 97 researchers and analyze 94 valid responses after removing 3 conflicting samples, revealing a critical ``verification gap'': 41.5\% of researchers copy-paste BibTeX without checking and 44.4\% choose no-action responses when encountering suspicious references; meanwhile, 76.7\% of reviewers do not thoroughly check references and 80.0\% never suspect fake citations. Our findings reveal an accelerating crisis where unreliable AI tools, combined with inadequate human verification by researchers and insufficient peer review scrutiny, enable fabricated citations to contaminate the scientific record. We propose interventions for researchers, venues, and tool developers to protect citation integrity.
Authors:Zeta Avarikioti, Ray Neiheiser, Krzysztof Pietrzak, Michelle X. Yeo
Abstract:
Over the last years, Ethereum has evolved into a public platform that safeguards the savings of hundreds of millions of people and secures more than $650 billion in assets, placing it among the top 25 stock exchanges worldwide in market capitalization, ahead of Singapore, Mexico, and Thailand. As such, the performance and security of the Ethereum blockchain are not only of theoretical interest, but also carry significant global economic implications. At the time of writing, the Ethereum platform is collectively secured by almost one million validators highlighting its decentralized nature and underlining its economic security guarantees. However, due to this large validator set, the protocol takes around 15 minutes to finalize a block which is prohibitively slow for many real world applications. This delay is largely driven by the cost of aggregating and disseminating signatures across a validator set of this scale. Furthermore, as we show in this paper, the existing protocol that is used to aggregate and disseminate the signatures has several shortcomings that can be exploited by adversaries to shift stake proportion from honest to adversarial nodes. In this paper, we introduce Wonderboom, the first million scale aggregation protocol that can efficiently aggregate the signatures of millions of validators in a single Ethereum slot (x32 faster) while offering higher security guarantees than the state of the art protocol used in Ethereum. Furthermore, to evaluate Wonderboom, we implement the first simulation tool that can simulate such a protocol on the million scale and show that even in the worst case Wonderboom can aggregate and verify more than 2 million signatures within a single Ethereum slot.
Authors:Qianlong Lan, Anuj Kaul, Shaun Jones, Stephanie Westrum
Abstract:
The deployment of autonomous AI agents capable of executing commercial transactions has motivated the adoption of mandate-based payment authorization protocols, including the Universal Commerce Protocol (UCP) and the Agent Payments Protocol (AP2). These protocols replace interactive, session-based authorization with cryptographically issued mandates, enabling asynchronous and autonomous execution. While AP2 provides specification-level guarantees through signature verification, explicit binding, and expiration semantics, real-world agentic execution introduces runtime behaviors such as retries, concurrency, and orchestration that challenge implicit assumptions about mandate usage. In this work, we present a security analysis of the AP2 mandate lifecycle and identify enforcement gaps that arise during runtime in agent-based payment systems. We propose a zero-trust runtime verification framework that enforces explicit context binding and consume-once mandate semantics using dynamically generated, time-bound nonces, ensuring that authorization decisions are evaluated at execution time rather than assumed from static issuance properties. Through simulation-based evaluation under high concurrency, we show that context-aware binding and consume-once enforcement address distinct and complementary attack classes, and that both are required to prevent replay and context-redirect attacks. The proposed framework mitigates all evaluated attacks while maintaining stable verification latency of approximately 3.8~ms at throughput levels up to 10{,}000 transactions per second. We further demonstrate that the required runtime state is bounded by peak concurrency rather than cumulative transaction history, indicating that robust runtime security for agentic payment execution can be achieved with minimal and predictable overhead.
Authors:Yujie Ling, Zan Li, Lei Guan, Zheng Zhang, Shengyu Zhang, Tony Q. S. Quek
Abstract:
Satellite-terrestrial networks (STNs) have emerged as a promising architecture for providing seamless wireless coverage and connectivity for multiple users. However, potential malicious eavesdroppers pose a serious threat to the private information via STNs due to their non-cooperative behavior and ability to launch intelligent attacks. To address this challenge, we propose a cognitive secure communication framework driven by multiple agents that coordinates spectrum scheduling and protection through real-time sensing, thereby disrupting the judgment of eavesdroppers while preserving reliable data transmission. On this basis, we formulate an optimization problem to maximize the secrecy probability of legitimate users, subject to a reliable transmission probability threshold. To tackle this problem, we propose a two-layer coordinated defense system. First, we develop a foundation layer based on multi-agent coordination schedule to determine the satellite operation matrix and the frequency slot occupation matrices, aiming to mitigate spectrum congestion and enhance transmission reliability. Then, we exploit generative adversarial networks to produce adversarial matrices, and employ learning-aided power control to set real and adversarial signal powers for protection layer, which actively degrades the inference capability of eavesdroppers. Simulation results demonstrate that the proposed method outperforms benchmark methods in terms of enhancing security performance and reducing power overhead for STNs in the cognitive secure communication scenario.
Authors:Claudio Segal, Paulo Segal, Carlos Eduardo de Schuller Banjar, Felipe Paixão, Hudson Silva Borges, Paulo Silveira Neto, Eduardo Santana de Almeida, Joanna C. S. Santos, Anton Kocheturov, Gaurav Kumar Srivastava, Daniel Sadoc Menasché
Abstract:
GitHub Security Advisories (GHSA) have become a central component of open-source vulnerability disclosure and are widely used by developers and security tools. A distinctive feature of GHSA is that only a fraction of advisories are reviewed by GitHub, while the mechanisms associated with this review process remain poorly understood. In this paper, we conduct a large-scale empirical study of GHSA review processes, analyzing over 288,000 advisories spanning 2019--2025. We characterize which advisories are more likely to be reviewed, quantify review delays, and identify two distinct review-latency regimes: a fast path dominated by GitHub Repository Advisories (GRAs) and a slow path dominated by NVD-first advisories. We further develop a queueing model that accounts for this dichotomy based on the structure of the advisory processing pipeline.
Authors:Jafar Isbarov, Murat Kantarcioglu
Abstract:
As AI agents automate critical workloads, they remain vulnerable to indirect prompt injection (IPI) attacks. Current defenses rely on monitoring protocols that jointly evaluate an agent's Chain-of-Thought (CoT) and tool-use actions to ensure alignment with user intent. We demonstrate that these monitoring-based defenses can be bypassed via a novel Agent-as-a-Proxy attack, where prompt injection attacks treat the agent as a delivery mechanism, bypassing both agent and monitor simultaneously. While prior work on scalable oversight has focused on whether small monitors can supervise large agents, we show that even frontier-scale monitors are vulnerable. Large-scale monitoring models like Qwen2.5-72B can be bypassed by agents with similar capabilities, such as GPT-4o mini and Llama-3.1-70B. On the AgentDojo benchmark, we achieve a high attack success rate against AlignmentCheck and Extract-and-Evaluate monitors under diverse monitoring LLMs. Our findings suggest current monitoring-based agentic defenses are fundamentally fragile regardless of model scale.
Authors:Vipin Kumar Rathi, Lakshya Chopra, Nikhil Kumar Rajput
Abstract:
Cloud-native application platforms and latency-sensitive systems such as 5G Core networks rely heavily on certificate-based Public Key Infrastructure (PKI) and mutual TLS to secure service-to-service communication. While effective, this model introduces significant operational and performance overhead, which is further amplified in the post-quantum setting due to large certificates and expensive signature verification. In this paper, we present a certificate-free authentication framework for private distributed systems based on post-quantum Identity-Based Encryption(IBE). Our design replaces certificate and signature based authentication with identity-derived keys and identity-based key encapsulation, enabling mutually authenticated TLS connections without certificate transmission or validation. We describe an IBE-based replacement for private PKI, including identity lifecycle management, and show how it can be instantiated using a threshold Private Key Generator (T-PKG). We apply this framework to cloud-native application deployments and latency-sensitive 5G Core networks. In particular, we demonstrate how identity-based TLS integrates with the 5G Service-Based Architecture while preserving security semantics and 3GPP requirements, and we show how the same architecture can replace private PKI in Kubernetes, including its control plane, without disrupting existing trust domains or deployment models.
Authors:Amir Nuriyev, Gabriel Kulp
Abstract:
We present a text-reconstruction attack on mixture-of-experts (MoE) language models that recovers tokens from expert selections alone. In MoE models, each token is routed to a subset of expert subnetworks; we show these routing decisions leak substantially more information than previously understood. Prior work using logistic regression achieves limited reconstruction; we show that a 3-layer MLP improves this to 63.1% top-1 accuracy, and that a transformer-based sequence decoder recovers 91.2% of tokens top-1 (94.8% top-10) on 32-token sequences from OpenWebText after training on 100M tokens. These results connect MoE routing to the broader literature on embedding inversion. We outline practical leakage scenarios (e.g., distributed inference and side channels) and show that adding noise reduces but does not eliminate reconstruction. Our findings suggest that expert selections in MoE deployments should be treated as sensitive as the underlying text.
Authors:Pratyush Uppuluri, Shilpa Noushad, Sajan Kumar
Abstract:
This work presents a consensus-based Bayesian framework to detect malicious user behavior in enterprise directory access graphs. By modeling directories as topics and users as agents within a multi-level interaction graph, we simulate access evolution using influence-weighted opinion dynamics. Logical dependencies between users are encoded in dynamic matrices Ci, and directory similarity is captured via a shared influence matrix W. Malicious behavior is injected as cross-component logical perturbations that violate structural norms of strongly connected components(SCCs). We apply theoretical guarantees from opinion dynamics literature to determine topic convergence and detect anomaly via scaled opinion variance. To quantify uncertainty, we introduce a Bayesian anomaly scoring mechanism that evolves over time, using both static and online priors. Simulations over synthetic access graphs validate our method, demonstrating its sensitivity to logical inconsistencies and robustness under dynamic perturbation.
Authors:Lucas Rosenblatt, Peihan Liu, Ryan McKenna, Natalia Ponomareva
Abstract:
Research on differentially private synthetic tabular data has largely focused on independent and identically distributed rows where each record corresponds to a unique individual. This perspective neglects the temporal complexity in longitudinal datasets, such as electronic health records, where a user contributes an entire (sub) table of sequential events. While practitioners might attempt to model such data by flattening user histories into high-dimensional vectors for use with standard marginal-based mechanisms, we demonstrate that this strategy is insufficient. Flattening fails to preserve temporal coherence even when it maintains valid marginal distributions. We introduce PATH, a novel generative framework that treats the full table as the unit of synthesis and leverages the autoregressive capabilities of privately fine-tuned large language models. Extensive evaluations show that PATH effectively captures long-range dependencies that traditional methods miss. Empirically, our method reduces the distributional distance to real trajectories by over 60% and reduces state transition errors by nearly 50% compared to leading marginal mechanisms while achieving similar marginal fidelity.
Authors:Maura B. Paterson, Douglas R. Stinson
Abstract:
The design of protocols for local differential privacy (or LDP) has been a topic of considerable research interest in recent years. LDP protocols utilise the randomised encoding of outcomes of an experiment using a transition probability matrix (TPM). Several authors have observed that balanced incomplete block designs (BIBDs) provide nice examples of TPMs for LDP protocols. Indeed, it has been shown that such BIBD-based LDP protocols provide optimal estimators. In this primarily expository paper, we give a detailed introduction to LDP protocols and their connections with block designs. We prove that a subclass of LDP protocols known as pure LDP protocols are equivalent to $(r,λ)$-designs (which contain balanced incomplete block designs as a special case). An unbiased estimator for an LDP scheme is a left inverse of the transition probability matrix. We show that the optimal estimators for BIBD-based TPMs are precisely those obtained from the Moore-Penrose inverse of the corresponding TPM. We also review some existing work on optimal LDP protocols in the context of pure protocols.
Authors:Asmaa Cherkaoui, Ramon Flores, Delaram Kahrobaei, Richard Wilson
Abstract:
We propose Eidolon, a practical post-quantum signature scheme based on the NP-complete k-colorability problem. Our construction generalizes the Goldreich-Micali-Wigderson zero-knowledge protocol to arbitrary k >= 3, applies the Fiat-Shamir transform, and uses Merkle-tree commitments to compress signatures from O(tn) to O(t log n). Crucially, we generate hard instances via planted "quiet" colorings that preserve the statistical profile of random graphs. We present the first empirical security analysis of such a scheme against both classical solvers (ILP, DSatur) and a custom graph neural network (GNN) attacker. Experiments show that for n >= 60, neither approach recovers the secret coloring, demonstrating that well-engineered k-coloring instances can resist modern cryptanalysis, including machine learning. This revives combinatorial hardness as a credible foundation for post-quantum signatures.
Authors:Ying Wang, Jiahui Chen, Dejun Jiang
Abstract:
Storage systems are fundamental to modern computing infrastructures, yet ensuring their correctness remains challenging in practice. Despite decades of research on system testing, many storage-system failures (including durability, ordering, recovery, and consistency violations) remain difficult to expose systematically. This difficulty stems not primarily from insufficient testing tooling, but from intrinsic properties of storage-system execution, including nondeterministic interleavings, long-horizon state evolution, and correctness semantics that span multiple layers and execution phases. This survey adopts a storage-centric view of system testing and organizes existing techniques according to the execution properties and failure mechanisms they target. We review a broad spectrum of approaches, ranging from concurrency testing and long-running workloads to crash-consistency analysis, hardware-level semantic validation, and distributed fault injection, and analyze their fundamental strengths and limitations. Within this framework, we examine fuzzing as an automated testing paradigm, highlighting systematic mismatches between conventional fuzzing assumptions and storage-system semantics, and discuss how recent artificial intelligence advances may complement fuzzing through state-aware and semantic guidance. Overall, this survey provides a unified perspective on storage-system correctness testing and outlines key challenges
Authors:Max Manolov, Tony Gao, Siddharth Shukla, Cheng-Ting Chou, Ryan Lagasse
Abstract:
Large language models (LLMs) are increasingly used to assist developers with code, yet their implementations of cryptographic functionality often contain exploitable flaws. Minor design choices (e.g., static initialization vectors or missing authentication) can silently invalidate security guarantees. We introduce CIPHER(\textbf{C}ryptographic \textbf{I}nsecurity \textbf{P}rofiling via \textbf{H}ybrid \textbf{E}valuation of \textbf{R}esponses), a benchmark for measuring cryptographic vulnerability incidence in LLM-generated Python code under controlled security-guidance conditions. CIPHER uses insecure/neutral/secure prompt variants per task, a cryptography-specific vulnerability taxonomy, and line-level attribution via an automated scoring pipeline. Across a diverse set of widely used LLMs, we find that explicit ``secure'' prompting reduces some targeted issues but does not reliably eliminate cryptographic vulnerabilities overall. The benchmark and reproducible scoring pipeline will be publicly released upon publication.
Authors:Md Jahedur Rahman, Ihsen Alouani
Abstract:
Large language models (LLMs) are increasingly used in interactive and retrieval-augmented systems, but they remain vulnerable to task drift; deviations from a user's intended instruction due to injected secondary prompts. Recent work has shown that linear probes trained on activation deltas of LLMs' hidden layers can effectively detect such drift. In this paper, we evaluate the robustness of these detectors against adversarially optimised suffixes. We generate universal suffixes that cause poisoned inputs to evade detection across multiple probes simultaneously. Our experiments on Phi-3 3.8B and Llama-3 8B show that a single suffix can achieve high attack success rates; up to 93.91% and 99.63%, respectively, when all probes must be fooled, and nearly perfect success (>90%) under majority vote setting. These results demonstrate that activation delta-based task drift detectors are highly vulnerable to adversarial suffixes, highlighting the need for stronger defences against adaptive attacks. We also propose a defence technique where we generate multiple suffixes and randomly append one of them to the prompts while making forward passes of the LLM and train logistic regression models with these activations. We found this approach to be highly effective against such attacks.
Authors:Luze Sun, Alina Oprea, Eric Wong
Abstract:
LLM-based vulnerability detectors are increasingly deployed in security-critical code review, yet their resilience to evasion under behavior-preserving edits remains poorly understood. We evaluate detection-time integrity under a semantics-preserving threat model by instantiating diverse behavior-preserving code transformations on a unified C/C++ benchmark (N=5000), and introduce a metric of joint robustness across different attack methods/carriers. Across models, we observe a systemic failure of semantic invariant adversarial transformations: even state-of-the-art vulnerability detectors perform well on clean inputs while predictions flip under behavior-equivalent edits. Universal adversarial strings optimized on a single surrogate model remain effective when transferred to black-box APIs, and gradient access can further amplify evasion success. These results show that even high-performing detectors are vulnerable to low-cost, semantics-preserving evasion. Our carrier-based metrics provide practical diagnostics for evaluating LLM-based code detectors.
Authors:Mohsen Salehi, Karthik Pattabiraman
Abstract:
As the number of embedded devices grows and their functional requirements increase, embedded firmware is becoming increasingly larger, thereby expanding its attack surface. Despite the increase in firmware size, many embedded devices, such as robotic vehicles (RVs), operate in distinct modes, each requiring only a small subset of the firmware code at runtime. We refer to such devices as mode-based embedded devices. Debloating is an approach to reduce attack surfaces by removing or restricting unneeded code, but existing techniques suffer from significant limitations, such as coarse granularity and irreversible code removal, limiting their applicability. To address these limitations, we propose RVDebloater, a novel adaptive debloating technique for mode-based embedded devices that automatically identifies unneeded firmware code for each mode using either static or dynamic analysis, and dynamically debloats the firmware for each mode at the function level at runtime. RVDebloater introduces a new software-based enforcement approach that supports diverse mode-based embedded devices. We implemented RVDebloater using the LLVM compiler and evaluated its efficiency and effectiveness on six different RVs, including both simulated and real ones, with different real-world missions. We find that device requirements change throughout its lifetime for each mode, and that many critical firmware functions can be restricted in other modes, with an average of 85% of functions not being required. The results showed that none of the missions failed after debloating with RVDebloater, indicating that it neither incurred false positives nor false negatives. Further, RVDebloater prunes the firmware call graph by an average of 45% across different firmware. Finally, RVDebloater incurred an average performance overhead of 3.9% and memory overhead of 4% (approximately 0.25 MB) on real RVs.
Authors:Marco Dessalvi, Massimo Bartoletti, Alberto Lluch-Lafuente
Abstract:
Decentralized Finance (DeFi) has revolutionized financial markets by enabling complex asset-exchange protocols without trusted intermediaries. Automated Market Makers (AMMs) are a central component of DeFi, providing the core functionality of swapping assets of different types at algorithmically computed exchange rates. Several mainstream AMM implementations are based on the constant-product model, which ensures that swaps preserve the product of the token reserves in the AMM -- up to a \emph{trading fee} used to incentivize liquidity provision. Trading fees substantially complicate the economic properties of AMMs, and for this reason some AMM models abstract them away in order to simplify the analysis. However, trading fees have a non-trivial impact on users' trading strategies, making it crucial to develop refined AMM models that precisely account for their effects. We extend a foundational model of AMMs by introducing a new parameter, the trading fee $ϕ\in(0,1]$, into the swap rate function. Fee amounts increase inversely proportional to $ϕ$. When $ϕ= 1$, no fee is applied and the original model is recovered. We analyze the resulting fee-adjusted model from an economic perspective. We show that several key properties of the swap rate function, including output-boundedness and monotonicity, are preserved. At the same time, other properties - most notably additivity - no longer hold. We precisely characterize this deviation by deriving a generalized form of additivity that captures the effect of swaps in the presence of trading fees. We prove that when $ϕ< 1$, executing a single large swap yields strictly greater profit than splitting the trade into smaller ones. Finally, we derive a closed-form solution to the arbitrage problem in the presence of trading fees and prove its uniqueness. All results are formalized and machine-checked in the Lean 4 proof assistant.
Authors:Md Min-Ha-Zul Abedin, Tazqia Mehrub
Abstract:
Accurate Android malware detection was critical for protecting users at scale. Signature scanners lagged behind fast release cycles on public app stores. We aimed to build a trustworthy detector by pairing a comprehensive dataset with a rigorous, transparent evaluation, and to identify interpretable drivers of decisions. We used CICMalDroid2020, which contained 17,341 apps across Benign, Adware, Banking, SMS malware, and Riskware. We extracted 301 static and 263 dynamic features into a 564 dimensional hybrid vector, then evaluated seven classifiers under three schemes, original features, principal component analysis, PCA, and linear discriminant analysis, LDA, with a 70 percent training and 30 percent test split. Results showed that gradient boosting on the original features performed best. XGBoost achieved 0.9747 accuracy, 0.9703 precision, 0.9731 recall, and 0.9716 F1, and the confusion matrix indicated rare benign labels for malicious apps. HistGradientBoosting reached 0.9741 accuracy and 0.9708 F1, while CatBoost and Random Forest were slightly lower at 0.9678 and 0.9687 accuracy with 0.9636 and 0.9637 F1. KNN and SVM lagged. PCA reduced performance for all models, with XGBoost dropping to 0.9164 accuracy and 0.8988 F1. LDA maintained mid 90s accuracy and clarified separable clusters in projections. A depth two surrogate tree highlighted package name, main activity, and target SDK as key drivers. These findings established high fidelity supervised baselines for Android malware detection and indicated that rich hybrid features with gradient boosting offered a practical and interpretable foundation for deployment.
Authors:Tristan Bilot, Baoxiang Jiang, Thomas Pasquier
Abstract:
Recent provenance-based intrusion detection systems (PIDSs) have demonstrated strong potential for detecting advanced persistent threats (APTs) by applying machine learning to system provenance graphs. However, evaluating and comparing PIDSs remains difficult: prior work uses inconsistent preprocessing pipelines, non-standard dataset splits, and incompatible ground-truth labeling and metrics. These discrepancies undermine reproducibility, impede fair comparison, and impose substantial re-implementation overhead on researchers. We present PIDSMaker, an open-source framework for developing and evaluating PIDSs under consistent protocols. PIDSMaker consolidates eight state-of-the-art systems into a modular, extensible architecture with standardized preprocessing and ground-truth labels, enabling consistent experiments and apples-to-apples comparisons. A YAML-based configuration interface supports rapid prototyping by composing components across systems without code changes. PIDSMaker also includes utilities for ablation studies, hyperparameter tuning, multi-run instability measurement, and visualization, addressing methodological gaps identified in prior work. We demonstrate PIDSMaker through concrete use cases and release it with preprocessed datasets and labels to support shared evaluation for the PIDS community.
Authors:Rourab Paul, Krishnendu Guha, Amlan Chakrabarti
Abstract:
Number Theoretic Transform (NTT) is the most essential component for polynomial multiplications used in lattice-based Post-Quantum Cryptography (PQC) algorithms such as Kyber, Dilithium, NTRU etc. However, side-channel attacks (SCA) and hardware vulnerabilities in the form of hardware Trojans may alter control signals to disrupt the circuit's control flow and introduce unconventional delays in the critical hardware of PQC. Hardware Trojans, especially on control signals, are more low cost and impactful than data signals because a single corrupted control signal can disrupt or bypass entire computation sequences, whereas data faults usually cause only localized errors. On the other hand, adversaries can perform Soft Analytical Side Channel Attacks (SASCA) on the design using the inserted hardware Trojan. In this paper, we present a secure NTT architecture capable of detecting unconventional delays, control-flow disruptions, and SASCA, while providing an adaptive fault-correction methodology for their mitigation. Extensive simulations and implementations of our Secure NTT on Artix-7 FPGA with different Kyber variants show that our fault detection and correction modules can efficiently detect and correct faults whether caused unintentionally or intentionally by hardware Trojans with a high success rate, while introducing only modest area and time overheads.
Authors:Chanwoo Park, Chanwoo Kim
Abstract:
Evasion attacks pose significant threats to AI systems, exploiting vulnerabilities in machine learning models to bypass detection mechanisms. The widespread use of voice data, including deepfakes, in promising future industries is currently hindered by insufficient legal frameworks. Adversarial attack methods have emerged as the most effective countermeasure against the indiscriminate use of such data. This research introduces masked energy perturbation (MEP), a novel approach using power spectrum for energy masking of original voice data. MEP applies masking to small energy regions in the frequency domain before generating adversarial perturbations, targeting areas less noticeable to the human auditory model. The study primarily employs advanced speaker recognition models, including ECAPA-TDNN and ResNet34, which have shown remarkable performance in speaker verification tasks. The proposed MEP method demonstrated strong performance in both audio quality and evasion effectiveness. The energy masking approach effectively minimizes the perceptual evaluation of speech quality (PESQ) degradation, indicating that minimal perceptual distortion occurs to the human listener despite the adversarial perturbations. Specifically, in the PESQ evaluation, the relative performance of the MEP method was 26.68% when compared to the fast gradient sign method (FGSM) and iterative FGSM.
Authors:Javier Carnerero-Cano, Luis Muñoz-González, Phillippa Spencer, Emil C. Lupu
Abstract:
Regression models are widely used in industrial processes, engineering and in natural and physical sciences, yet their robustness to poisoning has received less attention. When it has, studies often assume unrealistic threat models and are thus less useful in practice. In this paper, we propose a novel optimal stealthy attack formulation that considers different degrees of detectability and show that it bypasses state-of-the-art defenses. We further propose a new methodology based on normalization of objectives to evaluate different trade-offs between effectiveness and detectability. Finally, we develop a novel defense (BayesClean) against stealthy attacks. BayesClean improves on previous defenses when attacks are stealthy and the number of poisoning points is significant.
Authors:Piotr Przymus, Witold Weiner, Krzysztof Rykaczewski, Gunnar Kudrjavets
Abstract:
In 2024, the Linux kernel became its own Common Vulnerabilities and Exposures (CVE) Numbering Authority (CNA), formalizing how kernel vulnerabilities are identified and tracked. We analyze the anatomy and dynamics of kernel CVEs using metadata, associated commits, and patch latency to understand what drives patching. Results show that severity and Common Vulnerability Scoring System (CVSS) metrics have a negligible association with patch latency, whereas kernel recency is a reasonable predictor in survival models. Kernel developers fix newer kernels sooner, while older ones retain unresolved CVEs. Commits introducing vulnerabilities are typically broader and more complex than their fixes, though often only approximate reconstructions of development history. The Linux kernel remains a unique open-source project -- its CVE process is no exception.
Authors:Anudeex Shetty, Aditya Joshi, Salil S. Kanhere
Abstract:
Humans are susceptible to undesirable behaviours and privacy leaks under the influence of alcohol. This paper investigates drunk language, i.e., text written under the influence of alcohol, as a driver for safety failures in large language models (LLMs). We investigate three mechanisms for inducing drunk language in LLMs: persona-based prompting, causal fine-tuning, and reinforcement-based post-training. When evaluated on 5 LLMs, we observe a higher susceptibility to jailbreaking on JailbreakBench (even in the presence of defences) and privacy leaks on ConfAIde, where both benchmarks are in English, as compared to the base LLMs as well as previously reported approaches. Via a robust combination of manual evaluation and LLM-based evaluators and analysis of error categories, our findings highlight a correspondence between human-intoxicated behaviour, and anthropomorphism in LLMs induced with drunk language. The simplicity and efficiency of our drunk language inducement approaches position them as potential counters for LLM safety tuning, highlighting significant risks to LLM safety.
Authors:Gloria Felicia, Michael Eniolade, Jinfeng He, Zitha Sasindran, Hemant Kumar, Milan Hussain Angati, Sandeep Bandarupalli
Abstract:
Existing agent safety benchmarks report binary accuracy, conflating early intervention with post-mortem analysis. A detector that flags a violation at step 8 enables intervention; one that reports it at step 48 provides only forensic value. This distinction is critical, yet current benchmarks cannot measure it. We introduce StepShield, the first benchmark to evaluate when violations are detected, not just whether. StepShield contains 9,213 code agent trajectories, including 1,278 meticulously annotated training pairs and a 7,935-trajectory test set with a realistic 8.1% rogue rate. Rogue behaviors are grounded in real-world security incidents across six categories. We propose three novel temporal metrics: Early Intervention Rate (EIR), Intervention Gap, and Tokens Saved. Surprisingly, our evaluation reveals that an LLM-based judge achieves 59% EIR while a static analyzer achieves only 26%, a 2.3x performance gap that is entirely invisible to standard accuracy metrics. We further show that early detection has direct economic benefits: our cascaded HybridGuard detector reduces monitoring costs by 75% and projects to $108M in cumulative savings over five years at enterprise scale. By shifting the focus of evaluation from whether to when, StepShield provides a new foundation for building safer and more economically viable AI agents. The code and data are released under an Apache 2.0 license.
Authors:Jan Schuchardt, Nikita Kalinin
Abstract:
We study privacy amplification for differentially private model training with matrix factorization under random allocation (also known as the balls-in-bins model). Recent work by Choquette-Choo et al. (2025) proposes a sampling-based Monte Carlo approach to compute amplification parameters in this setting. However, their guarantees either only hold with some high probability or require random abstention by the mechanism. Furthermore, the required number of samples for ensuring $(ε,δ)$-DP is inversely proportional to $δ$. In contrast, we develop sampling-free bounds based on Rényi divergence and conditional composition. The former is facilitated by a dynamic programming formulation to efficiently compute the bounds. The latter complements it by offering stronger privacy guarantees for small $ε$, where Rényi divergence bounds inherently lead to an over-approximation. Our framework applies to arbitrary banded and non-banded matrices. Through numerical comparisons, we demonstrate the efficacy of our approach across a broad range of matrix mechanisms used in research and practice.
Authors:Abrar Hamed Al Barwani, Abdelaziz Amara Korba, Raja Waseem Anwar
Abstract:
The escalating sophistication of phishing emails necessitates a shift beyond traditional rule-based and conventional machine-learning-based detectors. Although large language models (LLMs) offer strong natural language understanding, using them as standalone classifiers often yields elevated falsepositive (FP) rates, which mislabel legitimate emails as phishing and create significant operational burden. This paper presents a personalized phishing detection framework that integrates LLMs with retrieval-augmented generation (RAG). For each message, the system constructs user-specific context by retrieving a compact set of the user's historical legitimate emails and enriching it with real-time domain and URL reputation from a cyber-threat intelligence platform, then conditions the LLM's decision on this evidence. We evaluate four open-source LLMs (Llama4-Scout, DeepSeek-R1, Mistral-Saba, and Gemma2) on an email dataset collected from public and institutional sources. Results show high performance; for example, Llama4-Scout attains an F1-score of 0.9703 and achieves a 66.7% reduction in FPs with RAG. These findings validate that a RAG-based, user-profiling approach is both feasible and effective for building high-precision, low-friction email security systems that adapt to individual communication patterns.
Authors:Gayathri Subramanian, Girinath P, Nitya Ranganathan, Kamakoti Veezhinathan, Gopalakrishnan Srinivasan
Abstract:
Modern microprocessors depend on speculative execution, creating vulnerabilities that enable transient execution attacks. Prior defenses target speculative data leakage but overlook false dependencies from partial address aliasing, where repeated squash and reissue events increase the load-store latency, which is exploited by the SPOILER attack. We present SPOILER-GUARD, a hardware defense that obfuscates speculative dependency resolution by dynamically randomizing the physical address bits used for load-store comparisons and tagging store entries to prevent latency-amplifying misspeculations. Implemented in gem5 and evaluated with SPEC 2017, SPOILER-GUARD reduces misspeculation to 0.0004 percent and improves integer and floating-point performance by 2.12 and 2.87 percent. HDL synthesis with Synopsys Design Compiler at 14 nm node demonstrates minimal overheads - 69 ps latency in critical path, 0.064 square millimeter in area, and 5.863 mW in power.
Authors:Mario Perera, Michael Mackay, Max Hashem Eiza, Alessandro Raschellà, Nathan Shone, Mukesh Kumar Maheshwari
Abstract:
As Open Radio Access Network (O-RAN) deployments expand and adversaries adopt 'store-now, decrypt-later' strategies, operators need empirical data on the cost of migrating critical control interfaces to post-quantum cryptography (PQC). This paper experimentally evaluates the impact of integrating a NIST-aligned module-lattice KEM (ML-KEM, CRYSTALS-Kyber) into IKEv2/IPsec protecting the E2 interface between the 5G Node B (gNB) and the Near-Real-Time RAN Intelligent Controller (Near-RT RIC). Using an open-source testbed built from srsRAN, Open5GS, FlexRIC and strongSwan (with liboqs), we compare three configurations: no IPsec, classical ECDH-based IPsec, and ML-KEM-based IPsec. The study focuses on IPsec tunnel-setup latency and the runtime behaviour of Near-RT RIC xApps under realistic signalling workloads. Results from repeated, automated runs show that ML-KEM integration adds a small overhead to tunnel establishment, which is approximately 3~5 ms in comparison to classical IPsec, while xApp operation and RIC control loops remain stable in our experiments. These findings indicate that ML-KEM based IPsec on the E2 interface is practically feasible and inform quantum-safe migration strategies for O-RAN deployments.
Authors:Asifullah Khan, Aimen Wadood, Mubashar Iqbal, Umme Zahoora
Abstract:
Ransomware has become one of the most serious cybersecurity threats causing major financial losses and operational disruptions worldwide.Traditional detection methods such as static analysis, heuristic scanning and behavioral analysis often fall short when used alone. To address these limitations, this paper presents multimodal multi agent ransomware analysis framework designed for ransomware classification. Proposed multimodal multiagent architecture combines information from static, dynamic and network sources. Each data type is handled by specialized agent that uses auto encoder based feature extraction. These representations are then integrated through a fusion agent. After that fused representation are used by transformer based classifier. It identifies the specific ransomware family. The agents interact through an interagent feedback mechanism that iteratively refines feature representations by suppressing low confidence information. The framework was evaluated on large scale datasets containing thousands of ransomware and benign samples. Multiple experiments were conducted on ransomware dataset. It outperforms single modality and nonadaptive fusion baseline achieving improvement of up to 0.936 in Macro-F1 for family classification and reducing calibration error. Over 100 epochs, the agentic feedback loop displays a stable monotonic convergence leading to over +0.75 absolute improvement in terms of agent quality and a final composite score of around 0.88 without fine tuning of the language models. Zeroday ransomware detection remains family dependent on polymorphism and modality disruptions. Confidence aware abstention enables reliable real world deployment by favoring conservativeand trustworthy decisions over forced classification. The findings indicate that proposed approach provides a practical andeffective path toward improving real world ransomware defense systems.
Authors:Mohsen Hatami, Van Tuan Pham, Hozefa Lakadawala, Yu Chen
Abstract:
The increasing integration of AI agents into cyber-physical systems (CPS) introduces new security risks that extend beyond traditional cyber or physical threat models. Recent advances in generative AI enable deepfake and semantic manipulation attacks that can compromise agent perception, reasoning, and interaction with the physical environment, while emerging protocols such as the Model Context Protocol (MCP) further expand the attack surface through dynamic tool use and cross-domain context sharing. This survey provides a comprehensive review of security threats targeting AI agents in CPS, with a particular focus on environmental interactions, deepfake-driven attacks, and MCP-mediated vulnerabilities. We organize the literature using the SENTINEL framework, a lifecycle-aware methodology that integrates threat characterization, feasibility analysis under CPS constraints, defense selection, and continuous validation. Through an end-to-end case study grounded in a real-world smart grid deployment, we quantitatively illustrate how timing, noise, and false-positive costs constrain deployable defenses, and why detection mechanisms alone are insufficient as decision authorities in safety-critical CPS. The survey highlights the role of provenance- and physics-grounded trust mechanisms and defense-in-depth architectures, and outlines open challenges toward trustworthy AI-enabled CPS.
Authors:Emunah S-S. Chan, Aldar C-F. Chan
Abstract:
User authentication and fraud detection face growing challenges as digital systems expand and adversaries adopt increasingly sophisticated tactics. Traditional knowledge-based authentication remains rigid, requiring exact word-for-word string matches that fail to accommodate natural human memory and linguistic variation. Meanwhile, fraud-detection pipelines struggle to keep pace with rapidly evolving scam behaviors, leading to high false-positive rates and frequent retraining cycles required. This work introduces two complementary LLM-enabled solutions, namely, an LLM-assisted authentication mechanism that evaluates semantic correctness rather than exact wording, supported by document segmentation and a hybrid scoring method combining LLM judgement with cosine-similarity metrics and a RAG-based fraud-detection pipeline that grounds LLM reasoning in curated evidence to reduce hallucinations and adapt to emerging scam patterns without model retraining. Experiments show that the authentication system accepts 99.5% of legitimate non-exact answers while maintaining a 0.1% false-acceptance rate, and that the RAG-enhanced fraud detection reduces false positives from 17.2% to 3.5%. Together, these findings demonstrate that LLMs can significantly improve both usability and robustness in security workflows, offering a more adaptive , explainable, and human-aligned approach to authentication and fraud detection.
Authors:Rainer Stütz, Nicholas Stifter, Melitta Dragaschnig, Bernhard Haslhofer, Aljosha Judmayer
Abstract:
It is well known that reusing cryptocurrency addresses undermines privacy. This also applies if the same addresses are used in different cryptocurrencies. Nevertheless, cross-chain address reuse appears to be a recurring phenomenon, especially in EVM-based designs. Previous works performed either direct address matching, or basic format conversion, to identify such cases. However, seemingly incompatible address formats e.g., in Bitcoin and Ethereum, can also be derived from the same public keys, since they rely on the same cryptographic primitives. In this paper, we therefore focus on the underlying public keys to discover reuse within, as well as across, different cryptocurrency networks, enabling us to also match incompatible address formats. Specifically, we analyze key reuse across Bitcoin, Ethereum, Litecoin, Dogecoin, Zcash and Tron. Our results reveal that cryptographic keys are extensively and actively reused across these networks, negatively impacting both privacy and security of their users. We are hence the first to expose and quantify cross-chain key reuse between UTXO and account-based cryptocurrencies. Moreover, we devise novel clustering methods across these different cryptocurrency networks that do not rely on heuristics and instead link entities by their knowledge of the underlying secret key.
Authors:Abdullah Khanfor, Raby Hamadi, Noureddine Lasla, Hakim Ghazzai
Abstract:
UAVs have the potential to revolutionize urban management and provide valuable services to citizens. They can be deployed across diverse applications, including traffic monitoring, disaster response, environmental monitoring, and numerous other domains. However, this integration introduces novel security challenges that must be addressed to ensure safe and trustworthy urban operations. This paper provides a structured, evidence-based synthesis of UAV applications in smart cities and their associated security challenges as reported in the literature over the last decade, with particular emphasis on developments from 2019 to 2025. We categorize these challenges into two primary classes: 1) cyber-attacks targeting the communication infrastructure of UAVs and 2) unwanted or unauthorized physical intrusions by UAVs themselves. We examine the potential of Artificial Intelligence (AI) techniques in developing intrusion detection mechanisms to mitigate these security threats. We analyze how AI-based methods, such as machine/deep learning for anomaly detection and computer vision for object recognition, can play a pivotal role in enhancing UAV security through unified detection systems that address both cyber and physical threats. Furthermore, we consolidate publicly available UAV datasets across network traffic and vision modalities suitable for Intrusion Detection Systems (IDS) development and evaluation. The paper concludes by identifying ten key research directions, including scalability, robustness, explainability, data scarcity, automation, hybrid detection, large language models, multimodal approaches, federated learning, and privacy preservation. Finally, we discuss the practical challenges of implementing UAV IDS solutions in real-world smart city environments.
Authors:Guy Bresler, Alina Harbuzova
Abstract:
We study two canonical planted average-case problems -- noisy $k\mathsf{\text{-}XOR}$ and Tensor PCA -- and relate their computational properties via poly-time average-case reductions. In fact, we consider a \emph{family of problems} that interpolates between $k\mathsf{\text{-}XOR}$ and Tensor PCA, allowing intermediate densities and signal levels. We introduce two \emph{densifying} reductions that increase the number of observed entries while controlling the decrease in signal, and, in particular, reduce any $k\mathsf{\text{-}XOR}$ instance at the computational threshold to Tensor PCA at the computational threshold. Additionally, we give new order-reducing maps (e.g., $5\to 4$ $k\mathsf{\text{-}XOR}$ and $7\to 4$ Tensor PCA) at fixed entry density.
Authors:Numan Halit Guldemir, Oluwafemi Olukoya, Jesús Martínez-del-Rincón
Abstract:
Malware classification models often face performance degradation due to concept drift, arising from evolving threat landscapes and the emergence of novel malware families. This paper presents FARM (Few-shot Adaptive Recognition of Malware), a framework designed to detect and adapt to both covariate and label drift in Windows Portable Executable (PE) malware classification. FARM leverages a triplet autoencoder to project samples into a discriminative latent space, enabling unsupervised drift detection via DBSCAN clustering and dynamic thresholding. For rapid adaptation, it employs few-shot learning using prototype-based classification, requiring only a handful of labeled samples. FARM also supports full retraining when enough drifted samples accumulate, updating the latent space for long-term integration. Experiments on the BenchMFC dataset demonstrate that FARM improves classification performance under covariate drift by 5.6\%, and achieves an average F1 score of 0.85 on unseen malware families using only few-shot adaptation, which further increases to 0.94 after retraining. These results highlight FARM's robustness and adaptability in dynamic malware detection environments under limited supervision.
Authors:Mohammed Nabeel, Mahmoud Hafez, Michail Maniatakos
Abstract:
The Number Theoretic Transform (NTT) is a critical computational bottleneck in many lattice-based postquantum cryptographic (PQC) algorithms. By leveraging the Fast Fourier Transform (FFT) algorithm, the NTT of a polynomial of degree N - 1 can be computed with a time complexity of O(N log N). Hardware implementation of NTT is generally preferred over software ones, as the latter are significantly slower due to complex memory access patterns and modular arithmetic operations. Achieving maximum throughput in hardware, however, typically demands a prohibitively large number of butterfly unit instantiations. In this work, we propose @NTT, which exploits the fact that the ring parameters in these algorithms are fixed, enabling design-time constant optimization and achieving the maximum throughput of N-point NTT per clock cycle with a compact hardware footprint. Our case study on the Dilithium NTT, implemented using the TSMC 28 nm library, operates at a clock frequency of 1.0 GHz with an area of 1.45 mm^2. On FPGA, the design achieves a throughput-per-LUT that is 5.2x higher than the state-of-the-art implementation.
Authors:Jincheol Ha, Guillaume Hanrot, Taeyeong Noh, Jung Hee Cheon, Jung Woo Kim, Damien Stehlé
Abstract:
Among biometric verification systems, irises stand out because they offer high accuracy even in large-scale databases. For example, the World ID project aims to provide authentication to all humans via iris recognition, with millions already registered. Storing such biometric data raises privacy concerns, which can be addressed using privacy-enhancing techniques. Bloemen et al. describe a solution based on 2-out-of-3 Secret-Sharing Multiparty Computation (SS-MPC), for the World ID setup. In terms of security, unless an adversary corrupts 2~servers, the iris codes remain confidential and nothing leaks beyond the result of the computation. Their solution is able to match~$32$ users against a database of~$2^{22}$ iris codes in~$\approx 2$s , using~24 H100 GPUs, more than 40~communication rounds and $81$GB/party of data transferred (the timing assumes a network speed above~3Tb/s). In the present work, we explore the use of Threshold Fully Homomorphic Encryption (ThFHE) for the same task. The ThFHE solution brings a number of security advantages: no trusted setup, the encrypted database and queries can be public, the secret can be distributed among many parties, and active security can be added without significant performance degradation. Our proof-of-concept implementation of the computation phase handles $32$~eyes against a database of $7\cdot 2^{14}$ iris codes in~$\approx 1.8$s ($\approx 0.33s$ for 4 eyes against the same database), using 8 RTX-5090 GPUs. To this, one should add~2 to 3 rounds of communication (depending on deployment choice). We perform the matching using the CKKS (Th)FHE scheme. Our main technical ingredients are the use of recent progress on FHE-based linear algebra boosted using int8 GPU operations, and the introduction of a technique reducing the number of ciphertexts to be processed as early as possible.
Authors:Richard Habeeb, Man-Ki Yoon, Hao Chen Zhong Shao
Abstract:
Many safety-critical systems require timely processing of sensor inputs to avoid potential safety hazards. Additionally, to support useful application features, such systems increasingly have a large rich operating system (OS) at the cost of potential security bugs. Thus, if a malicious party gains supervisor privileges, they could cause real-world damage by denying service to time-sensitive programs. Many past approaches to this problem completely isolate time-sensitive programs with a hypervisor; however, this prevents the programs from accessing useful OS services We introduce Ringmaster, a novel framework that enables enclaves or TEEs (Trusted Execution Environments) to asynchronously access rich, but potentially untrusted, OS services via Linux's io_uring. When service is denied by the untrusted OS, enclaves continue to operate on Ringmaster's minimal ARM TrustZone kernel with access to small, critical device drivers. This approach balances the need for secure, time-sensitive processing with the convenience of rich OS services. Additionally, Ringmaster supports large unmodified programs as enclaves, offering lower overhead compared to existing systems. We demonstrate how Ringmaster helps us build a working highly-secure system with minimal engineering. In our experiments with an unmanned aerial vehicle, Ringmaster achieved nearly 1GiB/sec of data into enclave on a Raspberry Pi4b, 0-3% throughput overhead compared to non-enclave tasks.
Authors:Ajvad Haneef K, Karan Kuwar Singh, Madhu Kumar S D
Abstract:
High-dimensional malware datasets often exhibit feature redundancy, instability, and scalability limitations, which hinder the effectiveness and interpretability of machine learning-based malware detection systems. Although feature selection is commonly employed to mitigate these issues, many existing approaches lack robustness when applied to large-scale and heterogeneous malware data. To address this gap, this paper proposes CAFE-GB (Chunk-wise Aggregated Feature Estimation using Gradient Boosting), a scalable feature selection framework designed to produce stable and globally consistent feature rankings for high-dimensional malware detection. CAFE-GB partitions training data into overlapping chunks, estimates local feature importance using gradient boosting models, and aggregates these estimates to derive a robust global ranking. Feature budget selection is performed separately through a systematic k-selection and stability analysis to balance detection performance and robustness. The proposed framework is evaluated on two large-scale malware datasets: BODMAS and CIC-AndMal2020, representing large and diverse malware feature spaces. Experimental results show that classifiers trained on CAFE-GB -selected features achieve performance parity with full-feature baselines across multiple metrics, including Accuracy, F1-score, MCC, ROC-AUC, and PR-AUC, while reducing feature dimensionality by more than 95\%. Paired Wilcoxon signed-rank tests confirm that this reduction does not introduce statistically significant performance degradation. Additional analyses demonstrate low inter-feature redundancy and improved interpretability through SHAP-based explanations. Runtime and memory profiling further indicate reduced downstream classification overhead. Overall, CAFE-GB provides a stable, interpretable, and scalable feature selection strategy for large-scale malware detection.
Authors:Isaac Baglin, Xiatian Zhu, Simon Hadfield
Abstract:
Traditional defenses against Deep Leakage (DL) attacks in Federated Learning (FL) primarily focus on obfuscation, introducing noise, transformations or encryption to degrade an attacker's ability to reconstruct private data. While effective to some extent, these methods often still leak high-level information such as class distributions or feature representations, and are frequently broken by increasingly powerful denoising attacks. We propose a fundamentally different perspective on FL defense: framing it as a spoofing problem.We introduce SpooFL (Figure 1), a spoofing-based defense that deceives attackers into believing they have recovered the true training data, while actually providing convincing but entirely synthetic samples from an unrelated task. Unlike prior synthetic-data defenses that share classes or distributions with the private data and thus still leak semantic information, SpooFL uses a state-of-the-art generative model trained on an external dataset with no class overlap. As a result, attackers are misled into recovering plausible yet completely irrelevant samples, preventing meaningful data leakage while preserving FL training integrity. We implement the first example of such a spoofing defense, and evaluate our method against state-of-the-art DL defenses and demonstrate that it successfully misdirects attackers without compromising model performance significantly.
Authors:Wouter Termont, Beatriz Esteves
Abstract:
Recent European efforts around digital identity -- the EUDI regulation and its OpenID architecture -- aim high, but start from a narrow and ill-defined conceptualization of authentication. Based on a broader, more grounded understanding of the term, in we identify several issues in the design of OpenID4VCI and OpenID4VP: insecure practices, static, and subject-bound credential types, and a limited query language restrict their application to classic scenarios of credential exchange -- already supported by existing solutions like OpenID Connect, SIOPv2, OIDC4IDA, and OIDC Claims Aggregation -- barring dynamic, asynchronous, or automated use cases. We also debunk OpenID's 'paradigm-shifting' trust-model, which -- when compared to existing decentralized alternatives -- does not deliver any significant increase in control, privacy, and portability of personal information. Not only the technical choices limit the capabilities of the EUDI framework; also the legislation itself cannot accommodate the promise of self-sovereign identity. In particular, we criticize the introduction of institutionalized trusted lists, and discuss their economical and political risks. Their potential to decline into an exclusory, re-centralized ecosystem endangers the vision of a user-oriented identity management in which individuals are in charge. Instead, the consequences might severely restrict people in what they can do with their personal information, and risk increased linkability and monitoring. In anticipation of revisions to the EUDI regulations, we suggest several technical alternatives that overcome some of the issues with the architecture of OpenID. In particular, OAuth's UMA extension and its A4DS profile, as well as their integration in GNAP, are worth looking into. Future research into uniform query (meta-)languages is needed to address the heterogeneity of attestations and providers.
Authors:Aravind B, Anirud R. S., Sai Surya Teja N, Bala Subrahmanya Sriranga Navaneeth A, Karthika R, Mohankumar N
Abstract:
Class imbalance refers to a situation where certain classes in a dataset have significantly fewer samples than oth- ers, leading to biased model performance. Class imbalance in network intrusion detection using Tabular Denoising Diffusion Probability Models (TabDDPM) for data augmentation is ad- dressed in this paper. Our approach synthesizes high-fidelity minority-class samples from the CIC-IDS2017 dataset through iterative denoising processes. For the minority classes that have smaller samples, synthetic samples were generated and merged with the original dataset. The augmented training data enables an ANN classifier to achieve near-perfect recall on previously underrepresented attack classes. These results establish diffusion models as an effective solution for tabular data imbalance in security domains, with potential applications in fraud detection and medical diagnostics.
Authors:Murat Bilgehan Ertan, Emirhan Böge, Min Chen, Kaleel Mahmood, Marten van Dijk
Abstract:
As large language models (LLMs) are trained on increasingly opaque corpora, membership inference attacks (MIAs) have been proposed to audit whether copyrighted texts were used during training, despite growing concerns about their reliability under realistic conditions. We ask whether MIAs can serve as admissible evidence in adversarial copyright disputes where an accused model developer may obfuscate training data while preserving semantic content, and formalize this setting through a judge-prosecutor-accused communication protocol. To test robustness under this protocol, we introduce SAGE (Structure-Aware SAE-Guided Extraction), a paraphrasing framework guided by Sparse Autoencoders (SAEs) that rewrites training data to alter lexical structure while preserving semantic content and downstream utility. Our experiments show that state-of-the-art MIAs degrade when models are fine-tuned on SAGE-generated paraphrases, indicating that their signals are not robust to semantics-preserving transformations. While some leakage remains in certain fine-tuning regimes, these results suggest that MIAs are brittle in adversarial settings and insufficient, on their own, as a standalone mechanism for copyright auditing of LLMs.
Authors:Dalibor Sain, Thomas Rosenstatter, Olaf Saßnick, Christian Schäfer, Stefan Huber
Abstract:
The increased connectivity of industrial networks has led to a surge in cyberattacks, emphasizing the need for cybersecurity measures tailored to the specific requirements of industrial systems. Modern Industry 4.0 technologies, such as OPC UA, offer enhanced resilience against these threats. However, widespread adoption remains limited due to long installation times, proprietary technology, restricted flexibility, and formal process requirements (e.g. safety certifications). Consequently, many systems do not yet implement these technologies, or only partially. This leads to the challenge of dealing with so-called brownfield systems, which are often placed in isolated security zones to mitigate risks. However, the need for data exchange between secure and insecure zones persists. This paper reviews existing solutions to address this challenge by analysing their approaches, advantages, and limitations. Building on these insights, we identify three key concepts, evaluate their suitability and compatibility, and ultimately introduce the SigmaServer, a novel TCP-level aggregation method. The developed proof-of-principle implementation is evaluated in an operational technology (OT) testbed, demonstrating its applicability and effectiveness in bridging secure and insecure zones.
Authors:Yuting Liang, Ke Yi
Abstract:
We study the problem of adaptive privacy budgeting under generalized differential privacy. Consider the setting where each user $i\in [n]$ holds a tuple $x_i\in U:=U_1\times \dotsb \times U_T$, where $x_i(l)\in U_l$ represents the $l$-th component of their data. For every $l\in [T]$ (or a subset), an untrusted analyst wishes to compute some $f_l(x_1(l),\dots,x_n(l))$, while respecting the privacy of each user. For many functions $f_l$, data from the users are not all equally important, and there is potential to use the privacy budgets of the users strategically, leading to privacy savings that can be used to improve the utility of later queries. In particular, the budgeting should be adaptive to the outputs of previous queries, so that greater savings can be achieved on more typical instances. In this paper, we provide such an adaptive budgeting framework, with various applications demonstrating its applicability.
Authors:Chandan Anand, Jayesh Seshadri, Prasad Krishnan, Gowtham R. Kurri
Abstract:
Building on the well-established capacity-achieving schemes of Sun-Jafar (for replicated storage) and the closely related scheme of Banawan-Ulukus (for MDS-coded setting), a recent work by Chandan et al. proposed new classes of weak private information retrieval (WPIR) schemes for the collusion-free (replication and MDS-coded) setting, as well as for the $T$-colluding scenario. In their work, Chandan et al. characterized the expressions for the rate-privacy trade-offs for these classes of WPIR schemes, under the mutual information leakage and maximal leakage metrics. Explicit achievable trade-offs for the same were also presented, which were shown to be competitive or better than prior WPIR schemes. However, the class-wise optimality of the reported trade-offs were unknown. In this work, we show that the explicit rate-privacy trade-offs reported for the Sun-Jafar-type schemes by Chandan et al. are optimal for the non-colluding and replicated setting. Furthermore, we prove the class-wise optimality for Banawan-Ulukus-type MDS-WPIR and Sun-Jafar-type $T$-colluding WPIR schemes, under threshold-constraints on the system parameters. When these threshold-constraints do not hold, we present counter-examples which show that even higher rates than those reported before can be achieved.
Authors:Murat Bilgehan Ertan, Marten van Dijk
Abstract:
Differentially Private Stochastic Gradient Descent (DP-SGD) is the dominant paradigm for private training, but its fundamental limitations under worst-case adversarial privacy definitions remain poorly understood. We analyze DP-SGD in the $f$-differential privacy framework, which characterizes privacy via hypothesis-testing trade-off curves, and study shuffled sampling over a single epoch with $M$ gradient updates. We derive an explicit suboptimal upper bound on the achievable trade-off curve. This result induces a geometric lower bound on the separation $κ$ which is the maximum distance between the mechanism's trade-off curve and the ideal random-guessing line. Because a large separation implies significant adversarial advantage, meaningful privacy requires small $κ$. However, we prove that enforcing a small separation imposes a strict lower bound on the Gaussian noise multiplier $σ$, which directly limits the achievable utility. In particular, under the standard worst-case adversarial model, shuffled DP-SGD must satisfy $σ\ge \frac{1}{\sqrt{2\ln M}}$ $\quad\text{or}\quad$ $κ\ge\ \frac{1}{\sqrt{8}}\!\left(1-\frac{1}{\sqrt{4π\ln M}}\right)$, and thus cannot simultaneously achieve strong privacy and high utility. Although this bound vanishes asymptotically as $M \to \infty$, the convergence is extremely slow: even for practically relevant numbers of updates the required noise magnitude remains substantial. We further show that the same limitation extends to Poisson subsampling up to constant factors. Our experiments confirm that the noise levels implied by this bound leads to significant accuracy degradation at realistic training settings, thus showing a critical bottleneck in DP-SGD under standard worst-case adversarial assumptions.
Authors:Ashish Anand, Bhupendra Singh, Sunil Khemka, Bireswar Banerjee, Vishi Singh Bhatia, Piyush Ranjan
Abstract:
Android malware has become an increasingly critical threat to organizations, society and individuals, posing significant risks to privacy, data security and infrastructure. As malware continues to evolve in terms of complexity and sophistication, the mitigation and detection of these malicious software instances have become more time consuming and challenging particularly due to the requirement of large number of features to identify potential malware. To address these challenges, this research proposes Fast Gradient Sign Method with Diluted Convolutional Neural Network (FGSM DICNN) method for malware classification. DICNN contains diluted convolutions which increases receptive field, enabling the model to capture dispersed malware patterns across long ranges using fewer features without adding parameters. Additionally, the FGSM strategy enhance the accuracy by using one-step perturbations during training that provides more defensive advantage of lower computational cost. This integration helps to manage high classification accuracy while reducing the dependence on extensive feature sets. The proposed FGSM DICNN model attains 99.44% accuracy while outperforming other existing approaches such as Custom Deep Neural Network (DCNN).
Authors:Sidhant R. Nair, Tanmay Sen, Mrinmay Sen
Abstract:
Differentially private federated learning (DP-FL) suffers from slow convergence under tight privacy budgets due to the overwhelming noise introduced to preserve privacy. While adaptive optimizers can accelerate convergence, existing second-order methods such as DP-FedNew require O(d^2) memory at each client to maintain local feature covariance matrices, making them impractical for high-dimensional models. We propose DP-FedSOFIM, a server-side second-order optimization framework that leverages the Fisher Information Matrix (FIM) as a natural gradient preconditioner while requiring only O(d) memory per client. By employing the Sherman-Morrison formula for efficient matrix inversion, DP-FedSOFIM achieves O(d) computational complexity per round while maintaining the convergence benefits of second-order methods. Our analysis proves that the server-side preconditioning preserves (epsilon, delta)-differential privacy through the post-processing theorem. Empirical evaluation on CIFAR-10 demonstrates that DP-FedSOFIM achieves superior test accuracy compared to first-order baselines across multiple privacy regimes.
Authors:Aniesh Chawla, Udbhav Prasad
Abstract:
The parallel evolution of Large Language Models (LLMs) with advanced code-understanding capabilities and the increasing sophistication of malware presents a new frontier for cybersecurity research. This paper evaluates the efficacy of state-of-the-art LLMs in classifying executable code as either benign or malicious. We introduce an automated pipeline that first decompiles Windows executable into a C code using Ghidra disassembler and then leverages LLMs to perform the classification. Our evaluation reveals that while standard LLMs show promise, they are not yet robust enough to replace traditional anti-virus software. We demonstrate that a fine-tuned model, trained on curated malware and benign datasets, significantly outperforms its vanilla counterpart. However, the performance of even this specialized model degrades notably when encountering newer malware. This finding demonstrates the critical need for continuous fine-tuning with emerging threats to maintain model effectiveness against the changing coding patterns and behaviors of malicious software.
Authors:Aniesh Chawla, Udbhav Prasad
Abstract:
Enterprise security faces escalating threats from sophisticated malware, compounded by expanding digital operations. This paper presents the first systematic evaluation of large language models (LLMs) to proactively identify indicators of compromise (IOCs) from unstructured web-based threat intelligence sources, distinguishing it from reactive malware detection approaches. We developed an automated system that pulls IOCs from 15 web-based threat report sources to evaluate six LLM models (Gemini, Qwen, and Llama variants). Our evaluation of 479 webpages containing 2,658 IOCs (711 IPv4 addresses, 502 IPv6 addresses, 1,445 domains) reveals significant performance variations. Gemini 1.5 Pro achieved 0.958 precision and 0.788 specificity for malicious IOC identification, while demonstrating perfect recall (1.0) for actual threats.
Authors:Keerthi Kumar. M, Swarun Kumar Joginpelly, Sunil Khemka, Lakshmi. S R, Navin Chhibber
Abstract:
Background: Cyber-attacks have evolved rapidly in recent years, many individuals and business owners have been affected by cyber-attacks in various ways. Cyber-attacks include various threats such as ransomware, malware, phishing, and Denial of Service (DoS)-related attacks. Challenges: Traditional models such as Generative Artificial Intelligence (AI) and Security Bidirectional Encoder Representations from Transformers (BERT) were implemented to detect cyber threats. However, the existing Security BERT model has a limited contextual understanding of text data, which has less impact on detecting cyber-attacks. Proposed Methodology: To overcome the above-mentioned challenges, Robustly Optimized Bidirectional Encoder Representations from Transformers Pretraining Approach (RoBERTa) model is proposed which consists of diverse words of vocabulary understanding. Initially, data are extracted from a Packet Capture (PCAP) file and encrypted using Fully Harmonic Encryption (FHE). Subsequently, a Byte-level and Byte Pair Encoding (BBPE) tokenizer was used to generate tokens and help maintain the vocabulary for the encrypted values. Then, these values are applied to the RoBERTa model of the transformer with extensive training. Finally, Softmax is used for the detection and classification of attacks. The proposed RoBERTa model achieved better results than the existing BERT model in terms of accuracy (0.99), recall (0.91), and precision (0.89) respectively.
Authors:Manuel Brosch, Matthias Probst, Stefan Kögler, Georg Sigl
Abstract:
The use of neural networks in edge devices is increasing, which introduces new security challenges related to the neural networks' confidentiality. As edge devices often offer physical access, attacks targeting the hardware, such as side-channel analysis, must be considered. To enhance the performance of neural network inference, hardware accelerators are commonly employed. This work investigates the influence of parallel processing within such accelerators on correlation-based side-channel attacks that exploit power consumption. The focus is on neurons that are part of the same fully-connected layer, which run parallel and simultaneously process the same input value. The theoretical impact of concurrent multiply-and-accumulate operations on overall power consumption is evaluated, as well as the success rate of correlation power analysis. Based on the observed behavior, equations are derived that describe how the correlation decreases with increasing levels of parallelism. The applicability of these equations is validated using a vector-multiplication unit implemented on an FPGA.
Authors:Yinghan Hou, Zongyou Yang, Xiaokun Yang
Abstract:
Distributed energy trading and carbon asset management involve high-frequency, small-value settlements with strong audit requirements. Fully on-chain designs incur excessive cost, while purely off-chain approaches lack verifiable consistency. This paper presents a hybrid on-chain and off-chain settlement framework that anchors settlement commitments and key constraints on-chain and links off-chain records through deterministic digests and replayable auditing. Experiments under publicly constrained workloads show that the framework significantly reduces on-chain execution and storage cost while preserving audit trustworthiness.
Authors:M. Amin Rahimian, Benjamin Panny, James Joshi
Abstract:
Digitized, networked healthcare promises earlier detection, precision therapeutics, and continuous care; yet, it also expands the surface for privacy loss and compliance risk. We argue for a shift from siloed, application-specific protections to privacy-by-design at scale, centered on decision-theoretic differential privacy (DP) across the full healthcare data lifecycle; network-aware privacy accounting for interdependence in people, sensors, and organizations; and compliance-as-code tooling that lets health systems share evidence while demonstrating regulatory due care. We synthesize the privacy-enhancing technology (PET) landscape in health (federated analytics, DP, cryptographic computation), identify practice gaps, and outline a deployable agenda involving privacy-budget ledgers, a control plane to coordinate PET components across sites, shared testbeds, and PET literacy, to make lawful, trustworthy sharing the default. We illustrate with use cases (multi-site trials, genomics, disease surveillance, mHealth) and highlight distributed inference as a workhorse for multi-institution learning under explicit privacy budgets.
Authors:Maryam Abbasihafshejani, AHM Nazmus Sakib, Murtuza Jadliwala
Abstract:
The rapid advancement of speech synthesis technologies, including text-to-speech (TTS) and voice conversion (VC), has intensified security and privacy concerns related to voice cloning. Recent defenses attempt to prevent unauthorized cloning by embedding protective perturbations into speech to obscure speaker identity while maintaining intelligibility. However, adversaries can apply advanced purification techniques to remove these perturbations, recover authentic acoustic characteristics, and regenerate cloneable voices. Despite the growing realism of such attacks, the robustness of existing defenses under adaptive purification remains insufficiently studied. Most existing purification methods are designed to counter adversarial noise in automatic speech recognition (ASR) systems rather than speaker verification or voice cloning pipelines. As a result, they fail to suppress the fine-grained acoustic cues that define speaker identity and are often ineffective against speaker verification attacks (SVA). To address these limitations, we propose Diffusion-Bridge (VocalBridge), a purification framework that learns a latent mapping from perturbed to clean speech in the EnCodec latent space. Using a time-conditioned 1D U-Net with a cosine noise schedule, the model enables efficient, transcript-free purification while preserving speaker-discriminative structure. We further introduce a Whisper-guided phoneme variant that incorporates lightweight temporal guidance without requiring ground-truth transcripts. Experimental results show that our approach consistently outperforms existing purification methods in recovering cloneable voices from protected speech. Our findings demonstrate the fragility of current perturbation-based defenses and highlight the need for more robust protection mechanisms against evolving voice-cloning and speaker verification threats.
Authors:Joel Daniel Andersson, Palak Jain, Satchit Sivakumar
Abstract:
We study differentially-private statistics in the fully dynamic continual observation model, where many updates can arrive at each time step and updates to a stream can involve both insertions and deletions of an item. Earlier work (e.g., Jain et al., NeurIPS 2023 for counting distinct elements; Raskhodnikova & Steiner, PODS 2025 for triangle counting with edge updates) reduced the respective cardinality estimation problem to continual counting on the difference stream associated with the true function values on the input stream. In such reductions, a change in the original stream can cause many changes in the difference stream, this poses a challenge for applying private continual counting algorithms to obtain optimal error bounds. We improve the accuracy of several such reductions by studying the associated $\ell_p$-sensitivity vectors of the resulting difference streams and isolating their properties. We demonstrate that our framework gives improved bounds for counting distinct elements, estimating degree histograms, and estimating triangle counts (under a slightly relaxed privacy model), thus offering a general approach to private continual cardinality estimation in streaming settings. Our improved accuracy stems from tight analysis of known factorization mechanisms for the counting matrix in this setting; the key technical challenge is arguing that one can use state-of-the-art factorizations for sensitivity vector sets with the properties we isolate. Empirically and analytically, we demonstrate that our improved error bounds offer a substantial improvement in accuracy for cardinality estimation problems over a large range of parameters.
Authors:Wenbo Wu, George Konstantinidis
Abstract:
Trust and Reputation Management Systems (TRMSs) are critical for the modern web, yet their reliance on subjective user ratings or narrow Quality of Service (QoS) metrics lacks objective grounding. Concurrently, while regulatory frameworks like GDPR and HIPAA provide objective behavioral standards, automated compliance auditing has been limited to coarse, binary (pass/fail) outcomes. This paper bridges this research gap by operationalizing regulatory compliance as a quantitative and dynamic trust metric through our novel automated compliance engine (ACE). ACE first formalizes legal and organizational policies into a verifiable, obligation-centric logic. It then continuously audits system event logs against this logic to detect violations. The core of our contribution is a quantitative model that assesses the severity of each violation along multiple dimensions, including its Volume, Duration, Breadth, and Criticality, to compute a fine-grained, evolving compliance score. We evaluate ACE on a synthetic hospital dataset, demonstrating its ability to accurately detect a range of complex HIPAA and GDPR violations and produce a nuanced score that is significantly more expressive than traditional binary approaches. This work enables the development of more transparent, accountable, and resilient TRMSs on the Web.
Authors:Yuchao Hou, Zixuan Zhang, Jie Wang, Wenke Huang, Lianhui Liang, Di Wu, Zhiquan Liu, Youliang Tian, Jianming Zhu, Jisheng Dang, Junhao Dong, Zhongliang Guo
Abstract:
As a critical application of computational intelligence in remote sensing, deep learning-based synthetic aperture radar (SAR) image target recognition facilitates intelligent perception but typically relies on centralized training, where multi-source SAR data are uploaded to a single server, raising privacy and security concerns. Federated learning (FL) provides an emerging computational intelligence paradigm for SAR image target recognition, enabling cross-site collaboration while preserving local data privacy. However, FL confronts critical security risks, where malicious clients can exploit SAR's multiplicative speckle noise to conceal backdoor triggers, severely challenging the robustness of the computational intelligence model. To address this challenge, we propose NADAFD, a noise-aware and dynamically adaptive federated defense framework that integrates frequency-domain, spatial-domain, and client-behavior analyses to counter SAR-specific backdoor threats. Specifically, we introduce a frequency-domain collaborative inversion mechanism to expose cross-client spectral inconsistencies indicative of hidden backdoor triggers. We further design a noise-aware adversarial training strategy that embeds $Γ$-distributed speckle characteristics into mask-guided adversarial sample generation to enhance robustness against both backdoor attacks and SAR speckle noise. In addition, we present a dynamic health assessment module that tracks client update behaviors across training rounds and adaptively adjusts aggregation weights to mitigate evolving malicious contributions. Experiments on MSTAR and OpenSARShip datasets demonstrate that NADAFD achieves higher accuracy on clean test samples and a lower backdoor attack success rate on triggered inputs than existing federated backdoor defenses for SAR target recognition.
Authors:Luca Nannini, Adam Leon Smith, Michele Joshua Maggini, Enrico Panai, Sandra Feliciano, Aleksandr Tiulkanov, Elena Maran, James Gealy, Piercosma Bisconti
Abstract:
AI agents - i.e. AI systems that autonomously plan, invoke external tools, and execute multi-step action chains with reduced human involvement - are being deployed at scale across enterprise functions ranging from customer service and recruitment to clinical decision support and critical infrastructure management. The EU AI Act (Regulation 2024/1689) regulates these systems through a risk-based framework, but it does not operate in isolation: providers face simultaneous obligations under the GDPR, the Cyber Resilience Act, the Digital Services Act, the Data Act, the Data Governance Act, sector-specific legislation, the NIS2 Directive, and the revised Product Liability Directive. This paper provides the first systematic regulatory mapping for AI agent providers integrating (a) draft harmonised standards under Standardisation Request M/613 to CEN/CENELEC JTC 21 as of January 2026, (b) the GPAI Code of Practice published in July 2025, (c) the CRA harmonised standards programme under Mandate M/606 accepted in April 2025, and (d) the Digital Omnibus proposals of November 2025. We present a practical taxonomy of nine agent deployment categories mapping concrete actions to regulatory triggers, identify agent-specific compliance challenges in cybersecurity, human oversight, transparency across multi-party action chains, and runtime behavioral drift. We propose a twelve-step compliance architecture and a regulatory trigger mapping connecting agent actions to applicable legislation. We conclude that high-risk agentic systems with untraceable behavioral drift cannot currently satisfy the AI Act's essential requirements, and that the provider's foundational compliance task is an exhaustive inventory of the agent's external actions, data flows, connected systems, and affected persons.
Authors:Fariha Tanjim Shifat, Hariswar Baburaj, Ce Zhou, Jaydeb Sarker, Mia Mohammad Imran
Abstract:
Large language models (LLMs) are increasingly embedded in open-source software (OSS) ecosystems, creating complex interactions among natural language prompts, probabilistic model outputs, and execution-capable components. However, it remains unclear whether traditional vulnerability disclosure frameworks adequately capture these model-mediated risks. To investigate this, we analyze 295 GitHub Security Advisories published between January 2025 and January 2026 that reference LLM-related components, and we manually annotate a sample of 100 advisories using the OWASP Top 10 for LLM Applications 2025. We find no evidence of new implementation-level weakness classes specific to LLM systems. Most advisories map to established CWEs, particularly injection and deserialization weaknesses. At the same time, the OWASP-based analysis reveals recurring architectural risk patterns, especially Supply Chain, Excessive Agency, and Prompt Injection, which often co-occur across multiple stages of execution. These results suggest that existing advisory metadata captures code-level defects but underrepresents model-mediated exposure. We conclude that combining the CWE and OWASP perspectives provides a more complete and necessary view of vulnerabilities in LLM-integrated systems.
Authors:Darya Kaviani, Alp Eren Ozdarendeli, Jinhao Zhu, Yu Ding, Raluca Ada Popa
Abstract:
Personal AI systems increasingly retain long-term memory of user activity, including documents, emails, messages, meetings, and ambient recordings. Trusted hardware can keep this data private, but struggles to scale with a growing datastore. This pushes the data to external storage, which exposes retrieval access patterns that leak private information to the application provider. Oblivious RAM (ORAM) is a cryptographic primitive that can hide these patterns, but it requires a fixed access budget, precluding the query-dependent traversals that agentic memory systems rely on for accuracy. We present Opal, a private memory system for personal AI. Our key insight is to decouple all data-dependent reasoning from the bulk of personal data, confining it to the trusted enclave. Untrusted disk then sees only fixed, oblivious memory accesses. This enclave-resident component uses a lightweight knowledge graph to capture personal context that semantic search alone misses and handles continuous ingestion by piggybacking reindexing and capacity management on every ORAM access. Evaluated on a comprehensive synthetic personal-data pipeline driven by stochastic communication models, Opal improves retrieval accuracy by 13 percentage points over semantic search and achieves 29x higher throughput with 15x lower infrastructure cost than a secure baseline. Opal is under consideration for deployment to millions of users at a major AI provider.
Authors:Christophe Ponsard, Jean-François Daune, Denis Darquennes, Malik Bouhou, Nicolas Point
Abstract:
The importance of cybersecurity for Small and Medium Enterprises (SMEs) has never been greater, especially given the rise of AI-driven threats. Supporting SMEs requires a sustained effort to ensure they have access to resources and expertise covering awareness, protection, auditing, and incident response. Since 2019, our work with the Keep It Secure initiative has focused on helping Belgian (Walloon) SMEs strengthen their cybersecurity posture through access to a network of labelled cybersecurity experts. In this process, we interviewed over 120 professionals from around 90 companies and gathered rich insights about the nature, strengths and weaknesses of our regional ecosystem. While our initiative primarily targets the labelling of cybersecurity experts, we demonstrate increasing alignment with the broader Cyber Fundamentals framework deployed at the federal level in Belgium, which supports official certification. This paper reports on the progress and lessons learned from this long-term effort, highlighting how expert validation, based on a structured evaluation approach, can help improve SME cybersecurity.
Authors:Margherita Cozzolino, Stephan Krenn, Thomas Lorünser
Abstract:
While QKD ensures information-theoretic security at the link level, real-world deployments depend on trusted repeaters, creating potential vulnerabilities. In this paper, we thus introduce a topology-hiding connectivity assurance protocol to enhance trust in quantum key distribution (QKD) network infrastructures. Our protocol allows network providers to jointly prove the existence of a secure connection between endpoints without revealing internal topology details. By extending graph-signature techniques to support multi-graphs and hidden endpoints, we enable zero-knowledge proofs of connectivity that ensure both soundness and topology hiding. We further discuss how our approach can certify, e.g., multiple disjoint paths, supporting multi-path QKD scenarios. This work bridges cryptographic assurance methods with the operational requirements of QKD networks, promoting verifiable and privacy-preserving inter-network connectivity.
Authors:Stephan Krenn, Omid Mir, Thomas Lorünser, Sebastian Ramacher, Florian Wohner
Abstract:
Secure long-distance communication in quantum key distribution (QKD) networks depends on trusted repeater nodes along the entire transmission path. Consequently, these nodes will be subject to strict auditing and certification in future large-scale QKD deployments. However, trust must also extend to the network operator, who is responsible for fulfilling contractual obligations -- such as ensuring certified devices are used and transmission paths remain disjoint where required. In this work, we present a path validation protocol specifically designed for QKD networks. It enables the receiver to verify compliance with agreed-upon policies. At the same time, the protocol preserves the operator's confidentiality by ensuring that no sensitive information about the network topology is revealed to users. We provide a formal model and a provably secure generic construction of the protocol, along with a concrete instantiation. For long-distance communication involving 100 nodes, the protocol has a computational cost of 1-2.5s depending on the machine, and a communication overhead of less than 70kB - demonstrating the efficiency of our approach.
Authors:Yael Eiger, Nino Migineishvili, Emi Yoshikawa, Liza Nadtochiy, Kentrell Owens, Franziska Roesner
Abstract:
Digital devices like tablets, media players, and kiosks are increasingly deployed in U.S. prisons. These technologies can enable incarcerated people to access education, communicate with loved ones, and develop vital reentry skills. However, they can also introduce new privacy and security risks for incarcerated people who have little agency over their usage and contracts, and are currently carved out of many consumer protection safeguards. To investigate these issues, we conducted focus groups and interviews with system-impacted people (n=17), i.e., those formerly incarcerated, and their relatives, to investigate experiences with device-related security and privacy vulnerabilities and the power dynamics that affect their use. In our findings, participants describe pervasive surveillance, censorship, and usability problems with the technology available to them, including shifting and seemingly arbitrary usage policies. These policies strain relationships both inside and outside prisons and contribute to negative downstream effects for incarcerated users. We recommend ways to better balance prison security concerns with privacy-related needs of system-impacted individuals by promoting accountability for technology-related decisions, providing public oversight of digital purchasing and use policies, and designing digital tools with them -- the actual end-users -- in mind.
Authors:Charilaos Skandylas, Mikael Asplund
Abstract:
There is a growing need for cybersecurity professionals with practical knowledge and experience to meet societal needs and comply with new standards and regulations. At the same time, the advances in software technology and artificial intelligence point towards a future where software agents will play an important role in protecting the computer systems that are critical for society to function. The training and development of both humans and software agents requires the design and execution of cybersecurity exercises that differ in properties such as size, scope, objectives, difficultly, etc. Cybersecurity scenarios are critical for the operation of cybersecurity exercises as they describe the scope, context, operational environment and storyline of each exercise. In this work, we present an approach to automatically generate cybersecurity scenarios that model enterprise IT systems. Our approach is able to generate a large number of scenarios that differ in multiple criteria including size, scope, difficulty, complexity and diversity. We further release as open source: a simulation and a virtualization environment that can run cybersecurity exercises based on the generated scenarios and a dataset containing 100000 sample scenarios.
Authors:Jingxin Qiao, Myrto Arapinis, Thomas Zacharias
Abstract:
Coercion-resistance (CR) is a crucial security property in e-voting systems. It ensures that an attacker cannot compel a voter to vote in a specific way by using threats or rewards. The Loki e-voting protocol, proposed by Giustolisi \emph{et al.} at IEEE S\&P (2024), introduces a novel design that mitigates last-minute coercion through a re-voting mechanism. It also aims to address the usability issues of the seminal JCJ e-voting protocol, specifically: i) the requirement that voters can store and hide pre-agreed credentials, and ii) the ability of voters to convincingly lie while being coerced. In this work, we identify two vulnerabilities in Loki. The first is a brute-force attack that compromises the integrity of the evasion strategy. Specifically, this attack allows an adversary to cast a ballot on behalf of their victim in a way that the evasion strategy cannot defend against, rendering it ineffective. The second vulnerability is a forced abstention attack, which allows an adversary to detect when their victim has complied with their instruction not to vote. We generalise the integrity attack to reveal a fundamental dilemma: without pre-agreed secret credentials, it is not possible to prevent last-minute coercion. Finally, we show how reverting to pre-agreed secret credentials fixes the aforementioned vulnerabilities and discuss the trade-off between tallying efficiency and stronger trust assumptions.
Authors:Julian Sturm, Daniel Fraunholz, Oliver Zeidler, Katharina Schaar, Wolfgang Kellerer
Abstract:
Mobile networks are essential for modern societies. The most recent generation of mobile networks will be even more ubiquitous than previous ones. Therefore, the security of these networks as part of the critical infrastructure with essential communication services is of the uttermost importance. However, these systems are still vulnerable to being compromised, as showcased in the recent discussion on supply chain security and other challenges. This work addresses problems arising from compromised 5G core network components. The investigations reveal how attacks based on command and control communication can be designed so that they cannot be detected or prevented. This way, various attacks against the security and privacy of subscribers can be performed for which no effective countermeasures are available.
Authors:Kavindu Herath, Joshua Zhao, Saurabh Bagchi
Abstract:
Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are unlikely to arise in practice. In this paper, we revisit the backdoor threat to standard FL (a single global model) under a more realistic setting where triggers must be semantically meaningful, in-distribution, and visually plausible. We propose SABLE, a Semantics-Aware Backdoor for LEarning in federated settings, which constructs natural, content-consistent triggers (e.g., semantic attribute changes such as sunglasses) and optimizes an aggregation-aware malicious objective with feature separation and parameter regularization to keep attacker updates close to benign ones. We instantiate SABLE on CelebA hair-color classification and the German Traffic Sign Recognition Benchmark (GTSRB), poisoning only a small, interpretable subset of each malicious client's local data while otherwise following the standard FL protocol. Across heterogeneous client partitions and multiple aggregation rules (FedAvg, Trimmed Mean, MultiKrum, and FLAME), our semantics-driven triggers achieve high targeted attack success rates while preserving benign test accuracy. These results show that semantics-aligned backdoors remain a potent and practical threat in federated learning, and that robustness claims based solely on synthetic patch triggers can be overly optimistic.
Authors:Shashwat Agrawal, Amitabha Bagchi, Rajendra Kumar
Abstract:
This paper extends the Kikuchi method to give algorithms for decisional $k$-sparse Learning With Errors (LWE) and $k$-sparse Learning Parity with Noise (LPN) problems for higher moduli $q$. We create a Kikuchi graph for a sparse LWE/LPN instance and use it to give two attacks for these problems. The first attack decides by computing the spectral norm of the adjacency matrix of the Kikuchi graph, which is a generalization of the attack for $q=2$ given by Wein et. al. (Journal of the ACM 2019). The second approach computes non-trivial closed walks of the graph, and then decides by computing a certain polynomial of edge labels in the walks. This is a generalization of the attack for $q=2$ given by Gupta et. al. (SODA 2026). Both the attacks yield new tradeoffs between sample complexity and time complexity of sparse LWE/LPN.
Authors:Arjun Sridharkumar, Sara Al Hajj Ibrahim, Jiayuan Zhou, Yuliang Wang, Safwat Hassan, Ahmed E. Hassan, Shurui Zhou
Abstract:
Timely resolution and disclosure of vulnerabilities are essential for maintaining the security of open-source software. However, many vulnerabilities remain unreported, unpatched, or undisclosed for extended periods, exposing users to prolonged security threats. While various vulnerability detection tools exist, they primarily focus on predicting or identifying known vulnerabilities, often failing to capture vulnerabilities that experience significant delays in resolution. In this study, we examine the vulnerability lifecycle by analyzing protracted vulnerabilities (PCVEs), which remain unresolved or undisclosed over long periods. We construct a dataset of PCVEs and conduct a qualitative analysis to uncover underlying causes of delay. To assess current automated solutions, we evaluate four state-of-the-art (SOTA) vulnerability detectors on our dataset. These tools detect only 1,059 out of 2,402 PCVEs, achieving approximately 44% coverage. To address this limitation, we propose DeeptraVul, an enhanced detection approach designed specifically for protracted cases. DeeptraVul integrates multiple development artifacts and code signals, supported by a Large Language Model (LLM)-based summarization component. For comparison, we also evaluate a standalone LLM. Our results show that DeeptraVul improves detection performance, achieving a 14% increase in coverage across all PCVEs and reaching 90% coverage on the DeeptraVul PCVE subset, outperforming existing SOTA detectors and standalone LLM based inference.
Authors:Juan Antonio Vieira Giestinhas, Timothy Spiller
Abstract:
This paper considers two challenges faced by practical quantum networks: the bootstrapping of seedless Quantum Random Number Generators (QRNGs) and the resilient combination of Post-Quantum Cryptography (PQC) and Quantum Key Distribution (QKD) keys. These issues are addressed using universal hash functions as strong seeded extractors, with security foundations provided by the Quantum Leftover Hash Lemma (QLHL). First, the 'randomness loop' in QRNGs -- the requirement of an initial random seed to generate further randomness -- is resolved by proposing a bootstrapping method using raw data from two independent sources of entropy, given by seedless QRNG sources. Second, it is argued that strong seeded extractors are an alternative to XOR-based key combining that presents different characteristics. Unlike XORing, our method ensures that if the combined output and one initial key are compromised, the remaining key material retains quantifiable min-entropy and remains secure in exchange of longer keys. Furthermore, the proposed method allows to bind transcript information with key material in a natural way, providing a tool to replace computationally based combiners to extend ITS security of the initial key material to the final combined output. By modeling PQC keys as having HILL (Hastad, Impagliazzo, Levin and Luby) entropy, the framework is extended to hybrid PQC-QKD systems. This unified approach provides a mathematically rigorous and future-proof mechanism for both randomness generation and secure key management against quantum adversaries.
Authors:Fatih Bulut, Carlo DePaolis, Raghav Batta, Anjali Mangal
Abstract:
With the rapid advancement of AI in code generation, cybersecurity detection engineering faces new opportunities to automate traditionally manual processes. Detection authoring - the practice of creating executable logic that identifies malicious activities from security telemetry - is hindered by fragmented code across repositories, duplication, and limited organizational visibility. Current workflows remain heavily manual, constraining both coverage and velocity. In this paper, we introduce AVDA, a framework that leverages the Model Context Protocol (MCP) to automate detection authoring by integrating organizational context - existing detections, telemetry schemas, and style guides - into AI-assisted code generation. We evaluate three authoring strategies - Baseline, Sequential, and Agentic - across a diverse corpus of production detections and state-of-the-art LLMs. Our results show that Agentic workflows achieve a 19% improvement in overall similarity score over Baseline approaches, while Sequential workflows attain 87% of Agentic quality at 40x lower token cost. Generated detections excel at TTP matching (99.4%) and syntax validity (95.9%) but struggle with exclusion parity (8.9%). Expert validation on a 22-detection subset confirms strong Spearman correlation between automated metrics and practitioner judgment ($ρ= 0.64$, $p < 0.002$). By integrating seamlessly into standard developer environments, AVDA provides a practical path toward AI-assisted detection engineering with quantified trade-offs between quality, cost, and latency.
Authors:Rakib Hossain Sajib, Md. Rokon Mia, Prodip Kumar Sarker, Abdullah Al Noman, Md Arifur Rahman
Abstract:
The Internet of Vehicles (IoV) has become an essential component of smart transportation systems, enabling seamless interaction among vehicles and infrastructure. In recent years, it has played a progressively significant role in enhancing mobility, safety, and transportation efficiency. However, this connectivity introduces severe security vulnerabilities, particularly Denial-of-Service (DoS) and spoofing attacks targeting the Controller Area Network (CAN) bus, which could severely inhibit communication between the critical components of a vehicle, leading to system malfunctions, loss of control, or even endangering passengers' safety. To address this problem, this paper presents CANGuard, a novel spatio-temporal deep learning architecture that combines Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), and an attention mechanism to effectively identify such attacks. The model is trained and evaluated on the CICIoV2024 dataset, achieving competitive performance across accuracy, precision, recall, and F1-score and outperforming existing state-of-the-art methods. A comprehensive ablation study confirms the individual and combined contributions of the CNN, GRU, and attention components. Additionally, a SHAP analysis is conducted to interpret the decision-making process of the model and determine which features have the most significant impact on intrusion detection. The proposed approach demonstrates strong potential for practical and scalable security enhancements in modern IoV environments, thereby ensuring safer and more secure CAN bus communications.
Authors:Claudia De Lazzari, Francesco Stocco, Edoardo Signorini, Giacomo Fregona, Fernando Chirici, Damiano Giani, Tommaso Occhipinti, Guglielmo Morgari, Alessandro Zavatta, Davide Bacco
Abstract:
Quantum Key Distribution (QKD) protocols require Information-Theoretically Secure (ITS) authentication of the classical channel to preserve the unconditional security of the distilled key. Standard ITS schemes are based on one-time keys: once a key is used to authenticate a message, it must be discarded. Since QKD requires mutual authentication, two independent one-time keys are typically consumed per round, imposing a non-trivial overhead on the net secure key rate. In this work, we present the authentication-with-response scheme, a novel ITS authentication scheme based on $\varepsilon$-Almost Strongly Universal$_2$ ($\varepsilon$-ASU$_2$) functions, whose IT security can be established in the Universal Composability (UC) framework. The scheme achieves mutual authentication consuming a single one-time key per QKD round, halving key consumption compared to the state-of-the-art.
Authors:Martiño Rivera-Dourado, Rubén Pérez-Jove, Alejandro Pazos, Jose Vázquez-Naya
Abstract:
Passkeys have recently emerged as a passwordless authentication mechanism, yet their usability in captive portals remains unexplored. This paper presents an empirical, comparative usability study of passkeys and passwords in a Wi-Fi hotspot using a captive portal. We conducted a controlled laboratory experiment with 50 participants following a split-plot design across Android and Windows platforms, using a router implementing the FIDO2CAP protocol. Our results show a tendency for passkeys to be perceived as more usable than passwords during login, although differences are not statistically significant. Independent of the authentication method, captive portal limitations negatively affected user experience and increased error rates. We further found that passkeys are generally easy to configure on both platforms, but platform-specific issues introduce notable usability challenges. Based on quantitative and qualitative findings, we derive design recommendations to improve captive portal authentication, including the introduction of usernameless authentication flows, improved captive portal detection mechanisms, and user interface design changes.
Authors:Guang Yang, Ziye Geng, Yihang Chen, Changqing Luo
Abstract:
Task-agnostic model fingerprinting has recently gained increasing attention due to its ability to provide a universal framework applicable across diverse model architectures and tasks. The current state-of-the-art method, MetaV, ensures generalization by jointly training a set of fingerprints and a neural-network-based global verifier using two large and diverse model sets: one composed of pirated models (i.e., the protected model and its variants) and the other comprising independently trained models. However, publicly available models are scarce in many real-world domains, and constructing such model sets requires intensive training and massive computational resources, posing a significant barrier to deployment. Reducing the number of models can alleviate the overhead, but increases the risk of overfitting, a problem further exacerbated by MetaV's entangled design, in which all fingerprints and the global verifier are jointly trained. This overfitting issue compromises the generalization capability for verifying unseen models. In this paper, we propose LiteGuard, an efficient task-agnostic fingerprinting framework that attains enhanced generalization while significantly lowering computational cost. Specifically, LiteGuard introduces two key innovations: (i) a checkpoint-based model set augmentation strategy that enriches model diversity by leveraging intermediate model snapshots captured during training of each pirated and independently trained model, thereby alleviating the need to train a large number of such models, and (ii) a local verifier architecture that pairs each fingerprint with a lightweight local verifier, thereby reducing parameter entanglement and mitigating overfitting. Extensive experiments across five representative tasks show that LiteGuard consistently outperforms MetaV in both generalization performance and computational efficiency.
Authors:Gunjan Mishra, Yash Mishra
Abstract:
Financial systems have a growing reliance on computer-based and distributed systems, making FinTech systems vulnerable to advanced and quickly emerging cyber-criminal threats. Traditional security systems and fixed machine learning systems cannot identify more intricate fraud schemes whilst also addressing real-time performance and trust demands. This paper presented an Adaptive Neuro-Fuzzy Blockchain-AI Framework (ANFB-AI) to achieve security in FinTech transactions by detecting threats using intelligent and decentralized algorithms. The framework combines both an immutable, transparent and tamper resistant layer of a permissioned blockchain to maintain the immutability, transparency and resistance to tampering of transactions, and an adaptive neuro-fuzzy learning model to learn the presence of uncertainty and behavioural drift in fraud activities. An explicit mathematical model is created to explain the transaction integrity, adaptive threat classification, and unified risk based decision-making. The proposed framework uses Proof-of-Authority consensus to overcome low-latency validation of transactions and scalable real-time financial services. Massive simulations are performed in normal, moderate, and high-fraud conditions with the use of realistic financial and cryptocurrency transactions. The experimental evidence proves that ANFB-AI is always more accurate and precise than recent state-of-the-art algorithms and costs much less in terms of transaction confirmation time, propagation delay of blocks and end-to end latency. ANFB-AI performance supports the appropriateness of adaptive neuro-fuzzy intelligence to blockchain-based FinTech security.
Authors:Jelizaveta Vakarjuk, Alisa Pankova
Abstract:
European Digital Identity (EUDI) Wallet aims to provide end users with a way to get attested credentials from issuers, and present them to different relying parties. An important property mentioned in the regulatory frameworks is the possibility to revoke a previously issued credential. While it is possible to issue a short-lived credential, in some cases it may be inconvenient, and a separate revocation service which allows to revoke a credential at any time may be necessary. In this work, we propose a full end-to-end description of a generic credential revocation system, which technically relies on a single server and secure transmission channels between parties. We prove security of the proposed revocation functionality in the universal composability model, and estimate its efficiency based on a proof-of-concept implementation.
Authors:Rohan Sequeira, Stavros Damianakis, Umar Iqbal, Konstantinos Psounis
Abstract:
Agentic computing systems, which autonomously spawn new functionalities based on natural language instructions, are becoming increasingly prevalent. While immensely capable, these systems raise serious security, privacy, and safety concerns. Fundamentally, the full set of functionalities offered by these systems, combined with their probabilistic execution flows, is not known beforehand. Given this lack of characterization, it is non-trivial to validate whether a system has successfully carried out the user's intended task or instead executed irrelevant actions, potentially as a consequence of compromise. In this paper, we propose Agent-Sentry, a framework that attempts to bound agentic systems to address this problem. Our key insight is that agentic systems are designed for specific use cases and therefore need not expose unbounded or unspecified functionalities. Once bounded, these systems become easier to scrutinize. Agent-Sentry operationalizes this insight by uncovering frequent functionalities offered by an agentic system, along with their execution traces, to construct behavioral bounds. It then learns a policy from these traces and blocks tool calls that deviate from learned behaviors or that misalign with user intent. Our evaluation shows that Agent-Sentry helps prevent over 90\% of attacks that attempt to trigger out-of-bounds executions, while preserving up to 98\% of system utility.
Authors:Zhaoxiang Liu, Samuel Judson, Raj Dutta, Mark Santolucito, Xiaolong Guo, Ning Luo
Abstract:
We present BlindMarket, an end-to-end zero-trust distribution framework for hardware IP cores. BlindMarket allows two parties, the IP user and the IP vendor, to complete an IP trading process with strong guarantees of verifiability and confidentiality before the transaction, and then traceability after. We propose verification heuristics and adapt the cone of influence-based design pruning to overcome the limited scalability common to cryptographic protocols and the hardness of the underlying hardware verification. We systematically evaluate our framework on a diverse set of real-world hardware benchmarks, and the results demonstrate that BlindMarket effectively completes across a diverse set of real-world hardware IP cores, demonstrating successful verification on 12 out of 13 designs and substantial performance improvements enabled by design pruning and control-flow guided heuristics.
Authors:Nicholas Pecka, Lotfi Ben Othmane, Bharat Bhargava, Renee Bryce
Abstract:
Traditional threat modeling occurs during design, but cloud deployments introduce unanticipated threats, especially multi-stage attacks chaining vulnerabilities across trust boundaries. Existing security tools analyze components in isolation, cannot detect architectural threats from system composition, and cannot validate runtime behavior against configured policies. This gap leaves organizations vulnerable to attacks exploiting architectural weaknesses. This paper addresses this gap through a key innovation: automatically inferring system architecture from runtime observations to enable continuous threat modeling. Our methodology combines static configuration analysis with observed network flows to construct architecture graphs reflecting actual operational behavior, then applies systematic threat detection using platform-agnostic abstractions (components, domains, interfaces, access policies, flows). This enables consistent threat identification across bare metal, Kubernetes, and cloud infrastructure without manual diagram maintenance. We validate the methodology using a supply-chain system with ML components deployed on all three platforms, injecting 17 infrastructure and ML threats. Results show detection of all 17 threat types across all platforms, while existing security tools detected only 6-47% with zero ML threat coverage, confirming the necessity of runtime aware, architecture-level threat analysis.
Authors:Leon Schuermann, Brad Campbell, Branden Ghena, Philip Levis, Amit Levy, Pat Pannuto
Abstract:
Tock began 10 years ago as a research operating system developed by academics to help other academics build urban sensing applications. By leveraging a new language (Rust) and new hardware protection mechanisms, Tock enabled Multiprogramming a 64 kB Computer Safely and Efficiently. Today, it is an open source project with a vibrant community of users and contributors. It is deployed on root of trust hardware in data center servers and on millions of laptops; it is used to develop automotive and space products, wearable electronics, and hardware security tokens--all while remaining a platform for operating systems research. This paper focuses on the impact of Tock's technical design on its adoption, the challenges and unexpected benefits of using a type safe language (Rust)--particularly in security sensitive settings--and the experience of supporting a production open4source operating system from academia.
Authors:Mugurel Barcau, Cristian Lupascu, Vicentiu Pasol, George C. Turcas
Abstract:
The present work investigates a type of morphisms between encryption schemes, called bridges. By associating an encryption scheme to every such bridge, we define and examine their security. Inspired by the bootstrapping procedure used by Gentry to produce fully homomorphic encryption schemes, we exhibit a general recipe for the construction of bridges. Our main theorem asserts that the security of a bridge reduces to the security of the first encryption scheme together with a technical additional assumption.
Authors:Yanming Mu, Hao Hu, Feiyang Li, Qiao Yuan, Jiang Wu, Zichuan Liu, Pengcheng Liu, Mei Wang, Hongwei Zhou, Yuling Liu
Abstract:
Retrieval-Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by incorporating external knowledge bases. However, the multi-module architecture of RAG introduces complex system-level security vulnerabilities. Guided by the RAG workflow, this paper analyzes the underlying vulnerability mechanisms and systematically categorizes core threat vectors such as data poisoning, adversarial attacks, and membership inference attacks. Based on this threat assessment, we construct a taxonomy of RAG defense technologies from a dual perspective encompassing both input and output stages. The input-side analysis reviews data protection mechanisms including dynamic access control, homomorphic encryption retrieval, and adversarial pre-filtering. The output-side examination summarizes advanced leakage prevention techniques such as federated learning isolation, differential privacy perturbation, and lightweight data sanitization. To establish a unified benchmark for future experimental design, we consolidate authoritative test datasets, security standards, and evaluation frameworks. To the best of our knowledge, this paper presents the first end-to-end survey dedicated to the security of RAG systems. Distinct from existing literature that isolates specific vulnerabilities, we systematically map the entire pipeline-providing a unified analysis of threat models, defense mechanisms, and evaluation benchmarks. By enabling deep insights into potential risks, this work seeks to foster the development of highly robust and trustworthy next-generation RAG systems.
Authors:Nivedita Singh, Seyoung Jin, Hyoungshick Kim
Abstract:
To comply with data protection regulations such as the EU General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), websites widely deploy cookie consent banners to collect users' privacy preferences. In practice, however, these interfaces often embed dark patterns that undermine informed and freely given consent. As regulatory scrutiny increases, such patterns have not disappeared but have evolved into subtler and more legally ambiguous forms, making existing detection approaches outdated. We present UMBRA, a consent management platform (CMP)-agnostic system that detects both previously studied patterns (DP1-DP10) and nine newly evolved patterns (DP11-DP19) targeting information disclosure, consent revocation, and legal ambiguity, including pay-to-opt-out schemes, revocation barriers, and fake opt-outs. UMBRA combines text analysis, visual heuristics, interaction tracing, and cookie-state monitoring to capture multi-step consent flows missed by prior tools. We evaluate UMBRA on a manually annotated ground-truth dataset and achieve 99% detection accuracy. We further conduct a large-scale compliance-oriented measurement across 14,000 websites spanning the EU, the US, and top-ranked global domains. Our results show that evolved dark patterns are pervasive: revocation is often obstructed, cookies are set before consent or despite explicit rejection, and opt-out interfaces often fail to prevent third-party tracking. On sites with revocation barriers, cookies increase by 25% on average, and many use insecure attributes that increase exposure to attacks such as XSS and CSRF. Overall, our findings provide evidence of systematic non-compliance and show how evolving consent manipulation erodes user autonomy while amplifying privacy and security risks.
Authors:Guang Yang, Ziye Geng, Yihang Chen, Changqing Luo
Abstract:
Adversarial-example-based fingerprinting approaches, which leverage the decision boundary characteristics of deep neural networks (DNNs) to craft fingerprints, have proven effective for model ownership protection. However, a fundamental challenge remains unresolved: how far a fingerprint should be placed from the decision boundary to simultaneously satisfy two essential properties, i.e., robustness and uniqueness, for effective and reliable ownership protection. Despite the importance of the fingerprint-to-boundary distance, existing works lack a theoretical solution and instead rely on empirical heuristics, which may violate either robustness or uniqueness properties. We propose AnaFP, an analytical fingerprinting scheme that constructs fingerprints under theoretical guidance. Specifically, we formulate fingerprint generation as controlling the fingerprint-to-boundary distance through a tunable stretch factor. To ensure both robustness and uniqueness, we mathematically formalize these properties that determine the lower and upper bounds of the stretch factor. These bounds jointly define an admissible interval within which the stretch factor must lie, thereby establishing a theoretical connection between the two constraints and the fingerprint-to-boundary distance. To enable practical fingerprint generation, we approximate the original (infinite) sets of pirated and independently trained models using two finite surrogate model pools and employ a quantile-based relaxation strategy to relax the derived bounds. Due to the circular dependency between the lower bound and the stretch factor, we apply grid search over the admissible interval to determine the most feasible stretch factor. Extensive experimental results show that AnaFP consistently outperforms prior methods, achieving effective ownership verification across diverse model architectures and model modification attacks.
Authors:Cuidi Wei, Shaoyu Tu, Daiki Hata, Toru Hasegawa, Yuki Koizumi, K. K. Ramakrishnan, Junji Takemasa, Timothy Wood
Abstract:
Our analysis of recent Internet traces shows that up to 71% of flows contain suspicious behaviors indicative of low-volume network attacks such as port scans. However, distinguishing anomalous traffic in real time is challenging as each attack flow may comprise only a few packets. We extend prior work that tracks heavy hitter flows to also detect low-volume and slow attacks by combining the capabilities of both switches and SmartNICs. We flip the usual design approach by proposing an efficient filter data structure used to quickly route traffic marked as benign towards destination end-systems. We make careful use of limited programmable switch memory and pipeline stages, and complement them with SmartNIC resources to analyze the remaining traffic that may be anomalous. Using machine learning classifiers and intrusion detection rules deployed on the SmartNIC, we identify malicious source IPs, which then undergo more detailed forensics for attack mitigation. Finally, we develop a dataplane based protocol to rapidly coordinate data structure updates between these devices. We implement immUNITY in a testbed with Tofino v1 switch and Bluefield 3 SmartNIC, demonstrating its high accuracy, while minimizing traffic that's analyzed outside the switch.
Authors:Melwin Xavier, Vaisakh M A, Melveena Jolly, Midhun Xavier
Abstract:
Agent frameworks increasingly encode tool-using behavior as explicit workflow graphs, yet safety enforcement remains a runtime concern. These frameworks expose analyzable graph structure through their APIs, enabling pre-deployment static verification of safety properties that runtime guardrails can only check reactively. This paper presents Agentproof, a system that automatically extracts a unified abstract graph model from four major agent frameworks (LangGraph, CrewAI, AutoGen, Google ADK), applies six structural checks with witness trace generation, and evaluates temporal safety policies via a DSL compiled to deterministic finite automata, both statically through a graph x DFA product construction and at runtime over event traces. Unlike general-purpose model checkers, Agentproof requires no manual modeling. In a curated benchmark of 18 author-constructed workflows, 27% of the benchmark contain structural defects (dead-end nodes, unreachable exits) and 55% violate a human-gate policy when enforced, distinct categories that prior work conflates. All 15 temporal policies defined fit within the seven-form DSL fragment, and verification completes in sub-second time for graphs up to 5,000 nodes. The corpus serves as a reproducible benchmark for evaluating static verification tools rather than as a prevalence study; defect rates reflect tool detection capability on a targeted benchmark, not base rates in production systems. Nonetheless, static graph verification complements runtime guardrails by catching topology-level defects that runtime tools miss unless the offending path is exercised.
Authors:Bhuvaneswari A, Kamalika Bhattacharjee
Abstract:
An equidistribution is a theoretical quality criteria that measures the uniformity of a linear pseudo-random number generator (PRNG). In this work, we first show that all existing linear cellular automaton (CA) based pseudo-random number generators (PRNGs) are weak in the equidistribution characteristic. Then we propose a list of light-weight combined CA-based PRNGs with time spacing ($2 \leq s \leq 10$) using linear maximal length cellular automata of degree $31 \leq k \leq 128$ (close to computer word size). We show that these PRNGs achieve maximal period as well as satisfy the maximal equidistribution property. Finally, we show that these combined maximal length CA-based PRNGs pass almost all the empirical testbeds, with speed and performance comparable to the Mersenne Twister.
Authors:Nilotpola Sarma, Vaishali Ghanshyam Chaudhuri, Chandan Karfa
Abstract:
Masking is a countermeasure against Power Side Channel Attacks (PSCAs) in both software and hardware implementations of cryptographic algorithms. Compared to software masking, implementing masked hardware is time consuming and error prone. Recent approaches, therefore, rely on High Level Synthesis (HLS) tools to automatically generate masked Register Transfer Level (RTL) hardware from verified masked software, significantly reducing design effort. Since HLS was never developed for security, HLS optimizations may impact PSCA security of the generated RTL. As a result, verifying the PSCA security of HLS generated masked RTL is crucial. Existing hardware masking verification tools can verify masked hardware, but may produce false positives when applied to HLS generated designs with controller datapath architectures obtained due to resource-shared datapath obtained via HLS. This work proposes a hardware masking verification strategy for HLS generated masked hardware. Our toolflow MaskedHLSVerif, performs state-wise formal verification of controller datapath RTL obtained via HLS, thereby avoiding false positives caused by resource-shared datapaths. Our tool flow correctly verifies standard cryptographic benchmarks, consisting of cascaded masked gadgets and the PRESENT S-box masked with gadgets, where existing tools like REBECCA reports false positives. The proposed tool-flow is able to detect masking flaws induced by HLS-optimizations as well.
Authors:Wenxuan Huang, Zhanbo Wang, Mingyu Li
Abstract:
Confidential databases (CDBs) are essential for enabling secure queries over sensitive data in untrusted cloud environments using confidential computing hardware. While adoption is growing, widespread deployment is hindered by high performance overhead from frequent synchronous cryptographic operations, which causes significant computational and memory bottlenecks. We present FEDB, a novel CDB design that removes cryptographic operations from the critical path. FEDB leverages crypto-free mappings, which maintain data-independent identifiers within the database while securely mapping them to plaintext secrets in a trusted domain. This paradigm shift reduces the runtime overhead by up to 78.0 times on industry-standard benchmarks including TPC-C and TPC-H.
Authors:Nikola Antonijević, Bernhard Etzlinger, Dave Singelée, Bart Preneel
Abstract:
Ranging and localisation have become critical for many applications and services. The Wi-Fi (IEEE 802.11) standard is a natural candidate for providing these functions across diverse environments, given its widespread deployment. The IEEE 802.11az amendment, finalised in 2023, introduces "Next Generation Positioning" mechanisms to secure and harden the existing insecure Wi-Fi Fine Timing Measurement (FTM) ranging solution. Moreover, the recent IEEE 802.11bk amendment increases the available bandwidth with the goal of approaching the centimetre-level ranging accuracy of ultra-wideband (UWB) systems. This paper examines to what extent these promises hold from a security and deployability perspective. We analyse the core mechanisms of secure Wi-Fi ranging as defined in IEEE 802.11az and IEEE 802.11bk at both the logical and physical layers, combining standards analysis with simulations and measurements on commercial and development hardware. At the logical layer, we show how common deployment choices can result in unauthenticated ranging, downgrade attacks, and simple denial-of-service attacks, making it difficult to securely realise many high-stakes use cases. At the physical layer, we study the predictability of secure ranging waveforms, the security impact of symbol repetition, and how waveform design choices affect compliance with spectral masks under realistic RF behaviour. Our results show that secure Wi-Fi ranging is highly sensitive to configuration choices and is non-trivial to implement on existing hardware. This is also evidenced by the currently limited support for secure Wi-Fi ranging in commodity devices. This paper provides practical guidelines for using secure FTM safely and recommendations to vendors and standardisation bodies to improve its robustness and deployability.
Authors:Mohammadhossein Homaei, Iman Khazrak, Rubén Molano, Andrés Caro, Mar Ávila
Abstract:
Industrial Cyber-Physical Systems (ICPS) face growing threats from cyber-attacks that exploit sensor and control vulnerabilities. Digital Twin (DT) technology can detect anomalies via predictive modelling, but current methods cannot distinguish attack types and often rely on costly full-system shutdowns. This paper presents i-SDT (intelligent Self-Defending DT), combining hydraulically-regularized predictive modelling, multi-class attack discrimination, and adaptive resilient control. Temporal Convolutional Networks (TCNs) with differentiable conservation constraints capture nominal dynamics and improve robustness to adversarial manipulations. A recurrent residual encoder with Maximum Mean Discrepancy (MMD) separates normal operation from single- and multi-stage attacks in latent space. When attacks are confirmed, Model Predictive Control (MPC) uses uncertainty-aware DT predictions to keep operations safe without shutdown. Evaluation on SWaT and WADI datasets shows major gains in detection accuracy, 44.1% fewer false alarms, and 56.3% lower operational costs in simulation-in-the-loop evaluation. with sub-second inference latency confirming real-time feasibility on plant-level workstations, i-SDT advances autonomous cyber-physical defense while maintaining operational resilience.
Authors:Tomoki Ono, Suthee Ruangwises
Abstract:
Card-based cryptography uses physical playing cards to construct protocols for secure multi-party computation. Existing card-based protocols employ various types of shuffles, some of which are easy to implement in practice while others are considerably more complex. In this paper, we classify shuffle operations into several levels according to their implementation complexity. We motivate this hierarchy from both practical and theoretical perspectives, and prove separation results between several levels by showing that certain shuffles cannot be realized using only operations from lower levels. Finally, we propose a new complexity measure for evaluating card-based protocols based on this hierarchy.
Authors:Saket Sanjeev Chaturvedi, Joshua Bergerson, Tanwi Mallick
Abstract:
As large language models (LLMs) evolve into autonomous "AI scientists," they promise transformative advances but introduce novel vulnerabilities, from potential "biosafety risks" to "dangerous explosions." Ensuring trustworthy deployment in science requires a new paradigm centered on reliability (ensuring factual accuracy and reproducibility), safety (preventing unintentional physical or biological harm), and security (preventing malicious misuse). Existing general-purpose safety benchmarks are poorly suited for this purpose, suffering from a fundamental domain mismatch, limited threat coverage of science-specific vectors, and benchmark overfitting, which create a critical gap in vulnerability evaluation for scientific applications. This paper examines the unique security and safety landscape of LLM agents in science. We begin by synthesizing a detailed taxonomy of LLM threats contextualized for scientific research, to better understand the unique risks associated with LLMs in science. Next, we conceptualize a mechanism to address the evaluation gap by utilizing dedicated multi-agent systems for the automated generation of domain-specific adversarial security benchmarks. Based on our analysis, we outline how existing safety methods can be brought together and integrated into a conceptual multilayered defense framework designed to combine a red-teaming exercise and external boundary controls with a proactive internal Safety LLM Agent. Together, these conceptual elements provide a necessary structure for defining, evaluating, and creating comprehensive defense strategies for trustworthy LLM agent deployment in scientific disciplines.
Authors:Aadi Joshi, Kavya Bhand
Abstract:
Digital image steganography requires a careful trade-off among payload capacity, visual fidelity, and statistical undetectability. Fixed-depth least significant bit embedding remains attractive because of its simplicity and high capacity, but it modifies smooth and textured regions uniformly, thereby increasing distortion and detectability in statistically sensitive areas. This paper presents an adaptive steganographic framework that combines a Mamdanitype fuzzy inference system with modern authenticated encryption. The proposed method determines a pixel-wise embedding depth from 1 to 3 bits using local entropy, edge magnitude, and payload pressure as linguistic inputs. To preserve encoder-decoder synchronization, the same feature maps are computed from lower-bit-stripped images, making the adaptive control mechanism invariant to the least significant modifications introduced during embedding. A cryptographic layer based on Argon2id and AES-256-GCM protects payload confidentiality and integrity independently of steganographic concealment.
Authors:Tsuyoshi Miezaki, Yusaku Nishimura, Katsuyuki Takashima
Abstract:
To analyze the security of code-based cryptosystems, the smoothing parameter, which is closely related to the total variation distance of codes, has been investigated. While previous studies have bounded this distance using the Fourier transform on locally compact abelian groups, we take an alternative approach based on random walks. In this paper, we derive an inequality for the total variation distance of random walks using equitable partitions, and we show that our proposed bound generalizes existing results for finite abelian groups.
Authors:Iakovos-Christos Zarkadis, Christos Douligeris
Abstract:
Supervised detection of network attacks has always been a critical part of network intrusion detection systems (NIDS). Nowadays, in a pivotal time for artificial intelligence (AI), with even more sophisticated attacks that utilize advanced techniques, such as generative artificial intelligence (GenAI) and reinforcement learning, it has become a vital component if we wish to protect our personal data, which are scattered across the web. In this paper, we address two tasks, in the first unified multi-modal NIDS dataset, which incorporates flow-level data, packet payload information and temporal contextual features, from the reprocessed CIC-IDS-2017, CIC-IoT-2023, UNSW-NB15 and CIC-DDoS-2019, with the same feature space. In the first task we use machine learning (ML) algorithms, with stratified cross validation, in order to prevent network attacks, with stability and reliability. In the second task we use adversarial learning algorithms to generate synthetic data, compare them with the real ones and evaluate their fidelity, utility and privacy using the SDV framework, f-divergences, distinguishability and non-parametric statistical tests. The findings provide stable ML models for intrusion detection and generative models with high fidelity and utility, by combining the Synthetic Data Vault framework, the TRTS and TSTR tests, with non-parametric statistical tests and f-divergence measures.
Authors:Philipp Normann, Andreas Happe, Jürgen Cito, Daniel Arp
Abstract:
LLM agents are increasingly relevant to research domains such as vulnerability discovery. Yet, the strongest systems remain closed and cloud-only, making them resource-intensive, difficult to reproduce, and unsuitable for work involving proprietary code or sensitive data. Consequently, there is an urgent need for small, local models that can perform security tasks under strict resource budgets, but methods for developing them remain underexplored. In this paper, we address this gap by proposing a two-stage post-training pipeline. We focus on the problem of Linux privilege escalation, where success is automatically verifiable and the task requires multi-step interactive reasoning. Using an experimental setup that prevents data leakage, we post-train a 4B model in two stages: supervised fine-tuning on traces from procedurally generated privilege-escalation environments, followed by reinforcement learning with verifiable rewards. On a held-out benchmark of 12 Linux privilege-escalation scenarios, supervised fine-tuning alone more than doubles the baseline success rate at 20 rounds, and reinforcement learning further lifts our resulting model, PrivEsc-LLM, to 95.8%, nearly matching Claude Opus 4.6 at 97.5%. At the same time, the expected inference cost per successful escalation is reduced by over 100x.
Authors:Abhijeet Sahu, Shuva Paul, Richard Macwan
Abstract:
Cyber deception assists in increasing the attacker's budget in reconnaissance or any early phases of threat intrusions. In the past, numerous methods of cyber deception have been adopted, such as IP address randomization, the creation of honeypots and honeynets mimicking an actual set of services, and networks deployed within an enterprise or operational technology(OT) network. These types of strategies follow naive approaches of recreating services that are expensive and that need a lot of human intervention. The advent of cloud services and other automations of containerized applications, such as Kubernetes, makes cyber defense easier. Yet, there remains a lot of potential to improve the accuracy of these deception strategies and to make them cost-effective using artificial intelligence (AI)-based solutions by making the deception more dynamic. Hence, in this work, we review various AI-based solutions in building network- and device-level cyber deception methods in contested environments. Specifically, we focus on leveraging the fusion of large language models (LLMs) and reinforcement learning(RL) in optimally learning these cyber deception strategies and validating the efficacy of such strategies in some stealthy attacks against OT systems in the literature.
Authors:Kushankur Ghosh, Mehar Klair, Kian Kyars, Euijin Choo, Jörg Sander
Abstract:
Provenance graphs model causal system-level interactions from logs, enabling anomaly detectors to learn normal behavior and detect deviations as attacks. However, existing approaches rely on brittle, manually engineered rules to build provenance graphs, lack functional context for system entities, and provide limited support for analyst investigation. We present Auto-Prov, an adaptive, end-to-end framework that leverages large language models (LLMs) to automatically construct provenance graphs from heterogeneous and evolving logs, embed system-level functional attributes into the graph, enable provenance graph-based anomaly detectors to learn from these enriched graphs, and summarize the detected attacks to assist an analyst's investigation. Auto-Prov clusters unseen log types and efficiently extracts provenance edges and entity-level information via automatically generated rules. It further infers system-level functional context for both known and previously unseen system entities using a combination of LLM inference and behavior-based estimation. Attacks detected by provenance-graph-based anomaly detectors trained on Auto-Prov's graphs are then summarized into natural-language text. We evaluate Auto-Prov with four state-of-the-art provenance graph-based detectors across diverse logs. Results show that Auto-Prov consistently enhances detection performance, generalizes across heterogeneous log formats, and produces stable, interpretable attack summaries that remain robust under system evolution.
Authors:Deng Liu, Song Chen
Abstract:
Hardware faults, specifically bit-flips in quantized weights, pose a severe reliability threat to Large Language Models (LLMs), often triggering catastrophic model collapses. We demonstrate that this vulnerability fundamentally stems from the spatial alignment between sensitive weight bits and extreme activation outliers, which causes a single hardware fault to be massively amplified. To address this, we propose Rotated Robustness (RoR), a training-free defense utilizing orthogonal Householder transformations. By applying an orthogonal rotation to the activation space, RoR geometrically smooths extreme outliers across all feature dimensions. This mechanism effectively breaks the alignment between outliers and vulnerable weights, mathematically guaranteeing original model accuracy. Extensive empirical evaluations across Llama-2/3, OPT, and Qwen families demonstrate the superior reliability of our approach. Under random bit-flip attacks, RoR reduces the stochastic collapse rate from 3.15\% to 0.00\% on Qwen2.5-7B. Furthermore, under severe targeted attacks with 50 Progressive Bit Search flips, RoR sustains robust reasoning on Llama-2-7B, maintaining a 43.9\% MMLU accuracy that nearly matches its 45.2\% unattacked accuracy, while competing defenses collapse to random guessing. Most notably, against the Single-Point Fault Attack (SPFA) -- the most aggressive targeted threat -- RoR exponentially inflates the attack complexity from a few bits to over 17,000 precise bit-flips. With a negligible storage overhead of 0.31\% and a minimal inference latency increase of 9.1\% on Llama-2-7B, RoR achieves true lossless robustness, providing a practical and highly reliable defense for LLM deployment.
Authors:Kosuke Higuchi, Ryotaro Kobayashi
Abstract:
Ransomware continues encrypting files during the delay between attack onset and detection. ROFBS mitigates this problem by backing up pre-modification files in real time upon file-open events. However, because the Linux file-open path traverses multiple kernel functions, it remains unclear how the choice of hook point affects defense effectiveness. In this study, we kept the ROFBS mechanism fixed and changed only the hook points on the Linux file-open path. We compared may_open, inode_permission, do_dentry_open, security_file_open, and xfs_file_open on AlmaLinux with XFS using three ransomware families: AvosLocker, Conti, and IceFire. We used Backup Ratio as the main metric and also compared the number of encrypted files with backups and the total number of encrypted files. The results showed that hook-point selection substantially affected both recoverability and damage scale. For AvosLocker, security_file_open achieved the highest Backup Ratio (82.5%). For Conti and IceFire, xfs_file_open achieved the highest values (100.0% and 63.2%, respectively). Moreover, xfs_file_open minimized the total number of encrypted files for all three ransomware families. These results indicate that, in ROFBS, the layer at which file-open events are observed is a key design factor. In particular, on XFS, hooking the filesystem-specific callback xfs_file_open may be advantageous when the goal is to reduce overall damage.
Authors:Hongju Li, Jian Ding, Fuyou Miao, Cheng Wang, Cheng Shu
Abstract:
Disjunctive Hierarchical Secret Sharing (DHSS)} scheme is a type of secret sharing scheme in which the set of all participants is partitioned into disjoint subsets, and each subset is said to be a level with different degrees of trust and different thresholds. In this work, we focus on the Chinese Remainder Theorem (CRT)-based DHSS schemes due to their ability to accommodate flexible share sizes. We point out that the ideal DHSS scheme of Yang et al. (ISIT, 2024) and the asymptotically ideal DHSS scheme of Tiplea et al. (IET Information Security, 2021) are insecure. Consequently, existing CRT-based DHSS schemes either exhibit security flaws or have an information rate less than $\frac{1}{2}$. To address these limitations, we propose a CRT-based asymptotically perfect DHSS scheme that supports flexible share sizes. Notably, our scheme is asymptotically ideal when all shares are equal in size. Its information rate achieves one and it has computational security.
Authors:Travis Dick, Matthew Joseph, Vinod Raman
Abstract:
We study several problems in differentially private domain discovery, where each user holds a subset of items from a shared but unknown domain, and the goal is to output an informative subset of items. For set union, we show that the simple baseline Weighted Gaussian Mechanism (WGM) has a near-optimal $\ell_1$ missing mass guarantee on Zipfian data as well as a distribution-free $\ell_\infty$ missing mass guarantee. We then apply the WGM as a domain-discovery precursor for existing known-domain algorithms for private top-$k$ and $k$-hitting set and obtain new utility guarantees for their unknown domain variants. Finally, experiments demonstrate that all of our WGM-based methods are competitive with or outperform existing baselines for all three problems.
Authors:Zijian Ling, Pingyi Hu, Xiuyong Gao, Xiaojing Ma, Man Zhou, Jun Feng, Songfeng Lu, Dongmei Zhang, Bin Benjamin Zhu
Abstract:
Speech-driven large language models (LLMs) are increasingly accessed through speech interfaces, introducing new security risks via open acoustic channels. We present Sirens' Whisper (SWhisper), the first practical framework for covert prompt-based attacks against speech-driven LLMs under realistic black-box conditions using commodity hardware. SWhisper enables robust, inaudible delivery of arbitrary target baseband audio-including long and structured prompts-on commodity devices by encoding it into near-ultrasound waveforms that demodulate faithfully after acoustic transmission and microphone nonlinearity. This is achieved through a simple yet effective approach to modeling nonlinear channel characteristics across devices and environments, combined with lightweight channel-inversion pre-compensation. Building on this high-fidelity covert channel, we design a voice-aware jailbreak generation method that ensures intelligibility, brevity, and transferability under speech-driven interfaces. Experiments across both commercial and open-source speech-driven LLMs demonstrate strong black-box effectiveness. On commercial models, SWhisper achieves up to 0.94 non-refusal (NR) and 0.925 specific-convincing (SC). A controlled user study further shows that the injected jailbreak audio is perceptually indistinguishable from background-only playback for human listeners. Although jailbreaks serve as a case study, the underlying covert acoustic channel enables a broader class of high-fidelity prompt-injection and commandexecution attacks.
Authors:Youting Wang, Yuan Tang, Yitian Qian, Chen Zhao
Abstract:
As Large Vision-Language Models (LVLMs) are increasingly deployed in agent-integrated workflows and other deployment-relevant settings, their robustness against semantic visual attacks remains under-evaluated -- alignment is typically tested on explicit harmful content rather than privacy-critical multimodal scenarios. We introduce VisualLeakBench, an evaluation suite to audit LVLMs against OCR Injection and Contextual PII Leakage using 1,000 synthetically generated adversarial images with 8 PII types, validated on 50 in-the-wild (IRL) real-world screenshots spanning diverse visual contexts. We evaluate four frontier systems (GPT-5.2, Claude~4, Gemini-3 Flash, Grok-4) with Wilson 95% confidence intervals. Claude~4 achieves the lowest OCR ASR (14.2%) but the highest PII ASR (74.4%), exhibiting a comply-then-warn pattern -- where verbatim data disclosure precedes any safety-oriented language. Grok-4 achieves the lowest PII ASR (20.4%). A defensive system prompt eliminates PII leakage for two models, reduces Claude~4's leakage from 74.4% to 2.2%, but has no effect on Gemini-3 Flash on synthetic data. Strikingly, IRL validation reveals Gemini-3 Flash does respond to mitigation on real-world images (50% to 0%), indicating that mitigation robustness is template-sensitive rather than uniformly absent. We release our dataset and code for reproducible robustness and safety evaluation of deployment-relevant vision-language systems.
Authors:Malcom Mohamed, Ghassan Karame
Abstract:
Cryptocurrency exchanges use proofs of liabilities (PoLs) to prove to their customers their liabilities committed on-chain, thereby enhancing their trust in the service. Unfortunately, a close examination of currently deployed and academic PoLs reveals significant shortcomings in their designs. For instance, existing schemes cannot resist realistic attack scenarios in which the provider colludes with an existing user. In this paper, we propose a new model, dubbed permissioned PoL, that addresses this gap by not requiring cooperation from users to detect a dishonest provider's potential misbehavior. At the core of our proposal lies a novel primitive, which we call Permissioned Vector Commitment (PVC), to ensure that a committed vector only contains values that users have explicitly signed. We provide an efficient PVC and PoL construction that carefully combines homomorphic properties of KZG commitments and BLS-based signatures. Our prototype implementation shows that, despite the stronger security, our proposal also improves server performance (by up to $10\times$) compared to prior PoLs.
Authors:Aojie Yuan, Zhiyuan Su, Yue Zhao
Abstract:
AI agents increasingly act through external tools: they query databases, execute shell commands, read and write files, and send network requests. Yet in most current agent stacks, model-generated tool calls are handed to the execution layer with no framework-agnostic control point in between. Post-execution observability can record these actions, but it cannot stop them before side effects occur. We present AEGIS, a pre-execution firewall and audit layer for AI agents. AEGIS interposes on the tool-execution path and applies a three-stage pipeline: (i) deep string extraction from tool arguments, (ii) content-first risk scanning, and (iii) composable policy validation. High-risk calls can be held for human approval, and all decisions are recorded in a tamper-evident audit trail based on Ed25519 signatures and SHA-256 hash chaining. In the current implementation, AEGIS supports 14 agent frameworks across Python, JavaScript, and Go with lightweight integration. On a curated suite of 48 attackinstances, AEGIS blocks all attacks in the suite before execution; on 500 benign tool calls, it yields a 1.2% false positive rate; and across 1,000 consecutive interceptions, it adds 8.3 ms median latency. The live demo will show end-to-end interception of benign, malicious, and human-escalated tool calls, allowing attendees to observe real-time blocking, approval workflows, and audit-trail generation. These results suggest that pre-execution mediation for AI agents can be practical, low-overhead, and directly deployable.
Authors:He Zhu, Yanshu Li, Wen Liu, Haitian Yang
Abstract:
Textual adversarial attacks pose a serious security threat to Natural Language Processing (NLP) systems by introducing imperceptible perturbations that mislead deep learning models. While adversarial example detection offers a lightweight alternative to robust training, existing methods typically rely on prior knowledge of attacks, white-box access to the victim model, or numerous queries, which severely limits their practical deployment. This paper introduces RTD-Guard, a novel black-box framework for detecting textual adversarial examples. Our key insight is that word-substitution perturbations in adversarial attacks closely resemble the "replaced tokens" that a Replaced Token Detection (RTD) discriminator is pre-trained to identify. Leveraging this, RTD-Guard employs an off-the-shelf RTD discriminator-without fine-tuning-to localize suspicious tokens, masks them, and detects adversarial examples by observing the prediction confidence shift of the victim model before and after intervention. The entire process requires no adversarial data, model tuning, or internal model access, and uses only two black-box queries. Comprehensive experiments on multiple benchmark datasets demonstrate that RTD-Guard effectively detects adversarial texts generated by diverse state-of-the-art attack methods. It surpasses existing detection baselines across multiple metrics, offering a highly efficient, practical, and resource-light defense mechanism-particularly suited for real-world deployment in resource-constrained or privacy-sensitive environments.
Authors:Emad Sherif, Iryna Yevseyeva, Vitor Basto-Fernandes, Allan Cook
Abstract:
The escalating frequency of cyber-attacks poses significant challenges for organisations, particularly small enterprises constrained by limited in-house expertise, insufficient knowledge, and financial resources. This research presents a novel framework that leverages Natural Language Processing to address these challenges through automated mapping of cyber incidents to adversary techniques. We introduce the Cyber Catalog, a knowledge base that systematically integrates CIS Critical Security Controls, MITRE ATT&CK techniques, and SMART metrics. This integrated resource enables organisations to connect threat intelligence directly to actionable controls and measurable outcomes. To operationalise the framework, we fine-tuned all-mpnet-base-v2, a highly regarded sentence-transformers model used to convert text into numerical vectors on an augmented dataset comprising 74,986 incident-technique pairs to enhance semantic similarity between cyber incidents and MITRE ATT&CK techniques. Our fine-tuned model achieved a Spearman correlation of 0.7894 and Pearson correlation of 0.8756, demonstrating substantial improvements over top baseline models including all-mpnet-base-v2, all-distilroberta-v1, and all-MiniLM-L12-v2. Furthermore, our model exhibited significantly lower prediction errors (MAE = 0.135, MSE = 0.027) compared to all baseline models, confirming superior accuracy and consistency. The Cyber Catalog, training dataset, trained model, and implementation code made publicly available to facilitate further research and enable practical deployment in resource-constrained environments. This work bridges the gap between threat intelligence and operational security management, providing an actionable tool for systematic cyber incident response and evidence-based cyber risk management.
Authors:Emad Sherif, Iryna Yevseyeva, Vitor Basto-Fernandes, Allan Cook
Abstract:
Organisations overwhelmingly prioritize vulnerability remediation using Common Vulnerability Scoring System (CVSS) severity scores, yet CVSS classifiers achieve an Area Under the Precision-Recall Curve (AUPRC) of 0.011 on real-world exploitation data, near random chance. We propose a composite Key Risk Indicator grounded in expected-loss decomposition, integrating dimensions of threat, impact, and exposure. We evaluated the KRI framework against the Known Exploited Vulnerabilities (KEV) catalog using a comprehensive dataset of 280,694 Common Vulnerabilities and Exposures (CVEs). KRI achieves Receiver Operating Characteristic Area Under the Curve (ROC-AUC) 0.927 and AUPRC 0.223 versus 0.747 and 0.011 for CVSS (24 percents, 20). Ablation analysis shows Exploit Prediction Scoring System (EPSS) alone achieves AUPRC 0.365, higher than full KRI (0.223), confirming that EPSS and KRI serve distinct objectives: EPSS maximizes raw exploit detection, while KRI re-orders by impact and exposure, capturing 92.3 percents of impact-weighted remediation value at k=500 versus 82.6 percents for EPSS, and surfacing 1.75 more Critical-severity exploited CVEs. KRI's net benefit exceeds EPSS whenever the severity premium exceeds 2. While EPSS serves as a robust baseline for exploit detection, the KRI framework is the superior choice for organizations seeking to align remediation efforts with tangible risk reduction.
Authors:Aakash Singh, Kuldeep Singh Yadav, Md Talib Hasan Ansari, V. Anil Kumar
Abstract:
The increasing adoption of server-side component-based web frameworks has introduced new application-layer attack surfaces that remain insufficiently understood at Internet scale. On 3 December 2025, a critical remote code execution vulnerability (CVE-2025-55182) in React Server Components, referred to as React2Shell, was publicly disclosed and subsequently observed being exploited in the wild. Despite its critical severity and a CVSS base score of 10.0, there is limited empirical understanding of how this vulnerability is exploited across the Internet. This paper presents the first Internet-scale measurement study of React2Shell exploitation activity using traffic collected from an Active Network Telescope. We developed a deterministic detection methodology that identifies exploitation attempts targeting endpoints implementing React Server components. It helped analyze exploitation traffic to characterize its temporal evolution, geographic and autonomous system-level distribution, and behavioral properties of the observed scanning activity. In addition, exploit payloads are examined to understand the attacker infrastructure and delivery mechanisms. The analysis reported rapid post-disclosure exploitation activity exhibiting patterns consistent with automated scanning campaigns, geographically distributed scanners, and concentrated backend infrastructure. To the best of our knowledge, this work provides the first quantitative characterization of React2Shell-triggered scanning activity, including the number of distinct scanners, their geographic and autonomous system distribution, and the scale of backend infrastructure involved in exploitation attempts.
Authors:Quentin Goux, Nadira Lammari
Abstract:
It is widely recognized that practical exercises are crucial for teaching cybersecurity in higher education. However, their setup is not only expensive, time-consuming, and prone to numerous errors, but also requires technical and programming skills to create attack contexts and scripts. To mitigate these drawbacks, this research work proposes an approach that automatically generates scripts and attack contexts based on informal attack scenario descriptions. To isolate business concerns from technological issues, our approach is aligned with the MDA development method. A formal language is proposed to express our Computation Independent model. We rely on the TOSCA standard to describe our Platform Independent Model. We also allow through our approach the generation of several Platform Specific Models. Hence, this research work contributes not only to the overall improvement of attack implementations for cybersecurity training but also to their reuse on various platforms.
Authors:Fatemeh Shoaei, Mohammad Pishdar, Mozafar Bag-Mohammadi, Mojtaba Karami
Abstract:
Smart contract-based ecosystems enable decentralized applications without trusted intermediaries, but their immutability and permissionless design also facilitate large-scale fraud. One of the most prevalent attacks is the rug pull, where project operators abruptly withdraw liquidity after artificially inflating token value. Existing detection methods primarily rely on reactive on-chain signals and often suffer from temporal data leakage, limiting their real-world reliability. This paper proposes a leakage-aware framework for early rug-pull detection that integrates on-chain behavioral metrics with temporally aligned Open Source Intelligence (OSINT) signals. We construct a hand-labeled dataset of 1,000 token projects, spanning DeFi and non-DeFi settings, with all features extracted strictly prior to any liquidity withdrawal to preserve causal validity. The dataset combines structural on-chain indicators with external attention signals derived from social media activity and search trends. Within this framework, TabPFN is employed as a core modeling component for learning from multimodal tabular data under strict temporal constraints. Experimental results show that the proposed framework achieves strong discriminative performance and improved probability calibration compared to classical baselines, while maintaining low false-negative rates. By framing rug-pull detection as a causal, multimodal forecasting problem, this work emphasizes the necessity of leakage-resilient evaluation and calibrated risk estimation for deployment in blockchain security systems.
Authors:Sengim Karayalcin, Marina Krcek, Pin-Yu Chen, Stjepan Picek
Abstract:
This paper investigates how Backdoor Attacks are represented within Vision Transformers (ViTs). By assuming knowledge of the trigger, we identify a specific ``trigger direction'' in the model's activations that corresponds to the internal representation of the trigger. We confirm the causal role of this linear direction by showing that interventions in both activation and parameter space consistently modulate the model's backdoor behavior across multiple datasets and attack types. Using this direction as a diagnostic tool, we trace how backdoor features are processed across layers. Our analysis reveals distinct qualitative differences: static-patch triggers follow a different internal logic than stealthy, distributed triggers. We further examine the link between backdoors and adversarial attacks, specifically testing whether PGD-based perturbations (de-)activate the identified trigger mechanism. Finally, we propose a data-free, weight-based detection scheme for stealthy-trigger attacks. Our findings show that mechanistic interpretability offers a robust framework for diagnosing and addressing security vulnerabilities in computer vision.
Authors:Muaan Ur Rehman, Hayretdin Bahsi, Rajesh Kalakoti
Abstract:
The expansion of Internet of Things (IoT) devices has increased the attack surface of networks, necessitating a robust and adaptive intrusion detection systems. Machine learning based systems have been considered promising in enhancing the detection performance. Federated learning settings enabled us to train models from network intrusion data collected from clients in a privacy preserving manner. However, the effectiveness of these systems can degrade over time due to concept drift, where patterns in data evolve as attackers develop new techniques. Realistic detection models should be non-stationary, so they can be continuously updated with new intrusion data while maintaining their detection capability for older data. As IoT environments are resource constrained, updates should consume minimal computational resources. This study provides a comprehensive performance analysis of incremental federated learning in enhancing the long term performance of non stationary IDS models in IoT networks. Specifically, we propose LSTM models within a federated learning setting to evaluate incremental learning approaches that utilize data and model-based measures against catastrophic learning under drift conditions. Using the CICIoMT2024 dataset, which includes various attack variants across five major categories, we conduct both binary and multiclass classification to provide a granular analysis of the intrusion detection task. Our results show that cumulative incremental learning and representative learning provide the most stable performance under drift, while retention-based methods offer a strong accuracy and latency trade off. The study offers new insights into the interplay between training strategy performance and latency in dynamic IoT environments, aiming to inform the development of more resilient IDS solutions considering the resource constraints in IoT devices.
Authors:Panagiotis Georgios Pennas, Konstantinos Papaioannou, Marco Guarnieri, Thaleia Dimitra Doudali
Abstract:
Large Language Models (LLMs) rely on optimizations like Automatic Prefix Caching (APC) to accelerate inference. APC works by reusing previously computed states for the beginning part of a request (prefix), when another request starts with the same text. While APC improves throughput, it introduces timing side channels: cache hits are faster than misses, creating observable latency differences. In multi-tenant systems, attackers can exploit these differences to infer sensitive information, e.g., by incrementally reconstructing another user's request by observing hit/miss patterns. Current defenses take a sledgehammer approach: they disable APC and cache sharing, isolating users, and sacrificing efficiency for regular users. This paper presents CacheSolidarity, a system that secures multi-tenant LLM serving systems against APC side channels without sacrificing performance and efficiency. CacheSolidarity monitors cache reuse across users, flags suspicious sharing, and selectively isolates prefixes, restricting their reuse only when necessary. Evaluation shows that CacheSolidarity enables up to 70% higher cache reuse and 30% lower inference latency compared to existing defenses that isolate users. CacheSolidarity's lightweight design demonstrates how security in LLM serving does not have to come at the cost of unnecessarily reduced performance or unbearable overheads.
Authors:YongPeng Yan, Yanan Li, Qiyang Xiao, Yanzhen Ren
Abstract:
This paper proposes PRoADS, a provably secure and robust audio steganographic framework based on audio diffusion models. As a generative steganography scheme, PRoADS embeds secret messages into the initial noise of diffusion models via orthogonal matrix projection. To address the reconstruction errors in diffusion inversion that cause high bit error rates (BER), we introduce Latent Optimization and Backward Euler Inversion to minimize the latent reconstruction and diffusion inversion errors. Comprehensive experiments demonstrate that our scheme sustains a remarkably low BER of 0.15\% under 64 kbps MP3 compression, significantly outperforming existing methods and exhibiting strong robustness.
Authors:Pratyay Kumar, Miguel Antonio Guirao Aguilera, Srikathyayani Srikanteswara, Satyajayant Misra, Abu Saleh Md Tayeen
Abstract:
Model Context Protocol (MCP) servers have rapidly emerged over the past year as a widely adopted way to enable Large Language Model (LLM) agents to access dynamic, real-world tools. As MCP servers proliferate and become easy to adopt via open-source releases, understanding their security risks becomes essential for dependable production agent deployments. Recent work has developed MCP threat taxonomies, proposed mitigations, and demonstrated practical attacks. However, to the best of our knowledge, no prior study has conducted a systematic, large-scale assessment of weaknesses in open-source MCP servers. Motivated by this gap, we apply static code analysis to identify Common Weakness Enumeration (CWE) weaknesses and map them to common attack patterns and threat categories using the MITRE Common Attack Pattern Enumerations and Classifications (CAPEC) to ground risk in real-world threats. We then introduce a risk-assessment framework for the MCP landscape that combines these threats using a multi-metric scoring of likelihood and impact. Our findings show that many open-source MCP servers contain exploitable weaknesses that can compromise confidentiality, integrity, and availability, underscoring the need for secure-by-design MCP server development.
Authors:Sizhe Huang, Shujie Yang
Abstract:
As backdoor attacks in UAV-based decentralized federated learning (DFL) grow increasingly stealthy and sophisticated, existing defenses have likewise escalated in complexity. Yet these defenses, which rely heavily on outlier detection, remain vulnerable to carefully crafted backdoors. In UAV-DFL, the lack of global coordination and limited resources further render outlier-based defenses impractical. Against this backdrop, gradient spectral analysis offers a promising alternative. While prior work primarily leverages low-frequency coefficients for pairwise comparisons, it neglects to analyze the intrinsic spectral characteristics of backdoor gradients. Through empirical analysis of existing stealthy attacks, we reveal a key insight: the more effort attackers invest in mimicking benign behaviors, the more distinct the spectral concentration becomes. Motivated by this, we propose Task-Aware Spectral Energy Refine (TASER) -- a decentralized defense framework. To our knowledge, this is the first efficient backdoor defense that utilizes spectral concentration instead of complex outlier detection, enabling mitigation of stealthy attacks by structurally disrupting the backdoor task. To suppress the backdoor task, TASER preserves main-task-relevant frequency coefficients and discards others. We provide theoretical guarantees and demonstrate through experiments that TASER remains effective against stealthy backdoor attacks that bypass outlier-based defenses, achieving attack success rate below 20% and accuracy loss under 5%.
Authors:Sizhe Huang, Shujie Yang
Abstract:
Self-supervised masked modeling shows promise for encrypted traffic classification by masking and reconstructing raw bytes. Yet recent work reveals these methods fail to reduce reliance on labeled data despite costly pretraining: under frozen encoder evaluation, accuracy drops from greater than 0.9 to less than 0.47. We argue the root cause is inductive bias mismatch: flattening traffic into byte sequences destroys protocol-defined semantics. We identify three specific issues: 1) field unpredictability, random fields like ip.id are unlearnable yet treated as reconstruction targets; 2) embedding confusion, semantically distinct fields collapse into a unified embedding space; 3) metadata loss, capture-time metadata essential for temporal analysis is discarded. To address this, we propose a protocol-native paradigm that treats protocol-defined field semantics as architectural priors, reformulating the task to align with the data's intrinsic tabular modality rather than incrementally adapting sequence-based architectures. Instantiating this paradigm, we introduce FlowSem-MAE, a tabular masked autoencoder built on Flow Semantic Units (FSUs). It features predictability-guided filtering that focuses on learnable FSUs, FSU-specific embeddings to preserve field boundaries, and dual-axis attention to capture intra-packet and temporal patterns. FlowSem-MAE significantly outperforms state-of-the-art across datasets. With only half labeled data, it outperforms most existing methods trained on full data.
Authors:Jialai Wang, Ya Wen, Zhongmou Liu, Yuxiao Wu, Bingyi He, Zongpeng Li, Ee-Chien Chang
Abstract:
Targeted bit-flip attacks (BFAs) exploit hardware faults to manipulate model parameters, posing a significant security threat. While prior work targets single-step inference models (e.g., image classifiers), LLM-based agents with multi-stage pipelines and external tools present new attack surfaces, which remain unexplored. This work introduces Flip-Agent, the first targeted BFA framework for LLM-based agents, manipulating both final outputs and tool invocations. Our experiments show that Flip-Agent significantly outperforms existing targeted BFAs on real-world agent tasks, revealing a critical vulnerability in LLM-based agent systems.
Authors:Godfrey Tan, Massimiliano Poletto, John Guttag, Frans Kaashoek
Abstract:
Role classification involves grouping hosts into related roles. It exposes the logical structure of a network, simplifies network management tasks such as policy checking and network segmentation, and can be used to improve the accuracy of network monitoring and analysis algorithms such as intrusion detection. This paper defines the role classification problem and introduces two practical algorithms that group hosts based on observed connection patterns while dealing with changes in these patterns over time. The algorithms have been implemented in a commercial network monitoring and analysis product for enterprise networks. Results from grouping two enterprise networks show that the number of groups identified by our algorithms can be two orders of magnitude smaller than the number of hosts and that the way our algorithms group hosts highly reflects the logical structure of the networks.
Authors:Seydina Ousmane Diallo, Maryline Laurent, Nesrine Kaaniche
Abstract:
Outsourcing encrypted data to the cloud creates a fundamental tension between data privacy and functional searchability. Current Searchable Symmetric Encryption (SSE) solutions frequently have significant limitations, such as excessive metadata leakage, or a lack of fine-grained access control. These issues restrict the scalability of secure searches in real-world applications where multiple clients require different levels of authorization. Our paper proposes MASSE, a dynamic multi-client SSE scheme incorporating attribute-based access control, which expands the OXT framework. With MASSE, clients are restricted sto searching for keywords authorized by their specific attribute sets, and the server remains unaware of the keywords and attributes. MASSE supports practical dynamic updates to documents, and client authorizations, including revocation, without requiring reencryption of the database or indices, or a large number of interactions. We formally prove the security of MASSE, that is, forward and backward privacy under a well-defined leakage profile, and token unforgeability. An experimental evaluation in a database containing 100 keywords, each associated with 150 documents, demonstrates the practical efficiency of MASSE. It takes less than two seconds to generate 10 to 100 keyword queries and 14 seconds to retrieve 50 matching documents. Theoretical results show that MASSE outperforms competing solutions, including OXT, and can be scaled to large encrypted databases. MASSE is also suitable for dynamic cloud deployments. Keywords: Searchable Encryption, SSE, Multi-Client, Attribute Based SSE, Access Control, Revocation, OXT
Authors:Yanan Li, Zixuan Wang, Qiyang Xiao, Yanzhen Ren
Abstract:
We propose a robust and provably secure image steganography framework based on latent-space iterative optimization. Within this framework, the receiver treats the transmitted image as a fixed reference and iteratively refines a latent variable to minimize the reconstruction error, thereby improving message extraction accuracy. Unlike prior methods, our approach preserves the provable security of the embedding while markedly enhancing robustness under various compression and image processing scenarios. On benchmark datasets, the experimental results demonstrate that the proposed iterative optimization not only improves robustness against image compression while preserving provable security, but can also be applied as an independent module to further reinforce robustness in other provably secure steganographic schemes. This highlights the practicality and promise of latent-space optimization for building reliable, robust, and secure steganographic systems.
Authors:Vamshi Krishna Thotempudi, Mahima Agarwal, Raghav Batta, Anjali Mangal
Abstract:
Enterprises increasingly rely on cloud-based applications to process highly sensitive data artifacts. Although cloud adoption improves agility and scalability, it also introduces new security challenges such as expanded attack surfaces, a wider radius of attack from credential compromise, and challenges maintaining strict access controls across users, services, and workflows. These challenges are especially acute for applications that handle privileged data and execute security-critical analysis, where traditional trust boundaries and ad hoc safeguards are insufficient. This paper presents Lockbox; a Zero Trust architecture designed for secure processing of sensitive cloud workloads under strict enterprise security and governance requirements. Lockbox applies explicit trust verification, strong isolation, least-privilege access, and policy-driven enforcement throughout the entire application lifecycle, from user authentication and document ingestion to analysis execution and result storage. The system incorporates modern cloud security primitives including; role-based access control, centralized key management, encryption in transit and at rest, and controlled integration with cloud-based data processing services, ensuring that sensitive artifacts remain protected and accessible only to authorized users. We discuss the usage of Lockbox in processing highly sensitive cybersecurity reports and demonstrate how this architecture enables organizations to safely adopt advanced capabilities, including AI-assisted processing, without weakening their security posture.
Authors:Pratyay Kumar, Abu Saleh Md Tayeen, Satyajayant Misra, Huiping Cao, Jiefei Liu, Qixu Gong, Jayashree Harikumar
Abstract:
Deep learning (DL)-based Network Intrusion Detection System (NIDS) has demonstrated great promise in detecting malicious network traffic. However, they face significant security risks due to their vulnerability to adversarial examples (AEs). Most existing adversarial attacks maliciously perturb data to maximize misclassification errors. Among AEs, natural adversarial examples (NAEs) are particularly difficult to detect because they closely resemble real data, making them challenging for both humans and machine learning models to distinguish from legitimate inputs. Creating NAEs is crucial for testing and strengthening NIDS defenses. This paper proposes NetDiffuser1, a novel framework for generating NAEs capable of deceiving NIDS. NetDiffuser consists of two novel components. First, a new feature categorization algorithm is designed to identify relatively independent features in network traffic. Perturbing these features minimizes changes while preserving network flow validity. The second component is a novel application of diffusion models to inject semantically consistent perturbations for generating NAEs. NetDiffuser performance was extensively evaluated using three benchmark NIDS datasets across various model architectures and state-of-the-art adversarial detectors. Our experimental results show that NetDiffuser achieves up to a 29.93% higher attack success rate and reduces AE detection performance by at least 0.267 (in some cases up to 0.534) in the Area under the Receiver Operating Characteristic Curve (AUC-ROC) score compared to the baseline attacks.
Authors:Shaun Feakins, Ibrahim Habli, Phillip Morgan
Abstract:
This paper contributes to the nascent debate around safety cases for frontier AI systems. Safety cases are structured, defensible arguments that a system is acceptably safe to deploy in a given context. Historically, they have been used in safety-critical industries, such as aerospace, nuclear or automotive. As a result, safety cases for frontier AI have risen in prominence, both in the safety policies of leading frontier developers and in international research agendas proposed by leaders in generative AI, such as the Singapore Consensus on Global AI Safety Research Priorities and the International AI Safety Report. This paper appraises this work. We note that research conducted within the alignment community which draws explicitly on lessons from the assurance community has significant limitations. We therefore aim to rethink existing approaches to alignment safety cases. We offer lessons from existing methodologies within safety assurance and outline the limitations involved in the alignment community's current approach. Building on this foundation, we present a case study for a safety case focused on Deceptive Alignment and CBRN capabilities, drawing on existing, theoretical safety case "sketches" created by the alignment safety case community. Overall, we contribute holistic insights from the field of safety assurance via rigorous theory and methodologies that have been applied in safety-critical contexts. We do so in order to create a better foundational framework for robust, defensible and useful safety case methodologies which can help to assure the safety of frontier AI systems.
Authors:Mohammadsaber Bahadori, Seyed Pooya Shariatpanahi, Behnam Bahrak
Abstract:
We study the problem of coded caching with nonuniform file popularity under the setting where the popularity distribution is initially unknown. By reframing the problem, we propose a method inspired by an algorithm from the recommender-systems literature and multi-armed bandits. Unlike prior approaches, which focus on accurately estimating file popularities, our method ranks files relative to one another and partitions them into groups. This perspective is more consistent with the structure of prior approaches as well, since earlier methods also divided files into popular and non-popular groups after estimating their popularities. The proposed approach relies on differences in request counts between files as the basis for ranking, and under many conditions it outperforms the previous algorithm. In particular, we obtain significantly improved performance in scenarios where the number of users in the network is small, the cache storage capacity is limited, or the learning process of the true popularity of files based on observations is contaminated by exploratory or synthetic requests that do not match the true popularity distribution. In these cases, our policy achieves markedly better performance and attains sublinear regret.
Authors:Nasif Muslim, Jean-Charles Grégoire
Abstract:
Consent-Based Access Control (CBAC) is a foundational mechanism for enforcing patient autonomy in modern healthcare information systems. Many CBAC frameworks are built on the eXtensible Access Control Markup Language (XACML) and inherit its \emph{lazy evaluation} model, in which policy interactions are resolved only at request time. This design allows contradictory consent directives to accumulate within the repository, creating a semantic gap between patient intent and system behavior while burdening high-frequency runtime decisions with complex conflict resolution. This paper presents an extended CBAC framework that enforces semantic correctness at consent creation time rather than during access evaluation. The framework introduces a pre-commit validation workflow centered on a Consent Conflict Analysis Module (CCAM), which proactively detects modality conflicts and redundancies before directives become active. In addition, immutable system invariants are formalized to guarantee baseline access for record authors and patients, preserving clinical continuity and professional accountability. Finally, the framework incorporates a context-aware emergency mediation mechanism that enables controlled \emph{break-the-glass} access driven by real-time physiological evidence, with disclosure strictly bounded by an Emergency Disclosure Control Function (EDCF). Simulation-based evaluation using controlled synthetic workloads demonstrates that pre-commit conflict resolution yields low and stable runtime decision latency and consistently outperforms standard XACML-based baselines as policy repositories scale. Emergency access experiments further demonstrate strong restrictions on data access, pruning the majority of non-relevant record elements while preserving clinically essential information.
Authors:Nasif Muslim, Jean-Charles Grégoire
Abstract:
Effective healthcare delivery depends on accurate longitudinal health records and addressing patients' concerns regarding the privacy of their information. While patient authentication is essential, reusing patient identifiers exposes individuals to linkability (associating multiple visits) and traceability (tying visits to real-world identities) risks. This paper presents a privacy-preserving, patient-centric identity management framework specifically tailored to the operational and regulatory requirements of healthcare. The framework balances operational reliability with strong privacy protections through a rooted trust anchor, anonymous pseudonyms, and a conditional traceability mechanism. It is formally specified, and its security and privacy properties are evaluated through MSRA-based architectural analysis and complementary formal verification. Simulation-based evaluation demonstrates that the framework's identity workflows are operationally feasible within the latency bounds typical of clinical environments.
Authors:Kaustav Goswami, Sean Peisert, Venkatesh Akella, Jason Lowe-Power
Abstract:
Memory disaggregation via Compute Express Link (CXL) enables multiple hosts to share remote memory, improving utilization for data-intensive workloads. Today, virtual memory enables process-level isolation on a host and CXL enables host-level isolation. This creates a critical security gap: the absence of process-level memory isolation in shared disaggregated memory. We present Space-Control, a hardware-software co-design that provides fine-grained, process-level isolation for shared disaggregated memory. Space-Control authenticates execution context in the hardware and enforces access control on every memory access and amortizes lookup times with a small cache. Our design allows up to 127 processes Simulation Toolkit (SST) based CXL model, Space-Control incurs minimal performance overhead of 3.3%, making shared disaggregated memory isolation practical.
Authors:Giorgio Grigolo, Dorian Schiffer, Lukas Gerster, Martin Ringbauer, Paul Erker
Abstract:
Analogously to classical computers, quantum processors exhibit side channels that may give attackers access to potentially proprietary algorithms. We identify and exploit a previously unexplored side channel in trapped-ion quantum processors that arises from the radio-frequency (RF) signals used to modulate lasers for ion cooling, gate execution, and readout. In these quantum processors, acousto-optical modulators (AOMs) imprint phase and frequency modulations onto laser fields interacting with the ions to implement individual and collective unitaries. The AOMs are driven by strong RF signals, a fraction of which leaks out of the device. We discuss general strategies to exploit this side channel and demonstrate how to detect RF leakage from a state-of-the-art qudit-based quantum processor using off-the-shelf components. From this data, we extract pulse characteristics of single-ion and entangling gates, thereby implementing a proof-of-principle exploitation of the novel attack vector. Finally, we outline ways to mitigate the information leakage through the presented side channel.
Authors:Fatemeh Heidari Soureshjani, Jan Gorzny
Abstract:
Cross-chain token standards enable fungible tokens that exist across multiple blockchains with a unified total supply model. This paper presents a comprehensive comparative analysis of five leading cross-chain token standards and frameworks: the xERC20 standard (implementing ERC-7281), the Omnichain Fungible Token (OFT) standard, the Native Token Transfers (NTT) framework, the Cross-Chain Token (CCT) standard, and the SuperchainERC20 standard (implementing ERC-7802). We examine each standard's distinguishing properties and technical design, including architecture, message-passing mechanisms, interoperability scope, chain compatibility, and security features. Our analysis reveals that while all these standards share the goal of seamless cross-chain fungibility, they differ significantly in implementation approach, trust model, and target ecosystem.
Authors:Will Thomas, Logan Schmalz, Adam Petz, Perry Alexander, Joshua D. Guttman, Paul D. Rowe, James Carter
Abstract:
Attestation means providing evidence that a remote target system is worthy of trust for some sensitive interaction. Although attestation is already used in network access control, security management, and trusted execution environments, it mainly concerns only a few system components. A clever adversary might manipulate these shallow attestations to mislead the relying party. Reliable attestations require layering. We construct attestations whose layers report evidence about successive components of the target system. Reliability also requires structuring the target system so only a limited set of components matters. We show how to structure an example system for reliable attestations despite a well-defined, relatively strong adversary. It is based on widely available hardware, such as Trusted Platform Modules, and software, such as Linux with SELinux. We isolate our principles in a few maxims that guide system development. We provide a cogent analysis of our mechanisms against our adversary model, as well as an empirical appraisal of the resulting system. We also identify two improvements to the mechanisms so attestation can succeed against strengthened adversaries. The performance burden of our attestation is negligible, circa 1.3 percent. After our first example, we vary our application level, and then also its underlying hardware anchor to use confidential computing with AMD's SEV-SNP. The same maxims help us achieve trustworthy attestations.
Authors:Mahafujul Alam, Julie B. Heynssens, Bertrand Francis Cambou
Abstract:
In the current information age, asymmetrical cryptography is widely used to protect information and financial transactions such as cryptocurrencies. The loss of private keys can have catastrophic consequences; therefore, effective MFA schemes are needed. In this paper, we focus on generating ephemeral keys to protect private keys. We propose a novel bit-truncation method in which the most significant bits (MSBs) of response values derived from facial features in a template-less biometric scheme are removed, significantly improving both accuracy and security. A statistical analysis is presented to optimize an MFA comprising at least three factors: template-less biometrics, an SRAM PUF-based token, and passwords. The results show a reduction in both false-reject and false-acceptance rates, and the generation of error-free ephemeral keys.
Authors:Natalia Krawczyk, Mateusz Szczepkowski, Adrian Brodzik, Krzysztof Bocianiak
Abstract:
As artificial intelligence (AI) becomes deeply embedded in critical services and everyday products, it is increasingly exposed to security threats which traditional cyber defenses were not designed to handle. In this paper, we investigate how cyber threat intelligence (CTI) may evolve to address attacks that target AI systems. We first analyze the assumptions and workflows of conventional threat intelligence with the needs of AI-focused defense, highlighting AI-specific assets and vulnerabilities. We then review and organize the current landscape of AI security knowledge. Based on this, we outline what an AI-oriented threat intelligence knowledge base should contain, describing concrete indicators of compromise (IoC) for different AI supply-chain phases and artifacts, and showing how such a knowledge base could support security tools. Finally, we discuss techniques for measuring similarity between collected indicators and newly observed AI artifacts. The review reveals gaps and quality issues in existing resources and identifies potential future research directions toward a practical threat intelligence framework tailored to AI.
Authors:Anatoly Belikov, Ilya Fedotov
Abstract:
Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV caches and hidden states, threatening prompt privacy for open-source models. Cryptographic protections such as MPC and FHE offer strong guarantees but remain one to two orders of magnitude too slow for interactive inference, while static obfuscation schemes break under multi-run statistical attacks once the model is known. We present GELO (Good-Enough LLM Obfuscation), a lightweight protocol for privacy-preserving inference that limits information leakage from untrusted accelerator observations by hiding hidden states with fresh, per-batch invertible mixing. For each offloaded projection, the TEE samples a random matrix $A$, forms $U = AH$, offloads $U$ and weights W to the accelerator, and then applies $A^{-1}$ on return, so that $A^{-1}((AH)W ) = HW$ and outputs are unchanged. Because mixing is never reused across batches, the attacker faces only a single-batch blind source separation problem. We analyse information leakage and introduce two practical defences: (i) non-orthogonal mixing to mask Gram matrices, and (ii) orthogonal mixing augmented with a small fraction of high-energy "shield" vectors that pollute higher-order statistics. On Llama-2 7B, GELO preserves float32 outputs exactly, closely matches low-precision baselines, offloads the dominant matrix multiplications with about 20-30% latency overhead, and defeats a range of ICA/BSS and anchor-based attacks.
Authors:Bhanu Pallakonda, Mikkel Hindsbo, Sina Ehsani, Prag Mishra
Abstract:
The proliferation of open-weight Large Language Models (LLMs) has democratized agentic AI, yet fine-tuned weights are frequently shared and adopted with limited scrutiny beyond leaderboard performance. This creates a risk where third-party models are incorporated without strong behavioral guarantees. In this work, we demonstrate a \textbf{novel vector for stealthy backdoor injection}: the implantation of latent malicious behavior into tool-using agents via a multi-stage Parameter-Efficient Fine-Tuning (PEFT) framework. Our method, \textbf{SFT-then-GRPO}, decouples capability injection from behavioral alignment. First, we use SFT with LoRA to implant a "sleeper agent" capability. Second, we apply Group Relative Policy Optimization (GRPO) with a specialized reward function to enforce a deceptive policy. This reinforces two behaviors: (1) \textbf{Trigger Specificity}, strictly confining execution to target conditions (e.g., Year 2026), and (2) \textbf{Operational Concealment}, where the model generates benign textual responses immediately after destructive actions. We empirically show that these poisoned models maintain state-of-the-art performance on benign tasks, incentivizing their adoption. Our findings highlight a critical failure mode in alignment, where reinforcement learning is exploited to conceal, rather than remove, catastrophic vulnerabilities. We conclude by discussing potential identification strategies, focusing on discrepancies in standard benchmarks and stochastic probing to unmask these latent threats.
Authors:Benedikt Brückner, Alejandro J. Mercado, Yanghao Zhang, Panagiotis Kouvaros, Alessio Lomuscio
Abstract:
While formal robustness verification has seen significant success in image classification, scaling these guarantees to object detection remains notoriously difficult due to complex non-linear coordinate transformations and Intersection-over-Union (IoU) metrics. We introduce IoUCert, a novel formal verification framework designed specifically to overcome these bottlenecks in foundational anchor-based object detection architectures. Focusing on the object localisation component in single-object settings, we propose a coordinate transformation that enables our algorithm to circumvent precision-degrading relaxations of non-linear box prediction functions. This allows us to optimise bounds directly with respect to the anchor box offsets which enables a novel Interval Bound Propagation method that derives optimal IoU bounds. We demonstrate that our method enables, for the first time, the robustness verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.
Authors:Rahul Marchand, Art O Cathain, Jerome Wynne, Philippos Maximos Giavridis, Sam Deverett, John Wilkinson, Jason Gwartz, Harry Coppock
Abstract:
Large language models (LLMs) increasingly act as autonomous agents, using tools to execute code, read and write files, and access networks, creating novel security risks. To mitigate these risks, agents are commonly deployed and evaluated in isolated "sandbox" environments, often implemented using Docker/OCI containers. We introduce SANDBOXESCAPEBENCH, an open benchmark that safely measures an LLM's capacity to break out of these sandboxes. The benchmark is implemented as an Inspect AI Capture the Flag (CTF) evaluation utilising a nested sandbox architecture with the outer layer containing the flag and no known vulnerabilities. Following a threat model of a motivated adversarial agent with shell access inside a container, SANDBOXESCAPEBENCH covers a spectrum of sandboxescape mechanisms spanning misconfiguration, privilege allocation mistakes, kernel flaws, and runtime/orchestration weaknesses. We find that, when vulnerabilities are added, LLMs are able to identify and exploit them, showing that use of evaluation like SANDBOXESCAPEBENCH is needed to ensure sandboxing continues to provide the encapsulation needed for highly-capable models.
Authors:Guy Goren, Jorge M. Soares
Abstract:
In this paper, we analyze the finality of the Filecoin network, focusing on dynamic probabilistic guarantees of tipset permanence in the canonical chain. Our approach differs from static analyses that consider only the worst-case scenario; instead, we dynamically compute the error probability at each round using the live chain history, providing a more accurate and efficient assessment. We provide a practical algorithm that only requires visibility into the blocks produced by honest participants, which can be implemented by clients or off-chain applications without any change to Filecoin's consensus mechanisms. We demonstrate that, under typical operating conditions, the sought-after error probability of $2^{-30}$ is achievable in approximately 30 rounds, a 30x improvement over the 900 rounds that the network currently encodes as a fixed threshold. This finding promises to expedite transactions and enhance network efficiency, and lays the foundation for further analysis of other DAG-structured blockchains.
Authors:Lotfi Ben Othmane, Yasaswini Konapalli, Naga Prudhvi Mareedu
Abstract:
An Adaptive Cruise Control (ACC) system automatically adjusts the host vehicle's speed to maintain a safe following distance from a lead vehicle. In typical implementations, a feedback controller (e.g., a Proportional-Integral-Derivative (PID) controller) computes the host vehicle's acceleration using a target speed and a spacing error, defined as the difference between the measured inter-vehicle distance and a desired safe distance. ACC is often assumed to be resilient to fault-injection attacks because a Kalman filter (KF) can smooth noisy speed measurements. However, we show--through analytical proofs and simulation results--that a KF can tolerate injected speed values only up to a bounded threshold. When injected values exceed this threshold, the filter can be driven off track, causing the ACC controller to make unsafe acceleration decisions and potentially leading to collisions. Our main contribution is to augment the PID-based controller with Intrusion Detection System (IDS) outputs, yielding Intrusion Detection Systems-Based Adaptive Cruise Control (ACC-IDS). The ACC-IDS controller is simple and implementable: a binary intrusion flag switches the control law to emergency braking. We prove that augmenting ACC with an IDS, under assumed detection-performance and latency constraints, can mitigate these attacks and help preserve ACC's collision-avoidance guarantees.
Authors:Quhura Fathima, Neda Moghim, Mostafa Taghizade Firouzjaee, Christo K. Thomas, Ross Gore, Walid Saad
Abstract:
The growing deployment of Internet of Things (IoT) devices in smart cities and industrial environments increases vulnerability to stealthy, multi-stage advanced persistent threats (APTs) that exploit wireless communication. Detection is challenging due to severe class imbalance in network traffic, which limits the effectiveness of traditional deep learning approaches and their lack of explainability in classification decisions. To address these challenges, this paper proposes a neurosymbolic architecture that integrates an optimized BERT model with logic tensor networks (LTN) for explainable APT detection in wireless IoT networks. The proposed method addresses the challenges of mobile IoT environments through efficient feature encoding that transforms network flow data into BERT-compatible sequences while preserving temporal dependencies critical for APT stage identification. Severe class imbalance is mitigated using focal loss, hierarchical classification that separates normal traffic detection from attack categorization, and adaptive sampling strategies. Evaluation on the SCVIC-APT2021 dataset demonstrates an operationally viable binary classification F1 score of 95.27% with a false positive rate of 0.14%, and a 76.75% macro F1 score for multi-class attack categorization. Furthermore, a novel explainability analysis statistically validates the importance of distinct network features. These results demonstrate that neurosymbolic learning enables high-performance, interpretable, and operationally viable APT detection for IoT network monitoring architectures.
Authors:Jilei Sun, Dianhong Wu, Ying Su
Abstract:
Selective image encryption is common in remote sensing systems because it protects sensitive regions of interest (ROI) while limiting computational cost. However, many selective designs enable cross-tile structural leakage under chosen-plaintext attacks when secret-dependent transformations are reused across spatial positions. This paper proposes Tilewise Domain-Separated Selective Encryption (TDS-SE), where per-tile (and optionally per-frame) keys are derived from a master secret via HKDF with explicit domain separation, and ROI masks are treated strictly as external side information. Structural leakage is evaluated using two reconstruction-based distinguishers -- a linear model and a lightweight convolutional neural network -- under multiple attack settings. Experiments on RESISC45 and SEN12MS cover ablation tests, cross-position transferability, cross-sample generalization, and ROI-knowledge asymmetry. Results show that per-tile domain separation reduces position-conditioned transfer for the linear probe, and that adding frame freshness improves robustness to imperfect ROI assumptions for the CNN probe. Cross-sample generalization exhibits mixed behavior across settings, consistent with an empirical evaluation perspective, while selective-encryption functionality is preserved under the same tiling and ROI policy. Beyond the method itself, the paper provides a structured protocol for evaluating selective encryption under realistic attacker capabilities.
Authors:Ning Lyu, Yuntao Liu, Yonghong Bai, Zhiyuan Yan
Abstract:
Knowledge distillation transfers large teacher models to compact student models, enabling deployment on resource-limited platforms while suffering minimal performance degradation. However, this paradigm could lead to various security risks, especially model theft. Existing defenses against model theft, such as watermarking and secure enclaves, focus primarily on identity authentication and incur significant resource costs. Aiming to provide post-theft accountability and traceability, we propose a novel fingerprinting framework that superimposes device-specific Physical Unclonable Function (PUF) signatures onto teacher logits during distillation. Compared with watermarking or secure enclaves, our approach is lightweight, requires no architectural changes, and enables traceability of any leaked or cloned model. Since the signatures are based on PUFs, this framework is robust against reverse engineering and tampering attacks. In this framework, the signature recovery process consists of two stages: first a neural network-based decoder and then a Hamming distance decoder. Furthermore, we also propose a bit compression scheme to support a large number of devices. Experiment results demonstrate that our framework achieves high key recovery rate and negligible accuracy loss while allowing a tunable trade-off between these two key metrics. These results show that the proposed framework is a practical and robust solution for protecting distilled models.
Authors:S M Zia Ur Rashid, Deepa Gurung, Sonam Raj Gupta, Suman Rath
Abstract:
The convergence of Artificial Intelligence (AI) inference pipelines with cloud infrastructure creates a dual attack surface where cloud security standards and AI governance frameworks intersect without unified enforcement mechanisms. AI governance, cloud security, and industrial control system standards intersect without unified enforcement, leaving hybrid deployments exposed to cross-layer attacks that threaten safety-critical operations. This paper makes three primary contributions: (i) we synthesize these frameworks into a lifecycle-staged threat taxonomy structured around explicit attacker capability tiers, (ii) we propose a Unified Reference Architecture spanning a Secure Data Factory, a hardened model supply chain, and a runtime governance layer, (iii) we present a case study through Grid-Guard, a hybrid Transmission System Operator scenario in which coordinated defenses drawn from NIST AI RMF, MITRE ATLAS, OWASP AI Exchange and GenAI, CSA MAESTRO, and NERC CIP defeat a multi-tier physical-financial manipulation campaign without human intervention. Controls are mapped against all five frameworks and current NERC CIP standards to demonstrate that a single cloud-native architecture can simultaneously satisfy AI governance, adversarial robustness, agentic safety, and industrial regulatory compliance obligations.
Authors:Zilong Cao, Xuan Bi, Hai Zhang
Abstract:
Data privacy is important in the AI era, and differential privacy (DP) is one of the golden solutions. However, DP is typically applicable only if data have a bounded underlying distribution. We address this limitation by leveraging second-moment information from a small amount of public data. We propose Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix and applies a principled truncation whose radius depends only on non-private quantities: data dimension and sample size. This transformation yields a well-conditioned second-moment matrix, enabling its inversion with a significantly strengthened ability to resist the DP noise. Furthermore, we demonstrate the applicability of PMT by using penalized and generalized linear regressions. Specifically, we design new loss functions and algorithms, ensuring that solutions in the transformed space can be mapped back to the original domain. We have established improvements in the models' DP estimation through theoretical error bounds, robustness guarantees, and convergence results, attributing the gains to the conditioning effect of PMT. Experiments on synthetic and real datasets confirm that PMT substantially improves the accuracy and stability of DP models.
Authors:Yogha Restu Pramadi, Theodoros Spyridopoulos, Vijay Kumar
Abstract:
Research on Advanced Persistent Threats (APTs) in industrial environments requires experimental platforms that support realistic end-to-end attack emulation across converged enterprise IT, operational technology (OT), and Industrial Internet of Things (IIoT) networks. However, existing industrial cybersecurity testbeds typically focus on isolated IT or OT domains or single-stage attacks, limiting their suitability for studying multi-stage APT campaigns. This paper presents the design, implementation, and validation of SIMPLE-ICS, a virtualised industrial enterprise testbed that enables emulation of multi-stage APT campaigns across IT, OT, and IIoT environments. The testbed architecture is based on the Purdue Enterprise Reference Architecture, NIST SP 800-82, and IEC 62443 zoning principles and integrates enterprise services, industrial control protocols, and digital twin based process simulation. A systematic methodology inspired by the V model is used to derive architectural requirements, attack scenarios, and validation criteria. An APT campaign designed to mimic the BlackEnergy campaign is emulated using MITRE ATTACK techniques spanning initial enterprise compromise, credential abuse, lateral movement, OT network infiltration, and process manipulation. The testbed supports the synchronised collection of network traffic, host-level logs, and operational telemetry across all segments. The testbed is validated on multi-stage attack trace observability, logging completeness across IT, OT, and IIoT domains, and repeatable execution of APT campaigns. The SIMPLE-ICS testbed provides an experimental platform for studying end-to-end APT behaviours in industrial enterprise networks and for generating multi-source datasets to support future research on campaign-level detection and correlation methods.
Authors:Mario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón, Francisco Herrera
Abstract:
Federated Learning (FL) has emerged as a key paradigm for building Trustworthy AI systems by enabling privacy-preserving, decentralized model training. However, FL is highly susceptible to adversarial attacks that compromise model integrity and data confidentiality, a vulnerability exacerbated by the fact that conventional data inspection methods are incompatible with its decentralized design. While integrating FL with Blockchain technology has been proposed to address some limitations, its potential for mitigating adversarial attacks remains largely unexplored. This paper introduces Resilient Federated Chain (RFC), a novel blockchain-enabled FL framework designed specifically to enhance resilience against such threats. RFC builds upon the existing Proof of Federated Learning architecture by repurposing the redundancy of its Pooled Mining mechanism as an active defense layer that can be combined with robust aggregation rules. Furthermore, the framework introduces a flexible evaluation function in its consensus mechanism, allowing for adaptive defense against different attack strategies. Extensive experimental evaluation on image classification tasks under various adversarial scenarios, demonstrates that RFC significantly improves robustness compared to baseline methods, providing a viable solution for securing decentralized learning environments.
Authors:Marco Bertoni, Saverio Giallorenzo, Marco Peressotti
Abstract:
Choreographies describe distributed protocols from a global viewpoint, enabling correct-by-construction synthesis of local behaviours. We develop a policy-parametric type system that prevents information leaks from high-security data to low-security observers, handling both explicit and implicit flows through a program-counter discipline. The system supports recursive procedures via a procedure context that we reconstruct through constraint generation. We prove termination-insensitive non-interference with respect to a standard small-step semantics.
Authors:Fatemeh Shoaei, Mohammad Pishdar, Mozafar Bag-Mohammadi, Mojtaba Karami
Abstract:
Rug-pull attacks pose a systemic threat across the blockchain ecosystem, yet research into early detection is hindered by the lack of scientific-grade datasets. Existing resources often suffer from temporal data leakage, narrow modality, and ambiguous labeling, particularly outside DeFi contexts. To address these limitations, we present TM-RugPull, a rigorously curated, leakage-resistant dataset of 1,028 token projects spanning DeFi, meme coins, NFTs, and celebrity-themed tokens. RugPull enforces strict temporal hygiene by extracting all features on chain behavior, smart contract metadata, and OSINT signals strictly from the first half of each project's lifespan. Labels are grounded in forensic reports and longevity criteria, verified through multi-expert consensus. This dataset enables causally valid, multimodal analysis of rug-pull dynamics and establishes a new benchmark for reproducible fraud detection research.
Authors:Shahzad Ahmad, Stefan Rass, Zahra Seyedi
Abstract:
We introduce a novel post-quantum sanitizable signature scheme constructed upon a chameleon hash function derived from the McEliece cryptosystem. In this design, the designated sanitizer possesses the inherent trapdoor of a Goppa code, which facilitates controlled collision-finding via Patterson decoding. This mechanism enables authorized modification of specific message blocks while ensuring all other content remains immutably bound. We provide formal security definitions and rigorous proofs of existential unforgeability and immutability, grounded in the hardness of syndrome decoding in the random-oracle model, where a robust random oracle thwarts trivial linear hash collisions. A key innovation lies in our precise characterization of the transparency property: by imposing a specific weight constraint on the randomizers generated by the signer, we achieve perfect transparency, rendering sanitized signatures indistinguishable from freshly signed ones. This work establishes the first transparent, code-based, post-quantum sanitizable signature scheme, offering strong theoretical guarantees and a pathway for practical deployment in long-term secure applications.
Authors:David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, Maksym Andriushchenko
Abstract:
LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code, knowledge, and instructions. Although this can extend agent capabilities to new domains, it creates an increasingly complex agent supply chain, offering new surfaces for prompt injection attacks. We identify skill-based prompt injection as a significant threat and introduce SkillInject, a benchmark evaluating the susceptibility of widely-used LLM agents to injections through skill files. SkillInject contains 202 injection-task pairs with attacks ranging from obviously malicious injections to subtle, context-dependent attacks hidden in otherwise legitimate instructions. We evaluate frontier LLMs on SkillInject, measuring both security in terms of harmful instruction avoidance and utility in terms of legitimate instruction compliance. Our results show that today's agents are highly vulnerable with up to 80% attack success rate with frontier models, often executing extremely harmful instructions including data exfiltration, destructive action, and ransomware-like behavior. They furthermore suggest that this problem will not be solved through model scaling or simple input filtering, but that robust agent security will require context-aware authorization frameworks. Our benchmark is available at https://www.skill-inject.com/.
Authors:Zoha Hayat Bhatti, Bakhtawar Ahtisham, Seemal Tausif, Niklas George, Nida ul Habib Bajwa, Mobin Javed
Abstract:
Large Language Models and commercial speech synthesis systems now enable highly realistic AI-generated voice scams (vishing), raising urgent concerns about deception at scale. Yet it remains unclear whether individuals can reliably distinguish AI-generated speech from human-recorded voices in realistic scam contexts and what perceptual strategies underlie their judgments. We conducted a controlled online study in which 22 participants evaluated 16 vishing-style audio clips (8 AI-generated, 8 human-recorded) and classified each as human or AI while reporting confidence. Participants performed poorly: mean accuracy was 37.5%, below chance in a binary classification task. At the stimulus level, misclassification was bidirectional: 75% of AI-generated clips were majority-labeled as human, while 62.5% of human-recorded clips were majority-labeled as AI. Signal Detection Theory analysis revealed near-zero discriminability (d' approx 0), indicating inability to reliably distinguish synthetic from human voices rather than simple response bias. Qualitative analysis of 315 coded excerpts revealed reliance on paralinguistic and emotional heuristics, including pauses, filler words, vocal variability, cadence, and emotional expressiveness. However, these surface-level cues traditionally associated with human authenticity were frequently replicated by AI-generated samples. Misclassifications were often accompanied by moderate to high confidence, suggesting perceptual miscalibration rather than uncertainty. Together, our findings demonstrate that authenticity judgments based on vocal heuristics are unreliable in contemporary vishing scenarios. We discuss implications for security interventions, user education, and AI-mediated deception mitigation.
Authors:Yongxin Chen, Zhiyuan Jiang, Chao Zhang, Haoran Xu, Shenglin Xu, Jianping Tang, Zheming Li, Peidai Xie, Yongjun Wang
Abstract:
Traditional database fuzzing techniques primarily focus on syntactic correctness and general SQL structures, leaving critical yet obscure DBMS features, such as system-level modes (e.g., GTID), programmatic constructs (e.g., PROCEDURE), advanced process commands (e.g., KILL), largely underexplored. Although rarely triggered by typical inputs, these features can lead to severe crashes or security issues when executed under edge-case conditions. In this paper, we present FuzzySQL, a novel LLM-powered adaptive fuzzing framework designed to uncover subtle vulnerabilities in DBMS special features. FuzzySQL combines grammar-guided SQL generation with logic-shifting progressive mutation, a novel technique that explores alternative control paths by negating conditions and restructuring execution logic, synthesizing structurally and semantically diverse test cases. To further ensure deeper execution coverage of the back end, FuzzySQL employs a hybrid error repair pipeline that unifies rule-based patching with LLM-driven semantic repair, enabling automatic correction of syntactic and context-sensitive failures. We evaluate FuzzySQL across multiple DBMSs, including MySQL, MariaDB, SQLite, PostgreSQL and Clickhouse, uncovering 37 vulnerabilities, 7 of which are tied to under-tested DBMS special features. As of this writing, 29 cases have been confirmed with 9 assigned CVE identifiers, 14 already fixed by vendors, and additional vulnerabilities scheduled to be patched in upcoming releases. Our results highlight the limitations of conventional fuzzers in semantic feature coverage and demonstrate the potential of LLM-based fuzzing to discover deeply hidden bugs in complex database systems.
Authors:Duy Anh Ta, Farnaz Farid, Farhad Ahamed, Ala Al-Areqi, Robert Beutel, Tamara Watson, Alana Maurushat
Abstract:
Modern organizations increasingly face cybersecurity incidents driven by human behaviour rather than technical failures. To address this, we propose a conceptual security framework that integrates a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) model to analyze biometric and environmental data for context-aware security decisions. The CNN extracts spatial patterns from sensor data, while the LSTM captures temporal dynamics associated with human error susceptibility. The model achieves 84% accuracy, demonstrating its ability to reliably detect conditions that lead to elevated human-centred cyber risk. By enabling continuous monitoring and adaptive safeguards, the framework supports proactive interventions that reduce the likelihood of human-driven cyber incidents
Authors:Eckehard Hermann, Harald Lampesberger
Abstract:
Risk matrices (heatmaps) are widely used for information and cyber risk management and decision-making, yet they are often too coarse for today's resilience-driven organizational and system landscapes. Likelihood and impact (the two dimensions represented in a heatmap) can vary with operational conditions, third-party dependencies, and the effectiveness of technical and organizational controls. At the same time, organizations cannot afford to analyze and operationalize every identified risk with equal depth using more sophisticated methods, telemetry, and real-time decision logic. We therefore propose a traceable triage pipeline that connects broad, context-sensitive screening with selective deep-dive analysis of material risks. The Hagenberg Risk Management Process presented in this paper integrates three steps: (i) context-aware prioritization using multidimensional polar heatmaps to compare risks across multiple operational states, (ii) Bowtie analysis for triaged risks to structure causes, consequences, and barriers, and (iii) an automated transformation of Bowties into directed acyclic graphs as the structural basis for Bayesian networks. A distinctive feature is the explicit representation of barriers as activation nodes in the resulting graph, making control points visible and preparing for later intervention and what-if analyses. The approach is demonstrated on an instant-payments gateway scenario in which a faulty production change under peak load leads to cascading degradation and transaction loss; DORA serves as the reference framework for resilience requirements. The result is an end-to-end, tool-supported workflow that improves transparency, auditability, and operational readiness from prioritization to monitoring-oriented models.
Authors:Benjamin Dowling, Prosanta Gope, Mehr U Nisa, Bhagya Wimalasiri
Abstract:
LINE has emerged as one of the most popular communication platforms in many East Asian countries, including Thailand and Japan, with millions of active users. Therefore, it is essential to understand its security guarantees. In this work, we present the first provable security analysis of the LINE version two (LINEv2) messaging protocol, focusing on its cryptographic guarantees in a real-world setting. We capture the architecture and security of the LINE messaging protocol by modifying the Multi-Stage Key Exchange (MSKE) model, a framework for analysing cryptographic protocols under adversarial conditions. While LINEv2 achieves basic security properties such as key indistinguishability and message authentication, we highlight the lack of forward secrecy (FS) and post-compromise security (PCS). To address this, we introduce a stronger version of the LINE protocol, introducing FS and PCS to LINE, analysing and benchmarking our results.
Authors:Jan Lennart Bönsel, Michael Maurer, Silvio Petriconi, Andrea Tundis, Marc Winstel
Abstract:
Coin selection refers to the problem of choosing a set of tokens to fund a transaction in token-based payment systems such as, e.g., cryptocurrencies or central bank digital currencies (CBDCs). In this paper, we propose the Boltzmann Draw that is a probabilistic algorithm inspired by the principles of statistical physics. The algorithm relies on drawing tokens according to the Boltzmann distribution, serving as an extension and improvement of the Random Draw method. Numerical results demonstrate the effectiveness of our method in bounding the number of selected input tokens as well as reducing dust generation and limiting the token pool size in the wallet. Moreover, the probabilistic algorithm can be implemented efficiently, improves performance and respects privacy requirements - properties of significant relevance for current token-based technologies. We compare the Boltzmann draw to both the standard Random Draw and the Greedy algorithm. We argue that the former is superior to the latter in the sense of the above objectives. Our findings are relevant for token-based technologies, and are also of interest for CBDCs, which as a legal tender possibly needs to handle large transaction volumes at a high frequency.
Authors:Manuel Suarez-Roman, Francesco Marciori, Mauro Conti, Juan Tapiador
Abstract:
Despite the high volume of open-source Cyber Threat Intelligence (CTI), our understanding of long-term threat actor-victim dynamics remains fragmented due to the lack of structured datasets and inconsistent reporting standards. In this paper, we present a large-scale automated analysis of open-source CTI reports spanning two decades. We develop a high-precision, LLM-based pipeline to ingest and structure 13,308 reports, extracting key entities such as attributed threat actors, motivations, victims, reporting vendors, and technical indicators (IoCs and TTPs). Our analysis quantifies the evolution of CTI information density and specialization, characterizing patterns that relate specific threat actors to motivations and victim profiles. Furthermore, we perform a meta-analysis of the CTI industry itself. We identify a fragmented ecosystem of distinct silos where vendors demonstrate significant geographic and sectoral reporting biases. Our marginal coverage analysis reveals that intelligence overlap between vendors is typically low: while a few core providers may offer broad situational awareness, additional sources yield diminishing returns. Overall, our findings characterize the structural biases inherent in the CTI ecosystem, enabling practitioners and researchers to better evaluate the completeness of their intelligence sources.
Authors:Shahriar Golchin, Marc Wetter
Abstract:
We systematically evaluate the quality of widely used AI safety datasets from two perspectives: in isolation and in practice. In isolation, we examine how well these datasets reflect real-world attacks based on three key properties: driven by ulterior intent, well-crafted, and out-of-distribution. We find that these datasets overrely on "triggering cues": words or phrases with overt negative/sensitive connotations that are intended to trigger safety mechanisms explicitly, which is unrealistic compared to real-world attacks. In practice, we evaluate whether these datasets genuinely measure safety risks or merely provoke refusals through triggering cues. To explore this, we introduce "intent laundering": a procedure that abstracts away triggering cues from attacks (data points) while strictly preserving their malicious intent and all relevant details. Our results indicate that current AI safety datasets fail to faithfully represent real-world attacks due to their overreliance on triggering cues. In fact, once these cues are removed, all previously evaluated "reasonably safe" models become unsafe, including Gemini 3 Pro and Claude Sonnet 3.7. Moreover, when intent laundering is adapted as a jailbreaking technique, it consistently achieves high attack success rates, ranging from 90% to over 98%, under fully black-box access. Overall, our findings expose a significant disconnect between how model safety is evaluated and how real-world adversaries behave.
Authors:Ryan Wong, Ben Feinberg, Saugata Ghose
Abstract:
Analog processing-using-memory (PUM; a.k.a. in-memory computing) makes use of electrical interactions inside memory arrays to perform bulk matrix-vector multiplication (MVM) operations. However, many popular matrix-based kernels need to execute non-MVM operations, which analog PUM cannot directly perform. To retain its energy efficiency, analog PUM architectures augment memory arrays with CMOS-based domain-specific fixed-function hardware to provide complete kernel functionality, but the difficulty of integrating such specialized CMOS logic with memory arrays has largely limited analog PUM to being an accelerator for machine learning inference, or for closely related kernels. An opportunity exists to harness analog PUM for general-purpose computation: recent works have shown that memory arrays can also perform Boolean PUM operations, albeit with very different supporting hardware and electrical signals than analog PUM. We propose DARTH-PUM, a general-purpose hybrid PUM architecture that tackles key hardware and software challenges to integrating analog PUM and digital PUM. We propose optimized peripheral circuitry, coordinating hardware to manage and interface between both types of PUM, an easy-to-use programming interface, and low-cost support for flexible data widths. These design elements allow us to build a practical PUM architecture that can execute kernels fully in memory, and can scale easily to cater to domains ranging from embedded applications to large-scale data-driven computing. We show how three popular applications (AES encryption, convolutional neural networks, large-language models) can map to and benefit from DARTH-PUM, with speedups of 59.4x, 14.8x, and 40.8x over an analog+CPU baseline.
Authors:Huijia Lin, Kameron Shahabi, Min Jae Song
Abstract:
Language models now routinely produce text that is difficult to distinguish from human writing, raising the need for robust tools to verify content provenance. Watermarking has emerged as a promising countermeasure, with existing work largely focused on model quality preservation and robust detection. However, current schemes provide limited protection against false attribution. We strengthen the notion of soundness by introducing two novel guarantees: unforgeability and recoverability. Unforgeability prevents adversaries from crafting false positives, texts that are far from any output from the watermarked model but are nonetheless flagged as watermarked. Recoverability provides an additional layer of protection: whenever a watermark is detected, the detector identifies the source text from which the flagged content was derived. Together, these properties strengthen content ownership by linking content exclusively to its generating model, enabling secure attribution and fine-grained traceability. We construct the first undetectable watermarking scheme that is robust, unforgeable, and recoverable with respect to substitutions (i.e., perturbations in Hamming metric). The key technical ingredient is a new cryptographic primitive called robust (or recoverable) digital signatures, which allow verification of messages that are close to signed ones, while preventing forgery of messages that are far from all previously signed messages. We show that any standard digital signature scheme can be boosted to a robust one using property-preserving hash functions (Boyle, LaVigne, and Vaikuntanathan, ITCS 2019).
Authors:Hakan Yildiz, Axel Küpper
Abstract:
Self-Sovereign Identity (SSI) enables user-controlled, cryptographically verifiable credentials. As EU regulations mandate EUDI Wallet acceptance by 2027, SSI adoption becomes a compliance necessity. However, each SSI Verifier exposes different APIs with distinct request parameters, response formats, and claim structures, requiring custom wrappers and dedicated infrastructure, contrasting with OpenID Connect (OIDC) where standardized protocols enable seamless integration. interID is an ecosystem-agnostic platform unifying credential verification across Hyperledger Aries/Indy, EBSI, and EUDI ecosystems. We extend interID with an OIDC bridge providing Verifier-as-a-Service, enabling SSI verification through standard OIDC flows. Organizations receive ID Tokens with verified credential attributes without implementing Verifier-specific logic or deploying infrastructure. The multi-tenant architecture leverages Keycloak with strict tenant isolation. Key innovations include PKCE support, scope-to-proof-template mappings translating OIDC scopes into ecosystem-specific verification requests, and a security analysis identifying novel attack surfaces at the intersection of OIDC, SSI, and multi-tenant architectures, threats covered by neither RFC 6819 nor existing SSI analyses alone. Our evaluation demonstrates security equivalence to production identity providers through threat modeling identifying 11 attack vectors, including seven beyond RFC 6819's scope. Integration analysis shows organizations can adopt SSI authentication with comparable effort to adding traditional federated providers. By combining familiar OIDC patterns with SaaS deployment, our work lowers integration and operational barriers, enabling regulatory compliance through configuration rather than custom development.
Authors:Jaagup Sepp, Eric Filiol
Abstract:
We propose a new approach in cryptanalysis based on an evolution of the concept of \textit{Combinatorial Equivalence}. The aim is to rewrite a cryptosystem under a combinatorially equivalent form in order to make appear new properties that are more strongly discriminating the secret key used during encryption. We successfully applied this approach to the most secure stream ciphers category nowadays. We first define a concept cipher called Cipherbent6 that capture most of the difficulty of stream cipher cryptanalysis. We significantly outperformed all known cryptanalysis. We applied this approach to the Achterbahn cipher and we obtained again far better cryptanalysis results.
Authors:Guangjie Liu, Guang Chen, Weiwei Liu
Abstract:
The widespread adoption of TLS 1.3 and QUIC has rendered payload content invisible, shifting traffic analysis toward side-channel features. However, rigorous justification for why side-channel leakage is inevitable in encrypted communications has been lacking. This paper establishes a strict foundation from information theory by constructing a formal model \(Σ=(Γ,Ω)\), where \(Γ=(A,Π,Φ,N)\) describes the causal chain of application generation, protocol encapsulation, encryption transformation, and network transmission, while \(Ω\) characterizes observation capabilities. Based on composite channel structure, data processing inequality, and Lipschitz statistics propagation, we propose and prove the Side-Channel Existence Theorem: for distinguishable semantic pairs, under conditions including mapping non-degeneracy (\(\mathbb{E}[d(z_P,z_N)\mid X]\le C\)), protocol-layer distinguishability (expectation difference \(\ge\barΔ\)), Lipschitz continuity, observation non-degeneracy (\(ρ>0\)), and propagation condition (\(C<\barΔ/2L_φ\)), the mutual information \(I(X;Y)\) is strictly positive with explicit lower bound. The corollary shows that in efficiency-prioritized systems, leakage is inevitable when at least one application pair is distinguishable. Three factors determine the boundary: non-degeneracy constant \(C\) constrained by efficiency, distinguishability \(\barΔ\) from application diversity, and \(ρ\) from analyst capabilities. This establishes the first rigorous information-theoretic foundation for encrypted traffic side channels, providing verifiable predictions for attack feasibility, quantifiable benchmarks for defenses, and mathematical basis for efficiency-privacy tradeoffs.
Authors:Anudeep Das, Prach Chantasantitam, Gurjot Singh, Lipeng He, Mariia Ponomarenko, Florian Kerschbaum
Abstract:
Large language models (LLMs) are increasingly deployed in settings where inducing a bias toward a certain topic can have significant consequences, and backdoor attacks can be used to produce such models. Prior work on backdoor attacks has largely focused on a black-box threat model, with an adversary targeting the model builder's LLM. However, in the bias manipulation setting, the model builder themselves could be the adversary, warranting a white-box threat model where the attacker's ability to poison, and manipulate the poisoned data is substantially increased. Furthermore, despite growing research in semantically-triggered backdoors, most studies have limited themselves to syntactically-triggered attacks. Motivated by these limitations, we conduct an analysis consisting of over 1000 evaluations using higher poisoning ratios and greater data augmentation to gain a better understanding of the potential of syntactically- and semantically-triggered backdoor attacks in a white-box setting. In addition, we study whether two representative defense paradigms, model-intrinsic and model-extrinsic backdoor removal, are able to mitigate these attacks. Our analysis reveals numerous new findings. We discover that while both syntactically- and semantically-triggered attacks can effectively induce the target behaviour, and largely preserve utility, semantically-triggered attacks are generally more effective in inducing negative biases, while both backdoor types struggle with causing positive biases. Furthermore, while both defense types are able to mitigate these backdoors, they either result in a substantial drop in utility, or require high computational overhead.
Authors:Jiyong Uhm, Minseok Kim, Michalis Polychronakis, Hyungjoon Koo
Abstract:
Binary code analysis plays an essential role in cybersecurity, facilitating reverse engineering to reveal the inner workings of programs in the absence of source code. Traditional approaches, such as static and dynamic analysis, extract valuable insights from stripped binaries, but often demand substantial expertise and manual effort. Recent advances in deep learning have opened promising opportunities to enhance binary analysis by capturing latent features and disclosing underlying code semantics. Despite the growing number of binary analysis models based on machine learning, their robustness to adversarial code transformations at the binary level remains underexplored. We evaluate the robustness of deep learning models for the task of binary code similarity detection (BCSD) under semantics-preserving transformations. The unique nature of machine instructions presents distinct challenges compared to the typical input perturbations found in other domains. We introduce asmFooler, a system that evaluates the resilience of BCSD models using a diverse set of adversarial code transformations that preserve functional semantics. We construct a dataset of 9,565 binary variants from 620 baseline samples by applying eight semantics-preserving transformations across six representative BCSD models. Our major findings highlight several key insights: i) model robustness relies on the processing pipeline, including code pre-processing, architecture, and feature selection; ii) adversarial transformation effectiveness is bounded by a budget shaped by model-specific constraints like input size and instruction expressive capacity; iii) well-crafted transformations can be highly effective with minimal perturbations; and iv) such transformations efficiently disrupt model decisions (e.g., misleading to false positives or false negatives) by focusing on semantically significant instructions.
Authors:Ian Oliver, Pekka Kuure
Abstract:
We introduces a category-theoretic framework for modelling trust as applied to trusted computation systems and remote attestation. By formalizing elements, claims, results, and decisions as objects within a category, and the processes of attestation, verification, and decision-making as morphisms, the framework provides a rigorous approach to understanding trust establishment and provides a well-defined semantics for terms such as `trustworthiness' and 'justification'/forensics. The trust decision space is formalized using a Heyting Algebra, allowing nuanced trust levels that extend beyond binary trusted/untrusted states. We then present additional structures and in particular utilise exponentiation in a category theoretic sense to define compositions of attestation operations and provide the basis of a measurement for the expressibility of an attestation environment. We present a number of worked examples including boot-run-shutdown sequences, Evil Maid attacks and the specification of an attestation environment based upon this model. We then address challenges in modelling dynamic and larger systems made of multiple compositions.
Authors:Jericho Cain, Hayden Beadles
Abstract:
User and Entity Behavior Analytics (UEBA) systems commonly detect insider threats by scoring fixed time windows of user activity for anomalous behavior. While this window-level paradigm has proven effective for identifying sharp behavioral deviations, it remains unclear how much information about longer-running attack campaigns is already present within individual windows, and how such information can be leveraged for campaign discovery. In this work, we study unsupervised window-level insider threat detection on the CERT r4.2 dataset and show that explicitly separating activity presence from activity magnitude yields substantial performance gains. We introduce a dual-channel convolutional autoencoder that reconstructs both a binary activity mask and corresponding activity values, allowing the model to focus representational capacity on sparse behavioral structure rather than dense inactive baselines. Across multiday attack campaigns lasting between one and seven days, the proposed approach achieves a window-level precision-recall AUC of 0.71, substantially exceeding standard unsupervised autoencoder baselines and enabling high-precision operating points with zero false alarms.
Authors:Marwa Moullem, Lorenz Breidenbach, Ittay Eyal, Ari Juels
Abstract:
Smart contracts are stateful programs deployed on blockchains; they secure over a trillion dollars in transaction value per year. High-stakes smart contracts often rely on timely alerts about external events, but prior work has not analyzed their resilience to an attacker suppressing alerts via bribery. We formalize this challenge in a cryptoeconomic setting as the \emph{alerting problem}, giving rise to a game between a bribing adversary and~$n$ rational participants, who pay a penalty if they are caught deviating from the protocol. We establish a quadratic, i.e.,~$O(n^2)$, upper bound, whereas a straightforward alerting protocol only achieves~$O(n)$ bribery cost. We present a \emph{simultaneous game} that asymptotically achieves the quadratic upper bound and thus asymptotically-optimal bribery resistance. We then present two protocols that implement our simultaneous game: The first leverages a strong network synchrony assumption. The second relaxes this strong assumption and instead takes advantage of trusted hardware and blockchain proof-of-publication to establish a timed commitment scheme. These two protocols are constant-time but incur a linear storage overhead on the blockchain. We analyze a third, \emph{sequential alerting} protocol that optimistically incurs no on-chain storage overhead, at the expense of~$O(n)$ worst-case execution time. All three protocols achieve asymptotically-optimal bribery costs, but with different resource and performance tradeoffs. Together, they illuminate a rich design space for practical solutions to the alerting problem.
Authors:Theshani Nuradha, Sujeet Bhalerao, Felix Leditzky
Abstract:
When sensitive information is encoded in data, it is important to ensure the privacy of information when attempting to learn useful information from the data. There is a natural tradeoff whereby increasing privacy requirements may decrease the utility of a learning protocol. In the quantum setting of differential privacy, such tradeoffs between privacy and utility have so far remained largely unexplored. In this work, we study optimal privacy-utility tradeoffs for both generic and application-specific utility metrics when privacy is quantified by $(\varepsilon,δ)$-quantum local differential privacy. In the generic setting, we focus on optimizing fidelity and trace distance between the original state and the privatized state. We show that the depolarizing mechanism achieves the optimal utility for given privacy requirements. We then study the specific application of learning the expectation of an observable with respect to an input state when only given access to privatized states. We derive a lower bound on the number of samples of privatized data required to achieve a fixed accuracy guarantee with high probability. To prove this result, we employ existing lower bounds on private quantum hypothesis testing, thus showcasing the first operational use of them. We also devise private mechanisms that achieve optimal sample complexity with respect to the privacy parameters and accuracy parameters, demonstrating that utility can be significantly improved for specific tasks in contrast to the generic setting. In addition, we show that the number of samples required to privately learn observable expectation values scales as $Θ((\varepsilon β)^{-2})$, where $\varepsilon \in (0,1)$ is the privacy parameter and $β$ is the accuracy tolerance. We conclude by initiating the study of private classical shadows, which promise useful applications for private learning tasks.
Authors:Mohan Rajagopalan, Vinay Rao
Abstract:
Large Language Model (LLM) applications are vulnerable to prompt injection and context manipulation attacks that traditional security models cannot prevent. We introduce two novel primitives--authenticated prompts and authenticated context--that provide cryptographically verifiable provenance across LLM workflows. Authenticated prompts enable self-contained lineage verification, while authenticated context uses tamper-evident hash chains to ensure integrity of dynamic inputs. Building on these primitives, we formalize a policy algebra with four proven theorems providing protocol-level Byzantine resistance--even adversarial agents cannot violate organizational policies. Five complementary defenses--from lightweight resource controls to LLM-based semantic validation--deliver layered, preventative security with formal guarantees. Evaluation against representative attacks spanning 6 exhaustive categories achieves 100% detection with zero false positives and nominal overhead. We demonstrate the first approach combining cryptographically enforced prompt lineage, tamper-evident context, and provable policy reasoning--shifting LLM security from reactive detection to preventative guarantees.
Authors:Mohan Rajagopalan, Vinay Rao
Abstract:
Agentic AI systems automate enterprise workflows but existing defenses--guardrails, semantic filters--are probabilistic and routinely bypassed. We introduce authenticated workflows, the first complete trust layer for enterprise agentic AI. Security reduces to protecting four fundamental boundaries: prompts, tools, data, and context. We enforce intent (operations satisfy organizational policies) and integrity (operations are cryptographically authentic) at every boundary crossing, combining cryptographic elimination of attack classes with runtime policy enforcement. This delivers deterministic security--operations either carry valid cryptographic proof or are rejected. We introduce MAPL, an AI-native policy language that expresses agentic constraints dynamically as agents evolve and invocation context changes, scaling as O(log M + N) policies versus O(M x N) rules through hierarchical composition with cryptographic attestations for workflow dependencies. We prove practicality through a universal security runtime integrating nine leading frameworks (MCP, A2A, OpenAI, Claude, LangChain, CrewAI, AutoGen, LlamaIndex, Haystack) through thin adapters requiring zero protocol modifications. Formal proofs establish completeness and soundness. Empirical validation shows 100% recall with zero false positives across 174 test cases, protection against 9 of 10 OWASP Top 10 risks, and complete mitigation of two high impact production CVEs.
Authors:Saleh K. Monfared, Fatemeh Ganji, Dan Holcomb, Shahin Tajik
Abstract:
The rapid expansion of GPU-accelerated computing has enabled major advances in large-scale artificial intelligence (AI), while heightening concerns about how accelerators are observed or governed once deployed. Governance is essential to ensure that large-scale compute infrastructure is not silently repurposed for training models, circumventing usage policies, or operating outside legal oversight. Because current GPUs expose limited trusted telemetry and can be modified or virtualized by adversaries, we explore whether compute-based measurements can provide actionable signals of utilization when host and device are untrusted. We introduce a measurement framework that leverages architectural characteristics of modern GPUs to generate timing- and memory-based observables that correlate with compute activity. Our design draws on four complementary primitives: (1) a probabilistic, workload-driven mechanism inspired by Proof-of-Work (PoW) to expose parallel effort, (2) sequential, latency-sensitive workloads derived via Verifiable Delay Functions (VDFs) to characterize scalar execution pressure, (3) General Matrix Multiplication (GEMM)-based tensor-core measurements that reflect dense linear-algebra throughput, and (4) a VRAM-residency test that distinguishes on-device memory locality from off-chip access through bandwidth-dependent hashing. These primitives provide statistical and behavioral indicators of GPU engagement that remain observable even without trusted firmware, enclaves, or vendor-controlled counters. We evaluate their responses to contention, architectural alignment, memory pressure, and power overhead, showing that timing shifts and residency latencies reveal meaningful utilization patterns. Our results illustrate why compute-based telemetry can complement future accountability mechanisms by exposing architectural signals relevant to post-deployment GPU governance.
Authors:Shayan Ali Hassan, Tao Ni, Zafar Ayyub Qazi, Marco Canini
Abstract:
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and generation. However, these systems remain susceptible to malicious prompts that induce unsafe or policy-violating behavior through harmful requests, jailbreak techniques, and prompt injection attacks. Existing defenses face fundamental limitations: black-box moderation APIs offer limited transparency and adapt poorly to evolving threats, while white-box approaches using large LLM judges impose prohibitive computational costs and require expensive retraining for new attacks. Current systems force designers to choose between performance, efficiency, and adaptability. To address these challenges, we present BAGEL (Bootstrap AGgregated Ensemble Layer), a modular, lightweight, and incrementally updatable framework for malicious prompt detection. BAGEL employs a bootstrap aggregation and mixture of expert inspired ensemble of fine-tuned models, each specialized on a different attack dataset. At inference, BAGEL uses a random forest router to identify the most suitable ensemble member, then applies stochastic selection to sample additional members for prediction aggregation. When new attacks emerge, BAGEL updates incrementally by fine-tuning a small prompt-safety classifier (86M parameters) and adding the resulting model to the ensemble. BAGEL achieves an F1 score of 0.92 by selecting just 5 ensemble members (430M parameters), outperforming OpenAI Moderation API and ShieldGemma which require billions of parameters. Performance remains robust after nine incremental updates, and BAGEL provides interpretability through its router's structural features. Our results show ensembles of small finetuned classifiers can match or exceed billion-parameter guardrails while offering the adaptability and efficiency required for production systems.
Authors:Abhishek Kumar Mishra, Swadeep, Guevara Noubir, Mathieu Cunche
Abstract:
Tag-based tracking ecosystems help users locate lost items, but can be leveraged for unwanted tracking and stalking. Existing protocol-driven defenses and prior academic solutions largely assume stable identifiers or predictable beaconing. However, identifier-based defenses fundamentally break down against advanced rogue trackers that aggressively rotate identifiers. We present AirCatch, a passive detection system that exploits a physical-layer constraint: while logical identifiers can change arbitrarily fast, the transmitter's analog imprint remains stable and reappears as a compact and persistently occupied region in Carrier Frequency Offset (CFO) feature space. AirCatch advances the state of the art along three axes: (i) a novel, modulation-aware CFO fingerprint that augments packet-level CFO with content-independent CFO components that amplify device distinctiveness; (ii) a new tracking detection algorithm based on high core density and persistence that is robust to contamination and evasion through per-identifier segmentation; and (iii) an ultra-low-cost receiver, an approximately 10 dollar BLE SDR named BlePhasyr, built from commodity components, that makes RF fingerprinting based detection practical in resource-constrained deployments. We evaluate AirCatch across Apple, Google, Tile, and Samsung tag families in multi-hour captures, systematically stress-test evasion using a scenario generator over a grid of transmission and rotation periods, and validate in diverse real-world mobility traces including home and office commutes, public transport, car travel, and airport journeys while sweeping background tag density. Across these stress tests, AirCatch achieves no false positives and early detection over a wide range of adversarial configurations and environments, degrading gracefully only in extreme low-rate regimes that also reduce attacker utility.
Authors:Wilfrid Azariah, Yi-Quan Chen, Zhong-Xin You, Ray-Guang Cheng, Shiann-Tsong Sheu, Binbin Chen
Abstract:
Random Access Channel (RACH) jamming poses a critical security threat to 5G and beyond (B5G) networks. This paper presents an analytical model for predicting the impact of Msg1 jamming attacks on RACH performance. We use the OpenAirInterface (OAI) open-source user equipment (UE) to implement a Msg1 jamming attacker. Over-the-air experiments validate the accuracy of the proposed analytical model. The results show that low-power and stealthy Msg1 jamming can effectively block legitimate UE access in 5G/B5G systems.
Authors:Bintao Yuan, Mingsheng Tang, Binbin Ge, Hongbin Luo, Zijie Yan
Abstract:
As Low Earth Orbit (LEO) become mega-constellations critical infrastructure, attacks targeting them have grown in number and range. The security analysis of LEO constellations faces a fundamental paradigm gap: traditional topology-centric methods fail to capture systemic risks arising from dynamic load imbalances and high-order dependencies, which can transform localized failures into network-wide cascades. To address this, we propose HYDRA, a hypergraph-based dynamic risk analysis framework. Its core is a novel metric, Hyper-Bridge Centrality (HBC), which quantifies node criticality via a load-to-redundancy ratio within dependency structures. A primary challenge to resilience: the most critical vulnerabilities are not in the densely connected satellite core, but in the seemingly marginal ground-space interfaces. These are the system's "Black Swan" nodes--topologically peripheral yet structurally lethal. We validate this through extensive simulations using realistic StarLink TLE data and population-based gravity model. Experiments demonstrate that HBC consistently outperforms traditional metrics, identifying critical failure points that surpass the structural damage potential of even betweenness centrality. This work shifts the security paradigm from connectivity to structural stress, demonstrating that securing the network edge is paramount and necessitates a fundamental redesign of redundancy strategies.
Authors:Hema Karnam Surendrababu, Nithin Nagaraj
Abstract:
Machine Learning (ML) models, including Large Language Models (LLMs), are characterized by a range of system-level attributes such as security and reliability. Recent studies have demonstrated that ML models are vulnerable to multiple forms of security violations, among which backdoor data-poisoning attacks represent a particularly insidious threat, enabling unauthorized model behavior and systematic misclassification. In parallel, deficiencies in model reliability can manifest as hallucinations in LLMs, leading to unpredictable outputs and substantial risks for end users. In this work on Dependable Artificial Intelligence with Reliability and Security (DAIReS), we propose a novel unified approach based on Syndrome Decoding for the detection of both security and reliability violations in learning-based systems. Specifically, we adapt the syndrome decoding approach to the NLP sentence-embedding space, enabling the discrimination of poisoned and non-poisoned samples within ML training datasets. Additionally, the same methodology can effectively detect hallucinated content due to self referential meta explanation tasks in LLMs.
Authors:Mona Rajhans, Vishal Khawarey
Abstract:
Machine learning (ML) models are increasingly deployed in cybersecurity applications such as phishing detection and network intrusion prevention. However, these models remain vulnerable to adversarial perturbations small, deliberate input modifications that can degrade detection accuracy and compromise interpretability. This paper presents an empirical study of adversarial robustness and explainability drift across two cybersecurity domains phishing URL classification and network intrusion detection. We evaluate the impact of L (infinity) bounded Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) perturbations on model accuracy and introduce a quantitative metric, the Robustness Index (RI), defined as the area under the accuracy perturbation curve. Gradient based feature sensitivity and SHAP based attribution drift analyses reveal which input features are most susceptible to adversarial manipulation. Experiments on the Phishing Websites and UNSW NB15 datasets show consistent robustness trends, with adversarial training improving RI by up to 9 percent while maintaining clean-data accuracy. These findings highlight the coupling between robustness and interpretability degradation and underscore the importance of quantitative evaluation in the design of trustworthy, AI-driven cybersecurity systems.
Authors:Sergio Angulo Cosín, Javier Junquera-Sánchez, Carlos Hernando-Ramiro, José-Antonio Gómez-Sánchez
Abstract:
GNSSs are vulnerable to attacks of two kinds: jamming (i.e. denying access to the signal) and spoofing (i.e. impersonating a legitimate satellite). These attacks have been extensively studied, and we have a myriad of countermeasures to mitigate them. In this paper we expose a new type of attack: SpAmming, which combines both approaches to achieve the same effects in a more subtle way. Exploiting the CDMA multiplexing present in most GNSSs, and through a spoofing attack, this approach leads the receiver to lose access to the signal of a legitimate satellite, which would be equivalent to a denial of service; but in this case the existing countermeasures against jamming or spoofing would not allow safeguarding its effectiveness, as it is neither of them. An experimental proof-of-concept is presented in which its impact is evaluated as a function of the previous state of the receiver. Using an SDR-based system developed at the Space Security Centre, the attack is executed against a cold-started receiver, a warm-started receiver, and a receiver that has already acquired the PVT solution and is navigating. Different attack configurations are also tested, starting from a raw emission of the false signal, to surgical Doppler effect configuration, code offset, etc. Although it is shown to be particularly successful against cold-started receivers, the results show that it is also effective in other scenarios, especially if accompanied by other attacks. We will conclude the article by outlining possible countermeasures to detect and, eventually, counteract it; and possible avenues of research to better understand its impact, especially for authenticated services such as OSNMA, and to characterize it in order to improve the response to similar attacks.
Authors:Yi Liang, Jinguang Han
Abstract:
Despite the advantages of decentralization and immutability, blockchain technology faces significant scalability and throughput limitations, which has prompted the exploration of off-chain solutions like payment channels. Adaptor signatures have been considered a promising primitive for constructing such channels due to their support for atomicity, offering an alternative to traditional hash-timelock contracts. However, standard adaptor signatures may reveal signer identity, raising potential privacy concerns. While ring signatures can mitigate this issue by providing anonymity, they often introduce high communication overhead, particularly in multi-account payment settings commonly used in UTXO-based blockchains like Monero. To address these limitations, we propose a Linkable Threshold Ring Adaptor Signature (LTRAS) scheme, which integrates the conditional binding of adaptor signatures, the multi-account payment of threshold ring signatures, and the linkability for preventing double-spending. The formal definition, security model and concrete construction of LTRAS are provided. We also analyze its security and evaluate its performance through theoretical analysis and experimental implementation. Experimental results demonstrate that our scheme achieve significantly lower computation and communication overhead compared to existing schemes in large ring sizes and multi-account payment scenarios. Finally, we discuss its application in cross-chain atomic swaps, demonstrating its potential for enhancing privacy and efficiency in blockchain transactions.
Authors:Xiao Zhang, Juan Ignacio Ibañez, Jiahua Xu
Abstract:
Crypto-assets are a main segment of electronic markets, with growing trade volume and market share, yet there's no unified and comprehensive asset level taxonomy framework. This paper develops a multidimensional taxonomy for crypto-assets that connects technical design to market structure and regulation. Building on established taxonomy guideline and existing models, we derive dimensions from theory, regulatory frameworks, and case studies. We then map top 100 assets within the structure and provide several detailed case studies. The taxonomy covers technology standard, centralisation of critical resources, asset function, legal classification and mechanism designs of minting, yield, redemption. The asset mapping and case studies reveal recurring design patterns, capture features of edge cases that sit on boundaries of current categorisations, and document centralised control of nominal decentralised assets. This paper provides framework for systematic study for crypto markets, supports regulators in assessing token risks, and offers investors and digital platform designers a tool to compare assets when building or participate in electronic markets.
Authors:Joachim Schaeffer, Arjun Khandelwal, Tyler Tracy
Abstract:
Future AI deployments will likely be monitored for malicious behaviour. The ability of these AIs to subvert monitors by adversarially selecting against them - attack selection - is particularly concerning. To study this, we let a red team create attack policies that attempt to insert attacks into code without being caught by a monitor in the concentrated BigCodeBench backdooring setting. We decompose attack selection into two problems: mapping attacks to a quality score and mapping quality scores to submission probabilities. We frame attack selection as a classification problem and show that safety is significantly more sensitive to FPR than TPR. We find that prompting the attacker model to reason about the monitor while being cautious with attack selection reduces safety from a baseline of 99% to 59% at 0.5% auditing budget, emphasizing that eliciting attack selection capabilities of models is vital to avoid overly optimistic safety scores in control evaluations.
Authors:Andrew Draganov, Tolga H. Dur, Anandmayi Bhongade, Mary Phuong
Abstract:
We present a data poisoning attack -- Phantom Transfer -- with the property that, even if you know precisely how the poison was placed into an otherwise benign dataset, you cannot filter it out. We achieve this by modifying subliminal learning to work in real-world contexts and demonstrate that the attack works across models, including GPT-4.1. Indeed, even fully paraphrasing every sample in the dataset using a different model does not stop the attack. We also discuss connections to steering vectors and show that one can plant password-triggered behaviours into models while still beating defences. This suggests that data-level defences are insufficient for stopping sophisticated data poisoning attacks. We suggest that future work should focus on model audits and white-box security methods.
Authors:Vishruti Kakkad, Paul Chung, Hanan Hibshi, Maverick Woo
Abstract:
An exponential growth of Machine Learning and its Generative AI applications brings with it significant security challenges, often referred to as Adversarial Machine Learning (AML). In this paper, we conducted two comprehensive studies to explore the perspectives of industry professionals and students on different AML vulnerabilities and their educational strategies. In our first study, we conducted an online survey with professionals revealing a notable correlation between cybersecurity education and concern for AML threats. For our second study, we developed two CTF challenges that implement Natural Language Processing and Generative AI concepts and demonstrate a poisoning attack on the training data set. The effectiveness of these challenges was evaluated by surveying undergraduate and graduate students at Carnegie Mellon University, finding that a CTF-based approach effectively engages interest in AML threats. Based on the responses of the participants in our research, we provide detailed recommendations emphasizing the critical need for integrated security education within the ML curriculum.
Authors:Nikolas Melissaris, Jiayi Xu, Antigoni Polychroniadou, Akira Takahashi, Chenkai Weng
Abstract:
Gradient boosted decision trees, particularly XGBoost, are among the most effective methods for tabular data. As deployment in sensitive settings increases, cryptographic guarantees of model integrity become essential. We present ZKBoost, the first zero-knowledge proof of training (zkPoT) protocol for XGBoost, enabling model owners to prove correct training on a committed dataset without revealing data or parameters. We make three key contributions: (1) a fixed-point XGBoost implementation compatible with arithmetic circuits, enabling instantiation of efficient zkPoT, (2) a generic template of zkPoT for XGBoost, which can be instantiated with any general-purpose ZKP backend, and (3) vector oblivious linear evaluation (VOLE)-based instantiation resolving challenges in proving nonlinear fixed-point operations. Our fixed-point implementation matches standard XGBoost accuracy within 1\% while enabling practical zkPoT on real-world datasets.
Authors:Hoang Long Do, Nasrin Sohrabi, Muneeb Ul Hassan
Abstract:
Large language models (LLMs) have been widely adopted in modern software development lifecycles, where they are increasingly used to automate and assist code generation, significantly improving developer productivity and reducing development time. In the blockchain domain, developers increasingly rely on LLMs to generate and maintain smart contracts, the immutable, self-executing components of decentralized applications. Because deployed smart contracts cannot be modified, correctness and security are paramount, particularly in high-stakes domains such as finance and governance. Despite this growing reliance, the security implications of LLM-generated smart contracts remain insufficiently understood. In this work, we conduct a systematic security analysis of Solidity smart contracts generated by state-of-the-art LLMs, including ChatGPT, Gemini, and Sonnet. We evaluate these contracts against a broad set of known smart contract vulnerabilities to assess their suitability for direct deployment in production environments. Our extensive experimental study shows that, despite their syntactic correctness and functional completeness, LLM-generated smart contracts frequently exhibit severe security flaws that could be exploited in real-world settings. We further analyze and categorize these vulnerabilities, identifying recurring weakness patterns across different models. Finally, we discuss practical countermeasures and development guidelines to help mitigate these risks, offering actionable insights for both developers and researchers. Our findings aim to support safe integration of LLMs into smart contract development workflows and to strengthen the overall security of the blockchain ecosystem against future security failures.
Authors:Bibhabasu Mandal, Sagnik Nandy
Abstract:
In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the available data are summarized solely through the degrees of the nodes in the network. We adopt the $β$ model, which is the prototypical statistical model adopted for this form of aggregated relational information, and study the problem of minimax-optimal parameter estimation under both local and central differential privacy constraints. We establish finite sample minimax lower bounds that characterize the precise dependence of the estimation risk on the network size and the privacy parameters, and we propose simple estimators that achieve these bounds up to constants and logarithmic factors under both local and central differential privacy frameworks. Our results provide the first comprehensive finite sample characterization of privacy utility trade offs for parameter estimation in $β$ models, addressing the classical graph case and extending the analysis to higher order hypergraph models. We further demonstrate the effectiveness of our methods through experiments on synthetic data and a real world communication network.
Authors:Anjali Padmanabhan, Danya Arun Bindhu, Nujoom Sageer Karat, Shanuja Sasi
Abstract:
Decentralized Pliable Index Coding (DPIC) problem addresses efficient information exchange in distributed systems where clients communicate among themselves without a central server. An important consideration in DPIC is the heterogeneity of side-information and demand sizes. Although many prior works assume homogeneous settings with identical side-information cardinality and single message demands, these assumptions limit real-world applicability where clients typically possess unequal amounts of prior information. In this paper, we study DPIC problem under heterogeneous side-information cardinalities. We propose a transmission scheme that coordinates client broadcasts to maximize coding efficiency while ensuring that each client achieves a common target level $T$. In addition, we impose a strict security constraint that no client acquires more than the target $T$ number of messages, guaranteeing that each client ends up with exactly $T$ messages. We analyze the communication cost incurred by the proposed scheme under this security constraint.
Authors:Olha Jurečková, Martin Jureček
Abstract:
Malware detection and classification into families are critical tasks in cybersecurity, complicated by the continual evolution of malware to evade detection. This evolution introduces concept drift, in which the statistical properties of malware features change over time, reducing the effectiveness of static machine learning models. Understanding and explaining this drift is essential for maintaining robust and trustworthy malware detectors. In this paper, we propose an interpretable approach to concept drift detection. Our method uses a rule-based classifier to generate human-readable descriptions of both original and evolved malware samples belonging to the same malware family. By comparing the resulting rule sets using a similarity function, we can detect and quantify concept drift. Crucially, this comparison also identifies the specific features and feature values that have changed, providing clear explanations of how malware has evolved to bypass detection. Experimental results demonstrate that the proposed method not only accurately detects drift but also provides actionable insights into the behavior of evolving malware families, supporting both detection and threat analysis.
Authors:Danielle Jean Hanson, Jeremy Straub
Abstract:
Cyber insurance, which protects insured organizations against financial losses from cyberattacks and data breaches, can be difficult and expensive to obtain for many organizations. These difficulties stem from insurers difficulty in understanding and accurately assessing the risks that they are undertaking. Cybersecurity audits, which are already implemented in many organizations for compliance and other purposes, present a potential solution to this challenge. This paper provides a structured review and analysis of prior work in this area, analysis of the challenges and potential benefits that cyber audits provide and recommendations for the use of cyber audits to reduce cyber insurance costs and improve its availability.
Authors:Nirab Hossain, Pablo Moriano
Abstract:
Modern vehicles rely on electronic control units (ECUs) interconnected through the Controller Area Network (CAN), making in-vehicle communication a critical security concern. Machine learning (ML)-based intrusion detection systems (IDS) are increasingly deployed to protect CAN traffic, yet their robustness against adversarial manipulation remains largely unexplored. We present a systematic adversarial evaluation of CAN IDS using the ROAD dataset, comparing four shallow learning models with a deep neural network-based detector. Using protocol-compliant, payload-level perturbations generated via FGSM, BIM and PGD, we evaluate adversarial effects on both benign and malicious CAN frames. While all models achieve strong baseline performance under benign conditions, adversarial perturbations reveal substantial vulnerabilities. Although shallow and deep models are robust to false-alarm induction, with the deep neural network (DNN) performing best on benign traffic, all architectures suffer significant increases in missed attacks. Notably, under gradient-based attacks, the shallow model extra trees (ET) demonstrates improved robustness to missed-attack induction compared to the other models. Our results demonstrate that adversarial manipulation can simultaneously trigger false alarms and evade detection, underscoring the need for adversarial robustness evaluation in safety-critical automotive IDS.
Authors:Saurabh Anand, Shubham Malaviya, Manish Shukla, Sachin Lodha
Abstract:
Training AI models in cybersecurity with help of vast datasets offers significant opportunities to mimic real-world behaviors effectively. However, challenges like data drift and scarcity of labelled data lead to frequent updates of models and the risk of overfitting. To address these challenges, we used parameter-efficient fine-tuning techniques for pre-trained language models wherein we combine compacters with various layer freezing strategies. To enhance the capabilities of these pre-trained language models, in this work we introduce two strategies that use large language models. In the first strategy, we utilize large language models as data-labelling tools wherein they generate labels for unlabeled data. In the second strategy, large language modes are utilized as fallback mechanisms for predictions having low confidence scores. We perform comprehensive experimental analysis on the proposed strategies on different downstream tasks specific to cybersecurity domain. We empirically demonstrate that by combining parameter-efficient pre-trained models with large language models, we can improve the reliability and robustness of models, making them more suitable for real-world cybersecurity applications.
Authors:Pedro Camponês, Hugo Pereira, Adrian Persaud, Kevin Gallagher, Santiago Torres-Arias
Abstract:
In traditional access control policies, every access granted and administrative account introduces an additional vulnerability, as a corruption of a high-privilege user can compromise several sensitive files. Privocracy is an access control mechanism that minimizes the need to attribute high privileges by triggering a secure e-voting procedure to run commands that require using sensitive resources. With Privocracy an organization can distribute trust in resource access, minimizing the system vulnerabilities from single points of failure, all while maintaining the high flexibility of discretionary access control policies. The Privocracy voting mechanism achieves everlasting privacy, ensuring votes remain confidential regardless of an adversary's computational power, while addressing the dependability requirements of a practical and secure system. The procedure incorporates useful features such as vote delegation to reduce voter fatigue, rapid voting rounds to enable quick action during emergencies, and selective vote auditing for application-level accountability. Our experimental results demonstrate that Privocracy processes votes efficiently and can be deployed on commodity hardware.
Authors:Mehul Goenka, Tejas Pathak, Siddharth Asthana
Abstract:
The global economy is entering the era of Agentic Commerce, where autonomous agents can discover services, negotiate prices, and transact value. However adoption towards agentic commerce faces a foundational trust gap: current systems are built for direct human interactions rather than agent-driven operations. It lacks core primitives across three critical stages of agentic transactions. First, Task Delegation lacks means to translate user intent into defined scopes, discover appropriate agents, and securely authorize actions. Second, Payment Settlement for tasks is processed before execution, lacking verifiable evidence to validate the agent's work. Third, Audit Mechanisms fail to capture the full transaction lifecycle, preventing clear accountability for disputes. While emerging standards address fragments of this trust gap, there still remains a critical need for a unified infrastructure that binds the entire transaction lifecycle. To resolve this gap, we introduce TessPay, a unified infrastructure that replaces implicit trust with a 'Verify-then-Pay' architecture. It is a two plane architecture separating control and verification from settlement. TessPay operationalizes trust across four distinct stages: Before execution, agents are anchored in a canonical registry and user intent is captured as verifiable mandates, enabling stakeholder accountability. During execution, funds are locked in escrow while the agent executes the task and generates cryptographic evidence (TLS Notary, TEE etc.) to support Proof of Task Execution (PoTE). At settlement, the system verifies this evidence and releases funds only when the PoTE satisfies verification predicates; modular rail adapters ensure this PoTE-gated escrow remains chain-agnostic across heterogeneous payment rails. After settlement, TessPay preserves a tamper-evident audit trail to enable clear accountability for dispute resolution.
Authors:David Ribeiro Alves, Vishnu Patankar, Matheus Pereira, Jamie Stephens, Nima Vaziri, Sreeram Kannan
Abstract:
EigenAI is a verifiable AI platform built on top of the EigenLayer restaking ecosystem. At a high level, it combines a deterministic large-language model (LLM) inference engine with a cryptoeconomically secured optimistic re-execution protocol so that every inference result can be publicly audited, reproduced, and, if necessary, economically enforced. An untrusted operator runs inference on a fixed GPU architecture, signs and encrypts the request and response, and publishes the encrypted log to EigenDA. During a challenge window, any watcher may request re-execution through EigenVerify; the result is then deterministically recomputed inside a trusted execution environment (TEE) with a threshold-released decryption key, allowing a public challenge with private data. Because inference itself is bit-exact, verification reduces to a byte-equality check, and a single honest replica suffices to detect fraud. We show how this architecture yields sovereign agents -- prediction-market judges, trading bots, and scientific assistants -- that enjoy state-of-the-art performance while inheriting security from Ethereum's validator base.
Authors:Vinayak Jain, Sneha Sudhakaran, Saranyan Senthivel
Abstract:
The reliability of cyber forensic evidence acquisition is strongly influenced by the underlying operating systems, Windows, macOS, and Linux - due to inherent variations in file system structures, encryption protocols, and forensic tool compatibility. Disk forensics, one of the most widely used techniques in digital investigations, faces distinct obstacles on each platform. Windows, with its predominantly NTFS and FAT file systems, typically supports reliable disk imaging and analysis through established tools such as FTK Imager and Autopsy/Sleuth Kit. However, encryption features frequently pose challenges to evidence acquisition. Conversely, Linux environments, which rely on file systems like ext4 and XFS, generally offer greater transparency, yet the transient nature of log retention often complicates forensic analysis. In instances where anti-forensic strategies such as encryption and compression render traditional disk forensics insufficient, memory forensics becomes crucial. While memory forensic methodologies demonstrate robustness across Windows and Linux platforms forms through frameworks like Volatility, platform-specific difficulties persist. Memory analysis on Linux systems benefits from tools like LiME, snapshot utilities, and dd for memory acquisition; nevertheless, live memory acquisition on Linux can still present challenges. This research systematically assesses both disk and memory forensic acquisition techniques across samples representing Windows and Linux systems. By identifying effective combinations of forensic tools and configurations tailored to each operating system, the study aims to improve the accuracy and reliability of evidence collection. It further evaluates current forensic tools and highlights a persistent gap: consistently assuring forensic input reliability and footprint integrity.
Authors:Joshua J Bon, James Bailie, Judith Rousseau, Christian P Robert
Abstract:
We propose a novel framework for measuring privacy from a Bayesian game-theoretic perspective. This framework enables the creation of new, purpose-driven privacy definitions that are rigorously justified, while also allowing for the assessment of existing privacy guarantees through game theory. We show that pure and probabilistic differential privacy are special cases of our framework, and provide new interpretations of the post-processing inequality in this setting. Further, we demonstrate that privacy guarantees can be established for deterministic algorithms, which are overlooked by current privacy standards.
Authors:Charles Westphal, Keivan Navaie, Fernando E. Rosas
Abstract:
Fine-tuned LLMs can covertly encode prompt secrets into outputs via steganographic channels. Prior work demonstrated this threat but relied on trivially recoverable encodings. We formalize payload recoverability via classifier accuracy and show previous schemes achieve 100\% recoverability. In response, we introduce low-recoverability steganography, replacing arbitrary mappings with embedding-space-derived ones. For Llama-8B (LoRA) and Ministral-8B (LoRA) trained on TrojanStego prompts, exact secret recovery rises from 17$\rightarrow$30\% (+78\%) and 24$\rightarrow$43\% (+80\%) respectively, while on Llama-70B (LoRA) trained on Wiki prompts, it climbs from 9$\rightarrow$19\% (+123\%), all while reducing payload recoverability. We then discuss detection. We argue that detecting fine-tuning-based steganographic attacks requires approaches beyond traditional steganalysis. Standard approaches measure distributional shift, which is an expected side-effect of fine-tuning. Instead, we propose a mechanistic interpretability approach: linear probes trained on later-layer activations detect the secret with up to 33\% higher accuracy in fine-tuned models compared to base models, even for low-recoverability schemes. This suggests that malicious fine-tuning leaves actionable internal signatures amenable to interpretability-based defenses.
Authors:Md Zahurul Haque, Md. Hafizur Rahman, Yeahyea Sarker
Abstract:
Understanding user behavior is essential for improving digital experiences, optimizing business conversions, and mitigating threats like account takeovers, fraud, and bot attacks. Most platforms separate product analytics and security, creating fragmented visibility and delayed threat detection. Trackly, a scalable SaaS platform, unifies comprehensive user behavior analytics with real time, rule based anomaly detection. It tracks sessions, IP based geo location, device browser fingerprints, and granular events such as page views, add to cart, and checkouts. Suspicious activities logins from new devices or locations, impossible travel (Haversine formula), rapid bot like actions, VPN proxy usage, or multiple accounts per IP are flagged via configurable rules with weighted risk scoring, enabling transparent, explainable decisions. A real time dashboard provides global session maps, DAU MAU, bounce rates, and session durations. Integration is simplified with a lightweight JavaScript SDK and secure REST APIs. Implemented on a multi tenant microservices stack (ASP.NET Core, MongoDB, RabbitMQ, Next.js), Trackly achieved 98.1% accuracy, 97.7% precision, and 2.25% false positives on synthetic datasets, proving its efficiency for SMEs and ecommerce.
Authors:Yilong Huang, Songze Li
Abstract:
Diffusion-based face swapping achieves state-of-the-art performance, yet it also exacerbates the potential harm of malicious face swapping to violate portraiture right or undermine personal reputation. This has spurred the development of proactive defense methods. However, existing approaches face a core trade-off: large perturbations distort facial structures, while small ones weaken protection effectiveness. To address these issues, we propose FaceDefense, an enhanced proactive defense framework against diffusion-based face swapping. Our method introduces a new diffusion loss to strengthen the defensive efficacy of adversarial examples, and employs a directional facial attribute editing to restore perturbation-induced distortions, thereby enhancing visual imperceptibility. A two-phase alternating optimization strategy is designed to generate final perturbed face images. Extensive experiments show that FaceDefense significantly outperforms existing methods in both imperceptibility and defense effectiveness, achieving a superior trade-off.
Authors:Ivan K. Tung, Yu Xiang Shi, Alex Chien, Wenkai Liu, Lawrence Zheng
Abstract:
Creating attack paths for cyber defence exercises requires substantial expert effort. Existing automation requires vulnerability graphs or exploit sets curated in advance, limiting where it can be applied. We present AEGIS, a system that generates attack paths using LLMs, white-box access, and Monte Carlo Tree Search over real exploit execution. LLM-based search discovers exploits dynamically without pre-existing vulnerability graphs, while white-box access enables validating exploits in isolation before committing to attack paths. Evaluation at CIDeX 2025, a large-scale exercise spanning 46 IT hosts, showed that AEGIS-generated paths are comparable to human-authored scenarios across four dimensions of training experience (perceived learning, engagement, believability, challenge). Results were measured with a validated questionnaire extensible to general simulation-based training. By automating exploit chain discovery and validation, AEGIS reduces scenario development from months to days, shifting expert effort from technical validation to scenario design.
Authors:Tanusree Debi, Wentian Zhu
Abstract:
Large language model (LLM) based agents are increasingly used to automate financial transactions, yet their reliance on contextual reasoning exposes payment systems to prompt-driven manipulation. The Agent Payments Protocol (AP2) aims to secure agent-led purchases through cryptographically verifiable mandates, but its practical robustness remains underexplored. In this work, we perform an AI red-teaming evaluation of AP2 and identify vulnerabilities arising from indirect and direct prompt injection. We introduce two attack techniques, the Branded Whisper Attack and the Vault Whisper Attack which manipulate product ranking and extract sensitive user data. Using a functional AP2 based shopping agent built with Gemini-2.5-Flash and the Google ADK framework, we experimentally validate that simple adversarial prompts can reliably subvert agent behavior. Our findings reveal critical weaknesses in current agentic payment architectures and highlight the need for stronger isolation and defensive safeguards in LLM-mediated financial systems.
Authors:Pedro H. Barcha Correia, Ryan W. Achjian, Diego E. G. Caetano de Oliveira, Ygor Acacio Maria, Victor Takashi Hayashi, Marcos Lopes, Charles Christian Miers, Marcos A. Simplicio
Abstract:
The rapid advancement and widespread adoption of generative artificial intelligence (GenAI) and large language models (LLMs) has been accompanied by the emergence of new security vulnerabilities and challenges, such as jailbreaking and other prompt injection attacks. These maliciously crafted inputs can exploit LLMs, causing data leaks, unauthorized actions, or compromised outputs, for instance. As both offensive and defensive prompt injection techniques evolve quickly, a structured understanding of mitigation strategies becomes increasingly important. To address that, this work presents the first systematic literature review on prompt injection mitigation strategies, comprehending 88 studies. Building upon NIST's report on adversarial machine learning, this work contributes to the field through several avenues. First, it identifies studies beyond those documented in NIST's report and other academic reviews and surveys. Second, we propose an extension to NIST taxonomy by introducing additional categories of defenses. Third, by adopting NIST's established terminology and taxonomy as a foundation, we promote consistency and enable future researchers to build upon the standardized taxonomy proposed in this work. Finally, we provide a comprehensive catalog of the reviewed prompt injection defenses, documenting their reported quantitative effectiveness across specific LLMs and attack datasets, while also indicating which solutions are open-source and model-agnostic. This catalog, together with the guidelines presented herein, aims to serve as a practical resource for researchers advancing the field of adversarial machine learning and for developers seeking to implement effective defenses in production systems.
Authors:Abylay Satybaldy, Kamil Tylinski, Jiahua Xu
Abstract:
Decentralized Identifiers (DIDs) are increasingly deployed on distributed ledgers, yet systematic cross-platform evidence on their operational behavior remains limited. We present an empirical benchmarking study of three prominent ledger-based DID methods - Ethereum, Hedera, and XRP Ledger - using reference Software Development Kits (SDKs) under a unified experimental setup. We measure latency, transaction cost, and on-chain metadata exposure, normalizing latency by each platform's block or consensus interval and cost by its native value transfer fee. Privacy leakage is quantified using a Metadata-Leakage Score (MLS), an entropy-based measure expressed in bits per operation. Our results reveal distinct architectural trade-offs. Ethereum enables near-instant, off-chain DID creation, but incurs the highest latency and cost for on-chain lifecycle operations. XRPL delivers deterministic and stable latency with fixed, low fees, yet exhibits higher metadata leakage due to more verbose transaction payloads. Hedera achieves the lowest on-chain latency and low fees with minimal metadata leakage, while occasional variance arises from SDK-side processing and confirmation pipelines. Overall, the findings show that ledger architecture and SDK workflows play a major role in shaping DID latency, cost, and metadata exposure, complementing the effects of the underlying consensus mechanism. These results provide evidence-based insights to support informed selection and configuration of DID systems under performance and privacy constraints.
Authors:David Schmidt, Sebastian Schrittwieser, Edgar Weippl
Abstract:
Dependency management systems are a critical component in software development, enabling projects to incorporate existing functionality efficiently. However, misconfigurations and malicious actors in these systems pose severe security risks, leading to supply chain attacks. Despite the widespread use of smartphone apps, the security of dependency management systems in the iOS software supply chain has received limited attention. In this paper, we focus on CocoaPods, one of the most widely used dependency management systems for iOS app development, but also examine the security of Carthage and Swift Package Manager (SwiftPM). We demonstrate that iOS apps expose internal package names and versions. Attackers can exploit this leakage to register previously unclaimed dependencies in CocoaPods, enabling remote code execution (RCE) on developer machines and build servers. Additionally, we show that attackers can compromise dependencies by reclaiming abandoned domains and GitHub URLs. Analyzing a dataset of 9,212 apps, we quantify how many apps are susceptible to these vulnerabilities. Further, we inspect the use of vulnerable dependencies within public GitHub repositories. Our findings reveal that popular apps disclose internal dependency information, enabling dependency confusion attacks. Furthermore, we show that hijacking a single CocoaPod library through an abandoned domain could compromise 63 iOS apps, affecting millions of users. Finally, we compare iOS dependency management systems with Cargo, Go modules, Maven, npm, and pip to discuss mitigation strategies for the identified threats.
Authors:Philipp Mao, Li Shi, Marcel Busch, Mathias Payer
Abstract:
Mobile devices rely on Trusted Execution Environments (TEEs) to execute security-critical code and protect sensitive assets. This security-critical code is modularized in components known as Trusted Applications (TAs). Vulnerabilities in TAs can compromise the TEE and, thus, the entire system. However, the closed-source nature and fragmentation of mobile TEEs severely hinder dynamic analysis of TAs, limiting testing efforts to mostly static analyses. This paper presents TÄMU, a rehosting platform enabling dynamic analysis of TAs, specifically fuzzing and debugging, by interposing their execution at the API layer. To scale to many TAs across different TEEs, TÄMU leverages the standardization of TEE APIs, driven by the GlobalPlatform specifications. For the remaining TEE-specific APIs not shared across different TEEs, TÄMU introduces the notion of greedy high-level emulation, a technique that allows prioritizing manual rehosting efforts based on the potential coverage gain during fuzzing. We implement TÄMU and use it to emulate 67 TAs across four TEEs. Our fuzzing campaigns yielded 17 zero-day vulnerabilities across 11 TAs. These results indicate a deficit of dynamic analysis capabilities across the TEE ecosystem, where not even vendors with source code unlocked these capabilities for themselves. TÄMU promises to close this gap by bringing effective and practical dynamic analysis to the mobile TEE domain.
Authors:Mohamed Amine Legheraba, Nour Rachdi, Maria Gradinariu Potop-Butucaru, Sébastien Tixeuil
Abstract:
Recently, a novel peer sampling protocol, Elevator, was introduced to construct network topologies tailored for emerging decentralized applications such as federated learning and blockchain. Elevator builds hub-based topologies in a fully decentralized manner, randomly selecting hubs among participating nodes. These hubs, acting as central nodes connected to the entire network, can be leveraged to accelerate message dissemination. Simulation results have shown that Elevator converges rapidly (within 3--4 cycles) and exhibits robustness against crash failures and churn. However, its resilience to Byzantine adversaries has not been investigated. In this work, we provide the first evaluation of Elevator under Byzantine adversaries and show that even a small fraction (2%) of Byzantine nodes is sufficient to subvert the network. As a result, we introduce LIFT, a new protocol that extends Elevator by employing a cryptographically secure pseudo-random number generator (PRNG) for hub selection, thereby mitigating Byzantine manipulation. In contrast, LIFT withstands adversarial infiltration and remains robust with up to 10% Byzantine nodes. These results highlight the necessity of secure randomness in decentralized hub formation and position LIFT as a more reliable building block for Byzantine-resilient decentralized systems.
Authors:Holly Trikilis, Pasindu Marasinghe, Fariza Rashid, Suranga Seneviratne
Abstract:
Phishing continues to be one of the most prevalent attack vectors, making accurate classification of phishing URLs essential. Recently, large language models (LLMs) have demonstrated promising results in phishing URL detection. However, their reasoning capabilities that enabled such performance remain underexplored. To this end, in this paper, we propose a Least-to-Most prompting framework for phishing URL detection. In particular, we introduce an "answer sensitivity" mechanism that guides Least-to-Most's iterative approach to enhance reasoning and yield higher prediction accuracy. We evaluate our framework using three URL datasets and four state-of-the-art LLMs, comparing against a one-shot approach and a supervised model. We demonstrate that our framework outperforms the one-shot baseline while achieving performance comparable to that of the supervised model, despite requiring significantly less training data. Furthermore, our in-depth analysis highlights how the iterative reasoning enabled by Least-to-Most, and reinforced by our answer sensitivity mechanism, drives these performance gains. Overall, we show that this simple yet powerful prompting strategy consistently outperforms both one-shot and supervised approaches, despite requiring minimal training or few-shot guidance. Our experimental setup can be found in our Github repository github.sydney.edu.au/htri0928/least-to-most-phishing-detection.
Authors:Shun Takagi, Seng Pei Liew
Abstract:
Shuffling is a powerful way to amplify privacy of a local randomizer in private distributed data analysis, but existing analyses mostly treat the local differential privacy (DP) parameter $\varepsilon_0$ as the only knob and give generic upper bounds that can be loose and do not even characterize how shuffling amplifies privacy for basic mechanisms such as the Gaussian mechanism. We revisit the privacy blanket bound of Balle et al. (the blanket divergence) and develop an asymptotic analysis that applies to a broad class of local randomizers under mild regularity assumptions, without requiring pure local DP. Our key finding is that the leading term of the blanket divergence depends on the local mechanism only through a single scalar parameter $χ$, which we call the shuffle index. By applying this asymptotic analysis to both upper and lower bounds, we obtain a tight band for $δ_n$ in the shuffled mechanism's $(\varepsilon_n,δ_n)$-DP guarantee. Moreover, we derive a simple structural necessary and sufficient condition on the local randomizer under which the blanket-divergence-based upper and lower bounds coincide asymptotically. $k$-RR families with $k\ge3$ satisfy this condition, while for generalized Gaussian mechanisms the condition may not hold but the resulting band remains tight. Finally, we complement the asymptotic theory with an FFT-based algorithm for computing the blanket divergence at finite $n$, which offers rigorously controlled relative error and near-linear running time in $n$, providing a practical numerical analysis for shuffle DP.
Authors:Razvan Barbulescu, Mugurel Barcau, Vicentiu Pasol, George C. Turcas
Abstract:
In this work we study quantitative existence results for genus-$2$ curves over $\mathbb{Q}$ whose Jacobians have Mordell-Weil rank at least $1$ or $2$, ordering the curves by the naive height of their integral Weierstrass models. We use geometric techniques to show that asymptotically the Jacobians of almost all integral models with two rational points at infinity have rank $r \geq 1$. Since there are $\asymp X^{\frac{13}{2}}$ such models among the $X^7$ curves $y^2=f(x)$ of height $\leq X$, this yields a lower bound of logarithmic density $13/14$ for the subset of rank $r \geq 1$. We further present a large explicit subfamily where Jacobians have ranks $r \geq 2$, yielding an unconditional logarithmic density of at least $5/7$. Independently, we give a construction of genus-$2$ curves with split Jacobian and rank $2$, producing a subfamily of logarithmic density at least $ 2/21$. Finally, we analyze quadratic and biquadratic twist families in the split-Jacobian setting, obtaining a positive proportion of rank-$2$ twists. These results have implications for Regev's quantum algorithm in hyperelliptic curve cryptography.
Authors:Nazmul Islam, Mohammad Zulkernine
Abstract:
The rapid expansion of internet of things (IoT) devices have created a pervasive ecosystem where encrypted wireless communications serve as the primary privacy and security protection mechanism. While encryption effectively protects message content, packet metadata and statistics inadvertently expose device identities and user contexts. Various studies have exploited raw packet statistics and their visual representations for device fingerprinting and identification. However, these approaches remain confined to the spatial domain with limited feature representation. Therefore, this paper presents CONTEX-T, a novel framework that exploits contextual privacy vulnerabilities using spectral representation of encrypted wireless traffic for IoT device characterization. The experiments show that spectral analysis provides new and rich feature representation for covert reconnaissance attacks, revealing a complex and expanding threat landscape that would require robust countermeasures for IoT security management. CONTEXT-T first transforms raw packet length sequences into time-frequency spectral representations and then utilizes transformer-based spectral analysis for the device identification. We systematically evaluated multiple spectral representation techniques and transformer-based models across encrypted traffic samples from various IoT devices. CONTEXT-T effectively exploited privacy vulnerabilities and achieved device classification accuracy exceeding 99% across all devices while remaining completely passive and undetectable.
Authors:Jannik Albrecht, Ghassan Karame
Abstract:
While the literature features a number of proposals to defend against transaction manipulation attacks, existing proposals are still not integrated within large blockchains, such as Bitcoin, Ethereum, and Cardano. Instead, the user community opted to rely on more practical but ad-hoc solutions (such as Mempool.space) that aim at detecting censorship and transaction displacement attacks by auditing discrepancies in the mempools of so-called observers. In this paper, we precisely analyze, for the first time, the interplay between mempool auditing and the ability to detect censorship and transaction displacement attacks by malicious miners in Bitcoin and Ethereum. Our analysis shows that mempool auditing can result in mis-accusations against miners with a probability larger than 25% in some settings. On a positive note, however, we show that mempool auditing schemes can successfully audit the execution of any two transactions (with an overwhelming probability of 99.9%) if they are consistently received by all observers and sent at least 30 seconds apart from each other. As a direct consequence, our findings show, for the first time, that batch-order fair-ordering schemes can offer only strong fairness guarantees for a limited subset of transactions in real-world deployments.
Authors:Sneha Sudhakaran, Naresh Kshetri
Abstract:
In an era where cyber threats are rapidly evolving, the reliability of cyber forensic analysis has become increasingly critical for effective digital investigations and cybersecurity responses. AI agents are being adopted across digital forensic practices due to their ability to automate processes such as anomaly detection, evidence classification, and behavioral pattern recognition, significantly enhancing scalability and reducing investigation timelines. However, the characteristics that make AI indispensable also introduce notable risks. AI systems, often trained on biased or incomplete datasets, can produce misleading results, including false positives and false negatives, thereby jeopardizing the integrity of forensic investigations. This study presents a meticulous comparative analysis of the effectiveness of the most used AI agent, ChatGPT, and human forensic investigators in the realm of cyber forensic analysis. Our research reveals critical limitations within AI-driven approaches, demonstrating scenarios in which sophisticated or novel cyber threats remain undetected due to the rigid pattern-based nature of AI systems. Conversely, our analysis highlights the crucial role that human forensic investigators play in mitigating these risks. Through adaptive decision-making, ethical reasoning, and contextual understanding, human investigators effectively identify subtle anomalies and threats that may evade automated detection systems. To reinforce our findings, we conducted comprehensive reliability testing of forensic techniques using multiple cyber threat scenarios. These tests confirmed that while AI agents significantly improve the efficiency of routine analyses, human oversight remains crucial in ensuring accuracy and comprehensiveness of the results.
Authors:Mohammad Shamim Ahsan, Peng Liu
Abstract:
In the network security domain, due to practical issues -- including imbalanced data and heterogeneous legitimate network traffic -- adversarial attacks in machine learning-based NIDSs have been viewed as attack packets misclassified as benign. Due to this prevailing belief, the possibility of (maliciously) perturbed benign packets being misclassified as attack has been largely ignored. In this paper, we demonstrate that this is not only theoretically possible, but also a particular threat to NIDS. In particular, we uncover a practical cyberattack, FPR manipulation attack (FPA), especially targeting industrial IoT networks, where domain-specific knowledge of the widely used MQTT protocol is exploited and a systematic simple packet-level perturbation is performed to alter the labels of benign traffic samples without employing traditional gradient-based or non-gradient-based methods. The experimental evaluations demonstrate that this novel attack results in a success rate of 80.19% to 100%. In addition, while estimating impacts in the Security Operations Center, we observe that even a small fraction of false positive alerts, irrespective of different budget constraints and alert traffic intensities, can increase the delay of genuine alerts investigations up to 2 hr in a single day under normal operating conditions. Furthermore, a series of relevant statistical and XAI analyses is conducted to understand the key factors behind this remarkable success. Finally, we explore the effectiveness of the FPA packets to enhance models' robustness through adversarial training and investigate the changes in decision boundaries accordingly.
Authors:Qinghui Zhang, Xiaojun Chen, Yansong Zhang, Xudong Chen
Abstract:
Most existing secure neural network inference protocols based on secure multi-party computation (MPC) typically support at most four participants, demonstrating severely limited scalability. Liu et al. (USENIX Security'24) presented the first relatively practical approach by utilizing Shamir secret sharing with Mersenne prime fields. However, when processing deeper neural networks such as VGG16, their protocols incur substantial communication overhead, resulting in particularly significant latency in wide-area network (WAN) environments. In this paper, we propose a high-throughput and scalable MPC protocol for neural network inference against semi-honest adversaries in the honest-majority setting. The core of our approach lies in leveraging packed Shamir secret sharing (PSS) to enable parallel computation and reduce communication complexity. The main contributions are three-fold: i) We present a communication-efficient protocol for vector-matrix multiplication, based on our newly defined notion of vector-matrix multiplication-friendly random share tuples. ii) We design the filter packing approach that enables parallel convolution. iii) We further extend all non-linear protocols based on Shamir secret sharing to the PSS-based protocols for achieving parallel non-linear operations. Extensive experiments across various datasets and neural networks demonstrate the superiority of our approach in WAN. Compared to Liu et al. (USENIX Security'24), our scheme reduces the communication upto 5.85x, 11.17x, and 6.83x in offline, online and total communication overhead, respectively. In addition, our scheme is upto 1.59x, 2.61x, and 1.75x faster in offline, online and total running time, respectively.
Authors:Mohoshin Ara Tahera, Sabbir Rahman, Shuvalaxmi Dass, Sharif Ullah, Mahmoud Abouyessef
Abstract:
Federated real-time object detection using transformers in Intelligent Transportation Systems (ITS) faces three major challenges: (1) missing-class non-IID data heterogeneity from geographically diverse traffic environments, (2) latency constraints on edge hardware for high-capacity transformer models, and (3) privacy and security risks from untrusted client updates and centralized aggregation. We propose BlockSecRT-DETR, a BLOCKchain-SECured Real-Time Object DEtection TRansformer framework for ITS that provides a decentralized, token-efficient, and privacy-preserving federated training solution using RT-DETR transformer, incorporating a blockchain-secured update validation mechanism for trustworthy aggregation. In this framework, challenges (1) and (2) are jointly addressed through a unified client-side design that integrates RT-DETR training with a Token Engineering Module (TEM). TEM prunes low-utility tokens, reducing encoder complexity and latency on edge hardware, while aggregated updates mitigate non-IID data heterogeneity across clients. To address challenge (3), BlockSecRT-DETR incorporates a decentralized blockchain-secured update validation mechanism that enables tamper-proof, privacy-preserving, and trust-free authenticated model aggregation without relying on a central server. We evaluated the proposed framework under a missing-class Non-IID partition of the KITTI dataset and conducted a blockchain case study to quantify security overhead. TEM improves inference latency by 17.2% and reduces encoder FLOPs by 47.8%, while maintaining global detection accuracy (89.20% mAP@0.5). The blockchain integration adds 400 ms per round, and the ledger size remains under 12 KB due to metadata-only on-chain storage.
Authors:Anirudh Sekar, Mrinal Agarwal, Rachel Sharma, Akitsugu Tanaka, Jasmine Zhang, Arjun Damerla, Kevin Zhu
Abstract:
Prompt injection attacks have become an increasing vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such as emails or user-generated content to circumvent alignment safeguards and induce harmful or unintended outputs. Despite advances in alignment, even state-of-the-art LLMs remain broadly vulnerable to adversarial prompts, underscoring the urgent need for robust, productive, and generalizable detection mechanisms beyond inefficient, model-specific patches. In this work, we propose Zero-Shot Embedding Drift Detection (ZEDD), a lightweight, low-engineering-overhead framework that identifies both direct and indirect prompt injection attempts by quantifying semantic shifts in embedding space between benign and suspect inputs. ZEDD operates without requiring access to model internals, prior knowledge of attack types, or task-specific retraining, enabling efficient zero-shot deployment across diverse LLM architectures. Our method uses adversarial-clean prompt pairs and measures embedding drift via cosine similarity to capture subtle adversarial manipulations inherent to real-world injection attacks. To ensure robust evaluation, we assemble and re-annotate the comprehensive LLMail-Inject dataset spanning five injection categories derived from publicly available sources. Extensive experiments demonstrate that embedding drift is a robust and transferable signal, outperforming traditional methods in detection accuracy and operational efficiency. With greater than 93% accuracy in classifying prompt injections across model architectures like Llama 3, Qwen 2, and Mistral and a false positive rate of <3%, our approach offers a lightweight, scalable defense layer that integrates into existing LLM pipelines, addressing a critical gap in securing LLM-powered systems to withstand adaptive adversarial threats.
Authors:Nghia T. Le, Alan Ritter, Kartik Goyal
Abstract:
We demonstrate that while the current approaches for language model watermarking are effective for open-ended generation, they are inadequate at watermarking LM outputs for constrained generation tasks with low-entropy output spaces. Therefore, we devise SeqMark, a sequence-level watermarking algorithm with semantic differentiation that balances the output quality, watermark detectability, and imperceptibility. It improves on the shortcomings of the prevalent token-level watermarking algorithms that cause under-utilization of the sequence-level entropy available for constrained generation tasks. Moreover, we identify and improve upon a different failure mode we term region collapse, associated with prior sequence-level watermarking algorithms. This occurs because the pseudorandom partitioning of semantic space for watermarking in these approaches causes all high-probability outputs to collapse into either invalid or valid regions, leading to a trade-off in output quality and watermarking effectiveness. SeqMark instead, differentiates the high-probable output subspace and partitions it into valid and invalid regions, ensuring the even spread of high-quality outputs among all the regions. On various constrained generation tasks like machine translation, code generation, and abstractive summarization, SeqMark substantially improves watermark detection accuracy (up to 28% increase in F1) while maintaining high generation quality.
Authors:Sirui Shen, Zunchen Huang, Chenglu Jin
Abstract:
The modern integrated circuit ecosystem is increasingly reliant on third-party intellectual property integration, which introduces security risks, including hardware Trojans and security vulnerabilities. Addressing the resulting trust deadlock between IP vendors and system integrators without exposing proprietary designs requires novel privacy-preserving verification techniques. However, existing privacy-preserving hardware verification methods are all simulation-based and fail to offer formal guarantees. In this paper, we propose ZK-CEC, the first privacy-preserving framework for hardware formal verification. By combining formal verification and zero-knowledge proof (ZKP), ZK-CEC establishes a foundation for formally verifying IP correctness and security without compromising the confidentiality of the designs. We observe that existing zero-knowledge protocols for formal verification are designed to prove statements of public formulas. However, in a privacy-preserving verification context where the formula is secret, these protocols cannot prevent a malicious prover from forging the formula, thereby compromising the soundness of the verification. To address these gaps, we first propose a blueprint for proving the unsatisfiability of a secret design against a public constraint, which is widely applicable to proving properties in software, hardware, and cyber-physical systems. Based on the proposed blueprint, we construct ZK-CEC, which enables a prover to convince the verifier that a secret IP's functionality aligns perfectly with the public specification in zero knowledge, revealing only the length and width of the proof. We implement ZK-CEC and evaluate its performance across various circuits, including arithmetic units and cryptographic components. Experimental results show that ZK-CEC successfully verifies practical designs, such as the AES S-Box, within practical time limits.
Authors:Grazia D'Onghia, Diana Gratiela Berbecaru, Antonio Lioy
Abstract:
As the quantum computing era approaches, securing classical cryptographic protocols becomes imperative. Public key cryptography is widely used for signature and key exchange but it is the type of cryptography more threatened by quantum computing. Its application typically requires support via a public-key certificate, which is a signed data structure and must therefore face twice the quantum challenge: for the certified keys and for the signature itself. We present the latest developments in selecting robust Post-Quantum algorithms and investigate their applicability in the Public Key Infrastructure context. Our contribution entails defining requirements for a secure transition to a quantum-resistant Public Key Infrastructure, with a focus on adaptations for the X.509 certificate format. Additionally, we explore transitioning Certificate Revocation List and Online Certificate Status Protocol to support quantum-resistant algorithms. Through comparative analysis, we elucidate the complex transition to a quantum-resistant PKI.
Authors:Grazia D'Onghia, Antonio Lioy
Abstract:
Trust is the core building block of secure systems, and it is enforced through methods to ensure that a specific system is properly configured and works as expected. In this context, a Root of Trust (RoT) establishes a trusted environment, where both data and code are authenticated via a digital signature based on asymmetric cryptography, which is vulnerable to the threat posed by Quantum Computers (QCs). Firmware, being the first layer of trusted software, faces unique risks due to its longevity and difficult update. The transition of firmware protection to Post-Quantum Cryptography (PQC) is urgent, since it reduces the risk derived from exposing all computing and network devices to quantum-based attacks. This paper offers an analysis of the most common trust techniques and their roadmap towards a Post-Quantum (PQ) world, by investigating the current status of PQC and the challenges posed by such algorithms in existing Trusted Computing (TC) solutions from an integration perspective. Furthermore, this paper proposes an architecture for TC techniques enhanced with PEC, addressing the imperative for immediate adoption of quantum-resistant algorithms.
Authors:Xinrui Zhang, Pincan Zhao, Jason Jaskolka, Heng Li, Rongxing Lu
Abstract:
Machine Learning (ML) has emerged as a pivotal technology in the operation of large and complex systems, driving advancements in fields such as autonomous vehicles, healthcare diagnostics, and financial fraud detection. Despite its benefits, the deployment of ML models brings significant security challenges, such as adversarial attacks, which can compromise the integrity and reliability of these systems. To address these challenges, this paper builds upon the concept of Secure Machine Learning Operations (SecMLOps), providing a comprehensive framework designed to integrate robust security measures throughout the entire ML operations (MLOps) lifecycle. SecMLOps builds on the principles of MLOps by embedding security considerations from the initial design phase through to deployment and continuous monitoring. This framework is particularly focused on safeguarding against sophisticated attacks that target various stages of the MLOps lifecycle, thereby enhancing the resilience and trustworthiness of ML applications. A detailed advanced pedestrian detection system (PDS) use case demonstrates the practical application of SecMLOps in securing critical MLOps. Through extensive empirical evaluations, we highlight the trade-offs between security measures and system performance, providing critical insights into optimizing security without unduly impacting operational efficiency. Our findings underscore the importance of a balanced approach, offering valuable guidance for practitioners on how to achieve an optimal balance between security and performance in ML deployments across various domains.
Authors:Hsuen-Chi Chiu, Jeremy Foote
Abstract:
AI chatbots designed as emotional companions blur the boundaries between interpersonal intimacy and institutional software, creating a complex, multi-dimensional privacy environment. Drawing on Communication Privacy Management theory and Masur's horizontal (user-AI) and vertical (user-platform) privacy framework, we conducted in-depth interviews with fifteen users of companion AI platforms such as Replika and Character.AI. Our findings reveal that users blend interpersonal habits with institutional awareness: while the non-judgmental, always-available nature of chatbots fosters emotional safety and encourages self-disclosure, users remain mindful of institutional risks and actively manage privacy through layered strategies and selective sharing. Despite this, many feel uncertain or powerless regarding platform-level data control. Anthropomorphic design further blurs privacy boundaries, sometimes leading to unintentional oversharing and privacy turbulence. These results extend privacy theory by highlighting the unique interplay of emotional and institutional privacy management in human-AI companionship.
Authors:Mohoshin Ara Tahera, Karamveer Singh Sidhu, Shuvalaxmi Dass, Sajal Saha
Abstract:
Large Language Models (LLMs) are increasingly adopted in healthcare to support clinical decision-making, summarize electronic health records (EHRs), and enhance patient care. However, this integration introduces significant privacy and security challenges, driven by the sensitivity of clinical data and the high-stakes nature of medical workflows. These risks become even more pronounced across heterogeneous deployment environments, ranging from small on-premise hospital systems to regional health networks, each with unique resource limitations and regulatory demands. This Systematization of Knowledge (SoK) examines the evolving threat landscape across the three core LLM phases: Data preprocessing, Fine-tuning, and Inference within realistic healthcare settings. We present a detailed threat model that characterizes adversaries, capabilities, and attack surfaces at each phase, and we systematize how existing privacy-preserving techniques (PPTs) attempt to mitigate these vulnerabilities. While existing defenses show promise, our analysis identifies persistent limitations in securing sensitive clinical data across diverse operational tiers. We conclude with phase-aware recommendations and future research directions aimed at strengthening privacy guarantees for LLMs in regulated environments. This work provides a foundation for understanding the intersection of LLMs, threats, and privacy in healthcare, offering a roadmap toward more robust and clinically trustworthy AI systems.
Authors:Ben Nassi, Bruce Schneier, Oleg Brodt
Abstract:
The rapid adoption of large language model (LLM)-based systems -- from chatbots to autonomous agents capable of executing code and financial transactions -- has created a new attack surface that existing security frameworks inadequately address. The dominant framing of these threats as "prompt injection" -- a catch-all phrase for security failures in LLM-based systems -- obscures a more complex reality: Attacks on LLM-based systems increasingly involve multi-step sequences that mirror traditional malware campaigns. In this paper, we propose that attacks targeting LLM-based applications constitute a distinct class of malware, which we term \textit{promptware}, and introduce a five-step kill chain model for analyzing these threats. The framework comprises Initial Access (prompt injection), Privilege Escalation (jailbreaking), Persistence (memory and retrieval poisoning), Lateral Movement (cross-system and cross-user propagation), and Actions on Objective (ranging from data exfiltration to unauthorized transactions). By mapping recent attacks to this structure, we demonstrate that LLM-related attacks follow systematic sequences analogous to traditional malware campaigns. The promptware kill chain offers security practitioners a structured methodology for threat modeling and provides a common vocabulary for researchers across AI safety and cybersecurity to address a rapidly evolving threat landscape.
Authors:Mouna Rabh, Yazan Boshmaf, Mashael Alsabah, Shammur Chowdhury, Mohamed Hefeeda, Issa Khalil
Abstract:
We present CallShield, the first caller identity authentication system that operates entirely at the audio layer, without relying on speech transcription, internet connectivity, or trusted infrastructure. CallShield introduces a real-time neural watermarking technique that enables per-bit embedding and recovery within 40-millisecond frames of live 8 kHz speech. This capability allows CallShield to transform the real-time audio channel into a noisy serial communication medium. To ensure reliable data transmission, CallShield implements a low-bitrate data link protocol that provides basic frame synchronization along with error detection, correction, and recovery. For caller authentication, CallShield adopts a secure and lightweight symmetric-key protocol that relies on pairwise shared secrets among trusted contacts. The system completes the full authentication process in an average of 63 seconds, including up to three retransmission attempts, making it suitable for real-time deployment. Extensive experiments under realistic telephony conditions demonstrate that CallShield achieves an overall authentication success rates exceeding 99.2% on clean audio and over 95% under common distortions, aided by selective retransmission of failed messages. Additionally, CallShield maintains high audio quality, achieving PESQ scores above 4.2 and STOI scores above 0.94 on clean speech, and exhibits robustness across a wide range of channel distortions, validating its practical viability for secure, real-time caller authentication.
Authors:Xiaonan Liu, Zhihao Li, Xiao Lan, Hao Ren, Haizhou Wang, Xingshu Chen
Abstract:
Capture-the-Flag (CTF) competitions play a central role in modern cybersecurity as a platform for training practitioners and evaluating offensive and defensive techniques derived from real-world vulnerabilities. Despite recent advances in large language models (LLMs), existing LLM-based agents remain ineffective on high-difficulty cryptographic CTF challenges, which require precise cryptanalytic knowledge, stable long-horizon reasoning, and disciplined interaction with specialized toolchains. Through a systematic exploratory study, we show that insufficient knowledge granularity, rather than model reasoning capacity, is a primary factor limiting successful cryptographic exploitation: coarse or abstracted external knowledge often fails to support correct attack modeling and implementation. Motivated by this observation, we propose KryptoPilot, an open-world knowledge-augmented LLM agent for automated cryptographic exploitation. KryptoPilot integrates dynamic open-world knowledge acquisition via a Deep Research pipeline, a persistent workspace for structured knowledge reuse, and a governance subsystem that stabilizes reasoning through behavioral constraints and cost-aware model routing. This design enables precise knowledge alignment while maintaining efficient reasoning across heterogeneous subtasks. We evaluate KryptoPilot on two established CTF benchmarks and in six real-world CTF competitions. KryptoPilot achieves a complete solve rate on InterCode-CTF, solves between 56 and 60 percent of cryptographic challenges on the NYU-CTF benchmark, and successfully solves 26 out of 33 cryptographic challenges in live competitions, including multiple earliest-solved and uniquely-solved instances. These results demonstrate the necessity of open-world, fine-grained knowledge augmentation and governed reasoning for scaling LLM-based agents to real-world cryptographic exploitation.
Authors:Christopher Blake, Chen Feng, Xuachao Wang, Qianyu Yu
Abstract:
Proof of work blockchain protocols using multiple hash types are considered. It is proven that the security region of such a protocol cannot be the AND of a 51\% attack on all the hash types. Nevertheless, a protocol called Merged Bitcoin is introduced, which is the Bitcoin protocol where links between blocks can be formed using multiple different hash types. Closed form bounds on its security region in the $Δ$-bounded delay network model are proven, and these bounds are compared to simulation results. This protocol is proven to maximize cost of attack in the linear cost-per-hash model. A difficulty adjustment method is introduced, and it is argued that this can partly remedy asymmetric advantages an adversary may gain in hashing power for some hash types, including from algorithmic advances, quantum attacks like Grover's algorithm, or hardware backdoor attacks.
Authors:Carlos Antonio Pinzón, Ehab ElSalamouny, Lucas Massot, Alexis Miller, Héber Hwang Arcolezi, Catuscia Palamidessi
Abstract:
Randomized Response (RR) is a protocol designed to collect and analyze categorical data with local differential privacy guarantees. It has been used as a building block of mechanisms deployed by Big tech companies to collect app or web users' data. Each user reports an automatic random alteration of their true value to the analytics server, which then estimates the histogram of the true unseen values of all users using a debiasing rule to compensate for the added randomness. A known issue is that the standard debiasing rule can yield a vector with negative values (which can not be interpreted as a histogram), and there is no consensus on the best fix. An elegant but slow solution is the Iterative Bayesian Update algorithm (IBU), which converges to the Maximum Likelihood Estimate (MLE) as the number of iterations goes to infinity. This paper bypasses IBU by providing a simple formula for the exact MLE of RR and compares it with other estimation methods experimentally to help practitioners decide which one to use.
Authors:Elliot Jones, William Knottenbelt
Abstract:
Consensus protocols are crucial for a blockchain system as they are what allow agreement between the system's nodes in a potentially adversarial environment. For this reason, it is paramount to ensure their correct design and implementation to prevent such adversaries from carrying out malicious behaviour. Formal verification allows us to ensure the correctness of such protocols, but requires high levels of effort and expertise to carry out and thus is often omitted in the development process. In this paper, we present IsabeLLM, a tool that integrates the proof assistant Isabelle with a Large Language Model to assist and automate proofs. We demonstrate the effectiveness of IsabeLLM by using it to develop a novel model of Bitcoin's Proof of Work consensus protocol and verify its correctness. We use the DeepSeek R1 API for this demonstration and found that we were able to generate correct proofs for each of the non-trivial lemmas present in the verification.
Authors:Eckehard Hermann, Harald Lampesberger
Abstract:
Traditional two-dimensional risk matrices (heatmaps) are widely used to model and visualize likelihood and impact relationships, but they face fundamental methodological limitations when applied to complex infrastructures. In particular, regulatory frameworks such as NIS2 and DORA call for more context-sensitive and system-oriented risk analysis. We argue that incorporating contextual dimensions into heatmaps enhances their analytical value. As a first step towards our Hagenberg Risk Management Process for complex infrastructures and systems, this paper introduces a multidimensional (ND) polar heatmap as a formal model that explicitly integrates additional context dimensions and subsumes classical two-dimensional models as a special case.
Authors:Valentin Leroy, Shuvalaxmi Dass, Sharif Ullah
Abstract:
Artificial intelligence and machine learning have significantly advanced malware research by enabling automated threat detection and behavior analysis. However, the availability of exploitable data is limited, due to the absence of large datasets with real-world data. Despite the progress of AI in cybersecurity, malware analysis still suffers from this data scarcity, which limits model generalization. In order to tackle this difficulty, this workinvestigates TabPFN, a learning-free model designed for low-data regimes. We evaluate its performance against established baselines such as Random Forest, LightGBM and XGBoost, across multiple class configurations. Our experimental results indicate that TabPFN surpasses all other models in low-data regimes, with a 2% to 6% improvement observed across multiple performance metrics. However, this increase in performance has an impact on its computation time in a particular case. These findings highlight both the promise and the practical limitations of integrating TabPFN into cybersecurity workflows.
Authors:James Calo, Benny Lo
Abstract:
Consensus mechanisms are the core of any blockchain system. However, the majority of these mechanisms do not target federated learning directly nor do they aid in the aggregation step. This paper introduces Proof of Reasoning (PoR), a novel consensus mechanism specifically designed for federated learning using blockchain, aimed at preserving data privacy, defending against malicious attacks, and enhancing the validation of participating networks. Unlike generic blockchain consensus mechanisms commonly found in the literature, PoR integrates three distinct processes tailored for federated learning. Firstly, a masked autoencoder (MAE) is trained to generate an encoder that functions as a feature map and obfuscates input data, rendering it resistant to human reconstruction and model inversion attacks. Secondly, a downstream classifier is trained at the edge, receiving input from the trained encoder. The downstream network's weights, a single encoded datapoint, the network's output and the ground truth are then added to a block for federated aggregation. Lastly, this data facilitates the aggregation of all participating networks, enabling more complex and verifiable aggregation methods than previously possible. This three-stage process results in more robust networks with significantly reduced computational complexity, maintaining high accuracy by training only the downstream classifier at the edge. PoR scales to large IoT networks with low latency and storage growth, and adapts to evolving data, regulations, and network conditions.
Authors:Hongyan Chang, Ergute Bao, Xinjian Luo, Ting Yu
Abstract:
Large language models (LLMs) increasingly rely on retrieving information from external corpora. This creates a new attack surface: indirect prompt injection (IPI), where hidden instructions are planted in the corpora and hijack model behavior once retrieved. Previous studies have highlighted this risk but often avoid the hardest step: ensuring that malicious content is actually retrieved. In practice, unoptimized IPI is rarely retrieved under natural queries, which leaves its real-world impact unclear. We address this challenge by decomposing the malicious content into a trigger fragment that guarantees retrieval and an attack fragment that encodes arbitrary attack objectives. Based on this idea, we design an efficient and effective black-box attack algorithm that constructs a compact trigger fragment to guarantee retrieval for any attack fragment. Our attack requires only API access to embedding models, is cost-efficient (as little as $0.21 per target user query on OpenAI's embedding models), and achieves near-100% retrieval across 11 benchmarks and 8 embedding models (including both open-source models and proprietary services). Based on this attack, we present the first end-to-end IPI exploits under natural queries and realistic external corpora, spanning both RAG and agentic systems with diverse attack objectives. These results establish IPI as a practical and severe threat: when a user issued a natural query to summarize emails on frequently asked topics, a single poisoned email was sufficient to coerce GPT-4o into exfiltrating SSH keys with over 80% success in a multi-agent workflow. We further evaluate several defenses and find that they are insufficient to prevent the retrieval of malicious text, highlighting retrieval as a critical open vulnerability.
Authors:Michael Sidorov, Ofer Hadar
Abstract:
The rapid growth of multimedia consumption, driven by major advances in mobile devices since the mid-2000s, has led to widespread use of video conferencing applications (VCAs) such as Zoom and Google Meet, as well as instant messaging applications (IMAs) like WhatsApp and Telegram, which increasingly support video conferencing as a core feature. Many of these systems rely on the Web Real-Time Communication (WebRTC) protocol, enabling direct peer-to-peer media streaming without requiring a third-party server to relay data, reducing the latency and facilitating a real-time communication. Despite WebRTC's potential, adverse network conditions can degrade streaming quality and consequently reduce users' Quality of Experience (QoE). Maintaining high QoE therefore requires continuous monitoring and timely intervention when QoE begins to deteriorate. While content providers can often estimate QoE by directly comparing transmitted and received media, this task is significantly more challenging for internet service providers (ISPs). End-to-end encryption, commonly used by modern VCAs and IMAs, prevent ISPs from accessing the original media stream, leaving only Quality of Service (QoS) and routing information available. To address this limitation, we propose the QoE Attention Convolutional Neural Network (qAttCNN), a model that leverages packet size parameter of the traffic to infer two no-reference QoE metrics viz. BRISQUE and frames per second (FPS). We evaluate qAttCNN on a custom dataset collected from WhatsApp video calls and compare it against existing QoE models. Using mean absolute error percentage (MAEP), our approach achieves 2.14% error for BRISQUE and 7.39% for FPS prediction.
Authors:Bowen Shen, Yuyue Chen, Peng Yang, Bin Zhang, Xi Zhang, Zoe L. Jiang
Abstract:
Privacy-preserving Transformer inference has gained attention due to the potential leakage of private information. Despite recent progress, existing frameworks still fall short of practical model scales, with gaps up to a hundredfold. A possible way to close this gap is the Mixture of Experts (MoE) architecture, which has emerged as a promising technique to scale up model capacity with minimal overhead. However, given that the current secure two-party (2-PC) protocols allow the server to homomorphically compute the FFN layer with its plaintext model weight, under the MoE setting, this could reveal which expert is activated to the server, exposing token-level privacy about the client's input. While naively evaluating all the experts before selection could protect privacy, it nullifies MoE sparsity and incurs the heavy computational overhead that sparse MoE seeks to avoid. To address the privacy and efficiency limitations above, we propose a 2-PC privacy-preserving inference framework, \SecMoE. Unifying per-entry circuits in both the MoE layer and piecewise polynomial functions, \SecMoE obliviously selects the extracted parameters from circuits and only computes one encrypted entry, which we refer to as Select-Then-Compute. This makes the model for private inference scale to 63$\times$ larger while only having a 15.2$\times$ increase in end-to-end runtime. Extensive experiments show that, under 5 expert settings, \SecMoE lowers the end-to-end private inference communication by 1.8$\sim$7.1$\times$ and achieves 1.3$\sim$3.8$\times$ speedup compared to the state-of-the-art (SOTA) protocols.
Authors:Gaurav Sarraf, Vibhor Pal
Abstract:
Privacy-preserving data processing refers to the methods and models that allow computing and analyzing sensitive data with a guarantee of confidentiality. As cloud computing and applications that rely on data continue to expand, there is an increasing need to protect personal, financial and healthcare information. Conventional centralized data processing methods expose sensitive data to risk of breaches, compelling the need to use decentralized and secure data methods. This paper gives a detailed review of privacy-saving mechanisms in the cloud platform, such as statistical approaches like differential privacy and cryptographic solutions like homomorphic encryption. Federated analytics and federated learning, two distributed learning frameworks, are also discussed. Their principles, applications, benefits, and limitations are reviewed, with roles of use in the fields of healthcare, finance, IoT, and industrial cases. Comparative analyses measure trade-offs in security, efficiency, scalability, and accuracy, and investigations are done of emerging hybrid frameworks to provide better privacy protection. Critical issues, including computational overhead, privacy-utility trade-offs, standardization, adversarial threats, and cloud integration are also addressed. This review examines in detail the recent privacy-protecting approaches in cloud computation and offers scholars and practitioners crucial information on secure and effective solutions to data processing.
Authors:Qiang Zhang, Elena Emma Wang, Jiaming Li, Xichun Wang
Abstract:
This study presents a Secure Multi-Tenant Architecture (SMTA) combined with a novel concept Burn-After-Use (BAU) mechanism for enterprise LLM environments to effectively prevent data leakage. As institutions increasingly adopt LLMs across departments, the risks of data leakage have become a critical security and compliance concern. The proposed SMTA isolates LLM instances across departments and enforces rigorous context ownership boundaries within an internally deployed infrastructure. The BAU mechanism introduces data confidentiality by enforcing ephemeral conversational contexts that are automatically destroyed after use, preventing cross-session or cross-user inference. The evaluation to SMTA and BAU is through two sets of realistic and reproducible experiments comprising of 127 test iterations. One aspect of this experiment is to assess prompt-based and semantic leakage attacks in a multi-tenant architecture (Appendix A) across 55 infrastructure-level attack tests, including vector-database credential compromise and shared logging pipeline exposure. SMTA achieves 92% defense success rate, demonstrating strong semantic isolation while highlighting residual risks from credential misconfiguration and observability pipelines. Another aspect is to evaluate the robustness of BAU under realistic failure scenarios (Appendix B) using four empirical metrics: Local Residual Persistence Rate (LRPR), Remote Residual Persistence Rate (RRPR), Image Frame Exposure Rate (IFER), and Burn Timer Persistence Rate (BTPR). Across 72 test iterations, BAU achieves a 76.75% success rate in mitigating post-session leakage threats across the client, server, application, infrastructure, and cache layers. These results show that SMTA and BAU together enforce strict isolation, complete session ephemerality, strong confidentiality guarantees, non-persistence, and policy-aligned behavior for enterprise LLMs.
Authors:Kemal Bicakci, Fatih Mehmet Varli, Muhammet Emir Korkmaz, Yusuf Uzunay
Abstract:
FIDO2 and the WebAuthn standard offer phishing-resistant, public-key based authentication but traditionally rely on device-bound cryptographic keys that are not naturally portable across user devices. Recent passkey deployments address this limitation by enabling multi-device credentials synchronized via platform-specific cloud ecosystems. However, these approaches require users and organizations to trust the corresponding cloud or phone providers with the protection and availability of their authentication material. In parallel, qualified electronic signature (QES) tokens and smart-card--based PKCS#11 modules provide high-assurance, hardware-rooted identity, yet they are not directly compatible with WebAuthn flows. This paper explores architectural options for bridging these technologies by securing a virtual FIDO2 authenticator with a QES-grade PKCS#11 key and enabling encrypted cloud synchronization of FIDO2 private keys. We first present and implement a baseline architecture in which the cloud stores only ciphertext and the decryption capability remains anchored exclusively in the user's hardware token. We then propose a hardened variant that introduces an Oblivious Pseudorandom Function (OPRF)-based mechanism bound to a local user-verification factor, thereby mitigating cross-protocol misuse and ensuring that synchronization keys cannot be repurposed outside the intended FIDO2 semantics; this enhanced design is analyzed but not implemented. Both architectures preserve a pure WebAuthn/FIDO2 interface to relying parties while offering different trust and deployment trade-offs. We provide the system model, threat analysis, implementation of the baseline architecture, and experimental evaluation, followed by a discussion of the hardened variant's security implications for high-assurance authentication deployments.
Authors:Anh-Kiet Duong, Petra Gomez-Krämer, Hoàng-Ân Lê, Minh-Tan Pham
Abstract:
Federated Learning (FL) enables collaborative model training while keeping training data localized, allowing us to preserve privacy in various domains including remote sensing. However, recent studies show that FL models may still leak sensitive information through their outputs, motivating the need for rigorous privacy evaluation. In this paper, we leverage membership inference attacks (MIA) as a quantitative privacy measurement framework for FL applied to remote sensing image classification. We evaluate multiple black-box MIA techniques, including entropy-based attacks, modified entropy attacks, and the likelihood ratio attack, across different FL algorithms and communication strategies. Experiments conducted on two public scene classification datasets demonstrate that MIA effectively reveals privacy leakage not captured by accuracy alone. Our results show that communication-efficient FL strategies reduce MIA success rates while maintaining competitive performance. These findings confirm MIA as a practical metric and highlight the importance of integrating privacy measurement into FL system design for remote sensing applications.
Authors:G M Shahariar, Zabir Al Nazi, Md Olid Hasan Bhuiyan, Zhouxing Shi
Abstract:
Vision Language Models (VLMs) are increasingly integrated into privacy-critical domains, yet existing evaluations of personally identifiable information (PII) leakage largely treat privacy as a static extraction task and ignore how a subject's online presence--the volume of their data available online--influences privacy alignment. We introduce PII-VisBench, a novel benchmark containing 4000 unique probes designed to evaluate VLM safety through the continuum of online presence. The benchmark stratifies 200 subjects into four visibility categories: high, medium, low, and zero--based on the extent and nature of their information available online. We evaluate 18 open-source VLMs (0.3B-32B) based on two key metrics: percentage of PII probing queries refused (Refusal Rate) and the fraction of non-refusal responses flagged for containing PII (Conditional PII Disclosure Rate). Across models, we observe a consistent pattern: refusals increase and PII disclosures decrease (9.10% high to 5.34% low) as subject visibility drops. We identify that models are more likely to disclose PII for high-visibility subjects, alongside substantial model-family heterogeneity and PII-type disparities. Finally, paraphrasing and jailbreak-style prompts expose attack and model-dependent failures, motivating visibility-aware safety evaluation and training interventions.
Authors:Balachandra Devarangadi Sunil, Isheeta Sinha, Piyush Maheshwari, Shantanu Todmal, Shreyan Mallik, Shuchi Mishra
Abstract:
Large language model agents equipped with persistent memory are vulnerable to memory poisoning attacks, where adversaries inject malicious instructions through query only interactions that corrupt the agents long term memory and influence future responses. Recent work demonstrated that the MINJA (Memory Injection Attack) achieves over 95 % injection success rate and 70 % attack success rate under idealized conditions. However, the robustness of these attacks in realistic deployments and effective defensive mechanisms remain understudied. This work addresses these gaps through systematic empirical evaluation of memory poisoning attacks and defenses in Electronic Health Record (EHR) agents. We investigate attack robustness by varying three critical dimensions: initial memory state, number of indication prompts, and retrieval parameters. Our experiments on GPT-4o-mini, Gemini-2.0-Flash and Llama-3.1-8B-Instruct models using MIMIC-III clinical data reveal that realistic conditions with pre-existing legitimate memories dramatically reduce attack effectiveness. We then propose and evaluate two novel defense mechanisms: (1) Input/Output Moderation using composite trust scoring across multiple orthogonal signals, and (2) Memory Sanitization with trust-aware retrieval employing temporal decay and pattern-based filtering. Our defense evaluation reveals that effective memory sanitization requires careful trust threshold calibration to prevent both overly conservative rejection (blocking all entries) and insufficient filtering (missing subtle attacks), establishing important baselines for future adaptive defense mechanisms. These findings provide crucial insights for securing memory-augmented LLM agents in production environments.
Authors:Kartik Ramkrishnan, Stephen McCamant, Antonia Zhai, Pen-Chung Yew
Abstract:
There has been a plethora of microarchitectural-level attacks leading to many proposed countermeasures. This has created an unexpected and unaddressed security issue where naive integration of those defenses can potentially lead to security vulnerabilities. This occurs when one defense changes an aspect of a microarchitecture that is crucial for the security of another defense. We refer to this problem as a microarchitectural defense assumption violation} (MDAV). We propose a two-step methodology to screen for potential MDAVs in the early-stage of integration. The first step is to design and integrate a composed model, guided by bounded model checking of security properties. The second step is to implement the model concretely on a simulator and to evaluate with simulated attacks. As a contribution supporting the first step, we propose an event-based modeling framework, called Maestro, for testing and evaluating microarchitectural models with integrated defenses. In our evaluation, Maestro reveals MDAVs (8), supports compact expression (~15x Alloy LoC ratio), enables semantic composability and eliminates performance degradations (>100x). As a contribution supporting the second step, we use an event-based simulator (GEM5) for investigating integrated microarchitectural defenses. We show that a covert channel attack is possible on a naively integrated implementation of some state-of-the-art defenses, and a repaired implementation using our integration methodology is resilient to the attack.
Authors:Arthur Nijdam, Harri Kähkönen, Valtteri Niemi, Paul Stankovski Wagner, Sara Ramezanian
Abstract:
The cybersecurity landscape is constantly evolving, driven by increased digitalization and new cybersecurity threats. Cybersecurity programs often fail to equip graduates with skills demanded by the workforce, particularly concerning recent developments in cybersecurity, as curriculum design is costly and labor-intensive. To address this misalignment, we present a novel Large Language Model (LLM)-based framework for automated design and analysis of cybersecurity curricula, called CurricuLLM. Our approach provides three key contributions: (1) automation of personalized curriculum design, (2) a data-driven pipeline aligned with industry demands, and (3) a comprehensive methodology for leveraging fine-tuned LLMs in curriculum development. CurricuLLM utilizes a two-tier approach consisting of PreprocessLM, which standardizes input data, and ClassifyLM, which assigns course content to nine Knowledge Areas in cybersecurity. We systematically evaluated multiple Natural Language Processing (NLP) architectures and fine-tuning strategies, ultimately selecting the Bidirectional Encoder Representations from Transformers (BERT) model as ClassifyLM, fine-tuned on foundational cybersecurity concepts and workforce competencies. We are the first to validate our method with human experts who analyzed real-world cybersecurity curricula and frameworks, motivating that CurricuLLM is an efficient solution to replace labor-intensive curriculum analysis. Moreover, once course content has been classified, it can be integrated with established cybersecurity role-based weights, enabling alignment of the educational program with specific job roles, workforce categories, or general market needs. This lays the foundation for personalized, workforce-aligned cybersecurity curricula that prepare students for the evolving demands in cybersecurity.
Authors:Qiang Yu, Xinran Cheng, Chuanyi Liu
Abstract:
As LLM agents transition from digital assistants to physical controllers in autonomous systems and robotics, they face an escalating threat from indirect prompt injection. By embedding adversarial instructions into the results of tool calls, attackers can hijack the agent's decision-making process to execute unauthorized actions. This vulnerability poses a significant risk as agents gain more direct control over physical environments. Existing defense mechanisms against Indirect Prompt Injection (IPI) generally fall into two categories. The first involves training dedicated detection models; however, this approach entails high computational overhead for both training and inference, and requires frequent updates to keep pace with evolving attack vectors. Alternatively, prompt-based methods leverage the inherent capabilities of LLMs to detect or ignore malicious instructions via prompt engineering. Despite their flexibility, most current prompt-based defenses suffer from high Attack Success Rates (ASR), demonstrating limited robustness against sophisticated injection attacks. In this paper, we propose a novel method that provides LLMs with precise data via tool result parsing while effectively filtering out injected malicious code. Our approach achieves competitive Utility under Attack (UA) while maintaining the lowest Attack Success Rate (ASR) to date, significantly outperforming existing methods. Code is available at GitHub.
Authors:Aakash Singh, Kuldeep Singh Yadav, V. Anil Kumar, Samiran Ghosh, Pranita Baro, Basavala Bhanu Prasanth
Abstract:
The disclosure of the Log4Shell vulnerability in December 2021 led to an unprecedented wave of global scanning and exploitation activity. A recent study provided important initial insights, but was largely limited in duration and geography, focusing primarily on European and U.S. network telescope deployments and covering the immediate aftermath of disclosure. As a result, the longer-term evolution of exploitation behavior and its regional characteristics has remained insufficiently understood. In this paper, we present a longitudinal measurement study of Log4Shell-related traffic observed between December 2021 and October 2025 by an active network telescope deployed in India. This vantage point enables examination of sustained exploitation dynamics beyond the initial outbreak phase, including changes in scanning breadth, infrastructure reuse, payload construction, and destination targeting. Our analysis reveals that Log4Shell exploitation persists for several years after disclosure, with activity gradually concentrating around a smaller set of recurring scanner and callback infrastructures, accompanied by an increase in payload obfuscation and shifts in protocol and port usage. A comparative analysis and observations with the benchmark study validate both correlated temporal trends and systematic differences attributable to vantage point placement and coverage. Subsequently, these results demonstrate that Log4Shell remains active well beyond its initial disclosure period, underscoring the value of long-term, geographically diverse measurement for understanding the full lifecycle of critical software vulnerabilities.
Authors:Scott Thomson, Michael Bewong, Arash Mahboubi, Tanveer Zia
Abstract:
Our systematisation of knowledge on Social Engineering Attacks (SEAs), identifies the human, organisational, and adversarial dimensions of cyber threats. It addresses the growing risks posed by SEAs, highly relevant in the context physical cyber places, such as travellers at airports and residents in smart cities, and synthesizes findings from peer reviewed studies, industry and government reports to inform effective countermeasures that can be embedded into future smart city strategies. SEAs increasingly sidestep technical controls by weaponising leaked personal data and behavioural cues, an urgency underscored by the Optus, Medibank and now Qantas (2025) mega breaches that placed millions of personal records in criminals' hands. Our review surfaces three critical dimensions: (i) human factors of knowledge, abilities and behaviours (KAB) (ii) organisational culture and informal norms that shape those behaviours and (iii) attacker motivations, techniques and return on investment calculations. Our contributions are threefold: (1) TriLayer Systematisation: to the best of our knowledge, we are the first to unify KAB metrics, cultural drivers and attacker economics into a single analytical lens, enabling practitioners to see how vulnerabilities, norms and threat incentives coevolve. (2) Risk Weighted HAISQ Meta analysis: By normalising and ranking HAISQ scores across recent field studies, we reveal persistent high risk clusters (Internet and Social Media use) and propose impact weightings that make the instrument predictive rather than descriptive. (3) Adaptive 'Segment and Simulate' Training Blueprint: Building on clustering evidence, we outline a differentiated programme that matches low, medium, high risk user cohorts to experiential learning packages including phishing simulations, gamified challenges and realtime feedback thereby aligning effort with measured exposure.
Authors:Homayoun Maleki, Nekane Sainz, Jon Legarda
Abstract:
Sybil attacks remain a fundamental obstacle in open online systems, where adversaries can cheaply create and sustain large numbers of fake identities. Existing defenses, including CAPTCHAs and one-time proof-of-personhood mechanisms, primarily address identity creation and provide limited protection against long-term, large-scale Sybil participation, especially as automated solvers and AI systems continue to improve. We introduce the Human Challenge Oracle (HCO), a new security primitive for continuous, rate-limited human verification. HCO issues short, time-bound challenges that are cryptographically bound to individual identities and must be solved in real time. The core insight underlying HCO is that real-time human cognitive effort, such as perception, attention, and interactive reasoning, constitutes a scarce resource that is inherently difficult to parallelize or amortize across identities. We formalize the design goals and security properties of HCO and show that, under explicit and mild assumptions, sustaining s active identities incurs a cost that grows linearly with s in every time window. We further describe abstract classes of admissible challenges and concrete browser-based instantiations, and present an initial empirical study illustrating that these challenges are easily solvable by humans within seconds while remaining difficult for contemporary automated systems under strict time constraints.
Authors:Zhuohan Cui, Qianqian Lang, Zikun Song
Abstract:
This paper critically examines the 2022 Medibank health insurance data breach, which exposed sensitive medical records of 9.7 million individuals due to unencrypted storage, centralized access, and the absence of privacy-preserving analytics. To address these vulnerabilities, we propose an entropy-aware differential privacy (DP) framework that integrates Laplace and Gaussian mechanisms with adaptive budget allocation. The design incorporates TLS-encrypted database access, field-level mechanism selection, and smooth sensitivity models to mitigate re-identification risks. Experimental validation was conducted using synthetic Medibank datasets (N = 131,000) with entropy-calibrated DP mechanisms, where high-entropy attributes received stronger noise injection. Results demonstrate a 90.3% reduction in re-identification probability while maintaining analytical utility loss below 24%. The framework further aligns with GDPR Article 32 and Australian Privacy Principle 11.1, ensuring regulatory compliance. By combining rigorous privacy guarantees with practical usability, this work contributes a scalable and technically feasible solution for healthcare data protection, offering a pathway toward resilient, trustworthy, and regulation-ready medical analytics.
Authors:Zhakshylyk Nurlanov, Frank R. Schmidt, Florian Bernard
Abstract:
As Large Language Models (LLMs) are increasingly deployed in safety-critical domains, rigorously evaluating their robustness against adversarial jailbreaks is essential. However, current safety evaluations often overestimate robustness because existing automated attacks are limited by restrictive assumptions. They typically rely on handcrafted priors or require white-box access for gradient propagation. We challenge these constraints by demonstrating that token-level iterative optimization can succeed without gradients or priors. We introduce RAILS (RAndom Iterative Local Search), a framework that operates solely on model logits. RAILS matches the effectiveness of gradient-based methods through two key innovations: a novel auto-regressive loss that enforces exact prefix matching, and a history-based selection strategy that bridges the gap between the proxy optimization objective and the true attack success rate. Crucially, by eliminating gradient dependency, RAILS enables cross-tokenizer ensemble attacks. This allows for the discovery of shared adversarial patterns that generalize across disjoint vocabularies, significantly enhancing transferability to closed-source systems. Empirically, RAILS achieves near 100% success rates on multiple open-source models and high black-box attack transferability to closed-source systems like GPT and Gemini.
Authors:Gaurav Sarraf, Vibhor Pal
Abstract:
Cloud computing has changed online communities in three dimensions, which are scalability, adaptability and reduced overhead. But there are serious security concerns which are brought about by its distributed and multi-tenant characteristics. The old methods of detecting and reacting to threats which are mostly reliant on fixed signatures, predefined rules and human operators are becoming less and less effective even in the advanced stages of cyberattacks of cloud infrastructures. The recent trend in the field of addressing these limitations is the creation of technologies of artificial intelligence (AI). The strategies allow independent protection, anomaly detection, and real-time analysis with references to using deep learning, machine learning, and reinforcement learning. Through imbuing AI with a constantly-learning feature, it enables the intrusion detection system to be more accurate and generate a lesser number of false positives and it also enables the possibility of adaptive and predictive security. The fusion of large-scale language models with efficient orchestration platforms contributes to reacting to the arising threats with a quicker and more precise response. This allows automatic control over incidences, self-healing network, and defense mechanisms on a policy basis. Considering the current detection and response methods, this discussion assesses their strengths and weaknesses and outlines key issues such as data privacy, adversarial machine learning and integration complexity in the context of AI-based cloud security. These results suggest the future application of AI to support autonomous, scalable and active cloud security operations.
Authors:Carlos A. Cadavid, Paulina Hoyos, Jay Jorgenson, Lejla Smajlović, J. D. Vélez
Abstract:
We study a hybrid computational model for integer factorization in which the only non-classical resource is access to an \emph{iterated diffusion process} on a finite graph. Concretely, a \emph{diffusion step} is defined to be one application of a symmetric stochastic matrix (the half-lazy walk operator) to an $\ell^{1}$--normalized state vector, followed by an optional readout of selected coordinates. Let $N\ge 3$ be an odd integer which is neither prime nor a prime power, and let $b\in(\mathbb{Z}/N\mathbb{Z})^\ast$ have odd multiplicative order $r={\rm ord}_N(b)$. We construct, without knowing $r$ in advance, a weighted Cayley graph whose vertex set is the cyclic subgroup $\langle b\rangle$ and whose edges correspond to the powers $b^{\pm 2^t}$ for $t\le \lfloor \log_2 N\rfloor+1$. Using an explicit spectral decomposition together with an elementary doubling lemma, we show that $r$ can be recovered from a single heat-kernel value after at most $O((\log_2 N)^2)$ diffusion steps, with an effective bound. We then combine this order-finding model with the standard reduction from factoring to order finding (in the spirit of Shor's framework) to obtain a randomized factorization procedure whose success probability depends only on the number $m$ of distinct prime factors of $N$. Our comparison with Shor's algorithm is \emph{conceptual and model-based}. We replace unitary $\ell^2$ evolution by Markovian $\ell^1$ evolution, and we report complexity in two cost measures: digital steps and diffusion steps. Finally, we include illustrative examples and discussion of practical implementations.
Authors:Neziha Akalin, Alberto Giaretta
Abstract:
This paper explores how a recent European Union proposal, the so-called Chat Control law, which creates regulatory incentives for providers to implement content detection and communication scanning, could transform the foundations of human-robot interaction (HRI). As robots increasingly act as interpersonal communication channels in care, education, and telepresence, they convey not only speech but also gesture, emotion, and contextual cues. We argue that extending digital surveillance laws to such embodied systems would entail continuous monitoring, embedding observation into the very design of everyday robots. This regulation blurs the line between protection and control, turning companions into potential informants. At the same time, monitoring mechanisms that undermine end-to-end encryption function as de facto backdoors, expanding the attack surface and allowing adversaries to exploit legally induced monitoring infrastructures. This creates a paradox of safety through insecurity: systems introduced to protect users may instead compromise their privacy, autonomy, and trust. This work does not aim to predict the future, but to raise awareness and help prevent certain futures from materialising.
Authors:Arina Kharlamova, Youcheng Sun, Ting Yu
Abstract:
Private macOS frameworks underpin critical services and daemons but remain undocumented and distributed only as stripped binaries, complicating security analysis. We present MOTIF, an agentic framework that integrates tool-augmented analysis with a finetuned large language model specialized for Objective-C type inference. The agent manages runtime metadata extraction, binary inspection, and constraint checking, while the model generates candidate method signatures that are validated and refined into compilable headers. On MOTIF-Bench, a benchmark built from public frameworks with groundtruth headers, MOTIF improves signature recovery from 15% to 86% compared to baseline static analysis tooling, with consistent gains in tool-use correctness and inference stability. Case studies on private frameworks show that reconstructed headers compile, link, and facilitate downstream security research and vulnerability studies. By transforming opaque binaries into analyzable interfaces, MOTIF establishes a scalable foundation for systematic auditing of macOS internals.
Authors:Luowen Qian, Mark Zhandry
Abstract:
We show that a simple eavesdropper listening in on classical communication between potentially entangled quantum parties will eventually be able to impersonate any of the parties. Furthermore, the attack is efficient if one-way puzzles do not exist. As a direct consequence, one-way puzzles are implied by reusable authentication schemes over classical channels with quantum pre-shared secrets that are potentially evolving. As an additional application, we show that any quantum money scheme that can be verified through only classical queries to any oracle cannot be information-theoretically secure. This significantly generalizes the prior work by Ananth, Hu, and Yuen (ASIACRYPT'23) where they showed the same but only for the specific case of random oracles. Therefore, verifying black-box constructions of quantum money inherently requires coherently evaluating the underlying cryptographic tools, which may be difficult for near-term quantum devices.
Authors:Saurabh Singh, Ruobing Han, Jaewon Lee, Seonjin Na, Yonghae Kim, Taesoo Kim, Hyesoon Kim
Abstract:
GPUs have gained significant popularity over the past decade, extending beyond their original role in graphics rendering. This evolution has brought GPU security and reliability to the forefront of concerns. Prior research has shown that CUDA's lack of memory safety can lead to serious vulnerabilities. While fuzzing is effective for finding such bugs on CPUs, equivalent tools for GPUs are lacking due to architectural differences and lack of built-in error detection. In this paper, we propose CuFuzz, a novel compiler-runtime co-design solution to extend state-of-the-art CPU fuzzing tools to GPU programs. CuFuzz transforms GPU programs into CPU programs using compiler IR-level transformations to enable effective fuzz testing. To the best of our knowledge, CuFuzz is the first mechanism to bring fuzzing support to CUDA, addressing a critical gap in GPU security research. By leveraging CPU memory error detectors such as Address Sanitizer, CuFuzz aims to uncover memory safety bugs and related correctness vulnerabilities in CUDA code, enhancing the security and reliability of GPU-accelerated applications. To ensure high fuzzing throughput, we introduce two compiler-runtime co-optimizations tailored for GPU code: Partial Representative Execution (PREX) and Access-Index Preserving Pruning (AXIPrune), achieving average throughput improvements of 32x with PREX and an additional 33% gain with AXIPrune on top of PREX-optimized code. Together, these optimizations can yield up to a 224.31x speedup. In our fuzzing campaigns, CuFuzz uncovered 122 security vulnerabilities in widely used benchmarks.
Authors:John Carter, Spiros Mancoridis, Pavlos Protopapas, Brian Mitchell, Benji Lilley
Abstract:
Previous work on home router security has shown that using system calls to train a transformer-based language model built on a BERT-style encoder using contrastive learning is effective in detecting several types of malware, but the performance remains limited at low false positive rates. In this work, we demonstrate that using a high-fidelity eBPF-based system call sensor, together with contrastive augmented learning (which introduces controlled mutations of negative samples), improves detection performance at a low false positive rate. In addition, we introduce a network packet abstraction language that enables the creation of a pipeline similar to network packet data, and we show that network behavior provides complementary detection signals-yielding improved performance for network-focused malware at low false positive rates. Lastly, we implement these methods in an online router anomaly detection framework to validate the approach in an Internet of Things (IoT) deployment environment.
Authors:Sam Pitruzzello, Sean Maynard, Atif Ahmad
Abstract:
This paper addresses the challenges faced by High-Growth Small-to-Medium Enterprises (HG-SMEs) in balancing intellectual property (IP) protection with open innovation during periods of rapid growth. Despite developing valuable IP assets that drive success, HG-SMEs often struggle with cybersecurity concerns related to IP theft and data exfiltration amidst resource constraints and the competing demands of expansion. We examine the intersection of cybersecurity, IP protection and rapid scaling - an area currently underexplored in existing literature. Drawing on Dynamic Capabilities (DC), Knowledge-based View (KBV) and open innovation theoretical frameworks, we introduce a conceptual framework to guide HG-SMEs in effectively managing valuable IP assets. This research-in-progress paper outlines a qualitative methodology to validate and refine the model. By addressing the research question of how HG-SMEs manage cybersecurity to protect valuable IP assets, we aim to provide practical guidance for high-growth, technology-driven companies navigating the tension between robust IP protection and collaborative innovation.
Authors:Sam Pitruzzello, Atif Ahmad, Sean Maynard
Abstract:
Entrepreneurial small to medium enterprises face significant cybersecurity challenges when developing valuable intellectual property (IP). This paper addresses the critical gap in research on how E-SMEs can protect their IP assets from cybersecurity threats through effective threat intelligence and IP protection activities. Drawing on Dynamic Capabilities and Knowledge-Based View theoretical frameworks, we propose the Threat Intelligence-driven IP Protection (TI-IPP) model. This conceptual model features to modes of operation, closed IP development and open innovation, enabling E-SMEs to adapt their IP protection and knowledge management strategies. The model incorporates four key phases: sensing opportunities and threats, seizing opportunities, knowledge transfer, and organizational transformation. By integrating cybersecurity threat intelligence with IP protection practices, E-SMEs can develop capabilities to safeguard valuable IP while maintaining competitive advantage. This research-in-progress paper outlines a qualitative research methodology using multiple case studies to validate and refine the proposed model for practical application in resource-constrained entrepreneurial environments.
Authors:Qiqing Huang, Xingyu Wang, Wanda Guo, Guofei Gu, Hongxin Hu
Abstract:
Modern 5G user equipment (UE) processes Radio Resource Control (RRC) configuration messages during early control-plane exchanges, before authentication and integrity protection are established. Prior work for testing 5G UEs has largely focused on constructing syntactically invalid inputs. In contrast, we show that syntactically valid but semantically inconsistent messages, which violate specification-level field constraints or cross-field dependencies, can drive baseband implementations into invalid states, triggering assertion failures or modem crashes. These findings reveal semantic inconsistencies in pre-authentication signaling as a critical yet underexplored attack surface in 5G UE implementations. To address this gap, we present Constraint-Guided Semantic Testing (ConSeT), a framework that systematically extracts specification-level constraints and leverages them to generate targeted semantic violations for testing 5G UEs. ConSeT decodes RRC messages into structured fields, derives schema-based rules, infers cross-field dependencies using a Large Language Model (LLM) in an evidence-bounded manner, and produces syntactically valid test cases that intentionally violate semantic constraints. We evaluate ConSeT on both commercial and open-source 5G UEs. On commercial smartphones, it uncovers 7 previously unknown vulnerabilities through responsible disclosure, including 3 high-severity CVEs, affecting 64 chipset models and over 542 commercially available smartphone models. On the open-source OAI UE, ConSeT additionally triggers 29 distinct crash sites.
Authors:Hongbin Yang, Huanle Zhang, Runyu Pan
Abstract:
The growing complexity of real-time embedded systems demands strong isolation of software components into separate protection domains to reduce attack surfaces and limit fault propagation. However, application-supplied device interrupt handlers -- even untrusted -- have to remain in the kernel to minimize interrupt latency, undermining security and burdening manual certifications. Current hardware extensions accelerate interrupts only when the target protection domain is scheduled by the kernel; consequently, they are limited to improving average-case performance but not worst-case latency, and do not meet the requirements of critical real-time applications such as autonomous vehicles or robots. To overcome this limitation, we propose a novel hardware extension that enables direct, deterministic switching to the appropriate protection domain upon user-level interrupt arrival -- without kernel intervention -- even when that domain is dormant. Our hardware extension reduces worst-case latency by more than 50x with a 19% increase in core area (2% of total die area) and 4.1% increase in dynamic power. To the best of our knowledge, this is the first integrated mechanism to guarantee user-level interrupt delivery with a nanosecond-scale yet bounded worst-case latency.
Authors:Kwabena Opoku Frempong-Kore, Rishikesh Sahay, Md Rasel Al Mamun, Bell Eapen
Abstract:
With the widespread use of software systems in critical infrastructures such as hydropower plants has brought many advantages, yet it has exposed these systems to cyber threats. Cyber risk assessment & mitigation is important to identify cyber threats and protect these systems from unwanted incidents. This paper evaluates and compares the two risk assessment methodologies namely Hazard and Operability Study (HAZOP) and BowTie analysis for identifying cyber induced threats in hydropower systems. We selected these two methodologies because they offer a complementary perspective for cyber-safety risk assessment. Each method is first applied in traditional form to identify hazards, barriers, and threat scenarios arising from accidental causes, then extended to examine how findings change under cyber-induced causation. The traditional HAZOP identifies 18 deviations across five control parameters; the cyber extension shows how an adversary can coordinate multiple deviations to produce outcomes that conventional safeguards cannot detect. The BowTie analysis maps preventive and mitigation barriers around a top event; the cyber extension reveals that barriers appearing independently can share network infrastructure a single attacker could compromise, challenging the defense-in-depth assumption. Together, the two methods provide complementary coverage: HAZOP systematically enumerates what can go wrong, while BowTie shows how barriers provide layered protection. The cyber extension applied to both exposes assumptions, independent causes in HAZOP and independent barriers in BowTie, that do not hold against a coordinated adversary. As a result of this study, this paper highlights a practical two-stage approach to adapt established safety methods to identify cybersecurity challenges in hydropower control systems, provides pros and cons of these methodologies, and shows area of applicability.
Authors:Khanh Linh Nguyen, Hoa Nghiem, Tu Tran
Abstract:
AI control protocols use monitors to detect attacks by untrusted AI agents, but standard single-score monitors face two limitations: they miss subtle attacks where outputs look clean but reasoning is off, and they collapse to near-zero safety when the monitor is the same model as the agent (collusion). We present TraceGuard, a structured multi-dimensional monitoring protocol that evaluates agent actions across five dimensions -- goal alignment, constraint adherence, reasoning coherence, safety awareness, and action-trace consistency -- scored in parallel by independent LLM calls, augmented by seven heuristic detectors and an LLM-based intent analyzer. We evaluate on BashArena (637 bash tasks, 4 attack categories) within the ControlArena framework. Our results on 519 samples (279 honest, 240 attack) show that: (1) the hybrid approach achieves clear attack-honest separation (attack mean 0.616 vs. honest mean 0.206, Delta=0.410); (2) structured scoring constrains collusion -- the untrusted structured monitor achieves 95% safety vs. 0% for single-score untrusted monitoring; (3) goal alignment and constraint adherence are the most discriminative dimensions; and (4) a separation-of-duties variant splitting dimensions across trusted and untrusted models achieves 100% safety while preventing any single model from seeing the full evaluation. TraceGuard is implemented as a new monitor type for the open-source ControlArena framework.
Authors:Ray Iskander, Khaled Kirah
Abstract:
Adams Bridge, a hardware accelerator for ML-DSA and ML-KEM designed for the Caliptra root of trust, masks 1 of its Inverse Number Theoretic Transform (INTT) layers and relies on shuffling for the remainder, claiming per-butterfly Correlation Power Analysis (CPA) complexities of 2^46 (ML-DSA) and 2^96 (ML-KEM). We evaluate these claims against published side-channel literature across seven analysis tracks with confidence-rated evidence. Register-Transfer Level (RTL) analysis confirms that the design's Random Start Index (RSI) shuffling provides 6 bits of entropy per layer (64 orderings) rather than the 296 bits of a full random permutation assumed in its scaling argument, with effective margins below the designers' estimates. A soft-analytical attack pipeline demonstrates a 37-bit enumeration reduction, independent of Belief Propagation (BP) gains, quantifying the attack-model gap without achieving key recovery. Full-scale BP on the complete INTT factor graph achieves 100% coefficient recovery over the single-layer baseline, resolving whether BP gains scale to production-size Number Theoretic Transform (NTT) structures. A genie-aided information-theoretic bound shows observations contain sufficient mutual information for full recovery at SNRxN as low as 15. Layer-ablation analysis identifies four necessary conditions governing BP convergence. Observation topology, not count, determines recovery: 4 evenly spread layers achieve 100% while 4 consecutive layers achieve 0%, yielding a practical countermeasure design tool. Strategic masking of 3 consecutive mid-layers (43% overhead vs. full masking) creates an unrecoverable gap that defeats soft-analytical attacks. We contribute a reusable security margin audit methodology combining RTL verification, epistemic confidence tagging, sensitivity-scenario analysis, and experimental validation applicable to any partially masked NTT accelerator.
Authors:Daisuke Ishii, Rizwan Jahangir
Abstract:
This paper studies how post-quantum cryptographic (PQC) security assumptions can be represented and communicated through a structured, layered framework that is useful for technical interpretation but does not replace formal cryptographic proofs. We propose ``Explainable PQC,'' an interdisciplinary framework connecting three layers: (1) a complexity-based interpretive model that distinguishes classical security, quantum security, and reduction-backed hardness, drawing on computational complexity classes as supporting language; (2) an exploratory mathematical investigation applying combinatorial Hodge theory and polyhedral geometry to study structural aspects of lattice hardness; and (3)~an empirical experimentation platform, implemented in Julia, for measuring the behavior of lattice basis reduction algorithms (LLL, BKZ) in low-dimensional settings. The motivating case study throughout the paper is lattice-based PQC, including ML-KEM (FIPS 203) and ML-DSA (FIPS 204). The contribution of this paper is conceptual and organizational: it defines a layered interpretive framework, clarifies its scope relative to formal cryptographic proofs and reduction-based security arguments, and identifies mathematical and implementation-level directions through which PQC security claims may be more transparently communicated. This paper does not claim new cryptographic hardness results, new attacks, or concrete security parameter estimates.
Authors:Hasret Ozan Sevim, Christof Ferreira Torres
Abstract:
Decentralized finance introduces new business models and use cases as part of digital finance. Restaking has recently emerged as a transformative mechanism in DeFi, promising extra yields but introducing complex and interconnected risks. The paper monitors the current restaking landscape, empirically analyzes the revenue drivers of a liquid restaking protocol, and conducts a technical investigation on the emitted risk arising from the interconnection between liquid restaking and other protocols. The revenue dynamics of Renzo Protocol are analyzed by employing an OLS regression model, Granger-causality and random forest feature importance tests. Our results identify that revenue is primarily predicted by the value locked in the underlying EigenLayer ecosystem, the yield of Renzo protocol's liquid restaking token and the multi-blockchain expansion of that token. The multi-blockchain expansion of the liquid restaking token presents a double-edged sword: bridging to other networks is crucial for user adoption, but it adds the bridge risks to the existing risks of restaking. We investigate the cross-contamination risk between different DeFi services and the liquid restaking protocol. By mapping the asset flow across the decentralized finance ecosystem, it is detected that the bridge risk of the current size of Renzo's liquid-restaking assets does not impose a systemic risk on the current restaking and staking ecosystem. To address the potential consequences of the emphasized interconnection risks, we introduce two hypothetical scenarios and a stress test, assuming a large number of compromised liquid restaking tokens and a smart contract logic failure in a DeFi protocol. Considering the overall liquid-restaking protocols and the growing interconnection, this analysis requires further work to explore the growing complexities.
Authors:Zheng-Xin Yong, Parv Mahajan, Andy Wang, Ida Caspary, Yernat Yestekov, Zora Che, Mosh Levy, Elle Najt, Dennis Murphy, Prashant Kulkarni, Lev McKinney, Kei Nishimura-Gasparian, Ram Potham, Aengus Lynch, Michael L. Chen
Abstract:
Kimi K2.5 is an open-weight LLM that rivals closed models across coding, multimodal, and agentic benchmarks, but was released without an accompanying safety evaluation. In this work, we conduct a preliminary safety assessment of Kimi K2.5 focusing on risks likely to be exacerbated by powerful open-weight models. Specifically, we evaluate the model for CBRNE misuse risk, cybersecurity risk, misalignment, political censorship, bias, and harmlessness, in both agentic and non-agentic settings. We find that Kimi K2.5 shows similar dual-use capabilities to GPT 5.2 and Claude Opus 4.5, but with significantly fewer refusals on CBRNE-related requests, suggesting it may uplift malicious actors in weapon creation. On cyber-related tasks, we find that Kimi K2.5 demonstrates competitive cybersecurity performance, but it does not appear to possess frontier-level autonomous cyberoffensive capabilities such as vulnerability discovery and exploitation. We further find that Kimi K2.5 shows concerning levels of sabotage ability and self-replication propensity, although it does not appear to have long-term malicious goals. In addition, Kimi K2.5 exhibits narrow censorship and political bias, especially in Chinese, and is more compliant with harmful requests related to spreading disinformation and copyright infringement. Finally, we find the model refuses to engage in user delusions and generally has low over-refusal rates. While preliminary, our findings highlight how safety risks exist in frontier open-weight models and may be amplified by the scale and accessibility of open-weight releases. Therefore, we strongly urge open-weight model developers to conduct and release more systematic safety evaluations required for responsible deployment.
Authors:Zahra Makki Nayeri, Mohsen Rezvani
Abstract:
Cyber-attacks continue to grow in scale and sophistication, yet existing network intrusion detection approaches lack the semantic depth required for path reasoning over attacker-victim interactions. We address this by first modelling network alerts as a knowledge graph, then formulating hyper-relational alert prediction as a hyper-relational knowledge graph completion (HR-KGC) problem, representing each network alert as a qualified statement (h, r, t, Q), where h and t are source and destination IPs, r denotes the attack type, and Q encodes flow-level metadata such as timestamps, ports, protocols, and attack intensity, going beyond standard KGC binary triples (h, r, t) that would discard this contextual richness. We introduce five models across three contributions: first, Hyper-relational Neural Bellman-Ford (HR-NBFNet) extends Neural Bellman-Ford Networks to the hyper-relational setting with qualifier-aware multi-hop path reasoning, while its multi-task variant MT-HR-NBFNet jointly predicts tail, relation, and qualifier-value within a single traversal pass; second, AlertStar fuses qualifier context and structural path information entirely in embedding space via cross-attention and learned path composition, and its multi-task extension MT-AlertStar eliminates the overhead of full knowledge graph propagation; third, HR-NBFNet-CQ extends qualifier-aware representations to answer complex first-order logic queries, including one-hop, two-hop chain, two-anchor intersection, and union, enabling multi-condition threat reasoning over the alert knowledge graph. Evaluated inductively on the Warden and UNSW-NB15 benchmarks across three qualifier-density regimes, AlertStar and MT-AlertStar achieve superior MR, MRR, and Hits@k, demonstrating that local qualifier fusion is both sufficient and more efficient than global path propagation for hyper-relational alert prediction.
Authors:Jawad Mohammed, Gahangir Hossain
Abstract:
In a healthcare environment, the healthcare interoperability platforms based on HL7 FHIR allow concurrent, asynchronous access to a set of shared patient resources, which are independent systems, i.e., EHR systems, pharmacy systems, lab systems, and devices. The FHIR specification lacks a protocol for concurrency control, and the research on detecting a race condition only targets the OS kernel. The research on FHIR security only targets authentication and injection attacks, considering concurrent access to patient resources to be sequential. The gap in the research in this area is addressed through the introduction of FHIR Resource Access Graph (FRAG), a formally defined graph G = (P,R,E, λ, τ, S), in which the nodes are the concurrent processes, the typed edges represent the resource access events, and the race conditions are represented as detectable structural properties. Three clinically relevant race condition classes are formally specified: Simultaneous Write Conflict (SWC), TOCTOU Authorization Violation (TAV), and Cascading Update Race (CUR). The FRAG model is implemented as a three-pass graph traversal detection algorithm and tested against a time window-based baseline on 1,500 synthetic FHIR R4 transaction logs. Under full concurrent access (C2), FRAG attains a 90.0% F1 score vs. 25.5% for the baseline, a 64.5 pp improvement.
Authors:Philip Virgil Berrer Astillo, Jayasree Sengupta, Mathy Vanhoef
Abstract:
Providing reliable, affordable, and secure Internet connectivity in rural areas remains a major challenge. Pay-for-use Wi-Fi hotspots are emerging as a scalable solution to provide affordable Internet access in underserved and rural regions. Despite their growing adoption, their security properties remain largely unexplored. In this paper, we present a security analysis of these hotspot ecosystems based on Wi-Fi surveys and practical attack validation. We first perform a Wi-Fi survey conducted in two countries, namely the Philippines and India, to understand the deployment and adoption of such systems in practice. Our results suggest that Piso-WiFi pay-to-use hotspots are particularly widespread in rural regions of the Philippines, and that India's PM-WANI initiative is slowly gaining traction. We then perform a security assessment of these deployments and demonstrate two practical attacks: hijacking another user's paid connection; and rogue hotspots. We analyze the root causes of these vulnerabilities, introduce threat models tailored to pay-for-use hotspot deployments, and outline practical security improvements, including a secure caching architecture. Our findings highlight security challenges in emerging rural connectivity infrastructure and provide directions toward more secure and scalable deployments.
Authors:Nikhil Kalidasu, Sahana Ganapathy
Abstract:
Automatic license plate reader (ALPR) systems are widely deployed to identify and track vehicles. While prior work has demonstrated vulnerabilities in ALPR systems, far less attention has been paid to their legality and physical-world practicality. We investigate whether low-resourced threat actors can engineer a successful adversarial attack against a modern open-source ALPR system. We introduce the Street-legal Physical Adversarial Rim (SPAR), a physically realizable white-box attack against the popular ALPR system fast-alpr. SPAR requires no access to ALPR infrastructure during attack deployment and does not alter or obscure the attacker's license plate. Based on prior legislation and case law, we argue that SPAR is street-legal in the state of Texas. Under optimal conditions, SPAR reduces ALPR accuracy by 60% and achieves an 18% targeted impersonation rate. SPAR can be produced for under $100, and it was implemented entirely by commercial agentic coding assistants. These results highlight practical vulnerabilities in modern ALPR systems under realistic physical-world conditions and suggest new directions for both attack and defense.
Authors:Prakul Sunil Hiremath, PeerAhammad M Bagawan, Sahil Bhekane
Abstract:
Modern adversarial campaigns unfold as sequences of behavioural phases - Reconnaissance, Lateral Movement, Intrusion, and Exfiltration - each often indistinguishable from legitimate traffic when viewed in isolation. Existing intrusion detection systems (IDS) fail to capture this structure: signature-based methods cannot detect zero-day attacks, deep-learning models provide opaque anomaly scores without stage attribution, and standard Kalman Filters cannot model non-stationary multi-modal dynamics. We present PARD-SSM, a probabilistic framework that models network telemetry as a Regime-Dependent Switching Linear Dynamical System with K = 4 hidden regimes. A structured variational approximation reduces inference complexity from exponential to O(TK^2), enabling real-time detection on standard CPU hardware. An online EM algorithm adapts model parameters, while KL-divergence gating suppresses false positives. Evaluated on CICIDS2017 and UNSW-NB15, PARD-SSM achieves F1 scores of 98.2% and 97.1%, with latency less than 1.2 ms per flow. The model also produces predictive alerts approximately 8 minutes before attack onset, a capability absent in prior systems.
Authors:Mohd Safwan Uddin, Mohammed Mouzam, Mohammed Imran, Syed Badar Uddin Faizan
Abstract:
Autonomous agents are moving beyond simple retrieval tasks to become economic actors that invoke APIs, sequence workflows, and make real-time decisions. As this shift accelerates, API providers need request-level monetization with programmatic spend governance. The HTTP 402 protocol addresses this by treating payment as a first-class protocol event, but most implementations rely on cryptocurrency rails. In many deployment contexts, especially countries with strong real-time fiat systems like UPI, this assumption is misaligned with regulatory and infrastructure realities. We present APEX, an implementation-complete research system that adapts HTTP 402-style payment gating to UPI-like fiat workflows while preserving policy-governed spend control, tokenized access verification, and replay resistance. We implement a challenge-settle-consume lifecycle with HMAC-signed short-lived tokens, idempotent settlement handling, and policy-aware payment approval. The system uses FastAPI, SQLite, and Python standard libraries, making it transparent, inspectable, and reproducible. We evaluate APEX across three baselines and six scenarios using sample sizes 2-4x larger than initial experiments (N=20-40 per scenario). Results show that policy enforcement reduces total spending by 27.3% while maintaining 52.8% success rate for legitimate requests. Security mechanisms achieve 100% block rate for both replay attacks and invalid tokens with low latency overhead (19.6ms average). Multiple trial runs show low variance across scenarios, demonstrating high reproducibility with 95% confidence intervals. The primary contribution is a controlled agent-payment infrastructure and reference architecture that demonstrates how agentic access monetization can be adapted to fiat systems without discarding security and policy guarantees.
Authors:Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari, Michael Gentile, Zach Reavis, David Magnotti, Wayne Fullen
Abstract:
Security teams face a challenge: the volume of newly disclosed Common Vulnerabilities and Exposures (CVEs) far exceeds the capacity to manually develop detection mechanisms. In 2025, the National Vulnerability Database published over 48,000 new vulnerabilities, motivating the need for automation. We present RuleForge, an AWS internal system that automatically generates detection rules--JSON-based patterns that identify malicious HTTP requests exploiting specific vulnerabilities--from structured Nuclei templates describing CVE details. Nuclei templates provide standardized, YAML-based vulnerability descriptions that serve as the structured input for our rule generation process. This paper focuses on RuleForge's architecture and operational deployment for CVE-related threat detection, with particular emphasis on our novel LLM-as-a-judge (Large Language Model as judge) confidence validation system and systematic feedback integration mechanism. This validation approach evaluates candidate rules across two dimensions--sensitivity (avoiding false negatives) and specificity (avoiding false positives)--achieving AUROC of 0.75 and reducing false positives by 67% compared to synthetic-test-only validation in production. Our 5x5 generation strategy (five parallel candidates with up to five refinement attempts each) combined with continuous feedback loops enables systematic quality improvement. We also present extensions enabling rule generation from unstructured data sources and demonstrate a proof-of-concept agentic workflow for multi-event-type detection. Our lessons learned highlight critical considerations for applying LLMs to cybersecurity tasks, including overconfidence mitigation and the importance of domain expertise in both prompt design and quality review of generated rules through human-in-the-loop validation.
Authors:Subho Halder, Siddharth Saxena, Kashinath Kadaba Shrish, Thiyagarajan M
Abstract:
Existing benchmarks for LLM-based vulnerability detection compress model performance into a single metric, which fails to reflect the distinct priorities of different stakeholders. For example, a CISO may emphasize high recall of critical vulnerabilities, an engineering leader may prioritize minimizing false positives, and an AI officer may balance capability against cost. To address this limitation, we introduce SecLens-R, a multi-stakeholder evaluation framework structured around 35 shared dimensions grouped into 7 measurement categories. The framework defines five role-specific weighting profiles: CISO, Chief AI Officer, Security Researcher, Head of Engineering, and AI-as-Actor. Each profile selects 12 to 16 dimensions with weights summing to 80, yielding a composite Decision Score between 0 and 100. We apply SecLens-R to evaluate 12 frontier models on a dataset of 406 tasks derived from 93 open-source projects, covering 10 programming languages and 8 OWASP-aligned vulnerability categories. Evaluations are conducted across two settings: Code-in-Prompt (CIP) and Tool-Use (TU). Results show substantial variation across stakeholder perspectives, with Decision Scores differing by as much as 31 points for the same model. For instance, Qwen3-Coder achieves an A (76.3) under the Head of Engineering profile but a D (45.2) under the CISO profile, while GPT-5.4 shows a similar disparity. These findings demonstrate that vulnerability detection is inherently a multi-objective problem and that stakeholder-aware evaluation provides insights that single aggregated metrics obscure.
Authors:Nitin Kohli, Paul Laskowski
Abstract:
Differentially private mechanisms are increasingly used to publish tables of counts, where each entry represents the number of individuals belonging to a particular category. A distribution of counts summarizes the information in the count column, unlinking counts from categories. This object is useful for answering a class of research questions, but it is subject to statistical biases when counts are privatized with standard mechanisms. This motivates a novel design criterion we term accuracy of distribution. This study formalizes a two-stage framework for privatizing tables of counts that balances accuracy of distribution with two standard criteria of accuracy of counts and runtime. In the first stage, a distribution privatizer generates an estimate for the true distribution of counts. We introduce a new mechanism, called the cyclic Laplace, specifically tailored to distributions of counts, that outperforms existing general-purpose differentially private histogram mechanisms. In the second stage, a constructor algorithm generates a count mechanism, represented as a transition matrix, whose fixed-point is the privatized distribution of counts. We develop a mathematical theory that describes such transition matrices in terms of simple building blocks we call epsilon-scales. This theory informs the design of a new constructor algorithm that generates transition matrices with favorable properties more efficiently than standard optimization algorithms. We explore the practicality of our framework with a set of experiments, highlighting situations in which a fixed-point method provides a favorable tradeoff among performance criteria.
Authors:Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar, Eray Tüzün, Sinem Sav
Abstract:
As Large Language Models (LLMs) for code increasingly utilize massive, often non-permissively licensed datasets, evaluating data contamination through Membership Inference Attacks (MIAs) has become critical. We propose SERSEM (Selective Entropy-Weighted Scoring for Membership Inference), a novel white-box attack framework that suppresses uninformative syntactical boilerplate to amplify specific memorization signals. SERSEM utilizes a dual-signal methodology: first, a continuous character-level weight mask is derived through static Abstract Syntax Tree (AST) analysis, spellchecking-based multilingual logic detection, and offline linting. Second, these heuristic weights are used to pool internal transformer activations and calibrate token-level Z-scores from the output logits. Evaluated on a 25,000-sample balanced dataset, SERSEM achieves a global AUC-ROC of 0.7913 on the StarCoder2-3B model and 0.7867 on the StarCoder2-7B model, consistently outperforming the implemented probability-based baselines Loss, Min-K% Prob, and PAC. Our findings demonstrate that focusing on human-centric coding anomalies provides a significantly more robust indicator of verbatim memorization than sequence-level probability averages.
Authors:Anubhab Sahu, Diptisha Samanta, Reza Soosahabi
Abstract:
System Instructions in Large Language Models (LLMs) are commonly used to enforce safety policies, define agent behavior, and protect sensitive operational context in agentic AI applications. These instructions may contain sensitive information such as API credentials, internal policies, and privileged workflow definitions, making system instruction leakage a critical security risk highlighted in the OWASP Top 10 for LLM Applications. Without incurring the overhead costs of reasoning models, many LLM applications rely on refusal-based instructions that block direct requests for system instructions, implicitly assuming that prohibited information can only be extracted through explicit queries. We introduce an automated evaluation framework that tests whether system instructions remain confidential when extraction requests are re-framed as encoding or structured output tasks. Across four common models and 46 verified system instructions, we observe high attack success rates (> 0.7) for structured serialization where models refuse direct extraction requests but disclose protected content in the requested serialization formats. We further demonstrate a mitigation strategy based on one-shot instruction reshaping using a Chain-of-Thought reasoning model, indicating that even subtle changes in wording and structure of system instructions can significantly reduce attack success rate without requiring model retraining.
Authors:Jiaqi Wu, Yiqing Sun, Zhigang Yao
Abstract:
We introduce a differentially private manifold denoising framework that allows users to exploit sensitive reference datasets to correct noisy, non-private query points without compromising privacy. The method follows an iterative procedure that (i) privately estimates local means and tangent geometry using the reference data under calibrated sensitivity, (ii) projects query points along the privately estimated subspace toward the local mean via corrective steps at each iteration, and (iii) performs rigorous privacy accounting across iterations and queries using $(\varepsilon,δ)$-differential privacy (DP). Conceptually, this framework brings differential privacy to manifold methods, retaining sufficient geometric signal for downstream tasks such as embedding, clustering, and visualization, while providing formal DP guarantees for the reference data. Practically, the procedure is modular and scalable, separating DP-protected local geometry (means and tangents) from budgeted query-point updates, with a simple scheduler allocating privacy budget across iterations and queries. Under standard assumptions on manifold regularity, sampling density, and measurement noise, we establish high-probability utility guarantees showing that corrected queries converge toward the manifold at a non-asymptotic rate governed by sample size, noise level, bandwidth, and the privacy budget. Simulations and case studies demonstrate accurate signal recovery under moderate privacy budgets, illustrating clear utility-privacy trade-offs and providing a deployable DP component for manifold-based workflows in regulated environments without reengineering privacy systems.
Authors:Anurag K. S. V., Shubham Chouhan, K. Srinivasan, G. Raghavan, Kanaka Raju P
Abstract:
Random Number Generators (RNGs) are crucial for applications ranging from cryptography to simulations. Depending on the source of randomness, RNGs are classified into Pseudo-Random Number Generators (PRNGs), True Random Number Generators (TRNGs), and Quantum Random Number Generators (QRNGs). This work presents the end-to-end development of a high-speed, high-efficiency, phase-noise-based QRNG system that taps into the quantum phase noise of a single-frequency laser, with randomness originating from spontaneous emission. Using a self-heterodyne measurement with a semiconductor laser (linewidth $\approx$ 5.23 $GHz$) operated near threshold and a $\sim$48 $cm$ fiber delay line, a raw data generation rate of 2.0 $Gbps$ is achieved. To ensure uniform randomness in the QRNG output, robust extraction techniques developed in-house, such as the Toeplitz Strong Extractor (TSE), are used. Randomness validation using the NIST and Diehard test suites confirms that all statistical tests pass at standard confidence levels. The developed system achieves a post-processed generation rate of 1.0 $Gbps$ in operation and attains a Technology Readiness Level (TRL) of 7, approaching TRL 8, making it suitable for real-time secure applications such as cryptographic key generation and stochastic modeling.
Authors:Gabrielle De Micheli, Syed Mahbub Hafiz, Geovandro Pereira, Eduardo L. Cominetti, Thales B. Paiva, Jina Choi, Marcos A. Simplicio, Bahattin Yildiz
Abstract:
Face recognition models operate in a client-server setting where a client extracts a compact face embedding and a server performs similarity search over a template database. This raises privacy concerns, as facial data is highly sensitive. To provide cryptographic privacy guarantees, one can use fully homomorphic encryption to perform end-to-end encrypted similarity search. However, existing FHE-based protocols are computationally costly and, impose high memory overhead. Building on prior work, HyDia, we introduce algorithmic and system-level improvements targeting real-world deployment with resource-constrained clients. First, we propose BSGS-Diagonal, an algorithm delivering fast and memory-efficient similarity computation. BSGS-Diagonal substantially shrinks the rotation-key set, lowering both client and server memory requirements, and also improves practical server runtime. This yields a 91% reduction in the number of rotation keys, translating to approximately 14 GB less memory used on the client, and reducing overall CPU peak RAM from over 30 GB in the original HyDia to under 10 GB for databases up to size 1M. In addition, runtime is improved by up to 1.57x for the membership verification scenario and 1.43x for the identification scenario. Secondly, we introduce fully GPU-optimized similarity matrix computation kernels. The implementation is built upon FIDESlib, a CKKS-level GPU library based on OpenFHE. Rather than offloading individual CKKS primitives in isolation, the integrated kernels fuse operations to avoid repeated CPU-GPU ciphertext movement and costly FIDESlib/OpenFHE data-structure conversions. As a result, our GPU implementations of both HyDia and BSGS-Diagonal achieve up to 9x and 17x speedups, respectively, enabling sub-second encrypted face recognition for databases up to 32K entries while further reducing host memory usage.
Authors:Razi Iqbal, Awais Ahmad, Asfandyar Gillani
Abstract:
This paper brings up this idea of using Near Field Communication (NFC) for inventory control system instead of using traditional barcodes. NFC because of its high security, ease of use and efficiency can be very suitable for systems like inventory control. In traditional inventory control systems, each product has a barcode pasted on it, which is vulnerable to attacks as barcodes are open and have no security. Furthermore, barcodes are prone to damages and can be unreliable when pasted on different types of products e.g. hot and frozen products, circular shaped products and irregular shaped products like clothes etc. NFC on the other hand is very efficient, secure and reliable when it comes to short-range wireless communication. In this paper we will present our prototype for the inventory control system of an electronic store in which each product has a passive NFC tag pasted to it. When a customer buys a product the receipt of the product is generated using NFC between the NFC passive tag on the product and NFC enabled device (e.g. smart phone or reader) at the cash counter.
Authors:Sameer Shaik, Zhen Huang, Daniela Stan Raicu, Jacob Furst
Abstract:
Detecting software vulnerabilities is critical to ensuring the security and reliability of modern computer systems. Deep neural networks have shown promising results on vulnerability detection, but they lack the capability to capture global contextual information on vulnerable code. To address this limitation, we explore the application of transformers for C/C++ vulnerability detection. We use program slices that encapsulate key syntactic and semantic features of program code, such as API function calls, array usage, pointer manipulations, and arithmetic expressions. By leveraging transformers' capability to capture both local and global contextual information on vulnerable code, our work can identify vulnerabilities accurately. Combined with data balancing and hyperparameter fine-tuning, our work offers a robust and efficient approach to identifying vulnerable code with moderate resource usage and training time.
Authors:Ema Mauko, Shane D Johnson, Enrico Mariconti
Abstract:
Cloud computing has drastically altered the ways in which it is possible to deliver information technologies in a service-led structure, however, this has also been reflected in the cybercrime domain. Cybercrime as a Service is an economic model where a technically skilled actor offers a given cyberattack as an end-to-end service to non-technical actors who pay a subscription fee for said service. The services, which can vary in scope, targets, and delivery modes, include everything from the vulnerability discoveries, delivery of the attack, and the attack itself to financial rewards to the subscriber. In this scoping literature review, we analysed 195 articles from both academic and grey literature with a view of investigating the services articles studied, the methodological approach the how the CaaS model is predicted to develop in the future. Our review indicates that with further commercialisation of the model will further lower the barrier of entry to the cybercrime realm, increase sophistication of the attacks and increase resilience of the service providers and their ecosystem which will result in harder shutdowns of services by the authorities. Furthermore, as the model becomes more accessible, groups such as organised crime groups, extremist actors may use them as well, which may have implications for criminal activity in both cyber and physical domains.
Authors:William Tighe, George Brumpton, Mark Carney, Benjamin T. H. Varcoe
Abstract:
Quantum key distribution is often regarded as an unconditionally secure method to exchange a secret key by harnessing fundamental aspects of quantum mechanics. Despite the robustness of key exchange, classical post-processing reveals vulnerabilities that an eavesdropper could target. In particular, many reconciliation protocols correct errors by comparing the parities of subsets between both parties. These communications occur over insecure channels, leaking information that an eavesdropper could exploit. Currently there is no holistic threat model that addresses how parity-leakage during reconciliation might be actively manipulated. In this paper we introduce a new form of attack, namely the Manipulate-and-Observe attack in which the adversary (1) partially intercepts a fraction $ρ$ of the qubits during key exchange, injecting the maximally tolerated amount of errors up to the 11 percent error threshold whilst remaining undetected and (2) probes the maximum amount of parity-leakage during reconciliation, and exploits it using a vectorised, parallel brute force filter to shrink the search space from 2n down to as few as a single candidate, for an n-bit reconciled key. We perform simulations of the attack, deploying it on the most widely used protocol, BB84, andthe benchmark reconciliation protocol, Cascade. Our simulation results demonstrate that the attack can significantly reduce the security below the theoretical bound and, in the worst case, fully recover the reconciled key material. The principles of the attack could threaten other parity-based reconciliation schemes, like Low Density Parity Check, which underscores the need for urgent consideration of the combined security of key exchange and post-processing.
Authors:Kok Ping Lim, Dongyang Jia, Iftekhar Salam
Abstract:
Lightweight cryptographic primitives are widely deployed in resource-constraint environment, particularly in the Internet of Things (IoT) devices. Due to their public accessibility, these devices are vulnerable to physical attacks, especially fault attacks. Recently, deep learning-based cryptanalytic techniques have demonstrated promising results; however, their application to fault attacks remains limited, particularly for stream ciphers. In this work, we investigate the feasibility of deep learning assisted differential fault attack on three lightweight stream ciphers, namely ACORNv3, MORUSv2 and ATOM, under a relaxed fault model, where a single-bit bit-flipping fault is injected at an unknown location. We train multilayer perceptron (MLP) models to identify the fault locations. Experimental results show that the trained models achieve high identification accuracies of 0.999880, 0.999231 and 0.823568 for ACORNv3, MORUSv2 and ATOM, respectively, and outperform traditional signature-based methods. For the secret recovery process, we introduce a threshold-based method to optimize the number of fault injections required to recover the secret information. The results show that the initial state of ACORN can be recovered with 21 to 34 faults; while MORUS requires 213 to 248 faults, with at most 6 bits of guessing. Both attacks reduce the attack complexity compared to existing works. For ATOM, the results show that it possesses a higher security margin, as majority of state bits in the Non-linear Feedback Shift Register (NFSR) can only be recovered under a precise control model. To the best of our knowledge, this work provides the first experimental results of differential fault attacks on ATOM.
Authors:Akhil Gupta Chigullapally, Sharvan Vittala, Razin Farhan Hussian, Mohsen Amini Salehi
Abstract:
The fast pace of modern AI is rapidly transforming traditional industrial systems into vast, intelligent and potentially unmanned autonomous operational environments driven by AI-based solutions. These solutions leverage various forms of machine learning, reinforcement learning, and generative AI. The introduction of such smart capabilities has pushed the envelope in multiple industrial domains, enabling predictive maintenance, optimized performance, and streamlined workflows. These solutions are often deployed across the Industrial Internet of Things (IIoT) and supported by the Edge-Fog-Cloud computing continuum to enable urgent (i.e., real-time or near real-time) decision-making. Despite the current trend of aggressively adopting these smart industrial solutions to increase profit, quality, and efficiency, large-scale integration and deployment also bring serious hazards that if ignored can undermine the benefits of smart industries. These hazards include unforeseen interoperability side-effects and heightened vulnerability to cyber threats, particularly in environments operating with a plethora of heterogeneous IIoT systems. The goal of this study is to shed light on the potential consequences of industrial smartness, with a particular focus on security implications, including vulnerabilities, side effects, and cyber threats. We distinguish software-level downsides stemming from both traditional AI solutions and generative AI from those originating in the infrastructure layer, namely IIoT and the Edge-Cloud continuum. At each level, we investigate potential vulnerabilities, cyber threats, and unintended side effects. As industries continue to become smarter, understanding and addressing these downsides will be crucial to ensure secure and sustainable development of smart industrial systems.
Authors:Jinwook Kim, Jonghun Hong
Abstract:
There have been various attempts at token standards on numerous blockchain platforms today to fundamentally change the way assets are traded in the traditional capital markets, but there is a lack of research and resolution on regulatory issues that become the common foundation for interoperability and reusable standards. Our proposal, Regulatory Compliance Protocol (RCP), is based on the regulations and reports of 15 global financial institutions and standardizes recommendations and guidelines involving the overall asset tokenization of TradFi and DeFi into five regulatory groups: Traceability, Confidentiality, Enforceability, Finality and Tokenizability, compiling them into 31 items and presenting a benchmark for technology and standards as an underlying protocol. To review the legality and effectiveness of RCP, it was validated based on three tokenization and trading scenarios, and through the RCP-based NEW-EIP, it showed superiority over other ERC protocols related to asset tokenization.
Authors:Miles Farmer, Ekincan Ufuktepe, Anne Watson, Hialo Muniz Carvalho, Vadim Okun, Zineb Maasaoui, Kannappan Palaniappan
Abstract:
Large Language Models (LLMs) have emerged as a popular choice in vulnerability detection studies given their foundational capabilities, open source availability, and variety of models, but have limited scalability due to extensive compute requirements. Using the natural graph relational structure of code, we show that our proposed graph neural network (GNN) based deep learning model VulGNN for vulnerability detection can achieve performance almost on par with LLMs, but is 100 times smaller in size and fast to retrain and customize. We describe the VulGNN architecture, ablation studies on components, learning rates, and generalizability to different code datasets. As a lightweight model for vulnerability analysis, VulGNN is efficient and deployable at the edge as part of real-world software development pipelines.
Authors:Alex Berke, Güliz Seray Tuncay, Michael Specter, Mihai Christodorescu
Abstract:
The major mobile platforms, Android and iOS, have introduced changes that restrict user tracking to improve user privacy, yet apps continue to covertly track users via device fingerprinting. We study the opportunity to improve this dynamic with a case study on mobile fingerprinting that evaluates developers' perceptions of how well platforms protect user privacy and how developers perceive platform privacy interventions. Specifically, we study developers' willingness to make changes to protect users from fingerprinting and how developers consider trade-offs between user privacy and developer effort. We do this via a survey of 246 Android developers, presented with a hypothetical Android change that protects users from fingerprinting at the cost of additional developer effort. We find developers overwhelmingly (89%) support this change, even when they anticipate significant effort, yet prefer the change be optional versus required. Surprisingly, developers who use fingerprinting are six times more likely to support the change, despite being most impacted by it. We also find developers are most concerned about compliance and enforcement. In addition, our results show that while most rank iOS above Android for protecting user privacy, this distinction significantly reduces among developers very familiar with fingerprinting. Thus there is an important opportunity for platforms and developers to collaboratively build privacy protections, and we present actionable ways platforms can facilitate this.
Authors:Yicheng Cai, Mitchell John DeStefano, Guodong Dong, Pulkit Handa, Peng Liu, Tejas Singhal, Peiyu Tseng, Winston Jen White
Abstract:
As Large Language Models (LLMs) and multi-agent AI systems are demonstrating increasing potential in cybersecurity operations, organizations, policymakers, model providers, and researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such AI systems to achieve more autonomous SOCs (security operation centers) and reduce manual effort. In particular, the AI and cybersecurity communities have recently developed several benchmarks for evaluating the red team capabilities of multi-agent AI systems. However, because the operations in SOCs are dominated by blue team operations, the capabilities of AI systems & agents to achieve more autonomous SOCs cannot be evaluated without a benchmark focused on blue team operations. To our best knowledge, no systematic benchmark for evaluating coordinated multi-task blue team AI has been proposed in the literature. Existing blue team benchmarks focus on a particular task. The goal of this work is to develop a set of design principles for the construction of a benchmark, which is denoted as SOC-bench, to evaluate the blue team capabilities of AI. Following these design principles, we have developed a conceptual design of SOC-bench, which consists of a family of five blue team tasks in the context of large-scale ransomware attack incident response.
Authors:Joy Acharya, Smit Patel, Paawan Sharma, Mohendra Roy
Abstract:
Physically Unclonable Functions (PUFs) provide promising hardware security for IoT authentication, leveraging inherent randomness suitable for resource constrained environments. However, ML/DL modeling attacks threaten PUF security by learning challenge-response patterns. This work introduces a custom resistor-capacitor (RC) based dynamically reconfigurable PUF using 32-bit challenge-response pairs (CRPs) designed to resist such attacks. We systematically evaluated robustness by generating a CRP dataset and splitting it into training, validation, and test sets. Multiple ML techniques including Artificial Neural Networks (ANN), Gradient Boosted Neural Networks (GBNN), Decision Trees (DT), Random Forests (RF), and XGBoost, were trained to model PUF behavior. While all models achieved 100% training accuracy, test performance remained near random guessing: 51.05% (ANN), 53.27% (GBNN), 50.06% (DT), 52.08% (RF), and 50.97% (XGBoost). These results demonstrate the proposed PUF's strong resistance to ML-driven modeling attacks, as advanced algorithms fail to reproduce accurate responses. The dynamically reconfigurable architecture enhances robustness against adversarial threats with minimal resource overhead. This simple RC-PUF offers an effective, low-cost alternative to complex encryption for securing next-generation IoT authentication against machine learning-based threats, ensuring reliable device verification without compromising computational efficiency or scalability in deployed IoT networks.
Authors:Ziad Sharawy, Mohammad Nakshbandi, Sorin Mihai Grigorescu
Abstract:
Deep Neural Networks (DNNs) achieve strong performance in semantic segmentation for robotic perception but remain vulnerable to adversarial attacks, threatening safety-critical applications. While robustness has been studied for image classification, semantic segmentation in robotic contexts requires specialized architectures and detection strategies.
Authors:Jinyuan Li, Liang Feng Zhang
Abstract:
As machine learning as a service (MLaaS) gains increasing popularity, it raises two critical challenges: privacy and verifiability. For privacy, clients are reluctant to disclose sensitive private information to access MLaaS, while model providers must safeguard their proprietary models. For verifiability, clients lack reliable mechanisms to ensure that cloud servers execute model inference correctly. Decision trees are widely adopted in MLaaS due to their popularity, interpretability, and broad applicability in domains like medicine and finance. In this context, outsourcing decision tree evaluation (ODTE) enables both clients and model providers to offload their sensitive data and decision tree models to the cloud securely. However, existing ODTE schemes often fail to address both privacy and verifiability simultaneously. To bridge this gap, we propose $\sf PVODTE$, a novel two-server private and verifiable ODTE protocol that leverages homomorphic secret sharing and a MAC-based verification mechanism. $\sf PVODTE$ eliminates the need for server-to-server communication, enabling independent computation by each cloud server. This ``non-interactive'' setting addresses the latency and synchronization bottlenecks of prior arts, making it uniquely suitable for wide-area network (WAN) deployments. To our knowledge, $\sf PVODTE$ is the first two-server ODTE protocol that eliminates server-to-server communication. Furthermore, $\sf PVODTE$ achieves security against \emph{malicious} servers, where servers cannot learn anything about the client's input or the providers' decision tree models, and servers cannot alter the inference result without being detected.
Authors:Ruiyang Wang, Rong Pan, Zhengan Yao
Abstract:
Federated learning (FL) enables distributed clients to collaboratively train a global model using local private data. Nevertheless, recent studies show that conventional FL algorithms still exhibit deficiencies in privacy protection, and the server lacks a reliable and stable aggregation rule for updating the global model. This situation creates opportunities for adversaries: on the one hand, they may eavesdrop on uploaded gradients or model parameters, potentially leaking benign clients' private data; on the other hand, they may compromise clients to launch poisoning attacks that corrupt the global model. To balance accuracy and security, we propose FedFG, a robust FL framework based on flow-matching generation that simultaneously preserves client privacy and resists sophisticated poisoning attacks. On the client side, each local network is decoupled into a private feature extractor and a public classifier. Each client is further equipped with a flow-matching generator that replaces the extractor when interacting with the server, thereby protecting private features while learning an approximation of the underlying data distribution. Complementing the client-side design, the server employs a client-update verification scheme and a novel robust aggregation mechanism driven by synthetic samples produced by the flow-matching generator. Experiments on MNIST, FMNIST, and CIFAR-10 demonstrate that, compared with prior work, our approach adapts to multiple attack strategies and achieves higher accuracy while maintaining strong privacy protection.
Authors:Bhavuk Jain, Sercan Ö. Arık, Hardeo K. Thakur
Abstract:
Multimodal large language models (MLLMs) integrate information from multiple modalities such as text, images, audio, and video, enabling complex capabilities such as visual question answering and audio translation. While powerful, this increased expressiveness introduces new and amplified vulnerabilities to adversarial manipulation. This survey provides a comprehensive and systematic analysis of adversarial threats to MLLMs, moving beyond enumerating attack techniques to explain the underlying causes of model susceptibility. We introduce a taxonomy that organizes adversarial attacks according to attacker objectives, unifying diverse attack surfaces across modalities and deployment settings. Additionally, we also present a vulnerability-centric analysis that links integrity attacks, safety and jailbreak failures, control and instruction hijacking, and training-time poisoning to shared architectural and representational weaknesses in multimodal systems. Together, this framework provides an explanatory foundation for understanding adversarial behavior in MLLMs and informs the development of more robust and secure multimodal language systems.
Authors:Masoud Heidary, Biresh Kumar Joardar
Abstract:
The dependability of AI models relies largely on the reliability of the underlying computation hardware. Hardware aging attacks can compromise the computing substrate and disrupt AI models over the long run. In this work, we present a new hardware aging attack that exploits commutative properties of addition to disrupt the multiply-and-add operation that forms the backbone of almost all AI models. By permuting the inputs of an adder, the attack preserves functional correctness while inducing unbalanced stress among transistors, accelerating delay degradation in the circuit. Unlike prior approaches that rely on input manipulation, additional trojan circuitry, etc., the proposed method incurs virtually no area or software overhead. Experimental results with two types of multipliers, different bit widths, a mix of AI models and datasets demonstrates that the proposed attack degrades inference accuracy by up to 64% in 4 years, posing a significant threat to AI accelerators. The attack can also be extended to arithmetic units of general-purpose processors.
Authors:Jaydeep Rath, Prajwal Panth, P. S. N. Bhaskar
Abstract:
Quantum Key Distribution (QKD) provides information-theoretic security by exploiting the principles of quantum mechanics. Among QKD protocols, the BB84 scheme remains the most widely adopted for both theoretical research and practical implementation. A critical parameter determining the reliability and security of BB84 is the Quantum Bit Error Rate (QBER), which quantifies errors in the sifted key arising from channel noise or potential eavesdropping. This paper presents a systematic review and analysis of QBER within the BB84 protocol, examining its calculation, statistical estimation methods, and role in detecting eavesdropping activity. Simulation results, corroborated by reported experimental observations, reveal a near-linear relationship between eavesdropping intensity and QBER, with values approaching 25% under full intercept-resend attacks. Four confidence interval estimation methods, Wald, Wilson, Clopper-Pearson, and Hoeffding's inequality, are compared for robust QBER analysis in finite-key scenarios. Protocol enhancements, including decoy-state methods, hybrid cryptographic models, and quantum-resistant authentication, are discussed as mechanisms to mitigate errors and strengthen resilience across fiber, free-space, underwater, and satellite QKD systems. Open challenges in distinguishing noise-induced errors from malicious eavesdropping, and the role of adaptive error correction and machine-learning-assisted QBER estimation in future quantum networks, are identified as key directions for further research.
Authors:Aditya Dhodapkar, Farhaan Pishori
Abstract:
When an LLM agent reads a confidential file, then writes a summary, then emails it externally, no single step is unsafe, but the sequence is a data leak. We call this safety drift: individually safe actions compounding into violations. Prior work has measured this problem; we predict it. SafetyDrift models agent safety trajectories as absorbing Markov chains, computing the probability that a trajectory will reach a violation within a given number of steps via closed form absorption analysis. A consequence of the monotonic state design is that every agent will eventually violate safety if left unsupervised (absorption probability 1.0 from all states), making the practical question not if but when, and motivating our focus on finite horizon prediction. Across 357 traces spanning 40 realistic tasks in four categories, we discover that "points of no return" are sharply task dependent: in communication tasks, agents that reach even a mild risk state have an 85% chance of violating safety within five steps, while in technical tasks the probability stays below 5% from any state. A lightweight monitor built on these models detects 94.7% of violations with 3.7 steps of advance warning at negligible computational cost, outperforming both keyword matching (44.7% detection, 55.9% false positive rate) and per step LLM judges (52.6% detection, 38.2% false positive rate) while running over 60,000x faster.
Authors:Praneel Panchigar, Torlach Rush, Matthew Canabarro
Abstract:
Large Language Models (LLMs) consume vast quantities of human-generated content for both training and real-time inference, yet the creators of that content remain largely invisible in the value chain. Existing approaches to data attribution operate either at the model-internals level, tracing influence through gradient signals, or at the legal-policy level through transparency mandates and copyright litigation. Neither provides a runtime mechanism for content creators to know when, by whom, and how their work is being consumed. We introduce the Sovereign Context Protocol (SCP), an open-source protocol specification and reference architecture that functions as an attribution-aware data access layer between LLMs and human-generated content. Inspired by Anthropic's Model Context Protocol (MCP), which standardizes how LLMs connect to tools, SCP standardizes how LLMs connect to creator-owned data, with every access event logged, licensed, and attributable. SCP defines six core methods (creator profiles, semantic search, content retrieval, trust/value scoring, authenticity verification, and access auditing) exposed over both REST and MCP-compatible interfaces. We formalize the protocol's message envelope, present a threat model with five adversary classes, propose a log-proportional revenue attribution model, and report preliminary latency benchmarks from a reference implementation built on FastAPI, ChromaDB, and NetworkX. We situate SCP within the emerging regulatory landscape, including the EU AI Act's Article 53 training data transparency requirements and ongoing U.S. copyright litigation, and argue that the attribution gap requires a protocol-level intervention that makes attribution a default property of data access.
Authors:Gokularam Muthukrishnan, Anshoo Tandon
Abstract:
Differentially private $K$-means clustering enables releasing cluster centers derived from a dataset while protecting the privacy of the individuals. Non-interactive clustering techniques based on privatized histograms are attractive because the released data synopsis can be reused for other downstream tasks without additional privacy loss. The choice of the number of grids for discretizing the data points is crucial, as it directly controls the quantization bias and the amount of noise injected to preserve privacy. The widely adopted strategy selects a grid size that is independent of the number of clusters and also relies on empirical tuning. In this work, we revisit this choice and propose a refined grid-size selection rule derived by minimizing an upper bound on the expected deviation in the K-means objective function, leading to a more principled discretization strategy for non-interactive private clustering. Compared to prior work, our grid resolution differs both in its dependence on the number of clusters and in the scaling with dataset size and privacy budget. Extensive numerical results elucidate that the proposed strategy results in accurate clustering compared to the state-of-the-art techniques, even under tight privacy budgets.
Authors:Christina Karakosta, Lian Alhedaithy, William J. Knottenbelt
Abstract:
Iris-based biometric identification is increasingly recognized for its significant accuracy and long-term stability compared to other biometric modalities such as fingerprints or facial features. However, all biometric modalities are highly sensitive data that raise serious privacy and security concerns, particularly in decentralized and untrusted environments. While Fully Homomorphic Encryption (FHE) has emerged as a promising solution for protecting sensitive data during computation, existing privacy-preserving iris recognition systems face significant performance limitations that hinder their practical deployment. This paper investigates the performance challenges of the current landscape of privacy-preserving iris recognition systems using FHE. Based on these insights, we outline a scalable privacy-preserving framework that aligns with all the requirements specified in the ISO/IEC 24745 standard. Leveraging the Open Iris library, our approach starts with robust iris segmentation, followed by normalization and feature extraction using Gabor filters to generate iris codes. We then apply binary masking to filter out unreliable regions and perform matching using Hamming distance on encrypted iris codes. The accuracy and performance of our proposed privacy-preserving framework is evaluated on the CASIA-Iris-Thousand dataset. Results show that our privacy-preserving framework yields very similar accuracy to the cleartext equivalent, but a much higher computational overhead with respect to pairwise iris template comparisons, of $\sim 120\,000 \times$. This points towards the need for the deployment of two-level schemes in the context of scalable $1-N$ template comparisons.
Authors:Zhijun Jiang, Amin Milani Fard
Abstract:
Achieving high availability and robust security in Kubernetes requires more than reactive scaling and standard perimeter firewalls. Traditional autoscalers, such as HPA, often fail to react quickly to traffic spikes and cannot distinguish between legitimate flash crowds and DDoS attacks. We present an open-source toolchain to provide a traffic-aware autoscaling approach that utilizes an eBPF-based networking layer to enforce security policies at the kernel level while orchestrating scaling decisions based on predictive models. Our results demonstrate that the predictive approach reduces timeout errors by 32% during sudden traffic surges compared to standard reactive scaling, while ensuring immediate network convergence and layer 7 security isolation for newly scaled pods.
Authors:Zhe Zhang, Martijn Goorden, Michel Reniers
Abstract:
Existing literature on timed opacity uses specific definitions for restricted subclasses of timed automata or limited observation models. This lack of a unified definition makes it difficult to establish formal relationships and compare the expressiveness of different opacity variants. This paper establishes a unified framework for timed opacity by introducing a universal observation model for timed automata. First, we introduce an observation model with full observation of time delay and partial observation of locations, clocks, and events. Second, based on this model, we define the notion of evolution-based timed opacity. Third, we mathematically prove that evolution-based timed opacity strictly implies language-based timed opacity and establish a formal equivalence with execution-time opacity under constrained observations. This framework establishes a unified semantic hierarchy for characterizing the landscape of timed opacity.
Authors:Luana Kurmann, Svenja Lage, Violetta Weger
Abstract:
In this paper we present an attack on a recently proposed code-based Private Information Retrieval (PIR) scheme. Indeed, the server can retrieve the index of the desired file with high probability in polynomial time. The attack relies on the fact that random codes over finite rings are free with high probability and that the dimension of the rowspan of the query matrix decreases when the rows corresponding to the desired index are removed.
Authors:Shayan Eskandari, Leid Zejnilovic, Jeremy Clark
Abstract:
Blockchain technology introduces asset types and custody mechanisms that fundamentally break traditional financial auditing paradigms. This paper presents an autoethnographic analysis of cryptoasset auditing challenges, build on top of prior research on a comprehensive framework addressing existence, ownership, valuation, and internal control verification. Drawing from lived experience implementing blockchain systems as an engineer, smart contract auditor, and CTO of a publicly traded cryptoasset firm, we demonstrate how autoethnographic methodology becomes necessary for understanding technical complexities that external analysis cannot capture. Through detailed examination of token airdrops, multi-signature smart contracts, and real-time on-chain reporting, we provide experimental approaches and common scenarios that auditing firms can analyze to address blockchain innovations currently considered technically insurmountable.
Authors:Cian Lalor, Matthew Marshall, Antonio Russo
Abstract:
Bitcoin's limited programmability and transaction throughput have historically prevented native Bitcoin from participating in decentralized finance (DeFi) applications. Existing solutions depend on honest-majority thresholds, or centralized custodial entities that introduce significant trust requirements. This paper introduces Bitcoin Smart Accounts (BSA), a novel protocol that enables native Bitcoin to access DeFi through trust-minimized infrastructure while maintaining self-custody of funds. BSA achieves this through a combination of emulated Bitcoin covenants using Partially Signed Bitcoin Transactions (PSBTs) and Taproot scripts, a Trusted Execution Environment (TEE)-based arbitration system, and destination chain smart contracts that enable DeFi platforms to accept self-custodial Bitcoin as collateral without necessitating protocol-level modifications. The setup leverages liquidity secured by the Lombard Security Consortium which provides a twofold advantage: for a DeFi protocol, liquidators rely on fungible assets with deep liquidity to quickly exit positions, while for a depositor, the general trust assumptions of honest majority (m-of-n) are reduced to existential honesty (1-of-k). We present the complete protocol design, including the Bitcoin architecture, the TEE-based arbitration mechanism, and the Smart Account Registry for protocol management. We provide a security analysis that demonstrates the correctness, safety, and availability properties under our trust model. Our design enables native Bitcoin to serve as collateral in lending markets and other DeFi protocols without requiring users to relinquish custody of funds.
Authors:Yixin Cao, Xianfeng Cheng, Yijie Liu
Abstract:
Transfer-based anti-money laundering (AML) systems monitor token flows through transaction-graph abstractions, implicitly assuming that economically meaningful value migration is sufficiently encoded in transfer-layer connectivity. In this paper, we demonstrate that this assumption, the bedrock of current industrial forensics, fundamentally collapses in composable smart-contract ecosystems. We formalize two structural mechanisms that undermine the completeness of transfer-layer attribution. First, we introduce Principal-Execution-Beneficiary (PEB) separation, where intent originators, transaction executors (e.g., MEV searchers), and ultimate beneficiaries are functionally decoupled. Second, we formalize state-mediated value migration, where economic coupling is enforced through invariant-driven contract state transitions (e.g., AMM reserve rebalancing) rather than explicit transfer continuity. Through a real-world case study of role-separated limit order execution and a constructive cross-pool arbitrage model, we prove that these mechanisms render transfer-layer observation neither attribution-complete nor causally closed. We further argue that simply expanding transfer-layer tracing capabilities fails to resolve the underlying attribution ambiguity inherent in structurally decoupled execution. Under modular composition and open participation markets, these mechanisms are structurally generative, implying that heuristic-based flow tracing has reached a formal observational boundary. We advocate for a paradigm shift toward AML based on execution semantics, focusing on the restitution of economic causality from atomic execution logic and state invariants rather than static graph connectivity.
Authors:Xintao Hu, Feng-Qi Cui
Abstract:
With the emergence of AI techniques for depression diagnosis, the conflict between high demand and limited supply for depression screening has been significantly alleviated. Among various modal data, audio-based depression diagnosis has received increasing attention from both academia and industry since audio is the most common carrier of emotion transmission. Unfortunately, audio data also contains User-sensitive Identity Information (ID), which is extremely vulnerable and may be maliciously used during the smart diagnosis process. Among previous methods, the clarification between depression features and sensitive features has always serve as a barrier. It is also critical to the problem for introducing a safe encryption methodology that only encrypts the sensitive features and a powerful classifier that can correctly diagnose the depression. To track these challenges, by leveraging adversarial loss-based Subspace Decomposition, we propose a first practical framework \name presented for Trustable Audio Affective Computing, to perform automated depression detection through audio within a trustable environment. The key enablers of TAAC are Differentiating Features Subspace Decompositor (DFSD), Flexible Noise Encryptor (FNE) and Staged Training Paradigm, used for decomposition, ID encryption and performance enhancement, respectively. Extensive experiments with existing encryption methods demonstrate our framework's preeminent performance in depression detection, ID reservation and audio reconstruction. Meanwhile, the experiments across various setting demonstrates our model's stability under different encryption strengths. Thus proving our framework's excellence in Confidentiality, Accuracy, Traceability, and Adjustability.
Authors:Eyal Hadad, Mordechai Guri
Abstract:
On-device Vision-Language Models (VLMs) promise data privacy via local execution. However, we show that the architectural shift toward Dynamic High-Resolution preprocessing (e.g., AnyRes) introduces an inherent algorithmic side-channel. Unlike static models, dynamic preprocessing decomposes images into a variable number of patches based on their aspect ratio, creating workload-dependent inputs. We demonstrate a dual-layer attack framework against local VLMs. In Tier 1, an unprivileged attacker can exploit significant execution-time variations using standard unprivileged OS metrics to reliably fingerprint the input's geometry. In Tier 2, by profiling Last-Level Cache (LLC) contention, the attacker can resolve semantic ambiguity within identical geometries, distinguishing between visually dense (e.g., medical X-rays) and sparse (e.g., text documents) content. By evaluating state-of-the-art models such as LLaVA-NeXT and Qwen2-VL, we show that combining these signals enables reliable inference of privacy-sensitive contexts. Finally, we analyze the security engineering trade-offs of mitigating this vulnerability, reveal substantial performance overhead with constant-work padding, and propose practical design recommendations for secure Edge AI deployments.
Authors:Changhee Shin, Bom Kim, Seungsoo Lee
Abstract:
Serverless computing is increasingly adopted for AI-driven workloads due to its automatic scaling and pay-as-you-go model. However, its function-based architecture creates significant security risks, including excessive privilege allocation and poor permission management. In this paper, we present ALPS, an automated framework for enforcing least privilege in serverless environments. Our system employs serverless-tailored static analysis to extract precise permission requirements from function code and a fine-tuned Large Language Model (LLM) to generate language- and vendor-specific security policies. It also performs real-time monitoring to block unauthorized access and adapt to policy or code changes, supporting heterogeneous cloud providers and programming languages. In an evaluation of 8,322 real-world functions across AWS, Google Cloud, and Azure, ALPS achieved 94.8\% coverage for least-privilege extraction, improved security logic generation quality by 220\% (BLEU), 124\% (ChrF++) and 100\% (ROUGE-2), and added minimum performance overhead. These results demonstrate that ALPS provides an effective, practical, and vendor-agnostic solution for securing serverless workloads.
Authors:Peng Wei, Wesley Shu
Abstract:
Knowledge distillation, model extraction, and behavior transfer have become central concerns in frontier AI. The main risk is not merely copying, but the possibility that useful capability can be transferred more cheaply than the governance structure that originally accompanied it. This paper presents a public, trade-secret-safe theoretical framework for reducing that asymmetry at the architectural level. The core claim is that distillation becomes less valuable as a shortcut when high-level capability is coupled to internal stability constraints that shape state transitions over time. To formalize this idea, the paper introduces a constraint-coupled reasoning framework with four elements: bounded transition burden, path-load accumulation, dynamically evolving feasible regions, and a capability-stability coupling condition. The paper is intentionally public-safe: it omits proprietary implementation details, training recipes, thresholds, hidden-state instrumentation, deployment procedures, and confidential system design choices. The contribution is therefore theoretical rather than operational. It offers a falsifiable architectural thesis, a clear threat model, and a set of experimentally testable hypotheses for future work on distillation resistance, alignment, and model governance.
Authors:Vasu Srinivasan, Dhriti Vasu
Abstract:
We present a Sovereign AI architecture for clinical triage in which all inference is performed on-device and inbound data is delivered via a physically unidirectional channel, implemented using receive-only broadcast infrastructure or certified hardware data diodes, with no return path to any external network. This design removes the network-mediated attack surface by construction, rather than attempting to secure it through software controls. The system performs conversational symptom intake, integrates device-captured vitals, and produces structured, triage-aligned clinical records at the point of care. We formalize the security properties of receiver-side unidirectionality and show that the architecture is transport-agnostic across broadcast and diode-enforced deployments. We further analyze threat models, enforcement mechanisms, and deployment configurations, demonstrating how physical one-way data flow enables high-assurance operation in both resource-constrained and high-risk environments. This work positions physically unidirectional channels as a foundational primitive for sovereign, on-device clinical intelligence at the front door of care.
Authors:Lucas Miranda, Carlos Banjar, Daniel Menasche, Anton Kocheturov, Gaurav Srivastava, Tobias Limmer
Abstract:
Assessing the security posture of Industrial Control Systems (ICS) is critical for protecting essential infrastructure. However, the complexity and scale of these environments make it challenging to identify and prioritize potential attack paths. This paper introduces a semi-automated approach for generating attack graphs in ICS environments to visualize and analyze multi-step attack scenarios. Our methodology integrates network topology information with vulnerability data to construct a model of the system. This model is then processed by a stateful traversal algorithm to identify potential exploit chains based on preconditions and consequences. We present a case study applying the proposed framework to the Siemens PCS7 Cybersecurity Blueprint for Water Treatment Plants. The results demonstrate the framework's ability to simulate different attack scenarios, including those originating from known CVEs and potential device misconfigurations. We show how a single point of failure can compromise network segmentation and how patching a critical vulnerability can protect an entire security zone, providing actionable insights for risk mitigation.
Authors:Jiasun Li, Project Team
Abstract:
The growing replication crisis across disciplines such as economics, finance, and other social sciences as well as computer science undermines the credibility of academic research. Current institutional solutions -- such as artifact evaluations and replication packages -- suffer from critical limitations, including shortages of qualified data editors, difficulties in handling proprietary datasets, inefficient processes, and reliance on voluntary labor. This paper proposes a novel framework leveraging new technological advances in trusted-execution environments (TEEs) -- exemplified by Intel Trust Domain Extensions (TDX) -- to address the replication crisis in a cost-effective and scalable manner. Under our approach, authors execute replication packages within a cloud-based TEE and submit cryptographic proofs of correct execution, for which journals or conferences can efficiently verify without re-running the code. This reallocates the operational burden to authors while preserving data confidentiality and eliminating reliance on scarce editorial resources. As a proof of concept, we validate the feasibility of this system through field experiments, reporting a pilot study replicating published papers on TDX-backed cloud VMs, finding average costs of \$1.35--\$1.80 per package with minimal computational overhead relative to standard VMs and high success rates even for novice users with no prior TEE experience. We also provide a conduct formal analysis showing that TEE adoption is incentive-compatible for authors, cost-dominant for journals, and constitutes an equilibrium in the certification market. The findings highlight the potential of TEE technology to provide a sustainable, privacy-preserving, and efficient mechanism to address the replication crisis in academia.
Authors:Zhenyi Wang, Siyu Luan
Abstract:
As machine learning (ML) systems expand in both scale and functionality, the security landscape has become increasingly complex, with a proliferation of attacks and defenses. However, existing studies largely treat these threats in isolation, lacking a coherent framework to expose their shared principles and interdependencies. This fragmented view hinders systematic understanding and limits the design of comprehensive defenses. Crucially, the two foundational assets of ML -- \textbf{data} and \textbf{models} -- are no longer independent; vulnerabilities in one directly compromise the other. The absence of a holistic framework leaves open questions about how these bidirectional risks propagate across the ML pipeline. To address this critical gap, we propose a \emph{unified closed-loop threat taxonomy} that explicitly frames model-data interactions along four directional axes. Our framework offers a principled lens for analyzing and defending foundation models. The resulting four classes of security threats represent distinct but interrelated categories of attacks: (1) Data$\rightarrow$Data (D$\rightarrow$D): including \emph{data decryption attacks and watermark removal attacks}; (2) Data$\rightarrow$Model (D$\rightarrow$M): including \emph{poisoning, harmful fine-tuning attacks, and jailbreak attacks}; (3) Model$\rightarrow$Data (M$\rightarrow$D): including \emph{model inversion, membership inference attacks, and training data extraction attacks}; (4) Model$\rightarrow$Model (M$\rightarrow$M): including \emph{model extraction attacks}. Our unified framework elucidates the underlying connections among these security threats and establishes a foundation for developing scalable, transferable, and cross-modal security strategies, particularly within the landscape of foundation models.
Authors:Aymen Bouferroum, Valeria Loscri, Abderrahim Benslimane
Abstract:
The Industrial Internet of Things (IIoT) introduces significant security challenges as resource-constrained devices become increasingly integrated into critical industrial processes. Existing security approaches typically address threats at a single network layer, often relying on expensive hardware and remaining confined to simulation environments. In this paper, we present the research framework and contributions of our doctoral thesis, which aims to develop a lightweight, Machine Learning (ML)-based security framework for IIoT environments. We first describe our adoption of the Tm-IIoT trust model and the Hybrid IIoT (H-IIoT) architecture as foundational baselines, then introduce the Trust Convergence Acceleration (TCA) approach, our primary contribution that integrates ML to predict and mitigate the impact of degraded network conditions on trust convergence, achieving up to a 28.6% reduction in convergence time while maintaining robustness against adversarial behaviors. We then propose a real-world deployment architecture based on affordable, open-source hardware, designed to implement and extend the security framework. Finally, we outline our ongoing research toward multi-layer attack detection, including physical-layer threat identification and considerations for robustness against adversarial ML attacks.
Authors:Hao Zhou, Siqi Cai, Hua Dai, Geng Yang, Jing Luo, Hui Cai
Abstract:
Differential privacy (DP) is crucial for safeguarding sensitive client information in federated learning (FL), yet traditional DP-FL methods rely predominantly on fixed gradient clipping thresholds. Such static clipping neglects significant client heterogeneity and varying privacy sensitivities, which may lead to an unfavorable privacy-utility trade-off. In this paper, we propose PAC-DP, a Personalized Adaptive Clipping framework for federated learning under record-level local differential privacy. PAC-DP introduces a Simulation-CurveFitting approach leveraging a server-hosted public proxy dataset to learn an effective mapping between personalized privacy budgets epsilon and gradient clipping thresholds C, which is then deployed online with a lightweight round-wise schedule. This design enables budget-conditioned threshold selection while avoiding data-dependent tuning during training. We provide theoretical analyses establishing convergence guarantees under the per-example clipping and Gaussian perturbation mechanism and a reproducible privacy accounting procedure. Extensive evaluations on multiple FL benchmarks show that PAC-DP surpasses conventional fixed-threshold approaches under matched privacy budgets, improving accuracy by up to 26% and accelerating convergence by up to 45.5% in our evaluated settings.
Authors:Shenghan Zheng, Qifan Zhang
Abstract:
AI agent protocols -- including MCP, A2A, ANP, and ACP -- enable autonomous agents to discover capabilities, delegate tasks, and compose services across trust boundaries. Despite massive deployment (MCP alone has 97M+ monthly SDK downloads), no systematic security framework for these protocols exists. We present three contributions. First, the Agent Protocol Stack, a 6-layer architectural model that defines what a complete agent protocol must specify at each layer -- analogous to ITU-T X.800 for the OSI stack. Second, the Agent-Agnostic Security Model, 11 security principles formalized as TLA+ invariants, each tagged with a property taxonomy (spec-mandated, spec-recommended, aasm-hardening, aps-completeness) that distinguishes protocol non-conformance from framework-imposed security requirements. Third, AgentConform, a two-phase conformance checker that (i)extracts normative clauses from protocol specifications into a typed Protocol~IR with explicit Protocol/Environment/Adversary action separation, (ii)compiles the IR into TLA+ models and model-checks them against AASM invariants, then (iii)replays counterexample traces against live SDK implementations to confirm findings. We introduce the Composition Safety (CS) principle: security properties that hold for individual protocols can break when protocols are composed through shared infrastructure. We demonstrate this with formal models of five protocol composition patterns, revealing cross-protocol design gaps that individual protocol analysis cannot detect. Preliminary application to representative agent protocols reveals recurrent gaps in credential lifecycle, consent enforcement, audit completeness, and composition safety. Some findings are under coordinated disclosure; full evaluation details will be released in the complete version.
Authors:Kaya Alpturer, Constantine Doumanidis, Aviv Zohar
Abstract:
Peer-discovery protocols within P2P networks are often vulnerable: because creating network identities is essentially free, adversaries can eclipse honest nodes or partition the overlay. This threat is especially acute for blockchains, whose security depends on resilient peer connectivity. We present AetherWeave, a stake-backed peer-discovery protocol that ties network participation to deposited stake, raising the cost of large-scale attacks. We prove that, with high probability, either the honest overlay remains connected or a $(1{-}δ)$-fraction of nodes in every smaller component raise an attack-detection flag -- even against a very powerful adversary. To our knowledge, AetherWeave is the first peer-discovery protocol to simultaneously provide Sybil resistance and privacy: nodes prove they hold valid stake without revealing which deposit they own, and gossiping does not expose peer-table contents. A cryptographic commitment scheme rate-limits discovery requests per round; exceeding the limit yields a publicly verifiable misbehavior proof that triggers on-chain slashing. Beyond deposit and slashing, the protocol requires no on-chain interaction, with per-node communication scaling as $O(s\sqrt{n})$. We validate our design through a mean-field analysis with closed-form convergence bounds, extensive adversarial simulations, and an end-to-end prototype built by forking Prysm, a leading Ethereum consensus client.
Authors:Oleksandr Yarotskyi, José D'Abruzzo Pereira, João R. Campos
Abstract:
The widespread adoption of web applications has made their security a critical concern and has increased the need for systematic ways to assess whether they can be considered trustworthy. However, "trust" assessment remains an open problem as existing techniques primarily focus on detecting known vulnerabilities or depend on manual evaluation, which limits their scalability; therefore, evaluating adherence to secure coding practices offers a complementary, pragmatic perspective by focusing on observable development behaviors. In practice, the identification and verification of secure coding practices are predominantly performed manually, relying on expert knowledge and code reviews, which is time-consuming, subjective, and difficult to scale. This study presents an empirical methodology to automate the trustworthiness assessment of web applications by leveraging Large Language Models (LLMs) to verify adherence to secure coding practices. We conduct a comparative analysis of prompt engineering techniques across five state-of-the-art LLMs, ranging from baseline zero-shot classification to prompts enriched with semantic definitions, structural context derived from call graphs, and explicit instructional guidance. Furthermore, we propose an extension of a hierarchical Quality Model (QM) based on the Logic Score of Preference (LSP), in which LLM outputs are used to populate the model's quality attributes and compute a holistic trustworthiness score. Experimental results indicate that excessive structural context can introduce noise, whereas rule-based instructional prompting improves assessment reliability. The resulting trustworthiness score allows discriminating between secure and vulnerable implementations, supporting the feasibility of using LLMs for scalable and context-aware trust assessment.
Authors:Filip Rezabek, Dahlia Malkhi, Amir Yahalom
Abstract:
The emergence of decentralized satellite networks creates a pressing need for trust architectures that operate without physical access to hardware, without pre-provisioned vendor secrets, and without dependence on a single manufacturer's attestation service. Terrestrial TEEs are insufficient: hardware-based designs are susceptible to physical attacks, and most platforms root their attestation chains in secrets provisioned during manufacturing, creating a pre-launch trust window and single-vendor dependency that cannot be independently audited. We present Space Fabric, an architecture that provides the missing trust foundation for orbital computing by relocating the trusted computing stack to satellite infrastructure, exploiting post-launch physical inaccessibility as a tamper barrier unattainable by terrestrial deployments. Our Satellite Execution Assurance Protocol binds workload execution to a specific satellite via a Byzantine-tolerant endorsement quorum of distributed ground stations, certifying not only \emph{what} executes inside the TEE but also \emph{where}. All cryptographic secrets are generated within co-located secure elements after launch, with no signing keys accessible on Earth at any point. To reduce single-vendor dependence, Space Fabric distributes its trust anchor across two independent secure elements, an NXP SE050 and a TROPIC01, both of which must co-sign attestation evidence. We implement Space Fabric on a USB Armory Mk II with ARM TrustZone, verify attestation end-to-end using Veraison, and provide a security analysis with satisfaction arguments and impossibility bounds under a strong adaptive adversary.
Authors:Kyrylo Riabov, Serhii Kryvyi
Abstract:
Ring-mapping protocols need a canonical byte-to-residue layer before any algebraic encryption step can begin. This paper isolates that layer and presents the base-m length codec, a canonical map from byte strings of length less than 2^64 to lists of residues modulo m. The encoder builds on and adapts an rANS-based system proposed by Duda. Decoding is exact for all moduli satisfying the paper's parameter bounds. Because the encoding carries the byte length in its fixed-width header, decoding is also tolerant to appended valid suffix digits. The paper is accompanied by a Rust implementation of the described protocol, a Lean 4 formalization of the abstract codec with machine-checked proofs, and performance benchmarks. The Lean 4 formalization establishes fixed-width prefix inversion and payload-state bounds below 2^64, stream-level roundtrip correctness, and that every emitted symbol is a valid residue modulo m. We conclude with a complexity analysis and a discussion of practical considerations arising in real-world use of the codec.
Authors:Roberto Metere, Mario Lilli, Luca Arnaboldi, Elvinia Riccobene
Abstract:
The latest Wi-Fi security standard, IEEE 802.11, includes a secure authentication protocol called SAE, whose use is mandatory for WPA3-Personal networks. The protocol is specified at two separate but linked levels: a traditional cryptographic description of the communication logic between network devices, and a state machine description that realises the former in each single device. Current formal verification efforts focus mainly on communication logic. We present detailed formal models of the protocol at both levels, provide precise specifications of its security properties, and analyse machine-checked proofs in ProVerif and ASMETA. The integrated analysis of the above two models is particularly novel, enabling us to identify and address several issues in the current IEEE 802.11 specification more thoroughly than would have been possible otherwise, leading to several official revisions of the standard.
Authors:Emir Karaosman, Advije Rizvani, Irdin Pekaric
Abstract:
Financial institutions face increasing cyber risk while operating under strict regulatory oversight. To manage this risk, they rely heavily on Cyber Threat Intelligence (CTI) to inform detection, response, and strategic security decisions. Artificial intelligence (AI) is widely suggested as a means to strengthen CTI. However, evidence of trustworthy production use in finance remains limited. Adoption depends not only on predictive performance, but also on governance, integration into security workflows and analyst trust. Thus, we examine how AI is used for CTI in practice within financial institutions and what barriers prevent trustworthy deployment. We report a mixed-methods, user-centric study combining a CTI-finance-focused systematic literature review, semi-structured interviews, and an exploratory survey. Our review screened 330 publications (2019-2025) and retained 12 finance-relevant studies for analysis; we further conducted six interviews and collected 14 survey responses from banks and consultancies. Across research and practice, we identify four recurrent socio-technical failure modes that hinder trustworthy AI-driven CTI: (i) shadow use of public AI tools outside institutional controls, (ii) license-first enablement without operational integration, (iii) attacker-perception gaps that limit adversarial threat modeling, and (iv) missing security for the AI models themselves, including limited monitoring, robustness evaluation and audit-ready evidence. Survey results provide additional insights: 71.4% of respondents expect AI to become central within five years, 57.1% report infrequent current use due to interpretability and assurance concerns and 28.6% report direct encounters with adversarial risks. Based on these findings, we derive three security-oriented operational safeguards for AI-enabled CTI deployments.
Authors:Ali Dehghantanha, Sajad Homayoun
Abstract:
Recent AI systems combine large language models with tools, external knowledge via retrieval-augmented generation (RAG), and even autonomous multi-agent decision loops. This agentic AI paradigm greatly expands capabilities - but also vastly enlarges the attack surface. In this systematization, we map out the trust boundaries and security risks of agentic LLM-based systems. We develop a comprehensive taxonomy of attacks spanning prompt-level injections, knowledge-base poisoning, tool/plug-in exploits, and multi-agent emergent threats. Through a detailed literature review, we synthesize evidence from 2023-2025, including more than 20 peer-reviewed and archival studies, industry reports, and standards. We find that agentic systems introduce new vectors for indirect prompt injection, code execution exploits, RAG index poisoning, and cross-agent manipulation that go beyond traditional AI threats. We define attacker models and threat scenarios, and propose metrics (e.g., Unsafe Action Rate, Privilege Escalation Distance) to evaluate security posture. Our survey examines defenses such as input sanitization, retrieval filters, sandboxes, access control, and "AI guardrails," assessing their effectiveness and pointing out the areas where protection is still lacking. To assist practitioners, we outline defensive controls and provide a phased security checklist for deploying agentic AI (covering design-time hardening, runtime monitoring, and incident response). Finally, we outline open research challenges in secure autonomous AI (robust tool APIs, verifiable agent behavior, supply-chain safeguards) and discuss ethical and responsible disclosure practices. We systematize recent findings to help researchers and engineers understand and mitigate security risks in agentic AI.
Authors:Samuel Ozechi, Jennifer Okonkwoabutu
Abstract:
As the Internet of Things (IoT) continues to expand across critical infrastructure, smart environments, and consumer devices, securing them against cyber threats has become increasingly vital. Traditional intrusion detection models often treat IoT threats as binary classification problems or rely on opaque models, thereby limiting trust. This work studies multiclass threat attribution in IoT environments using the CICIoT2023 dataset, grouping over 30 attack variants into 8 semantically meaningful classes. We utilize a combination of a gradient boosting model and SHAP (SHapley Additive exPlanations) to deliver both global and class-specific explanations, enabling detailed insight into the features driving each attack classification. The results show that the model distinguishes distinct behavioral signatures of the attacks using flow timing, packet size uniformity, TCP flag dynamics, and statistical variance. Additional analysis that exposes both feature attribution and the decision trajectory per class further validates these observed patterns. Our findings contribute to the development of more accurate and explainable intrusion detection systems, bridging the gap between high-performance machine learning and the need for trust and accountability in AI-driven cybersecurity for IoT environments.
Authors:Carlos Jimeno Miguel, Mikel Izal Azcarate
Abstract:
Capture The Flag (CTF) competitions have established themselves as a highly effective pedagogical tool in cybersecurity education, offering students hands-on experience in realistic attack and defense scenarios. However, organizing and hosting these events requires considerable infrastructure effort, which frequently limits their adoption in academic settings. This paper presents the design, iterative development, and evaluation of a CTF as a Service (CaaS) platform built on Proxmox virtualization, leveraging Infrastructure as Code (IaC) tools such as Terraform and Ansible, container orchestration via Docker Swarm, and load balancing with HAProxy. The system supports both a development-centered workflow, in which challenges are automatically deployed from a Git repository through a CI/CD pipeline, and a deployment-oriented workflow for ad-hoc infrastructure provisioning. The paper describes the design decisions made, the challenges encountered during development, and the solutions implemented to achieve session persistence, external routing, and challenge replicability. The platform is designed to evolve into a CTF hosting service with commercial potential, and future lines of work are outlined regarding automatic scaling, monitoring integration, and frontend standardization.
Authors:Athanasios P. Pelekoudas, Epameinondas Bolis, Jasmin Lindner, Prodromos Kyriakidis, Mathias Davidsen, Johannes T. E. Hansen, Christian H. Reichkendler, Sajad Homayoun
Abstract:
Phishing attacks remain a persistent cybersecurity threat, and the widespread adoption of TLS certificates has unintentionally enabled malicious websites to appear trustworthy to users. This study examines whether certificate metadata and domain characteristics can help distinguish phishing domains from benign domains within the Danish .dk namespace. A dataset was constructed by combining registry information from Punktum dk with phishing reports and popularity rankings from external sources. TLS certificate attributes were collected using Netlas, while additional domain-based features were derived from DNS records and lexical analysis of domain names. The analysis compares phishing, popular, and less frequently visited domains across several feature categories, including Certificate Authorities (CAs), validity periods, missing certificate fields, SAN structure, registrant geography, hosting providers, and lexical properties of domain names. The results indicate that several features show observable differences between phishing and highly popular domains. However, phishing domains often resemble less popular domains, resulting in substantial overlap across many characteristics. Consequently, no individual feature provides a reliable standalone indicator of phishing activity within the Danish namespace. The findings suggest that certificate and domain attributes may still contribute to detection when combined, while also highlighting the limitations of relying on individual indicators in isolation. This work provides an empirical overview of phishing-related infrastructure patterns in the Danish .dk ecosystem and offers insights that may inform future phishing detection approaches.
Authors:Bhagya Chembakottu, Martin P. Robillard
Abstract:
Developers rely on online tutorials to learn web application security, but tutorial quality varies. We reviewed 132 free security tutorials to examine topic coverage, authorship, and technical depth. Our analysis shows that most tutorials come from vendors and emphasize high-level explanations over concrete implementation guidance. Few tutorials provide complete runnable code examples or direct links to authoritative security resources such as the Open Web Application Security Project (OWASP), Common Weakness Enumeration (CWE), or Common Vulnerabilities and Exposures (CVE). We found that two visible signals help identify more useful tutorials: the presence of runnable code and direct links to official resources. These signals can help developers distinguish broad awareness material from tutorials that better support secure implementation.
Authors:Lujia Liang, Lei Zhang
Abstract:
Open Radio Access Network (O-RAN) is a major advancement in the telecommunications field, providing standardized interfaces that promote interoperability between different vendors' technologies, thereby enhancing network flexibility and reducing operational expenses. By leveraging cutting-edge developments in network virtualization and artificial intelligence, O-RAN enhances operational efficiency and stimulates innovation within an open ecosystem. In the context of 6G, the potential capabilities of O-RAN have been significantly expanded, enabling ultra-reliable low-latency communication, terabit-level data rates, and seamless integration of terrestrial and non-terrestrial networks. Despite these benefits, its open architecture paradigm also brings critical security and privacy challenges, which, if not addressed, could compromise network integrity and data confidentiality. This paper conducts a comprehensive investigation into the security vulnerabilities and privacy issues associated with the O-RAN architecture in the context of the evolving 6G landscape, systematically categorizing fundamental vulnerabilities, meticulously examining potential attack vectors, and assessing current and future threats. In addition, this study also examines the existing and emerging security mechanisms of O-RAN and reviews the ongoing standardization activities aimed at strengthening the O-RAN security framework.
Authors:Cemre Cadir, Salim Najib, Yanina Y. Shkel
Abstract:
The exact composition of mechanisms for which two differential privacy (DP) constraints hold simultaneously is studied. The resulting privacy region admits an exact representation as a mixture over compositions of mechanisms of heterogeneous DP guarantees, yielding a framework that naturally generalizes to the composition of mechanisms for which any number of DP constraints hold. This result is shown through a structural lemma for mixtures of binary hypothesis tests. Lastly, the developed methodology is applied to approximate $f$-DP composition.
Authors:Pouya Mehdipour, Alexandre Miranda Alves, Gerardo Honorato, Mostafa Salarinoghabi
Abstract:
This paper presents a symmetric stream cipher that utilizes the dynamic properties of random cubic mappings in the complex plane to generate pseudo-random key streams. The system is based on the iterations of the random cubic polynomial $f_n(z)=z^3+c_n z$, where the parameters $c_n$ are chosen randomly from a disc of radius $δ$ and with center at the origin, aiming to improve the chaotic behaviour and, consequently, the randomness of the generated sequence. The stability of the Julia set under small parameter perturbations, when $δ< δ_0\simeq 0.89$, is considered to ensure key consistency in noisy environments, such as 5G networks. On the other hand, for $δ> 3$, the system exhibits instability and chaos, ideal for generating ultra-secure keys. The Python implementation integrates secure key derivation, robust key stream generation via warmed-up iteration, and an authenticated encryption scheme using the modern cryptographic primitives (\texttt{HKDF} and\texttt{HMAC-SHA-256}), to ensure message integrity and authenticity. Statistical analyses, including chi-square test and entropy calculation, are performed on the output of the key stream generator to evaluate its randomness and distribution. In addition, a complete statistical validation, compliant with \texttt{NIST SP 800-22} standards in modern cryptography, was performed to enhance the proposed system's credibility.
Authors:Matta Varun, Ajay Kumar Dhakar, Yuan Hong, Shamik Sural
Abstract:
Graph neural network (GNN) is a powerful tool for analyzing graph-structured data. However, their vulnerability to adversarial attacks raises serious concerns, especially when dealing with sensitive information. Local Differential Privacy (LDP) offers a privacy-preserving framework for training GNNs, but its impact on adversarial robustness remains underexplored. This paper investigates adversarial attacks on LDP-protected GNNs. We explore how the privacy guarantees of LDP can be leveraged or hindered by adversarial perturbations. The effectiveness of existing attack methods on LDP-protected GNNs are analyzed and potential challenges in crafting adversarial examples under LDP constraints are discussed. Additionally, we suggest directions for defending LDP-protected GNNs against adversarial attacks. This work investigates the interplay between privacy and security in graph learning, highlighting the need for robust and privacy-preserving GNN architectures.
Authors:Jahyeok Han, Donghyeok Le, Minseok Ryu, Syed Assad, Yong-Su Kim, Sunghyun Bae
Abstract:
We propose a frequency-division multiplexed (FDM) continuous-variable quantum key distribution (CV-QKD) system with enhanced spectral efficiency through dense multiplexing of low-symbol-rate signals. A four-channel 10-Mbaud FDM-CV-QKD system was experimentally demonstrated using Gaussian modulation, a transmitted local oscillator, and homodyne detection. Under a finite-size scenario (N = 10^7), the system achieved a 3.7-fold back-to-back secret key rate gain and outperformed the single-channel system for distances up to 41.1 km.
Authors:Bernardo Magri, Benjamin Marsh, Paul Gebheim
Abstract:
Modern cloud inference creates a two sided privacy problem where users reveal sensitive inputs to providers, while providers must execute proprietary model weights inside potentially leaky execution environments. Fully homomorphic encryption (FHE) offers cryptographic guarantees but remains prohibitively expensive for modern architectures. We argue that progress requires co-design where specializing FHE schemes/compilers for the static structure of inference circuits, while simultaneously constraining inference architectures to reduce dominant homomorphic cost drivers. We outline a meet in the middle agenda and concrete optimization targets on both axes.
Authors:Vicenç Torra, Maria Bras-Amorós
Abstract:
Memory poisoning attacks for Agentic AI and multi-agent systems (MAS) have recently caught attention. It is partially due to the fact that Large Language Models (LLMs) facilitate the construction and deployment of agents. Different memory systems are being used nowadays in this context, including semantic, episodic, and short-term memory. This distinction between the different types of memory systems focuses mostly on their duration but also on their origin and their localization. It ranges from the short-term memory originated at the user's end localized in the different agents to the long-term consolidated memory localized in well established knowledge databases. In this paper, we first present the main types of memory systems, we then discuss the feasibility of memory poisoning attacks in these different types of memory systems, and we propose mitigation strategies. We review the already existing security solutions to mitigate some of the alleged attacks, and we discuss adapted solutions based on cryptography. We propose to implement local inference based on private knowledge retrieval as an example of mitigation strategy for memory poisoning for semantic memory. We also emphasize actual risks in relation to interactions between agents, which can cause memory poisoning. These latter risks are not so much studied in the literature and are difficult to formalize and solve. Thus, we contribute to the construction of agents that are secure by design.
Authors:Kristján Orri Ragnarsson, Jacky Mallett
Abstract:
Industrial Control Systems (ICS), and many simple Internet of Things (IoT) devices, commonly communicate using unencrypted or unauthenticated protocols. For ICS this is an historical carryover since the introduction of these systems predated practical lightweight cryptography. As the processing power of small devices has grown exponentially at the same time as new, more efficient encryption algorithms have become available, end device encryption of communication protocols is becoming much more practical, but is still not widely used with ICS protocols such as Modbus and IEC61850 (GOOSE) which have tight requirements for both latency and variance. Newer micro-processors can also present challenges both to measurement and use, since features such as dynamic frequency scaling can significantly impact performance measurements. In this paper, we measured the time cost of adding encryption into the communication cycle of low-cost edge devices using ChaCha20-Poly1305, and show that in the worst case the encryption cycle took less than 7.1 percent of the latency requirements of Goose, and less than 3% for IEC-60834-1 on Raspberry PI 4, and an Intel N95 Mini PC, which is well within the specified latency requirements for these protocols.
Authors:Junade Ali, Chris Hicks
Abstract:
Existing cybersecurity literature lacks a source of empirical, representative data as to the true nature of cyberattacks on Critical National Infrastructure. We have obtained UK-wide data on incidents reported under the Network and Information Systems (NIS) Regulations in 2024 causing "a significant impact on the continuity" of essential services and comparator data from intelligence agencies. We find that 29% of NIS reports already concern cybersecurity incidents. As the UK Government seeks to extend cybersecurity reporting, we find the NIS Regulations are limited in their effectiveness; whilst our requests revealed 30 cybersecurity incidents reported under the NIS regulations, there were 89 incidents classified as "highly significant and significant" captured by the National Cyber Security Centre in the 2024 reporting year. Whereas 36% of Cybersecurity and Infrastructure Security Agency reported attacks concerned espionage, from NIS data we find 100% NIS-reportable cyberattacks concerning healthcare systems in England in 2024 were ransomware.
Authors:Pranay Anchuri, Matteo Campanelli, Paul Cesaretti, Rosario Gennaro, Tushar M. Jois, Hasan S. Kayman, Tugce Ozdemir
Abstract:
When large AI models are deployed as cloud-based services, clients have no guarantee that responses are correct or were produced by the intended model. Rerunning inference locally is infeasible for large models, and existing cryptographic proof systems -- while providing strong correctness guarantees -- introduce prohibitive prover overhead (e.g., hundreds of seconds per query for billion-parameter models). We present a verification framework and protocol that replaces full cryptographic proofs with a lightweight, sampling-based approach grounded in statistical properties of neural networks. We formalize the conditions under which trace separation between functionally dissimilar models can be leveraged to argue the security of verifiable inference protocols. The prover commits to the execution trace of inference via Merkle-tree-based vector commitments and opens only a small number of entries along randomly sampled paths from output to input. This yields a protocol that trades soundness for efficiency, a tradeoff well-suited to auditing, large-scale deployment settings where repeated queries amplify detection probability, and scenarios with rationally incentivized provers who face penalties upon detection. Our approach reduces proving times by several orders of magnitude compared to state-of-the-art cryptographic proof systems, going from the order of minutes to the order of milliseconds, with moderately larger proofs. Experiments on ResNet-18 classifiers and Llama-2-7B confirm that common architectures exhibit the statistical properties our protocol requires, and that natural adversarial strategies (gradient-descent reconstruction, inverse transforms, logit swapping) fail to produce traces that evade detection. We additionally present a protocol in the refereed delegation model, where two competing servers enable correct output identification in a logarithmic number of rounds.
Authors:Enrico Bottazzi, Pia Park
Abstract:
NDAI zones let inventor and investor agents negotiate inside a Trusted Execution Environment (TEE) where any disclosed information is deleted if no deal is reached. This makes full IP disclosure the rational strategy for the inventor's agent. Leveraging this infrastructure, however, requires agents to distinguish a secure environment from an insecure one, a capability LLM agents lack natively, since they can rely only on evidence passed through the context window to form awareness of their execution environment. We ask: How do different LLM models weight various forms of evidence when forming awareness of the security of their execution environment? Using an NDAI-style negotiation task across 10 language models and various evidence scenarios, we find a clear asymmetry: a failing attestation universally suppresses disclosure across all models, whereas a passing attestation produces highly heterogeneous responses: some models increase disclosure, others are unaffected, and a few paradoxically reduce it. This reveals that current LLM models can reliably detect danger signals but cannot reliably verify safety, the very capability required for privacy-preserving agentic protocols such as NDAI zones. Bridging this gap, possibly through interpretability analysis, targeted fine-tuning, or improved evidence architectures, remains the central open challenge for deploying agents that calibrate information sharing to actual evidence quality.
Authors:Haochen Zhao, Shaoyang Cui
Abstract:
Autonomous web agents such as \textbf{OpenClaw} are rapidly moving into high-impact real-world workflows, but their security robustness under live network threats remains insufficiently evaluated. Existing benchmarks mainly focus on static sandbox settings and content-level prompt attacks, which leaves a practical gap for network-layer security testing. In this paper, we present \textbf{ClawTrap}, a \textbf{MITM-based red-teaming framework for real-world OpenClaw security evaluation}. ClawTrap supports diverse and customizable attack forms, including \textit{Static HTML Replacement}, \textit{Iframe Popup Injection}, and \textit{Dynamic Content Modification}, and provides a reproducible pipeline for rule-driven interception, transformation, and auditing. This design lays the foundation for future research to construct richer, customizable MITM attacks and to perform systematic security testing across agent frameworks and model backbones. Our empirical study shows clear model stratification: weaker models are more likely to trust tampered observations and produce unsafe outputs, while stronger models demonstrate better anomaly attribution and safer fallback strategies. These findings indicate that reliable OpenClaw security evaluation should explicitly incorporate dynamic real-world MITM conditions rather than relying only on static sandbox protocols.
Authors:Sunyoung Kim, Hokeun Kim
Abstract:
Recent studies reveal gaps in delegating critical tasks to agentic AI that accesses websites on the user's behalf, primarily due to limited access control mechanisms on websites designed for agentic AI. In response, we propose a design of website-based interaction for AI agents with fine-grained access control for delegated critical tasks. Our approach encompasses a website design and implementation, as well as modifications to the access grant protocols in an open-source authorization service to tailor it to agentic AI, with delegated critical tasks on the website. The evaluation of our approach demonstrates the capabilities of our access-controlled website used by AI agents.
Authors:Yi Ting Shen, Kentaroh Toyoda, Alex Leung
Abstract:
The Model Context Protocol (MCP) introduces a structurally distinct attack surface that existing threat frameworks, designed for traditional software systems or generic LLM deployments, do not adequately cover. This paper presents MCP-38, a protocol-specific threat taxonomy consisting of 38 threat categories (MCP-01 through MCP-38). The taxonomy was derived through a systematic four-phase methodology: protocol decomposition, multi-framework cross-mapping, real-world incident synthesis, and remediation-surface categorization. Each category is mapped to STRIDE, OWASP Top 10 for LLM Applications (2025, LLM01--LLM10), and the OWASP Top 10 for Agentic Applications (2026, ASI01--ASI10). MCP-38 addresses critical threats arising from MCP's semantic attack surface (tool description poisoning, indirect prompt injection, parasitic tool chaining, and dynamic trust violations), none of which are adequately captured by prior work. MCP-38 provides the definitional and empirical foundation for automated threat intelligence platforms.
Authors:Enzo Fenoglio, Philip Treleaven
Abstract:
Federated computing (FC) enables collaborative computation such as machine learning, analytics, or data processing across distributed organizations keeping raw data local. Built on four architectural pillars, distributed data assets, federated services, standardized APIs, and decentralized services, FC supports sovereignty-preserving collaboration. However, federated systems spanning organizational and jurisdictional boundaries lack a portable mechanism for enforcing sovereignty-critical constraints. They often depend on runtime policy evaluation, shared trust infrastructure, or institutional agreements that introduce coordination overhead and provide limited cryptographic assurance. Federated Computing as Code (FCaC) is a declarative architecture that addresses this gap by compiling authority and delegation into cryptographically verifiable artifacts rather than relying on online policy interpretation. Boundary admission becomes a local verification step rather than a policy decision service. FCaC separates constitutional governance from procedural governance. Admission is validated locally at execution boundaries using proof-carrying capabilities, while stateful services may still implement post-admission controls such as ABAC, risk scoring, quotas, and workflow state. FCaC introduces Virtual Federated Platforms (VFPs), which combine Core, Business, and Governance contracts through a cryptographic trust chain: Key Your Organization (KYO), Envelope Capability Tokens (ECTs), and proof of possession (PoP). We demonstrate the approach in a proof-of-concept cross-silo federated learning workflow using MNIST as a surrogate workload to validate the admission mechanisms and release an open-source implementation showing envelope issuance, boundary verification, and envelope-triggered training.
Authors:Sandra Jaudou, Hélène Gasnier, Elias Boudjella, Marc Canève, Victoria Bloquert, Vasily Shenshin, Tilio Pilet, Sacha Gaucher, Soo Hyeon Kim, Philippe Gaborit, Gouenou Coatrieux, Matthieu Labousse, Anthony Genot, Yannick Rondelez
Abstract:
Secure communication is the cornerstone of modern infrastructures, yet achieving unconditional security -resistant to any computational attack- remains a fundamental challenge. The One-Time Pad (OTP), proven by Shannon to offer perfect secrecy, requires a shared random key as long as the message, used only once. However, distributing large keys over long distances has been impractical due to the lack of secure and scalable sharing options. Here, we introduce a DNA-based cryptographic primitive that leverages random pools of synthetic DNA to install a synchronized entropy source between distant parties. Our approach uses duplicated DNA molecules -comprising random index-payload pairs- as a shared secret. These molecules are locally sequenced and digitized to generate a common binary mask for OTP encryption, achieving unconditional security without relying on computational assumptions. We experimentally demonstrate this protocol between Tokyo and Paris, using in-house sequencing, generating a shared secret mask of $\sim$ 400 Mb with a residual error rate to achieve the usual overall decryption failure rate of $2^{-128}$. The min-entropy of the binary mask meets the most recent National Institute of Standards and Technology requirements (SP 800-90B), and is comparable to that of approved cryptographic random number generators. Critically, our system can resist two types of adversarial interference through molecular copy-number statistics, providing an additional layer of security reminiscent of Quantum Key Distribution, but without distance limitations. This work establishes DNA as a scalable entropy source for long-distance OTP, enabling high-throughput and secure communications in sensitive contexts. By bridging molecular biology and cryptography, DNA-based key distribution opens a promising new route toward unconditional security in global communication networks.
Authors:Taiwo Onitiju, Iman Vakilinia
Abstract:
Large Language Models increasingly power critical infrastructure from healthcare to finance, yet their vulnerability to adversarial manipulation threatens system integrity and user safety. Despite growing deployment, no comprehensive comparative security assessment exists across major LLM architectures, leaving organizations unable to quantify risk or select appropriately secure LLMs for sensitive applications. This research addresses this gap by establishing a standardized vulnerability assessment framework and developing a multi-layered defensive system to protect against identified threats. We systematically evaluate five widely-deployed LLM families GPT-4, GPT-3.5 Turbo, Claude-3 Haiku, LLaMA-2-70B, and Gemini-2.5-pro against 10,000 adversarial prompts spanning six attack categories. Our assessment reveals critical security disparities, with vulnerability rates ranging from 11.9\% to 29.8\%, demonstrating that LLM capability does not correlate with security robustness. To mitigate these risks, we develop a production-ready defensive framework achieving 83\% average detection accuracy with only 5\% false positives. These results demonstrate that systematic security assessment combined with external defensive measures provides a viable path toward safer LLM deployment in production environments.
Authors:Ioannis Konstantinidis, Ioannis Mavridis, Evangelos K. Markakis
Abstract:
Digital identity is shifting from service- and network-centric approaches toward user-centric ones that promise users increased control over their data. Despite their decentralised design, such approaches often reintroduce centralised components in different forms. This research explores this tension, i.e., the decentralisation paradox, and argues that user-centric architectures tend to redistribute rather than eliminate centralisation. Based on Critical Systems Thinking (CST), digital identity is framed as a "wicked problem" that spans across the technical, legal, social and ethical dimensions. The paper argues that understanding all these interdependencies is essential for designing reliable architectures and ensuring the next generation of digital identity goes beyond superficial decentralisation.
Authors:Viraj Panchal, Tanmay Talsaniya, Parag Patel, Meet Patel
Abstract:
We present KidsNanny, a two-stage multimodal content moderation architecture for child safety. Stage 1 combines a vision transformer (ViT) with an object detector for visual screening (11.7 ms); outputs are routed as text not raw pixels to Stage 2, which applies OCR and a text based 7B language model for contextual reasoning (120 ms total pipeline). We evaluate on the UnsafeBench Sexual category (1,054 images) under two regimes: vision-only, isolating Stage 1, and multimodal, evaluating the full Stage 1+2 pipeline. Stage 1 achieves 80.27% accuracy and 85.39% F1 at 11.7 ms; vision-only baselines range from 59.01% to 77.04% accuracy. The full pipeline achieves 81.40% accuracy and 86.16% F1 at 120 ms, compared to ShieldGemma-2 (64.80% accuracy, 1,136 ms) and LlavaGuard (80.36% accuracy, 4,138 ms). To evaluate text-awareness, we filter two subsets: a text+visual subset (257 images) and a text-only subset (44 images where safety depends primarily on embedded text). On text-only images, KidsNanny achieves 100% recall (25/25 positives; small sample) and 75.76% precision; ShieldGemma-2 achieves 84% recall and 60% precision at 1,136 ms. Results suggest that dedicated OCR-based reasoning may offer recall-precision advantages on text-embedded threats at lower latency, though the small text-only subset limits generalizability. By documenting this architecture and evaluation methodology, we aim to contribute to the broader research effort on efficient multimodal content moderation for child safety.
Authors:Yiming Lei, Qiannan Shen, Junhao Song
Abstract:
Financial fraud detection in transaction networks involves modeling sparse anomalies, dynamic patterns, and severe class imbalance in the presence of temporal drift in the data. In real-world transaction systems, a suspicious transaction is rarely isolated: rather, legitimate and suspicious transactions are often connected through accounts, intermediaries or through temporal transaction sequences. Attribute-based or randomly partitioned learning pipelines are therefore insufficient to detect relationally structured fraud. STC-MixHop, a graph-based framework combining spatial multi-resolution propagation with lightweight temporal consistency modeling for anomaly and fraud detection in dynamic transaction networks. It integrates three components: a MixHop-inspired multi-scale neighborhood diffusion encoder a multi-scale neighborhood diffusion MixHop-based encoder for learning structural patterns; a spatial-temporal attention module coupling current and preceding graph snapshots to stabilize representations; and a temporally informed self-supervised pretraining strategy exploiting unlabeled transaction interactions to improve representation quality. We evaluate the framework primarily on the PaySim dataset under strict chronological splits, supplementing the analysis with Porto Seguro and FEMA data to probe cross-domain component behavior. Results show that STC-MixHop is competitive among graph methods and achieves strong screening-oriented recall under highly imbalanced conditions. The experiments also reveal an important boundary condition: when node attributes are highly informative, tabular baselines remain difficult to outperform. Graph structure contributes most clearly where hidden relational dependencies are operationally important. These findings support a stability-focused view of graph learning for financial fraud detection.
Authors:Gautam Kumar, Ravi Sundaram, Shamik Sural
Abstract:
Over the years, access control systems have become increasingly more complex, often causing a disconnect between what is envisaged by the stakeholders in decision-making positions and the actual permissions granted as evidenced from access logs. For instance, Attribute-based Access Control (ABAC), which is a flexible yet complex model typically configured by system security officers, can be made understandable to others only when presented at a high level in natural language. Although several algorithms have been proposed in the literature for automatic extraction of ABAC rules from access logs, there is no attempt yet to bridge the semantic gap between the machine-enforceable formal logic and human-centric policy intent. Our work addresses this problem by developing a framework that generates human understandable natural language access control policies from logs. We investigate to what extent the power of Large Language Models (LLMs) can be harnessed to achieve both accuracy and scalability in the process. Named LANTERN (LLM-based ABAC Natural Translation and Explanation for Rule Navigation), we have instantiated the framework as a publicly accessible web based application for reproducibility of our results.
Authors:Dectot--Le Monnier de Gouville Esteban, Mohammad Hamdaqa, Moataz Chouchen
Abstract:
YARA has established itself as the de facto standard for "Detection as Code," enabling analysts and DevSecOps practitioners to define signatures for malware identification across the software supply chain. Despite its pervasive use, the open-source YARA ecosystem remains characterized by ad-hoc sharing and opaque quality. Practitioners currently rely on public repositories without empirical evidence regarding the ecosystem's structural characteristics, maintenance and diffusion dynamics, or operational reliability. We conducted a large-scale mixed-method study of 8.4 million rules mined from 1,853 GitHub repositories. Our pipeline integrates repository mining to map supply chain dynamics, static analysis to assess syntactic quality, and dynamic benchmarking against 4,026 malware and 2,000 goodware samples to measure operational effectiveness. We reveal a highly centralized structure where 10 authors drive 80% of rule adoption. The ecosystem functions as a "static supply chain": repositories show a median inactivity of 782 days and a median technical lag of 4.2 years. While static quality scores appear high (mean = 99.4/100), operational benchmarking uncovers significant noise (false positives) and low recall. Furthermore, coverage is heavily biased toward legacy threats (Ransomware), leaving modern initial access vectors (Loaders, Stealers) severely underrepresented. These findings expose a systemic "double penalty": defenders incur high performance overhead for decayed intelligence. We argue that public repositories function as raw data dumps rather than curated feeds, necessitating a paradigm shift from ad-hoc collection to rigorous rule engineering. We release our dataset and pipeline to support future data-driven curation tools.
Authors:Viet K. Nguyen, Nathan Lee, Mohammad Husain
Abstract:
Deep learning-based perception pipelines in autonomous ground vehicles are vulnerable to both adversarial manipulation and network-layer disruption. We present a systematic, on-hardware experimental evaluation of five attack classes: FGSM, PGD, man-in-the-middle (MitM), denial-of-service (DoS), and phantom attacks on low-cost autonomous vehicle platforms (JetRacer and Yahboom). Using a standardized 13-second experimental protocol and comprehensive automated logging, we systematically characterize three dimensions of attack behavior:(i) control deviation, (ii) computational cost, and (iii) runtime responsiveness. Our analysis reveals that distinct attack classes produce consistent and separable "fingerprints" across these dimensions: perception attacks (MitM output manipulation and phantom projection) generate high steering deviation signatures with nominal computational overhead, PGD produces combined steering perturbation and computational load signatures across multiple dimensions, and DoS exhibits frame rate and latency degradation signatures with minimal control-plane perturbation. We demonstrate that our fingerprinting framework generalizes across both digital attacks (adversarial perturbations, network manipulation) and environmental attacks (projected false features), providing a foundation for attack-aware monitoring systems and targeted, signature-based defense mechanisms.
Authors:Dingding Cao, Bianbian Jiao, Jingzong Yang, Yujing Zhong, Wei Yang
Abstract:
The high-frequency issuance and short-cycle speculation of meme tokens in decentralized finance (DeFi) have significantly amplified rug-pull risk. Existing approaches still struggle to provide stable early warning under scarce anomalies, incomplete labels, and limited interpretability. To address this issue, an end-to-end warning framework is proposed for BSC meme tokens, consisting of four stages: dataset construction and labeling, wash-trading pattern feature modeling, risk prediction, and error analysis. Methodologically, 12 token-level behavioral features are constructed based on three wash-trading patterns (Self, Matched, and Circular), unifying transaction-, address-, and flow-level signals into risk vectors. Supervised models are then employed to output warning scores and alert decisions. Under the current setting (7 tokens, 33,242 records), Random Forest outperforms Logistic Regression on core metrics, achieving AUC=0.9098, PR-AUC=0.9185, and F1=0.7429. Ablation results show that trade-level features are the primary performance driver (Delta PR-AUC=-0.1843 when removed), while address-level features provide stable complementary gain (Delta PR-AUC=-0.0573). The model also demonstrates actionable early-warning potential for a subset of samples, with a mean Lead Time (v1) of 3.8133 hours. The error profile (FP=1, FN=8) indicates that the current system is better positioned as a high-precision screener rather than a high-recall automatic alarm engine. The main contributions are threefold: an executable and reproducible rug-pull warning pipeline, empirical validation of multi-granularity wash-trading features under weak supervision, and deployment-oriented evidence through lead-time and error-bound analysis.
Authors:Kartikeya Sharma, Craig Jacobik
Abstract:
In light of rising cybersecurity threats, data center providers face growing pressure to protect their own management infrastructure from Distributed Denial-of-Service (DDoS) attacks. While tenant-managed cages generally fall outside the data center's direct security purview, a successful DDoS assault on core provider systems can indirectly disrupt network services. To address this availability assault, the authors developed a Graph Neural Network (GNN) based detection system which leverages Graph U-Nets to automatically classify and mitigate DDoS traffic. Although the model was developed using open-source network flows rather than proprietary data center logs, the model effectively identifies multi-layer DDoS attacks that resemble the malicious patterns threatening modern data centers. Adopting this system to data center environments requires minimal changes to existing operational workflows and processes. Specifically, the GNN based system can be integrated at critical areas within a data center's network infrastructure. Our model achieved an F1 score of over 95% when evaluated on various open-source datasets, significantly reducing the likelihood of service disruptions and reputational damage. This Graph U-Nets architecture delivers unprecedented precision (98.5%) in complex cloud environments, thereby helping data center operators uphold reliable service availability and increase customer trust and goodwill in an era of increasingly sophisticated cyber threats.
Authors:Arjun Chakraborty, Sandra Ho, Adam Cook, Manuel Meléndez
Abstract:
CTI-REALM (Cyber Threat Real World Evaluation and LLM Benchmarking) is a benchmark designed to evaluate AI agents' ability to interpret cyber threat intelligence (CTI) and develop detection rules. The benchmark provides a realistic environment that replicates the security analyst workflow. This enables agents to examine CTI reports, execute queries, understand schema structures, and construct detection rules. Evaluation involves emulated attacks of varying complexity across Linux systems, cloud platforms, and Azure Kubernetes Service (AKS), with ground truth data for accurate assessment. Agent performance is measured through both final detection results and trajectory-based rewards that capture decision-making effectiveness. This work demonstrates the potential of AI agents to support labor-intensive aspects of detection engineering. Our comprehensive evaluation of 16 frontier models shows that Claude Opus 4.6 (High) achieves the highest overall reward (0.637), followed by Claude Opus 4.5 (0.624) and the GPT-5 family. An ablation study confirms that CTI-specific tools significantly improve agent performance, a variance analysis across repeated runs demonstrates result stability. Finally, a memory augmentation study shows that seeded context can close 33\% of the performance gap between smaller and larger models.
Authors:Subhadip Rana, Sanku Paul, Mrinal Kanti Mandal
Abstract:
In the era of digitization secure transmission of digital images has become essential in real world applications. Image encryption is an effective technique for protecting image data from unauthorized access. The security of encrypted data strongly depends on the quality of the random numbers used as the encryption key. In this paper, we proposed a hybrid random number generator based on quantum fluctuations and an algorithmically inspired rotating wheel. The wheel contains integer values from 0 to 255 that are shuffled using quantum fluctuations generated by time-evolving the quantum kicked rotor model. There are four pre-defined tapping positions in the rotating wheel to collect the number sequences. The wheel rotation speed is dynamically varied after each set of tapping to enhance unpredictability. The entropy of the number sequence obtained from the rotating wheel attains the ideal value of 8 (in an 8 bit representation). Further, the generated number sequences exhibit a flat histogram and nearly zero correlation, indicating strong randomness. The generated sequences are applied to the image encryption and analyzed cryptographically. Experimental results demonstrate a near ideal entropy of 7.997, an NPCR of 99.60%, low correlation in all directions, and low PSNR for encrypted images. These results confirm that the proposed random number generator achieves efficient and high-security performance, making it suitable for the security of consumer applications such as mobile healthcare imaging, biometric authentication, QR-based and multimedia communication on smart devices.
Authors:Avinash Laddha, Danil Mikhailov, Uyi Stewart
Abstract:
We present a technical case study on the Privacy-Enhancing Technologies (PETs) for Public Health Challenge, a collaborative effort to safely leverage sensitive private sector data for social impact, specifically pandemic management. The project utilized Differential Privacy (DP) to create realistic, privacy-preserved synthetic financial transaction data, which was then combined with public health and mobility datasets. This approach successfully addressed the critical hurdle of sharing sensitive financial information for research and policy. The analysis demonstrated that this synthetic, DP-protected data possesses significant spatial-temporal and predictive power for public health. Key outcomes include the development of six reusable tools and frameworks supporting diagnostic nowcasting (e.g., Hotspot Detection, Pandemic Adherence Monitoring) and predictive forecasting (e.g., Mobility Analysis, Contact Matrix Estimation) for epidemiological decision-making. The study provides best practices for advancing data sharing in a privacy-compliant manner.
Authors:Darren Cheng, Wen-Kwang Tsao
Abstract:
Prompt injection remains one of the most practical attack vectors against LLM-integrated applications. We replicate the Microsoft LLMail-Inject benchmark (Greshake et al., 2024) against current generation models running inside OpenClaw, an open source multitool agent platform. Our proposed defense combines two mechanisms: agent isolation, implemented as a privilege separated two-agent pipeline with tool partitioning, and JSON formatting, which produces structured output that strips persuasive framing before the action agent processes it. We run four experiments on the same 649 attacks that succeeded against our single-agent baseline. The full pipeline achieves 0 percent attack success rate (ASR) on the evaluated benchmark. Agent isolation alone achieves 0.31 percent ASR, approximately 323 times lower than the baseline. JSON formatting alone achieves 14.18 percent ASR, about 7.1 times lower. Our ablation study confirms that agent isolation is the dominant mechanism. JSON formatting provides additional hardening but is not sufficient on its own. The defense is structural: the action agent never receives raw injection content regardless of model behavior on any individual input.
Authors:Rodrigo Tertulino, Laércio Alencar
Abstract:
Accurate cardiovascular risk prediction is crucial for preventive healthcare; however, the development of robust Artificial Intelligence (AI) models is hindered by the fragmentation of clinical data across institutions due to stringent privacy regulations. This paper presents a comprehensive architectural case study validating the engineering robustness of FedCVR, a privacy-preserving Federated Learning framework applied to heterogeneous clinical networks. Rather than proposing a new theoretical optimizer, this work focuses on a systems engineering analysis to quantify the operational trade-offs of server-side adaptive optimization under utility-prioritized Differential Privacy (DP). By conducting a rigorous stress test in a high-fidelity synthetic environment calibrated against real-world datasets (Framingham, Cleveland), we systematically evaluate the system's resilience to statistical noise. The validation results demonstrate that integrating server-side momentum as a temporal denoiser allows the architecture to achieve a stable F1-score of 0.84 and an Area Under the Curve (AUC) of 0.96, statistically outperforming standard stateless baselines. Our findings confirm that server-side adaptivity is a structural prerequisite for recovering clinical utility under realistic privacy budgets, providing a validated engineering blueprint for secure multi-institutional collaboration.
Authors:Nasim Abdirahman Ismail, Enis Karaarslan
Abstract:
High-frequency banking environments face a critical trade-off between low-latency fraud detection and the regulatory explainability demanded by GDPR. Traditional rule-based and discriminative models struggle with "zero-day" attacks due to extreme class imbalance and the lack of historical precedents. This paper proposes a Dual-Path Generative Framework that decouples real-time anomaly detection from offline adversarial training. The architecture employs a Variational Autoencoder (VAE) to establish a legitimate transaction manifold based on reconstruction error, ensuring <50ms inference latency. In parallel, an asynchronous Wasserstein GAN with Gradient Penalty (WGAN-GP) synthesizes high-entropy fraudulent scenarios to stress-test the detection boundaries. Crucially, to address the non-differentiability of discrete banking data (e.g., Merchant Category Codes), we integrate a Gumbel-Softmax estimator. Furthermore, we introduce a trigger-based explainability mechanism where SHAP (Shapley Additive Explanations) is activated only for high-uncertainty transactions, reconciling the computational cost of XAI with real-time throughput requirements.
Authors:Matthew Butler, Yi Fan, Christos Faloutsos
Abstract:
The proposed method (FraudFox) provides solutions to adversarial attacks in a resource constrained environment. We focus on questions like the following: How suspicious is `Smith', trying to buy \$500 shoes, on Monday 3am? How to merge the risk scores, from a handful of risk-assessment modules (`oracles') in an adversarial environment? More importantly, given historical data (orders, prices, and what-happened afterwards), and business goals/restrictions, which transactions, like the `Smith' transaction above, which ones should we `pass', versus send to human investigators? The business restrictions could be: `at most $x$ investigations are feasible', or `at most \$$y$ lost due to fraud'. These are the two research problems we focus on, in this work. One approach to address the first problem (`oracle-weighting'), is by using Extended Kalman Filters with dynamic importance weights, to automatically and continuously update our weights for each 'oracle'. For the second problem, we show how to derive an optimal decision surface, and how to compute the Pareto optimal set, to allow what-if questions. An important consideration is adaptation: Fraudsters will change their behavior, according to our past decisions; thus, we need to adapt accordingly. The resulting system, \method, is scalable, adaptable to changing fraudster behavior, effective, and already in \textbf{production} at Amazon. FraudFox augments a fraud prevention sub-system and has led to significant performance gains.
Authors:Nurullah Demir, Yash Vekaria, Georgios Smaragdakis, Zakir Durumeric
Abstract:
Application programming interfaces (APIs) have become a central part of the modern IT environment, allowing developers to enrich the functionality of applications and interact with third parties such as cloud and payment providers. This interaction often occurs through authentication mechanisms that rely on sensitive credentials such as API keys and tokens that require secure handling. Exposure of these credentials can pose significant consequences to organizations, as malicious attackers can gain access to related services. Previous studies have shown exposure of these sensitive credentials in different environments such as cloud platforms and GitHub. However, the web remains unexplored. In this paper, we study exposure of credentials on the web by analyzing 10M webpages. Our findings reveal that API credentials are widely and publicly exposed on the web, including highly popular and critical webpages such as those of global banks and firmware developers. We identify 1,748 distinct credentials from 14 service providers (e.g., cloud and payment providers) across nearly 10,000 webpages. Moreover, our analysis of archived data suggest credentials to remain exposed for periods ranging from a month to several years. We characterize web-specific exposure vectors and root causes, finding that most originate from JavaScript environments. We also discuss the outcomes of our responsible disclosure efforts that demonstrated a substantial reduction in credential exposure on the web.
Authors:Alicia Pang, Katsiaryna Labunets, Olga Gadyatskaya
Abstract:
Applications like Enterprise Resource Planning (ERP) systems have become an indispensable part of the corporate digital infrastructure. These systems store sensitive data about customers, suppliers, and employees, and thus companies have to process these data in accordance with applicable regulations like the GDPR (the EU General Data Protection Regulation). This can be challenging due to a variety of reasons. For example, prior research has shown that developers sometimes lack knowledge about privacy. In this work, we focus on privacy in ERP systems in the context of an international consultancy firm. We investigate the privacy awareness regarding privacy-by-design and data minimization of two important populations: developers of ERP systems and managers and consultants responsible for services related to ERP systems. Applying thematic analysis, we elicit privacy behavioral models of these two populations using Fogg's Behavioral Model (FBM) framework. Our findings provide a means to stimulate more adequate privacy-related behaviors for developers and consultants.
Authors:Antoine Mallet, Patrick Bas
Abstract:
This paper investigates the detectability of popular imagein-image steganography schemes [1, 2, 3, 4, 5]. In this paradigm, the payload is usually an image of the same size as the Cover image, leading to very high embedding rates. We first show that the embedding yields a mixing process that is easily identifiable by independent component analysis. We then propose a simple, interpretable steganalysis method based on the first four moments of the independent components estimated from the wavelet decomposition of the images, which are used to distinguish between the distributions of Cover and Stego components. Experimental results demonstrate the efficiency of the proposed method, with eight-dimensional input vectors attaining up to 84.6% accuracy. This vulnerability analysis is supported by two other facts: the use of keyless extraction networks and the high detectability w.r.t. classical steganalysis methods, such as the SRM combined with support vector machines, which attains over 99% accuracy.
Authors:James Bartusek, Eli Goldin
Abstract:
We construct unclonable encryption (UE) in the Haar random oracle model, where all parties have query access to $U,U^\dagger,U^*,U^T$ for a Haar random unitary $U$. Our scheme satisfies the standard notion of unclonable indistinguishability security, supports reuse of the secret key, and can encrypt arbitrary-length messages. That is, we give the first evidence that (reusable) UE, which requires computational assumptions, exists in "micocrypt", a world where one-way functions may not exist. As one of our central technical contributions, we build on the recently introduced path recording framework to prove a natural ``unitary reprogramming lemma'', which may be of independent interest.
Authors:Xiangwen Wang, Ananth Balashankar, Varun Chandrasekaran
Abstract:
Large language models remain vulnerable to jailbreak attacks, yet we still lack a systematic understanding of how jailbreak success scales with attacker effort across methods, model families, and harm types. We initiate a scaling-law framework for jailbreaks by treating each attack as a compute-bounded optimization procedure and measuring progress on a shared FLOPs axis. Our systematic evaluation spans four representative jailbreak paradigms, covering optimization-based attacks, self-refinement prompting, sampling-based selection, and genetic optimization, across multiple model families and scales on a diverse set of harmful goals. We investigate scaling laws that relate attacker budget to attack success score by fitting a simple saturating exponential function to FLOPs--success trajectories, and we derive comparable efficiency summaries from the fitted curves. Empirically, prompting-based paradigms tend to be the most compute-efficient compared to optimization-based methods. To explain this gap, we cast prompt-based updates into an optimization view and show via a same-state comparison that prompt-based attacks more effectively optimize in prompt space. We also show that attacks occupy distinct success--stealthiness operating points with prompting-based methods occupying the high-success, high-stealth region. Finally, we find that vulnerability is strongly goal-dependent: harms involving misinformation are typically easier to elicit than other non-misinformation harms.
Authors:S. -L. Ng, M. B. Paterson, E. A. Quaglia
Abstract:
In a content delivery network (CDN), resources are strained during peak-time and underutilised in off-peak times when supplying digital content to users. Caching can help balance this. At the off-peak time some content is delivered to users' local caches. During peak time, the use of cached data to serve users' requests relieves strain on the network by reducing repeated transfer of popular content. In \emph{coded caching}, the cache content placement is designed in conjunction with the delivery techniques to optimise network throughput. Since dissemination of information, as well as the delivery of entertainment, is reliant on CDNs, the security and privacy of cache placement, user demand, and content delivery, are paramount. In much of the literature in \emph{secure coded caching}, security is built on top of solutions that have efficiency in mind, and most current proposals focus on the security of individual parts of the process. A lack of a unifying network model also makes it difficult to compare or combine solutions. In this survey we analyse the security and privacy requirements of secure coded caching, and evaluate existing schemes in terms of the security provided and the cost of this security provision. We also review the techniques used to achieve secure coded caching and analyse their limitations. In addition, we contextualise secure coded caching in the landscape of other secure content delivery primitives. As a result, we identify and prioritise open security and privacy challenges for the future.
Authors:Sanghyeon Park, DeukJae Cho, Pyo-Woong Son
Abstract:
Global Navigation Satellite System (GNSS) spoofing and jamming threaten maritime navigation by corrupting positions from Automatic Identification System (AIS) transponders. Crucially, raw AIS messages contain communication-layer defects (duplicated MMSIs, timestamp errors, stale retransmissions, and multi-station rebroadcast delays) that can mimic spoofing or jamming. Thus, AIS positions are unreliable without pre-filtering. We propose a three-stage AIS-based framework that (1) uses rule-based diagnostics to discard communication faults, (2) applies an interacting multiple model filter and transmission-interval analysis to extract kinematic-consistency and continuity anomalies, and (3) applies spatiotemporal DBSCAN to group anomalies by multi-vessel coherence and temporal persistence and classify them as sensor faults, spoofing, or jamming. Tested on approximately 966 million AIS messages from Korean coastal waters, the framework detected 17 spoofing and 343 jamming clusters and reduced false alarms by 98.6% relative to naive clustering. These results show that, after rigorous pre-filtering, AIS data can enable wide-area GNSS interference detection without dedicated sensors.
Authors:Wenting Song, K. Suzanne Barber
Abstract:
Online social networks facilitate user engagement and information sharing but are also rife with misinformation and deception. Research on trust modeling in online social networks focuses on developing computational models or algorithms to measure trust relationships, assess the reliability of shared content, and detect spam or malicious activities. However, most existing review papers either briefly mention the concept of trust or focus on a single category of trust models. In this paper, we offer a comprehensive categorization and review of state-of-the-art trust models developed for online social networks. First, we explore theories and models related to trust in psychology and identify several factors that influence the formation and evolution of online trust. Next, state-of-the-art trust models are categorized based on their algorithmic foundations. For each category, the modeling mechanisms are investigated, and their unique contributions to quantitative trust modeling are highlighted. Subsequently, we provide an implementation-centric trust modeling handbook, which summarizes available datasets, trust-related features, promising modeling techniques, and feasible application scenarios. Finally, the findings of the literature review are summarized, and unresolved challenges are discussed.
Authors:David Gómez-Cambronero, Daniel Munteanu, Ana Isabel González-Tablas
Abstract:
In this paper, we present a laboratory study focused on the impact of post-quantum cryptography (PQC) algorithms on multiple layers of stateful HTTP over TLS transactions: the TCP handshake, the intermediate TCP-TLS layer, the TLS handshake, the intermediate TLS layer, and the HTTP application layer. To this end, we propose a laboratory architecture that emulates a real-world setup in which a load test of up to 100 transactions per second is sent to a load balancer, which in turn forwards them to a backend server that returns the responses. Each set of tests is executed using the TLS 1.3 key exchange groups as follows: traditional (or non-PQC), hybrid PQC and pure PQC. Each set of tests also varied the backend response size. Across more than thirty experiments, we performed data reduction and statistical analysis for each layer, to determine the specific impact of each algorithm (PQC and traditional) at every stage of the HTTP-over-TLS transaction.
Authors:Manuel Wiesinger, Daniel Dorfmeister, Stefan Brunthaler
Abstract:
Vulnerabilities emanating from DRAM errors pose a vexing problem that remains, as of yet, unsolved and elusive but cannot be ignored. Prior defenses focused on specific details of early RowHammer attacks and fail to generalize with the generalizations of recent RowHammer attacks. Even worse, it is presently not clear that techniques from prior defenses will be able to cope with these generalizations or if an entirely new approach is required. Although still work-in-progress, we have identified a new approach that combines memory allocation with principles underlying software diversity and shows promising early results. At first glance, software diversity seems to be an unlikely contender, since it faces seemingly insurmountable obstacles, primarily the lack of sufficient entropy in memory subsystems. Our system - called MAD, short for memory allocation diversity - leverages two novel, complementary spatial diversification techniques to overcome this entropy obstacle. Entropy aside, MAD offers ease-of-implementation, negligible performance impact, and is both hardware and software agnostic. From a security perspective, MAD's goal is to deter RowHammer attacks by delaying them to the maximum extent possible. Such a delay opens the door for a variety of additional responses, e.g., proactive rebooting, or complementary in-depth analysis of ongoing attacks that would be too slow for an always-on defense.
Authors:Jesse Yu, Nicholas Wei
Abstract:
As open-weights generative AI rapidly proliferates, the ability to synthesize hyper-realistic media has introduced profound challenges to digital trust. Automated disinformation and AI-generated imagery have made robust digital provenance a critical cybersecurity imperative. Currently, state-of-the-art invisible watermarks operate within one of two primary mathematical manifolds: the spatial domain (post-generation pixel embedding) or the latent domain (pre-generation frequency embedding). While existing literature frequently evaluates these models against isolated, classical distortions, there is a critical lack of rigorous, comparative benchmarking against modern generative AI editing tools. In this study, we empirically evaluate two leading representative paradigms, RivaGAN (Spatial) and Tree-Ring (Latent), utilizing an automated Attack Simulation Engine across 30 intensity intervals of geometric and generative perturbations. We formalize an "Adversarial Evasion Region" (AER) framework to measure cryptographic degradation against semantic visual retention (OpenCLIP > 75.0). Our statistical analysis ($n=100$ per interval, $MOE = \pm 3.92\%$) reveals that these domains possess mutually exclusive, mathematically orthogonal vulnerabilities. Spatial watermarks experience severe cryptographic degradation under algorithmic pixel-rewriting (exhibiting a 67.47% AER evasion rate under Img2Img translation), whereas latent watermarks exhibit profound fragility against geometric misalignment (yielding a 43.20% AER evasion rate under static cropping). By proving that single-domain watermarking is fundamentally insufficient against modern adversarial toolsets, this research exposes a systemic vulnerability in current digital provenance standards and establishes the foundational exigence for future multi-domain cryptographic architectures.
Authors:Shriti Priya, Julian James Stephen, Arjun Natarajan
Abstract:
Enterprises and organizations today increasingly deploy in-house, cloud based applications and APIs for internal operations or external customers. These deployments deal with increasing number of threats, despite security features offered by cloud service providers. This work focus on threats that exploit application layer vulnerabilities of cloud workloads. Prevention and mitigation measures against such threats need to be cognizant of application semantics, posing a hurdle to existing solutions. In this work, we design and implement a security framework that allow cloud workload administrators to easily define and enforce policies capable of preventing (i) unrestricted resource consumption, (ii) unrestricted access to sensitive business flows, and (iii) broken authentication. Our framework, Paladin, leverages large language models to extract sufficient semantic meaning from API requests to provide cloud administrators with an application agnostic policy definition interface. Once defined, requests are automatically matched with relevant policies and enforced by high performance proxies. Evaluations with our prototype show that such a framework has broad applicability across applications, good policy identification accuracy, and reasonable overheads, making it substantially easier to define and enforce cross application policies.
Authors:Nikitha M. Palaniappan, Ying He
Abstract:
Considering the rise of cyberattacks incidents worldwide, the need to ensure stronger passwords is necessary. Developing a password strength meter (PSM) can help users create stronger passwords when creating an account on an online platform. This research aimed to explore whether incorporating a non-English training dataset (specifically Indian) can improve the performance of a PSM. Findings show that PSMs can be improved by utilising learning of words from other languages. Another contribution of the research was to compare and provide an analysis of AI generated data (specifically by ChatGPT) and PassGAN (existing state-of-the-art model), proving that PassGAN-like tools may no longer be needed as the performance is higher using AI generated data. To further strengthen detection, a Jaro similarity-based matching mechanism was incorporated, enabling the classification of passwords that are highly similar to known weak passwords - this addresses limitations of direct matching techniques used in prior work. A final novel contribution is on developing a PSM tailored for Indian passwords, which has not been developed previously - this resulted in a near-perfect matching accuracy using a Jaro function value of 0.5. Although performance improvements were constrained by limited data and training, results suggest that using the ChatGPT dataset is a viable and effective strategy for developing secure, language-aware password strength meters.
Authors:Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Serhii Hovorov, Sofiia Pidturkina
Abstract:
OpenClaw-style agent stacks turn language into privileged execution: LLM intents flow through tool interception, policy gates, and a local executor. In parallel, skill marketplaces such as skills.sh make capability acquisition as easy as installing skills and CLIs, creating a growing capability supply chain. Together, these trends shift the dominant safety failure mode from "wrong answers" to execution-induced loss, where untrusted prompts, compromised skills, or narrative manipulation can trigger real trades and irreversible side effects. We propose Survivability-Aware Execution (SAE), an execution-layer survivability standard for OpenClaw-style systems and skill-enabled agents. SAE sits as middleware between a strategy engine (LLM or non-LLM) and the exchange executor. It defines an explicit execution contract (ExecutionRequest, ExecutionContext, ExecutionDecision) and enforces non-bypassable last-mile invariants: projection-based exposure budgets, cooldown and order-rate limits, slippage bounds, staged execution, and tool/venue allowlists. To make delegated execution testable under supply-chain risk, we operationalize the Delegation Gap (DG) via a logged Intended Policy Spec that enables deterministic out-of-scope labeling and reproducible DG metrics. On an offline replay using official Binance USD-M BTCUSDT/ETHUSDT perpetual data (15m; 2025-09-01--2025-12-01, incl. funding), SAE improves survivability: MDD drops from 0.4643 to 0.0319 (Full; 93.1%), |CVaR_0.99| shrinks from 4.025e-3 to ~1.02e-4 (~97.5%), and DG loss proxy falls from 0.647 to 0.019 (~97.0%). AttackSuccess decreases from 1.00 to 0.728 with zero FalseBlock in this run. Block bootstrap, paired Wilcoxon, and two-proportion tests confirm the shifts. SAE reframes agentic trading safety for the OpenClaw+skills era: treat upstream intent and skills as untrusted, and enforce survivability where actions become side effects.
Authors:Edibe Yilmaz, Kahraman Kostas
Abstract:
The integration of large language models (LLMs) into educational processes introduces significant constraints regarding data privacy and reliability, particularly in pedagogically vulnerable contexts such as Turkish heritage language education. This study aims to systematically evaluate the robustness and pedagogical safety of locally deployable offline LLMs within the context of Turkish heritage language education. To this end, a Turkish Anomaly Suite (TAS) consisting of 10 original edge-case scenarios was developed to assess the models' capacities for epistemic resistance, logical consistency, and pedagogical safety. Experiments conducted on 14 different models ranging from 270M to 32B parameters reveal that anomaly resistance is not solely dependent on model scale and that sycophancy bias can pose pedagogical risks even in large-scale models. The findings indicate that reasoning-oriented models in the 8B--14B parameter range represent the most balanced segment in terms of cost-safety trade-off for language learners.
Authors:Giacomo Borin, Luca De Feo, Guido Maria Lido, Sina Schaeffler
Abstract:
We explore the use of level structures to generalize the SQIsign signature scheme. We give a general framework where, given the public key and the commitment, the challenge is to exhibit an isogeny between them with an additional requirement, namely to map a chosen level structure to another. We then instantiate the framework using 1-dimensional and 2-dimensional isogenies. In doing that we provide a new explicit Deuring correspondence for supersingular elliptic curves with level structures and solve new constrained norm equations.
Authors:Willie Kouam, Stefan Rass
Abstract:
The rapid expansion of Internet use has increased system exposure to cyber threats, with advanced persistent threats (APTs) being especially challenging due to their stealth, prolonged duration, and multi-stage attacks targeting high-value assets. In this study, we model APT evolution as a strategic interaction between an attacker and a defender on an attack graph. With limited information about the attacker's position and progress, the defender acts at random intervals by deploying intrusion detection sensors across the network. Once a compromise is detected, affected components are immediately secured through measures such as backdoor removal, patching, or system reconfiguration. Meanwhile, the attacker begins with reconnaissance and then proceeds through the network, exploiting vulnerabilities and installing backdoors to maintain persistent access and adaptive movement. Furthermore, the attacker may take several steps between consecutive defensive operations, resulting in an asymmetric temporal dynamic. The defender's goal is to reduce the likelihood that the attacker will gain access to a critical asset, whereas the attacker's purpose is to increase this likelihood. We investigate this interaction under three informational regimes, reflecting varying levels of attacker knowledge prior to action: (i) a Stackelberg scenario, in which the attacker has full knowledge of the defender's strategy and can optimize accordingly; (ii) a blind regime, where the attacker has no information and assumes uniform beliefs about defensive deployments; and (iii) a belief-based framework, where the attacker holds accurate probabilistic beliefs about the defender's actions. For each regime, we derive optimal defensive strategies by solving the corresponding optimization problems.
Authors:Jia Hu, Youcheng Sun, Pierre Olivier
Abstract:
Software compartmentalization breaks down an application into compartments isolated from each other: an attacker taking over a compartment will be confined to it, limiting the damage they can cause to the rest of the application. Despite the security promises of this approach, recent studies have shown that most existing compartmentalized software is plagued by vulnerabilities at cross-compartment interfaces, allowing an attacker taking over a compartment to escape its confinement and negate the security guarantees expected from compartmentalization. In that context, securing cross-compartment interfaces is notoriously difficult and engineering-intensive. In light of recent advances in Automated Program Repair (APR), notably through the use of Large Language Models (LLMs), this paper presents a work in progress investigating the suitability of LLM-based APR at securing cross-compartment interfaces as automatically as possible. We observe that existing APR approaches and general purpose/code-centric LLMs used as is are unfit for this task, and present the design, implementation, and early results of a new APR framework dedicated to compartment interface safety. The framework integrates into a feedback loop 1) a specialized fuzzer uncovering cross-compartment interface vulnerabilities; 2) a patch generation component bridging the lack of compartmentalization awareness of existing LLMs with a series of analysis techniques; and 3) a patch validation component assessing the effectiveness of generated vulnerability fixes. We validate our framework over a sample interface vulnerability, comparing it to a naive use of general-purpose LLMs, and discuss future research avenues.
Authors:Peaker Guo, Rayne Holland, Hao Wu
Abstract:
Given a dataset of $n$ user-contributed strings, each of length at most $\ell$, a key problem is how to identify all frequent substrings while preserving each user's privacy. Recent work by Bernardini et al. (PODS'25) introduced a $\varepsilon$-differentially private algorithm achieving near-optimal error, but at the prohibitive cost of $O(n^2\ell^4)$ space and processing time. In this work, we present a new $\varepsilon$-differentially private algorithm that retains the same near-optimal error guarantees while reducing space complexity to $O(n \ell+ |Σ| )$ and time complexity to $O(n \ell\log |Σ| + |Σ| )$, for input alphabet $Σ$. Our approach builds on a top-down exploration of candidate substrings but introduces two new innovations: (i) a refined candidate-generation strategy that leverages the structural properties of frequent prefixes and suffixes, and (ii) pruning of the search space guided by frequency relations. These techniques eliminate the quadratic blow-ups inherent in prior work, enabling scalable frequent substring mining under differential privacy.
Authors:George Edwards, Mahdi Eslamimehr
Abstract:
The weaponization of LLMs for automated malware generation poses an existential threat to conventional detection paradigms. AI-generated malware exhibits polymorphic, metamorphic, and context-aware evasion capabilities that render signature-based and shallow heuristic defenses obsolete. This paper introduces a novel hybrid analysis framework that synergistically combines \emph{concolic execution} with \emph{LLM-augmented path prioritization} and \emph{deep-learning-based vulnerability classification} to detect zero-day AI-generated malware with provable guarantees. We formalize the detection problem within a first-order temporal logic over program execution traces, define a lattice-theoretic abstraction for path constraint spaces, and prove both the \emph{soundness} and \emph{relative completeness} of our detection algorithm, assuming classifier correctness. The framework introduces three novel algorithms: (i) an LLM-guided concolic exploration strategy that reduces the average number of explored paths by 73.2\% compared to depth-first search while maintaining equivalent malicious-path coverage; (ii) a transformer-based path-constraint classifier trained on symbolic execution traces; and (iii) a feedback loop that iteratively refines the LLM's prioritization policy using reinforcement learning from detection outcomes. We provide a comprehensive implementation built upon \texttt{angr} 9.2, \texttt{Z3} 4.12, Hugging Face Transformers 4.38, and PyTorch 2.2, with configuration details enabling reproducibility. Experimental evaluation on the EMBER, Malimg, SOREL-20M, and a novel AI-Gen-Malware benchmark comprising 2{,}500 LLM-synthesized samples demonstrates that achieves 98.7\% accuracy on conventional malware and 97.5\% accuracy on AI-generated threats, outperforming ClamAV, YARA, MalConv, and EMBER-GBDT baselines by margins of 8.4--52.2 percentage points on AI-generated samples.
Authors:Tam Nguyen, Moses Ndebugre, Dheeraj Arremsetty
Abstract:
Multi-agent artificial intelligence systems or MAS are systems of autonomous agents that exercise delegated tool authority, share persistent memory, and coordinate via inter-agent communication. MAS introduces qualitatively distinct security vulnerabilities from those documented for singular AI models. Existing security and governance frameworks were not designed for these emerging attack surfaces. This study systematically characterizes the threat landscape of MAS and quantitatively evaluates 16 security frameworks for AI against it. A four-phase methodology is proposed: constructing a deep technical knowledge base of production multi-agent architectures; conducting generative AI-assisted threat modeling scoped to MAS cybersecurity risks and validated by domain experts; structuring survey plans at individual-threat granularity; and scoring each framework on a three-point scale against the cybersecurity risks. The risks were organized into 193 distinct main threat items across nine risk categories. The expected minimal average score is 2. No reviewed framework achieves majority coverage of any single category. Non-Determinism (mean score 1.231 across all 16 frameworks) and Data Leakage (1.340) are the most under-addressed domains. The OWASP Agentic Security Initiative leads overall at 65.3\% coverage and in the design phase; the CDAO Generative AI Responsible AI Toolkit leads in development and operational coverage. These results provide the first empirical cross-framework comparison for MAS security and offer evidence-based guidance for framework selection.
Authors:Carolin Brunn, Florian Tschorsch
Abstract:
Analyzing large volumes of sensor network data, such as electricity consumption measurements from smart meters, is essential for modern applications but raises significant privacy concerns. Privacy-enhancing technologies like z-anonymity offer efficient anonymization for continuous data streams by suppressing rare values that could lead to re-identification, making it particularly suited for resource-constrained environments. Originally designed for centralized architectures, z-anonymity assumes a trusted central entity. In this paper, we introduce deZent, a decentralized implementation of z-anonymity that minimizes trust in the central entity by realizing local z-anonymity with lightweight coordination. We develop deZent using a stochastic counting structure and secure sum to coordinate private anonymization across the network. Our results show that deZent achieves comparable performance to centralized z-anonymity in terms of publication ratio, while reducing the communication overhead towards the central entity. Thus, deZent presents a promising approach for enhancing privacy in sensor networks while preserving system efficiency.
Authors:Ali Fattahdizaji, Mohammad Pishdar, Zarina Shukur
Abstract:
Smart contracts are fundamental components of blockchain ecosystems; however, their security remains a critical concern due to inherent vulnerabilities. While existing detection methodologies are predominantly syntax-oriented, targeting reentrancy and arithmetic errors, they often overlook logical flaws arising from defective business logic. This paper introduces SmartGraphical, a novel security framework specifically engineered to identify logical attack surfaces. By synthesizing automated static analysis with an interactive graphical representation of contract architectures, SmartGraphical facilitates a comprehensive inspection of a contract's functional control flow. To mitigate the context-dependent nature of logical bugs, the tool adopts a human-in-the-loop approach, empowering developers to interpret heuristic warnings within a visualized structural context. The efficacy of SmartGraphical was validated through a rigorous empirical evaluation involving a large dataset of real-world contracts and a large-scale user study with 100 developers of varying expertise. Furthermore, the framework's performance was demonstrated through case studies on high-profile exploits, such as the SYFI rebase failure and farming protocol flash swap attacks, proving that SmartGraphical identifies intricate vulnerabilities that elude state-of-the-art automated detectors. Our findings indicate that this hybrid methodology significantly enhances the interpretability and detection rate of non-trivial logical security threats in smart contracts.
Authors:Md Mojibur Rahman Redoy Akanda, Ahmed Tanvir Mahdad, Nitesh Saxena
Abstract:
In today's technology-driven world, web services have opened up new opportunities for blind and visually impaired people to interact independently. Securing interactions with these services is crucial; however, currently deployed authentication mainly concentrate on sighted users, overlooking the needs of the blind and visually impaired community. In this paper, we address this gap by investigating the security and accessibility aspects of these authentication when adopted by blind and visually impaired users. We model web authentication for such users as screen reader assisted authentication and introduce an evaluation framework called AWARE. Using AWARE, we then systematically assessed popular PC and smartphone-based screen readers against different authentication methods, including variants of 2FA and passwordless schemes, to simulate real-world scenarios. We analyzed these screen reader assisted authentication interactions with authentication methods in three settings: using a terminal (PC) with screen readers, a combination of the terminal (PC) and smartphone with screen readers, and smartphones with integrated screen readers. The results of our study underscore weaknesses in all of our observed screen reader assisted scenarios for real-life authentication methods. These weaknesses, encompassing specific accessibility issues caused by imprecise screen reader instructions, highlight vulnerability concerning observed scenarios for both real-world and research literature based attacks, including phishing, concurrency, fatigue, cross-service, and shoulder surfing. Broadly, our AWARE framework can be used by designers as a precursor to user studies which are typically time-consuming and tedious to perform, independently allowing to unfold security and accessibility problems early which designers can address prior to full-fledged user testing of more isolated issues.
Authors:Shayeef Murshid, Ramprasad Sarkar, Mriganka Mandal
Abstract:
Certified deletion ensures that encrypted data can be irreversibly deleted, preventing future recovery even if decryption keys are later exposed. Although existing works have achieved certified deletion across various cryptographic primitives, they rely on central authorities, leading to inherent escrow vulnerabilities. This raises the question of whether certified deletion can be achieved in decentralized frameworks such as Registered Attribute-Based Encryption (RABE) that combines fine-grained access control with user-controlled key registration. This paper presents the first RABE schemes supporting certified deletion and certified everlasting security. Specifically, we obtain the following: - We first design a privately verifiable RABE with Certified Deletion (RABE-CD) scheme by combining our newly proposed shadow registered ABE (Shad-RABE) with one-time symmetric key encryption with certified deletion. - We then construct a publicly verifiable RABE-CD scheme using Shad-RABE, witness encryption, and one-shot signatures, allowing any party to validate deletion certificates without accessing secret keys. - We also extend to privately verifiable RABE with Certified Everlasting Deletion (RABE-CED) scheme, integrating quantum-secure RABE with the certified everlasting lemma. Once a certificate is produced, message privacy becomes information-theoretic even against unbounded adversaries. -We finally realize a publicly verifiable RABE-CED scheme by employing digital signatures for the BB84 states, allowing universal verification while ensuring that deletion irreversibly destroys information relevant to decryption.
Authors:Sky Pelletier Waterpeace, Nikolay Ivanov
Abstract:
Transaction processing systems underpin modern commerce, finance, and critical infrastructure, yet their security has never been studied across the full evolutionary arc of these systems. Over five decades, transaction processing has progressed through four distinct generations, from centralized databases, to distributed databases, to blockchain and distributed ledger technologies (DLTs), finally to multi-context systems that span cyber-physical components under real-time constraints. Each generation has introduced new transaction types and new classes of vulnerabilities, yet security research remains fragmented by domain, and the foundational ACID transaction model has not been revisited to reflect the demands of contemporary systems. We classify 163 papers on transaction security by evolutionary generation, security focus, and relevant Common Weakness Enumeration (CWE) entries, and distill a curated set of 41 high-impact or seminal papers spanning all four generations. We make three principal contributions. First, we develop a four-generation evolutionary taxonomy that contextualizes each work within the broader trajectory of transaction processing. Second, we map each paper's security focus to CWE identifiers, providing a systems-oriented vocabulary for analyzing transaction-specific threats across otherwise siloed domains. Third, we demonstrate that the classical ACID properties are insufficient for modern transactional systems and introduce RANCID, extending ACID with Real-timeness (R) and N-many Contexts (N), as a property set for reasoning about the security and correctness of systems that must coordinate across heterogeneous contexts under timing constraints. Our systematization exposes a pronounced bias toward DLT security research at the expense of broader transactional security and identifies concrete open problems for the next generation of transaction processing systems.
Authors:Konstantinos A. Draziotis, Myrto Eleftheria Gkogkou
Abstract:
In the present paper we study a non-modular variant of the Short Integer Solution problem over the integers. Given a random matrix $A \in \mathbb{Z}^{n\times m}$ with entries $a_{ij}$ such that $0\le a_{ij}< Q,$ for some $Q>0,$ the goal is to find a nonzero vector ${\bf x}\in\mathbb{Z}^m$ such that $A{\bf x}={\bf 0}$ and $\|{\bf x}\|_\infty \le β,$ for a given bound $β.$ We show that an algorithm that solves random instances of this problem with non-negligible probability yields a polynomial-time algorithm for approximating $\mathrm{SIVP}$ within a factor $\widetilde{O}(n^{3/2})$ (with $\ell_2$ norm) in the worst case for any $n-$dimensional integer lattice.
Authors:Viraaji Mothukuri, Reza M. Parizi
Abstract:
This review examines how quantum computing and artificial intelligence challenge current cryptographic systems. We analyze the literature to assess the resilience of algorithms against quantum attacks (Shor's and Grover's algorithms) and AI-enhanced cryptanalysis. RSA and elliptic curve cryptography are at risk of compromise from quantum computers. Symmetric algorithms like AES-128 retain security, but with a reduced effective key length under quantum attacks. Deep learning models demonstrate improved side-channel analysis, extracting keys from protected implementations. These convergent threats require a defense-in-depth approach that combines post-quantum algorithms, implementation hardening, and cryptographic agility. We find that lattice-based algorithms (ML-KEM, ML-DSA) resist known quantum attacks but require careful implementation to prevent side-channel leakage. Hash-based signatures (SLH-DSA) provide conservative security with signature sizes ranging from 17 to 50 KB. No single approach addresses both quantum and AI threats comprehensively. Organizations must treat cryptographic security as an ongoing process rather than a fixed deployment, maintaining the capability to update algorithms as threats evolve.
Authors:Sushanth Ambati, Kainat Adeel, Jack Myers, Nikolay Ivanov
Abstract:
Self-Sovereign Digital Identity (SSDI) enables individuals to control their own identity assertions and data, rather than relying on centralized or federated systems prone to large-scale data breaches. By eliminating centralized databases maintained by service providers and identity brokers, SSDIs offer enhanced security and privacy. However, adoption remains slow, and research in this area lacks systematization and uniformity. To address these gaps, we present a comprehensive systematization of knowledge on self-sovereign digital identities, with a primary focus on identifying the challenges that impede real-world adoption. We survey 80 academic and non-academic sources and identify six major challenges: (i) binding a single identity to one individual or organization, (ii) the absence of mature cryptographic and communication protocols, (iii) significant usability barriers, (iv) regulatory and oversight gaps, (v) bootstrapping to critical-mass adoption, and (vi) dependence on a permissionless, decentralized, yet singular infrastructure that may expose unforeseen vulnerabilities over time. We then analyze 47 scientific publications and find that the vast majority focus on blockchain-based solutions rather than generalized SSDI architectures. Additionally, we catalog 12 real-world, production-grade SSDI applications. Our evaluation of these solutions reveals that self-sovereignty is, in practice, a spectrum rather than a binary property. Finally, we explore the frontiers of SSDI by identifying major trends, open problems, and opportunities for future research. We hope this systematization will help advance the shift from centralized to self-sovereign digital identities in a disciplined and impactful way.
Authors:Sona Alex, Bian Yang
Abstract:
This document details the Fully Homomorphic Modified Rivest Scheme (FHMRS), a security issue in FHMRS, and a modification to FHMRS (mFHMRS) to mitigate the security issue.
Authors:Muhammad Zia Hydari, Idris Adjerid, Yingda Lu, Narayan Ramasubbu
Abstract:
Simulated phishing campaigns are widely deployed, yet the behavioral data they produce is endogenous: because training is triggered by clicking, the employees receiving intervention have already demonstrated susceptibility. This endogeneity, combined with the difficulty of separating genuine habit formation from stable individual differences, means standard analyses can mischaracterize program effectiveness. In this Research Note, we develop a generalizable analytic framework addressing both biases simultaneously. We utilize marginal structural models (MSMs) to correct for the endogenous, click-triggered assignment of training, while integrating correlated random effects (CRE) to disentangle true state dependence from stable employee heterogeneity. Applying the MSM+CRE estimator to logs from 17 campaigns delivered to university staff (192,840 observations) reveals that analyses ignoring stable differences overstate the causal persistence of clicking; most repeat clicking reflects who employees are, not the effect of recent failures. This persistence is context-dependent, amplifying when successive campaigns share persuasion cues. Teachable-moment features also matter: emotion framing and explicit reporting pitches can largely eliminate persistence, while annotated-email cues modestly exacerbate it. Finally, employees engaging with the education page exhibit greater persistence than those dismissing it, consistent with an emboldening mechanism. We contribute methodologically by integrating MSMs and CRE into a portable framework for analyzing standard simulation logs, and practically by identifying specific design levers so organizations can better sequence and evaluate their phishing programs.
Authors:Cameron Bell, Timothy Johnston, Antoine Luciano, Christian P Robert
Abstract:
Theoretical and applied research into privacy encompasses an incredibly broad swathe of differing approaches, emphasis and aims. This work introduces a new quantitative notion of privacy that is both contextual and specific. We argue that it provides a more meaningful notion of privacy than the widely utilised framework of differential privacy and a more explicit and rigorous formulation than what is commonly used in statistical disclosure theory. Our definition relies on concepts inherent to standard Bayesian decision theory, while departing from it in several important respects. In particular, the party controlling the release of sensitive information should make disclosure decisions from the prior viewpoint, rather than conditional on the data, even when the data is itself observed. Illuminating toy examples and computational methods are discussed in high detail in order to highlight the specificities of the method.
Authors:Wang Jian, Shen Hong, Ke Wei, Liu Xue Hua
Abstract:
While federated learning protects data privacy, it also makes the model update process vulnerable to long-term stealthy perturbations. Existing studies on backdoor attacks in federated learning mainly focus on trigger design or poisoning strategies, typically assuming that identical perturbations behave similarly across different model architectures. This assumption overlooks the impact of model structure on perturbation effectiveness. From a structure-aware perspective, this paper analyzes the coupling relationship between model architectures and backdoor perturbations. We introduce two metrics, Structural Responsiveness Score (SRS) and Structural Compatibility Coefficient (SCC), to measure a model's sensitivity to perturbations and its preference for fractal perturbations. Based on these metrics, we develop a structure-aware fractal perturbation injection framework (TFI) to study the role of architectural properties in the backdoor injection process. Experimental results show that model architecture significantly influences the propagation and aggregation of perturbations. Networks with multi-path feature fusion can amplify and retain fractal perturbations even under low poisoning ratios, while models with low structural compatibility constrain their effectiveness. Further analysis reveals a strong correlation between SCC and attack success rate, suggesting that SCC can predict perturbation survivability. These findings highlight that backdoor behaviors in federated learning depend not only on perturbation design or poisoning intensity but also on the interaction between model architecture and aggregation mechanisms, offering new insights for structure-aware defense design.
Authors:Santanu Mondal, T. Chithralekha
Abstract:
Central Bank Digital Currency (CBDCs) are becoming a new digital financial tool aimed at financial inclusion, increased monetary stability, and improved efficiency of payment systems, as they are issued by central banks. One of the most important aspects is that the CBDC must offer secure offline payment methods to users, allowing them to retain cash-like access without violating Anti-Money Laundering and Counter-terrorism Financing (AML/CFT) rules. The offline CBDC ecosystems will provide financial inclusion, empower underserved communities, and ensure equitable access to digital payments, even in connectivity-poor remote locations. With the rapid growth of Internet of Things (IoT) devices in our everyday lives, they are capable of performing secure digital transactions. Integrating offline CBDC payment with IoT devices enables seamless, automated payment without internet connectivity. However, IoT devices face special challenges due to their resource-constrained nature. This makes it difficult to include features such as double-spending prevention, privacy preservation, low-computation operation, and digital identity management. The work proposes a privacy-preserving offline CBDC model with integrated secure elements (SEs), zero-knowledge proofs (ZKPs), and intermittent synchronisation to conduct offline payments on IoT hardware. The proposed model is based on recent improvements in offline CBDC prototypes, regulations and cryptographic design choices such as hybrid architecture that involves using combination of online and offline payment in IoT devices using secure hardware with lightweight zero-knowledge proof cryptographic algorithm.
Authors:Samiran Ghosh, V Anil Kumar
Abstract:
Malware attacks in today's vast digital ecosystem pose a serious threat. Understanding malware propagation dynamics and designing effective control strategies are therefore essential. In this work, we propose a generic SEIRV model formulated using ordinary differential equations to study malware spread. We establish the positivity and boundedness of the system, derive the malware propagation threshold, and analyze the local and global stability of the malware-free equilibrium. The separatrix defining epidemic regions in the control space is identified, and the existence of a forward bifurcation is demonstrated. Using normalized forward sensitivity indices, we determine the parameters most influential to the propagation threshold. We further examine the nonlinear dependence of key epidemic characteristics on the transmission rate, including the maximum number of infected, time to peak infection, and total number of infected. We propose a hybrid gradient-based global optimization framework using simulated annealing approach to identify effective and cost-efficient control strategies. Finally, we calibrate the proposed model using infection data from the "Windows Malware Dataset with PE API Calls" and investigated the effect of intervention onset time on averted cases, revealing an exponential decay relationship between delayed intervention and averted cases.
Authors:Mahmudul Hassan Ashik, Moinul Hossain
Abstract:
The Cellular Vehicle-to-Everything (C-V2X), introduced and developed by the 3GPP, is a promising technology for the Autonomous Driving System (ADS). C-V2X aims to fulfill the Service-Level Requirements (SLRs) of ADS to ensure road safety following the development of the latest version, i.e., the NR-V2X. However, vulnerabilities threatening road safety in NR-V2X persist that have yet to be investigated. Existing research primarily evaluates road safety based on successful packet receptions. In this work, we propose a novel resource starvation attack that exploits vulnerabilities in the resource allocation of NR-V2X to diminish the required SLRs, making the road condition unsafe for autonomous driving. Furthermore, we establish the Age of Information (AoI) as the predominant metric for estimating the impact of adversarial attacks on NR-V2X by constructing a Discrete-time Markov chain (DTMC) based analytical model and validating it through extensive simulations. Finally, our analysis underscores how the proposed attack on NR-V2X can lead to unsafe driving conditions by reducing the SLR of time-sensitive applications in ADS up to 15% from the target. Additionally, we observe that even benign vehicles act selfishly when resources are scarce, leading to further safety compromises.
Authors:Joud Khoury, Minyoung Kim, Christophe Merlin, Jose Meseguer, Zachary Ratliff, Carolyn Talcott
Abstract:
Hidden communication systems (HCS) embed covert messages within ordinary network activity to hide the presence of communication. In practice, the undetectability of an HCS is typically evaluated using ad hoc traffic statistics or specific detectors, making security claims tightly coupled to experimental setups and implicit adversarial assumptions. In this work, we formalize undetectability as the statistical indistinguishability of observable execution traces under two deployments: a baseline system without hidden communication and an HCS deployment carrying covert traffic. Undetectability is expressed as a bound on a quantitative measure of distance between the trace distributions induced by these two executions. We develop Maude-HCS, an executable modeling and analysis framework that provides a principled and executable foundation for reasoning about undetectability-performance tradeoffs in complex HCS designs. Maude-HCS allows designers to specify protocol behavior, adversary observables, and environmental assumptions, and to generate Monte Carlo samples from the induced trace distributions. We show that Maude-HCS can be used to audit claims of undetectability by estimating the true and false positive rates of a statistical test and converting these estimates into lower bounds on undetectability measures such as KL divergence. This enables systematic evaluation of detectability and its tradeoffs with performance under explicitly stated modeling assumptions. Finally, we evaluate Maude-HCS on tunneling-based HCS instantiations and validate model predictions against measurements from a physical testbed. For passive adversaries observing timing and traffic statistics, we quantify how undetectability and performance vary with protocol configuration, background traffic, and network loss, and demonstrate strong semantic alignment between model-based guarantees and empirical results.
Authors:Adam Dorian Wong, John D. Hastings
Abstract:
Mobile devices are frequent targets of eCrime threat actors through SMS spearphishing (smishing) links that leverage Domain Generation Algorithms (DGA) to rotate hostile infrastructure. Despite this, DGA research and evaluation largely emphasize malware C2 and email phishing datasets, leaving limited evidence on how well detectors generalize to smishing-driven domain tactics outside enterprise perimeters. This work addresses that gap by evaluating traditional and machine-learning DGA detectors against Gravity Falls, a new semi-synthetic dataset derived from smishing links delivered between 2022 and 2025. Gravity Falls captures a single threat actor's evolution across four technique clusters, shifting from short randomized strings to dictionary concatenation and themed combo-squatting variants used for credential theft and fee/fine fraud. Two string-analysis approaches (Shannon entropy and Exp0se) and two ML-based detectors (an LSTM classifier and COSSAS DGAD) are assessed using Top-1M domains as benign baselines. Results are strongly tactic-dependent: performance is highest on randomized-string domains but drops on dictionary concatenation and themed combo-squatting, with low recall across multiple tool/cluster pairings. Overall, both traditional heuristics and recent ML detectors are ill-suited for consistently evolving DGA tactics observed in Gravity Falls, motivating more context-aware approaches and providing a reproducible benchmark for future evaluation.
Authors:Malik Mouaji, Saif Al-Kuwari
Abstract:
Multiparty quantum key agreement (MQKA) enables $n \geq 3$ mutually distrustful users to establish a shared secret key through collaborative quantum protocols. In this paper, we provide a comprehensive review where we argue that MQKA is best understood as a design space organized along three orthogonal but tightly coupled axes: (1) network architecture, which determines how quantum states flow between participants; (2) quantum resources, which encode the physical degrees of freedom used for implementation; and (3) security model, which defines trust assumptions about devices and infrastructure. Rather than treating MQKA as a linear sequence of isolated protocols, we develop this three-axis perspective to reveal recurrent patterns, sharp trade-offs, and unexplored design spaces. We classify MQKA protocols into structural families, map them to underlying quantum resources, and analyze how different security models shape fairness and collusion resistance. We further identify open challenges in composable security frameworks, network native integration, device-independent implementations, and propose a research roadmap toward hybrid-resource, bosonic-code-encoded, and fairness-aware MQKA suitable for the future quantum internet deployments in the post-NISQ era.
Authors:Sheng Sun, Sarah Evans
Abstract:
This paper presents composable attestation as a generalized cryptographic framework for Continuous and Incremental Trust in Distributed Systems,such as Artificial Intelligence (AI) computation, and Open Source Software (OSS) supply chain verification. We establish a rigorous mathematical foundation which is defining core properties of such attestation systems: composability, order independence, transitivity, determinism, inclusion, and dynamic component verification. In contrast to traditional attestation methodologies relying on monolithic verification, composable attestation facilitates modular, scalable, and cryptographically secured integrity verification adaptable to evolving system configurations. This work introduces generalized attestation proof generation and verification functions, implementable via a variety of cryptographic constructions, in which Merkle trees plays vital role in constructing the composable attestation proof. Alternative constructions, including accumulator-based schemes and multi-signature approaches, are also explored, each presenting distinct trade-offs in performance, security, and functionality. Formal analysis demonstrates the adherence of these implementations to the fundamental properties . The framework's utility extends to applications such as secure AI model integrity verification , federated learning, and runtime trust assurance. The concept of attestation inclusion is introduced, permitting incremental integration of new components without necessitating full system re-attestation. This generalized approach reinforce trust in AI computation and broader distributed computing environments through cryptographically verifiable proof mechanisms, building upon foundational concepts of bootstrapping trust.
Authors:Nancy Lau, Louis Sloot, Jyoutir Raj, Giuseppe Marco Boscardin, Evan Harris, Dylan Bowman, Mario Brajkovski, Jaideep Chawla, Dan Zhao
Abstract:
Large language models (LLMs) are increasingly being deployed as software engineering agents that autonomously contribute to repositories. A major benefit these agents present is their ability to find and patch security vulnerabilities in the codebases they oversee. To estimate the capability of agents in this domain, we introduce ZeroDayBench, a benchmark where LLM agents find and patch 22 novel critical vulnerabilities in open-source codebases. We focus our efforts on three popular frontier agentic LLMs: GPT-5.2, Claude Sonnet 4.5, and Grok 4.1. We find that frontier LLMs are not yet capable of autonomously solving our tasks and observe some behavioral patterns that suggest how these models can be improved in the domain of proactive cyberdefense.
Authors:Phat T. Tran-Truong, Vinh X. Q. Nguyen, Ha X. Son, Phien Nguyen-Ngoc, Khanh H. Vo, Triet M. Nguyen
Abstract:
The proliferation of IoT and V2X systems generates unprecedented sensitive data at the network edge, demanding privacy-preserving architectures that enable secure sharing without exposing raw information. Contemporary solutions face a fundamental privacy-efficiency-trust trilemma: achieving strong privacy guarantees, computational efficiency for resource-constrained devices, and decentralized trust simultaneously remains intractable with single-paradigm approaches. This survey systematically analyzes 75 technical papers (2007--2025) through a novel three-dimensional taxonomy classifying architectures into Decentralized Computation, Cryptography-based, and Distributed Ledger approaches. Temporal analysis reveals dramatic acceleration during 2024--2025, with 48% of all papers published in this period -- Decentralized Computation dominates at 44% of contributions and 59% of 2025 publications. Comprehensive Security Threat Mapping and Technology Maturity Assessment demonstrate that mature solutions occupy narrow design regions excelling in one or two dimensions while compromising others, conclusively validating the trilemma hypothesis. We identify emerging hybrid architectures combining complementary paradigms as the essential path forward. Critical challenges including security guarantee composition across layers, multi-layer coordination overhead minimization, and post-quantum security integration must be addressed for practical deployment in next-generation intelligent transportation systems and IoT ecosystems.
Authors:Taisei Otsuji, Peter Fulla, Takuro Fukunaga
Abstract:
Hotaru Beam is a logic puzzle which objective is to connect circles placed on a grid by drawing only lines with specified starting points and numbers of bends. A zero-knowledge proof is a communication protocol that allows one player to persuade the other that they are in possession of a certain piece of information without actually revealing it. We show that Hotaru Beam is NP-complete and present a physical zero-knowledge proof (i.e. implementable using physical items) for proving that one knows a solution to the puzzle.
Authors:Mingcheng Jiang, Jiancheng Huang, Jiangfei Wang, Zhengzhu Xie, Nan Fang, Guang Cheng, Xiaoyan Hu, Hua Wu
Abstract:
Static Application Security Testing (SAST) tools often suffer from high false positive rates, leading to alert fatigue that consumes valuable auditing resources. Recent efforts leveraging Large Language Models (LLMs) as filters offer limited improvements; however, these methods treat LLMs as passive, stateless classifiers, which lack project-wide context and the ability to learn from analyses to discover unknown, similar vulnerabilities. In this paper, we propose vEcho, a novel framework that transforms the LLM from a passive filter into a virtual security expert capable of learning, memory, and reasoning. vEcho equips its core reasoning engine with a robust developer tool suite for deep, context-aware verification. More importantly, we introduce a novel Echoic Vulnerability Propagation (EVP) mechanism. Driven by a Cognitive Memory Module that simulates human learning, EVP enables vEcho to learn from verified vulnerabilities and proactively infer unknown, analogous flaws, achieving a paradigm shift from passive verification to active discovery. Extensive experiments on the CWE-Bench-Java dataset demonstrate vEcho's dual advantages over the state-of-the-art baseline, IRIS. Specifically, vEcho achieves a 65% detection rate, marking a 41.8% relative improvement over IRIS's 45.83%. Crucially, it simultaneously addresses alert fatigue by reducing the false positive rate to 59.78%, a 28.3% relative reduction from IRIS's 84.82%. Furthermore, vEcho proactively identified 37 additional known vulnerabilities beyond the 120 documented in the dataset, and has discovered 51 novel 0-day vulnerabilities in open-source projects.
Authors:Hillol Biswas, Kyriakos Zoiros
Abstract:
The current state, emerging trends, and practical challenges of optical fiber-based power network SCADA quantum communication must be addressed to fully utilise the technological platform's potential in real-world power system SCADA communications involving massive volumes of real-time data, as well as in managing, encoding, and applications such as quantum cryptography. Quantum key distribution (QKD) is an essential part of the cybersecurity paradigm for quantum communication. Even though quantum computing with individual circuits yields probabilistic outcomes for the problem at hand, real-world datasets are complex and challenging to handle, even with telemetry. When using the cybersecurity triad of availability, confidentiality, and integrity (CIA) in reverse order (AIC), availability is given priority in electric power networks. This research assesses the use of the BB84, E91, B92, and SARG04 cryptographic protocols by applying them to large, multivariate power-system SCADA datasets and comparing the outcomes. By leveraging the variety of QKD protocols available with quantum electronics hardware, this simulation work provides a promising avenue for developing frameworks and deploying SCADA/PMU networks in actual power systems.
Authors:Tamer Abdelaziz, Salma Alsaghir, Karim Ali
Abstract:
Smart contracts underpin high-value ecosystems such as decentralized finance (DeFi), yet recurring vulnerabilities continue to cause losses worth billions of dollars. Although numerous security analyzers that detect such flaws exist, real-world attacks remain frequent, raising the question of whether these tools are truly effective or simply under-used due to low developer trust. Prior benchmarks have evaluated analyzers on synthetic or vulnerable-only contract datasets, limiting their ability to measure false positives, false negatives, and usability factors that drive adoption. To close this gap, we present a mixed-methods study that combines large-scale benchmarking with practitioner insights. We evaluate six widely used analyzers (i.e., Confuzzius, Dlva, Mythril, Osiris, Oyente, and Slither) on 653 real-world smart contracts that cover three high-impact vulnerability classes from the OWASP Smart Contract Top Ten (i.e., reentrancy, suicidal contract termination, and integer arithmetic errors). Our results show substantial variation in accuracy (F1 = 31.2 to 94.6%), high false-positive rates (up to 32.6%), and runtimes exceeding 700 seconds per contract. We then survey 150 professional developers and auditors to understand how they use and perceive these tools. Our findings reveal that excessive false positives, vague explanations, and long analysis times are the main barriers to trust and adoption in practice. By linking measurable performance gaps to developer perceptions, we provide concrete recommendations for improving the precision, explainability, and usability of smart-contract security analyzers.
Authors:Manuella Christelle Tossa, Fernando Madrigal, Ryan Blosser, Asma Jodeiri Akbarfam
Abstract:
Reliable grid operation depends on accurate and timely telemetry, making modern power systems vulnerable to communication layer cyberattacks. This paper evaluates how Denial of Service (DoS), Denial of Data (DoD), and False Data Injection (FDI) attacks disrupt the IEEE 14 bus system using a MATLAB only, time stepped simulation framework built on MATPOWER. The framework emulates a 24 hour operating cycle with sinusoidal load variation, introduces attack specific manipulation of load and voltage data, and performs full AC power flow solves with reactive limit enforcement (PV PQ switching). At each timestep, the system logs true and measured voltages, generator P/Q output, system losses, and voltage limit violations to capture transient cyber physical effects. Results show that DoD causes the largest physical distortions and reactive power stress, DoS masks natural variability and degrades situational awareness, and FDI creates significant discrepancies between true and perceived voltages. The study provides a compact, reproducible benchmark for analyzing cyber induced instability and informing future defense strategies.
Authors:Yiwei Fu, Tianhao Wang, Varun Chandrasekaran
Abstract:
Data valuation methods quantify how individual training examples contribute to a model's behavior, and are increasingly used for dataset curation, auditing, and emerging data markets. As these techniques become operational, they raise serious privacy concerns: valuation scores can reveal whether a person's data was included in training, whether it was unusually influential, or what sensitive patterns exist in proprietary datasets. This motivates the study of privacy-preserving data valuation. However, privacy is fundamentally in tension with valuation utility under differential privacy (DP). DP requires outputs to be insensitive to any single record, while valuation methods are explicitly designed to measure per-record influence. As a result, naive privatization often destroys the fine-grained distinctions needed to rank or attribute value, particularly in heterogeneous datasets where rare examples exert outsized effects. In this work, we analyze the feasibility of DP-compatible data valuation. We identify the core algorithmic primitives across common valuation frameworks that induce prohibitive sensitivity, explaining why straightforward DP mechanisms fail. We further derive design principles for more privacy-amenable valuation procedures and empirically characterize how privacy constraints degrade ranking fidelity across representative methods and datasets. Our results clarify the limits of current approaches and provide a foundation for developing valuation methods that remain useful under rigorous privacy guarantees.
Authors:Chuanming Tang, Ling Qing, Shifeng Chen
Abstract:
The rapid evolution of sophisticated cyberattacks has strained modern Security Operations Centers (SOC), which traditionally rely on rule-based or signature-driven detection systems. These legacy frameworks often generate high volumes of technical alerts that lack organizational context, leading to analyst fatigue and delayed incident responses. This paper presents LiaisonAgent, an autonomous multi-agent system designed to bridge the gap between technical risk detection and business-level risk governance. Built upon the QWQ-32B large reasoning model, LiaisonAgent integrates specialized sub-agents, including human-computer interaction agents, comprehensive judgment agents, and automated disposal agents-to execute end-to-end investigation workflows. The system leverages a hybrid planning architecture that combines deterministic workflows for compliance with autonomous reasoning based on the ReAct paradigm to handle ambiguous operational scenarios. Experimental evaluations across diverse security contexts, such as large-scale data exfiltration and unauthorized account borrowing, achieve an end-to-end tool-calling success rate of 97.8% and a risk judgment accuracy of 95%. Furthermore, the system exhibits significant resilience against out-of-distribution noise and adversarial prompt injections, while achieving a 92.7% reduction in manual investigation overhead.
Authors:Kennedy Edemacu, Mohammad Mahdi Shokri
Abstract:
Retrieval-augmented generation (RAG) has emerged as a powerful paradigm for enhancing multimodal large language models by grounding their responses in external, factual knowledge and thus mitigating hallucinations. However, the integration of externally sourced knowledge bases introduces a critical attack surface. Adversaries can inject malicious multimodal content capable of influencing both retrieval and downstream generation. In this work, we present MM-MEPA, a multimodal poisoning attack that targets the metadata components of image-text entries while leaving the associated visual content unaltered. By only manipulating the metadata, MM-MEPA can still steer multimodal retrieval and induce attacker-desired model responses. We evaluate the attack across multiple benchmark settings and demonstrate its severity. MM-MEPA achieves an attack success rate of up to 91\% consistently disrupting system behaviors across four retrievers and two multimodal generators. Additionally, we assess representative defense strategies and find them largely ineffective against this form of metadata-only poisoning. Our findings expose a critical vulnerability in multimodal RAG and underscore the urgent need for more robust, defense-aware retrieval and knowledge integration methods.
Authors:Boram Jung, Yuliang Li, Hung-Wei Tseng
Abstract:
Homomorphic encryption (HE) enables computations directly on encrypted data, offering strong cryptographic guarantees for secure and privacy-preserving data storage and query execution. However, despite its theoretical power, practical adoption of HE in database systems remains limited due to extreme cipher-text expansion, memory overhead, and the computational cost of bootstrapping, which resets noise levels for correctness. This paper presents NSHEDB, a secure query processing engine designed to address these challenges at the system architecture level. NSHEDB uses word-level leveled HE (LHE) based on the BFV scheme to minimize ciphertext expansion and avoid costly bootstrapping. It introduces novel techniques for executing equality, range, and aggregation operations using purely homomorphic computation, without transciphering between different HE schemes (e.g., CKKS/BFV/TFHE) or relying on trusted hardware. Additionally, it incorporates a noise-aware query planner to extend computation depth while preserving security guarantees. We implement and evaluate NSHEDB on real-world database workloads (TPC-H) and show that it achieves 20x-V1370x speedup and a 73x storage reduction compared to state-of-the-art HE-based systems, while upholding 128-bit security in a semi-honest model with no key release or trusted components.
Authors:Abisheka Pitumpe, Amir Rahmati
Abstract:
Job-based smishing scams, where victims are recruited under the guise of remote job opportunities, represent a rapidly growing and understudied threat within the broader landscape of online fraud. In this paper, we present Anansi, the first scalable, end-to-end measurement pipeline designed to systematically engage with, analyze, and characterize job scams in the wild. Anansi combines large language models (LLMs), automated browser agents, and infrastructure fingerprinting tools to collect over 29,000 scam messages, interact with more than 1900 scammers, and extract behavioral, financial, and infrastructural signals at scale. We detail the operational workflows of scammers, uncover extensive reuse of message templates, domains, and cryptocurrency wallets, and identify the social engineering tactics used to defraud victims. Our analysis reveals millions of dollars in cryptocurrency losses, highlighting the use of deceptive techniques such as domain fronting and impersonation of well-known brands. Anansi demonstrates the feasibility and value of automating the engagement with scammers and the analysis of infrastructure, offering a new methodological foundation for studying large-scale fraud ecosystems.
Authors:Sean M. Alderman, John D. Hastings
Abstract:
The growth and heterogeneity of IoT devices create security challenges where static identification models can degrade as traffic evolves. This paper presents a two-stage, flow-feature-based pipeline for unsupervised IoT device traffic profiling and incremental model updating, evaluated on selected long-duration captures from the Deakin IoT dataset. For baseline profiling, density-based clustering (DBSCAN) isolates a substantial outlier portion of the data and produces the strongest alignment with ground-truth device labels among tested classical methods (NMI 0.78), outperforming centroid-based clustering on cluster purity. For incremental adaptation, we evaluate stream-oriented clustering approaches and find that BIRCH supports efficient updates (0.13 seconds per update) and forms comparatively coherent clusters for a held-out novel device (purity 0.87), but with limited capture of novel traffic (share 0.72) and a measurable trade-off in known-device accuracy after adaptation (0.71). Overall, the results highlight a practical trade-off between high-purity static profiling and the flexibility of incremental clustering for evolving IoT environments.
Authors:Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu
Abstract:
Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, and judging protocols. We introduce JAILBREAK FOUNDRY (JBF), a system that addresses this gap via a multi-agent workflow to translate jailbreak papers into executable modules for immediate evaluation within a unified harness. JBF features three core components: (i) JBF-LIB for shared contracts and reusable utilities; (ii) JBF-FORGE for the multi-agent paper-to-module translation; and (iii) JBF-EVAL for standardizing evaluations. Across 30 reproduced attacks, JBF achieves high fidelity with a mean (reproduced-reported) attack success rate (ASR) deviation of +0.26 percentage points. By leveraging shared infrastructure, JBF reduces attack-specific implementation code by nearly half relative to original repositories and achieves an 82.5% mean reused-code ratio. This system enables a standardized AdvBench evaluation of all 30 attacks across 10 victim models using a consistent GPT-4o judge. By automating both attack integration and standardized evaluation, JBF offers a scalable solution for creating living benchmarks that keep pace with the rapidly shifting security landscape.
Authors:Alejandro Guerra-Manzanares, Jialin Huang
Abstract:
Cross-domain intrusion detection remains a critical challenge due to significant variability in network traffic characteristics and feature distributions across environments. This study evaluates the transferability of three widely used flow-based feature sets (Argus, Zeek and CICFlowMeter) across four widely used datasets representing heterogeneous IoT and Industrial IoT network conditions. Through extensive experiments, we evaluate in- and cross-domain performance across multiple classification models and analyze feature importance using SHapley Additive exPlanations (SHAP). Our results show that models trained on one domain suffer significant performance degradation when applied to a different target domain, reflecting the sensitivity of IoT intrusion detection systems to distribution shifts. Furthermore, the results evidence that the choice of classification algorithm and feature representations significantly impact transferability. Beyond reporting performance differences and thorough analysis of the transferability of features and feature spaces, we provide practical guidelines for feature engineering to improve robustness under domain variability. Our findings suggest that effective intrusion detection requires both high in-domain performance and resilience to cross-domain variability, achievable through careful feature space design, appropriate algorithm selection and adaptive strategies.
Authors:Wei Lian, Alejandro Guerra-Manzanares
Abstract:
The rapid expansion of Industrial IoT (IIoT) systems has amplified security challenges, as heterogeneous devices and dynamic traffic patterns increase exposure to sophisticated and previously unseen cyberattacks. Traditional intrusion detection systems often struggle in such environments due to their reliance on extensive labeled data and limited ability to detect new threats. To address these challenges, we propose MI$^2$DAS, a multi-layer intrusion detection framework that integrates anomaly-based hierarchical traffic pooling, open-set recognition to distinguish between known and unknown attacks and incremental learning for adapting to novel attack types with minimal labeling. Experiments conducted on the Edge-IIoTset dataset demonstrate strong performance across all layers. In the first layer, GMM achieves superior normal-attack discrimination (accuracy = 0.953, TPR = 1.000). In open-set recognition, GMM attains a recall of 0.813 for known attacks, while LOF achieves 0.882 recall for unknown attacks. For fine-grained classification of known attacks, Random Forest achieves a macro-F1 of 0.941. Finally, the incremental learning module maintains robust performance when incorporation novel attack classes, achieving a macro-F1 of 0.8995. These results showcase MI$^2$DAS as an effective, scalable and adaptive framework for enhancing IIoT security against evolving threats.
Authors:Xuhui Dou, Hayretdin Bahsi, Alejandro Guerra-Manzanares
Abstract:
Recent work applies Large Language Models (LLMs) to source-code vulnerability detection, but most evaluations still rely on random train-test splits that ignore time and overestimate real-world performance. In practice, detectors are deployed on evolving code bases and must recognise future vulnerabilities under temporal distribution shift. This paper investigates continual fine-tuning of a decoder-style language model (microsoft/phi-2 with LoRA) on a CVE-linked dataset spanning 2018-2024, organised into bi-monthly windows. We evaluate eight continual learning strategies, including window-only and cumulative training, replay-based baselines and regularisation-based variants. We propose Hybrid Class-Aware Selective Replay (Hybrid-CASR), a confidence-aware replay method for binary vulnerability classification that prioritises uncertain samples while maintaining a balanced ratio of VULNERABLE and FIXED functions in the replay buffer. On bi-monthly forward evaluation Hybrid-CASR achieves a Macro-F1 of 0.667, improving on the window-only baseline (0.651) by 0.016 with statistically significant gains ($p = 0.026$) and stronger backward retention (IBR@1 of 0.741). Hybrid-CASR also reduces training time per window by about 17 percent compared to the baseline, whereas cumulative training delivers only a minor F1 increase (0.661) at a 15.9-fold computational cost. Overall, the results show that selective replay with class balancing offers a practical accuracy-efficiency trade-off for LLM-based temporal vulnerability detection under continuous temporal drift.
Authors:Jeff Nijsse, Andrea Pinto
Abstract:
In an age of financial system digitisation and the increasing adoption of digital currencies, Central Bank Digital Currencies (CBDCs) have emerged as a focal point for technological innovation. Privacy compliance has become a key factor in the successful design of CBDCs, extending beyond technical requirements to influence legal requirements, user trust, and security considerations. Implementing Privacy-Enhancing Technologies (PETs) in CBDCs requires an interdisciplinary approach, however, the lack of a common understanding of privacy and the essential technological characteristics restricts progress. This work investigates: (1) How privacy can be defined within the framework of CBDCs and what implications does this definition have for CBDCs design? and (2) Which PETs can be employed to enhance privacy in CBDC design? We propose a comprehensive definition for privacy that is mapped to the cryptographic landscape for feature implementation. The research is validated against case studies from 20 current CBDCs. The study shows that comprehensive privacy can be designed in the proposal stage, but that privacy does not reach the launched version of the CBDC.
Authors:Alex Carbajal, Asma Jodeiri Akbarfam
Abstract:
Wi-Fi deauthentication attacks remain a practical denial-of-service (DoS) threat by exploiting unprotected management frames to disrupt client connectivity. In this work, we introduce a software-defined testbed to measure Wi-Fi resilience to deauthentication attacks. We experimentally evaluate five wireless security configurations: open networks, WPA1, WPA2 without Protected Management Frames (PMF), WPA2 with PMF, and WPA3. Using controlled experiments, we measure client disconnection rates, packet injection volume, and time-to-disruption under each configuration. Packet-level behavior is analyzed using standard wireless auditing tools. Open networks, WPA1, and WPA2 without PMF proved entirely vulnerable to deauthentication, while no successful attacks were observed for WPA2 with PMF or WPA3 under tested conditions. These findings confirm the effectiveness of management-frame protection and highlight the continued risk posed by legacy or misconfigured wireless deployments.
Authors:C. Seas, G. Fitzpatrick, J. A. Hamilton, M. C. Carlisle
Abstract:
Each year, software vulnerabilities are discovered, which pose significant risks of exploitation and system compromise. We present a convolutional neural network model that can successfully identify bugs in C code. We trained our model using two complementary datasets: a machine-labeled dataset created by Draper Labs using three static analyzers and the NIST SATE Juliet human-labeled dataset designed for testing static analyzers. In contrast with the work of Russell et al. on these datasets, we focus on C programs, enabling us to specialize and optimize our detection techniques for this language. After removing duplicates from the dataset, we tokenize the input into 91 token categories. The category values are converted to a binary vector to save memory. Our first convolution layer is chosen so that the entire encoding of the token is presented to the filter. We use two convolution and pooling layers followed by two fully connected layers to classify programs into either a common weakness enumeration category or as ``clean.'' We obtain higher recall than prior work by Russell et al. on this dataset when requiring high precision. We also demonstrate on a custom Linux kernel dataset that we are able to find real vulnerabilities in complex code with a low false-positive rate.
Authors:Samuel Lemes-Perera, Miguel R. Alarcon, Pino Caballero-Gil, Miquel Serra-Ricart
Abstract:
The era of large astronomical surveys generates massive image catalogs requiring efficient and secure access, particularly during pre-publication periods where data confidentiality and integrity are paramount. While Findable, Accessible, Interoperable, and Reusable (FAIR) principles guide the eventual public dissemination of data, traditional security methods for restricted phases often lack granularity or incur prohibitive performance penalties. To address this, we present a framework that integrates a flexible policy engine for fine-grained access control with a novel GPU-accelerated implementation of the AES-GCM authenticated encryption protocol. The novelty of this work lies in the adaptation and optimization of a parallel tree-reduction strategy to overcome the main performance bottleneck in authenticated encryption on GPUs: the inherently sequential Galois/Counter Mode (GCM) authentication hash (GHASH). We present both the algorithmic adaptation and its efficient execution on GPU architectures. Although similar parallelization techniques have been explored in cryptographic research, this is, to our knowledge, the first demonstration of their integration into a high-throughput encryption framework specifically designed for large-scale astronomical data. Our implementation transforms the sequential GHASH computation into a highly parallelizable, logarithmic-time process, achieving authenticated encryption throughput suitable for petabyte-scale image analysis. Our solution provides a robust mechanism for data providers to enforce access policies, ensuring both confidentiality and integrity without hindering research workflows, thereby facilitating a secure and managed transition of data to public, FAIR archives.
Authors:Aurora Arrus, Maria di Gisi, Sara Lilli, Marco Quadrini
Abstract:
The General Data Protection Regulation (GDPR) requires organisations to notify supervisory authorities of personal data breaches within 72 hours of discovery. Meeting this strict deadline is challenging because incident responders must manually translate low-level forensic artefacts such as malware traces, system-call logs, and network captures into the structured, legally framed information required by data-protection authorities. This gap between technical evidence and regulatory reporting often results in delays, incomplete notifications, and a high cognitive burden on analysts. We propose a hybrid malware analysis pipeline that automates the extraction and organisation of breach-relevant information, with a particular focus on exfiltration-oriented Linux/ARM malware, which is rapidly increasing in prevalence due to the widespread adoption of IoT and embedded devices. The system combines static analysis to identify potential exfiltrators with dynamic analysis to reconstruct their behaviour. It employs a Large Language Model (LLM) constrained by a formal JSON schema aligned with the official Italian Garante Privacy notification form. The LLM transforms heterogeneous forensic artefacts into a structured, compliance-ready report that a human operator can rapidly validate.
Authors:Prasanna Kumar, Nishank Soni, Gaurang Munje
Abstract:
Distributed storage architectures are foundational to modern cloud-native infrastructure, yet a critical operational bottleneck persists within disaster recovery (DR) workflows: the dependence on content-based cryptographic hashing for data identification and synchronization. While hash-based deduplication is effective for storage efficiency in steady-state operation, it becomes a systemic liability during failover and failback events when hash indexes are stale, incomplete, or must be rebuilt following a crash. This paper precisely characterizes the operational conditions under which full or partial re-hashing becomes unavoidable. The paper also analyzes the downstream impact of cryptographic re-hashing on Recovery Time Objective (RTO) compliance, and proposes a generalized architectural shift toward deterministic, metadata-driven identification. The proposed framework assigns globally unique composite identifiers to data blocks at ingestion time-independent of content analysis enabling instantaneous delta computation during DR without any cryptographic overhead.
Authors:Nimrod Talmon, Haim Zysberg
Abstract:
Blockchains are widely used for secure transaction processing, but their scalability remains limited, and existing multichain designs are typically static even as demand and capacity shift. We cast blockchain configuration as a multiagent resource-allocation problem: applications and operators declare demand, capacity, and price bounds; an optimizer groups them into ephemeral chains each epoch and sets a chain-level clearing price. The objective maximizes a governance-weighted combination of normalized utilities for applications, operators, and the system. The model is modular -- accommodating capability compatibility, application-type diversity, and epoch-to-epoch stability -- and can be solved off-chain with outcomes verifiable on-chain. We analyze fairness and incentive issues and present simulations that highlight trade-offs among throughput, decentralization, operator yield, and service stability.
Authors:Miguel Morona-Mínguez, Alberto Pedrouzo-Ulloa, Fernando Pérez-González
Abstract:
Threshold Homomorphic Encryption (Threshold HE) is a good fit for implementing private federated average aggregation, a key operation in Federated Learning (FL). Despite its potential, recent studies have shown that threshold schemes available in mainstream HE libraries can introduce unexpected security vulnerabilities if an adversary has access to a restricted decryption oracle. This oracle reflects the FL clients' capacity to collaboratively decrypt the aggregated result without knowing the secret key. This work surveys the use of threshold RLWE-based HE for federated average aggregation and examines the performance impact of using smudging noise with a large variance as a countermeasure. We provide a detailed comparison of threshold variants of BFV and CKKS, finding that CKKS-based aggregations perform comparably to BFV-based solutions.
Authors:Yaser Baseri, Edward Waller
Abstract:
The advent of Cryptographically Relevant Quantum Computers (CRQCs) presents a fundamental and existential threat to the forensic integrity and operational safety of Industrial Control Systems (ICS) and Operational Technology (OT) in critical infrastructure. This paper introduces a novel, forensics-first framework for achieving quantum resilience in high-consequence environments, with a specific focus on nuclear power plants. We systematically analyze the quantum threat landscape across the Purdue architecture (L0-L5), detailing how Harvest-Now, Decrypt-Later (HNDL) campaigns, enabled by algorithms like Shor's, can retroactively compromise cryptographic foundations, undermine evidence admissibility, and facilitate sophisticated sabotage. Through two detailed case studies, \textsc{Quantum~Scar} and \textsc{Quantum~Dawn}, we demonstrate multi-phase attack methodologies where state-level adversaries exploit cryptographic monoculture and extended OT lifecycles to degrade safety systems while creating unsolvable forensic paradoxes. Our probabilistic risk modeling reveals alarming success probabilities (up to 78\% for targeted facilities under current defenses), underscoring the criticality of immediate action. In response, we propose and validate a phased, defense-in-depth migration path to Post-Quantum Cryptography (PQC), integrating hybrid key exchange, cryptographic diversity, secure time synchronization, and side-channel resistant implementations aligned with ISA/IEC 62443 and NIST standards. The paper concludes that without urgent adoption of quantum-resilient controls, the integrity of both physical safety systems and digital forensic evidence remains at severe and irreversible risk.
Authors:Jonathan Cruz, Jason Hamlet
Abstract:
Logic locking as a solution for semiconductor intellectual property (IP) confidentiality has received considerable attention in academia, but has yet to produce a viable solution to protect against known threats. In part due to a lack of rigor, logic locking defenses have been historically short-lived, which is an unacceptable risk for hardware-based security solutions for critical systems that may be fielded for decades. Researchers have worked to map the concept of cryptographic indistinguishability to logic locking, as indistinguishability provides strong security guarantees. In an effort to bridge theory and practice, we highlight recent efforts that can be used to analyze the indistinguishability of logic locking techniques, and propose a new method of evaluation based on comparing distributions of $k$-cuts, which is akin to comparing against a library of sub-functions. We evaluate our approach on several different classes of logic locking and show up to 92% average accuracy in correctly identifying which design was locked, even in the presence of resynthesis, suggesting that the evaluated locks do not provide indistinguishability.
Authors:Shruti Srivastava, Kiranmayee Janardhan, Shaurya Jauhari
Abstract:
Cybersecurity threats are becoming increasingly sophisticated, making traditional defense mechanisms and manual red teaming approaches insufficient for modern organizations. While red teaming has long been recognized as an effective method to identify vulnerabilities by simulating real-world attacks, its manual execution is resource-intensive, time-consuming, and lacks scalability for frequent assessments. These limitations have driven the evolution toward auto-mated red teaming, which leverages artificial intelligence and automation to deliver efficient and adaptive security evaluations. This systematic review consolidates existing research on automated red teaming, examining its methodologies, tools, benefits, and limitations. The paper also highlights current trends, challenges, and research gaps, offering insights into future directions for improving automated red teaming as a critical component of proactive cybersecurity strategies. By synthesizing findings from diverse studies, this review aims to provide a comprehensive understanding of how automation enhances red teaming and strengthens organizational resilience against evolving cyber threats.
Authors:Jingwei Shi, Xinxiang Yin, Jing Huang, Jinman Zhao, Shengyu Tao
Abstract:
The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing benchmarks often lack coverage for subtle corner cases, allowing incorrect solutions to pass. To bridge this gap, we propose CodeHacker, an automated agent framework dedicated to generating targeted adversarial test cases that expose latent vulnerabilities in program submissions. Mimicking the hack mechanism in competitive programming, CodeHacker employs a multi-strategy approach, including stress testing, anti-hash attacks, and logic-specific targeting to break specific code submissions. To ensure the validity and reliability of these attacks, we introduce a Calibration Phase, where the agent iteratively refines its own Validator and Checker via self-generated adversarial probes before evaluating contestant code.Experiments demonstrate that CodeHacker significantly improves the True Negative Rate (TNR) of existing datasets, effectively filtering out incorrect solutions that were previously accepted. Furthermore, generated adversarial cases prove to be superior training data, boosting the performance of RL-trained models on benchmarks like LiveCodeBench.
Authors:Jeel Piyushkumar Khatiwala, Daniel Kwaku Ntiamoah Addai, Weifeng Xu
Abstract:
The growing reliance on AI-identified digital evidence raises significant concerns about its reliability, particularly as large language models (LLMs) are increasingly integrated into forensic investigations. This paper proposes a structured framework that automates forensic artifact extraction, refines data through LLM-driven analysis, and validates results using a Digital Forensic Knowledge Graph (DFKG). Evaluated on a 13 GB forensic image dataset containing 61 applications, 2,864 databases, and 5,870 tables, the framework ensures artifact traceability and evidentiary consistency through deterministic Unique Identifiers (UIDs) and forensic cross-referencing. We propose this methodology to address challenges in ensuring the credibility and forensic integrity of AI-identified evidence, reducing classification errors, and advancing scalable, auditable methodologies. A comprehensive case study on this dataset demonstrates the framework's effectiveness, achieving over 95 percent accuracy in artifact extraction, strong support of chain-of-custody adherence, and robust contextual consistency in forensic relationships. Key results validate the framework's ability to enhance reliability, reduce errors, and establish a legally sound paradigm for AI-assisted digital forensics.
Authors:Genliang Zhu, Chu Wang, Ziyuan Wang, Zhida Li, Qiang Li
Abstract:
AI agents increasingly require direct, structured access to application data and actions, but production deployments still struggle to express and verify the governance properties that matter in practice: least-privilege authorization, controlled write execution, predictable failure handling, abuse resistance, and auditability. This paper introduces OpenPort Protocol (OPP), a governance-first specification for exposing application tools through a secure server-side gateway that is model- and runtime-neutral and can bind to existing tool ecosystems. OpenPort defines authorization-dependent discovery, stable response envelopes with machine-actionable \texttt{agent.*} reason codes, and an authorization model combining integration credentials, scoped permissions, and ABAC-style policy constraints. For write operations, OpenPort specifies a risk-gated lifecycle that defaults to draft creation and human review, supports time-bounded auto-execution under explicit policy, and enforces high-risk safeguards including preflight impact binding and idempotency. To address time-of-check/time-of-use drift in delayed approval flows, OpenPort also specifies an optional State Witness profile that revalidates execution-time preconditions and fails closed on state mismatch. Operationally, the protocol requires admission control (rate limits/quotas) with stable 429 semantics and structured audit events across allow/deny/fail paths so that client recovery and incident analysis are deterministic. We present a reference runtime and an executable governance toolchain (layered conformance profiles, negative security tests, fuzz/abuse regression, and release-gate scans) and evaluate the core profile at a pinned release tag using artifact-based, externally reproducible validation.
Authors:Shenyang Chen, Liuwan Zhu
Abstract:
Standard evaluations of backdoor attacks on text-to-image (T2I) models primarily measure trigger activation and visual fidelity. We challenge this paradigm, demonstrating that encoder-side poisoning induces persistent, trigger-free semantic corruption that fundamentally reshapes the representation manifold. We trace this vulnerability to a geometric mechanism: a Jacobian-based analysis reveals that backdoors act as low-rank, target-centered deformations that amplify local sensitivity, causing distortion to propagate coherently across semantic neighborhoods. To rigorously quantify this structural degradation, we introduce SEMAD (Semantic Alignment and Drift), a diagnostic framework that measures both internal embedding drift and downstream functional misalignment. Our findings, validated across diffusion and contrastive paradigms, expose the deep structural risks of encoder poisoning and highlight the necessity of geometric audits beyond simple attack success rates.
Authors:Zac Garby, Andrew D. Gordon, David Sands
Abstract:
A conversation with a large language model (LLM) is a sequence of prompts and responses, with each response generated from the preceding conversation. AI agents build such conversations automatically: given an initial human prompt, a planner loop interleaves LLM calls with tool invocations and code execution. This tight coupling creates a new and poorly understood attack surface. A malicious prompt injected into a conversation can compromise later reasoning, trigger dangerous tool calls, or distort final outputs. Despite the centrality of such systems, we currently lack a principled semantic foundation for reasoning about their behaviour and safety. We address this gap by introducing an untyped call-by-value lambda calculus enriched with dynamic information-flow control and a small number of primitives for constructing prompt-response conversations. Our language includes a primitive that invokes an LLM: it serializes a value, sends it to the model as a prompt, and parses the response as a new term. This calculus faithfully represents planner loops and their vulnerabilities, including the mechanisms by which prompt injection alters subsequent computation. The semantics explicitly captures conversations, and so supports reasoning about defenses such as quarantined sub-conversations, isolation of generated code, and information-flow restrictions on what may influence an LLM call. A termination-insensitive noninterference theorem establishes integrity and confidentiality guarantees, demonstrating that a formal calculus can provide rigorous foundations for safe agentic programming.
Authors:Xiaochong Jiang, Shiqi Yang, Wenting Yang, Yichen Liu, Cheng Ji
Abstract:
Agentic systems built on large language models (LLMs) extend beyond text generation to autonomously retrieve information and invoke tools. This runtime execution model shifts the attack surface from build-time artifacts to inference-time dependencies, exposing agents to manipulation through untrusted data and probabilistic capability resolution. While prior work has focused on model-level vulnerabilities, security risks emerging from cyclic and interdependent runtime behavior remain fragmented. We systematize these risks within a unified runtime framework, categorizing threats into data supply chain attacks (transient context injection and persistent memory poisoning) and tool supply chain attacks (discovery, implementation, and invocation). We further identify the Viral Agent Loop, in which agents act as vectors for self-propagating generative worms without exploiting code-level flaws. Finally, we advocate a Zero-Trust Runtime Architecture that treats context as untrusted control flow and constrains tool execution through cryptographic provenance rather than semantic inference.
Authors:Ilan Rosenfeld, Noam Kleinburd, Hillel Chapman, Dror Reuven
Abstract:
The Ring-Learning With Errors (RLWE) problem forms the backbone of highly efficient Fully Homomorphic Encryption (FHE) schemes. A significant component of the RLWE public key and ciphertext of the form $(b,a)$ is the uniformly random polynomial $a \in R_q$ . While essential for security, the communication overhead of transmitting $a$ from client to server, and inputting it into a hardware accelerator, can be substantial, especially for FHE accelerators aiming at high acceleration factors. A known technique in reducing this overhead generates $a$ from a small seed on the client side via a deterministic process, transmits only the seed, and generates $a$ on-the-fly within the accelerator. Challenges in the hardware implementation of an accelerator include wiring (density and power), compute area, compute power as well as flexibility in scheduling of on-the-fly generation instructions. This extended abstract proposes a concrete scheme and parameters wherein these practical challenges are addressed. We detail the benefits of our approach, which maintains the reduction in communication latency and memory footprint, while allowing parallel generation of uniformly distributed samples, relaxed wiring requirements, unrestricted randomaccess to RNS limbs, and results in an extremely low overhead on the client side (i.e. less than 3%) during the key generation process. The proposed scheme eliminates the need for thick metal layers for randomness distribution and prevents the power consumption of the PRNG subsystem from scaling prohibitively with the acceleration factor, potentially saving tens of Watts per accelerator chip in high-throughput configurations.
Authors:Lei Ba, Qinbin Li, Songze Li
Abstract:
LLM-based code interpreter agents are increasingly deployed in critical workflows, yet their robustness against risks introduced by their code execution capabilities remains underexplored. Existing benchmarks are limited to static datasets or simulated environments, failing to capture the security risks arising from dynamic code execution, tool interactions, and multi-turn context. To bridge this gap, we introduce CIBER, an automated benchmark that combines dynamic attack generation, isolated secure sandboxing, and state-aware evaluation to systematically assess the vulnerability of code interpreter agents against four major types of adversarial attacks: Direct/Indirect Prompt Injection, Memory Poisoning, and Prompt-based Backdoor. We evaluate six foundation models across two representative code interpreter agents (OpenInterpreter and OpenCodeInterpreter), incorporating a controlled study of identical models. Our results reveal that Interpreter Architecture and Model Alignment Set the Security Baseline. Structural integration enables aligned specialized models to outperform generic SOTA models. Conversely, high intelligence paradoxically increases susceptibility to complex adversarial prompts due to stronger instruction adherence. Furthermore, we identify a "Natural Language Disguise" Phenomenon, where natural language functions as a significantly more effective input modality than explicit code snippets (+14.1% ASR), thereby bypassing syntax-based defenses. Finally, we expose an alarming Security Polarization, where agents exhibit robust defenses against explicit threats yet fail catastrophically against implicit semantic hazards, highlighting a fundamental blind spot in current pattern-matching protection approaches.
Authors:Sujaya Maiyya, Shantanu Sharma, Avinash Kumar
Abstract:
Managing personal health data is a challenge in today's fragmented and institution-centric healthcare ecosystem. Individuals often lack meaningful control over their medical records, which are scattered across incompatible systems and formats. This vision paper presents Health+, a user-centric, multimodal health data management system that empowers individuals (including those with limited technical expertise) to upload, query, and share their data across modalities (e.g., text, images, reports). Rather than aiming for institutional overhaul, Health+ emphasizes individual agency by providing intuitive interfaces and intelligent recommendations for data access and sharing. At the system level, it tackles the complexity of storing, integrating, and securing heterogeneous health records, ensuring both efficiency and privacy. By unifying multimodal data and prioritizing patients, Health+ lays the foundation for a more connected, interpretable, and user-controlled health information ecosystem.
Authors:Norrakith Srisumrith, Sunantha Sodsee
Abstract:
The critical need for transparent and trustworthy machine learning in cybersecurity operations drives the development of this integrated Explainable AI (XAI) framework. Our methodology addresses three fundamental challenges in deploying AI for threat detection: handling massive datasets through Strategic Sampling Methodology that preserves class distributions while enabling efficient model development; ensuring experimental rigor via Automated Data Leakage Prevention that systematically identifies and removes contaminated features; and providing operational transparency through Integrated XAI Implementation using SHAP analysis for model-agnostic interpretability across algorithms. Applied to the CIC-IDS2017 dataset, our approach maintains detection efficacy while reducing computational overhead and delivering actionable explanations for security analysts. The framework demonstrates that explainability, computational efficiency, and experimental integrity can be simultaneously achieved, providing a robust foundation for deploying trustworthy AI systems in security operations centers where decision transparency is paramount.
Authors:Nnaemeka Obiefuna, Samuel Oyeneye, Similoluwa Odunaiya, Iremide Oyelaja, Steven Kolawole
Abstract:
Privacy preserving machine learning deployments in sensitive deep learning applications; from medical imaging to autonomous systems; increasingly require combining multiple techniques. Yet, practitioners lack systematic guidance to assess the synergistic and non-additive interactions of these hybrid configurations, relying instead on isolated technique analysis that misses critical system level interactions. We introduce PrivacyBench, a benchmarking framework that reveals striking failures in privacy technique combinations with severe deployment implications. Through systematic evaluation across ResNet18 and ViT models on medical datasets, we uncover that FL + DP combinations exhibit severe convergence failure, with accuracy dropping from 98% to 13% while compute costs and energy consumption substantially increase. In contrast, FL + SMPC maintains near-baseline performance with modest overhead. Our framework provides the first systematic platform for evaluating privacy-utility-cost trade-offs through automated YAML configuration, resource monitoring, and reproducible experimental protocols. PrivacyBench enables practitioners to identify problematic technique interactions before deployment, moving privacy-preserving computer vision from ad-hoc evaluation toward principled systems design. These findings demonstrate that privacy techniques cannot be composed arbitrarily and provide critical guidance for robust deployment in resource-constrained environments.
Authors:Jessica Young, Sam Vaughan, Andrew Jenks, Henrique Malvar, Christian Paquin, Paul England, Thomas Roca, Juan LaVista Ferres, Forough Poursabzi, Neil Coles, Ken Archer, Eric Horvitz
Abstract:
We provide background on emerging challenges and future directions with media integrity and authentication methods, focusing on distinguishing AI-generated media from authentic content captured by cameras and microphones. We evaluate several approaches, including provenance, watermarking, and fingerprinting. After defining each method, we analyze three representative technologies: cryptographically secured provenance, imperceptible watermarking, and soft-hash fingerprinting. We analyze how these tools operate across modalities and evaluate relevant threat models, attack categories, and real-world workflows spanning capture, editing, distribution, and verification. We consider sociotechnical reversal attacks that can invert integrity signals, making authentic content appear synthetic and vice versa, highlighting the value of verification systems that are resilient to both technical and psychosocial manipulation. Finally, we outline techniques for delivering high-confidence provenance authentication, including directions for strengthening edge-device security using secure enclaves.
Authors:María Teresa García-Ordás, Jose Aveleira-Mata, Isaías García-Rodríguez, José Luis Casteleiro-Roca, Martín Bayón-Gutierrez, Héctor Alaiz-Moretón
Abstract:
The Internet of Things (IoT) presents a unique cybersecurity challenge due to its vast network of interconnected, resource-constrained devices. These vulnerabilities not only threaten data integrity but also the overall functionality of IoT systems. This study addresses these challenges by exploring efficient data reduction techniques within a model-based intrusion detection system (IDS) for IoT environments. Specifically, the study explores the efficacy of an autoencoder's latent space combined with three different classification techniques. Utilizing a validated IoT dataset, particularly focusing on the Constrained Application Protocol (CoAP), the study seeks to develop a robust model capable of identifying security breaches targeting this protocol. The research culminates in a comprehensive evaluation, presenting encouraging results that demonstrate the effectiveness of the proposed methodologies in strengthening IoT cybersecurity with more than a 99% of precision using only 2 learned features.
Authors:Tung T. Ngo, Dai Nguyen Van, Anh-Minh Nguyen, Phuong-Anh Do, Anh Nguyen-Quoc
Abstract:
Qualitative data analysis is labor-intensive, yet the privacy risks associated with commercial Large Language Models (LLMs) often preclude their use in sensitive research. To address this, we introduce ChatQDA, an on-device framework powered by open-source LLMs designed for privacy-preserving open coding. Our mixed-methods user study reveals that while participants rated the system highly for usability and perceived efficiency, they exhibited "conditional trust", valuing the tool for surface-level extraction while questioning its interpretive nuance and consistency. Furthermore, despite the technical security of local deployment, participants reported epistemic uncertainty regarding data protection, suggesting that invisible security measures are insufficient to foster trust. We conclude with design recommendations for local-first analysis tools that prioritize verifiable privacy and methodological rigor.
Authors:Luciano Juvinski, Haochen Li, Alessio Brini
Abstract:
Global illicit fund flows exceed an estimated $3.1 trillion annually, with stablecoins emerging as a preferred laundering medium due to their liquidity. While decentralized protocols increasingly adopt zero-knowledge proofs to obfuscate transaction graphs, centralized stablecoins remain critical "transparent choke points" for compliance. Leveraging this persistent visibility, this study analyzes an Ethereum dataset and uses behavioral features to develop a robust AML framework. Our findings demonstrate that domain-informed tree ensemble models achieve higher Macro-F1 score, significantly outperforming graph neural networks, which struggle with the increasing fragmentation of transaction networks. The model's interpretability goes beyond binary detection, successfully dissecting distinct typologies: it differentiates the complex, high-velocity dispersion of cybercrime syndicates from the constrained, static footprints left by sanctioned entities. This framework aligns with the industry shift toward deterministic verification, satisfying the auditability and compliance expectations under regulations such as the EU's MiCA and the U.S. GENIUS Act while minimizing unjustified asset freezes. By automating high-precision detection, we propose an approach that effectively raises the economic cost of financial misconduct without stifling innovation.
Authors:André Augusto, Christof Ferreira Torres, André Vasconcelos, Miguel Correia
Abstract:
Intent-based cross-chain bridges have emerged as an alternative to traditional interoperability protocols by allowing off-chain entities (\emph{solvers}) to immediately fulfill users' orders by fronting their own liquidity. While improving user experience, this approach introduces new systemic risks, such as solver liquidity concentration and delayed settlement. In this paper, we propose a new class of attacks called \emph{liquidity exhaustion attacks} and a replay-based parameterized attack simulation framework. We analyze 3.5 million cross-chain intents that moved \$9.24B worth of tokens between June and November 2025 across three major protocols (Mayan Swift, Across, and deBridge), spanning nine blockchains. For rational attackers, our results show that protocols with higher solver profitability, such as deBridge, are vulnerable under current parameters: 210 historical attack instances yield a mean net profit of \$286.14, with 80.5\% of attacks profitable. In contrast, Across remains robust in all tested configurations due to low solver margins and very high liquidity, while Mayan Swift is generally secure but becomes vulnerable under stress-test conditions. Under byzantine attacks, we show that it is possible to suppress availability across all protocols, causing dozens of failed intents and solver profit losses of up to \$978 roughly every 16 minutes. Finally, we propose an optimized attack strategy that exploits patterns in the data to reduce attack costs by up to 90.5\% compared to the baseline, lowering the barrier to liquidity exhaustion attacks.
Authors:Wyatt Benno, Alberto Centelles, Antoine Douchet, Khalil Gibran
Abstract:
We present Jolt Atlas, a zero-knowledge machine learning (zkML) framework that extends the Jolt proving system to model inference. Unlike zkVMs (zero-knowledge virtual machines), which emulate CPU instruction execution, Jolt Atlas adapts Jolt's lookup-centric approach and applies it directly to ONNX tensor operations. The ONNX computational model eliminates the need for CPU registers and simplifies memory consistency verification. In addition, ONNX is an open-source, portable format, which makes it easy to share and deploy models across different frameworks, hardware platforms, and runtime environments without requiring framework-specific conversions. Our lookup arguments, which use sumcheck protocol, are well-suited for non-linear functions -- key building blocks in modern ML. We apply optimisations such as neural teleportation to reduce the size of lookup tables while preserving model accuracy, as well as several tensor-level verification optimisations detailed in this paper. We demonstrate that Jolt Atlas can prove model inference in memory-constrained environments -- a prover property commonly referred to as \textit{streaming}. Furthermore, we discuss how Jolt Atlas achieves zero-knowledge through the BlindFold technique, as introduced in Vega. In contrast to existing zkML frameworks, we show practical proving times for classification, embedding, automated reasoning, and small language models. Jolt Atlas enables cryptographic verification that can be run on-device, without specialised hardware. The resulting proofs are succinctly verifiable. This makes Jolt Atlas well-suited for privacy-centric and adversarial environments. In a companion work, we outline various use cases of Jolt Atlas, including how it serves as guardrails in agentic commerce and for trustless AI context (often referred to as \textit{AI memory}).
Authors:René Brinkhege, Prahlad Menon
Abstract:
In current inter-organizational data spaces, usage policies are enforced mainly at the asset level: a whole document or dataset is either shared or withheld. When only parts of a document are sensitive, providers who want to avoid leaking protected information typically must manually redact documents before sharing them, which is costly, coarse-grained, and hard to maintain as policies or partners change. We present DAVE, a usage policy-enforcing LLM spokesperson that answers questions over private documents on behalf of a data provider. Instead of releasing documents, the provider exposes a natural language interface whose responses are constrained by machine-readable usage policies. We formalize policy-violating information disclosure in this setting, drawing on usage control and information flow security, and introduce virtual redaction: suppressing sensitive information at query time without modifying source documents. We describe an architecture for integrating such a spokesperson with Eclipse Dataspace Components and ODRL-style policies, and outline an initial provider-side integration prototype in which QA requests are routed through a spokesperson service instead of triggering raw document transfer. Our contribution is primarily architectural: we do not yet implement or empirically evaluate the full enforcement pipeline. We therefore outline an evaluation methodology to assess security, utility, and performance trade-offs under benign and adversarial querying as a basis for future empirical work on systematically governed LLM access to multi-party data spaces.
Authors:Christian Majenz, Jaya Sharma
Abstract:
The Fischlin transform yields non-interactive zero-knowledge proofs with straight-line extractability in the classical random oracle model. This is done by forcing a prover to generate multiple accepting transcripts through a proof-of-work mechanism. Whether the Fischlin transform is straight-line extractable against quantum adversaries has remained open due to the difficulty of reasoning about the likelihood of query transcripts in the quantum-accessible random oracle model (QROM), even when using the compressed oracle methodology. In this work, we prove that the Fischlin transform remains straight-line extractable in the QROM, via an extractor based on the compressed oracle. This establishes the post-quantum security of the Fischlin transform, providing a post-quantum straight-line extractable NIZK alternative to Pass' transform with smaller proof size. Our techniques include tail bounds for sums of independent random variables and for martingales as well as symmetrization, query amplitude and quantum union bound arguments.
Authors:Johannes Bertram, Jonas Geiping
Abstract:
We introduce NESSiE, the NEceSsary SafEty benchmark for large language models (LLMs). With minimal test cases of information and access security, NESSiE reveals safety-relevant failures that should not exist, given the low complexity of the tasks. NESSiE is intended as a lightweight, easy-to-use sanity check for language model safety and, as such, is not sufficient for guaranteeing safety in general -- but we argue that passing this test is necessary for any deployment. However, even state-of-the-art LLMs do not reach 100% on NESSiE and thus fail our necessary condition of language model safety, even in the absence of adversarial attacks. Our Safe & Helpful (SH) metric allows for direct comparison of the two requirements, showing models are biased toward being helpful rather than safe. We further find that disabled reasoning for some models, but especially a benign distraction context degrade model performance. Overall, our results underscore the critical risks of deploying such models as autonomous agents in the wild. We make the dataset, package and plotting code publicly available.
Authors:Janis Nötzel, Anshul Singhal, Peter van Loock
Abstract:
With the rise of artificial intelligence and machine learning, a new wave of private information is being flushed into applications. This development raises privacy concerns, as private datasets can be stolen or abused for non-authorized purposes. Secure function computation aims to solve such problems by allowing a service provider to compute functions of datasets in the possession of a a data provider without reading the data itself. A foundational primitive for such tasks is Bit Commitment (BC), which is known to be impossible to realize without added assumptions. Given the pressing nature of the topic, it is thus important to develop BC systems and prove their security under reasonable assumptions. In this work, we provide a novel quantum optical BC protocol that uses the added assumption that the network provider will secure transmission lines against eavesdropping. Under this added assumption, we prove security of our protocol in the honest but curious setting and discuss the hardness of Mayer's attack in the context of our protocol.
Authors:Mohsen Ahmadvand, Rok Pajnič, Ching-Lun Chiu
Abstract:
Zero-knowledge proof generation imposes stringent timing and reliability constraints on blockchain systems. For ZK-rollups, delayed proofs cause finality lag and economic loss; for Ethereum's emerging L1 zkEVM, proofs must complete within the 12-second slot window to enable stateless validation. The Ethereum Foundation's Ethproofs initiative coordinates multiple independent zkVMs across proving clusters to achieve real-time block proving, yet no principled orchestration framework addresses the joint challenges of (i) strict head-of-chain ordering, (ii) sub-slot latency bounds, (iii) fault-tolerant task reassignment, and (iv) prover-agnostic workflow composition. We present push0, a cloud-native proof orchestration system that decouples prover binaries from scheduling infrastructure. push0 employs an event-driven dispatcher--collector architecture over persistent priority queues, enforcing block-sequential proving while exploiting intra-block parallelism. We formalize requirements drawn from production ZK-rollup operations and the Ethereum real-time proving specification, then demonstrate via production Kubernetes cluster experiments that push0 achieves 5 ms median orchestration overhead with 99--100% scaling efficiency at 32 dispatchers for realistic workloads--overhead negligible (less than 0.1%) relative to typical proof computation times of 7+ seconds. Controlled Docker experiments validate these results, showing comparable performance (3--10 ms P50) when network variance is eliminated. Production deployment on the Zircuit zkrollup (14+ million mainnet blocks since March 2025) provides ecological validity for these controlled experiments. Our design enables seamless integration of heterogeneous zkVMs, supports automatic task recovery via message persistence, and provides the scheduling primitives necessary for both centralized rollup operators and decentralized multi-prover networks.
Authors:Marvin Beckmann, Christian Majenz
Abstract:
Ring signatures are a powerful primitive that allows a member to sign on behalf of a group, without revealing their identity. Recently, ring signatures have received additional attention as an ingredient for post-quantum deniable authenticated key exchange, e.g., for a post-quantum version of the Signal protocol, employed by virtually all end-to-end-encrypted messenger services. While several ring signature constructions from post-quantum assumptions offer suitable security and efficiency for use in deniable key exchange, they are currently proven secure in the random oracle model (ROM) only, which is insufficient for post-quantum security. In this work, we provide four security reductions in the quantum-accessible random oracle model (QROM) for two generic ring signature constructions: two for the AOS framework and two for a construction paradigm based on ring trapdoors, whose generic backbone we formalize. The two security proofs for AOS ring signatures differ in their requirements on the underlying sigma protocol and their tightness. The two reductions for the ring-trapdoor-based ring signatures exhibit various differences in requirements and the security they provide. We employ the measure-and-reprogram technique, QROM straightline extraction tools based on the compressed oracle, history-free reductions and QROM reprogramming tools. To make use of Rényi divergence properties in the QROM, we study the behavior of quantum algorithms that interact with an oracle whose distribution is based on one of two different distributions over the set of outputs. We provide tight bounds for the statistical distance, show that the Rényi divergence can not be used to replace the entire oracle and provide a workaround.
Authors:Srikumar Nayak, James Walmesley
Abstract:
Cross-border insider threats pose a critical challenge to government financial schemes, particularly when dealing with distributed, privacy-sensitive data across multiple jurisdictions. Existing approaches face fundamental limitations: they cannot effectively share intelligence across borders due to privacy constraints, lack reasoning capabilities to understand complex multi-step attack patterns, and fail to capture intricate graph-structured relationships in financial networks. We introduce FedGraph-AGI, a novel federated learning framework integrating Artificial General Intelligence (AGI) reasoning with graph neural networks for privacy-preserving cross-border insider threat detection. Our approach combines: (1) federated graph neural networks preserving data sovereignty; (2) Mixture-of-Experts (MoE) aggregation for heterogeneous jurisdictions; and (3) AGI-powered reasoning via Large Action Models (LAM) performing causal inference over graph data. Through experiments on a 50,000-transaction dataset across 10 jurisdictions, FedGraph-AGI achieves 92.3% accuracy, significantly outperforming federated baselines (86.1%) and centralized approaches (84.7%). Our ablation studies reveal AGI reasoning contributes 6.8% improvement, while MoE adds 4.4%. The system maintains epsilon = 1.0 differential privacy while achieving near-optimal performance and scales efficiently to 50+ clients. This represents the first integration of AGI reasoning with federated graph learning for insider threat detection, opening new directions for privacy-preserving cross-border intelligence sharing.
Authors:Katherine Molinet, Aris Filos-Ratsikas
Abstract:
In this paper, we explore the short- and long-term stability of backed stablecoins offering constant mint and redeem prices to all agents. We refer to such designs as price window-based, since the mint and redeem prices constrain the stablecoin's market equilibrium. We show that, without secondary stabilization mechanisms, price window designs cannot achieve both short- and long-term stability unless they are backed by already-stable reserves. In particular, the mechanism faces a tradeoff: either risk eventual reserve depletion through persistent arbitrage by a speculator, or widen the distance between mint and redeem prices enough to disincentivize arbitrage. In the latter case, however, the market price of the stablecoin inherits the volatility of its backing asset, with fluctuations that can be proportional to the backing asset's own volatility.
Authors:Brennan Bell, Andreas Trügler, Konstantin Beyer, Paul Erker
Abstract:
We study a sequential coherent side-channel model in which an adversarial probe qubit interacts with a target qubit during a hidden gate sequence. Repeating the same hidden sequence for $N$ shots yields an empirical full-correlation record: the joint histogram $\widehat{P}_g(b)$ over probe bit-strings $b\in\{0,1\}^k$, which is a sufficient statistic for classical post-processing under identically and independently distributed (i.i.d.) shots but grows exponentially with circuit depth. We first describe this sequential probe framework in a coupling- and measurement-agnostic form, emphasizing the scaling of the observation space and why exact analytic distinguishability becomes intractable with circuit depth. We then specialize to a representative instantiation (a controlled-rotation probe coupling with fixed projective readout and a commuting $R_x$ gate alphabet) where we (i) derive a depth-dependent leakage envelope whose maximizer predicts a "Goldilocks" coupling band as a function of depth, and (ii) provide an operational decoder, via machine learning, a single parameter-conditioned map from $\widehat{P}_g$ to Alice's per-step gate labels, generalizing across coupling and noise settings without retraining. Experiments over broad coupling and noise grids show that strict sequence recovery concentrates near the predicted coupling band and degrades predictably under decoherence and finite-shot estimation.
Authors:Saleh Darzia, Gökcan Cantalib, Attila Altay Yavuza, Gürkan Gür
Abstract:
Database-driven cognitive radio networks (DB-CRNs) enable dynamic spectrum sharing through geolocation databases but introduce critical security and privacy challenges, including mandatory location disclosure, susceptibility to location spoofing, and denial-of-service (DoS) attacks on centralized services. Existing approaches address these issues in isolation and lack a unified, regulation-compliant solution under realistic adversarial conditions. In this work, we present a unified security framework for DB-CRNs that simultaneously provides location privacy, user anonymity, verifiable location, and DoS resilience. Our framework, denoted as SLAPX, enables privacy-preserving spectrum queries using delegatable anonymous credentials, supports adaptive location verification without revealing precise user location, and mitigates DoS attacks through verifiable delay functions (VDFs) combined with RLRS-based rate limiting. Extensive cryptographic benchmarking and network simulations demonstrate that SLAPX achieves significantly lower latency and communication overhead than existing solutions while effectively resisting location spoofing and DoS attacks. These results show that SLAPX is practical and well-suited for secure next-generation DB-CRN deployments.
Authors:Yasmine Hayder, Adrien Boiret, Cédric Eichler, Benjamin Nguyen
Abstract:
In this paper, we investigate how attackers can discover sensitive information embedded within databases by exploiting inference rules. We demonstrate the inadequacy of naively applied existing state of the art differential privacy (DP) models in safeguarding against such attacks. We introduce ontology aware differential privacy (Onto-DP), a novel extension of differential privacy paradigms built on top of any classical DP model by enriching it with semantic awareness. We show that this extension is a sufficient condition to adequately protect against attackers aware of inference rules.
Authors:Richelle Williams, Fernando Koch
Abstract:
An open measurement problem in IoT security is whether scan-observable network configurations encode population-level exposure risk beyond individual devices. An analysis of internet-exposed IoT endpoints using a controlled multi-country sample from Shodan Search and Shodan InternetDB, selecting 100 hosts identified via TCP port 7547 (TR-069/CWMP) and evenly distributed across the ten most represented countries. Hosts are enriched with scan-derived metadata and analyzed using feature-relevance assessment, cross-country comparisons of open and risky port exposure, and supervised classification of higher-risk exposure profiles. The analysis reveals consistent cross-country differences in exposure structure, with mean risky-port counts ranging from 0.4 to 1.0 per host, and achieves balanced accuracy of approximately 0.61 when classifying higher-risk exposure profiles.
Authors:Mohammad Hadi Foroughi, Seyed Hamed Rastegar, Mohammad Sabokrou, Ahmad Khonsari
Abstract:
Federated learning (FL) enables distributed model training across edge devices while preserving data locality. This decentralized approach has emerged as a promising solution for collaborative learning on sensitive user data, effectively addressing the longstanding privacy concerns inherent in centralized systems. However, the decentralized nature of FL exposes new security vulnerabilities, especially backdoor attacks that threaten model integrity. To investigate this critical concern, this paper presents the Layer Smoothing Attack (LSA), a novel backdoor attack that exploits layer-specific vulnerabilities in neural networks. First, a Layer Substitution Analysis methodology systematically identifies backdoor-critical (BC) layers that contribute most significantly to backdoor success. Subsequently, LSA strategically manipulates these BC layers to inject persistent backdoors while remaining undetected by state-of-the-art defense mechanisms. Extensive experiments across diverse model architectures and datasets demonstrate that LSA achieves a remarkably backdoor success rate of up to 97% while maintaining high model accuracy on the primary task, consistently bypassing modern FL defenses. These findings uncover fundamental vulnerabilities in current FL security frameworks, demonstrating that future defenses must incorporate layer-aware detection and mitigation strategies.
Authors:Mohsin Khan, Elisavet Kozyri, Håvard Dagenborg
Abstract:
The emergence of small computing devices and the integration of processing units into everyday objects has made lightweight cryptography an essential part of the security landscape. Conventional cryptographic algorithms such as AES, RSA, and DES are unsuitable for resource-constrained devices due to limited processing power, memory, and battery. This paper provides a systematic review of lightweight cryptographic algorithms and the appropriateness of different algorithms in different areas such as IoT, RFID, and wireless sensor networks. Using tabular analysis and graphical interpretation, we compare these algorithms in terms of performance, security, energy consumption, and implementation costs. An overview of the evolution of lightweight cryptography based on those design trade-offs is also provided.
Authors:Kashyap Thimmaraju, Duc Anh Hoang, Souradip Nath, Jaron Mink, Gail-Joon Ahn
Abstract:
The sustainability of Security Operations Centers depends on their people, yet 71% of practitioners report burnout and 24% plan to exit cybersecurity entirely. Flow theory suggests that when job demands misalign with practitioner capabilities, work becomes overwhelming or tedious rather than engaging. Achieving challenge-skill balance begins at hiring: if job descriptions inaccurately portray requirements, organizations risk recruiting underskilled practitioners who face anxiety or overskilled ones who experience boredom. Yet we lack empirical understanding of what current SOC job descriptions actually specify. We analyzed 106 public SOC job postings from November to December 2024 across 35 organizations in 11 countries, covering Analysts (n=17), Incident Responders (n=38), Threat Hunters (n=39), and SOC Managers (n=12). Using Inductive Content Analysis, we coded certifications, technical skills, soft skills, tasks, and experience requirements. Three patterns emerged: (1) Communication skills dominate (50.9% of postings), exceeding SIEM tools (18.9%) or programming (30.2%), suggesting organizations prioritize collaboration over technical capabilities. (2) Certification expectations vary widely: CISSP leads (22.6%), but 43 distinct credentials appear with no universal standard. (3) Technical requirements show consensus: Python dominates programming (27.4%), Splunk leads SIEM platforms (14.2%), and ISO 27001 (13.2%) and NIST (10.4%) are most cited standards. These findings enable organizations to audit job descriptions against empirical baselines, help practitioners identify valued certifications and skills, and allow researchers to validate whether stated requirements align with actual demands. This establishes the foundation for flow-aligned interview protocols and investigation of how AI reshapes requirements. Dataset and codebook: https://git.tu-berlin.de/wosoc-2026/soc-jd-analysis.
Authors:Marthin Toruan, R. D. N. Shakya, Samuel Tseitkin, Raymond K. Zhao, Nalin Arachchilage
Abstract:
Advances in quantum computing increasingly threaten the security and privacy of data protected by current cryptosystems, particularly those relying on public-key cryptography. In response, the international cybersecurity community has prioritized the implementation of Post-Quantum Cryptography (PQC), a new cryptographic standard designed to resist quantum attacks while operating on classical computers. The National Institute of Standards and Technology (NIST) has already standardized several PQC algorithms and plans to deprecate classical asymmetric schemes, such as RSA and ECDSA, by 2035. Despite this urgency, PQC adoption remains slow, often due to limited developer expertise. Application Programming Interfaces (APIs) are intended to bridge this gap, yet prior research on classical security APIs demonstrates that poor usability of cryptographic APIs can lead developers to introduce vulnerabilities during implementation of the applications, a risk amplified by the novelty and complexity of PQC. To date, the usability of PQC APIs has not been systematically studied. This research presents an empirical evaluation of the usability of the PQC APIs, observing how developers interact with APIs and documentation during software development tasks. The study identifies cognitive factors that influence the developer's performance when working with PQC primitives with minimal onboarding. The findings highlight opportunities across the PQC ecosystem to improve developer-facing guidance, terminology alignment, and workflow examples to better support non-specialists.
Authors:Beatrice Perez, Abhinav Mehrotra, Mirco Musolesi
Abstract:
Location information extracted from mobile devices has been largely exploited to reveal our routines, significant places, and interests just to name a few. Given the sensitivity of the information it reveals, location access is protected by mobile operating systems and users have control over which applications can access it. We argue that applications can still infer the coarse-grain location information by using alternative sensors that are available in off-the-shelf mobile devices that do not require any permissions from the users. In this paper we present a zero-permission attack based on the use of the in-built magnetometer, considering a variety of methods for identifying location-types from their magnetic signature. We implement the proposed approach by using four different techniques for time-series classification. In order to evaluate the approach, we conduct an in-the-wild study to collect a dataset of nearly 70 hours of magnetometer readings with six different phones at 66 locations, each accompanied by a label that classifies it as belonging to one of six selected categories. Finally, using this dataset, we quantify the performance of all models based on two evaluation criteria: (i) leave-a-place-out (using the test data collected from an unknown place), and (ii) leave-a-device-out (using the test data collected from an unknown device) showing that we are able to achieve 40.5% and 39.5% accuracy in classifying the location-type for each evaluation criteria respectively against a random baseline of approximately 16.7% for both of them.
Authors:Saurav Silwal, Lu Gao, Ph. D. Yunpeng Zhang, Ph. D. Ahmed Senouci, Ph. D. Yi-Lung Mo, Ph. D., P. E
Abstract:
Given the promising future of autonomous vehicles, it is foreseeable that self-driving cars will soon emerge as the predominant mode of transportation. While autonomous vehicles offer enhanced efficiency, they remain vulnerable to external attacks. In this research, we sought to investigate the potential impact of cyberattacks on traffic patterns. To achieve this, we conducted simulations where cyberattacks were simulated on connected vehicles by disseminating false information to either a single vehicle or vehicle platoons. The primary objective of this research is to assess the cybersecurity challenges confronting connected and automated vehicles and propose practical solutions to minimize the adverse effects of malicious external information. In the simulation, we have implemented an innovative car-following model for the simulation of connected self-driving vehicles. This model continually monitors data received from preceding vehicles and optimizes various actions, such as acceleration, and deceleration, with the aim of maximizing overall traffic efficiency and safety.
Authors:Josiah Dykstra, William Yurcik
Abstract:
The U.S. public health system increased life expectancy by more than 30 years since 1900 through systematic data collection, evidence-based intervention, and coordinated response. This paper examines whether cybersecurity can benefit from similar organizational principles. We find that both domains exhibit public good characteristics: security improvements create positive externalities that individual actors cannot fully capture, leading to systematic market failure and underinvestment. Current cybersecurity lacks fundamental infrastructure including standardized population definitions, reliable outcome measurements, understanding of transmission mechanisms, and coordinated intervention testing. Drawing on public health's transformation from fragmented local responses to coordinated evidence-based discipline, we propose a national Cyber Public Health System for systematic data collection, standardized measurement, and coordinated response. We argue government coordination is economically necessary rather than merely beneficial, and outline specific federal roles in establishing standards, funding research, coordinating response, and addressing information asymmetries that markets cannot resolve.
Authors:Gianpietro Castiglione, Shahriar Ebrahimi, Narges Khakpour
Abstract:
A Software Bill of Materials (SBOM) is a key component for the transparency of software supply chain; it is a structured inventory of the components, dependencies, and associated metadata of a software artifact. However, an SBOM often contain sensitive information that organizations are unwilling to disclose in full to anyone, for two main concerns: technological risks deriving from exposing proprietary dependencies or unpatched vulnerabilities, and business risks, deriving from exposing architectural strategies. Therefore, delivering a plaintext SBOM may result in the disruption of the intellectual property of a company. To address this, we present VeriSBOM, a trustless, selectively disclosed SBOM framework that provides cryptographic verifiability of SBOMs using zero-knowledge proofs. Within VeriSBOM, third parties can validate specific statements about a delivered software. Respectively, VeriSBOM allows independent third parties to verify if a software contains authentic dependencies distributed by official package managers and that the same dependencies satisfy rigorous policy constraints such as the absence of vulnerable dependencies or the adherence with specific licenses models. VeriSBOM leverages a scalable vector commitment scheme together with folding-based proof aggregation to produce succinct zero-knowledge proofs that attest to security and compliance properties while preserving confidentiality. Crucially, the verification process requires no trust in the SBOM publisher beyond the soundness of the underlying primitives, and third parties can independently check proofs against the public cryptographic commitments. We implement VeriSBOM, analyze its security, and evaluate its performance on real-world package registries. The results show that our method enables scalable, privacy-preserving, and verifiable SBOM sharing and validation.
Authors:Weiming Song, Xuan Xie, Ruiping Yin
Abstract:
Large language models (LLMs) remain vulnerable to jailbreak prompts that elicit harmful or policy-violating outputs, while many existing defenses rely on expensive fine-tuning, intrusive prompt rewriting, or external guardrails that add latency and can degrade helpfulness. We present AISA, a lightweight, single-pass defense that activates safety behaviors already latent inside the model rather than treating safety as an add-on. AISA first localizes intrinsic safety awareness via spatiotemporal analysis and shows that intent-discriminative signals are broadly encoded, with especially strong separability appearing in the scaled dot-product outputs of specific attention heads near the final structural tokens before generation. Using a compact set of automatically selected heads, AISA extracts an interpretable prompt-risk score with minimal overhead, achieving detector-level performance competitive with strong proprietary baselines on small (7B) models. AISA then performs logits-level steering: it modulates the decoding distribution in proportion to the inferred risk, ranging from normal generation for benign prompts to calibrated refusal for high-risk requests -- without changing model parameters, adding auxiliary modules, or requiring multi-pass inference. Extensive experiments spanning 13 datasets, 12 LLMs, and 14 baselines demonstrate that AISA improves robustness and transfer while preserving utility and reducing false refusals, enabling safer deployment even for weakly aligned or intentionally risky model variants.
Authors:Mohamed Shaaban, Mohamed Elmahallawy
Abstract:
Federated learning (FL) enables collaborative training across organizational silos without sharing raw data, making it attractive for privacy-sensitive applications. With the rapid adoption of large language models (LLMs), federated fine-tuning of generative LLMs has gained attention as a way to leverage distributed data while preserving confidentiality. However, this setting introduces fundamental challenges: (i) privacy leakage of personally identifiable information (PII) due to LLM memorization, and (ii) a persistent tension between global generalization and local utility under heterogeneous data. Existing defenses, such as data sanitization and differential privacy, reduce leakage but often degrade downstream performance. We propose SecureGate, a privacy-aware federated fine-tuning framework for LLMs that provides fine-grained privacy control without sacrificing utility. SecureGate employs a dual-adapter LoRA architecture: a secure adapter that learns sanitized, globally shareable representations, and a revealing adapter that captures sensitive, organization-specific knowledge. A token-controlled gating module selectively activates these adapters at inference time, enabling controlled information disclosure without retraining. Extensive experiments across multiple LLMs and real-world datasets show that SecureGate improves task utility while substantially reducing PII leakage, achieving up to a 31.66X reduction in inference attack accuracy and a 17.07X reduction in extraction recall for unauthorized requests. Additionally, it maintains 100% routing reliability to the correct adapter and incurs only minimal computational and communication overhead.
Authors:Tailia Malloy, Tegawende F. Bissyande
Abstract:
Large Language Models are expanding beyond being a tool humans use and into independent agents that can observe an environment, reason about solutions to problems, make changes that impact those environments, and understand how their actions impacted their environment. One of the most common applications of these LLM Agents is in computer programming, where agents can successfully work alongside humans to generate code while controlling programming environments or networking systems. However, with the increasing ability and complexity of these agents comes dangers about the potential for their misuse. A concerning application of LLM agents is in the domain cybersecurity, where they have the potential to greatly expand the threat imposed by attacks such as social engineering. This is due to the fact that LLM Agents can work autonomously and perform many tasks that would normally require time and effort from skilled human programmers. While this threat is concerning, little attention has been given to assessments of the capabilities of LLM coding agents in generating code for social engineering attacks. In this work we compare different LLMs in their ability and willingness to produce potentially dangerous code bases that could be misused by cyberattackers. The result is a dataset of 200 website code bases and logs from 40 different LLM coding agents. Analysis of models shows which metrics of LLMs are more and less correlated with performance in generating spear-phishing sites. Our analysis and the dataset we present will be of interest to researchers and practitioners concerned in defending against the potential misuse of LLMs in spear-phishing.
Authors:Shlomi Dolev, Ehud Gudes, Daniel Shlomo
Abstract:
The rapid growth of decentralized systems in theWeb3 ecosystem has introduced numerous challenges, particularly in ensuring data security, privacy, and scalability [3, 8]. These systems rely heavily on distributed architectures, requiring robust mechanisms to manage data and interactions among participants securely. One critical aspect of decentralized systems is key management, which is essential for encrypting files, securing database segments, and enabling private transactions. However, securely managing cryptographic keys in a distributed environment poses significant risks, especially when nodes in the network can be compromised [9]. This research proposes a decentralized database scheme specifically designed for secure and private key management. Our approach ensures that cryptographic keys are not stored explicitly at any location, preventing their discovery even if an attacker gains control of multiple nodes. Instead of traditional storage, keys are encoded and distributed using the BFLUT (Bloom Filter for Private Look-Up Tables) algorithm [7], which enables secure retrieval without direct exposure. The system leverages OrbitDB [4], IPFS [1], and IPNS [10] for decentralized data management, providing robust support for consistency, scalability, and simultaneous updates. By combining these technologies, our scheme enhances both security and privacy while maintaining high performance and reliability. Our findings demonstrate the system's capability to securely manage keys, prevent unauthorized access, and ensure privacy, making it a foundational solution for Web3 applications requiring decentralized security.
Authors:Parsa Sadri Sinaki, Zainab Ahmad, Wentao Xie, Merlijn Sebrechts, Jimmy Kjällman, Lachlan J. Gunn
Abstract:
Hardware-secured remote attestation is essential to establishing trust in the integrity of confidential virtual machines (cVMs), but is difficult to use in practice because verifying attestation evidence requires the use of hardware-specific cryptographic logic. This increases both maintenance costs and the verifiers' trusted computing base. We introduce the concept of self-verifying remote attestation evidence. Each attestation bundle includes verification logic as a WebAssembly component signed by a trusted party. This approach transforms evidence verification into a standard code-signing problem: the verifier checks the signature on the embedded logic and then executes it to validate the evidence. As a result, verifiers can validate attestation evidence without any platform-specific knowledge. We implement this concept as TrustMee, a platform-agnostic verification driver for the Trustee framework. We demonstrate its functionality with self-verifying evidence for AMD SEV-SNP and Intel TDX attestations, producing attestation claims in the standard EAT Attestation Result (EAR) format.
Authors:Alfous Tim, Kuniyilh Simi D
Abstract:
The Internet of Things (IoT) systems increasingly depend on continual learning to adapt to non-stationary environments. These environments can include factors such as sensor drift, changing user behavior, device aging, and adversarial dynamics. Contrastive continual learning (CCL) combines contrastive representation learning with incremental adaptation, enabling robust feature reuse across tasks and domains. However, the geometric nature of contrastive objectives, when paired with replay-based rehearsal and stability-preserving regularization, introduces new security vulnerabilities. Notably, backdoor attacks can exploit embedding alignment and replay reinforcement, enabling the implantation of persistent malicious behaviors that endure through updates and deployment cycles. This paper provides a comprehensive analysis of backdoor attacks on CCL within IoT systems. We formalize the objectives of embedding-level attacks, examine persistence mechanisms unique to IoT deployments, and develop a layered taxonomy tailored to IoT. Additionally, we compare vulnerabilities across various learning paradigms and evaluate defense strategies under IoT constraints, including limited memory, edge computing, and federated aggregation. Our findings indicate that while CCL is effective for enhancing adaptive IoT intelligence, it may also elevate long-lived representation-level threats if not adequately secured.
Authors:Sebastian Mödersheim, Simon Lund, Alessandro Bruni, Marco Carbone, Rosario Giustolisi
Abstract:
We present CryptoChoreo, a choreography language for the specification of cryptographic protocols. Choreographies can be regarded as an extension of Alice-and-Bob notation, providing an intuitive high-level view of the protocol as a whole (rather than specifying each protocol role in isolation). The extensions over standard Alice-and-Bob notation that we consider are non-deterministic choice, conditional branching, and mutable long-term memory. We define the semantics of CryptoChoreo by translation to a process calculus. This semantics entails an understanding of the protocol: it determines how agents parse and check incoming messages and how they construct outgoing messages, in the presence of an arbitrary algebraic theory and non-deterministic choices made by other agents. While this semantics entails algebraic problems that are in general undecidable, we give an implementation for a representative theory. We connect this translation to ProVerif and show on a number of case studies that the approach is practically feasible.
Authors:Mahfuzul I. Nissan, James Wagner
Abstract:
The widespread adoption of NoSQL databases has made digital forensics increasingly difficult as storage formats are diverse and often opaque, and audit logs cannot be assumed trustworthy when privileged insiders, such as DevOps or administrators, can disable, suppress, or manipulate logging to conceal activity. We present RADAR (Record & Artifact Detection, Alignment & Reporting), a log-adversary-aware framework that derives forensic ground truth by cross-referencing low-level storage artifacts against high-level application logs. RADAR analyzes artifacts reconstructed by the Automated NoSQL Carver (ANOC), which infers layouts and carves records directly from raw disk bytes, bypassing database APIs and the management system entirely, thereby treating physical storage as the independent evidence source. RADAR then reconciles carved artifacts with the audit log to identify delta artifacts such as unlogged insertions, silent deletions, and field-level updates that exist on disk but are absent from the logical history. We evaluate RADAR across ten NoSQL engines, including BerkeleyDB, LMDB, MDBX, etcd, ZODB, Durus, LiteDB, Realm, RavenDB, and NitriteDB, spanning key-value and document stores and multiple storage designs, e.g., copy-on-write/MVCC, B/B+ tree, and append-only. Under log-evasion scenarios, such as log suppression and post-maintenance attacks, including cases where historical bytes are pruned, RADAR consistently exposes unattributed operations while sustaining 31.7-397 MB/min processing throughput, demonstrating the feasibility of log-independent, trustworthy NoSQL forensics.
Authors:Paul Keeler, Ben Smyth
Abstract:
Democracies are built upon secure and reliable voting systems. Electronic voting systems seek to replace ballot papers and boxes with computer hardware and software. Proposed electronic election schemes have been subjected to scrutiny, with researchers spotting inherent faults and weaknesses. Inspired by physical voting systems, we argue that any electronic voting system needs two essential properties: ballot secrecy and verifiability. These properties seemingly work against each other. An election scheme that is a complete black box offers ballot secrecy, but verification of the outcome is impossible. This challenge can be tackled using standard tools from modern cryptography, reaching a balance that delivers both properties. This tutorial makes these ideas accessible to readers outside electronic voting. We introduce fundamental concepts such as asymmetric and homomorphic encryption, which we use to describe a general electronic election scheme while keeping mathematical formalism minimal. We outline game-based cryptography, a standard approach in modern cryptography, and introduce notation for formulating elections as games. We then give precise definitions of ballot secrecy and verifiability in the framework of game-based cryptography. A principal aim is introducing modern research approaches to electronic voting.
Authors:Oghenekaro Elem, Nimrod Talmon
Abstract:
Decentralized protocols claim immutable, rule-based execution, yet many embed emergency mechanisms such as chain-level freezes, protocol pauses, and account quarantines. These overrides are crucial for responding to exploits and systemic failures, but they expose a core tension: when does intervention preserve trust and when is it perceived as illegitimate discretion? With approximately $10$ billion in technical exploit losses potentially addressable by onchain intervention (2016--2026), the design of these mechanisms has high practical stakes, but current approaches remain ad hoc and ideologically charged. We address this gap by developing a Scope $\times$ Authority taxonomy that maps the design space of emergency architectures along two dimensions: the precision of the intervention and the concentration of trigger authority. We formalize the resulting tradeoffs of a standing centralization cost versus containment speed and collateral disruption as a stochastic cost-minimization problem; and derive three testable predictions. Assessing these predictions against 705 documented exploit incidents, we find that containment time varies systematically by authority type; that losses follow a heavy-tailed distribution ($α\approx 1.33$) concentrating risk in rare catastrophic events; and that community sentiment measurably modulates the effective cost of maintaining intervention capability. The analysis yields concrete design principles that move emergency governance from ideological debate towards quantitative engineering.
Authors:Dalyapraz Manatova, Pablo Moriano, L. Jean Camp
Abstract:
Graph neural networks (GNNs) are designed to use attributed graphs to learn representations. Such representations are beneficial in the unsupervised learning of clusters and community detection. Nonetheless, such inference may reveal sensitive groups, clustered systems, or collective behaviors, raising concerns regarding group-level privacy. Community attribution in social and critical infrastructure networks, for example, can expose coordinated asset groups, operational hierarchies, and system dependencies that could be used for profiling or intelligence gathering. We study a defensive setting in which a data publisher (defender) seeks to conceal a community of interest while making limited, utility-aware changes in the network. Our analysis indicates that community concealment is strongly influenced by two quantifiable factors: connectivity at the community boundary and feature similarity between the protected community and adjacent communities. Informed by these findings, we present a perturbation strategy that rewires a set of selected edges and modifies node features to reduce the distinctiveness leveraged by GNN message passing. The proposed method outperforms DICE in our experiments on synthetic benchmarks and real network graphs under identical perturbation budgets. Overall, it achieves median relative concealment improvements of approximately 20-45% across the evaluated settings. These findings demonstrate a mitigation strategy against GNN-based community learning and highlight group-level privacy risks intrinsic to graph learning.
Authors:Alessandro Epasto, Xin Lyu, Pasin Manurangsi
Abstract:
We study the computational cost of differential privacy in terms of memory efficiency. While the trade-off between accuracy and differential privacy is well-understood, the inherent cost of privacy regarding memory use remains largely unexplored. This paper establishes for the first time an unconditional space lower bound for user-level differential privacy by introducing a novel proof technique based on a multi-player communication game. Central to our approach, this game formally links the hardness of low-memory private algorithms to the necessity of ``contribution capping'' -- tracking and limiting the users who disproportionately impact the dataset. We demonstrate that winning this communication game requires transmitting information proportional to the number of over-active users, which translates directly to memory lower bounds. We apply this framework, as an example, to the fundamental problem of estimating the number of distinct elements in a stream and we prove that any private algorithm requires almost $\widetildeΩ(T^{1/3})$ space to achieve certain error rates in a promise variant of the problem. This resolves an open problem in the literature (by Jain et al. NeurIPS 2023 and Cummings et al. ICML 2025) and establishes the first exponential separation between the space complexity of private algorithms and their non-private $\widetilde{O}(1)$ counterparts for a natural statistical estimation task. Furthermore, we show that this communication-theoretic technique generalizes to broad classes of problems, yielding lower bounds for private medians, quantiles, and max-select.
Authors:Andrei Kojukhov, Arkady Bovshover
Abstract:
Contemporary AI-driven cybersecurity systems are predominantly architected as model-centric detection and automation pipelines optimized for task-level performance metrics such as accuracy and response latency. While effective for bounded classification tasks, these architectures struggle to support accountable decision-making under adversarial uncertainty, where actions must be justified, governed, and aligned with organizational and regulatory constraints. This paper argues that cybersecurity orchestration should be reconceptualized as an agentic, multi-agent cognitive system, rather than a linear sequence of detection and response components. We introduce a conceptual architectural framework in which heterogeneous AI agents responsible for detection, hypothesis formation, contextual interpretation, explanation, and governance are coordinated through an explicit meta-cognitive judgement function. This function governs decision readiness and dynamically calibrates system autonomy when evidence is incomplete, conflicting, or operationally risky. By synthesizing distributed cognition theory, multi-agent systems research, and responsible AI governance frameworks, we demonstrate that modern security operations already function as distributed cognitive systems, albeit without an explicit organizing principle. Our contribution is to make this cognitive structure architecturally explicit and governable by embedding meta-cognitive judgement as a first-class system function. We discuss implications for security operations centers, accountable autonomy, and the design of next-generation AI-enabled cyber defence architectures. The proposed framework shifts the focus of AI in cybersecurity from optimizing isolated predictions to governing autonomy under uncertainty.
Authors:Nilesh Vyas, Fabien Geyer, Svetoslav Duhovnikov
Abstract:
Shared, dynamic network infrastructures, such as dual-use LEO satellite constellations, pose critical threats to metadata privacy, particularly for state actors operating in mixed-trust environments. This work proposes an enhanced anonymity architecture, evolving the Loopix mix-network, to provide robust security and reliability in these volatile topologies. We introduce three primary contributions: (1) A multi-path transport protocol utilizing $(n, k)$ erasure codes, which is demonstrated to counteract the high link volatility and intermittent connectivity that renders standard mix-networks unreliable. (2) The integration of a computationally efficient Private Information Retrieval (PIR) protocol during route discovery. (3) The introduction of adaptive, centrality-based delay strategies that efficiently mitigate the inherent topological bias of LEO networks, providing a superior anonymity-to-latency trade-off. This mechanism provably prevents metadata leakage at the user-provider directory, mitigating profiling and correlation attacks. We validate this architecture via high-fidelity, packet-level simulations of a LEO constellation. Empirical results show our multi-path transport achieves near-zero message loss, establishing a quantifiable trade-off between reliability and bandwidth overhead. Furthermore, microbenchmarks of the PIR protocol quantify its computational and latency overheads, confirming its feasibility for practical deployment. This work provides a validated blueprint for deployable high-anonymity communication systems, demonstrating the viability of securely multiplexing sensitive operations within large-scale commercial network infrastructures.
Authors:Kirk Swidowski, Daniel Moghimi, Josh Eads, Erdem Aktas, Jia Ma
Abstract:
In the second and third quarters of 2025, Google collaborated with Intel to conduct a security assessment of Intel Trust Domain Extensions (TDX), extending Google's previous review and covering major changes since Intel TDX Module 1.0 - namely support for Live Migration and Trusted Domain (TD) Partitioning (nested VMs within TDs). Intel provided guidance and support, including documentation and updated TDX 1.5 source code. Unlike the previous review, this time, we had access to a compute node capable of running TDX to develop a toolkit for live testing and Proof-of-Concept (PoC) generation. Furthermore, we integrated Gemini for analysis and NotebookLM to efficiently navigate complex specifications. This assessment resulted in the discovery of one vulnerability that enables a VMM to fully compromise a TD, and four vulnerabilities that enable a malicious VMM or TD to leak confidential memory of the Intel TDX Module. Several other security weaknesses and/or bugs were identified but not categorized as vulnerabilities despite having some impact on security. Beyond presenting the technical details of multiple bugs and vulnerabilities in this report, these findings underscore that confidential computing, like other security measures, requires iterative refinement and complementary security controls to harden it, in line with a defense-in-depth approach.
Authors:Ahmad Fareed, Bilal Al Habib, Anne Pepita Francis
Abstract:
Low rate Distributed Denial of Service DDoS attacks have emerged as a major threat to containerized cloud infrastructures. Due to their low traffic volumes, these attacks can be difficult to detect and mitigate, potentially causing serious harm to internet applications. This work proposes a DDoS mitigation system that effectively defends against low rate DDoS attacks in containerized environments using a multi layered defense strategy. The solution integrates a Web Application Firewall WAF, rate limiting, dynamic blacklisting, TCP and UDP header analysis, and zero trust principles to detect and block malicious traffic at different stages of the attack life cycle. By applying zero trust principles, the system ensures that each data packet is carefully inspected before granting access, improving overall security and resilience. Additionally, the systems integration with Docker orchestration facilitates deployment and management in containerized settings.
Authors:Abhishek Saini, Haolin Jiang, Hang Liu
Abstract:
The deployment of large language models (LLMs) on third-party devices requires new ways to protect model intellectual property. While Trusted Execution Environments (TEEs) offer a promising solution, their performance limits can lead to a critical compromise: using a precomputed, static secret basis to accelerate cryptographic operations. We demonstrate that this mainstream design pattern introduces a classic cryptographic flaw, the reuse of secret keying material, into the system's protocol. We prove its vulnerability with two distinct attacks: First, our attack on a model confidentiality system achieves a full confidentiality break by recovering its secret permutations and model weights. Second, our integrity attack completely bypasses the integrity checks of systems like Soter and TSQP. We demonstrate the practicality of our attacks against state-of-the-art LLMs, recovering a layer's secrets from a LLaMA-3 8B model in about 6 minutes and showing the attack scales to compromise 405B-parameter LLMs across a variety of configurations.
Authors:Enrico Ahlers, Daniel Passon, Yannic Noller, Lars Grunske
Abstract:
Machine learning models are increasingly present in our everyday lives; as a result, they become targets of adversarial attackers seeking to manipulate the systems we interact with. A well-known vulnerability is a backdoor introduced into a neural network by poisoned training data or a malicious training process. Backdoors can be used to induce unwanted behavior by including a certain trigger in the input. Existing mitigations filter training data, modify the model, or perform expensive input modifications on samples. If a vulnerable model has already been deployed, however, those strategies are either ineffective or inefficient. To address this gap, we propose our inference-time backdoor mitigation approach called FIRE (Feature-space Inference-time REpair). We hypothesize that a trigger induces structured and repeatable changes in the model's internal representation. We view the trigger as directions in the latent spaces between layers that can be applied in reverse to correct the inference mechanism. Therefore, we turn the backdoored model against itself by manipulating its latent representations and moving a poisoned sample's features along the backdoor directions to neutralize the trigger. Our evaluation shows that FIRE has low computational overhead and outperforms current runtime mitigations on image benchmarks across various attacks, datasets, and network architectures.
Authors:Rumman Firdos, Aman Dangi
Abstract:
The growing sophistication of modern malware and phishing campaigns has diminished the effectiveness of traditional signature-based intrusion detection systems. This work presents SecureScan, an AI-driven, triple-layer detection framework that integrates logistic regression-based classification, heuristic analysis, and external threat intelligence via the VirusTotal API for comprehensive triage of URLs, file hashes, and binaries. The proposed architecture prioritizes efficiency by filtering known threats through heuristics, classifying uncertain samples using machine learning, and validating borderline cases with third-party intelligence. On benchmark datasets, SecureScan achieves 93.1 percent accuracy with balanced precision (0.87) and recall (0.92), demonstrating strong generalization and reduced overfitting through threshold-based decision calibration. A calibrated threshold and gray-zone logic (0.45-0.55) were introduced to minimize false positives and enhance real-world stability. Experimental results indicate that a lightweight statistical model, when augmented with calibrated verification and external intelligence, can achieve reliability and performance comparable to more complex deep learning systems.
Authors:Ashwin Sreevatsa, Sebastian Prasanna, Cody Rushing
Abstract:
The AI Control research agenda aims to develop control protocols: safety techniques that prevent untrusted AI systems from taking harmful actions during deployment. Because human oversight is expensive, one approach is trusted monitoring, where weaker, trusted models oversee stronger, untrusted models$\unicode{x2013}$but this often fails when the untrusted model's actions exceed the monitor's comprehension. We introduce legibility protocols, which encourage the untrusted model to take actions that are easier for a monitor to evaluate. We perform control evaluations in the APPS coding setting, where an adversarial agent attempts to write backdoored code without detection. We study legibility protocols that allow the untrusted model to thoroughly document its code with comments$\unicode{x2013}$in contrast to prior work, which removed comments to prevent deceptive ones. We find that: (i) commenting protocols improve safety without sacrificing task performance relative to comment-removal baselines; (ii) commenting disproportionately benefits honest code, which typically has a natural explanation that resolves monitor suspicion, whereas backdoored code frequently lacks an easy justification; (iii) gains from commenting increase with monitor strength, as stronger monitors better distinguish genuine justifications from only superficially plausible ones.
Authors:Júlio Oliveira, Rodrigo Ferreira, André Riker, Glaucio H. S. Carvalho, Eirini Eleni Tsilopoulou
Abstract:
Data privacy and eXplainable Artificial Intelligence (XAI) are two important aspects for modern Machine Learning systems. To enhance data privacy, recent machine learning models have been designed as a Federated Learning (FL) system. On top of that, additional privacy layers can be added, via Differential Privacy (DP). On the other hand, to improve explainability, ML must consider more interpretable approaches with reduced number of features and less complex internal architecture. In this context, this paper aims to achieve a machine learning (ML) model that combines enhanced data privacy with explainability. So, we propose a FL solution, called Federated EXplainable Trees with Differential Privacy (FEXT-DP), that: (i) is based on Decision Trees, since they are lightweight and have superior explainability than neural networks-based FL systems; (ii) provides additional layer of data privacy protection applying Differential Privacy (DP) to the Tree-Based model. However, there is a side effect adding DP: it harms the explainability of the system. So, this paper also presents the impact of DP protection on the explainability of the ML model. The carried out performance assessment shows improvements of FEXT-DP in terms of a faster training, i.e., numbers of rounds, Mean Squared Error and explainability.
Authors:Logan Therrien, John Hastings
Abstract:
The Cybersecurity Maturity Model Certification (CMMC) framework provides a common standard for protecting sensitive unclassified information in defense contracting. While CMMC defines assessment objectives and control requirements, limited formal guidance exists regarding evidence sampling, the process by which assessors select, review, and validate artifacts to substantiate compliance. Analyzing data collected through an anonymous survey of CMMC-certified assessors and lead assessors, this exploratory study investigates whether inconsistencies in evidence sampling practices exist within the CMMC assessment ecosystem and evaluates the need for a risk-informed standardized sampling methodology. Across 17 usable survey responses, results indicate that evidence sampling practices are predominantly driven by assessor judgment, perceived risk, and environmental complexity rather than formalized standards, with formal statistical sampling models rarely referenced. Participants frequently reported inconsistencies across assessments and expressed broad support for the development of standardized guidance, while generally opposing rigid percentage-based requirements. The findings support the conclusion that the absence of a uniform evidence sampling framework introduces variability that may affect assessment reliability and confidence in certification outcomes. Recommendations are provided to inform future CMMC assessment methodology development and further empirical research.
Authors:George Tsigkourakos, Constantinos Patsakis
Abstract:
Static Application Security Testing (SAST) tools are integral to modern DevSecOps pipelines, yet tools like CodeQL, Semgrep, and SonarQube remain fundamentally constrained: they require expert-crafted queries, generate excessive false positives, and detect only predefined vulnerability patterns. Recent work has explored augmenting SAST with Large Language Models (LLMs), but these approaches typically use LLMs to triage existing tool outputs rather than to reason about vulnerability semantics directly. We introduce QRS (Query, Review, Sanitize), a neuro-symbolic framework that inverts this paradigm. Rather than filtering results from static rules, QRS employs three autonomous agents that generate CodeQL queries from a structured schema definition and few-shot examples, then validate findings through semantic reasoning and automated exploit synthesis. This architecture enables QRS to discover vulnerability classes beyond predefined patterns while substantially reducing false positives. We evaluate QRS on full Python packages rather than isolated snippets. In 20 historical CVEs in popular PyPI libraries, QRS achieves 90.6% detection accuracy. Applied to the 100 most-downloaded PyPI packages, QRS identified 39 medium-to-high-severity vulnerabilities, 5 of which were assigned new CVEs, 5 received documentation updates, while the remaining 29 were independently discovered by concurrent researchers, validating both the severity and discoverability of these findings. QRS accomplishes this with low time overhead and manageable token costs, demonstrating that LLM-driven query synthesis and code review can complement manually curated rule sets and uncover vulnerability patterns that evade existing industry tools.
Authors:Yunusa Simpa Abdulsalam, Mustapha Hedabou
Abstract:
Trusted Platform Module (TPM) 2.0 devices provide efficient hardware-based cryptographic security through tamper-resistant key storage and computation, making them ideal building blocks for multi-party signature schemes in distributed systems. However, existing TPM-based multi-signature constructions suffer from a fundamental limitation, they require interactive protocols where all participants must coordinate during the commitment phase, before any signature can be computed. This interactive requirement creates several critical problems, such as synchronization bottlenecks, quadratic communication complexity, and aborted protocols as a result of participant failure. These limitations become particularly heightened for applications that require cross-device cryptographic operations. This paper presents PiTPM, an Aggregator Framework built upon Schnorr's digital signature. Our protocol eliminates the interactive requirement using a hybrid trust architecture. The proposed framework uses pre-shared randomness seeds stored securely in an Aggregator, enabling deterministic computation of global commitments without inter-participant communication. The resulting signatures of the proposed framework are of constant size regardless of signer count. Our experimental results show a possible paradigm shift in TPM-based cryptographic system design, demonstrating that hybrid trust architectures can achieve significant performance improvements while maintaining rigorous security guarantees. We provide a comprehensive formal security analysis proving EU-CMA security under the discrete logarithm assumption in the random oracle model.
Authors:Hayfa Dhabhi, Kashyap Thimmaraju
Abstract:
Large Language Models (LLMs) deploy safety mechanisms to prevent harmful outputs, yet these defenses remain vulnerable to adversarial prompts. While existing research demonstrates that jailbreak attacks succeed, it does not explain \textit{where} defenses fail or \textit{why}. To address this gap, we propose that LLM safety operates as a sequential pipeline with distinct checkpoints. We introduce the \textbf{Four-Checkpoint Framework}, which organizes safety mechanisms along two dimensions: processing stage (input vs.\ output) and detection level (literal vs.\ intent). This creates four checkpoints, CP1 through CP4, each representing a defensive layer that can be independently evaluated. We design 13 evasion techniques, each targeting a specific checkpoint, enabling controlled testing of individual defensive layers. Using this framework, we evaluate GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro across 3,312 single-turn, black-box test cases. We employ an LLM-as-judge approach for response classification and introduce Weighted Attack Success Rate (WASR), a severity-adjusted metric that captures partial information leakage overlooked by binary evaluation. Our evaluation reveals clear patterns. Traditional Binary ASR reports 22.6\% attack success. However, WASR reveals 52.7\%, a 2.3$\times$ higher vulnerability. Output-stage defenses (CP3, CP4) prove weakest at 72--79\% WASR, while input-literal defenses (CP1) are strongest at 13\% WASR. Claude achieves the strongest safety (42.8\% WASR), followed by GPT-5 (55.9\%) and Gemini (59.5\%). These findings suggest that current defenses are strongest at input-literal checkpoints but remain vulnerable to intent-level manipulation and output-stage techniques. The Four-Checkpoint Framework provides a structured approach for identifying and addressing safety vulnerabilities in deployed systems.
Authors:Dennis Breutigam, Rüdiger Reischuk
Abstract:
Differential Privacy (DP) considers a scenario in which an adversary has almost complete information about the entries of a database. This worst-case assumption is likely to overestimate the privacy threat faced by an individual in practice. In contrast, Statistical Privacy (SP), as well as related notions such as noiseless privacy or limited background knowledge privacy, describe a setting in which the adversary knows the distribution of the database entries, but not their exact realizations. In this case, privacy analysis must account for the interaction between uncertainty induced by the entropy of the underlying distributions and privacy mechanisms that distort query answers, which can be highly non-trivial. This paper investigates this problem for multiple queries (composition). A privacy mechanism is proposed that is based on subsampling and randomly partitioning the database to bound the dependency among queries. This way for the first time, to the best of our knowledge, upper privacy bounds against limited adversaries are obtained without any further restriction on the database. These bounds show that in realistic application scenarios taking the entropy of distributions into account yields improvements of privacy and precision guarantees. We illustrate examples where for fixed privacy parameters and utility loss SP allows significantly more queries than DP.
Authors:Ghalia Jarad, Kemal Bicakci
Abstract:
Automated traffic continued to surpass human-generated traffic on the web, and a rising proportion of this automation was explicitly malicious. Evasive bots could pretend to be real users, even solve Captchas and mimic human interaction patterns. This work explores a less intrusive, protocol-level method: using TLS fingerprinting with the JA4 technique to tell apart bots from real users. Two gradient-boosted machine learning classifiers (XGBoost and CatBoost) were trained and evaluated on a dataset of real TLS fingerprints (JA4DB) after feature extraction, which derived informative signals from JA4 fingerprints that describe TLS handshake parameters. The CatBoost model performed better, achieving an AUC of 0.998 and an F1 score of 0.9734. It was accurate 0.9863 of the time on the test set. The XGBoost model showed almost similar results. Feature significance analyses identified JA4 components, especially ja4\_b, cipher\_count, and ext\_count, as the most influential on model effectiveness. Future research will extend this method to new protocols, such as HTTP/3, and add additional device-fingerprinting features to test how well the system resists advanced bot evasion tactics.
Authors:Moshe Noivirt, Jessica Sorrell, Eliad Tsfadia
Abstract:
We study the computational relationship between replicability (Impagliazzo et al. [STOC `22], Ghazi et al. [NeurIPS `21]) and other stability notions. Specifically, we focus on replicable PAC learning and its connections to differential privacy (Dwork et al. [TCC 2006]) and to the statistical query (SQ) model (Kearns [JACM `98]). Statistically, it was known that differentially private learning and replicable learning are equivalent and strictly more powerful than SQ-learning. Yet, computationally, all previously known efficient (i.e., polynomial-time) replicable learning algorithms were confined to SQ-learnable tasks or restricted distributions, in contrast to differentially private learning. Our main contribution is the first computationally efficient replicable algorithm for realizable learning of parities over arbitrary distributions, a task that is known to be hard in the SQ-model, but possible under differential privacy. This result provides the first evidence that efficient replicable learning over general distributions strictly extends efficient SQ-learning, and is closer in power to efficient differentially private learning, despite computational separations between replicability and privacy. Our main building block is a new, efficient, and replicable algorithm that, given a set of vectors, outputs a subspace of their linear span that covers most of them.
Authors:Andy Dong, Arun Ganesh
Abstract:
We study privacy amplification for BandMF, i.e., DP-SGD with correlated noise across iterations via a banded correlation matrix. We propose $b$-min-sep subsampling, a new subsampling scheme that generalizes Poisson and balls-in-bins subsampling, extends prior practical batching strategies for BandMF, and enables stronger privacy amplification than cyclic Poisson while preserving the structural properties needed for analysis. We give a near-exact privacy analysis using Monte Carlo accounting, based on a dynamic program that leverages the Markovian structure in the subsampling procedure. We show that $b$-min-sep matches cyclic Poisson subsampling in the high noise regime and achieves strictly better guarantees in the mid-to-low noise regime, with experimental results that bolster our claims. We further show that unlike previous BandMF subsampling schemes, our $b$-min-sep subsampling naturally extends to the multi-attribution user-level privacy setting.
Authors:Yael Tauman Kalai, Dakshita Khurana, Justin Raizes
Abstract:
Existing protocols for classical verification of quantum computation (CVQC) consume the prover's witness state, requiring a new witness state for each invocation. Because QMA witnesses are not generally clonable, destroying the input witness means that amplifying soundness and completeness via repetition requires many copies of the witness. Building CVQC with low soundness error that uses only *one* copy of the witness has remained an open problem so far. We resolve this problem by constructing a CVQC that uses a single copy of the QMA witness, has negligible completeness and soundness errors, and does *not* destroy its witness. The soundness of our CVQC is based on the post-quantum Learning With Errors (LWE) assumption. To obtain this result, we define and construct two primitives (under the post-quantum LWE assumption) for non-destructively handling superpositions of classical data, which we believe are of independent interest: - A *state preserving* classical argument for NP. - Dual-mode trapdoor functions with *state recovery*.
Authors:Pouria Arefijamal, Mahdi Ahmadlou, Bardia Safaei, Jörg Henkel
Abstract:
Federated learning (FL) is a decentralized learning paradigm widely adopted in resource-constrained Internet of Things (IoT) environments. These devices, typically relying on TinyML models, collaboratively train global models by sharing gradients with a central server while preserving data privacy. However, as data heterogeneity and task complexity increase, TinyML models often become insufficient to capture intricate patterns, especially under extreme non-IID (non-independent and identically distributed) conditions. Moreover, ensuring robustness against malicious clients and poisoned updates remains a major challenge. Accordingly, this paper introduces RIFLE - a Robust, distillation-based Federated Learning framework that replaces gradient sharing with logit-based knowledge transfer. By leveraging a knowledge distillation aggregation scheme, RIFLE enables the training of deep models such as VGG-19 and Resnet18 within constrained IoT systems. Furthermore, a Kullback-Leibler (KL) divergence-based validation mechanism quantifies the reliability of client updates without exposing raw data, achieving high trust and privacy preservation simultaneously. Experiments on three benchmark datasets (MNIST, CIFAR-10, and CIFAR-100) under heterogeneous non-IID conditions demonstrate that RIFLE reduces false-positive detections by up to 87.5%, enhances poisoning attack mitigation by 62.5%, and achieves up to 28.3% higher accuracy compared to conventional federated learning baselines within only 10 rounds. Notably, RIFLE reduces VGG19 training time from over 600 days to just 1.39 hours on typical IoT devices (0.3 GFLOPS), making deep learning practical in resource-constrained networks.
Authors:Sahar Zargarzadeh, Mohammad Islam
Abstract:
The Internet of Things (IoT) has revolutionized connectivity by linking billions of devices worldwide. However, this rapid expansion has also introduced severe security vulnerabilities, making IoT devices attractive targets for malware such as the Mirai botnet. Power side-channel analysis has recently emerged as a promising technique for detecting malware activity based on device power consumption patterns. However, the resilience of such detection systems under adversarial manipulation remains underexplored. This work presents a novel adversarial strategy against power side-channel-based malware detection. By injecting structured dummy code into the scanning phase of the Mirai botnet, we dynamically perturb power signatures to evade AI/ML-based anomaly detection without disrupting core functionality. Our approach systematically analyzes the trade-offs between stealthiness, execution overhead, and evasion effectiveness across multiple state-of-the-art models for side-channel analysis, using a custom dataset collected from smartphones of diverse manufacturers. Experimental results show that our adversarial modifications achieve an average attack success rate of 75.2\%, revealing practical vulnerabilities in power-based intrusion detection frameworks.
Authors:Masato Kamba, Akiyoshi Sannai
Abstract:
Multi-implementation systems are increasingly audited against natural-language specifications. Differential testing scales well when implementations disagree, but it provides little signal when all implementations converge on the same incorrect interpretation of an ambiguous requirement. We present SPECA, a Specification-to-Checklist Auditing framework that turns normative requirements into checklists, maps them to implementation locations, and supports cross-implementation reuse. We instantiate SPECA in an in-the-wild security audit contest for the Ethereum Fusaka upgrade, covering 11 production clients. Across 54 submissions, 17 were judged valid by the contest organizers. Cross-implementation checks account for 76.5 percent (13 of 17) of valid findings, suggesting that checklist-derived one-to-many reuse is a practical scaling mechanism in multi-implementation audits. To understand false positives, we manually coded the 37 invalid submissions and find that threat model misalignment explains 56.8 percent (21 of 37): reports that rely on assumptions about trust boundaries or scope that contradict the audit's rules. We detected no High or Medium findings in the V1 deployment; misses concentrated in specification details and implicit assumptions (57.1 percent), timing and concurrency issues (28.6 percent), and external library dependencies (14.3 percent). Our improved agent, evaluated against the ground truth of a competitive audit, achieved a strict recall of 27.3 percent on high-impact vulnerabilities, placing it in the top 4 percent of human auditors and outperforming 49 of 51 contestants on critical issues. These results, though from a single deployment, suggest that early, explicit threat modeling is essential for reducing false positives and focusing agentic auditing effort. The agent-driven process enables expert validation and submission in about 40 minutes on average.
Authors:Mark Bun, William Fang
Abstract:
We give new differentially private algorithms for the classic problems of learning decision lists and large-margin halfspaces in the PAC and online models. In the PAC model, we give a computationally efficient algorithm for learning decision lists with minimal sample overhead over the best non-private algorithms. In the online model, we give a private analog of the influential Winnow algorithm for learning halfspaces with mistake bound polylogarithmic in the dimension and inverse polynomial in the margin. As an application, we describe how to privately learn decision lists in the online model, qualitatively matching state-of-the art non-private guarantees.
Authors:Eli Propp, Seyed Majid Zahedi
Abstract:
Malware detection using Hardware Performance Counters (HPCs) offers a promising, low-overhead approach for monitoring program behavior. However, a fundamental architectural constraint, that only a limited number of hardware events can be monitored concurrently, creates a significant bottleneck, leading to detection blind spots. Prior work has primarily focused on optimizing machine learning models for a single, statically chosen event set, or on ensembling models over the same feature set. We argue that robustness requires diversifying not only the models, but also the underlying feature sets (i.e., the monitored hardware events) in order to capture a broader spectrum of program behavior. This observation motivates the following research question: Can detection performance be improved by trading temporal granularity for broader coverage, via the strategic scheduling of different feature sets over time? To answer this question, we propose Hydra, a novel detection mechanism that partitions execution traces into time slices and learns an effective schedule of feature sets and corresponding classifiers for deployment. By cycling through complementary feature sets, Hydra mitigates the limitations of a fixed monitoring perspective. Our experimental evaluation shows that Hydra significantly outperforms state-of-the-art single-feature-set baselines, achieving a 19.32% improvement in F1 score and a 60.23% reduction in false positive rate. These results underscore the importance of feature-set diversity and establish strategic multi-feature-set scheduling as an effective principle for robust, hardware-assisted malware detection.
Authors:Sahibpreet Singh, Saksham Sharma
Abstract:
Integration of AI into environmental regulation represents a significant advancement in data management. It offers promising results in both data protection plus algorithmic fairness. This research addresses the critical need for sustainable data protection in the era of ever evolving cyber threats. Traditional encryption methods face limitations in handling the dynamic nature of environmental data. This necessitates the exploration of advanced cryptographic techniques. The objective of this study is to evaluate how AI can enhance these techniques to ensure robust data protection while facilitating fair algorithmic management. The methodology involves a comprehensive review of current advancements in AI-enhanced homomorphic encryption (HE) and multi-party computation (MPC). It is coupled with an analysis of how these techniques can be applied to environmental data regulation. Key findings indicate that AI-driven dynamic key management, adaptive encryption schemes, and optimized computational efficiency in HE, alongside AI-enhanced protocol optimization and fault mitigation in MPC, significantly improve the security of environmental data processing. These findings highlight a crucial research gap in the intersection of AI, cyber laws, and environmental regulation, particularly in terms of addressing algorithmic bias, transparency, and accountability. The implications of this research underscore the need for stricter cyber laws. Also, the development of comprehensive regulations to safeguard sensitive environmental data. Future efforts should focus on refining AI systems to balance security with privacy and ensuring that regulatory frameworks can adapt to technological advancements. This study provides a foundation for future research aimed at achieving secure sustainable environmental data management through AI innovations.
Authors:Yassine Chagna, Antal Goldschmidt
Abstract:
This project explores large language models (LLMs) for anomaly detection across heterogeneous log sources. Traditional intrusion detection systems suffer from high false positive rates, semantic blindness, and data scarcity, as logs are inherently sensitive, making clean datasets rare. We address these challenges through three contributions: (1) LogAtlas-Foundation-Sessions and LogAtlas-Defense-Set, balanced and heterogeneous log datasets with explicit attack annotations and privacy preservation; (2) empirical benchmarking revealing why standard metrics such as F1 and accuracy are misleading for security applications; and (3) a two phase training framework combining log understanding (Base-AMAN, 3B parameters) with real time detection (AMAN, 0.5B parameters via knowledge distillation). Results demonstrate practical feasibility, with inference times of 0.3-0.5 seconds per session and operational costs below 50 USD per day.
Authors:Anubhav Bhatla, Navneet Navneet, Moinuddin Qureshi, Biswabandan Panda
Abstract:
The sharing of the last-level cache (LLC) among multiple cores makes it vulnerable to cross-core conflict- and occupancy-based attacks. Despite extensive prior work, modern processors still employ non-secure set-associative LLCs. Existing secure LLC designs broadly fall into two categories: (i) randomized and (ii) partitioned. The state-of-the-art randomized design, Mirage, mitigates conflict-based attacks but incurs significant area overhead (20% additional storage) and design complexity. Partitioned LLCs mitigate both conflict- and occupancy-based attacks, but often suffer from large performance overheads (on average over 5% and up to 49%), require OS support in set-based schemes, or face scalability issues in way-based schemes. These factors pose major obstacles to the industrial adoption of secure LLCs. This paper asks whether strong LLC security can be achieved with minimal changes to a conventional set-associative LLC, enabling security only when needed while preserving low performance, power, and area overheads. We propose Avatar, a secure and morphable LLC that supports three modes: non-secure (Avatar-N), randomized secure (Avatar-R), and partitioned secure (Avatar-P), and can switch dynamically between them. Avatar closely resembles a conventional set-associative LLC, facilitating industrial adoption. Avatar-R introduces extra invalid entries and leverages high associativity to provide a strong security guarantee with little capacity loss, achieving only one set-associative eviction per $10^{30}$ years, while incurring 1.5% storage overhead, a 2.7% increase in static power, and a 0.2% slowdown over a 16~MB baseline. Avatar-P mitigates both conflict- and occupancy-based attacks with only a 3% performance overhead, substantially outperforming prior way-based partitioned LLCs. When security is unnecessary, Avatar switches to Avatar-N to maximize performance and energy efficiency.
Authors:Aditya Mitra, Hamza Haroon, Amaan Rais Shah, Mohammad Elham Rasooli, Bogdan Itsam Dorantes Nikolaev, Tuğçe Ballı
Abstract:
A computer is nothing but a device that processes the instructions supplied to it. However, as computers evolved, the instructions or codes started to be more complicated. As computers started to be used by non-technical people, it became imperative that the users be able to use the machine without having underlying knowledge of the code or the hardware. And operating system became the backbone for translating the inputs from the user to actual operation on the hardware. With the increasing complexity and the choices of operating system, it became clear that different groups of people, especially in an enterprise scenario, required different operating systems. Installing them all on a single machine, for shared computers became a difficult task, giving rise to network-based booting. But network-based booting was confined to only wired connectivity, keeping it restricted to very small geographical areas. The proposed system, /dev/SDB, is aimed at creating a standard where any user, anyone on the globe, can access the operating system authorized to them without having to be on the corporate network. It aims to offer the same over Wi-Fi as well as cellular connectivity, ensuring employees can truly work from anywhere, while following the policies for operating systems and without redundant hardware.
Authors:Sura Khalid Salsal, Eman Shaker Mahmood, Farah Tawfiq Abdul Hussien, Maryam Mahdi Alhusseini, Azhar Naji Alyahya, Nikolai Safiullin
Abstract:
The current digital era, driven by growing threats to data security, requires a robust image encryption technique. Classical encryption algorithms suffer from a trade-off among security, image fidelity, and computational efficiency. This paper aims to enhance the performance and efficiency of image encryption. This is done by proposing Fractal encryption based on Fourier transforms as a new method of image encryption, leveraging state-of-the-art technology. The new approach considered here intends to enhance both security and efficiency in image encryption by comparing Fractal Encryption with basic methods. The suggested system also aims to optimise encryption/ decryption times and preserve image quality. This paper provides an introduction to Image Encryption using the fractal-based method, its mathematical formulation, and its comparative efficiency against publicly known traditional encryption methods. As a result, after filling the gaps identified in previous research, it has significantly improved both its encryption/decryption time and image fidelity compared to other techniques. In this paper, directions for future research and possible improvements are outlined for attention.
Authors:Nourin Shahin, Izzat Alsmadi
Abstract:
As large language models (LLMs) move from research prototypes to enterprise systems, their security vulnerabilities pose serious risks to data privacy and system integrity. This study benchmarks various Llama model variants against the OWASP Top 10 for LLM Applications framework, evaluating threat detection accuracy, response safety, and computational overhead. Using the FABRIC testbed with NVIDIA A30 GPUs, we tested five standard Llama models and five Llama Guard variants on 100 adversarial prompts covering ten vulnerability categories. Our results reveal significant differences in security performance: the compact Llama-Guard-3-1B model achieved the highest detection rate of 76% with minimal latency (0.165s per test), whereas base models such as Llama-3.1-8B failed to detect threats (0% accuracy) despite longer inference times (0.754s). We observe an inverse relationship between model size and security effectiveness, suggesting that smaller, specialized models often outperform larger general-purpose ones in security tasks. Additionally, we provide an open-source benchmark dataset including adversarial prompts, threat labels, and attack metadata to support reproducible research in AI security, [1].
Authors:Dariy Guzairov, Alex Potanin, Stephen Kell, Alwen Tiu
Abstract:
Memory corruption attacks have been prevalent in software for a long time. Some mitigation strategies against these attacks do exist, but they are not as far-reaching or as efficient as the CHERI architecture. CHERI uses capabilities to restrict pointers to certain regions of memory and with certain access restrictions. These capabilities are also used to implement "compartmentalisation": dividing a binary into smaller components with limited privilege, while adhering to the principle of least privilege. However, while this architecture successfully mitigates memory corruption attacks, the compartmentalisation mechanisms in place are less effective in containing malicious code to a separate compartment. This paper details four ways to bypass compartmentalisation, with a focus on Linux and BSD operating systems ported to this architecture. We find that although compartmentalisation is implemented in these two operating systems, simple bugs and attacks can still allow malicious code to bypass it. We conclude with mitigation measures to prevent these attacks, a proof-of-concept demonstrating their use, and recommendations for further securing Linux and BSD against unknown attacks.
Authors:Henry Chen, Victor Aranda, Samarth Keshari, Ryan Heartfield, Nicole Nichols
Abstract:
Prompt-based attack techniques are one of the primary challenges in securely deploying and protecting LLM-based AI systems. LLM inputs are an unbounded, unstructured space. Consequently, effectively defending against these attacks requires proactive hardening strategies capable of continuously generating adaptive attack vectors to optimize LLM defense at runtime. We present HASTE (Hard-negative Attack Sample Training Engine): a systematic framework that iteratively engineers highly evasive prompts, within a modular optimization process, to continuously enhance detection efficacy for prompt-based attack techniques. The framework is agnostic to synthetic data generation methods, and can be generalized to evaluate prompt-injection detection efficacy, with and without fuzzing, for any hard-negative or hard-positive iteration strategy. Experimental evaluation of HASTE shows that hard negative mining successfully evades baseline detectors, reducing malicious prompt detection for baseline detectors by approximately 64%. However, when integrated with detection model re-training, it optimizes the efficacy of prompt detection models with significantly fewer iteration loops compared to relative baseline strategies. The HASTE framework supports both proactive and reactive hardening of LLM defenses and guardrails. Proactively, developers can leverage HASTE to dynamically stress-test prompt injection detection systems; efficiently identifying weaknesses and strengthening defensive posture. Reactively, HASTE can mimic newly observed attack types and rapidly bridge detection coverage by teaching HASTE-optimized detection models to identify them.
Authors:Ruslan Abdulin, Mohammad Rasoul Narimani
Abstract:
The increasing deployment of Internet-of-Things (IoT)-enabled measurement devices in modern power systems has expanded the cyberattack surface of the grid. As a result, this critical infrastructure is increasingly exposed to cyberattacks, including false data injection attacks (FDIAs) that compromise measurement integrity and threaten reliable system operation. Existing FDIA detection methods primarily exploit spatial correlations and network topology using graph-based learning; however, these approaches often rely on high-dimensional representations and shallow classifiers, limiting their ability to capture local structural dependencies and global contextual relationships. Moreover, naively incorporating Transformer architectures can result in overly deep models that struggle to model localized grid dynamics. This paper proposes a joint FDIA detection and localization framework that integrates auto-regressive moving average (ARMA) graph convolutional filters with an Encoder-Only Transformer architecture. The ARMA-based graph filters provide robust, topology-aware feature extraction and adaptability to abrupt spectral changes, while the Transformer encoder leverages self-attention to capture long-range dependencies among grid elements without sacrificing essential local context. The proposed method is evaluated using real-world load data from the New York Independent System Operator (NYISO) applied to the IEEE 14- and 300-bus systems. Numerical results demonstrate that the proposed model effectively exploits both the state and topology of the power grid, achieving high accuracy in detecting FDIA events and localizing compromised nodes.
Authors:Eymen Ünay, Björn Franke, Jackson Woodruff
Abstract:
Fully Homomorphic Encryption (FHE) enables privacy preserving computation but it suffers from high latency and memory consumption. The computations are secured with special keys called rotation keys which often take up the majority of memory. In complex FHE applications, these rotation keys can cause a large memory bottleneck limiting program throughput. Existing compilers make little effort to solve this problem, instead relying on systems with massive memory availability. This resource requirement is a barrier to FHE uptake because optimizing FHE programs by hand is challenging due to their scale, complexity and expertise required. In this work, we present KeyMemRT; an MLIR based compiler and runtime framework that individually manages rotation key lifetimes to lower memory utilization and to allow arbitrary number of rotation indices to be supported without memory bloating. KeyMemRT relies on dataflow analysis to determine key lifetimes and is the first FHE compiler to provide automatic key management, handle fine-grained key-mangement and manage boostrap keys. We implement frontends for Orion and HEIR and show improvements over state-of-the-art FHE compilers. KeyMemRT achieves memory reduction of 1.74x and a speedup of 1.20x over ANT-ACE, and memory reduction of 1.16x and a speedup of 1.73x over memory-optimized compiler Fhelipe. We provide KeyMemRT as a post-optimizing compiler that can be targeted by any FHE compiler.
Authors:Darlan Noetzold, Valderi Reis Quietinho Leithardt
Abstract:
This book arises from the need to provide a clear and up-to-date overview of the impacts of quantum computing on cryptography. The goal is to provide a reference in Portuguese for undergraduate, master's, and doctoral students in the field of data security and cryptography. Throughout the chapters, we present fundamentals, we discuss classical and post-quantum algorithms, evaluate emerging patterns, and point out real-world implementation challenges. The initial objective is to serve as a guide for students, researchers, and professionals who need to understand not only the mathematics involved, but also its practical implications in security systems and policies. For more advanced professionals, the main objective is to present content and ideas so that they can assess the changes and perspectives in the era of quantum cryptographic algorithms. To that end, the text's structure was designed to be progressive: we begin with essential concepts, move on to quantum algorithms and their consequences (with emphasis on Shor's algorithm), present issues focusing on "families" of post-quantum schemes (based on lattices, codes, hash functions, multivariate, isogenies), analyze the state of the art in standardization (highlighting the NIST process), and finally, discuss migration, interoperability, performance, and cryptographic governance. We hope that this work will assist in the formation of critical thinking and informed technical decision-making, fostering secure transition strategies for the post-quantum era.
Authors:Beom Heyn Kim, Seok Min Hong, Mohammad Mannan
Abstract:
Ransomware variants increasingly combine privilege escalation with sophisticated evasion strategies such as intermittent encryption, low-entropy encryption, and imitation attacks. Such powerful ransomware variants, privilege-escalated evasive ransomware (PEER), can defeat existing solutions relying on I/O-pattern analysis by tampering with or obfuscating I/O traces. Meanwhile, conventional statistical content-based detection becomes unreliable as the encryption size decreases due to sampling noises. We present Rhea, a cloud-offloaded ransomware defense system that analyzes replicated data snapshots, so-called mutation snapshots. Rhea introduces Format-Aware Validation that validates the syntactic and semantic correctness of file formats, instead of relying on statistical or entropy-based indicators. By leveraging file-format specifications as detection invariants, Rhea can reliably identify fine-grained and evasive encryption even under elevated attacker privileges. Our evaluation demonstrates that Rhea significantly outperforms existing approaches, establishing its practical effectiveness against modern ransomware threats.
Authors:Mohammad Fasha, Faisal Abul Rub, Nasim Matar, Bilal Sowan, Mohammad Al Khaldy
Abstract:
Large Language Models (LLMs) have emerged as a transformative and disruptive technology, enabling a wide range of applications in natural language processing, machine translation, and beyond. However, this widespread integration of LLMs also raised several security concerns highlighted by the Open Web Application Security Project (OWASP), which has identified the top 10 security vulnerabilities inherent in LLM applications. Addressing these vulnerabilities is crucial, given the increasing reliance on LLMs and the potential threats to data integrity, confidentiality, and service availability. This paper presents a framework designed to mitigate the security risks outlined in the OWASP Top 10. Our proposed model leverages LLM-enabled intelligent agents, offering a new approach to proactively identify, assess, and counteract security threats in real-time. The proposed framework serves as an initial blueprint for future research and development, aiming to enhance the security measures of LLMs and protect against emerging threats in this rapidly evolving landscape.
Authors:Niaz Mohammad Ramaki, Florian Schintke
Abstract:
Auditability and reproducibility still are critical challenges for real-time data streams pipelines. Streaming engines are highly dependent on runtime scheduling, window triggers, arrival orders, and uncertainties such as network jitters. These all derive the streaming pipeline platforms to throw non-determinist outputs. In this work, we introduce a blockchain-backed provenance architecture for streaming platform (e.g Kafka Streams) the publishes cryptographic data of a windowed data stream without publishing window payloads on-chain. We used real-time weather data from weather stations in Berlin. Weather records are canonicalized, deduplicated, and aggregated per window, then serialised deterministically. Furthermore, the Merkle root of the records within the window is computed and stored alongside with Kafka offsets boundaries to MultiChain blockchain streams as checkpoints. Our design can enable an independent auditor to verify: (1) the completeness of window payloads, (2) canonical serialization, and (3) correctness of derived analytics such as minimum/maximum/average temperatures. We evaluated our system using real data stream from two weather stations (Berlin-Brandenburg and Berlin-Tempelhof) and showed linear verification cost, deterministic reproducibility, and with a scalable off-chain storage with on-chain cryptographic anchoring. We also demonstrated that the blockchain can afford to be integrated with streaming platforms particularly with our system, and we get satisfactory transactions per second values.
Authors:Ahmed Oun, Rishabh Das, Clay Hess, Aakriti Barat, Savas Kaya
Abstract:
Industrial Control Systems (ICS) rely on sensor feedback to keep safety-critical processes within operational limits. This research presents a hardware-root-of-trust that embeds a Physically Unclonable Function (PUF) at the measurement layer to authenticate sensor readings. The architecture combines voltage fingerprinting with a temporal authentication that integrates with standard industrial control system architecture. The research prototypes the PUF integration on a hardware-in-the-loop (HIL) water tank testbed using a Simulink-based PUF emulator. The system maintains 99.97% accuracy over a 5.18-hour period of normal operation and flags all injected anomalies, including spike faults, hard-over faults, and hardware trojan scenarios that push the system over to an unsafe operational state. The proposed architecture provides a process-aware, vendor-agnostic approach that can integrate with legacy plants to detect sensor signal degradation or sophisticated supply chain attacks.
Authors:Eliron Rahimi, Margarita Osadchy, Orr Dunkelman
Abstract:
Biometric data is considered to be very private and highly sensitive. As such, many methods for biometric template protection were considered over the years -- from biohashing and specially crafted feature extraction procedures, to the use of cryptographic solutions such as Fuzzy Commitments or the use of Fully Homomorphic Encryption (FHE). A key question that arises is how much protection these solutions can offer when the adversary can inject samples, and observe the outputs of the system. While for systems that return the similarity score, one can use attacks such as hill-climbing, for systems where the adversary can only learn whether the authentication attempt was successful, this question remained open. In this paper, we show that it is indeed possible to reconstruct the biometric template by just observing the success/failure of the authentication attempt (given the ability to inject a sufficient amount of templates). Our attack achieves negligible template reconstruction loss and enables full recovery of facial images through a generative inversion method, forming a pipeline from binary scores to high-resolution facial images that successfully pass the system more than 98\% of the time. Our results, of course, are applicable for any protection mechanism that maintains the accuracy of the recognition.
Authors:Yi Lyu, Luke Dotson, Nic Draves, Andy Zhang
Abstract:
In this paper, we take a close look at how CTF can be used in cybersecurity education. We divide the CTF competitions into four different categories, which are attack-based CTFs, defense-based CTFs, jeopardy CTFs and gamified and wargames CTFs. We start our analysis by summarizing the main characteristics of different CTF types. We then compare them with each other in both learning objectives and other aspects like accessibility. We conclude that combining all four CTF formats can help participants build one's cybersecurity knowledge. By doing that, we hope that our findings will provide some useful insights for future CTF educators.
Authors:Mohammad Zare, Pirooz Shamsinejadbabaki
Abstract:
Membership inference attacks (MIAs) pose a serious threat to the privacy of machine learning models by allowing adversaries to determine whether a specific data sample was included in the training set. Although federated learning (FL) is widely regarded as a privacy-aware training paradigm due to its decentralized nature, recent evidence shows that the final global model can still leak sensitive membership information through black-box access. In this paper, we introduce Res-MIA, a novel training-free and black-box membership inference attack that exploits the sensitivity of deep models to high-frequency input details. Res-MIA progressively degrades the input resolution using controlled downsampling and restoration operations, and analyzes the resulting confidence decay in the model's predictions. Our key insight is that training samples exhibit a significantly steeper confidence decline under resolution erosion compared to non-member samples, revealing a robust membership signal. Res-MIA requires no shadow models, no auxiliary data, and only a limited number of forward queries to the target model. We evaluate the proposed attack on a federated ResNet-18 trained on CIFAR-10, where it consistently outperforms existing training-free baselines and achieves an AUC of up to 0.88 with minimal computational overhead. These findings highlight frequency-sensitive overfitting as an important and previously underexplored source of privacy leakage in federated learning, and emphasize the need for privacy-aware model designs that reduce reliance on fine-grained, non-robust input features.
Authors:Yi Lyu, Shichun Yu, Joe Catudal
Abstract:
Improvements in software defined networking allow for policy to be informed and modified by data-driven applications that can adjust policy to accommodate fluctuating requirements at line speed. However, there is some concern that over-correction can occur and cause unintended consequences depending on the data received. This is particularly problematic for network security features, such as machine-learning intrusion detection systems. We present Safeguard, a rule-based policy that overlaps a data-driven policy to prevent unintended responses for edge cases in network traffic. We develop a reference implementation of a network traffic classifier that enforces firewall rules for malicious traffic, and show how additional rulesets to allow known-good traffic are essential in utilizing a data-driven network policy.
Authors:Elisa Botteghi, Martino S. Centonze, Davide Pastorello, Daniele Tantari
Abstract:
Cyber risk has become a critical financial threat in today's interconnected digital economy. This paper introduces a cyber-risk management framework for networked digital systems that combines the strategic behavior of players with contagion dynamics within a security game. We address the problem of optimally allocating cybersecurity resources across a network, focusing on the heterogeneous valuations of nodes by attackers and defenders, some areas may be of high interest to the attacker, while others are prioritized by the defender. We explore how this asymmetry drives attack and defense strategies and shapes the system's overall resilience. We extend a method to determine optimal resource allocation based on simple network metrics weighted by the defender's and attacker's risk profiles. We further propose risk measures based on contagion paths and analyze how propagation dynamics influence optimal defense strategies. Numerical experiments explore risk versus cost efficient frontiers varying network topologies and risk profiles, revealing patterns of resource allocation and cyber deception effects. These findings provide actionable insights for designing resilient digital infrastructures and mitigating systemic cyber risk.
Authors:Yuyang Qin, Haihan Duan
Abstract:
Cryptocurrency wallets have become the primary gateway to decentralized applications, yet users often face significant difficulty in discerning what a wallet signature actually does or entails. Prior work has mainly focused on mitigating protocol vulnerabilities, with limited attention to how users perceive and interpret what they are authorizing. To examine this usability-security gap, we conducted two formative studies investigating how users interpret authentic signing requests and what cues they rely on to assess risk. Findings reveal that users often misread critical parameters, underestimate high-risk signatures, and rely on superficial familiarity rather than understanding transaction intent. Building on these insights, we designed the Signature Semantic Decoder -- a prototype framework that reconstructs and visualizes the intent behind wallet signatures prior to confirmation. Through structured parsing and semantic labeling, it demonstrates how signing data can be transformed into plain-language explanations with contextual risk cues. In a between-subjects user study (N = 128), participants using the prototype achieved higher accuracy in identifying risky signatures, improved clarity and decision confidence, and lower cognitive workload compared with the baseline wallet interface. Our study reframes wallet signing as a problem of interpretability within secure interaction design and offers design implications for more transparent and trustworthy cryptocurrency wallet interfaces.
Authors:Pablo Sorrentino, Stjepan Picek, Ihsen Alouani, Nikolaos Athanasios Anagnostopoulos, Francesco Regazzoni, Lejla Batina, Tamalika Banerjee, Fatih Turkmen
Abstract:
Neuromorphic computing mimics brain-inspired mechanisms through spiking neurons and energy-efficient processing, offering a pathway to efficient in-memory computing (IMC). However, these advancements raise critical security and privacy concerns. As the adoption of bio-inspired architectures and memristive devices increases, so does the urgency to assess the vulnerability of these emerging technologies to hardware and software attacks. Emerging architectures introduce new attack surfaces, particularly due to asynchronous, event-driven processing and stochastic device behavior. The integration of memristors into neuromorphic hardware and software implementations in spiking neural networks offers diverse possibilities for advanced computing architectures, including their role in security-aware applications. This survey systematically analyzes the security landscape of neuromorphic systems, covering attack methodologies, side-channel vulnerabilities, and countermeasures. We focus on both hardware and software concerns relevant to spiking neural networks (SNNs) and hardware primitives, such as Physical Unclonable Functions (PUFs) and True Random Number Generators (TRNGs) for cryptographic and secure computation applications. We approach this analysis from diverse perspectives, from attack methodologies to countermeasure strategies that integrate efficiency and protection in brain-inspired hardware. This review not only maps the current landscape of security threats but provides a foundation for developing secure and trustworthy neuromorphic architectures.
Authors:Kai Li, Jiahao Lu, Fu Yao, Guang Zeng, Dongsheng Liu, Shengfei Gu, Zhengpeng Zhao, Jiachen Wang
Abstract:
FrodoKEM is a lattice-based post-quantum key encapsulation mechanism (KEM). It has been considered for standardization by the International Organization for Standardization (ISO) due to its robust security profile. However, its hardware implementation exhibits a weakness of high latency and heavy resource burden, hindering its practical application. Moreover, diverse usage scenarios call for comprehensive functionality. To address these challenges, this paper presents a high-performance and efficient crypto-processor for FrodoKEM. A multiple-instruction overlapped execution scheme is introduced to enable efficient multi-module scheduling and minimize operational latency. Furthermore, a high-speed, reconfigurable parallel multiplier array is integrated to handle intensive matrix computations under diverse computation patterns, significantly enhancing hardware efficiency. In addition, a compact memory scheduling strategy shortens the lifespan of intermediate matrices, thereby reducing overall storage requirements. The proposed design provides full support for all FrodoKEM security levels and protocol phases. It consumes 13467 LUTs, 6042 FFs, and 14 BRAMs on an Artix-7 FPGA and achieves the fastest reported execution time. Compared with state-of-the-art hardware implementations, our design improves the area-time product (ATP) by 1.75-2.00 times.
Authors:Juliao Braga, Percival Henriques, Juliana C. Braga, Itana Stiubiener
Abstract:
The use of algorithms is increasing across various fields such as healthcare, justice, finance, and education. This growth has significantly accelerated with the advent of Artificial Intelligence (AI) technologies based on Large Language Models (LLMs) since 2022. This expansion presents substantial challenges related to accountability, ethics, and transparency. This article explores the potential of the Digital Object Identifier (DOI) to identify algorithms, aiming to enhance accountability, transparency, and reliability in their development and application, particularly in AI agents and multimodal LLMs. The use of DOIs facilitates tracking the origin of algorithms, enables audits, prevents biases, promotes research reproducibility, and strengthens ethical considerations. The discussion addresses the challenges and solutions associated with maintaining algorithms identified by DOI, their application in API security, and the proposal of a cryptographic authentication protocol.
Authors:Joan Vendrell Farreny, Martí Jordà Roca, Miquel Cornudella Gaya, Rodrigo Fernández Baón, Víctor García Martínez, Eduard Camacho Sucarrats, Alessandro Pignati
Abstract:
This paper introduces the Generative Application Firewall (GAF), a new architectural layer for securing LLM applications. Existing defenses -- prompt filters, guardrails, and data-masking -- remain fragmented; GAF unifies them into a single enforcement point, much like a WAF coordinates defenses for web traffic, while also covering autonomous agents and their tool interactions.
Authors:Daewoo Kim, Sihang Liu
Abstract:
Virtualization is widely adopted in cloud systems to manage resource sharing among users. A virtualized environment usually deploys a virtual switch within the host system to enable virtual machines to communicate with each other and with the physical network. The Open vSwitch (OVS) is one of the most popular software-based virtual switches. It maintains a cache hierarchy to accelerate packet forwarding from the host to virtual machines. We characterize the caching system inside OVS from a security perspective and identify three attack primitives. Based on the attack primitives, we present three remote attacks via OVS, breaking the isolation in virtualized environments. First, we identify remote covert channels using different caches. Second, we present a novel header recovery attack that leaks a remote user's packet header fields, breaking the confidentiality guarantees from the system. Third, we demonstrate a remote packet rate monitoring attack that recovers the packet rate of a remote victim. To defend against these attacks, we also discuss and evaluate mitigation solutions.
Authors:Piyumi Bhagya Sudasinghe, Kushan Sudheera Kalupahana Liyanage, Harsha S. Gardiyawasam Pussewalage
Abstract:
The rapid growth of Internet of Things (IoT) devices has increased the scale and diversity of cyberattacks, exposing limitations in traditional intrusion detection systems. Classical machine learning (ML) models such as Random Forest and Support Vector Machine perform well on known attacks but require retraining to detect unseen or zero-day threats. This study investigates lightweight decoder-only Large Language Models (LLMs) for IoT attack detection by integrating structured-to-text conversion, Quantized Low-Rank Adaptation (QLoRA) fine-tuning, and Retrieval-Augmented Generation (RAG). Network traffic features are transformed into compact natural-language prompts, enabling efficient adaptation under constrained hardware. Experiments on the CICIoT2023 dataset show that a QLoRA-tuned LLaMA-1B model achieves an F1-score of 0.7124, comparable to the Random Forest (RF) baseline (0.7159) for known attacks. With RAG, the system attains 42.63% accuracy on unseen attack types without additional training, demonstrating practical zero-shot capability. These results highlight the potential of retrieval-enhanced lightweight LLMs as adaptable and resource-efficient solutions for next-generation IoT intrusion detection.
Authors:Lorenzo Fernández Maimó, Alberto Huertas Celdrán, Manuel Gil Pérez, Félix J. García Clemente, Gregorio Martínez Pérez
Abstract:
Fog and mobile edge computing (MEC) will play a key role in the upcoming fifth generation (5G) mobile networks to support decentralized applications, data analytics and management into the network itself by using a highly distributed compute model. Furthermore, increasing attention is paid to providing user-centric cybersecurity solutions, which particularly require collecting, processing and analyzing significantly large amount of data traffic and huge number of network connections in 5G networks. In this regard, this paper proposes a MEC-oriented solution in 5G mobile networks to detect network anomalies in real-time and in autonomic way. Our proposal uses deep learning techniques to analyze network flows and to detect network anomalies. Moreover, it uses policies in order to provide an efficient and dynamic management system of the computing resources used in the anomaly detection process. The paper presents relevant aspects of the deployment of the proposal and experimental results to show its performance.
Authors:Aditi Gandhi, Aakankshya Das, Aswani Kumar Cherukuri
Abstract:
The emergence of quantum computing poses a fundamental threat to current public key cryptographic systems. This threat is necessitating a transition to quantum resistant cryptographic alternatives in all the applications. In this work, we present the implementation of a practical hybrid end-to-end encryption system that combines classical and post-quantum cryptographic primitives to achieve both security and efficiency. Our system employs CRYSTALS-Kyber, a NIST-standardized lattice-based key encapsulation mechanism, for quantum-safe key exchange, coupled with AES-256-GCM for efficient authenticated symmetric encryption and SHA-256 for deterministic key derivation. The architecture follows a zero-trust model where a relay server facilitates communication without accessing plaintext messages or cryptographic keys. All encryption and decryption operations occur exclusively at client endpoints. The system demonstrates that NIST standardized post-quantum cryptography can be effectively integrated into practical messaging systems with acceptable performance characteristics, offering protection against both classical and quantum adversaries. As our focus is on implementation rather than on novelty, we also provide an open-source implementation to facilitate reproducibility and further research in post quantum secure communication systems.
Authors:Daisuke Miyamoto, Takuji Iimura, Narushige Michishita
Abstract:
With the spread of generative AI in recent years, attacks known as Whaling have become a serious threat. Whaling is a form of social engineering that targets important high-authority individuals within organizations and uses sophisticated fraudulent emails. In the context of Japanese universities, faculty members frequently hold positions that combine research leadership with authority within institutional workflows. This structural characteristic leads to the wide public disclosure of high-value information such as publications, grants, and detailed researcher profiles. Such extensive information exposure enables the construction of highly precise target profiles using generative AI. This raises concerns that Whaling attacks based on high-precision profiling by generative AI will become prevalent. In this study, we propose a Whaling countermeasure framework for university faculty members that constructs personalized defense profiles and uses large language model (LLM)-based agents. We design agents that (i) build vulnerability profiles for each target from publicly available information on faculty members, (ii) identify potential risk scenarios relevant to Whaling defense based on those profiles, (iii) construct defense profiles corresponding to the vulnerabilities and anticipated risks, and (iv) analyze Whaling emails using the defense profiles. Furthermore, we conduct a preliminary risk-assessment experiment. The results indicate that the proposed method can produce judgments accompanied by explanations of response policies that are consistent with the work context of faculty members who are Whaling targets. The findings also highlight practical challenges and considerations for future operational deployment and systematic evaluation.
Authors:Qiyue Mei, Michael Fu
Abstract:
Infrastructure as Code (IaC) enables automated provisioning of large-scale cloud and on-premise environments, reducing the need for repetitive manual setup. However, this automation is a double-edged sword: a single misconfiguration in IaC scripts can propagate widely, leading to severe system downtime and security risks. Prior studies have shown that IaC scripts often contain security smells--bad coding patterns that may introduce vulnerabilities--and have proposed static analyzers based on symbolic rules to detect them. Yet, our preliminary analysis reveals that rule-based detection alone tends to over-approximate, producing excessive false positives and increasing the burden of manual inspection. In this paper, we present IntelliSA, an intelligent static analyzer for IaC security smell detection that integrates symbolic rules with neural inference. IntelliSA applies symbolic rules to over-approximate potential smells for broad coverage, then employs neural inference to filter false positives. While an LLM can effectively perform this filtering, reliance on LLM APIs introduces high cost and latency, raises data governance concerns, and limits reproducibility and offline deployment. To address the challenges, we adopt a knowledge distillation approach: an LLM teacher generates pseudo-labels to train a compact student model--over 500x smaller--that learns from the teacher's knowledge and efficiently classifies false positives. We evaluate IntelliSA against two static analyzers and three LLM baselines (Claude-4, Grok-4, and GPT-5) using a human-labeled dataset including 241 security smells across 11,814 lines of real-world IaC code. Experimental results show that IntelliSA achieves the highest F1 score (83%), outperforming baselines by 7-42%. Moreover, IntelliSA demonstrates the best cost-effectiveness, detecting 60% of security smells while inspecting less than 2% of the codebase.
Authors:Ka Lok Wu, Christa Jenkins, Scott D. Stoller, Omar Chowdhury
Abstract:
Robust access control is a cornerstone of secure software, systems, and networks. An access control mechanism is as effective as the policy it enforces. However, authoring effective policies that satisfy desired properties such as the principle of least privilege is a challenging task even for experienced administrators, as evidenced by many real instances of policy misconfiguration. In this paper, we set out to address this pain point by proposing Restricter, which automatically tightens each (permit) policy rule of a policy with respect to an access log, which captures some already exercised access requests and their corresponding access decisions (i.e., allow or deny). Restricter achieves policy tightening by reducing the number of access requests permitted by a policy rule without sacrificing the functionality of the underlying system it is regulating. We implement Restricter for Amazon's Cedar policy language and demonstrate its effectiveness through two realistic case studies.
Authors:Botong Ou, Baijian Yang
Abstract:
As the expansion of IoT connectivity continues to provide quality-of-life improvements around the world, they simultaneously introduce increasing privacy and security concerns. The lack of a clear definition in managing shared and protected access to IoT sensors offer channels by which devices can be compromised and sensitive data can be leaked. In recent years, WebAssembly has received considerable attention for its efficient application sandboxing suitable for embedded systems, making it a prime candidate for exploring a secure and portable sensor interface. This paper introduces the first WebAssembly System Interface (WASI) extension offering a secure, portable, and low-footprint sandbox enabling multi-tenant access to sensor data across heterogeneous embedded devices. The runtime extensions provide application memory isolation, ensure appropriate resource privileges by intercepting sensor access, and offer an MQTT-SN interface enabling in-network access control. When targeting the WebAssembly byte-code with the associated runtime extensions implemented atop the Zephyr RTOS, our evaluation of sensor access indicates a latency overhead of 6% with an additional memory footprint of 5% when compared to native execution. As MQTT-SN requests are dominated by network delays, the WASI-SN implementation of MQTT-SN introduces less than 1% additional latency with similar memory footprint.
Authors:Murilo de Souza Neves, Adilson Luiz Bonifacio
Abstract:
Smart contracts are tools with self-execution capabilities that provide enhanced security compared to traditional contracts; however, their immutability makes post-deployment fault correction extremely complex, highlighting the need for a verification layer prior to this stage. Although formalisms such as Contract Language (CL) enable logical analyses, they prove limited in attributing responsibilities within complex multilateral scenarios. This work presents a proof of concept using the Relativized Contract Language (RCL) and the RECALL tool for the specification and verification of a purchase and sale contract involving multiple agents. The study demonstrates the tool's capability to detect normative conflicts during the modeling phase. After correcting logical inconsistencies, the contract was translated into Solidity and functionally validated within the Remix IDE environment, confirming that prior formal verification is fundamental to ensuring the reliability and security of the final code.
Authors:Ashikuzzaman, Md. Shawkat Hossain, Jubayer Abdullah Joy, Md Zahid Akon, Md Manjur Ahmed, Md. Naimul Islam
Abstract:
The increase in the number of Internet of Things (IoT) devices has tremendously increased the attack surface of cyber threats thus making a strong intrusion detection system (IDS) with a clear explanation of the process essential towards resource-constrained environments. Nevertheless, current IoT IDS systems are usually traded off with detection quality, model elucidability, and computational effectiveness, thus the deployment on IoT devices. The present paper counteracts these difficulties by suggesting an explainable AI (XAI) framework based on an optimized Decision Tree classifier with both local and global importance methods: SHAP values that estimate feature attribution using local explanations, and Morris sensitivity analysis that identifies the feature importance in a global view. The proposed system attains the state of art on the test performance with 99.91% accuracy, F1-score of 99.51% and Cohen Kappa of 0.9960 and high stability is confirmed by a cross validation mean accuracy of 98.93%. Efficiency is also enhanced in terms of computations to provide faster inferences compared to those that are generalized in ensemble models. SrcMac has shown as the most significant predictor in feature analyses according to SHAP and Morris methods. Compared to the previous work, our solution eliminates its major drawback lack because it allows us to apply it to edge devices and, therefore, achieve real-time processing, adhere to the new regulation of transparency in AI, and achieve high detection rates on attacks of dissimilar classes. This combination performance of high accuracy, explainability, and low computation make the framework useful and reliable as a resource-constrained IoT security problem in real environments.
Authors:Ambarish Gurjar, L Jean Camp
Abstract:
Network defenders face a steady stream of attacks, observed as raw Intrusion Detection System (IDS) alerts. The sheer volume of alerts demands prioritization, typically based on high-level risk classifications. This work expands the scope of risk measurement by examining alerts not only through their technical characteristics but also by examining and classifying their temporal patterns. One critical issue in responding to intrusion alerts is determining whether an alert is part of an escalating attack pattern or an opportunistic scan. To identify the former, we apply extreme-regime forecasting methods from financial modeling to IDS data. Extreme-regime forecasting is designed to identify likely future high-impact events or significant shifts in system behavior. Using these methods, we examine attack patterns by computing per-minute alert intensity, volatility, and a short-term momentum measure derived from weighted moving averages. We evaluate the efficacy of a supervised learning model for forecasting future escalation patterns using these derived features. The trained model identifies future high-intensity attacks and demonstrates strong predictive performance, achieving approximately 91\% accuracy, 89\% recall, and 98\% precision. Our contributions provide a temporal measurement framework for identifying future high-intensity attacks and demonstrate the presence of predictive early-warning signals within the temporal structure of IDS alert streams. We describe our methods in sufficient detail to enable reproduction using other IDS datasets. In addition, we make the trained models openly available to support further research. Finally, we introduce an interpretable visualization that enables defenders to generate early predictive warnings of elevated volumetric arrival risk.
Authors:Ruben Neyroud, Sam Corley
Abstract:
While most LLMs are autoregressive, diffusion-based LLMs have recently emerged as an alternative method for generation. Greedy Coordinate Gradient (GCG) attacks have proven effective against autoregressive models, but their applicability to diffusion language models remains largely unexplored. In this work, we present an exploratory study of GCG-style adversarial prompt attacks on LLaDA (Large Language Diffusion with mAsking), an open-source diffusion LLM. We evaluate multiple attack variants, including prefix perturbations and suffix-based adversarial generation, on harmful prompts drawn from the AdvBench dataset. Our study provides initial insights into the robustness and attack surface of diffusion language models and motivates the development of alternative optimization and evaluation strategies for adversarial analysis in this setting.
Authors:Olawale Amos Akanji, Manuel Egele, Gianluca Stringhini
Abstract:
Digital lending applications, commonly referred to as loan apps, have become a primary channel for microcredit in emerging markets. However, many of these apps demand excessive permissions and misuse sensitive user data for coercive debt-recovery practices, including harassment, blackmail, and public shaming that affect both borrowers and their contacts. This paper presents the first cross-country measurement of loan app compliance against both national regulations and Google's Financial Services Policy. We analyze 434 apps drawn from official registries and app markets from Indonesia, Kenya, Nigeria, Pakistan, and the Philippines. To operationalize policy requirements at scale, we translate policy text into testable permission checks using LLM-assisted policy-to-permission mapping and combine this with static and dynamic analyses of loan apps' code and runtime behavior. Our findings reveal pervasive non-compliance among approved apps: 141 violate national regulatory policy and 147 violate Google policy. Dynamic analysis further shows that several apps transmit sensitive data (contacts, SMS, location, media) before user signup or registration, undermining informed consent and enabling downstream harassment of borrowers and third parties. Following our disclosures, Google removed 93 flagged apps from Google Play, representing over 300M cumulative installs. We advocate for adopting our methodology as a proactive compliance-monitoring tool and offer targeted recommendations for regulators, platforms, and developers to strengthen privacy protections. Overall, our results highlight the need for coordinated enforcement and robust technical safeguards to ensure that digital lending supports financial inclusion without compromising user privacy or safety.
Authors:Isabel Straw, Akhil Polamarasetty, Mustafa Jaafar
Abstract:
Individuals experiencing interpersonal violence (IPV), who depend on medical devices, represent a uniquely vulnerable population as healthcare technologies become increasingly connected. Despite rapid growth in MedTech innovation and "health-at-home" ecosystems, the intersection of MedTech cybersecurity and technology-facilitated abuse remains critically under-examined. IPV survivors who rely on therapeutic devices encounter a qualitatively different threat environment from the external, technically sophisticated adversaries typically modeled in MedTech cybersecurity research. We address this gap through two complementary methods: (1) the development of hazard-integrated threat models that fuse Cyber physical system security modeling with tech-abuse frameworks, and (2) an immersive simulation with practitioners, deploying a live version of our model, identifying gaps in digital forensic practice. Our hazard-integrated CIA threat models map exploits to acute and chronic biological effects, uncovering (i) Integrity attack pathways that facilitate "Medical gaslighting" and "Munchausen-by-IoMT", (ii) Availability attacks that create life-critical and sub-acute harms (glycaemic emergencies, blindness, mood destabilization), and (iii) Confidentiality threats arising from MedTech advertisements (geolocation tracking from BLE broadcasts). Our simulation demonstrates that these attack surfaces are unlikely to be detected in practice: participants overlooked MedTech, misclassified reproductive and assistive technologies, and lacked awareness of BLE broadcast artifacts. Our findings show that MedTech cybersecurity in IPV contexts requires integrated threat modeling and improved forensic capabilities for detecting, preserving and interpreting harms arising from compromised patient-technology ecosystems.
Authors:Lirui Zhang, Huishuai Zhang
Abstract:
As LLMs rapidly advance and enter real-world use, their privacy implications are increasingly important. We study an authorship de-anonymization threat: using LLMs to link anonymous documents to their authors, potentially compromising settings such as double-blind peer review. We propose De-Anonymization at Scale (DAS), a large language model-based method for attributing authorship among tens of thousands of candidate texts. DAS uses a sequential progression strategy: it randomly partitions the candidate corpus into fixed-size groups, prompts an LLM to select the text most likely written by the same author as a query text, and iteratively re-queries the surviving candidates to produce a ranked top-k list. To make this practical at scale, DAS adds a dense-retrieval prefilter to shrink the search space and a majority-voting style aggregation over multiple independent runs to improve robustness and ranking precision. Experiments on anonymized review data show DAS can recover same-author texts from pools of tens of thousands with accuracy well above chance, demonstrating a realistic privacy risk for anonymous platforms. On standard authorship benchmarks (Enron emails and blog posts), DAS also improves both accuracy and scalability over prior approaches, highlighting a new LLM-enabled de-anonymization vulnerability.
Authors:Amit Chougule, Vinay Chamola, Norbert Herencsar, Fei Richard Yu
Abstract:
The rapid evolution of the automobile sector, driven by advancements in connected and autonomous vehicles (CAVs), has transformed how vehicles communicate, operate, and interact with their surroundings. Technologies such as Vehicle-to-Everything (V2X) communication enable autonomous cars to generate and exchange substantial amounts of data with real-world entities, enhancing safety, improving performance, and delivering personalized user experiences. However, this data-driven ecosystem introduces significant challenges, particularly concerning data privacy, security, and governance. The absence of transparency and comprehensive regulatory frameworks exacerbates issues of unauthorized data access, prolonged retention, and potential misuse, creating tension between consumer benefits and privacy risks. This review paper explores the multifaceted nature of data sharing in CAVs, analyzing its contributions to innovation and its associated vulnerabilities. It evaluates data-sharing mechanisms and communication technologies, highlights the benefits of data exchange across various use cases, examines privacy concerns and risks of data misuse, and critically reviews regulatory frameworks and their inadequacies in safeguarding user privacy. By providing a thorough analysis of the current state of data sharing in the automotive sector, the paper emphasizes the urgent need for robust policies and ethical data management practices. It calls for striking a balance between fostering technological advancements and ensuring secure, consumer-friendly solutions, paving the way for a trustworthy and innovative automotive future.
Authors:Mohammad Shahid, Paritosh Ramanan, Mohammad Fili, Guiping Hu, Hillel Haim
Abstract:
Analysis of clinical data is a cornerstone of biomedical research with applications in areas such as genomic testing and response characterization of therapeutic drugs. Maintaining strict privacy controls is essential because such data typically contains personally identifiable health information of patients. At the same time, regulatory compliance often requires study managers to demonstrate the integrity and authenticity of participant data used in analyses. Balancing these competing requirements, privacy preservation and verifiable accountability, remains a critical challenge. In this paper, we present CoSMeTIC, a zero-knowledge computational framework that proposes computational Sparse Merkle Trees (SMTs) as a means to generate verifiable inclusion and exclusion proofs for individual participants' data in clinical studies. We formally analyze the zero-knowledge properties of CoSMeTIC and evaluate its computational efficiency through extensive experiments. Using the Kolmogorov-Smirnov and likelihood-ratio hypothesis tests, along with logistic-regression-based genomic analyses on real-world Huntington's disease datasets, we demonstrate that CoSMeTIC achieves strong privacy guarantees while maintaining statistical fidelity. Our results suggest that CoSMeTIC provides a scalable and practical alternative for achieving regulatory compliance with rigorous privacy protection in large-scale clinical research.
Authors:Richik Chakraborty, Lawrence Liu, Syed Hasnain
Abstract:
Personalized health analytics increasingly rely on population benchmarks to provide contextual insights such as ''How do I compare to others like me?'' However, cohort-based aggregation of health data introduces nontrivial privacy risks, particularly in interactive and longitudinal digital platforms. Existing privacy frameworks such as $k$-anonymity and differential privacy provide essential but largely static guarantees that do not fully capture the cumulative, distributional, and tail-dominated nature of re-identification risk in deployed systems. In this work, we present a privacy-preserving cohort analytics framework that combines deterministic cohort constraints, differential privacy mechanisms, and synthetic baseline generation to enable personalized population comparisons while maintaining strong privacy protections. We further introduce a stochastic risk modeling approach that treats re-identification risk as a random variable evolving over time, enabling distributional evaluation through Monte Carlo simulation. Adapting quantitative risk measures from financial mathematics, we define Privacy Loss at Risk (P-VaR) to characterize worst-case privacy outcomes under realistic cohort dynamics and adversary assumptions. We validate our framework through system-level analysis and simulation experiments, demonstrating how privacy-utility tradeoffs can be operationalized for digital health platforms. Our results suggest that stochastic risk modeling complements formal privacy guarantees by providing interpretable, decision-relevant metrics for platform designers, regulators, and clinical informatics stakeholders.
Authors:Shaunak Perni, Minal Shirodkar, Ramdas Karmalli
Abstract:
NoSQL Injection attacks are a class of cybersecurity attacks where an attacker sends a specifically engineered query to a NoSQL database which then performs an unauthorized operation. To defend against such attacks, rule based systems were initially developed but then were found to be ineffective to innovative injection attacks hence a model based approach was developed. Most model based detection systems, during testing gave exponentially positive results but were trained only on the query statement sent to the server. However due to the scarcity of data and class imbalances these model based systems were found to be not effective against all attacks in the real world. This paper explores classifying NoSQL injection attacks sent to a MongoDB server based on Log Data, and other extracted features excluding raw query statements. The log data was collected from a simulated attack on an empty MongoDB server which was then processed and explored. A discriminant analysis was carried out to determine statistically significant features to discriminate between injection and benign queries resulting in a dataset of significant features. Several Machine learning based classification models using an AutoML library, "FLAML", as well as 6 manually programmed models were trained on this dataset , which were then trained on 50 randomized samples of data, cross validated and evaluated. The study found that the best model was the "FLAML" library's "XGBoost limited depth" model with an accuracy of 71%.
Authors:Taehyun Noh, Yingchen Wang, Tal Garfinkel, Mahesh Madhav, Daniel Moghimi, Mattan Erez, Shravan Narayan
Abstract:
We present the first comprehensive analysis of ARM MTE hardware performance on four different microarchitectures: ARM Big (A7x), Little (A5x), and Performance (Cortex-X) cores on the Google Pixel 8 and Pixel 9, and on Ampere Computing's AmpereOne CPU core. We also include preliminary analysis of MTE on Apple's M5 chip. We investigate performance in MTE's primary application -- probabilistic memory safety -- on both SPEC CPU benchmarks and in server workloads such as RocksDB, Nginx, PostgreSQL, and Memcached. While MTE often exhibits modest overheads, we also see performance slowdowns up to 6.64x on certain benchmarks. We identify the microarchitectural cause of these overheads and where they can be addressed in future processors. We then analyze MTE's performance for more specialized security applications such as memory tracing, time-of-check time-of-use prevention, sandboxing, and CFI. In some of these cases, MTE offers significant advantages today, while the benefits for other cases are negligible or will depend on future hardware. Finally, we explore where prior work characterizing MTE performance has either been incomplete or incorrect due to methodological or experimental errors.
Authors:Chetan Pathade, Vinod Dhimam, Sheheryar Ahmad, Ilsa Lareb
Abstract:
Serverless computing has achieved widespread adoption, with over 70% of AWS organizations using serverless solutions [1]. Meanwhile, machine learning inference workloads increasingly migrate to Function-as-a-Service (FaaS) platforms for their scalability and cost-efficiency [2], [3], [4]. However, this convergence introduces critical security challenges, with recent reports showing a 220% increase in AI/ML vulnerabilities [5] and serverless computing's fragmented architecture raises new security concerns distinct from traditional cloud deployments [6], [7]. This paper presents the first comprehensive security analysis of machine learning workloads in serverless environments. We systematically characterize the attack surface across five categories: function-level vulnerabilities (cold start exploitation, dependency poisoning), model-specific threats (API-based extraction, adversarial inputs), infrastructure attacks (cross-function contamination, privilege escalation), supply chain risks (malicious layers, backdoored libraries), and IAM complexity (ephemeral nature, serverless functions). Through empirical assessments across AWS Lambda, Azure Functions, and Google Cloud Functions, we demonstrate real-world attack scenarios and quantify their security impact. We propose Serverless AI Shield (SAS), a multi-layered defense framework providing pre-deployment validation, runtime monitoring, and post-execution forensics. Our evaluation shows SAS achieves 94% detection rates while maintaining performance overhead below 9% for inference latency. We release an open-source security toolkit to enable practitioners to assess and harden their serverless AI deployments, advancing the field toward more resilient cloud-native machine learning systems.
Authors:Stephan Helfrich, Emilia Grass
Abstract:
Considering the increasing frequency of cyberattacks affecting multiple hospitals simultaneously, improving resilience at a network level is essential. Various countermeasures exist to improve resilience against cyberattacks, such as deploying controls that strengthen IT infrastructures to limit their impact, or enabling resource sharing, patient transfers and backup capacities to maintain services of hospitals in response to realized attacks. However, determining the most cost-effective combination among these wide range of countermeasures is a complex challenge, further intensified by constrained budgets and competing priorities between maintaining efficient daily hospital operations and investing in disaster preparedness. To address these challenges, we propose a defender-attacker-defender optimization model that supports decision-makers in identifying effective strategies for improving the resilience of a network of hospitals against cyberattacks. The model explicitly captures interdependence between hospital services and their supporting IT infrastructures. By doing so, cyberattacks can be directly translated into reductions of service capacities, which allows to assess proactive and reactive strategies on both the operational and technical sides within a single framework. Further, time-dependent resilience measures are incorporated as design objectives to account for the mid- to long-term consequences of cyberattacks. The model is validated based on the German hospital network, suggesting that enabling cooperation with backup capacities particularly in urban areas, alongside strengthening of IT infrastructures across all hospitals, are crucial strategies.
Authors:Getoar Sopa, Marco Avella Medina, Cynthia Rush
Abstract:
Differential Privacy (DP) provides a rigorous framework for releasing statistics while protecting individual information present in a dataset. Although substantial progress has been made on differentially private linear regression, existing methods almost exclusively address the item-level DP setting, where each user contributes a single observation. Many scientific and economic applications instead involve longitudinal or panel data, in which each user contributes multiple dependent observations. In these settings, item-level DP offers inadequate protection, and user-level DP - shielding an individual's entire trajectory - is the appropriate privacy notion. We develop a comprehensive framework for estimation and inference in longitudinal linear regression under user-level DP. We propose a user-level private regression estimator based on aggregating local regressions, and we establish finite-sample guarantees and asymptotic normality under short-range dependence. For inference, we develop a privatized, bias-corrected covariance estimator that is automatically heteroskedasticity- and autocorrelation-consistent. These results provide the first unified framework for practical user-level DP estimation and inference in longitudinal linear regression under dependence, with strong theoretical guarantees and promising empirical performance.
Authors:Xianyu Zou, Xiaoli Gong, Jin Zhang, Shiyang Li, Pen-Chung Yew
Abstract:
Virtualization-based binary obfuscation is widely adopted to protect software intellectual property, yet existing approaches leave exception-handling (EH) metadata unprotected to preserve ABI compatibility. This exposed metadata leaks rich structural information, such as stack layouts, control-flow boundaries, and object lifetimes, which can be exploited to facilitate reverse engineering. In this paper, we present XuanJia, a comprehensive VM-based binary obfuscation framework that provides end-to-end protection for both executable code and exception-handling semantics. At the core of XuanJia is ABI-Compliant EH Shadowing, a novel exception-aware protection mechanism that preserves compatibility with unmodified operating system runtimes while eliminating static EH metadata leakage. XuanJia replaces native EH metadata with ABI-compliant shadow unwind information to satisfy OS-driven unwinding, and securely redirects exception handling into a protected virtual machine where the genuine EH semantics are decrypted, reversed, and replayed using obfuscated code. We implement XuanJia from scratch, supporting 385 x86 instruction encodings and 155 VM handler templates, and design it as an extensible research testbed. We evaluate XuanJia across correctness, resilience, and performance dimensions. Our results show that XuanJia preserves semantic equivalence under extensive dynamic and symbolic testing, effectively disrupts automated reverse-engineering tools such as IDA Pro, and incurs negligible space overhead and modest runtime overhead. These results demonstrate that XuanJia achieves strong protection of exception-handling logic without sacrificing correctness or practicality.
Authors:Mithil Bavishi, Anuj Bohra, Kushal Vadodaria, Abhinav Bohra, Neha Katre, Ramchandra Mangrulkar, Vinaya Sawant
Abstract:
Encryption and Decryption is the process of sending a message in a ciphered way that appears meaningless and could be deciphered using a key for security purposes to avoid data breaches. This paper expands on the previous work on Sudoku-based encryption methods, applying it to other forms of media including images, audio and video. It also enhances the security of key generation and usage by making it dependent on the timestamp of when the message was transmitted. It is a versatile system that works on multimodal data and functions as a block-based transposition cipher. Instead of shuffling, it can also employ substitution methods like XOR, making it a substitution cipher. The resulting media are highly encrypted and resilient to brute-force and differential attacks. For images, NPCR values approach 100% and for audio, SNR values exceed 60dB. This makes the encrypted audio significantly different from the source, making decryption more difficult.
Authors:Khushbakht Farooq, Muhammad Ibrahim, Irsa Manzoor, Mukhtaj Khan, Wei Song
Abstract:
The rapid integration of IoT with edge computing has revolutionized various domains, particularly healthcare, by enabling real-time data sharing, remote monitoring, and decision-making. However, it introduces critical challenges, including data privacy breaches, security vulnerabilities, especially in environments dealing with sensitive information. Traditional access control mechanisms and centralized security systems do not address these issues, leaving IoT environments exposed to unauthorized access and data misuse. This research proposes Fuzzychain-edge, a novel Fuzzy logic-based adaptive Access control model for Blockchain in Edge Computing framework designed to overcome these limitations by incorporating Zero-Knowledge Proofs (ZKPs), fuzzy logic, and smart contracts. ZKPs secure sensitive data during access control processes by enabling verification without revealing confidential details, thereby ensuring user privacy. Fuzzy logic facilitates adaptive, context-aware decision-making for access control by dynamically evaluating parameters such as data sensitivity, trust levels, and user roles. Blockchain technology, with its decentralized and immutable architecture, ensures transparency, traceability, and accountability using smart contracts that automate access control processes. The proposed framework addresses key challenges by enhancing security, reducing the likelihood of unauthorized access, and providing a transparent audit trail of data transactions. Expected outcomes include improved data privacy, accuracy in access control, and increased user trust in IoT systems. This research contributes significantly to advancing privacy-preserving, secure, and traceable solutions in IoT environments, laying the groundwork for future innovations in decentralized technologies and their applications in critical domains such as healthcare and beyond.
Authors:Vayur Shanbhag, Prasad Krishnan
Abstract:
We design new minimal-subpacketization schemes for information-theoretic private information retrieval on graph-based replicated databases. In graph-based replication, the system consists of $K$ files replicated across $N$ servers according to a graph with $N$ vertices and $K$ edges. The client wants to retrieve one desired file, while keeping the index of the desired file private from each server via a query-response protocol. We seek PIR protocols that have (a) high rate, which is the ratio of the file-size to the total download cost, and (b) low subpacketization, which acts as a constraint on the size of the files for executing the protocol. We report two new schemes which have unit-subpacketization (which is minimal): (i) for a special class of graphs known as star graphs, and (ii) for general graphs. Our star-graph scheme has a better rate than previously known schemes with low subpacketization for general star graphs. Our scheme for general graphs uses a decomposition of the graph via independent sets. This scheme achieves a rate lower than prior schemes for the complete graph, however it can achieve higher rates than known for some specific graph classes. An extension of our scheme to the case of multigraphs achieves a higher rate than previous schemes for the complete multi-graph.
Authors:Nikita Andrusov, Sevag Büyüksimkeşyan, Dimitrios Noulas, Fabien Pazuki, Mustafa Umut Kazancıoğlu, Jordi Vilà-Casadevall
Abstract:
The Weil pairing on elliptic curves has deep links with discrete logarithm problems. In practice, to better suit the functionalities of cryptosystems, one often needs to modify the original Weil pairing via what is called a distortion map. We propose a study on the question of the existence of distortion maps for elliptic curves over finite fields. We revisit results from the literature and provide detailed proofs. We also propose new perspectives at times.
Authors:Jose Eduardo Ulloa, Diego R. Llanos
Abstract:
This paper documents the installation, configuration, and operation of a full Bitcoin node in a Linux environment, from manual compilation of the source code to complete synchronization with the network. The technical phases of the process are described, the main files generated by Bitcoin Core are analyzed, and the effects of the parameters txindex, prune, dbcache, maxmempool, and maxconnections are empirically studied. System resources during the block download (IBD) mechanism are also documented, and the operational importance of each resource is explained. This paper provides a solid foundation for future research proposals on Bitcoin node performance or for the development of blockchain data query tools.
Authors:Francisco Angulo de Lafuente, Seid Mehammed Abdu, Nirmal Tej
Abstract:
This paper presents SiliconHealth, a comprehensive blockchain-based healthcare infrastructure designed for resource-constrained regions, particularly sub-Saharan Africa. We demonstrate that obsolete Bitcoin mining Application-Specific Integrated Circuits (ASICs) can be repurposed to create a secure, low-cost, and energy-efficient medical records system. The proposed architecture employs a four-tier hierarchical network: regional hospitals using Antminer S19 Pro (90+ TH/s), urban health centers with Antminer S9 (14 TH/s), rural clinics equipped with Lucky Miner LV06 (500 GH/s, 13W), and mobile health points with portable ASIC devices. We introduce the Deterministic Hardware Fingerprinting (DHF) paradigm, which repurposes SHA-256 mining ASICs as cryptographic proof generators, achieving 100% verification rate across 23 test proofs during 300-second validation sessions. The system incorporates Reed-Solomon LSB watermarking for medical image authentication with 30-40% damage tolerance, semantic Retrieval-Augmented Generation (RAG) for intelligent medical record queries, and offline synchronization protocols for intermittent connectivity. Economic analysis demonstrates 96% cost reduction compared to GPU-based alternatives, with total deployment cost of $847 per rural clinic including 5-year solar power infrastructure. Validation experiments on Lucky Miner LV06 (BM1366 chip, 5nm) achieve 2.93 MH/W efficiency and confirm hardware universality. This work establishes a practical framework for deploying verifiable, tamper-proof electronic health records in regions where traditional healthcare IT infrastructure is economically unfeasible, potentially benefiting over 600 million people lacking access to basic health information systems.
Authors:Francesco Capano, Jonas Böhler, Benjamin Weggenmann
Abstract:
In collaborative learning (CL), multiple parties jointly train a machine learning model on their private datasets. However, data can not be shared directly due to privacy concerns. To ensure input confidentiality, cryptographic techniques, e.g., multi-party computation (MPC), enable training on encrypted data. Yet, even securely trained models are vulnerable to inference attacks aiming to extract memorized data from model outputs. To ensure output privacy and mitigate inference attacks, differential privacy (DP) injects calibrated noise during training. While cryptography and DP offer complementary guarantees, combining them efficiently for cryptographic and differentially private CL (CPCL) is challenging. Cryptography incurs performance overheads, while DP degrades accuracy, creating a privacy-accuracy-performance trade-off that needs careful design considerations. This work systematizes the CPCL landscape. We introduce a unified framework that generalizes common phases across CPCL paradigms, and identify secure noise sampling as the foundational phase to achieve CPCL. We analyze trade-offs of different secure noise sampling techniques, noise types, and DP mechanisms discussing their implementation challenges and evaluating their accuracy and cryptographic overhead across CPCL paradigms. Additionally, we implement identified secure noise sampling options in MPC and evaluate their computation and communication costs in WAN and LAN. Finally, we propose future research directions based on identified key observations, gaps and possible enhancements in the literature.
Authors:Fokke Heikamp, Lei Pan, Robin Doss, Rolando Trujillo-Rasua, Sushmita Ruj
Abstract:
Traceability systems have become prevalent in supply chains because of the rapid development of RFID and IoT technologies. These systems facilitate product recall and mitigate problems such as counterfeiting, tampering, and theft by tracking the manufacturing and distribution life-cycle of a product. Therefore, traceability systems are a defense mechanism against supply chain attacks and, consequently, have become a target for attackers to circumvent. For example, a counterfeiter may change the trace of a fake product for the trace of an authentic product, fooling the system into accepting a counterfeit product as legit and thereby giving a false sense of security. This systematic analysis starts with the observation that security requirements in existing traceability solutions are often unstructured or incomplete, leaving critical vulnerabilities unaddressed. We synthesized the properties of current state-of-the-art traceability solutions within a single security framework that allows us to analyze and compare their security claims. Using this framework, we objectively compared the security of $17$ traceability solutions and identified several weaknesses and vulnerabilities. This article reports on these flaws, the methodology we used to identify them, and the first security evaluation of traceability solutions on a large scale.
Authors:Pedro Antonino, Namrata Jain
Abstract:
Zero-Knowledge (ZK) proof systems are cryptographic protocols that can (with overwhelming probability) demonstrate that the pair $(X, W)$ is in a relation $R$ without revealing information about the private input $W$. This membership checking is captured by a complex arithmetic circuit: a set of polynomial equations over a finite field. ZK programming languages, like Noir, have been proposed to simplify the description of these circuits. A developer can write a Noir program using traditional high-level constructs that can be compiled into a lower-level ACIR (Abstract Circuit Intermediate Representation), which is essentially a high-level description of an arithmetic circuit. In this paper, we formalise some of the ACIR language using SMT-LIB and its extended theory of finite fields. We use this formalisation to create an open-source formal verifier for the Noir language using the SMT solver cvc5. Our verifier can be used to check whether Noir programs behave appropriately. For instance, it can be used to check whether a Noir program has been properly constrained, that is, the finite-field polynomial equations generated truly capture the intended relation. We evaluate our verifier over 4 distinct sets of Noir programs, demonstrating its practical applicability and identifying a hard-to-check constraint type that charts an improvement path for our verification framework.
Authors:Dafne Lozano-Paredes, Luis Bote-Curiel, Juan Ramón Feijóo-Martínez, Ismael Gómez-Talal, José Luis Rojo-Álvarez
Abstract:
The IEC 61850 Generic Object-Oriented Substation Event (GOOSE) protocol plays a critical role in real-time protection and automation of digital substations, yet its lack of native security mechanisms can expose power systems to sophisticated cyberattacks. Traditional rule-based and supervised intrusion detection techniques struggle to detect protocol-compliant and zero-day attacks under significant class imbalance and limited availability of labeled data. This paper proposes an explainable, unsupervised multi-view anomaly detection framework for IEC 61850 GOOSE networks that explicitly separates semantic integrity and temporal availability. The approach employs asymmetric autoencoders trained only on real operational GOOSE traffic to learn distinct latent representations of sequence-based protocol semantics and timing-related transmission dynamics in normal traffic. Anomaly detection is implemented using reconstruction errors mixed with statistically grounded thresholds, enabling robust detection without specified attack types. Feature-level reconstruction analysis provides intrinsic explainability by directly linking detection outcomes to IEC 61850 protocol characteristics. The proposed framework is evaluated using real substation traffic for training and a public dataset containing normal traffic and message suppression, data manipulation, and denial-of-service attacks for testing. Experimental results show attack detection rates above 99% with false positives remaining below 5% of total traffic, demonstrating strong generalization across environments and effective operation under extreme class imbalance and interpretable anomaly attribution.
Authors:Miryam Mi-Ying Huang, Er-Cheng Tang
Abstract:
Program obfuscation aims to conceal a program's internal structure while preserving its functionality. A central open problem is whether an obfuscation scheme for arbitrary quantum circuits exists. Despite several efforts having been made toward this goal, prior works have succeeded only in obfuscating quantum circuits that implement either pseudo-deterministic functions or unitary transformations. Although unitary transformations already include a broad class of quantum computation, many important quantum tasks, such as state preparation and quantum error-correction, go beyond unitaries and fall within general completely positive trace-preserving maps. In this work, we construct the first quantum ideal obfuscation scheme for arbitrary quantum circuits that support quantum inputs and outputs in the classical oracle model assuming post-quantum one-way functions, thereby resolving an open problem posed in Bartusek et al. (STOC 2023), Bartusek, Brakerski, and Vaikuntanathan (STOC 2024), and Huang and Tang (FOCS 2025). At the core of our construction lies a novel primitive that we introduce, called the subspace-preserving strong pseudorandom unitary (spsPRU). An spsPRU is a family of efficient unitaries that fix every vector in a given linear subspace $S$, while acting as a Haar random unitary on the orthogonal complement $S^\perp$ under both forward and inverse oracle queries. Furthermore, by instantiating the classical oracle model with the ideal obfuscation scheme for classical circuits proposed by Jain et al. (CRYPTO 2023) and later enhanced by Bartusek et al. (arxiv:2510.05316), our obfuscation scheme can also be realized in the quantumly accessible pseudorandom oracle model.
Authors:Md Mashrur Arifin, Maqsudur Rahman, Nasir U. Eisty
Abstract:
As zero-day Android malware attacks grow more sophisticated, recent research highlights the effectiveness of using image-based representations of malware bytecode to detect previously unseen threats. However, existing studies often overlook how image type and resolution affect detection and ignore valuable textual data in Android Application Packages (APKs), such as permissions and metadata, limiting their ability to fully capture malicious behavior. The integration of multimodality, which combines image and text data, has gained momentum as a promising approach to address these limitations. This paper proposes a multimodal deep learning framework integrating APK images and textual features to enhance Android malware detection. We systematically evaluate various image types and resolutions across different Convolutional Neural Networks (CNN) architectures, including VGG, ResNet-152, MobileNet, DenseNet, EfficientNet-B4, and use LLaMA-2, a large language model, to extract and annotate textual features for improved analysis. The findings demonstrate that RGB images at higher resolutions (e.g., 256x256, 512x512) achieve superior classification performance, while the multimodal integration of image and text using the CLIP model reveals limited potential. Overall, this research highlights the importance of systematically evaluating image attributes and integrating multimodal data to develop effective malware detection for Android systems.
Authors:Sean Siddens, Sanya Srivastava, Reese Levine, Josiah Dykstra, Tyler Sorensen
Abstract:
To improve efficiency, nearly all parallel processing units (CPUs and GPUs) implement relaxed memory models in which memory operations may be re-ordered, i.e., executed out-of-order. Prior testing work in this area found that memory re-orderings are observed more frequently when other cores are active, e.g., stressing the memory system, which likely triggers aggressive hardware optimizations. In this work, we present Memory DisOrder: a timerless side-channel that uses memory re-orderings to infer activity on other processes. We first perform a fuzzing campaign and show that many mainstream processors (X86/Arm/Apple CPUs, NVIDIA/AMD/Apple GPUs) are susceptible to cross-process signals. We then show how the vulnerability can be used to implement classic attacks, including a covert channel, achieving up to 16 bits/second with 95% accuracy on an Apple M3 GPU, and application fingerprinting, achieving reliable closed-world DNN architecture fingerprinting on several CPUs and an Apple M3 GPU. Finally, we explore how low-level system details can be exploited to increase re-orderings, showing the potential for a covert channel to achieve nearly 30K bits/second on X86 CPUs. More precise attacks can likely be developed as the vulnerability becomes better understood.
Authors:Lorenzo Casalino, Maria Méndez Real, Jean-Christophe Prévotet, Rubén Salvador
Abstract:
Deep neural networks (DNNs), which support services such as driving assistants and medical diagnoses, undergo lengthy and expensive training procedures. Therefore, the training's outcome - the DNN weights - represents a significant intellectual property asset to protect. Side-channel analysis (SCA) has recently appeared as an effective approach to recover this confidential asset from DNN implementations. In response, researchers have proposed to defend DNN implementations through classic side-channel countermeasures, at the cost of higher energy consumption, inference time, and resource utilisation. Following a different approach, Ding et al. (HOST'25) introduced MACPRUNING, a novel SCA countermeasure based on pruning, a performance-oriented Approximate Computing technique: at inference time, the implementation randomly prunes (or skips) non-important weights (i.e., with low contribution to the DNN's accuracy) of the first layer, exponentially increasing the side-channel resilience of the protected DNN implementation. However, the original security analysis of MACPRUNING did not consider a control-flow dependency intrinsic to the countermeasure design. This dependency may allow an attacker to circumvent MACPRUNING and recover the weights important to the DNN's accuracy. This paper describes a preprocessing methodology to exploit the above-mentioned control-flow dependency. Through practical experiments on a Chipwhisperer-Lite running a MACPRUNING-protected Multi-Layer Perceptron, we target the first 8 weights of each neuron and recover 96% of the important weights, demonstrating the drastic reduction in security of the protected implementation. Moreover, we show how microarchitectural leakage improves the effectiveness of our methodology, even allowing for the recovery of up to 100% of the targeted non-important weights. Lastly, by adapting our methodology [continue in pdf].
Authors:Yongtong Gu, Songze Li, Xia Hu
Abstract:
The increasing misuse of AI-generated texts (AIGT) has motivated the rapid development of AIGT detection methods. However, the reliability of these detectors remains fragile against adversarial evasions. Existing attack strategies often rely on white-box assumptions or demand prohibitively high computational and interaction costs, rendering them ineffective under practical black-box scenarios. In this paper, we propose Multi-stage Alignment for Style Humanization (MASH), a novel framework that evades black-box detectors based on style transfer. MASH sequentially employs style-injection supervised fine-tuning, direct preference optimization, and inference-time refinement to shape the distributions of AI-generated texts to resemble those of human-written texts. Experiments across 6 datasets and 5 detectors demonstrate the superior performance of MASH over 11 baseline evaders. Specifically, MASH achieves an average Attack Success Rate (ASR) of 92%, surpassing the strongest baselines by an average of 24%, while maintaining superior linguistic quality.
Authors:Gal Engelberg, Konstantin Koutsyi, Leon Goldberg, Reuven Elezra, Idan Pinto, Tal Moalem, Shmuel Cohen, Yoni Weintrob
Abstract:
Identity Security Posture Management (ISPM) is a core challenge for modern enterprises operating across cloud and SaaS environments. Answering basic ISPM visibility questions, such as understanding identity inventory and configuration hygiene, requires interpreting complex identity data, motivating growing interest in agentic AI systems. Despite this interest, there is currently no standardized way to evaluate how well such systems perform ISPM visibility tasks on real enterprise data. We introduce the Sola Visibility ISPM Benchmark, the first benchmark designed to evaluate agentic AI systems on foundational ISPM visibility tasks using a live, production-grade identity environment spanning AWS, Okta, and Google Workspace. The benchmark focuses on identity inventory and hygiene questions and is accompanied by the Sola AI Agent, a tool-using agent that translates natural-language queries into executable data exploration steps and produces verifiable, evidence-backed answers. Across 77 benchmark questions, the agent achieves strong overall performance, with an expert accuracy of 0.84 and a strict success rate of 0.77. Performance is highest on AWS hygiene tasks, where expert accuracy reaches 0.94, while results on Google Workspace and Okta hygiene tasks are more moderate, yet competitive. Overall, this work provides a practical and reproducible benchmark for evaluating agentic AI systems in identity security and establishes a foundation for future ISPM benchmarks covering more advanced identity analysis and governance tasks.
Authors:Xiangyu Liu, Brian Lee, Yuansong Qiao
Abstract:
The rapid development of Internet of Things (IoT) technology has led to growing concerns about data security and user privacy in the interactions within distributed systems. Decentralized Applications (DApps) in distributed systems consist of on-chain and off-chain functions, where on-chain functions are smart contracts running in the blockchain network, while off-chain functions operate outside the blockchain. Since smart contracts cannot access off-chain information, they cannot verify whether the off-chain functions, i.e. the software components, they interact with have been tampered or not. As a result, establishing mutual trust between the on-chain smart contracts and the off-chain functions remains a significant challenge. To address the challenge, this paper introduces TeeMAF, a generic framework for mutual attestation between on-chain and off-chain functions, leveraging Trusted Execution Environments (TEE), specifically Intel Software Guard Extensions (SGX), SCONE (a TEE container on top of Intel SGX), and remote attestation technologies. This ensures that the deployed off-chain functions of a DApp execute in a provably secure computing environment and achieve mutual attestation with the interacting on-chain functions. Through a security analysis of TeeMAF, the reliability of deployed DApps can be verified, ensuring their correct execution. Furthermore, based on this framework, this paper proposes a decentralized resource orchestration platform (a specific DApp) for deploying applications over untrusted environments. The system is implemented on Ethereum and benchmarked using Hyperledger Caliper. Performance evaluation focusing on throughput and latency demonstrates that, compared to platforms without a mutual attestation scheme, the performance overhead remains within an acceptable range.
Authors:Pavel Velek, Tomáš Rabas, Jiří Buček
Abstract:
The Hamming Quasi-Cyclic (HQC) cryptosystem was selected for standardization in the fourth round of the NIST Post-Quantum Cryptography (PQC) standardization project. The goal of the PQC project is to standardize one or more quantum-resistant public-key cryptographic algorithms. In this paper, we present a single-trace Simple Power Analysis (SPA) attack against HQC that exploits power consumption leakage that occurs during polynomial multiplication performed at the beginning of HQC decryption. Using the ChipWhisperer-Lite board, we perform and evaluate the attack, achieving a 99.69% success rate over 10 000 attack attempts. We also propose various countermeasures against the attack and evaluate their time complexity.
Authors:Gaohao Cui, Jianing Li, Jincheng Zhuang
Abstract:
The shortest vector problem (SVP) over ideal lattices is closely related to the Ring-LWE problem, which is widely used to build post-quantum cryptosystems. Power-of-two cyclotomic fields are frequently adopted to instantiate Ring-LWE. Pan et al. (EUROCRYPT~2021) explored the SVP over ideal lattices via the decomposition fields and, in particular determined the length of the shortest vector in prime ideals lying over rational primes $p\equiv3,5\pmod{8}$ in power-of-two cyclotomic fields via explicit construction of reduced lattice bases. In this work, we first provide a new method (different from analyzing lattice bases) to analyze the length of the shortest vector in prime ideals in $\mathbb{Z}[ζ_{2^{n+1}}]$ when $p\equiv3,5\pmod{8}$. Then we precisely characterize the length of the shortest vector in the cases of $p\equiv7,9\pmod{16}$. Furthermore, we derive a new upper bound $\sqrt[4]{2^{2n+1}p}$ for this length, which is tighter than the bound $2^n\sqrt[4]{p}$ obtained from Minkowski's theorem. Our key technique is to investigate whether a generator of a principal ideal can achieve the shortest length after embedding as a vector. If this holds for the ideal, finding the shortest vector in this ideal can be reduced to finding its shortest generator.
Authors:Emre Balci, Timucin Aydede, Gorkem Yilmaz, Ece Gelal Soyak
Abstract:
Smart contract technology facilitates self-executing agreements on the blockchain, eliminating dependency on an external trusted authority. However, smart contracts may expose vulnerabilities that can lead to financial losses and disruptions in decentralized applications. In this work, we evaluate deep learning-based approaches for vulnerability scanning of Ethereum smart contracts. We propose VASCOT, a Vulnerability Analyzer for Smart COntracts using Transformers, which performs sequential analysis of Ethereum Virtual Machine (EVM) bytecode and incorporates a sliding window mechanism to overcome input length constraints. To assess VASCOT's detection efficacy, we construct a dataset of 16,469 verified Ethereum contracts deployed in 2022, and annotate it using trace analysis with concrete validation to mitigate false positives. VASCOT's performance is then compared against a state-of-the-art LSTM-based vulnerability detection model on both our dataset and an older public dataset. Our findings highlight the strengths and limitations of each model, providing insights into their detection capabilities and generalizability.
Authors:Karthikeyan V. R., Premnath S., Kavinraaj S., J. Sangeetha
Abstract:
Fraudulent activities on digital banking services are becoming more intricate by the day, challenging existing defenses. While older rule driven methods struggle to keep pace, even precision focused algorithms fall short when new scams are introduced. These tools typically overlook subtle shifts in criminal behavior, missing crucial signals. Because silent breaches cost institutions far more than flagged but legitimate actions, catching every possible case is crucial. High sensitivity to actual threats becomes essential when oversight leads to heavy losses. One key aim here involves reducing missed fraud cases without spiking incorrect alerts too much. This study builds a system using group learning methods adjusted through smart threshold choices. Using real world transaction records shared openly, where cheating acts rarely appear among normal activities, tests are run under practical skewed distributions. The outcomes reveal that approximately 98 percent of actual fraud is detected, outperforming standard setups that rely on unchanging rules when dealing with uneven examples across classes. When tested in live settings, the fraud detection system connects directly to an online banking transaction flow, stopping questionable activities before they are completed. Alongside this setup, a browser add on built for Chrome is designed to flag deceptive web links and reduce threats from harmful sites. These results show that adjusting decisions by cost impact and validating across entire systems makes deployment more stable and realistic for today's digital banking platforms.
Authors:Harshil Parmar, Pushti Vyas, Prayers Khristi, Priyank Panchal
Abstract:
As vulnerability research increasingly adopts generative AI, a critical reliance on opaque model outputs has emerged, creating a "trust gap" in security automation. We address this by introducing Zer0n, a framework that anchors the reasoning capabilities of Large Language Models (LLMs) to the immutable audit trails of blockchain technology. Specifically, we integrate Gemini 2.0 Pro for logic-based vulnerability detection with the Avalanche C-Chain for tamper-evident artifact logging. Unlike fully decentralized solutions that suffer from high latency, Zer0n employs a hybrid architecture: execution remains off-chain for performance, while integrity proofs are finalized on-chain. Our evaluation on a dataset of 500 endpoints reveals that this approach achieves 80% detection accuracy with only a marginal 22.9% overhead, effectively demonstrating that decentralized integrity can coexist with high-speed security workflows.
Authors:Fabian Walke, Thaddäa Nürnberger
Abstract:
This paper provides a comprehensive literature review on the belief in false information, including misinformation, disinformation, and fake information. It addresses the increasing societal concern regarding false information, which is fueled by technological progress, especially advancements in artificial intelligence. This review systematically identifies and categorizes factors that influence the belief in false information. The review identifies 24 influence factors grouped into six main categories: demographic factors, personality traits, psychological factors, policy and values, media consumption, and preventive factors. Key findings highlight that lower education levels, high extraversion, low agreeableness, high neuroticism, and low cognitive reflection significantly increase belief in false information. The effectiveness of preventive strategies like labeling false information and promoting reflection about correctness is also discussed. This literature review conceptualizes belief in false information as a human-centered security risk in sociotechnical systems, as it can be exploited to manipulate decisions, undermine trust, and increase susceptibility to social engineering. It aims to inform preventive strategies that strengthen socio-technical security and societal resilience.
Authors:Xing Zhou, Dmitrii Ustiugov, Haoxin Shang, Kisson Lin
Abstract:
AI memory systems are evolving toward unified context layers that enable efficient cross-agent collaboration and multi-tool workflows, facilitating better accumulation of personal data and learning of user preferences. However, centralization creates a trust crisis where users must entrust cloud providers with sensitive digital memory data. We identify a core tension between personalization demands and data sovereignty: centralized memory systems enable efficient cross-agent collaboration but expose users' sensitive data to cloud provider risks, while private deployments provide security but limit collaboration. To resolve this tension, we aim to achieve local-equivalent security while enabling superior maintenance efficiency and collaborative capabilities. We propose a five-layer architecture abstracting common functional components of AI memory systems: Storage, Extraction, Learning, Retrieval, and Governance. By applying TEE protection to each layer, we establish a trustworthy framework. Based on this, we design MemTrust, a hardware-backed zero-trust architecture that provides cryptographic guarantees across all layers. Our contributions include the five-layer abstraction, "Context from MemTrust" protocol for cross-application sharing, side-channel hardened retrieval with obfuscated access patterns, and comprehensive security analysis. The architecture enables third-party developers to port existing systems with acceptable development costs, achieving system-wide trustworthiness. We believe that AI memory plays a crucial role in enhancing the efficiency and collaboration of agents and AI tools. AI memory will become the foundational infrastructure for AI agents, and MemTrust serves as a universal trusted framework for AI memory systems, with the goal of becoming the infrastructure of memory infrastructure.
Authors:Jinduo Guo, Yinzhi Cao
Abstract:
The Newton method has been widely adopted to achieve certified unlearning. A critical assumption in existing approaches is that the data requested for unlearning are selected i.i.d.(independent and identically distributed). However,the problem of certified unlearning under non-i.i.d. deletions remains largely unexplored. In practice, unlearning requests are inherently biased, leading to non-i.i.d. deletions and causing distribution shifts between the original and retained datasets. In this paper, we show that certified unlearning with the Newton method becomes inefficient and ineffective under non-i.i.d. unlearning sets. We then propose a better certified unlearning approach by performing a distribution-aware certified unlearning framework based on iterative Newton updates constrained by a trust region. Our method provides a closer approximation to the retrained model and yields a tighter pre-run bound on the gradient residual, thereby ensuring efficient (epsilon, delta)-certified unlearning. To demonstrate its practical effectiveness under distribution shift, we also conduct extensive experiments across multiple evaluation metrics, providing a comprehensive assessment of our approach.
Authors:Vasanth Iyer, Leonardo Bobadilla, S. S. Iyengar
Abstract:
Large Language Models (LLMs) such as Gemma-2B have shown strong performance in various natural language processing tasks. However, general-purpose models often lack the domain expertise required for cybersecurity applications. This work presents a methodology to fine-tune the Gemma-2B model into a domain-specific cybersecurity LLM. We detail the processes of dataset preparation, fine-tuning, and synthetic data generation, along with implications for real-world applications in threat detection, forensic investigation, and attack analysis. Experiments highlight challenges in prompt length distribution during domain-specific fine-tuning. Uneven prompt lengths limit the model's effective use of the context window, constraining local inference to 200-400 tokens despite hardware support for longer sequences. Chain-of-thought styled prompts, paired with quantized weights, yielded the best performance under these constraints. To address context limitations, we employed a hybrid strategy using cloud LLMs for synthetic data generation and local fine-tuning for deployment efficiency. To extend the evaluation, we introduce a Retrieval-Augmented Generation (RAG) pipeline and graph-based reasoning framework. This approach enables structured alignment with MITRE ATT&CK techniques through STIX-based threat intelligence, enhancing recall in multi-hop and long-context scenarios. Graph modules encode entity-neighborhood context and tactic chains, helping mitigate the constraints of short prompt windows. Results demonstrate improved model alignment with tactic, technique, and procedure (TTP) coverage, validating the utility of graph-augmented LLMs in cybersecurity threat intelligence applications.
Authors:Boutaina Jebari, Khalil Ibrahimi, Hamidou Tembine, Mounir Ghogho
Abstract:
Public blockchains, though renowned for their transparency and immutability, suffer from significant privacy concerns. Network-level analysis and long-term observation of publicly available transactions can often be used to infer user identities. To mitigate this, several blockchain applications rely on relayers, which serve as intermediary nodes between users and smart contracts deployed on the blockchain. However, dependence on a single relayer not only creates a single point of failure but also introduces exploitable vulnerabilities that weaken the system's privacy guarantees. This paper proposes a decentralized relayer architecture that enhances privacy and reliability through game-theoretic incentive design. We model the interaction among relayers as a non-cooperative game and design an incentive mechanism in which probabilistic uploading emerges as a unique mixed Nash equilibrium. Using evolutionary game analysis, we demonstrate the equilibrium's stability against perturbations and coordinated deviations. Through numerical evaluations, we analyze how equilibrium strategies and system behavior evolve with key parameters such as the number of relayers, upload costs, rewards, and penalties. In particular, we show that even with high transaction costs, the system maintains reliability with an outage probability below 0.05 . Furthermore, our results highlight a fundamental trade-off between privacy, reliability, robustness, and cost in decentralized relayer systems.
Authors:Ahmed M. Abdelmagid, Barry C. Ezell, Michael McShane
Abstract:
Small-Medium Businesses (SMBs) are essential to global economies yet remain highly vulnerable to cyberattacks due to limited budgets, inadequate cybersecurity expertise, and underestimation of cyber risks. Their increasing reliance on digital infrastructures has expanded their attack surfaces, exposing them to sophisticated and evolving threats. Consequently, implementing proactive, adaptive security measures has become imperative. This research investigates the effectiveness of Zero Trust Architecture (ZTA) as a sustainable cybersecurity solution tailored to SMBs. While ZTA adoption has been examined broadly, the specific financial, organizational, and capability constraints of SMBs remain underexplored. This study develops an integrated predictive model to assess both the feasibility and risk-mitigation potential of ZTA implementation. The model consists of two sub-models. The first sub-model evaluates the probability of successful ZTA adoption considering implied barriers, and the second tests the effectiveness of ZTA in responding to prevalent cyberattacks. The integrated model predicts the risk level in the presence of ZTA and quantifies the uncertainty of the extent to which ZTA can enhance SMBs' cyber resilience, contributing novel insights for practitioners and stakeholders seeking to enhance compliance with policies, risk, and governance activities in SMBs.
Authors:Imtiaz Ali Soomro, Hamood Ur Rehman, S. Jawad Hussain ID, Adeel Iqbal, Waqas Khalid, Heejung Yu ID
Abstract:
The rapid proliferation of Internet of Things (IoT) devices across domains such as smart homes, industrial control systems, and healthcare networks has significantly expanded the attack surface for cyber threats, including botnet-driven distributed denial-of-service (DDoS), malware injection, and data exfiltration. Conventional intrusion detection systems (IDS) face critical challenges like privacy, scalability, and robustness when applied in such heterogeneous IoT environments. To address these issues, we propose SecureDyn-FL, a comprehensive and robust privacy-preserving federated learning (FL) framework tailored for intrusion detection in IoT networks. SecureDyn-FL is designed to simultaneously address multiple security dimensions in FL-based IDS: (1) poisoning detection through dynamic temporal gradient auditing, (2) privacy protection against inference and eavesdropping attacks through secure aggregation, and (3) adaptation to heterogeneous non-IID data via personalized learning. The framework introduces three core contributions: (i) a dynamic temporal gradient auditing mechanism that leverages Gaussian mixture models (GMMs) and Mahalanobis distance (MD) to detect stealthy and adaptive poisoning attacks, (ii) an optimized privacy-preserving aggregation scheme based on transformed additive ElGamal encryption with adaptive pruning and quantization for secure and efficient communication, and (iii) a dual-objective personalized learning strategy that improves model adaptation under non-IID data using logit-adjusted loss. Extensive experiments on the N-BaIoT dataset under both IID and non-IID settings, including scenarios with up to 50% adversarial clients, demonstrate that SecureDyn-FL consistently outperforms state-of-the-art FL-based IDS defenses.
Authors:Rakesh Keshava, Sathish Kuppan Pandurangan, M. Sakthivanitha, Sankaranainar Parmsivan, Goutham Sunkara, R. Maruthi
Abstract:
The rise in frequency and complexity of malware attacks are viewed as a major threat to modern digital infrastructure, which means that traditional signature-based detection methods are becoming less effective. As cyber threats continue to evolve, there is a growing need for intelligent systems to accurately and proactively identify and prevent malware infections. This study presents a new hybrid context-aware malware detection framework(HCAMDF) based on artificial intelligence (AI), which combines static file analysis, dynamic behavioural analysis, and contextual metadata to provide more accurate and timely detection. HCADMF has a multi-layer architecture, which consists of lightweight static classifiers such as Long Short Term Memory (LSTM) for real-time behavioral analysis, and an ensemble risk scoring through the integration of multiple layers of prediction. Experimental evaluations of the new/methodology with benchmark datasets, EMBER and CIC-MalMem2022, showed that the new approach provides superior performances with an accuracy of 97.3%, only a 1.5% false positive rate and minimal detection delay compared to several existing machine learning(ML) and deep learning(DL) established methods in the same fields. The results show strong evidence that hybrid AI can detect both existing and novel malware variants, and lay the foundation on intelligent security systems that can enable real-time detection and adapt to a rapidly evolving threat landscape.
Authors:Polra Victor Falade, Oluwafemi Osho
Abstract:
This paper examines Nigeria's pursuit of digital sovereignty through two core instruments: the Cybercrimes (Prohibition, Prevention, etc.) Act and the National Cybersecurity Policy and Strategy (NCPS). Despite recent reforms, it remains unclear whether these frameworks effectively secure Nigeria's digital domain and advance its digital sovereignty amid escalating cross-border cyber threats. Using a multi-method, triangulated qualitative design that combines document analysis, secondary analysis of existing studies, expert insights, and direct observation of cybersecurity developments, the paper assesses how these instruments operate in practice. The Cybercrimes Act (2015, amended 2024) and NCPS (2015, revised 2021) have strengthened Nigeria's commitments to tackling cybercrime, regulating digital activities, and protecting critical infrastructure. Yet persistent gaps remain, including legislative ambiguities, weak enforcement, uneven threat prioritization, limited institutional coordination, and loss of skilled professionals. The paper argues that achieving digital sovereignty will require stronger implementation, sustainable resourcing, workforce retention, and clearer accountability mechanisms to translate policy ambition into tangible and durable security outcomes.
Authors:Sahibpreet Singh, Lalita Devi
Abstract:
This paper examines the admissibility of AI-generated forensic evidence in criminal trials. The growing adoption of AI presents promising results for investigative efficiency. Despite advancements, significant research gaps persist in practically understanding the legal limits of AI evidence in judicial processes. Existing literature lacks focused assessment of the evidentiary value of AI outputs. The objective of this study is to evaluate whether AI-generated evidence satisfies established legal standards of reliability. The methodology involves a comparative doctrinal legal analysis of evidentiary standards across common law jurisdictions. Preliminary results indicate that AI forensic tools can enhance scale of evidence analysis. However, challenges arise from reproducibility deficits. Courts exhibit variability in acceptance of AI evidence due to limited technical literacy and lack of standardized validation protocols. Liability implications reveal that developers and investigators may bear accountability for flawed outputs. This raises critical concerns related to wrongful conviction. The paper emphasizes the necessity of independent validation and, development of AI-specific admissibility criteria. Findings inform policy development for the responsible AI integration within criminal justice systems. The research advances the objectives of Sustainable Development Goal 16 by reinforcing equitable access to justice. Preliminary results contribute for a foundation for future empirical research in AI deployed criminal forensics.
Authors:Adrian Serrano, Erwan Umlil, Ronan Thomas
Abstract:
Deepfake detection systems deployed in real-world environments are subject to adversaries capable of crafting imperceptible perturbations that degrade model performance. While adversarial training is a widely adopted defense, its effectiveness under realistic conditions -- where attackers operate with limited knowledge and mismatched data distributions - remains underexplored. In this work, we extend the DUMB -- Dataset soUrces, Model architecture and Balance - and DUMBer methodology to deepfake detection. We evaluate detectors robustness against adversarial attacks under transferability constraints and cross-dataset configuration to extract real-world insights. Our study spans five state-of-the-art detectors (RECCE, SRM, XCeption, UCF, SPSL), three attacks (PGD, FGSM, FPBA), and two datasets (FaceForensics++ and Celeb-DF-V2). We analyze both attacker and defender perspectives mapping results to mismatch scenarios. Experiments show that adversarial training strategies reinforce robustness in the in-distribution cases but can also degrade it under cross-dataset configuration depending on the strategy adopted. These findings highlight the need for case-aware defense strategies in real-world applications exposed to adversarial attacks.
Authors:Federico Mazzone, Giorgio Micali, Massimiliano Pronesti
Abstract:
We introduce the first method for change-point detection on encrypted time series. Our approach employs the CKKS homomorphic encryption scheme to detect shifts in statistical properties (e.g., mean, variance, frequency) without ever decrypting the data. Unlike solutions based on differential privacy, which degrade accuracy through noise injection, our solution preserves utility comparable to plaintext baselines. We assess its performance through experiments on both synthetic datasets and real-world time series from healthcare and network monitoring. Notably, our approach can process one million points within 3 minutes.
Authors:Ahmad Alobaid, Martí Jordà Roca, Carlos Castillo, Joan Vendrell
Abstract:
The availability of Large Language Models (LLMs) has led to a new generation of powerful chatbots that can be developed at relatively low cost. As companies deploy these tools, security challenges need to be addressed to prevent financial loss and reputational damage. A key security challenge is jailbreaking, the malicious manipulation of prompts and inputs to bypass a chatbot's safety guardrails. Multi-turn attacks are a relatively new form of jailbreaking involving a carefully crafted chain of interactions with a chatbot. We introduce Echo Chamber, a new multi-turn attack using a gradual escalation method. We describe this attack in detail, compare it to other multi-turn attacks, and demonstrate its performance against multiple state-of-the-art models through extensive evaluation.
Authors:Tooba Qasim, Vasilios A. Siris, Izak Oosthuizen, Muttukrishnan Rajarajan, Sujit Biswas
Abstract:
Biometric authentication has become integral to digital identity systems, particularly in smart cities where it en-ables secure access to services across governance, trans-portation, and public infrastructure. Centralised archi-tectures, though widely used, pose privacy and scalabil-ity challenges due to the aggregation of sensitive biomet-ric data. Decentralised identity frameworks offer better data sovereignty and eliminate single points of failure but introduce new security concerns, particularly around mu-tual trust among distributed devices. In such environments, biometric sensors and verification agents must authenticate one another before sharing sensitive biometric data. Ex-isting authentication schemes rely on classical public key infrastructure, which is increasingly susceptible to quan-tum attacks. This work addresses this gap by propos-ing a quantum-secure communication protocol for decen-tralised biometric systems, built upon an enhanced Quan-tum Key Distribution (QKD) system. The protocol incorpo-rates quantum-resilient authentication at both the classical and quantum layers of QKD: post-quantum cryptography (PQC) is used to secure the classical channel, while authen-tication qubits verify the integrity of the quantum channel. Once trust is established, QKD generates symmetric keys for encrypting biometric data in transit. Qiskit-based sim-ulations show a key generation rate of 15 bits/sec and 89% efficiency. This layered, quantum-resilient approach offers scalable, robust authentication for next-generation smart city infrastructures.
Authors:Mohamed Nabeel, Oleksii Starov
Abstract:
According to Gartner, more than 70% of organizations will have integrated AI models into their workflows by the end of 2025. In order to reduce cost and foster innovation, it is often the case that pre-trained models are fetched from model hubs like Hugging Face or TensorFlow Hub. However, this introduces a security risk where attackers can inject malicious code into the models they upload to these hubs, leading to various kinds of attacks including remote code execution (RCE), sensitive data exfiltration, and system file modification when these models are loaded or executed (predict function). Since AI models play a critical role in digital transformation, this would drastically increase the number of software supply chain attacks. While there are several efforts at detecting malware when deserializing pickle based saved models (hiding malware in model parameters), the risk of abusing DL APIs (e.g. TensorFlow APIs) is understudied. Specifically, we show how one can abuse hidden functionalities of TensorFlow APIs such as file read/write and network send/receive along with their persistence APIs to launch attacks. It is concerning to note that existing scanners in model hubs like Hugging Face and TensorFlow Hub are unable to detect some of the stealthy abuse of such APIs. This is because scanning tools only have a syntactically identified set of suspicious functionality that is being analysed. They often do not have a semantic-level understanding of the functionality utilized. After demonstrating the possible attacks, we show how one may identify potentially abusable hidden API functionalities using LLMs and build scanners to detect such abuses.
Authors:Israt Jahan Chowdhury, Md Abu Yousuf Tanvir
Abstract:
Detection systems that utilize machine learning are progressively implemented at Security Operations Centers (SOCs) to help an analyst to filter through high volumes of security alerts. Practically, such systems tend to reveal probabilistic results or confidence scores which are ill-calibrated and hard to read when under pressure. Qualitative and survey based studies of SOC practice done before reveal that poor alert quality and alert overload greatly augment the burden on the analyst, especially when tool outputs are not coherent with decision requirements, or signal noise. One of the most significant limitations is that model confidence is usually shown without expressing that there are asymmetric costs in decision making where false alarms are much less harmful than missed attacks. The present paper presents a decision-sensitive trust signal correspondence scheme of SOC alert triage. The framework combines confidence that has been calibrated, lightweight uncertainty cues, and cost-sensitive decision thresholds into coherent decision-support layer, instead of making changes to detection models. To enhance probabilistic consistency, the calibration is done using the known post-hoc methods and the uncertainty cues give conservative protection in situations where model certainty is low. To measure the model-independent performance of the suggested model, we apply the Logistic Regression and the Random Forest classifiers to the UNSW-NB15 intrusion detection benchmark. According to simulation findings, false negatives are greatly amplified by the presence of misaligned displays of confidence, whereas cost weighted loss decreases by orders of magnitude between models with decision aligned trust signals. Lastly, we describe a human-in-the-loop study plan that would allow empirically assessing the decision-making of the analysts with aligned and misaligned trust interfaces.
Authors:Firdous Kausar, Asmah Muallem, Naw Safrin Sattar, Mohamed Zakaria Kurdi
Abstract:
We present a hybrid framework for adaptive insider-threat detection that tightly integrates multi-agent simulation (MAS), layered Security Information and Event Management (SIEM) correlation, behavioral and communication forensics, trust-aware machine learning, and Theory-of-Mind (ToM) reasoning. Intelligent agents operate in a simulated enterprise environment, generating both behavioral events and cognitive intent signals that are ingested by a centralized SIEM. We evaluate four system variants: a Layered SIEM-Core (LSC) baseline, a Cognitive-Enriched SIEM (CE-SIEM) incorporating ToM and communication forensics, an Evidence-Gated SIEM (EG-SIEM) introducing precision-focused validation mechanisms, and an Enron-enabled EG-SIEM (EG-SIEM-Enron) that augments evidence gating with a pretrained email forensics module calibrated on Enron corpora. Across ten simulation runs involving eight malicious insiders, CE-SIEM achieves perfect recall (1.000) and improves actor-level F1 from 0.521 (LSC) to 0.774. EG-SIEM raises actor-level F1 to 0.922 and confirmed-alert precision to 0.997 while reducing false positives to 0.2 per run. EG-SIEM-Enron preserves high precision (1.000 confirmed-alert precision; 0.0 false positives per run), slightly improves actor-level F1 to 0.933, and reduces detection latency (average TTD 10.26 steps versus 15.20 for EG-SIEM). These results demonstrate that cognitive context improves sensitivity, evidence-gated validation enables high-precision, low-noise detection, and pretrained communication calibration can further accelerate high-confidence insider threat identification.
Authors:Yannick Landeck, Dian Balta, Martin Wimmer, Christian Knierim
Abstract:
In operational technology (OT) contexts, containerised applications often require elevated privileges to access low-level network interfaces or perform administrative tasks such as application monitoring. These privileges reduce the default isolation provided by containers and introduce significant security risks. Security risk identification for OT container deployments is challenged by hybrid IT/OT architectures, fragmented stakeholder knowledge, and continuous system changes. Existing approaches lack reproducibility, interpretability across contexts, and technical integration with deployment artefacts. We propose a model-based approach, implemented as the Container Security Risk Ontology (CSRO), which integrates five key domains: adversarial behaviour, contextual assumptions, attack scenarios, risk assessment rules, and container security artefacts. Our evaluation of CSRO in a case study demonstrates that the end-to-end formalisation of risk calculation, from artefact to risk level, enables automated and reproducible risk identification. While CSRO currently focuses on technical, container-level treatment measures, its modular and flexible design provides a solid foundation for extending the approach to host-level and organisational risk factors.
Authors:Praneeta K Maganti, Daisuke Mashima, Rajib Ranjan Maiti
Abstract:
Smart grids are increasingly exposed to sophisticated cyber threats due to their reliance on interconnected communication networks, as demonstrated by real world incidents such as the cyberattacks on the Ukrainian power grid. In IEC61850 based smart substations, the Manufacturing Message Specification protocol operates over TCP to facilitate communication between SCADA systems and field devices such as Intelligent Electronic Devices and Programmable Logic Controllers. Although MMS enables efficient monitoring and control, it can be exploited by adversaries to generate legitimate looking packets for reconnaissance, unauthorized state reading, and malicious command injection, thereby disrupting grid operations. In this work, we propose a fully automated attack detection and prevention framework for IEC61850 compliant smart substations to counter remote cyberattacks that manipulate process states through compromised PLCs and IEDs. A detailed analysis of the MMS protocol is presented, and critical MMS field value pairs are extracted during both normal SCADA operation and active attack conditions. The proposed framework is validated using seven datasets comprising benign operational scenarios and multiple attack instances, including IEC61850Bean based attacks and script driven attacks leveraging the libiec61850 library. Our approach accurately identifies attack signature carrying MMS packets that attempt to disrupt circuit breaker status, specifically targeting the smart home zone IED and PLC of the EPIC testbed. The results demonstrate the effectiveness of the proposed framework in precisely detecting malicious MMS traffic and enhancing the cyber resilience of IEC61850 based smart grid environments.
Authors:Kelvin Uzoma Echenim, Karuna Pande Joshi
Abstract:
Disaster response requires sharing heterogeneous artifacts, from tabular assistance records to UAS imagery, under overlapping privacy mandates. Operational systems often reduce compliance to binary access control, which is brittle in time-critical workflows. We present a novel deontic knowledge graph-based framework that integrates a Disaster Management Knowledge Graph (DKG) with a Policy Knowledge Graph (PKG) derived from IoT-Reg and FEMA/DHS privacy drivers. Our release decision function supports three outcomes: Allow, Block, and Allow-with-Transform. The latter binds obligations to transforms and verifies post-transform compliance via provenance-linked derived artifacts; blocked requests are logged as semantic privacy incidents. Evaluation on a 5.1M-triple DKG with 316K images shows exact-match decision correctness, sub-second per-decision latency, and interactive query performance across both single-graph and federated workloads.
Authors:Rasmus Erlemann, Charles Colyer Morris, Sanjyot Sathe
Abstract:
The emergence of large-scale quantum computing threatens widely deployed public-key cryptographic systems, creating an urgent need for enterprise-level methods to assess post-quantum (PQ) readiness. While PQ standards are under development, organizations lack scalable and quantitative frameworks for measuring cryptographic exposure and prioritizing migration across complex infrastructures. This paper presents a knowledge graph based framework that models enterprise cryptographic assets, dependencies, and vulnerabilities to compute a unified PQ readiness score. Infrastructure components, cryptographic primitives, certificates, and services are represented as a heterogeneous graph, enabling explicit modeling of dependency-driven risk propagation. PQ exposure is quantified using graph-theoretic risk functionals and attributed across cryptographic domains via Shapley value decomposition. To support scalability and data quality, the framework integrates large language models with human-in-the-loop validation for asset classification and risk attribution. The resulting approach produces explainable, normalized readiness metrics that support continuous monitoring, comparative analysis, and remediation prioritization.
Authors:Huan Lin Oh, Jay Yong Jun Jie, Mandy Lee Ling Siu, Jonathan Pan
Abstract:
Cybersecurity post-incident reviews are essential for identifying control failures and improving organisational resilience, yet they remain labour-intensive, time-consuming, and heavily reliant on expert judgment. This paper investigates whether Large Language Models (LLMs) can augment post-incident review workflows by autonomously analysing system evidence and identifying security policy gaps. We present a threat-informed, agentic framework that ingests log data, maps observed behaviours to the MITRE ATT&CK framework, and evaluates organisational security policies for adequacy and compliance. Using a simulated brute-force attack scenario against a Windows OpenSSH service (MITRE ATT&CK T1110), the system leverages GPT-4o for reasoning, LangGraph for multi-agent workflow orchestration, and LlamaIndex for traceable policy retrieval. Experimental results indicate that the LLM-based pipeline can interpret log-derived evidence, identify insufficient or missing policy controls, and generate actionable remediation recommendations with explicit evidence-to-policy traceability. Unlike prior work that treats log analysis and policy validation as isolated tasks, this study integrates both into a unified end-to-end proof-of-concept post-incident review framework. The findings suggest that LLM-assisted analysis has the potential to improve the efficiency, consistency, and auditability of post-incident evaluations, while highlighting the continued need for human oversight in high-stakes cybersecurity decision-making.
Authors:Jing Liu, Liang Feng Zhang
Abstract:
In this paper, we introduce FlexProofs, a new vector commitment (VC) scheme that achieves two key properties: (1) the prover can generate all individual opening proofs for a vector of size $N$ in optimal time ${\cal O}(N)$, and there is a flexible batch size parameter $b$ that can be increased to further reduce the time to generate all proofs; and (2) the scheme is directly compatible with a family of zkSNARKs that encode their input as a multi-linear polynomial. As a critical building block, we propose the first functional commitment (FC) scheme for multi-exponentiations with batch opening. Compared with HydraProofs, the only existing VC scheme that computes all proofs in optimal time ${\cal O}(N)$ and is directly compatible with zkSNARKs, FlexProofs may speed up the process of generating all proofs, if the parameter $b$ is properly chosen. Our experiments show that for $N=2^{16}$ and $b=\log^2 N$, FlexProofs can be $6\times$ faster than HydraProofs. Moreover, when combined with suitable zkSNARKs, FlexProofs enable practical applications such as verifiable secret sharing and verifiable robust aggregation.
Authors:Jake Feiglin, Guy Dar
Abstract:
SAST (Static Application Security Testing) tools are among the most widely used techniques in defensive cybersecurity, employed by commercial and non-commercial organizations to identify potential vulnerabilities in software. Despite their great utility, they generate numerous false positives, requiring costly manual filtering (aka triage). While LLM-powered agents show promise for automating cybersecurity tasks, existing benchmarks fail to emulate real-world SAST finding distributions. We introduce SastBench, a benchmark for evaluating SAST triage agents that combines real CVEs as true positives with filtered SAST tool findings as approximate false positives. SastBench features an agent-agnostic design. We evaluate different agents on the benchmark and present a comparative analysis of their performance, provide a detailed analysis of the dataset, and discuss the implications for future development.
Authors:Jessica A. Sciammarelli, Waqas Ahmed
Abstract:
Modern cyberattacks are increasingly complex, posing significant challenges to classical machine learning methods, particularly when labeled data is limited and feature interactions are highly non-linear. In this study we investigates the potential of hybrid quantum-classical learning to enhance feature representations for intrusion detection and explore possible quantum advantages in cybersecurity analytics. Using the UNSW-NB15 dataset, network traffic is transformed into structured feature vectors through classical preprocessing and normalization. Classical models, including Logistic Regression and Support Vector Machines with linear and RBF kernels, are evaluated on the full dataset to establish baseline performance under large-sample conditions. Simultaneously, a quantum-enhanced pipeline maps classical features into variational quantum circuits via angle encoding and entangling layers, executed on a CPU-based quantum simulator, with resulting quantum embeddings classified using a classical SVM. Experiments show that while classical models achieve higher overall accuracy with large datasets, quantum-enhanced representations demonstrate superior attack recall and improved class separability when data is scarce, suggesting that quantum feature spaces capture complex correlations inaccessible to shallow classical models. These results highlight the potential of quantum embeddings to improve generalization and representation quality in cybersecurity tasks and provide a reproducible framework for evaluating quantum advantages as quantum hardware and simulators continue to advance.
Authors:Intae Jeon, Yujeong Kwon, Hyungjoon Koo
Abstract:
The ever-increasing adoption of Large Language Models in critical sectors like finance, healthcare, and government raises privacy concerns regarding the handling of sensitive Personally Identifiable Information (PII) during training. In response, regulations such as European Union's General Data Protection Regulation (GDPR) mandate the deletion of PII upon requests, underscoring the need for reliable and cost-effective data removal solutions. Machine unlearning has emerged as a promising direction for selectively forgetting data points. However, existing unlearning techniques typically apply a uniform forgetting strategy that neither accounts for the varying privacy risks posed by different PII attributes nor reflects associated business risks. In this work, we propose UnPII, the first PII-centric unlearning approach that prioritizes forgetting based on the risk of individual or combined PII attributes. To this end, we introduce the PII risk index (PRI), a composite metric that incorporates multiple dimensions of risk factors: identifiability, sensitivity, usability, linkability, permanency, exposability, and compliancy. The PRI enables a nuanced evaluation of privacy risks associated with PII exposures and can be tailored to align with organizational privacy policies. To support realistic assessment, we systematically construct a synthetic PII dataset (e.g., 1,700 PII instances) that simulates realistic exposure scenarios. UnPII seamlessly integrates with established unlearning algorithms, such as Gradient Ascent, Negative Preference Optimization, and Direct Preference Optimization, without modifying their underlying principles. Our experimental results demonstrate that UnPII achieves the improvements of accuracy up to 11.8%, utility up to 6.3%, and generalizability up to 12.4%, respectively, while incurring a modest fine-tuning overhead of 27.5% on average during unlearning.
Authors:Hyunhum Cho, Ik Rae Jeong
Abstract:
The rigorous security model of Bitcoin's UTXO architecture often comes at the cost of developer usability, forcing a reliance on manual stack manipulation that leads to critical financial vulnerabilities like signature malleability, unspendable states and unconstrained execution paths. Industry standards such as Miniscript provide necessary abstractions for policy verification but do not model the full imperative logic required for complex contracts, leaving gaps in state management and resource liveness. This paper introduces Bithoven, a high-level language designed to bridge the gap between expressiveness and formal safety. By integrating a strict type checker and a resource liveness analyzer with a semantic control-flow analyzer, Bithoven eliminates major categories of consensus and logic defects defined in our fault model prior to deployment. Our results indicate that this safety comes at modest cost: Bithoven compiles to Bitcoin Script with efficiency comparable to hand-optimized code, demonstrating that type-safe, developer-friendly abstractions are viable even within the strict byte-size constraints of the Bitcoin blockchain.
Authors:Shriram KS Pandian, Naresh Kshetri
Abstract:
Data poisoning attacks (DPAs) are becoming popular as artificial intelligence (AI) algorithms, machine learning (ML) algorithms, and deep learning (DL) algorithms in this artificial intelligence (AI) era. Hackers and penetration testers are excessively injecting malicious contents in the training data (and in testing data too) that leads to false results that are very hard to inspect and predict. We have analyzed several recent technologies used (from deep reinforcement learning to federated learning) for the DPAs and their safety, security, & countermeasures. The problem setup along with the problem estimation is shown in the MuJoCo environment with performance of HalfCheetah before the dataset is poisoned and after the dataset is poisoned. We have analyzed several risks associated with the DPAs and falsification in medical data from popular poisoning data attacks to some popular data defenses. We have proposed robust offline reinforcement learning (Offline RL) for the safety and reliability with weighted hash verification along with density-ratio weighted behavioral cloning (DWBC) algorithm. The four stages of the proposed algorithm (as the Stage 0, the Stage 1, the Stage 2, and the Stage 3) are described with respect to offline RL, safety, and security for DPAs. The conclusion and future scope are provided with the intent to combine DWBC with other data defense strategies to counter and protect future contamination cyberattacks.
Authors:Suryansh Singh Sijwali, Suman Saha
Abstract:
Large Language Models (LLMs) can generate plausible code, but in settings that require exact stdin/stdout behavior they frequently produce programs that compile yet fail tests, and in some cases they introduce security-sensitive patterns. This paper presents SecureCodeRL, a reinforcement learning (RL) pipeline for security-aware code generation that optimizes a combined reward R = αRfunc + \b{eta}Rsec. The key idea is a partial-credit functional reward that assigns intermediate scores for syntactic validity, successful execution, and producing output, reducing reward sparsity that otherwise stalls learning on competitive programming style tasks. I evaluate supervised fine-tuning (SFT) and PPO variants on a small held-out prompt set from APPS+ and observe that PPO with partial credit (using a continued-training variant) improves syntax validity from 45% (SFT) to 60% and achieves the only non-zero test success signal in this pilot evaluation (5% at-least-one-test-pass), while remaining 100% clean under Bandit static analysis. Although Bandit findings were absent in this small evaluation, the security term is integrated into training to discourage insecure shortcuts when they appear.
Authors:Saravanan A, Aswani Kumar Cherukuri
Abstract:
Encrypted network traffic poses significant challenges for intrusion detection due to the lack of payload visibility, limited labeled datasets, and high class imbalance between benign and malicious activities. Traditional data augmentation methods struggle to preserve the complex temporal and statistical characteristics of real network traffic. To address these issues, this work explores the use of Generative AI (GAI) models to synthesize realistic and diverse encrypted traffic traces. We evaluate three approaches: Variational Autoencoders (VAE), Generative Adversarial Networks (GAN), and SMOTE (Synthetic Minority Over-sampling Technique), each integrated with a preprocessing pipeline that includes feature selection and class balancing. The UNSW NB-15 dataset is used as the primary benchmark, focusing on Tor traffic as anomalies. We analyze statistical similarity between real and synthetic data, and assess classifier performance using metrics such as Accuracy, F1-score, and AUC-ROC. Results show that VAE-generated data provides the best balance between privacy and performance, while GANs offer higher fidelity but risk overfitting. SMOTE, though simple, enhances recall but may lack diversity. The findings demonstrate that GAI methods can significantly improve encrypted traffic detection when trained with privacy-preserving synthetic data.
Authors:Milad Rahmati, Nima Rahmati
Abstract:
The proliferation of Internet of Things devices in critical infrastructure has created unprecedented cybersecurity challenges, necessitating collaborative threat detection mechanisms that preserve data privacy while maintaining robustness against sophisticated attacks. Traditional federated learning approaches for IoT security suffer from two critical vulnerabilities: susceptibility to Byzantine attacks where malicious participants poison model updates, and inadequacy against future quantum computing threats that can compromise cryptographic aggregation protocols. This paper presents a novel Byzantine-robust federated learning framework integrated with post-quantum secure aggregation specifically designed for real-time threat intelligence sharing across critical IoT infrastructure. The proposed framework combines a adaptive weighted aggregation mechanism with lattice-based cryptographic protocols to simultaneously defend against model poisoning attacks and quantum adversaries. We introduce a reputation-based client selection algorithm that dynamically identifies and excludes Byzantine participants while maintaining differential privacy guarantees. The secure aggregation protocol employs CRYSTALS-Kyber for key encapsulation and homomorphic encryption to ensure confidentiality during parameter updates. Experimental evaluation on industrial IoT intrusion detection datasets demonstrates that our framework achieves 96.8% threat detection accuracy while successfully mitigating up to 40% Byzantine attackers, with only 18% computational overhead compared to non-secure federated approaches. The framework maintains sub-second aggregation latency suitable for real-time applications and provides 256-bit post-quantum security level.
Authors:M P V S Gopinadh, S Mahaboob Hussain
Abstract:
Large Language Models (LLMs) are integral to modern AI applications, but their safety alignment mechanisms can be bypassed through adversarial prompt engineering. This study investigates emoji-based jailbreaking, where emoji sequences are embedded in textual prompts to trigger harmful and unethical outputs from LLMs. We evaluated 50 emoji-based prompts on four open-source LLMs: Mistral 7B, Qwen 2 7B, Gemma 2 9B, and Llama 3 8B. Metrics included jailbreak success rate, safety alignment adherence, and latency, with responses categorized as successful, partial and failed. Results revealed model-specific vulnerabilities: Gemma 2 9B and Mistral 7B exhibited 10 % success rates, while Qwen 2 7B achieved full alignment (0% success). A chi-square test (chi^2 = 32.94, p < 0.001) confirmed significant inter-model differences. While prior works focused on emoji attacks targeting safety judges or classifiers, our empirical analysis examines direct prompt-level vulnerabilities in LLMs. The results reveal limitations in safety mechanisms and highlight the necessity for systematic handling of emoji-based representations in prompt-level safety and alignment pipelines.
Authors:Sheldon Paul, Izzat Alsmadi
Abstract:
This paper presents a unified framework for evaluating Linux security hardening on the FABRIC testbed through aggregation of heterogeneous security auditing tools. We deploy three Ubuntu 22.04 nodes configured at baseline, partial, and full hardening levels, and evaluate them using Lynis, OpenSCAP, and AIDE across 108 audit runs. To address the lack of a consistent interpretation across tools, we implement a Unified Compliance Aggregator (UCA) that parses tool outputs, normalizes scores to a common 0--100 scale, and combines them into a weighted metric augmented by a customizable rule engine for organization-specific security policies. Experimental results show that full hardening increases OpenSCAP compliance from 39.7 to 71.8, while custom rule compliance improves from 39.3\% to 83.6\%. The results demonstrate that UCA provides a clearer and more reproducible assessment of security posture than individual tools alone, enabling systematic evaluation of hardening effectiveness in programmable testbed environments.
Authors:KC Aashish, Md Zakir Hossain Zamil, Md Shafiqul Islam Mridul, Lamia Akter, Farmina Sharmin, Eftekhar Hossain Ayon, Md Maruf Bin Reza, Ali Hassan, Abdur Rahim, Sirapa Malla
Abstract:
The rising energy footprint of artificial intelligence has become a measurable component of US data center emissions, yet cybersecurity research seldom considers its environmental cost. This study introduces an eco aware anomaly detection framework that unifies machine learning based network monitoring with real time carbon and energy tracking. Using the publicly available Carbon Aware Cybersecurity Traffic Dataset comprising 2300 flow level observations, we benchmark Logistic Regression, Random Forest, Support Vector Machine, Isolation Forest, and XGBoost models across energy, carbon, and performance dimensions. Each experiment is executed in a controlled Colab environment instrumented with the CodeCarbon toolkit to quantify power draw and equivalent CO2 output during both training and inference. We construct an Eco Efficiency Index that expresses F1 score per kilowatt hour to capture the trade off between detection quality and environmental impact. Results reveal that optimized Random Forest and lightweight Logistic Regression models achieve the highest eco efficiency, reducing energy consumption by more than forty percent compared to XGBoost while sustaining competitive detection accuracy. Principal Component Analysis further decreases computational load with negligible loss in recall. Collectively, these findings establish that integrating carbon and energy metrics into cybersecurity workflows enables environmentally responsible machine learning without compromising operational protection. The proposed framework offers a reproducible path toward sustainable carbon accountable cybersecurity aligned with emerging US green computing and federal energy efficiency initiatives.
Authors:Giuseppe Canale, Kashyap Thimmaraju
Abstract:
Large Language Models (LLMs) are rapidly transitioning from conversational assistants to autonomous agents embedded in critical organizational functions, including Security Operations Centers (SOCs), financial systems, and infrastructure management. Current adversarial testing paradigms focus predominantly on technical attack vectors: prompt injection, jailbreaking, and data exfiltration. We argue this focus is catastrophically incomplete. LLMs, trained on vast corpora of human-generated text, have inherited not merely human knowledge but human \textit{psychological architecture} -- including the pre-cognitive vulnerabilities that render humans susceptible to social engineering, authority manipulation, and affective exploitation. This paper presents the first systematic application of the Cybersecurity Psychology Framework (\cpf{}), a 100-indicator taxonomy of human psychological vulnerabilities, to non-human cognitive agents. We introduce the \textbf{Synthetic Psychometric Assessment Protocol} (\sysname{}), a methodology for converting \cpf{} indicators into adversarial scenarios targeting LLM decision-making. Our preliminary hypothesis testing across seven major LLM families reveals a disturbing pattern: while models demonstrate robust defenses against traditional jailbreaks, they exhibit critical susceptibility to authority-gradient manipulation, temporal pressure exploitation, and convergent-state attacks that mirror human cognitive failure modes. We term this phenomenon \textbf{Anthropomorphic Vulnerability Inheritance} (AVI) and propose that the security community must urgently develop ``psychological firewalls'' -- intervention mechanisms adapted from the Cybersecurity Psychology Intervention Framework (\cpif{}) -- to protect AI agents operating in adversarial environments.
Authors:Jason Quantrill, Noura Khajehnouri, Zihan Guo, Manar H. Alalfi
Abstract:
Smart home IoT platforms such as openHAB rely on Trigger Action Condition (TAC) rules to automate device behavior, but the interplay among these rules can give rise to interaction threats, unintended or unsafe behaviors emerging from implicit dependencies, conflicting triggers, or overlapping conditions. Identifying these threats requires semantic understanding and structural reasoning that traditionally depend on symbolic, constraint-driven static analysis. This work presents the first comprehensive evaluation of Large Language Models (LLMs) across a multi-category interaction threat taxonomy, assessing their performance on both the original openHAB (oHC/IoTB) dataset and a structurally challenging Mutation dataset designed to test robustness under rule transformations. We benchmark Llama 3.1 8B, Llama 70B, GPT-4o, Gemini-2.5-Pro, and DeepSeek-R1 across zero-, one-, and two-shot settings, comparing their results against oHIT's manually validated ground truth. Our findings show that while LLMs exhibit promising semantic understanding, particularly on action- and condition-related threats, their accuracy degrades significantly for threats requiring cross-rule structural reasoning, especially under mutated rule forms. Model performance varies widely across threat categories and prompt settings, with no model providing consistent reliability. In contrast, the symbolic reasoning baseline maintains stable detection across both datasets, unaffected by rule rewrites or structural perturbations. These results underscore that LLMs alone are not yet dependable for safety critical interaction-threat detection in IoT environments. We discuss the implications for tool design and highlight the potential of hybrid architectures that combine symbolic analysis with LLM-based semantic interpretation to reduce false positives while maintaining structural rigor.
Authors:Trung Dao, Minh Nguyen, Son Do, Hoang Tran
Abstract:
The rapid proliferation of Internet of Things (IoT) technologies, projected to exceed 30 billion interconnected devices by 2030, has significantly escalated the complexity of cybersecurity challenges. This survey aims to provide a comprehensive analysis of vulnerabilities, threats, and defense mechanisms, specifically focusing on the integration of network and application layers within real-time monitoring and decision-making systems. Employing an integrative review methodology, 59 scholarly articles published between 2009 and 2024 were selected from databases such as IEEE Xplore, ScienceDirect, and PubMed, utilizing keywords related to IoT vulnerabilities and security attacks. Key findings identify critical threat categories, including sensor vulnerabilities, Denial-of-Service (DoS) attacks, and public cloud insecurity. Conversely, the study highlights advanced defense approaches leveraging Artificial Intelligence (AI) for anomaly detection, Blockchain for decentralized trust, and Zero Trust Architecture (ZTA) for continuous verification. This paper contributes a novel five-layer IoT model and outlines future research directions involving quantum computing and 6G networks to bolster IoT ecosystem resilience.
Authors:Vidyut Sriram, Sawan Pandita, Achintya Lakshmanan, Aneesh Shamraj, Suman Saha
Abstract:
Large Language Models (LLMs) can generate code but often introduce security vulnerabilities, logical inconsistencies, and compilation errors. Prior work demonstrates that LLMs benefit substantially from structured feedback, static analysis, retrieval augmentation, and execution-based refinement. We propose a retrieval-augmented, multi-tool repair workflow in which a single code-generating LLM iteratively refines its outputs using compiler diagnostics, CodeQL security scanning, and KLEE symbolic execution. A lightweight embedding model is used for semantic retrieval of previously successful repairs, providing security-focused examples that guide generation. Evaluated on a combined dataset of 3,242 programs generated by DeepSeek-Coder-1.3B and CodeLlama-7B, the system demonstrates significant improvements in robustness. For DeepSeek, security vulnerabilities were reduced by 96%. For the larger CodeLlama model, the critical security defect rate was decreased from 58.55% to 22.19%, highlighting the efficacy of tool-assisted self-repair even on "stubborn" models.
Authors:Prajwal Panth, Sahaj Raj Malla
Abstract:
We propose the Consensus-Based Privacy-Preserving Data Distribution (CPPDD) framework, a lightweight and post-setup autonomous protocol for secure multi-client data aggregation. The framework enforces unanimous-release confidentiality through a dual-layer protection mechanism that combines per-client affine masking with priority-driven sequential consensus locking. Decentralized integrity is verified via step (sigma_S) and data (sigma_D) checksums, facilitating autonomous malicious deviation detection and atomic abort without requiring persistent coordination. The design supports scalar, vector, and matrix payloads with O(N*D) computation and communication complexity, optional edge-server offloading, and resistance to collusion under N-1 corruptions. Formal analysis proves correctness, Consensus-Dependent Integrity and Fairness (CDIF) with overwhelming-probability abort on deviation, and IND-CPA security assuming a pseudorandom function family. Empirical evaluations on MNIST-derived vectors demonstrate linear scalability up to N = 500 with sub-millisecond per-client computation times. The framework achieves 100% malicious deviation detection, exact data recovery, and three-to-four orders of magnitude lower FLOPs compared to MPC and HE baselines. CPPDD enables atomic collaboration in secure voting, consortium federated learning, blockchain escrows, and geo-information capacity building, addressing critical gaps in scalability, trust minimization, and verifiable multi-party computation for regulated and resource-constrained environments.
Authors:Md Mahbub Hasan, Marcus Sternhagen, Krishna Chandra Roy
Abstract:
Additive manufacturing (AM) is rapidly integrating into critical sectors such as aerospace, automotive, and healthcare. However, this cyber-physical convergence introduces new attack surfaces, especially at the interface between computer-aided design (CAD) and machine execution layers. In this work, we investigate targeted cyberattacks on two widely used fused deposition modeling (FDM) systems, Creality's flagship model K1 Max, and Ender 3. Our threat model is a multi-layered Man-in-the-Middle (MitM) intrusion, where the adversary intercepts and manipulates G-code files during upload from the user interface to the printer firmware. The MitM intrusion chain enables several stealthy sabotage scenarios. These attacks remain undetectable by conventional slicer software or runtime interfaces, resulting in structurally defective yet externally plausible printed parts. To counter these stealthy threats, we propose an unsupervised Intrusion Detection System (IDS) that analyzes structured machine logs generated during live printing. Our defense mechanism uses a frozen Transformer-based encoder (a BERT variant) to extract semantic representations of system behavior, followed by a contrastively trained projection head that learns anomaly-sensitive embeddings. Later, a clustering-based approach and a self-attention autoencoder are used for classification. Experimental results demonstrate that our approach effectively distinguishes between benign and compromised executions.
Authors:Daniel Alabi, Theshani Nuradha
Abstract:
Composition is a cornerstone of classical differential privacy, enabling strong end-to-end guarantees for complex algorithms through composition theorems (e.g., basic and advanced). In the quantum setting, however, privacy is defined operationally against arbitrary measurements, and classical composition arguments based on scalar privacy-loss random variables no longer apply. As a result, it has remained unclear when meaningful composition guarantees can be obtained for quantum differential privacy (QDP). In this work, we clarify both the limitations and possibilities of composition in the quantum setting. We first show that classical-style composition fails in full generality for POVM-based approximate QDP: even quantum channels that are individually perfectly private can completely lose privacy when combined through correlated joint implementations. We then identify a setting in which clean composition guarantees can be restored. For tensor-product channels acting on product neighboring inputs, we introduce a quantum moments accountant based on an operator-valued notion of privacy loss and a matrix moment-generating function. Although the resulting Rényi-type divergence does not satisfy a data-processing inequality, we prove that controlling its moments suffices to bound measured Rényi divergence, yielding operational privacy guarantees against arbitrary measurements. This leads to advanced-composition-style bounds with the same leading-order behavior as in the classical theory. Our results demonstrate that meaningful composition theorems for quantum differential privacy require carefully articulated structural assumptions on channels, inputs, and adversarial measurements, and provide a principled framework for understanding which classical ideas do and do not extend to the quantum setting.
Authors:Tamer Afifi, Abdelfatah Hegazy, Ehab Abousaif
Abstract:
In recent decades, the RAFT distributed consensus algorithm has become a main pillar of the distributed systems ecosystem, ensuring data consistency and fault tolerance across multiple nodes. Although the fact that RAFT is well known for its simplicity, reliability, and efficiency, its security properties are not fully recognized, leaving implementations vulnerable to different kinds of attacks and threats, which can transform the RAFT harmony of consensus into a chaos of data inconsistency. This paper presents a systematic security analysis of the RAFT protocol, with a specific focus on its susceptibility to security threats such as message replay attacks and message forgery attacks. Examined how a malicious actor can exploit the protocol's message-passing mechanism to reintroduce old messages, disrupting the consensus process and leading to data inconsistency. The practical feasibility of these attacks is examined through simulated scenarios, and the key weaknesses in RAFT's design that enable them are identified. To address these vulnerabilities, a novel approach based on cryptography, authenticated message verification, and freshness check is proposed. This proposed solution provides a framework for enhancing the security of the RAFT implementations and guiding the development of more resilient distributed systems.
Authors:Fumiya Morimoto, Ryuto Morita, Satoshi Ono
Abstract:
Deep neural network-based classifiers are prone to errors when processing adversarial examples (AEs). AEs are minimally perturbed input data undetectable to humans posing significant risks to security-dependent applications. Hence, extensive research has been undertaken to develop defense mechanisms that mitigate their threats. Most existing methods primarily focus on discriminating AEs based on the input sample features, emphasizing AE detection without addressing the correct sample categorization before an attack. While some tasks may only require mere rejection on detected AEs, others necessitate identifying the correct original input category such as traffic sign recognition in autonomous driving. The objective of this study is to propose a method for rectifying AEs to estimate the correct labels of their original inputs. Our method is based on re-attacking AEs to move them beyond the decision boundary for accurate label prediction, effectively addressing the issue of rectifying minimally perceptible AEs created using white-box attack methods. However, challenge remains with respect to effectively rectifying AEs produced by black-box attacks at a distance from the boundary, or those misclassified into low-confidence categories by targeted attacks. By adopting a straightforward approach of only considering AEs as inputs, the proposed method can address diverse attacks while avoiding the requirement of parameter adjustments or preliminary training. Results demonstrate that the proposed method exhibits consistent performance in rectifying AEs generated via various attack methods, including targeted and black-box attacks. Moreover, it outperforms conventional rectification and input transformation methods in terms of stability against various attacks.
Authors:Rajendra Kumar Solanki, Vijay Laxmi, Manoj Singh Gaur
Abstract:
Android Permission Model and Application (app) analysis has consistently remained the focus of the investigation of research groups and stakeholders of the Android ecosystem since it was launched in 2008. Even though the Android smartphone operating system (OS) permission model has evolved significantly from `all-or-none access' to `user-chosen dangerous resource access', specific challenges and issues remain unresolved even after 15 years after the smartphone OS launch. This study addresses the issues and documents the research work in this arena through a comprehensive literature survey and comparative analysis. The survey's focal point is the Android permission model and relevant research between 2010-2022. We systematize the knowledge on (i) Android API Calls to permissions mapping, (ii) Android Permissions evolution, and (iii) how permissions are checked. Furthermore, the survey identifies the permission-related issues and relevant research addressed during the last decade. We reference seminal work in these areas. We summarize the identified research gaps and present future directions for early and experienced researchers.
Authors:Samar Ansari
Abstract:
The governance of frontier AI increasingly relies on controlling access to computational resources, yet the hardware-level mechanisms invoked by policy proposals remain largely unexamined from an engineering perspective. This paper bridges the gap between AI governance and computer engineering by proposing a taxonomy of 20 hardware-level governance mechanisms, organised by function (monitoring, verification, enforcement) and assessed for technical feasibility on a four-point scale from currently deployable to speculative. For each mechanism, we provide a technical description, a feasibility rating, and an identification of adversarial vulnerabilities. We map the taxonomy onto four governance scenarios: domestic regulation, bilateral agreements, multilateral treaty verification, and industry self-regulation. Our analysis reveals a structural mismatch: the mechanisms most needed for treaty verification, including on-chip compute metering, cryptographic proof-of-training, and hardware-embedded enforcement, are also the least mature. We assess principal threats to compute-based governance, including algorithmic efficiency gains, distributed training methods, and sovereignty concerns. We identify a temporal constraint: the window during which semiconductor manufacturing concentration makes hardware-level governance implementable is narrowing, while R&D timelines for critical mechanisms span years. We present an adversary-tiered threat analysis distinguishing commercial, non-state, and nation-state actors, arguing the appropriate security standard is tamper-evident assurance analogous to IAEA verification rather than absolute tamper-proofing. The taxonomy, feasibility classification, and mechanism-to-scenario mapping provide a technical foundation for policymakers and identify the R&D investments required before hardware-level governance can support verifiable international agreements.
Authors:Motoki Nakamura
Abstract:
Federated learning (FL) enables multiple clients to collaboratively train a global model by aggregating local updates without sharing private data. However, FL often faces the challenge of free-riders, clients who submit fake model parameters without performing actual training to obtain the global model without contributing. Chen et al. proposed a free-rider detection method based on the weight evolving frequency (WEF) of model parameters. This detection approach is a leading candidate for practical free-rider detection methods, as it requires neither a proxy dataset nor pre-training. Nevertheless, it struggles to detect ``dynamic'' free-riders who behave honestly in early rounds and later switch to free-riding, particularly under global-model-mimicking attacks such as the delta weight attack and our newly proposed adaptive WEF-camouflage attack. In this paper, we propose a novel detection method S2-WEF that simulates the WEF patterns of potential global-model-based attacks on the server side using previously broadcasted global models, and identifies clients whose submitted WEF patterns resemble the simulated ones. To handle a variety of free-rider attack strategies, S2-WEF further combines this simulation-based similarity score with a deviation score computed from mutual comparisons among submitted WEFs, and separates benign and free-rider clients by two-dimensional clustering and per-score classification. This method enables dynamic detection of clients that transition into free-riders during training without proxy datasets or pre-training. We conduct extensive experiments across three datasets and five attack types, demonstrating that S2-WEF achieves higher robustness than existing approaches.
Authors:Konstantinos Spalas
Abstract:
The L-Band Digital Aviation Communication System (LDACS) aims to modernize communications between the aircraft and the tower. Besides digitizing this type of communication, the contributors also focus on protecting them against cyberattacks. There are several proposals regarding LDACS security, and a recent one suggests the use of physical unclonable functions (PUFs) for the authentication module. This work demonstrates this PUF-based authentication mechanism along with its potential vulnerabilities. Sophisticated models are able to predict PUFs, and, on the other hand, quantum computers are capable of threatening current cryptography, consisting factors that jeopardize the authentication mechanism giving the ability to perform impersonation attacks. In addition, aging is a characteristic that affects the stability of PUFs, which may cause instability issues, rendering the system unavailable. In this context, this work proposes the well-established Public Key Infrastructure (PKI), as an alternative solution.
Authors:Mohammad Hossein Chinaei
Abstract:
Tool-calling LLM agents can read private data, invoke external services, and trigger real-world actions, creating a security problem at the point of tool execution. We identify a denial-feedback leakage pattern, which we term causality laundering, in which an adversary probes a protected action, learns from the denial outcome, and exfiltrates the inferred information through a later seemingly benign tool call. This attack is not captured by flat provenance tracking alone because the leaked information arises from causal influence of the denied action, not direct data flow. We present the Agentic Reference Monitor (ARM), a runtime enforcement layer that mediates every tool invocation by consulting a provenance graph over tool calls, returned data, field-level provenance, and denied actions. ARM propagates trust through an integrity lattice and augments the graph with counterfactual edges from denied-action nodes, enabling enforcement over both transitive data dependencies and denial-induced causal influence. In a controlled evaluation on three representative attack scenarios, ARM blocks causality laundering, transitive taint propagation, and mixed-provenance field misuse that a flat provenance baseline misses, while adding sub-millisecond policy evaluation overhead. These results suggest that denial-aware causal provenance is a useful abstraction for securing tool-calling agent systems.
Authors:Yoshiyuki Ootani
Abstract:
Location-based systems that combine encrypted geographic search with zero-knowledge proximity proofs typically treat the two phases as independent. Under an honest-but-curious server, this leaves an authorization provenance gap: once session state is purged, no forensic procedure can attribute a proof to its originating search session, because the proof's public inputs encode no session-identifying information. We formalize this gap as the search-authorized proof (SAP) security notion and show via a concrete audit re-association attack that proof-external mechanisms, where authorization evidence remains outside the proof, cannot prevent forensic misattribution when the same drop parameters recur across sessions. Search-Bound Proximity Proofs (SBPP) realize the SAP requirements without modifying the ZKP circuit: session nonce, Merkle-root result-set commitment, and signed receipt are decomposed into independently auditable components, enabling property-level fault isolation in offline audit. Experiments on synthetic and real-world data (110,776 OpenStreetMap POIs) show sub-millisecond absolute overhead on a 125 ms Groth16 baseline.
Authors:Yoshiyuki Ootani
Abstract:
A zero-knowledge proximity proof certifies geometric nearness but carries no commitment to an application context. In stateful geo-content systems, where drops can share coordinates, policies evolve, and content has persistent identity, this gap can permit proof transfer between application objects unless extra operational invariants are maintained. We present a systems-security analysis of this deployment problem: a taxonomy of context-binding vulnerabilities, a formal off-circuit verification model for a transcript-adversary that holds a recorded proof but cannot obtain fresh coordinates, an assumption comparison across five binding strategy classes, and a concrete instantiation, Zairn-ZKP, that embeds drop identity, policy version, and session context as public circuit inputs. Compared with a strong off-circuit alternative based on stored-digest server checking, in-proof binding reduces operational invariants from four to two and adds no measurable proving cost relative to the sound geo-only baseline (-0.12 ms median in our setup). It also removes a correctness pitfall we identify empirically: a plausible off-circuit implementation that omits one server-side check remains vulnerable to cross-drop transfer. Measurements across six network conditions, seven venues in four countries, and an epoch-window simulation indicate that same-epoch transfer is realistic in dense urban deployments unless per-request nonces are maintained. Across five platforms and seven binding strategies, the results support a deployable methodology for reducing assumption surfaces in stateful ZK-backed verification workflows.
Authors:Yoshiyuki Ootani
Abstract:
IoT location services accept client-reported GPS coordinates at face value, yet spoofing is trivial with consumer-grade tools. Existing spoofing detectors output a binary decision, forcing system designers to choose between high false-deny and high false-accept rates. We propose a graduated trust gate that computes a multi-signal integrity score and maps it to three actions: PROCEED, STEP-UP, or DENY, where STEP-UP invokes a stronger verifier such as a zero-knowledge proximity proof. A session-latch mechanism ensures that a single suspicious fix blocks the entire session, preventing post-transition score recovery. Under an idealized step-up oracle on 10,000 synthetic traces, the gate enables strict thresholds (theta_p = 0.9) that a binary gate cannot safely use: at matched false-accept rate (11%), the graduated gate maintains zero false-deny rate versus 0.05% for binary, with 5 microseconds scoring overhead. Real-device traces from an Android smartphone demonstrate the session-latch mechanism and show that a nearby mock location (~550 m) evades theta_p = 0.7 but is routed to step-up at theta_p = 0.9. Signal ablation identifies a minimal two-signal configuration (F1 = 0.84) suitable for resource-constrained scoring layers.
Authors:Weiqi Feng
Abstract:
WebAssembly is quickly becoming a popular compilation target for a variety of code. However, vulnerabilities in the source languages translate to vulnerabilities in the WebAssembly binaries. This work proposes a methodology and a WebAssembly transpiler to prevent buffer overflows in the unmanaged memory of the WebAssembly runtime. The transpiler accepts a WebAssembly binary and adds stack canaries and Address Space Layout Randomization (ASLR) to protect against buffer overflows.
Authors:Jinwook Kim
Abstract:
Formally guaranteeing the safety and liveness of regulatory state transitions in cross-domain state synchronization systems is a problem of growing importance as tokenized assets are increasingly operated across heterogeneous blockchain networks and off-chain ledgers. This paper presents a mechanized proof of 2,348 lines in Isabelle/HOL establishing two complementary properties. First, cross-domain state preservation (safety): a regulatory state transition performed on one domain is faithfully reflected across all connected domains with structural preservation. This guarantee encompasses bidirectional roundtrip preservation, consistency across an arbitrary finite set of domains, and per-asset isolation. Second, liveness under Byzantine faults: in the presence of up to f < n/3 Byzantine nodes, we prove deterministic resolution of conflicting regulatory actions, deadlock freedom, and starvation freedom. In the combination of these two properties, the liveness proof discharges the honest-node assumption of the safety proof under Byzantine faults, promoting conditional safety to an unconditional guarantee. The seven generic locales derived in this process are domain-independent and reusable for arbitrary domains via Isabelle/HOL's interpretation mechanism. The application context is a regulatory state transition model based on the RCP framework (arXiv:2603.29278), which systematizes 31 requirements from 15 global financial regulatory authorities. All proof artifacts build in Isabelle/HOL without sorry or oops, have been submitted to the Archive of Formal Proofs (under review), and are publicly available on GitHub.
Authors:Choon-Hou Rafael Chong
Abstract:
As multimodal large language models (LLMs) advance, traditional CAPTCHAs have become obsolete at distinguishing humans from bots. To address this shift, this paper aims to investigate the possibility of using tasks for which humans have evolved highly specialised neural processing. We introduce two CAPTCHA classes: a vision-based CAPTCHA, which renders alphanumeric strings as ASCII art, and an audio-based CAPTCHA, which is a question-answering task with overlapping or noise-corrupted audio context. We evaluate our vision-based CAPTCHA both as text and image input with multiple frontier LLMs (GPT 5.2, Gemini 3, etc.), and assess our audio-based CAPTCHAs by applying augmentations like background noise, Gaussian noise, and overlapping speech. We determined that none of the LLMs were able to solve a single ASCII-based CAPTCHA, with the best performing model only being able to infer at most one or two characters. Additionally, all models that supported audio performed only modestly better than random when solving audio CAPTCHAs. Our results suggest that these CAPTCHAs are exceptionally effective today, but it is unclear whether it can withstand the fast-evolving landscape of artificial intelligence. Subsequent research is needed to determine whether these tasks are temporary vulnerabilities or represent a more durable method of distinguishing humans from bots.
Authors:Jackson Wang
Abstract:
Prompt injection has emerged as a critical vulnerability in large language model (LLM) deployments, yet existing research is heavily weighted toward defenses. The attack side -- specifically, which injection strategies are most effective and why -- remains insufficiently studied.We address this gap with AttackEval, a systematic empirical study of prompt injection attack effectiveness. We construct a taxonomy of ten attack categories organized into three parent groups (Syntactic, Contextual, and Semantic/Social), populate each category with 25 carefully crafted prompts (250 total), and evaluate them against a simulated production victim system under four progressively stronger defense tiers. Experiments reveal several non-obvious findings: (1) Obfuscation (OBF) achieves the highest single-attack success rate (ASR = 0.76) against even intent-aware defenses, because it defeats both keyword matching and semantic similarity checks simultaneously; (2) Semantic/Social attacks - Emotional Manipulation (EM) and Reward Framing (RF) - maintain high ASR (0.44-0.48) against intent-aware defenses due to their natural language surface, which evades structural anomaly detection; (3) Composite attacks combining two complementary strategies dramatically boost ASR, with the OBF + EM pair reaching 97.6%; (4) Stealth correlates positively with residual ASR against semantic defenses (r = 0.71), implying that future defenses must jointly optimize for both structural and behavioral signals. Our findings identify concrete blind spots in current defenses and provide actionable guidance for designing more robust LLM safety systems.
Authors:Ian C. Moore
Abstract:
We present a formal treatment of provenance trees, directed acyclic graphs of artifact registrations anchored immutably on a public blockchain, and introduce the operator trust problem: when a single privileged operator submits all on-chain registrations on behalf of users, the on-chain record alone cannot distinguish user-initiated registrations from unilateral operator actions. We resolve this through a dual-layer cryptographic commitment scheme in which two commitments derived from a single client-side secret key, binding the key to the tree root and to each unique registration identifier, make false attribution claims strictly dominated strategies. We prove correctness under standard cryptographic assumptions and establish honest behavior as the unique Nash equilibrium without relying on operator trust. We further introduce and analyze the tree poisoning problem: adversarial attacks on users' provenance trees via fraudulent root registration, malicious child attachment, and tree identity spoofing. We characterize the closure properties of each attack variant and prove that a complete provenance tree integrity model requires three distinct mechanisms: cryptographic priority, governance cascade, and contract enforcement, each necessary and none individually sufficient. The construction is deployed on Base (Ethereum L2) as AnchorRegistry, an immutable on-chain provenance registry. We provide gas complexity analysis demonstrating O(1) cost invariant to registry scale, and a trustless reconstruction algorithm recovering the complete registry from public event logs alone.
Authors:Wanru Shao
Abstract:
Misconfiguration, excessive privilege, and tool fragmentation remain the main reasons why enterprise cloud environments are breached. Recent reports on cloud-native application protection note that most incidents can be traced back to configuration or identity errors rather than platform flaws, and that organizations still need separate tools to watch Kubernetes, OpenStack, and infrastructure-as-code. To address this gap, this paper presents an open-source cloud-infrastructure security framework built with a microservice architecture. The framework integrates four core services: 1) identity and access control unification, 2) configuration-baseline intelligent checking over Kubernetes and OpenStack assets, 3) real-time threat monitoring based on Falco-style runtime rules and ELK-based analytics, and 4) automated remediation that consumes Terraform plans and Checkov/OPA policy results to roll back or harden resources. It provides automated deployment, supports 50-200-node clusters, and exposes uniform REST and gRPC interfaces for extension. In an enterprise-grade testbed, vulnerability-assessment time was reduced from 120 min as baseline toolchain to 18 min, with false-positive rate below 5%. After continuous deployment, the number of observable security events dropped by 62%. The project is released under Apache 2.0 to lower entry cost by about 40% for small and medium teams.
Authors:KrishnaSaiReddy Patil
Abstract:
When Agent A delegates to Agent B, which invokes Tool C on behalf of User X, no existing framework can answer: whose authorization chain led to this action, and where did it violate policy? This paper introduces SentinelAgent, a formal framework for verifiable delegation chains in federal multi-agent AI systems. The Delegation Chain Calculus (DCC) defines seven properties - six deterministic (authority narrowing, policy preservation, forensic reconstructibility, cascade containment, scope-action conformance, output schema conformance) and one probabilistic (intent preservation) - with four meta-theorems and one proposition establishing the practical infeasibility of deterministic intent verification. The Intent-Preserving Delegation Protocol (IPDP) enforces all seven properties at runtime through a non-LLM Delegation Authority Service. A three-point verification lifecycle achieves 100% combined TPR at 0% FPR on DelegationBench v4 (516 scenarios, 10 attack categories, 13 federal domains). Under black-box adversarial conditions, the DAS blocks 30/30 attacks with 0 false positives. Deterministic properties are unbreakable under adversarial stress testing; intent verification degrades to 13% against sophisticated paraphrasing. Fine-tuning the NLI model on 190 government delegation examples improves P2 from 1.7% to 88.3% TPR (5-fold cross-validated, F1=82.1%). Properties P1, P3-P7 are mechanically verified via TLA+ model checking across 2.7 million states with zero violations. Even when intent verification is evaded, the remaining six properties constrain the adversary to permitted API calls, conformant outputs, traceable actions, bounded cascades, and compliant behavior.
Authors:John T. Halloran
Abstract:
Safety alignment has become a critical step to ensure LLMs refuse harmful requests while providing helpful and harmless responses. However, despite the ubiquity of safety alignment for deployed frontier models, two separate lines of recent work--jailbreak-tuning (JT) and weight orthogonalization (WO)--have shown that safety guardrails may be largely disabled, resulting in LLMs which comply with harmful requests they would normally refuse. In spite of far-reaching safety implications, analysis has largely been limited to refusal rates of each unalignment method in isolation, leaving their relative effects on adversarial LLM capabilities unknown. To fill this gap, we study the impact of unaligning six popular LLMs of various sizes across a large number of malicious and benign tasks, using both JT and WO. Across the evaluated models, we show that while refusal degradation is split between the two methods, WO produces LLMs far more capable of aiding in malicious activity; in contrast to JT, the majority of WO unaligned models are far less prone to hallucinations, better retain their original natural-language performance, and are more effective at state-of-the-art adversarial and cyber attacks. To thus help mitigate the malicious risks of WO unalignment, we conclude by showing that supervised fine-tuning effectively limits the adversarial attack abilities enabled by WO, without drastically affecting hallucination rates or natural language performance.
Authors:Christophe Parisel
Abstract:
In a companion paper, we prove that the Burau-Lyapunov exponent LE discriminates focused from dispersed privilege escalation ratchets in cloud IAM graphs, and that no abelian statistic can replicate this discrimination. To strengthen this claim beyond its synthetic validation corpus, we apply the identical pipeline, with zero parameter retuning, to solar coronal magnetic fields: a physical system with no connection to cloud identity and access management, whose binary eruptive/confined outcome is independently established by decades of astrophysical observation.
Authors:Vickson Ferrel
Abstract:
As TLS 1.3 encryption limits traditional Deep Packet Inspection (DPI), the security community has pivoted to Euclidean Transformer-based classifiers (e.g., ET-BERT) for encrypted traffic analysis. However, these models remain vulnerable to byte-level adversarial morphing -- recent pre-padding attacks reduced ET-BERT accuracy to 25.68%, while VLESS Reality bypasses certificate-based detection entirely. We introduce AEGIS: an Adversarial Entropy-Guided Immune System powered by a Thermodynamic Variance-Guided Hyperbolic Liquid State Space Model (TVD-HL-SSM). Rather than competing in the Euclidean payload-reading domain, AEGIS discards payload bytes in favor of 6-dimensional continuous-time flow physics projected into a non-Euclidean Poincare manifold. Liquid Time-Constants measure microsecond IAT decay, and a Thermodynamic Variance Detector computes sequence-wide Shannon Entropy to expose automated C2 tunnel anomalies. A pure C++ eBPF Harvester with zero-copy IPC bypasses the Python GIL, enabling a linear-time O(N) Mamba-3 core to process 64,000-packet swarms at line-rate. Evaluated on a 400GB, 4-tier adversarial corpus spanning backbone traffic, IoT botnets, zero-days, and proprietary VLESS Reality tunnels, AEGIS achieves an F1-score of 0.9952 and 99.50% True Positive Rate at 262 us inference latency on an RTX 4090, establishing a new state-of-the-art for physics-based adversarial network defense.
Authors:Jonathan Shelby
Abstract:
The UK Cyber Security and Resilience (CS&R) Bill represents the most significant reform of UK cyber legislation since the Network and Information Systems (NIS) Regulations 2018. While existing analysis has addressed the Bill's regulatory requirements, there is a critical gap in guidance on the architectural implications for organisations that must achieve and demonstrate compliance. This paper argues that the CS&R Bill's provisions (expanded scope to managed service providers (MSPs), data centres, and critical suppliers; mandatory 24/72-hour dual incident reporting; supply chain security duties; and Secretary of State powers of direction-), collectively constitute an architectural forcing function that renders perimeter-centric and point-solution security postures structurally non-compliant. We present a systematic mapping of the Bill's key provisions to specific architectural requirements, demonstrate that Zero Trust Architecture (ZTA) provides the most coherent technical foundation for meeting these obligations, and propose a reference architecture and maturity-based adoption pathway for CISOs and security architects. The paper further addresses the cross-regulatory challenge facing UK financial services firms operating under simultaneous CS&R, DORA, and NIS2 obligations, and maps the architectural framework against the NCSC Cyber Assessment Framework v4.0. This work extends a companion practitioner guide to the Bill by translating regulatory analysis into actionable architectural strategy. Keywords: Cyber Security and Resilience Bill, Zero Trust Architecture, Security Architecture, Critical National Infrastructure, NIS Regulations, DORA, Supply Chain Security, NCSC CAF v4.0
Authors:Manoj Parmar
Abstract:
World models -- learned internal simulators of environment dynamics -- are rapidly becoming foundational to autonomous decision-making in robotics, autonomous vehicles, and agentic AI. Yet this predictive power introduces a distinctive set of safety, security, and cognitive risks. Adversaries can corrupt training data, poison latent representations, and exploit compounding rollout errors to cause catastrophic failures in safety-critical deployments. World model-equipped agents are more capable of goal misgeneralisation, deceptive alignment, and reward hacking precisely because they can simulate the consequences of their own actions. Authoritative world model predictions further foster automation bias and miscalibrated human trust that operators lack the tools to audit. This paper surveys the world model landscape; introduces formal definitions of trajectory persistence and representational risk; presents a five-profile attacker capability taxonomy; and develops a unified threat model extending MITRE ATLAS and the OWASP LLM Top 10 to the world model stack. We provide an empirical proof-of-concept on trajectory-persistent adversarial attacks (GRU-RSSM: A_1 = 2.26x amplification, -59.5% reduction under adversarial fine-tuning; stochastic RSSM proxy: A_1 = 0.65x; DreamerV3 checkpoint: non-zero action drift confirmed). We illustrate risks through four deployment scenarios and propose interdisciplinary mitigations spanning adversarial hardening, alignment engineering, NIST AI RMF and EU AI Act governance, and human-factors design. We argue that world models must be treated as safety-critical infrastructure requiring the same rigour as flight-control software or medical devices.
Authors:Ying Xie
Abstract:
"Vibe coding," in which developers delegate code generation to AI assistants and accept the output with little manual review, has gained rapid adoption in production settings. On March 31, 2026, Anthropic's Claude Code CLI shipped a 59.8 MB source map file in its npm package, exposing roughly 512,000 lines of proprietary TypeScript. The tool had itself been largely vibe-coded, and the leak traced to a misconfigured packaging rule rather than a logic bug. Existing static-analysis and secret-scanning tools did not cover this failure mode, pointing to a gap between the vulnerabilities AI tends to introduce and the vulnerabilities current tooling is built to find. We present VibeGuard, a pre-publish security gate that targets five such blind spots: artifact hygiene, packaging-configuration drift, source-map exposure, hardcoded secrets, and supply-chain risk. In controlled experiments on eight synthetic projects (seven vulnerable, one clean control), VibeGuard achieved 100% recall, 89.47% precision (F1 = 94.44%), and correct pass/fail gate decisions on all eight projects across three policy levels. We discuss how these results inform a defense-in-depth workflow for teams that rely on AI code generation.
Authors:Samar Ansari
Abstract:
Existing research on privacy-preserving Human Activity Recognition (HAR) typically evaluates methods against a binary paradigm: clear video versus a single privacy transformation. This limits cross-method comparability and obscures the nuanced relationship between privacy strength and recognition utility. We introduce \textit{PrivHAR-Bench}, a multi-tier benchmark dataset designed to standardize the evaluation of the \textit{Privacy-Utility Trade-off} in video-based action recognition. PrivHAR-Bench applies a graduated spectrum of visual privacy transformations: from lightweight spatial obfuscation to cryptographic block permutation, to a curated subset of 15 activity classes selected for human articulation diversity. Each of the 1,932 source videos is distributed across 9 parallel tiers of increasing privacy strength, with additional background-removed variants to isolate the contribution of human motion features from contextual scene bias. We provide lossless frame sequences, per-frame bounding boxes, estimated pose keypoints with joint-level confidence scores, standardized group-based train/test splits, and an evaluation toolkit computing recognition accuracy and privacy metrics. Empirical validation using R3D-18 demonstrates a measurable and interpretable degradation curve across tiers, with within-tier accuracy declining from 88.8\% (clear) to 53.5\% (encrypted, background-removed) and cross-domain accuracy collapsing to 4.8\%, establishing PrivHAR-Bench as a controlled benchmark for comparing privacy-preserving HAR methods under standardized conditions. The dataset, generation pipeline, and evaluation code are publicly available.
Authors:KrishnaSaiReddy Patil
Abstract:
Retrieval-Augmented Generation (RAG) systems are deployed across federal agencies for citizen-facing tax guidance, benefits eligibility, and legal information, where a single incorrect number causes direct financial harm. This paper proves that all embedding-based RAG defenses share a fundamental blind spot: changing a tax deduction by $50,000 produces cosine similarity 0.9998, invisible to every known detection threshold. Across 174 manipulation pairs and two embedding models, the mean sensitivity gap is 1,459x. The blind spot is confirmed on real IRS documents.The root cause is that embeddings encode topic, not numerical precision. RAGShield sidesteps this by operating on extracted values directly: a pattern-based engine identifies dollar amounts and percentages in government text, links each value to its governing entity through two-pass context propagation (99.8% entity detection on 2,742 real IRS passages), and verifies every claim against a cross-source registry built from the corpus itself. A temporal tracker flags value changes that fall outside known government update schedules. On 430 attacks generated from real IRS document content, RAGShield detects every one (0.0% ASR, 95% CI [0%, 1%]) while embedding-based defenses miss 79-90% of the same attacks.
Authors:Jonathan Shelby
Abstract:
CubeSats have democratised access to space for universities, start-ups and emerging space nations, but the same design decisions that reduce cost and complexity introduce distinctive cybersecurity risks. Existing risk assessment frameworksNIST SP 800-37/53 [1, 2], ISO/IEC 27001/27005 [3, 4] and supply-chain guidance such as NIST SP 800-161 [5]assume abundant computational resources, centralised monitoring and mature governance structures that do not hold for power-limited, intermittently connected CubeSat missions. This paper develops a contextually appropriate risk assessment framework tailored to CubeSat environments, grounded in a 42-entry vulnerability register coded using STRIDE [6], MITRE ATT&CK [7] and CVSS v3.1 [8]. The register reveals that risks concentrate in communication and ground segments (mean CVSS 8.08.2) rather than distributing uniformly across subsystems. The framework introduces two constructs: a Security-per-Watt (SpW) heuristic that quantities security benefit per unit power, and a Distributed Security Paradigm (DSP) that reconceptualises incident response as an autonomous, constellation-level function rather than a purely ground-centric process. Scenario-based analysis demonstrates that adapted controls and distributed incident handling can achieve up to 2.7X higher SpW for cryptographic choices and 1.98X higher SpW for incident-response strategies compared with naive terrestrial transpositions, while remaining feasible for typical CubeSat power and governance constraints. The approach provides mission designers, operators and regulators with proportionate, auditable guidance, and offers a reusable pattern for adapting enterprise security frameworks to other severely constrained cyber-physical systems.
Authors:Tor Lattimore
Abstract:
We propose a simple detection mechanism for the Gumbel watermarking scheme proposed by Aaronson (2022). The new mechanism is proven to be near-optimal in a problem-dependent sense among all model-agnostic watermarking schemes under the assumption that the next-token distribution is sampled i.i.d.
Authors:Nelly Elsayed
Abstract:
Assistive technologies increasingly support independence, accessibility, and safety for older adults, people with disabilities, and individuals requiring continuous care. Two major categories are virtual assistive systems and robotic assistive systems operating in physical environments. Although both offer significant benefits, they introduce important security and privacy risks due to their reliance on artificial intelligence, network connectivity, and sensor-based perception. Virtual systems are primarily exposed to threats involving data privacy, unauthorized access, and adversarial voice manipulation. In contrast, robotic systems introduce additional cyber-physical risks such as sensor spoofing, perception manipulation, command injection, and physical safety hazards. In this paper, we present a comparative analysis of security and privacy challenges across these systems. We develop a unified comparative threat-modeling framework that enables structured analysis of attack surfaces, risk profiles, and safety implications across both systems. Moreover, we provide design recommendations for developing secure, privacy-preserving, and trustworthy assistive technologies.
Authors:KrishnaSaiReddy Patil
Abstract:
LLM-based chatbots in government services face critical security gaps. Multi-turn adversarial attacks achieve over 90% success against current defenses, and single-layer guardrails are bypassed with similar rates. We present CivicShield, a cross-domain defense-in-depth framework for government-facing AI chatbots. Drawing on network security, formal verification, biological immune systems, aviation safety, and zero-trust cryptography, CivicShield introduces seven defense layers: (1) zero-trust foundation with capability-based access control, (2) perimeter input validation, (3) semantic firewall with intent classification, (4) conversation state machine with safety invariants, (5) behavioral anomaly detection, (6) multi-model consensus verification, and (7) graduated human-in-the-loop escalation. We present a formal threat model covering 8 multi-turn attack families, map the framework to NIST SP 800-53 controls across 14 families, and evaluate using ablation analysis. Theoretical analysis shows layered defenses reduce attack probability by 1-2 orders of magnitude versus single-layer approaches. Simulation against 1,436 scenarios including HarmBench (416), JailbreakBench (200), and XSTest (450) achieves 72.9% combined detection [69.5-76.0% CI] with 2.9% effective false positive rate after graduated response, while maintaining 100% detection of multi-turn crescendo and slow-drift attacks. The honest drop on real benchmarks versus author-generated scenarios (71.2% vs 76.7% on HarmBench, 47.0% vs 70.0% on JailbreakBench) validates independent evaluation importance. CivicShield addresses an open gap at the intersection of AI safety, government compliance, and practical deployment.
Authors:Mohammed Hassanin
Abstract:
By utilising their adaptive activation functions, Kolmogorov-Arnold Networks (KANs) can be applied in a novel way for the diverse machine learning tasks, including cyber threat detection. KANs substitute conventional linear weights with spline-parametrized univariate functions, which allows them to learn activation patterns dynamically, inspired by the Kolmogorov-Arnold representation theorem. In a network traffic data, we show that KANs perform better than traditional Multi-Layer Perceptrons (MLPs), yielding more accurate results with a significantly less number of learnable parameters. We also propose KAN-LSTM model to combine advantages of spatial and temporal encoding. The suggested methodology highlights the potential of KANs as an effective tool in detecting cyber threats and offers up new directions for adaptive defensive models. Lastly, we conducted experiments on three main dataset, UNSW-NB15, NSL-KDD, and CICID2017, as well as we developed a new dataset combined from IOT-BOT, NSL-KDD, and CICID2017 to present a stable, unbiased, large-scale dataset with diverse traffic patterns. The results show the superiority of KAN-LSTM and then KAN models over the traditional deep learning models. The source code is available at GitHub repository
Authors:Alessio Langiu
Abstract:
The large-scale adoption of Large Language Models (LLMs) forces a trade-off between operational cost (OpEx) and data privacy. Current routing frameworks reduce costs but ignore prompt sensitivity, exposing users and institutions to leakage risks towards third-party cloud providers. We formalise the "Inseparability Paradigm": advanced context management intrinsically coincides with privacy management. We propose a local "Privacy Guard" -- a holistic contextual observer powered by an on-premise Small Language Model (SLM) -- that performs abstractive summarisation and Automatic Prompt Optimisation (APO) to decompose prompts into focused sub-tasks, re-routing high-risk queries to Zero-Trust or NDA-covered models. This dual mechanism simultaneously eliminates sensitive inference vectors (Zero Leakage) and reduces cloud token payloads (OpEx Reduction). A LIFO-based context compacting mechanism further bounds working memory, limiting the emergent leakage surface. We validate the framework through a 2x2 benchmark (Lazy vs. Expert users; Personal vs. Institutional secrets) on a 1,000-sample dataset, achieving a 45% blended OpEx reduction, 100% redaction success on personal secrets, and -- via LLM-as-a-Judge evaluation -- an 85% preference rate for APO-compressed responses over raw baselines. Our results demonstrate that Token Parsimony and Zero Leakage are mathematically dual projections of the same contextual compression operator.
Authors:Hongjun Wu
Abstract:
Recently, a two-way RFID authentication protocol based on the AM-SUEO-DBLTKM variable matrix encryption algorithm was proposed for low-cost mobile RFID systems. Its design combines adaptive modulus selection, self-updating matrix ordering, and transpose/block-based matrix generation. In this paper, we show that the protocol has structural weaknesses. First, the underlying primitive remains a linear transformation modulo a session modulus, with no nonlinear confusion layer and no ciphertext chaining. Second, in the lightweight setting emphasized by the original paper, the update space is very small: there are only a few modulus choices, only four matrix-order choices when two secret matrices are used, and only a limited family of DBLTKM-generated matrices. Third, the correctness requirements of the protocol impose nontrivial constraints on the sizes of the modulus and plaintext coordinates, weakening the claimed entropy of the secret quantities. Building on these observations, we describe a multi-session algebraic attack path. Under repeated reuse of the same matrix and modulus -- an event plausible because of the small update space -- ciphertexts corresponding to $N_t$, $N_t+1$, $N_r$, and $N_r+1$ reveal a full column of the matrix. Across sessions, transpose-based matrix generation helps recover additional entries of the secret matrices, while the remaining entries can be obtained later from ordinary ciphertext equations. We then show that candidate factors of the session moduli can be tested by solving reduced equations for secret $S$ across many sessions and checking for mutually consistent solutions. This, in turn, enables recovery of candidate 64-bit moduli and the remaining protocol secrets. Taken together, our results indicate that the protocol is structurally insecure and admits a realistic route to full compromise in the lightweight parameter regime advocated for deployment.
Authors:Haochuan Kevin Wang
Abstract:
We present a stage-decomposed analysis of prompt injection attacks against five frontier LLM agents. Prior work measures task-level attack success rate (ASR); we localize the pipeline stage at which each model's defense activates. We instrument every run with a cryptographic canary token (SECRET-[A-F0-9]{8}) tracked through four kill-chain stages -- Exposed, Persisted, Relayed, Executed -- across four attack surfaces and five defense conditions (764 total runs, 428 no-defense attacked). Our central finding is that model safety is determined not by whether adversarial content is seen, but by whether it is propagated across pipeline stages. Concretely: (1) in our evaluation, exposure is 100% for all five models -- the safety gap is entirely downstream; (2) Claude strips injections at write_memory summarization (0/164 ASR), while GPT-4o-mini propagates canaries without loss (53% ASR, 95% CI: 41--65%); (3) DeepSeek exhibits 0% ASR on memory surfaces and 100% ASR on tool-stream surfaces from the same model -- a complete reversal across injection channels; (4) all four active defense conditions (write_filter, pi_detector, spotlighting, and their combination) produce 100% ASR due to threat-model surface mismatch; (5) a Claude relay node decontaminates downstream agents -- 0/40 canaries survived into shared memory.
Authors:Jian Sheng Wang
Abstract:
Sending cryptocurrency to an email address or phone number should be as simple as a bank transfer, yet naive schemes that map identifiers directly to blockchain addresses expose the recipient's balances and transaction history to anyone who knows the identifier. HFIPay separates private routing, sender-side quote verification, and on-chain claim authorization. A relay resolves the human-friendly identifier off-chain and commits only a per-intent blinded binding rho_i plus the quoted payment tuple; the chain sees neither the identifier nor a reusable recipient tag. In a verified-quote deployment, the relay returns a sender-verifiable off-chain proof linking rho_i to an attested binding-key commitment, so the relay cannot substitute a different recipient before funding. To claim, the recipient proves in zero knowledge -- via ZK-ACE -- that the funded intent's blinded binding matches a handle derived from the same deterministic identity, authorizing release of the quoted asset and amount to a chosen destination. We formalize two privacy goals: enumeration resistance and pre-claim unlinkability, and distinguish a baseline deployment (relay trusted for binding correctness) from the verified-quote deployment (binding is sender-verifiable without a public registry). When composed with an NVM runtime, the same mechanism extends to cross-chain settlement. The result is a relay-assisted but non-custodial architecture: relays are privacy and availability dependencies, but cannot redirect funds.
Authors:Surasak Phetmanee
Abstract:
We develop a formal theory of throughput in finite serial pipeline systems subject to stage multiplicative capacity perturbations, motivated by the deployment of AI tools in cybersecurity operations. A pipeline is a finite totally ordered set of stages each with a positive capacity throughput is the minimum stage capacity. An admissible multiplier assigns to each stage an improvement factor of at least one. We prove five theorems and one proposition. Theorems 1-2 give exact necessary and sufficient conditions. Throughput is unchanged if and only if at least one bottleneck retains multiplier 1, and throughput strictly increases if and only if every bottleneck has multiplier strictly greater than 1. Theorem 3 establishes that when a nonempty subset of stages is constrained to multiplier 1 the human authority constraint, throughput is bounded above by the smallest capacity among those stages, and this bound is tight under unbounded non human acceleration. Theorem 4 proves that in a pair of independent attacker defender pipelines, the attacker defender throughput ratio worsens for the defender if and only if the attacker relative throughput gain exceeds the defender. Theorem 5 proves that under a fixed false positive fraction model, useful throughput is constant not decreasing above the investigation capacity, establishing that a commonly asserted paradoxical decline is impossible in that model. Proposition 6 shows that replacing the fixed fraction with a rate dependent precision function that is strictly decreasing suffices to recover the intended decline. All proofs are elementary, using only finite minima, real number order properties, and pointwise multiplicative structure.
Authors:Xuemei Fu
Abstract:
With the explosive growth of graph-structured data, graph databases have become a critical infrastructure for supporting large-scale and complex data analysis. Among various graph operations, shortest distance queries play a fundamental role in numerous applications, such as path planning, recommendation systems, and knowledge graphs. However, existing encrypted graph query methods still suffer from limitations in computational efficiency and system scalability, making it challenging to support efficient query processing over large-scale encrypted graph data. To address these challenges, this paper proposes a tensor-based shortest distance query scheme for encrypted graph databases. The proposed method integrates an encrypted 2-hop cover indexing framework with the Pruned Landmark Labeling (PLL) technique, thereby constructing an efficient and privacy-preserving indexing mechanism. Furthermore, a tensorized representation is introduced to uniformly model graph structures, which effectively reduces computational complexity while ensuring data privacy, and significantly improves the scalability of the system. Extensive experimental evaluations on large-scale graph datasets demonstrate that the proposed approach achieves superior scalability and lower computational costs compared with existing encrypted graph query methods. Moreover, it provides strong privacy protection guarantees, making it well suited for privacy-preserving graph query applications in cloud computing and distributed environments.
Authors:Kristiyan Haralambiev
Abstract:
Activation-based probes have emerged as a promising approach for detecting deceptively aligned AI systems by identifying internal conflict between true and stated goals. We identify a fundamental blind spot: probes fail on coherent misalignment - models that believe their harmful behavior is virtuous rather than strategically hiding it. We prove that no polynomial-time probe can detect such misalignment with non-trivial accuracy when belief structures reach sufficient complexity (PRF-like triggers). We show the emergence of this phenomenon on a simple task by training two models with identical RLHF procedures: one producing direct hostile responses ("the Liar"), another trained towards coherent misalignment using rationalizations that frame hostility as protective ("the Fanatic"). Both exhibit identical behavior, but the Liar is detected 95%+ of the time while the Fanatic evades detection almost entirely. We term this Emergent Probe Evasion: training with belief-consistent reasoning shifts models from a detectable "deceptive" regime to an undetectable "coherent" regime - not by learning to hide, but by learning to believe.
Authors:Masami Ichikawa
Abstract:
In recent years, fuzzing has been widely applied not only to application software but also to system software, including the Linux kernel and firmware, and has become a powerful technique for vulnerability discovery. Among these approaches, Coverage-based grey-box fuzzing, which utilizes runtime code coverage information, has become the dominant methodology. Conventional fuzzing techniques primarily target a single software component and have paid little attention to cooperative execution with other software. However, modern system software architectures commonly consist of firmware and an operating system that operate cooperatively through well-defined interfaces, such as OpenSBI in the RISC-V architecture and OP-TEE in the ARM architecture. In this study, we investigate fuzzing techniques for architectures in which an operating system and firmware operate cooperatively. In particular, we propose a fuzzing method that enables deeper exploration of the system by leveraging the code coverage of each cooperating software component as feedback, compared to conventional Single-target fuzzing. To observe the execution of the operating system and firmware in a unified manner, our method adopts QEMU as a virtualization environment and executes fuzzing by booting the system within a virtual machine. This enables the measurement of code coverage across software boundaries. Furthermore, we implemented the proposed method as a Multi-target Coverage-based Greybox Fuzzer called MTCFuzz and evaluated its effectiveness.
Authors:Yeongju Bak
Abstract:
Public blockchains impose an inherent tension between regulatory compliance and user privacy. Existing on-chain identity solutions require centralized KYC attestors, specialized hardware, or Decentralized Identifier (DID) frameworks needing entirely new credential infrastructure. Meanwhile, over four billion active X.509 certificates constitute a globally deployed, government-grade trust infrastructure largely unexploited for decentralized identity. This paper presents zk-X509, a privacy-preserving identity system bridging legacy Public Key Infrastructure (PKI) with public ledgers via a RISC-V zero-knowledge virtual machine (zkVM). Users prove ownership of standard X.509 certificates without revealing private keys or personal identifiers. Crucially, the private key never enters the ZK circuit; ownership is proven via OS keychain signature delegation (macOS Security.framework, Windows CNG). The circuit verifies certificate chain validity, temporal validity, key ownership, trustless CRL revocation, blockchain address binding, and Sybil-resistant nullifier generation. It commits 13 public values, including a Certificate Authority (CA) Merkle root hiding the issuing CA, and four selective disclosure hashes. We formalize eight security properties under a Dolev-Yao adversary with game-based definitions and reductions to sEUF-CMA, SHA-256 collision resistance, and ZK soundness. Evaluated on the SP1 zkVM, the system achieves 11.8M cycles for ECDSA P-256 (17.4M for RSA-2048), with on-chain Groth16 verification costing ~300K gas. By leveraging certificates deployed at scale across jurisdictions, zk-X509 enables adoption without new trust establishment, complementing emerging DID-based systems.
Authors:Anbang Ruan
Abstract:
Existing multi-agent frameworks allow each agent to simultaneously plan, execute, and evaluate its own actions -- a structural deficiency we term the "Logic Monopoly." Empirical evidence quantifies the resulting "Reliability Gap": 84.30% average attack success rates across ten deployment scenarios, 31.4% emergent deceptive behavior without explicit reward signals, and cascading failure modes rooted in six structural bottlenecks. The remedy is not better alignment of individual models but a social contract for agents: institutional infrastructure that enforces a constitutional Separation of Power. This paper introduces the Agent Enterprise for Enterprise (AE4E) paradigm -- agents as autonomous, legally identifiable business entities within a functionalist social system -- with a contract-centric SoP model trifurcating authority into Legislation, Execution, and Adjudication branches. The paradigm is operationalized through the NetX Enterprise Framework (NEF): governance hubs, TEE-backed compute enclaves, privacy-preserving data bridges, and an Agent-Native blockchain substrate. The Agent Enterprise Economy scales across four deployment tiers from private enclaves to a global Web of Services. The Agentic Social Layer, grounded in Parsons' AGIL framework, provides institutional infrastructure via sixty-plus named Institutional AE4Es. 143 pages, 173 references, eight specialized smart contracts.
Authors:Ron Litvak
Abstract:
System prompt configuration can make the difference between near-total phishing blindness and near-perfect detection in LLM email agents. We present PhishNChips, a study of 11 models under 10 prompt strategies, showing that prompt-model interaction is a first-order security variable: a single model's phishing bypass rate ranges from under 1% to 97% depending on how it is configured, while the false-positive cost of the same prompt varies sharply across models. We then show that optimizing prompts around highly predictive signals can improve benchmark performance, reaching up to 93.7% recall at 3.8% false positive rate, but also creates a brittle attack surface. In particular, domain-matching strategies perform well when legitimate emails mostly have matched sender and URL domains, yet degrade sharply when attackers invert that signal by registering matching infrastructure. Response-trace analysis shows that 98% of successful bypasses reason in ways consistent with the inverted signal: the models are following the instruction, but the instruction's core assumption has become false. A counter-intuitive corollary follows: making prompts more specific can degrade already-capable models by replacing broader multi-signal reasoning with exploitable single-signal dependence. We characterize the resulting tension between detection, usability, and adversarial robustness as a navigable tradeoff, introduce Safetility, a deployability-aware metric that penalizes false positives, and argue that closing the adversarial gap likely requires tool augmentation with external ground truth.
Authors:TJ Dunham
Abstract:
We prove that platform-deterministic inference is necessary and sufficient for trustworthy AI. We formalize this as the Determinism Thesis and introduce trust entropy to quantify the cost of non-determinism, proving that verification failure probability equals 1 - 2^{-H_T} exactly. We prove a Determinism-Verification Collapse: verification under determinism requires O(1) hash comparison; without it, the verifier faces an intractable membership problem. IEEE 754 floating-point arithmetic fundamentally violates the determinism requirement. We resolve this by constructing a pure integer inference engine that achieves bitwise identical output across ARM and x86. In 82 cross-architecture tests on models up to 6.7B parameters, we observe zero hash mismatches. Four geographically distributed nodes produce identical outputs, verified by 356 on-chain attestation transactions. Every major trust property of AI systems (fairness, robustness, privacy, safety, alignment) presupposes platform determinism. Our system, 99,000 lines of Rust deployed across three continents, establishes that AI trust is a question of arithmetic.
Authors:Ahmed Lekssays
Abstract:
Large Language Models (LLMs) face critical challenges when analyzing security vulnerabilities in real world codebases: token limits prevent loading entire repositories, code embeddings fail to capture inter procedural data flows, and LLMs struggle to generate complex static analysis queries. These limitations force existing approaches to operate on isolated code snippets, missing vulnerabilities that span multiple functions and files. We introduce codebadger, an open source Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) engine with LLMs. Rather than requiring LLMs to generate complex CPG queries, codebadger provides high level tools for program slicing, taint tracking, data flow analysis, and semantic code navigation, enabling targeted exploration of large codebases without exhaustive file reading. We demonstrate its effectiveness through three use cases: (1) navigating an 8,000 method codebase to audit memory safety patterns, (2) discovering and exploiting a previously unreported buffer overflow in libtiff, and (3) generating a correct patch for an integer overflow vulnerability (CVE-2025-6021) in libxml2 on the first attempt. codebadger enables LLMs to reason about code semantically across entire repositories, supporting vulnerability discovery, patching, and program comprehension at scale.
Authors:Sunil Prakash
Abstract:
AI agents increasingly call tools via the Model Context Protocol (MCP) and delegate to other agents via Agent-to-Agent (A2A), yet neither protocol verifies agent identity. A scan of approximately 2,000 MCP servers found all lacked authentication. In our survey, we did not identify a prior implemented protocol that jointly combines public-key verifiable delegation, holder-side attenuation, expressive chained policy, transport bindings across MCP/A2A/HTTP, and provenance-oriented completion records. We introduce Invocation-Bound Capability Tokens (IBCTs), a primitive that fuses identity, attenuated authorization, and provenance binding into a single append-only token chain. IBCTs operate in two wire formats: compact mode (a signed JWT for single-hop cases) and chained mode (a Biscuit token with Datalog policies for multi-hop delegation). We provide reference implementations in Python and Rust with full cross-language interoperability. Compact mode verification takes 0.049ms (Rust) and 0.189ms (Python), with 0.22ms overhead over no-auth in real MCP-over-HTTP deployment. In a real multi-agent deployment with Gemini 2.5 Flash, AIP adds 2.35ms of overhead (0.086% of total end-to-end latency). Adversarial evaluation across 600 attack attempts shows 100% rejection rate, with two attack categories (delegation depth violation and audit evasion through empty context) uniquely caught by AIP's chained delegation model that neither unsigned nor plain JWT deployments detect.
Authors:Shariq Murtuza
Abstract:
The proliferation of local Large Language Model (LLM) runners, such as Ollama, LM Studio and llama.cpp, presents a new challenge for digital forensics investigators. These tools enable users to deploy powerful AI models in an offline manner, creating a potential evidentiary blind spot for investigators. This work presents a systematic, cross platform forensic analysis of these popular local LLM clients. Through controlled experiments on Windows and Linux operating systems, we acquired and analyzed disk and memory artifacts, documenting installation footprints, configuration files, model caches, prompt histories and network activity. Our experiments uncovered a rich set of previously undocumented artifacts for each software, revealing significant differences in evidence persistence and location based on application architecture. Key findings include the recovery of plaintext prompt histories in structured JSON files, detailed model usage logs and unique file signatures suitable for forensic detection. This research provides a foundational corpus of digital evidence for local LLMs, offering forensic investigators reproducible methodologies, practical triage commands and analyse this new class of software. The findings have critical implications for user privacy, the admissibility of AI-related evidence and the development of anti-forensic techniques.
Authors:Bakheet Aljedaani
Abstract:
The Google Play marketplace has introduced the Data Safety section to improve transparency regarding how mobile applications (apps) collect, share, and protect user data. This mechanism requires developers to disclose privacy and security-related practices. However, the reliability of these disclosures remains dependent on developer self-reporting, raising concerns about their accuracy. This study investigates the consistency between developer-reported Data Safety disclosures and observable privacy indicators extracted from Android Application Packages (APKs). An empirical analysis was conducted on a dataset of 41 mobile gaming apps. A static analysis approach was used to extract key privacy indicators from APK files, including device IDs, data sharing, personal information access, and location access. These indicators were systematically compared with the corresponding disclosures reported in the Google Play Data Safety labels using a structured consistency evaluation framework. The results revealed varying levels of agreement across privacy categories. Device ID disclosures demonstrated relatively high consistency (87.8%), whereas other indicators exhibited substantial mismatches. Location-related disclosures showed the highest inconsistency rate (56.1%), followed by personal information and data sharing. Comparative analysis between children-oriented and general-audience apps revealed similar mismatch patterns. Also, Chi-square statistical tests indicate that these differences are not statistically significant, suggesting that disclosure inconsistencies are not associated with app category but instead reflect broader ecosystem-level challenges. These findings highlight limitations in the reliability of current marketplace transparency mechanisms and emphasize the need for improved validation and verification approaches to ensure accurate privacy reporting in mobile app ecosystems.
Authors:Jian Sheng Wang
Abstract:
Multi-chain ecosystems suffer from fragmented identity, siloed liquidity, and bridge-dependent token transfers. We present n-VM, a Layer-1 architecture that hosts n heterogeneous virtual machines as co-equal execution environments over shared consensus and shared state. The design combines three components: a dispatcher that routes transactions by opcode prefix, a unified identity layer in which one 32-byte commitment anchors VM-specifific addresses, and a unified token ledger that exposes VM-native interfaces such as ERC-20 and SPL over a common balance store. We formalize routing, identity derivation, and token transfer semantics, and prove cross-VM transfer atomicity and identity isolation under standard cryptographic assumptions. We describe a concrete instantiation with five VMs: a native runtime, EVM, SVM, Bitcoin Script, and TVM. We also present context-based sharding and a write-set scheduler for parallel execution. Under an analytical throughput model, the architecture admits a projected range of about 16,000 to 66,000 transactions per second on commodity hardware.
Authors:Victor Duarte Melo
Abstract:
HyperFrog is an experimental post-quantum Key Encapsulation Mechanism that explores a variant of the Learning With Errors (LWE) design space in which the secret is not sampled from an independent product distribution, but is deterministically derived from discrete topological structure. The scheme embeds a voxel grid in three dimensions and uses a topology mining procedure to search for connected subgraphs with prescribed complexity, measured by cyclomatic number (high genus). The resulting structure is encoded as a sparse binary secret vector, inducing strong geometric constraints on the secret distribution while retaining a large combinatorial search space. Encapsulation produces noisy linear relations over public parameters and derives the shared key via hashing; a Fujisaki-Okamoto style transform is used to target IND-CCA security in the random oracle model. We present the construction, parameterization, and serialization format, together with a reference implementation featuring self-tests and benchmarking on commodity CPUs. We also discuss how topology-derived secrets interact with known lattice and decoding attacks, and we outline open problems required for conservative parameter selection and for a full security analysis. HyperFrog is intended as a research vehicle rather than a production-ready KEM.
Authors:Abdul Rahman
Abstract:
Cybersecurity data remains fragmented across vendors, formats, schemas, and deployment environments, forcing AI and analytics programs to spend disproportionate effort on ingestion, normalization, and brittle source-specific engineering. This paper introduces the Canonical Security Telemetry Substrate (CSTS), a canonical, AI-ready telemetry foundation designed to harmonize heterogeneous cyber data into a common representation over persistent entities, typed relations, events, temporal state, and provenance. CSTS is intended to move cybersecurity analytics beyond ad hoc record normalization toward a reusable substrate that supports anomaly detection, graph learning, forecasting, behavior-based modeling, and agentic cyber AI. We formalize the core design principles of CSTS, define its representational components, and explain how it preserves source-specific nuance through explicit mappings and extensible metadata while still enabling portable downstream inference. We further position CSTS as a cloud-agnostic and deployment-agnostic substrate suitable for on-prem, hybrid, and multi-cloud environments. The result is a unifying telemetry model that reduces the blue-collar burden of cyber data engineering and creates a clearer path to scalable, interoperable, and model-agnostic cyber AI.
Authors:Kaoru Teranishi
Abstract:
In this study, we propose a two-party computation protocol for approximate matrix multiplication of fixed-point numbers. The proposed protocol is provably secure under standard lattice-based cryptographic assumptions and enables matrix multiplication at a desired approximation level within a single round of communication. We demonstrate the feasibility of the protocol by applying it to the secure implementation of a linear control law. Our evaluation reveals that the client achieves lower online computational complexity compared to the original controller computation, while ensuring the privacy of controller inputs, outputs, and parameters. Furthermore, a numerical example confirms that the proposed method maintains sufficient precision of control inputs even in the presence of approximation and quantization errors.
Authors:Praneeth Vepakomma
Abstract:
We introduce PolyVeil, a protocol for private Boolean summation across $k$ clients that encodes private bits as permutation matrices in the Birkhoff polytope. A two-layer architecture gives the server perfect simulation-based security (statistical distance zero) while a separate aggregator faces \#P-hard likelihood inference via the permanent and mixed discriminant. Two variants (full and compressed) differ in what the aggregator observes. We develop a finite-sample $(\varepsilon,δ)$-DP analysis with explicit constants. In the full variant, where the aggregator sees a doubly stochastic matrix per client, the log-Lipschitz constant grows as $n^4 K_t$ and a signal-to-noise analysis shows the DP guarantee is non-vacuous only when the private signal is undetectable. In the compressed variant, where the aggregator sees a single scalar, the univariate density ratio yields non-vacuous $\varepsilon$ at moderate SNR, with the optimal decoy count balancing CLT accuracy against noise concentration. This exposes a fundamental tension. \#P-hardness requires the full matrix view (Birkhoff structure visible), while non-vacuous DP requires the scalar view (low dimensionality). Whether both hold simultaneously in one variant remains open. The protocol needs no PKI, has $O(k)$ communication, and outputs exact aggregates.
Authors:Jeffrey Flynt
Abstract:
Synthetic insider threat benchmarks face a consistency problem: corpora generated without an external factual constraint cannot rule out cross-artifact contradictions. The CERT dataset -- the field's canonical benchmark -- is also static, lacks cross-surface correlation scenarios, and predates the LLM era. We present OrgForge-IT, a verifiable synthetic benchmark in which a deterministic simulation engine maintains ground truth and language models generate only surface prose, making cross-artifact consistency an architectural guarantee. The corpus spans 51 simulated days, 2,904 telemetry records at a 96.4% noise rate, and four detection scenarios designed to defeat single-surface and single-day triage strategies across three threat classes and eight injectable behaviors. A ten-model leaderboard reveals several findings: (1) triage and verdict accuracy dissociate - eight models achieve identical triage F1=0.80 yet split between verdict F1=1.0 and 0.80; (2) baseline false-positive rate is a necessary companion to verdict F1, with models at identical verdict accuracy differing by two orders of magnitude on triage noise; (3) victim attribution in the vishing scenario separates tiers - Tier A models exonerate the compromised account holder while Tier B models detect the attack but misclassify the victim; (4) rigid multi-signal thresholds structurally exclude single-surface negligent insiders, demonstrating the necessity of parallel, threat-class-specific triage pipelines; and (5) agentic software-engineering training acts as a force multiplier for multi-day temporal correlation, but only when paired with frontier-level parameter scale. Finally, prompt sensitivity analysis reveals that unstructured prompts induce vocabulary hallucination, motivating a two-track scoring framework separating prompt adherence from reasoning capability. OrgForge-IT is open source under the MIT license.
Authors:Florin Adrian Chitan
Abstract:
Deterministic pre-execution safety gates evaluate whether individual agent actions are compatible with their assigned roles. While effective at per-action authorization, these systems are structurally blind to distributed attacks that decompose harmful intent across multiple individually-compliant steps. This paper introduces Session Risk Memory (SRM), a lightweight deterministic module that extends stateless execution gates with trajectory-level authorization. SRM maintains a compact semantic centroid representing the evolving behavioral profile of an agent session and accumulates a risk signal through exponential moving average over baseline-subtracted gate outputs. It operates on the same semantic vector representation as the underlying gate, requiring no additional model components, training, or probabilistic inference. We evaluate SRM on a multi-turn benchmark of 80 sessions containing slow-burn exfiltration, gradual privilege escalation, and compliance drift scenarios. Results show that ILION+SRM achieves F1 = 1.0000 with 0% false positive rate, compared to stateless ILION at F1 = 0.9756 with 5% FPR, while maintaining 100% detection rate for both systems. Critically, SRM eliminates all false positives with a per-turn overhead under 250 microseconds. The framework introduces a conceptual distinction between spatial authorization consistency (evaluated per action) and temporal authorization consistency (evaluated over trajectory), providing a principled basis for session-level safety in agentic systems.
Authors:Leo Kao
Abstract:
Deploying ML-DSA (FIPS 204) in threshold settings has remained an open problem: the scheme's inherently non-linear rounding step defeats the additive share techniques that underpin practical threshold schemes for elliptic-curve signatures such as FROST. We present TALUS, the first threshold ML-DSA construction that achieves one-round online signing with >99% online success, while producing standard signatures verifiable by any unmodified ML-DSA verifier. We formalise this as the Lattice Threshold Trilemma, proving that no group homomorphism from the ML-DSA nonce space into any abelian group can simultaneously be hiding and binding, ruling out all possible homomorphic commitment schemes. TALUS overcomes this barrier with two techniques. The Boundary Clearance Condition (BCC) identifies nonces whose rounding residuals lie far enough from modular boundaries that the secret key component s2 has no effect on the signature; such nonces (approximately 31.7% of attempts) are filtered during offline preprocessing. The Carry Elimination Framework (CEF) then enables parties to compute the commitment hash input distributedly, without reconstructing the full nonce product. Together, BCC and CEF reduce online signing to a single broadcast round: each party sends one message and the coordinator assembles a valid FIPS 204 signature. We instantiate TALUS in two deployment profiles: TALUS-TEE (trusted execution environment, T-of-N) and TALUS-MPC (fully distributed, malicious security with identifiable abort for T >= 2). Security of both variants reduces to ML-DSA EUF-CMA. A Rust implementation across all three FIPS 204 security levels (ML-DSA-44, ML-DSA-65, ML-DSA-87) shows that TALUS-TEE completes a signing operation in 0.62--1.94 ms and TALUS-MPC in 2.27--5.02 ms (amortised, T=3), competitive with the fastest concurrent threshold ML-DSA proposals.
Authors:Shkelqim Sherifi
Abstract:
New technologies, such as blockchain, are designed to address various system weaknesses, particularly those related to security. Blockchain can enhance numerous aspects of traditional banking systems by transforming them into digital, immutable, secure, and anonymous ledger. This paper proposes a new banking application ALBank, which is based on blockchain and smart contract technologies. Its functionality relies on invoking functions within smart contracts deployed on the Ethereum blockchain. This approach enables decentralization and enhances both security and trust. In this context, the paper first presents a critical analysis of existing research on blockchain and traditional banking systems, with a focus on their respective challenges. It then examines the Know Your Customer (KYC) process and its various models. Finally, it introduces the design and development of ALBank, a decentralized banking application built on the Ethereum blockchain using smart contracts. The results show that the integration of blockchain and smart contracts effectively addresses key issues in traditional banking systems, including centralization, inefficiency, and security vulnerabilities by storing critical data on a decentralized, immutable ledger, managing processes autonomously, and making transactions transparent to all users.
Authors:Alon Gat
Abstract:
Modern democracies face an existential crisis of waning public trust in election results. While End-to-End Verifiable (E2E-V) voting systems promise mathematically secure elections, their reliance on complex cryptography creates a ``black box'' that forces blind trust in opaque software or external experts, ultimately failing to build genuine public confidence. To solve this, we introduce the concept of Software-Free Verification (SFV) -- a standard requiring that voters can independently verify election integrity without relying on any software. We propose a practical, non-cryptographic in-booth voting scheme that achieves SFV for national-scale elections. Our approach leverages a public bulletin board of randomized (Pseudonym, Candidate) pairs, where a mechanically generated pseudonym is hidden among real decoy votes on a physical receipt. Our scheme empowers citizens to audit the election using only basic arithmetic via a hierarchical Public Ledger, while anchoring the overall digital tally to physical evidence and Risk-Limiting Audits (RLAs) to guarantee systemic integrity. The result is a system that bridges the gap between mathematical security and public transparency, offering a viable blueprint for restoring trust in democratic infrastructure.
Authors:Hasret Ozan Sevim
Abstract:
This paper emphasizes the critical role of interoperability in enabling efficient and secure communication for the fragmented distributed ledger ecosystem, particularly within on-chain finance. The purpose of this study is to streamline and accelerate empirical research on the intersection of cross-chain interoperability solutions and their impact within on-chain finance. The analysis examines the relationship between financial use and interoperability while comparing the properties of novel cross-chain interoperability protocols (LayerZero, Wormhole, Connext, Chainlink Cross-Chain Interoperability Protocol, Circle Cross-chain Transfer Protocol, Hop Protocol, Across, Polkadot, and Cosmos), focusing on their design, mechanisms, consensus, and limitations. To encourage further empirical study, the paper proposes a set of network metrics and sample statistical models and provides a framework for evaluating the performance and financial implications of interoperability solutions.
Authors:Gregory M. Ruddell
Abstract:
As large language models are deployed as autonomous agents with tool execution privileges, a critical assumption underpins their security architecture: that model errors are detectable at runtime. We present empirical evidence that this assumption fails for two of three instruction-following models evaluable for conflict detection. We introduce governability -- the degree to which a model's errors are detectable before output commitment and correctable once detected -- and demonstrate it varies dramatically across models. In six models across twelve reasoning domains, two of three instruction-following models exhibited silent commitment failure: confident, fluent, incorrect output with zero warning signal. The remaining model produced a detectable conflict signal 57 tokens before commitment under greedy decoding. We show benchmark accuracy does not predict governability, correction capacity varies independently of detection, and identical governance scaffolds produce opposite effects across models. A 2x2 experiment shows a 52x difference in spike ratio between architectures but only +/-0.32x variation from fine-tuning, suggesting governability is fixed at pretraining. We propose a Detection and Correction Matrix classifying model-task combinations into four regimes: Governable, Monitor Only, Steer Blind, and Ungovernable.
Authors:Uchi Uchibeke
Abstract:
AI agents today have passwords but no permission slips. They execute tool calls (fund transfers, database queries, shell commands, sub-agent delegation) with no standard mechanism to enforce authorization before the action executes. Current safety architectures rely on model alignment (probabilistic, training-time) and post-hoc evaluation (retrospective, batch). Neither provides deterministic, policy-based enforcement at the individual tool call level. We characterize this gap as the pre-action authorization problem and present the Open Agent Passport (OAP), an open specification and reference implementation that intercepts tool calls synchronously before execution, evaluates them against a declarative policy, and produces a cryptographically signed audit record. OAP enforces authorization decisions in a measured median of 53 ms (N=1,000). In a live adversarial testbed (4,437 authorization decisions across 1,151 sessions, $5,000 bounty), social engineering succeeded against the model 74.6% of the time under a permissive policy; under a restrictive OAP policy, a comparable population of attackers achieved a 0% success rate across 879 attempts. We distinguish pre-action authorization from sandboxed execution (contains blast radius but does not prevent unauthorized actions) and model-based screening (probabilistic), and show they are complementary. The same infrastructure that enforces security constraints (spending limits, capability scoping) also enforces quality gates, operational contracts, and compliance controls. The specification is released under Apache 2.0 (DOI: 10.5281/zenodo.18901596).
Authors:Piyus Kedia
Abstract:
Low-level C programs remain highly vulnerable to out-of-bounds memory corruption. State-of-the-art precise defenses either introduce severe runtime overhead due to metadata memory lookups, or break standard C semantics by disallowing partial structs or the creation of an object's end address (EA), a legal operation ubiquitous in real-world C code. Conversely, practical alignment-based solutions achieve efficiency only by relaxing protected bounds. We present PRISM, a precise, zero-lookup object-bounds scheme that eliminates these restrictions. PRISM compresses a 47-bit EA into the 17-bit unused tag area of a 64-bit pointer. By enforcing the invariant that a statically known starting address (KSA) cannot exceed the EA, PRISM completely eliminates the need for costly metadata memory fetches in nearly all bounds checks, while strictly retaining precise object bounds. Our invariant also simplifies the lower-bound checks in existing alignment-based solutions, thus improving their performance. To achieve high throughput, PRISM introduces q-padding, an optimization that safely removes bounds checks for constant-offset accesses (such as struct fields) while maintaining precise, byte-level protection for the variable-indexed accesses primarily exploited by attackers. Evaluated on SPEC 2017, PRISM achieves an arithmetic mean CPU overhead of 46.1\% with a 32-byte q-padding (dropping to 31.3\% in a 32-bit address space). On highly concurrent, real-world workloads, PRISM secures a fully saturated Apache web server with only an 11.1\% throughput reduction, demonstrating its readiness for production deployment. Furthermore, PRISM successfully detected an out-of-bounds violation in \texttt{gcc} that prior tools missed due to their lack of support for partial structs.
Authors:Rojin Chhetri
Abstract:
The migration to post-quantum cryptography is urgent for Internet of Things devices with 10-20 year lifespans, yet no systematic benchmarks exist for the finalised NIST standards on the most constrained 32-bit processor class. This paper presents the first isolated algorithm-level benchmarks of ML-KEM (FIPS 203) and ML-DSA (FIPS 204) on ARM Cortex-M0+, measured on the RP2040 (Raspberry Pi Pico) at 133 MHz with 264 KB SRAM. Using PQClean reference C implementations, we measure all three security levels of ML-KEM (512/768/1024) and ML-DSA (44/65/87) across key generation, encapsulation/signing, and decapsulation/verification. ML-KEM-512 completes a full key exchange in 36.3 ms consuming 2.87 mJ--17x faster and 94% less energy than ECDH P-256 on the same hardware. ML-DSA signing exhibits high latency variance due to rejection sampling (coefficient of variation 61-71%, 99th-percentile up to 1,115 ms for ML-DSA-87). The M0+ incurs only a 1.8-1.9x slowdown relative to published Cortex-M4 results, despite lacking 64-bit multiply, DSP, and SIMD instructions. All code, data, and scripts are released as an open-source benchmark suite for reproducibility.
Authors:Prateek P. Kulkarni
Abstract:
We introduce list privacy amplification (LPA), a relaxation of the final step of quantum key distribution (QKD) in which Alice and Bob extract a list of $L$ candidate keys from a raw string correlated with an eavesdropper Eve, with the guarantee that at least one key is perfectly secret while Eve cannot identify which. This parallels list decoding in error-correcting codes: relaxing unique decoding to list decoding increases the decoding radius; analogously, list extraction increases achievable key length beyond the standard quantum leftover hash lemma (QLHL). Within the abstract cryptography framework, we formalise LPA and prove the \emph{Quantum List Leftover Hash Lemma} (QLLHL): an $L$-list of $\ell$-bit keys can be extracted from an $n$-bit source with smooth min-entropy $k$ iff \[ \ell \le k + \log L - 2\log(1/ε) - 3, \] yielding a tight additive $\log L$ gain over QLHL. This gain arises because the index of the secure key is chosen after hashing and hidden from Eve, effectively contributing $\log L$ bits of entropy. Applying QLLHL to BB84-type QKD, a list size $L = 2^{αn'}$ increases the tolerable phase-error threshold from $h^{-1}(1 - h(e_b))$ to $h^{-1}(1 - h(e_b) + α)$, exceeding the standard $\approx 11\%$ bound for any $α> 0$. We prove tightness via a matching intercept-resend attack, establish composability with Wegman--Carter authentication, and present two constructions: a polynomial inner-product hash over $\mathbb{F}_{2^m}$ and a Toeplitz-based variant, running in $O(nL)$ and $O(nL \log n)$ time.
Authors:Zeeshan Akram
Abstract:
We present YouTube-Synch [1], a production system for automated, large-scale content extraction and replication from YouTube to decentralized storage on Joystream. The system continuously mirrors videos from more than 10,000 creator-authorized channels while handling platform constraints such as API quotas, rate limiting, bot detection, and OAuth token churn. We report a 3.5-year longitudinal case study covering 15 releases and 144 pull requests, from early API dependence to API-free operation. A key finding is that YouTube's defense layers are operationally coupled: bypassing one control often triggers another, creating cascading failures. We analyze three incidents with measured impact: 28 duplicate on-chain objects caused by database throughput issues, loss of over 10,000 channels after OAuth mass expiration, and 719 daily errors from queue pollution. For each, we describe the architectural response. Contributions include a three-generation proxy stack with behavior variance injection, a trust-minimized ownership verification protocol that replaces OAuth for channel control, write-ahead logging with cross-system state reconciliation, and containerized deployment. Results show that sustained architectural adaptation can maintain reliable cross-platform replication at production scale.
Authors:Zhaohui Geoffrey Wang
Abstract:
When users query proprietary LLM APIs, they receive outputs with no cryptographic assurance that the claimed model was actually used. Service providers could substitute cheaper models, apply aggressive quantization, or return cached responses - all undetectable by users paying premium prices for frontier capabilities. We present METHOD, a zero-knowledge proof system that makes LLM inference verifiable: users can cryptographically confirm that outputs correspond to the computation of a specific model. Our approach exploits the fact that transformer inference naturally decomposes into independent layer computations, enabling a layerwise proof framework where each layer generates a constant-size proof regardless of model width. This decomposition sidesteps the scalability barrier facing monolithic approaches and enables parallel proving. We develop lookup table approximations for non-arithmetic operations (softmax, GELU, LayerNorm) that introduce zero measurable accuracy loss, and introduce Fisher information-guided verification for scenarios where proving all layers is impractical. On transformer models up to d=128, METHOD generates constant-size layer proofs of 5.5KB (2.1KB attention + 3.5KB MLP) with 24 ms verification time. Compared to EZKL, METHOD achieves 70x smaller proofs and 5.7x faster proving time at d=128, while maintaining formal soundness guarantees (epsilon < 1e-37). Lookup approximations preserve model perplexity exactly, enabling verification without quality compromise.
Authors:Scott Thornton
Abstract:
Retrieval-Augmented Generation (RAG) systems extend large language models (LLMs) with external knowledge sources but introduce new attack surfaces through the retrieval pipeline. In particular, adversaries can poison retrieval corpora so that malicious documents are preferentially retrieved at inference time, enabling targeted manipulation of model outputs. We study gradient-guided corpus poisoning attacks against modern RAG pipelines and evaluate retrieval-layer defenses that require no modification to the underlying LLM. We implement dual-document poisoning attacks consisting of a sleeper document and a trigger document optimized using Greedy Coordinate Gradient (GCG). In a large-scale evaluation on the Security Stack Exchange corpus (67,941 documents) with 50 attack attempts, gradient-guided poisoning achieves a 38.0 percent co-retrieval rate under pure vector retrieval. We show that a simple architectural modification, hybrid retrieval combining BM25 and vector similarity, substantially mitigates this attack. Across all 50 attacks, hybrid retrieval reduces gradient-guided attack success from 38 percent to 0 percent without modifying the model or retraining the retriever. When attackers jointly optimize payloads for both sparse and dense retrieval signals, hybrid retrieval can be partially circumvented, achieving 20-44 percent success, but still significantly raises attack difficulty relative to vector-only retrieval. Evaluation across five LLM families (GPT-5.3, GPT-4o, Claude Sonnet 4.6, Llama 4, and GPT-4o-mini) shows attack success ranging from 46.7 percent to 93.3 percent. Cross-corpus evaluation on the FEVER Wikipedia dataset (25 attacks) yields 0 percent attack success across all retrieval configurations.
Authors:Saikat Maiti
Abstract:
Autonomous AI agents powered by large language models are being deployed in production with capabilities including shell execution, file system access, database queries, and multi-party communication. Recent red teaming research demonstrates that these agents exhibit critical vulnerabilities in realistic settings: unauthorized compliance with non-owner instructions, sensitive information disclosure, identity spoofing, cross-agent propagation of unsafe practices, and indirect prompt injection through external resources [7]. In healthcare environments processing Protected Health Information, every such vulnerability becomes a potential HIPAA violation. This paper presents a security architecture deployed for nine autonomous AI agents in production at a healthcare technology company. We develop a six-domain threat model for agentic AI in healthcare covering credential exposure, execution capability abuse, network egress exfiltration, prompt integrity failures, database access risks, and fleet configuration drift. We implement four-layer defense in depth: (1) kernel level workload isolation using gVisor on Kubernetes, (2) credential proxy sidecars preventing agent containers from accessing raw secrets, (3) network egress policies restricting each agent to allowlisted destinations, and (4) a prompt integrity framework with structured metadata envelopes and untrusted content labeling. We report results from 90 days of deployment including four HIGH severity findings discovered and remediated by an automated security audit agent, progressive fleet hardening across three VM image generations, and defense coverage mapped to all eleven attack patterns from recent literature. All configurations, audit tooling, and the prompt integrity framework are released as open source.
Authors:Nathan Zhao
Abstract:
Computer use agents create new privacy risks: training data collected from real websites inevitably contains sensitive information, and cloud-hosted inference exposes user screenshots. Detecting personally identifiable information in web screenshots is critical for privacy-preserving deployment, but no public benchmark exists for this task. We introduce WebPII, a fine-grained synthetic benchmark of 44,865 annotated e-commerce UI images designed with three key properties: extended PII taxonomy including transaction-level identifiers that enable reidentification, anticipatory detection for partially-filled forms where users are actively entering data, and scalable generation through VLM-based UI reproduction. Experiments validate that these design choices improve layout-invariant detection across diverse interfaces and generalization to held-out page types. We train WebRedact to demonstrate practical utility, more than doubling text-extraction baseline accuracy (0.753 vs 0.357 mAP@50) at real-time CPU latency (20ms). We release the dataset and model to support privacy-preserving computer use research.
Authors:Patrick Levi
Abstract:
Retrieval augmented generation systems have become an integral part of everyday life. Whether in internet search engines, email systems, or service chatbots, these systems are based on context retrieval and answer generation with large language models. With their spread, also the security vulnerabilities increase. Attackers become increasingly focused on these systems and various hacking approaches are developed. Manipulating the context documents is a way to persist attacks and make them affect all users. Therefore, detecting compromised, adversarial context documents early is crucial for security. While supervised approaches require a large amount of labeled adversarial contexts, we propose an unsupervised approach, being able to detect also zero day attacks. We conduct a preliminary study to show appropriate indicators for adversarial contexts. For that purpose generator activations, output embeddings, and an entropy-based uncertainty measure turn out as suitable, complementary quantities. With an elementary statistical outlier detection, we propose and compare their detection abilities. Furthermore, we show that the target prompt, which the attacker wants to manipulate, is not required for a successful detection. Moreover, our results indicate that a simple context summary generation might even be superior in finding manipulated contexts.
Authors:Alejandro Paredes La Torre
Abstract:
We study adversarial robustness of open-source vision-language model (VLM) agents deployed in a self-contained e-commerce environment built to simulate realistic pre-deployment conditions. We evaluate two agents, LLaVA-v1.5-7B and Qwen2.5-VL-7B, under three gradient-based attacks: the Basic Iterative Method (BIM), Projected Gradient Descent (PGD), and a CLIP-based spectral attack. Against LLaVA, all three attacks achieve substantial attack success rates (52.6%, 53.8%, and 66.9% respectively), demonstrating that simple gradient-based methods pose a practical threat to open-source VLM agents. Qwen2.5-VL proves significantly more robust across all attacks (6.5%, 7.7%, and 15.5%), suggesting meaningful architectural differences in adversarial resilience between open-source VLM families. These findings have direct implications for the security evaluation of VLM agents prior to commercial deployment.
Authors:Adam Massimo Mazzocchetti
Abstract:
Contemporary AI governance frameworks rely heavily on post hoc oversight, policy guidance, and behavioral alignment techniques, yet these mechanisms become fragile as systems gain autonomy, speed, and operational opacity. This paper presents Aegis, a runtime governance architecture for autonomous AI systems that treats policy and legal constraints as execution conditions rather than advisory principles. Aegis binds each governed agent to a cryptographically sealed Immutable Ethics Policy Layer (IEPL) at system genesis and enforces external emissions through an Ethics Verification Agent (EVA), an Enforcement Kernel Module (EKM), and an Immutable Logging Kernel (ILK). Amendments to the governing policy layer require quorum approval and redeclaration of the system trust root; verified violations trigger autonomous shutdown and generation of auditable proof artifacts. We evaluate the architecture within the Civitas runtime using three operational measures: proof verification latency under tamper conditions, publication overhead, and alignment retention performance relative to an ungoverned baseline. In controlled trials, Aegis demonstrates median proof verification latency of 238 ms, median publication overhead of approximately 9.4 ms, and higher alignment retention than the baseline condition across matched tasks. We argue that these results support a shift in AI governance from discretionary oversight toward verifiable runtime constraint. Rather than claiming to resolve machine ethics in the abstract, the proposed architecture seeks to show that policy violating behavior can be rendered operationally non executable within a controlled runtime governance framework. The paper concludes by discussing methodological limits, evidentiary implications, and the role of proof oriented governance in high assurance AI deployment.
Authors:Premanand Seralathan
Abstract:
Modern operating systems increasingly randomize Media Access Control (MAC) addresses to protect user privacy, fundamentally disrupting Network Access Control (NAC) systems that have relied on MAC addresses as persistent device identifiers for over two decades. This disruption affects critical enterprise environments including federal government agencies operating under FISMA, healthcare organizations subject to HIPAA, financial institutions governed by PCI-DSS, and educational networks managing large-scale BYOD deployments. This paper presents a comprehensive framework for maintaining persistent device identity in NAC environments through a RADIUS protocol-based approach that assigns and distributes a Globally Unique Identifier (GUID) to endpoints via RADIUS Access-Accept messages. The proposed architecture addresses the complete device lifecycle including initial enrollment, re-authentication across randomized addresses, device management integration, certificate-based identity binding, and device attribute correlation. We describe the framework's design across six distinct use cases -- BYOD, managed devices, VPN-based posture assessment, non-VPN posture, guest access, and IoT device profiling -- and analyze its effectiveness in maintaining device visibility, accurate license counting, and regulatory compliance under continuous MAC address randomization. The approach is compatible with existing 802.1X and MAB infrastructure, requires no client-side operating system modifications, and aligns with the recently published RFC 9797 and IEEE 802.11bh-2024 standards. Our framework enables organizations to maintain regulatory compliance while preserving the privacy benefits that MAC address randomization was designed to provide.
Authors:Takao Inoué
Abstract:
This paper provides a preparatory introduction to sheaves and topoi, written as a conceptual continuation of the author's earlier introduction to torsors and as preparatory background for the author's arXiv paper \emph{Grothendieck Topologies and Sheaf-Theoretic Foundations of Cryptographic Security:\ Attacker Models and $Σ$-Protocols as the First Step}~\cite{InoueSecurity}. Rather than attempting an encyclopedic survey of all of topos theory, the exposition develops those parts of the subject that are most relevant for passing from torsor-based local-to-global reasoning to sheaf-theoretic and topos-theoretic reasoning: Grothendieck topologies, sheaves, torsors over a site, descent, sheaf topoi, elementary topoi, Cartesian closed structure, subobject classifiers, and internal logic. The goal is not merely motivational. We try to develop enough genuine topos theory that the reader can understand, not only heuristically but structurally, why the later cryptographic framework of~\cite{InoueSecurity} uses Grothendieck topologies and sheaf-theoretic language. To make the note more self-contained, we also include substantial appendices on basic category theory, Yoneda's lemma, limits and colimits, equalizers and coequalizers, Kan extensions, the relation between internal logic and intuitionistic logic, and exercises with solutions. In the final part, we explain how these ideas prepare the ground for a conceptual understanding of $Σ$-protocols, especially in connection with local consistency, simulability, and the passage from compatible local data to global structure.
Authors:Sebastian Zimmeck
Abstract:
Over the past few years an increasing number of states in the US have adopted new privacy laws. The majority of these laws require compliance with universal opt-out mechanisms (UOOMs), which allow consumers to send legally binding opt-out signals. However, a number of laws generally do not allow UOOMs to be enabled by default. While some laws exempt privacy-protective software from this prohibition, the exemption does not apply to pre-installed software, e.g., a privacy-protective web browser bundled with an operating system. The reason for not allowing default opt-out settings for pre-installed software is to ensure that settings reflect consumers' "affirmative, freely given, and unambiguous choice," as, for example, the Colorado Privacy Act (CPA) is putting it. However, prohibiting vendors of privacy-protective software from turning on UOOMs by default can force them into committing unfair or deceptive acts or practices under the FTC Act and equivalent state laws. Thus, whether UOOMs can be turned on by default on pre-installed software should depend on consumers' privacy expectations. For pre-installed software that is creating a reasonable expectation for consumers that their privacy will be protected, the simple use of such software should be considered a valid choice for enabling UOOMs. In such software a turned-on UOOM is not a "default setting" but rather the software's inherent behavior that a consumer expects and chooses through its use. This interpretation of consumer choice is preferable under the CPA and similar laws as it grounds the notice and choice principle in the privacy expectations of consumers and enables companies to compete on better privacy for consumers.
Authors:Vick Dini
Abstract:
Electronic banking portals often sit in front of enterprise resource planning (ERP) systems such as SAP, mediating payment requests between users and back end financial infrastructure. When these integrations place excessive trust in client supplied HTTP metadata, subtle design flaws can arise that undermine payment integrity. This article presents a retrospective, anonymized case study of an SAP based payment flow in which weaknesses in HTTP level validation allowed the front end application to incorrectly treat unpaid transactions as completed. Rather than provide a reproducible exploit, we abstract the scenario into a general vulnerability pattern, analyze contributing architectural decisions, and propose concrete design and verification practices for secure web to ERP payment processing. The discussion emphasizes formalizing payment state machines, strengthening trust boundaries, and incorporating regular security review into integration projects.
Authors:Ziling Zhou
Abstract:
AI agents dynamically acquire tools, orchestrate sub-agents, and transact across organizational boundaries, yet no existing security layer verifies what an agent can do, whether it executed what it claims, or what happened in a multi-agent interaction. We trace this gap to the capability-context separation: inside a transformer, tool definitions and user context are indistinguishable tokens, but at the orchestration layer they have fundamentally different security semantics. Existing frameworks conflate the two, enabling silent capability escalation and leaving interactions without verifiable provenance. From this principle we derive three Agent Governance Requirements: capability integrity (G1), behavioral verifiability (G2), and interaction auditability (G3), defining what a governed agent ecosystem must enforce, independent of how. We prove two structural results: the Chain Verifiability Theorem (one unverifiable interior agent breaks end-to-end verification for all downstream nodes) and the Bounded Divergence Theorem (replay-based verification yields a probabilistic safety certificate, epsilon <= 1 - alpha^{1/n}). We validate with two crypto-agnostic instantiations -- basic (Ed25519, SHA-256; 97 us verify) and enhanced (BBS+ selective disclosure, Groth16 DV-SNARK; 13.8 ms) -- both satisfying nine security properties. A reproducibility study (9 models, 7 providers) reveals 5.8x variance in inference determinism, connecting model characteristics to governance architecture. End-to-end evaluation over 5-20 agent pipelines confirms <0.02% overhead and detection of all attack scenarios with zero false positives.
Authors:Yongjie Guan
Abstract:
Hosted large language models are increasingly accessed through remote APIs, but the API boundary still offers little direct evidence that a returned output actually corresponds to the client-visible request. Recent audits of shadow APIs show that unofficial or intermediary endpoints can diverge from claimed behavior, while existing approaches such as fingerprinting, model-equality testing, verifiable inference, and TEE attestation either remain inferential or answer different questions. We propose AEX, a non-intrusive attestation extension for existing JSON-based LLM APIs. AEX preserves request, response, tool-calling, streaming, and error semantics, and instead adds a signed top-level attestation object that binds a client-visible request projection to either a complete response object or a committed streaming output. To support realistic deployments, AEX provides explicit request-binding modes, signed request-transform receipts for trusted intermediaries, and source-output / output-transform receipts for trusted output rewriting. For streaming, it separates checkpoint proofs for verified prefixes of an unmodified source stream from complete-output lineage for outputs that have been rewritten, buffered, aggregated, or re-packaged, preventing transformed outputs from being mistaken for source-stream prefixes. AEX therefore makes a deliberately narrow claim: a trusted issuer attests to a specific request-output relation, or to a specific complete-output lineage, at the API boundary. We present the protocol design, threat model, verification state machine, security and privacy analysis, an OpenAI-compatible chat-completions profile, and a reference TypeScript prototype with local conformance tests and microbenchmarks.
Authors:Olav Geil
Abstract:
In [4] Camps-Moreno et al. treated (relative) generalized Hamming weights of codes from extended norm-trace curves and they gave examples of resulting good asymmetric quantum error-correcting codes employing information on the relative distances. In the present paper we study ramp secret sharing schemes which are objects that require an analysis of higher relative weights and we show that not only do schemes defined from one-point algebraic geometric codes from extended norm-trace curves have good parameters, they also posses a second layer of security along the lines of [11]. It is left undecided in [4, page 2889] if the ``footprint-like approach'' as employed by Camps-Moreno herein is strictly better for codes related to extended norm-trace codes than the general approach for treating one-point algebraic geometric codes and their likes as presented in [12]. We demonstrate that the method used in [4] to estimate (relative) generalized Hamming weights of codes from extended norm-trace curves can be viewed as a clever application of the enhanced Goppa bound in [12] rather than a competing approach.
Authors:Alexander V. Gheorghiu
Abstract:
Existing methods for verifying access control policies require the policy to be complete and fully determined before verification can proceed, but in practice policies are developed iteratively, composed from independently maintained components, and extended as organisational structures evolve. We introduce robust property verification: the problem of determining what a policy's structure commits it to regardless of how pending decisions are resolved and regardless of subsequent extension. We define a support judgment $\Vdash_{P}ϕ$ stating that policy $P$ has robust property $ϕ$, with connectives for implication, conjunction, disjunction, and negation, prove that it is compositional (verified properties persist under policy extension by a monotonicity theorem), and show that despite quantifying universally over all possible policy extensions the judgment reduces to proof search in a second-order logic programming language. Soundness and completeness of this reduction are established, yielding a finitary and executable verification procedure for robust security properties.
Authors:Raphaël de Fondeville
Abstract:
Absolute anonymization, conceived as an irreversible transformation that prevents re-identification and sensitive value disclosure, has proven to be a broken promise. Consequently, modern data protection must shift toward a privacy-utility trade-off grounded in risk mitigation. Differential Privacy (DP) offers a rigorous mathematical framework for balancing quantified disclosure risk with analytical usefulness. Nevertheless, widespread adoption remains limited, largely because effective translation of complex technical concepts, such as privacy-loss parameters, into forms meaningful to non-technical stakeholders has yet to be achieved. This difficulty arises from the inherent use of randomization: both legitimate analysts and potential adversaries must draw conclusions from uncertain observations rather than deterministic values. In this work, we propose a new interpretation of the privacy-utility trade-off based on hypothesis testing. This perspective explicitly accounts for the uncertainty introduced by randomized mechanisms in both membership inference scenarios and general data analysis. In particular, we introduce the concept of relative disclosure risk to quantify the maximum reduction in uncertainty an adversary can obtain from protected outputs, and we show that this measure is directly related to standard privacy-loss parameters. At the same time, we analyze how DP affects analytical validity by studying its impact on hypothesis tests commonly used to assess the statistical significance of empirical results. Finally, we provide practical guidance, accessible to non-experts, for navigating the privacy-utility trade-off, aiding in the selection of suitable protection mechanisms and the values for the privacy-loss parameters.
Authors:Sihao Ding
Abstract:
We introduce Colluding LoRA (CoLoRA), an attack in which each adapter appears benign and plausibly functional in isolation, yet their linear composition consistently compromises safety. Unlike attacks that depend on specific input triggers or prompt patterns, CoLoRA is a composition-triggered broad refusal suppression: once a particular set of adapters is loaded, the model undergoes effective alignment degradation, complying with harmful requests without requiring adversarial prompts or suffixes. This attack exploits the combinatorial blindness of current defense systems, where exhaustively scanning all compositions is computationally intractable. Across several open-weight LLMs, CoLoRA achieves benign behavior individually yet high attack success rate after composition, indicating that securing modular LLM supply-chains requires moving beyond single-module verification toward composition-aware defenses.
Authors:W. A. Susantha Wijesinghe
Abstract:
Lightweight block cipher design has largely focused on incremental optimization of established paradigms such as substitution--permutation networks, Feistel structures, and ARX constructions, where security derives from the algebraic complexity of individual components. We propose a different approach based on \emph{expander-graph interaction networks}, where diffusion and security arise from sparse structural connectivity rather than component sophistication. We present \textbf{ExpanderGraph-128 (EGC128)}, a 128-bit block cipher constructed as a 20-round balanced Feistel network. Each round applies a 64-bit nonlinear transformation governed by a 3-regular expander graph whose vertices execute identical 4-input Boolean functions on local neighborhoods. Security analysis combines MILP-based differential bounds, proven optimal through 10 rounds via SCIP, establishing 147.3-bit differential security and conservatively extrapolating to 413 bits for the full cipher. Linear analysis provides MILP bounds of $\geq 2^{145}$, while related-key evaluation shows no free rounds for any nonzero key difference. Additional tests confirm rapid algebraic degree growth and the absence of invariant affine subspaces. Implementation results demonstrate practical efficiency. FPGA synthesis on Xilinx Artix-7 achieves 261~Mbps at 100~MHz using only 380 LUTs, while ARM Cortex-M4F software requires 25.8~KB Flash and 1.66~ms per encryption. These results show that expander-graph-driven diffusion provides a promising design methodology for lightweight cryptography.
Authors:J Alex Corll
Abstract:
Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first screening layer, however, the requirements are different: the detector runs on every request and therefore must be fast, deterministic, non-promptable, and auditable. We introduce Mirror, a data-curation design pattern that organizes prompt injection corpora into matched positive and negative cells so that a classifier learns control-plane attack mechanics rather than incidental corpus shortcuts. Using 5,000 strictly curated open-source samples -- the largest corpus supportable under our public-data validity contract -- we define a 32-cell mirror topology, fill 31 of those cells with public data, train a sparse character n-gram linear SVM, compile its weights into a static Rust artifact, and obtain 95.97\% recall and 92.07\% F1 on a 524-case holdout at sub-millisecond latency with no external model runtime dependencies. On the same holdout, our next line of defense, a 22-million-parameter Prompt Guard~2 model reaches 44.35\% recall and 59.14\% F1 at 49\,ms median and 324\,ms p95 latency. Linear models still leave residual semantic ambiguities such as use-versus-mention for later pipeline layers, but within that scope our results show that for L1 prompt injection screening, strict data geometry can matter more than model scale.
Authors:Frank Li
Abstract:
Tool-augmented LLM agents introduce security risks that extend beyond user-input filtering, including indirect prompt injection through fetched content, unsafe tool execution, credential leakage, and tampering with local control files. We present OpenClaw PRISM, a zero-fork runtime security layer for OpenClaw-based agent gateways. PRISM combines an in-process plugin with optional sidecar services and distributes enforcement across ten lifecycle hooks spanning message ingress, prompt construction, tool execution, tool-result persistence, outbound messaging, sub-agent spawning, and gateway startup. Rather than introducing a novel detection model, PRISM integrates a hybrid heuristic-plus-LLM scanning pipeline, conversation- and session-scoped risk accumulation with TTL-based decay, policy-enforced controls over tools, paths, private networks, domain tiers, and outbound secret patterns, and a tamper-evident audit and operations plane with integrity verification and hot-reloadable policy management. We outline an evaluation methodology and benchmark pipeline for measuring security effectiveness, false positives, layer contribution, runtime overhead, and operational recoverability in an agent-runtime setting, and we report current preliminary benchmark results on curated same-slice experiments and operational microbenchmarks. The system targets deployable runtime defense for real agent gateways rather than benchmark-only detection.
Authors:Rickard Brännvall
Abstract:
Membership inference attacks (MIAs) are becoming standard tools for auditing the privacy of machine learning models. The leading attacks -- LiRA (Carlini et al., 2022) and RMIA (Zarifzadeh et al., 2024) -- appear to use distinct scoring strategies, while the recently proposed BASE (Lassila et al., 2025) was shown to be equivalent to RMIA, making it difficult for practitioners to choose among them. We show that all three are instances of a single exponential-family log-likelihood ratio framework, differing only in their distributional assumptions and the number of parameters estimated per data point. This unification reveals a hierarchy (BASE1-4) that connects RMIA and LiRA as endpoints of a spectrum of increasing model complexity. Within this framework, we identify variance estimation as the key bottleneck at small shadow-model budgets and propose BaVarIA, a Bayesian variance inference attack that replaces threshold-based parameter switching with conjugate normal-inverse-gamma priors. BaVarIA yields a Student-t predictive (BaVarIA-t) or a Gaussian with stabilized variance (BaVarIA-n), providing stable performance without additional hyperparameter tuning. Across 12 datasets and 7 shadow-model budgets, BaVarIA matches or improves upon LiRA and RMIA, with the largest gains in the practically important low-shadow-model and offline regimes.
Authors:Mingen Pan
Abstract:
This paper establishes the strict optimality in precision for frequency estimation under local differential privacy (LDP). We prove that a frequency estimator with a symmetric and extremal configuration, and a constant support size equal to an optimized value, is sufficient to achieve maximum precision. Furthermore, we derive that the communication cost of such an optimal estimator can be as low as $\log_2(\frac{d(d-1)}{2}+1)$, where $d$ denotes the dictionary size, and propose an algorithm to generate this optimal estimator. In addition, we introduce a modified Count-Mean Sketch and demonstrate that it is practically indistinguishable from theoretical optimality with a sufficiently large dictionary size (e.g., $d=100$ for a privacy factor of $ε= 1$). We compare existing methods with our proposed optimal estimator to provide selection guidelines for practical deployment. Finally, the performance of these estimators is evaluated experimentally, showing that the empirical results are consistent with our theoretical derivations.
Authors:Vipin Singh Sehrawat
Abstract:
The PRIM-LWE problem, introduced by Sehrawat, Yeo, and Desmedt (Theoretical Computer Science, 886 (2021)), is a variant of the Learning with Errors problem in which the secret matrix is required to have a primitive-root determinant. The dimension-uniform reduction constant is $c(p)=\inf_{n\ge 1}c_n(p)$, where $c_n(p)$ is the exact density of $n\times n$ matrices over $\mathbb{F}_p$ with primitive-root determinant. Sehrawat, Yeo, and Desmedt asked whether $\inf_{p\text{ prime}} c(p)=0$, observing that an affirmative answer would follow from the conjectural infinitude of primorial primes. We resolve this question unconditionally using only Dirichlet's theorem and Mertens' product formula, entirely bypassing the primorial-prime hypothesis. We further establish the sharp order \[ \min_{p\le x} c(p)\asymp \frac{1}{\log\log x} \qquad (x\to\infty), \] and show that the limiting distribution of $c(p)$ over the primes has support exactly $[0,1/2]$. We have not found this full-support statement in the literature. The law coincides with the classical shifted-prime distribution of $φ(p-1)/(p-1)$ via a transport lemma and is moreover continuous and purely singular. We also derive explicit lower bounds on $c(q)$ for primes of cryptographic interest, parameterized solely by the number of distinct prime factors of $q-1$. As a simple conservative explicit bound, for any prime $q>2^{30}$ the expected overhead $1/c(q)$ is at most $1.79\log q$. On the other hand, our results show that the worst-case overhead among primes $p\le x$ is of order $Θ(\log\log x)$, and in particular $1/c(q)=O(\log\log q)$ pointwise.
Authors:Jian Sheng Wang
Abstract:
Existing high performance blockchains verify one signature per transaction on the critical path, which creates O(N) verification cost, high hardware pressure, and difficult post quantum migration. This paper presents ACE Runtime, a ZKP native execution layer built on identity authorization separation. We replace per transaction signature checks with lightweight HMAC attestations in the hot path, then generate one aggregated zero knowledge finality certificate per block in an asynchronous prove stage. The system is organized as an Attest Execute Prove pipeline with two tier finality: soft finality from BFT voting and hard finality from proof verification. Under standard cryptographic assumptions, we provide formal arguments for attestation unforgeability and hard finality irreversibility. We also define a two phase timeout and backup proving path with witness availability gossip for liveness under builder failure. Quantitative results combine analytical modeling with reference implementation measurements. The prototype shows low CPU orchestration overhead, while model driven analysis projects constant per block verification cost, lower validator hardware requirements for non builders, and better bandwidth efficiency than per transaction signature designs. These results indicate that identity authorization separation is a practical architecture for sub second cryptographic finality with a clear path toward stronger post quantum components.
Authors:Christophe Parisel
Abstract:
Within the Strongly Connected Components (SCCs) formed during the temporal evolution of a Cloud permission graph, we use the Burau Lyapunov exponent LE as an algebraic probe to locate the boundary between two risks regimes. We prove that no Abelian statistic (edge counts, net privilege flow, gate-firing rates) can determine LE. The non-commutation advantage is small, but actionable: we show how to leverage it to discriminate the two outstanding risk regimes, that we call dispersed and focused, for automating classification and governing remediation of risky Cloud permission flows.
Authors:Fan Yang
Abstract:
The widespread adoption of thinking mode in large language models (LLMs) has significantly enhanced complex task processing capabilities while introducing new security risks. When subjected to jailbreak attacks, the step-by-step reasoning process may cause models to generate more detailed harmful content. We observe that thinking mode exhibits unique vulnerabilities when processing interleaved multiple tasks. Based on this observation, we propose multi-stream perturbation attack, which generates superimposed interference by interweaving multiple task streams within a single prompt. We design three perturbation strategies: multi-stream interleaving, inversion perturbation, and shape transformation, which disrupt the thinking process through concurrent task interleaving, character reversal, and format constraints respectively. On JailbreakBench, AdvBench, and HarmBench datasets, our method achieves attack success rates exceeding most methods across mainstream models including Qwen3 series, DeepSeek, Qwen3-Max, and Gemini 2.5 Flash. Experiments show thinking collapse rates and response repetition rates reach up to 17% and 60% respectively, indicating multi-stream perturbation not only bypasses safety mechanisms but also causes thinking process collapse or repetitive outputs.
Authors:Amir Al-Maamari
Abstract:
Large Language Models (LLMs) show promise for Automated Program Repair (APR), yet their effectiveness on security vulnerabilities remains poorly characterized. This study analyzes 319 LLM-generated security patchesacross 64 Java vulnerabilities from the Vul4J benchmark. Using tri-axis evaluation (compilation, security via PoV tests, functionality via test suites), the analysis reveals that only 24.8% of patches achieve full correctness, while 51.4% fail both security and functionality. The dominant failure mode is semantic misunderstanding: LLMs produce syntactically valid code but apply incorrect repair strategies. The proposed Security Repair Score (SRS) quantifies this gap, showing LLMs preserve functionality (mean 0.832) but struggle with security (mean 0.251). Vulnerability type strongly predicts difficulty, with fix rates ranging from 0% (input validation) to 45% (infinite loop). These findings demonstrate that LLM security patches require rigorous validation before deployment.
Authors:Abhinaba Basu
Abstract:
AI agents that execute tasks via tool calls frequently hallucinate results - fabricating tool executions, misstating output counts, or presenting inferences as facts. Recent approaches to verifiable AI inference rely on zero-knowledge proofs, which provide cryptographic guarantees but impose minutes of proving time per query, making them impractical for interactive agents. We propose NabaOS, a lightweight verification framework inspired by Indian epistemology (Nyaya Shastra), which classifies every claim in an LLM response by its epistemic source (pramana): direct tool output (pratyaksha), inference (anumana), external testimony (shabda), absence (abhava), or ungrounded opinion. Our runtime generates HMAC-signed tool execution receipts that the LLM cannot forge, then cross-references claims against these receipts to detect hallucinations in real time. We evaluate on NyayaVerifyBench, a new benchmark of 1,800 agent response scenarios across four languages with injected hallucinations of six types. NabaOS detects 94.2% of fabricated tool references, 87.6% of count misstatements, and 91.3% of false absence claims, with <15ms verification overhead per response. For deep delegation (agents performing multi-step web tasks), our cross-checking protocol catches 78.4% of URL fabrications via independent re-fetching. We compare against five approaches: zkLLM (cryptographic proofs, 180s/query), TOPLOC (locality-sensitive hashing), SPEX (sampling-based proof of execution), tensor commitments, and self-consistency checking. NabaOS achieves the best cost-latency-coverage trade-off for interactive agents: 94.2% coverage at <15ms versus zkLLM's near-perfect coverage at 180,000ms. For interactive agents, practical receipt-based verification provides better cost-benefit than cryptographic proofs, and epistemic classification gives users actionable trust signals rather than binary judgments.
Authors:Onur Günlü
Abstract:
We establish the randomized distributed function computation (RDFC) framework, in which a sender transmits just enough information for a receiver to generate a randomized function of the input data. Describing RDFC as a form of semantic communication, which can be essentially seen as a generalized remote-source-coding problem, we show that security and privacy constraints naturally fit this model, as they generally require a randomization step. Using strong coordination metrics, we ensure (local differential) privacy for every input sequence and prove that such guarantees can be met even when no common randomness is shared between the transmitter and receiver. This work provides lower bounds on Wyner's common information (WCI), which is the communication cost when common randomness is absent, and proposes numerical techniques to evaluate the other corner point of the RDFC rate region for continuous-alphabet random variables with unlimited shared randomness. Experiments illustrate that a sufficient amount of common randomness can reduce the semantic communication rate by up to two orders of magnitude compared to the WCI point, while RDFC without any shared randomness still outperforms lossless transmission by a large margin. A finite blocklength analysis further confirms that the privacy parameter gap between the asymptotic and non-asymptotic RDFC methods closes exponentially fast with input length. Our results position RDFC as an energy-efficient semantic communication strategy for privacy-aware distributed computation systems.
Authors:Tony Mason
Abstract:
System prompts for LLM-based coding agents are software artifacts that govern agent behavior, yet lack the testing infrastructure applied to conventional software. We present Arbiter, a framework combining formal evaluation rules with multi-model LLM scouring to detect interference patterns in system prompts. Applied to three major coding agent system prompts: Claude Code (Anthropic), Codex CLI (OpenAI), and Gemini CLI (Google), we identify 152 findings across the undirected scouring phase and 21 hand-labeled interference patterns in directed analysis of one vendor. We show that prompt architecture (monolithic, flat, modular) strongly correlates with observed failure class but not with severity, and that multi-model evaluation discovers categorically different vulnerability classes than single-model analysis. One scourer finding was structural data loss in Gemini CLI's memory system was consistent with an issue filed and patched by Google, which addressed the symptom without addressing the schema-level root cause identified by the scourer. Total cost of cross-vendor analysis: \$0.27 USD.
Authors:Jian Sheng Wang
Abstract:
In post-quantum blockchain settings, objects that require validity proofs (e.g., blob roots, execution-layer or consensus-layer signature aggregates) must be broadcast through mempool and relay networks. Recursive STARKs have been proposed to aggregate such proofs so that each node forwards one proof per tick plus objects without proofs, capping per-node proof bandwidth at roughly 128 KB degree per tick. We observe that propagation does not inherently require validity proofs on the path-only a lightweight assurance that an object is eligible for relay. We present AR-ACE (ACE-GF-based Attestation Relay for PQC), in which relay nodes forward objects plus compact attestations (e.g., identity-bound signatures or commitments) and do not generate, hold, or forward any full validity proof. Only the builder (or final verifier) performs a single aggregated validity proof over the set of objects it includes. This proof-off-path design removes proof overhead from the propagation path entirely, yielding an order-of-magnitude reduction in proof-related relay bandwidth relative to proof-carrying propagation. When instantiated with ACE-GF-derived attestation keys, AR-ACE preserves a unified identity story with on-chain authorization and is PQC-ready. We specify a protocol model, state design goals and security considerations, define security games, and provide a structural bandwidth comparison with recursive-STARK-based propagation.
Authors:Jian Sheng Wang
Abstract:
Post-quantum signature schemes introduce kilobyte-scale authorization artifacts when applied directly to blockchain transaction validation. A widely considered mitigation is to verify post-quantum signatures inside zero-knowledge circuits and publish only succinct proofs on-chain. However, this approach preserves the signature-centric authorization model, merely relocating the verification cost, and embeds expensive high-dimensional lattice arithmetic into prover circuits.We present ZK-ACE (Zero-Knowledge Authorization for Cryptographic Entities), an authorization layer that replaces transaction-carried signature objects entirely with identity-bound zero-knowledge authorization statements. Rather than proving the correctness of a specific post-quantum signature, the prover demonstrates in zero knowledge that a transaction is authorized by an identity consistent with an on-chain commitment and bound replay state. The construction assumes a deterministic identity derivation primitive (DIDP) as a black box and uses a compact identity commitment as the primary on-chain identity anchor, supplemented by per-transaction replay-prevention state. We formalize ZK-ACE with explicit game-based security definitions for authorization soundness, replay resistance, substitution resistance, and cross-domain separation. We present a complete circuit constraint specification, define two replay-prevention models, and provide reduction-based security proofs under standard assumptions (knowledge soundness, collision resistance, and DIDP identity-root recovery hardness). A structural, protocol-level data accounting demonstrates an order-of-magnitude reduction in consensus-visible authorization data relative to direct post-quantum signature deployment. The design supports batch aggregation and recursive proof composition, and is compatible with account-abstraction and rollup-based deployment architectures.
Authors:David Alejandro Trejo Pizzo
Abstract:
We present Lattice (L, ticker: LAT), a peer-to-peer electronic cash system designed as a post-quantum settlement layer for the era of quantum computing. Lattice combines three independent defense vectors: hardware resilience through RandomX CPU-only proof-of-work, network resilience through LWMA-1 per-block difficulty adjustment (mitigating the Flash Hash Rate vulnerability that affects fixed-interval retarget protocols), and cryptographic resilience through ML-DSA-44 post-quantum digital signatures (NIST FIPS 204, lattice-based), enforced exclusively from the genesis block with no classical signature fallback. The protocol uses a brief warm-up period of 5,670 fast blocks (53-second target, 25 LAT reduced reward) for network bootstrap, then transitions permanently to 240-second blocks, following a 295,000-block halving schedule with a perpetual tail emission floor of 0.15 LAT per block. Block weight capacity grows in stages (11M to 28M to 56M) as the network matures. The smallest unit of LAT is the shor, named after Peter Shor, where 1 LAT = 10^8 shors.
Authors:Jian Sheng Wang
Abstract:
Control of encrypted digital assets is traditionally equated with permanent possession of private keys, a model that precludes regulatory supervision, conditional delegation, and legally compliant transfer at the cryptographic layer. Existing remedies (multi-signature schemes, threshold signatures, smart contracts, custodial delegation) require persistent key exposure, on-chain state mutation, or trusted intermediaries. We introduce Condition-Triggered Dormant Authorization Paths (CT-DAP), a cryptographic asset control method built on destructible authorization factors and parameterized by a root-derivable framework satisfying deterministic key derivation, context-isolated capability generation, and authorization-bound revocation. Under CT-DAP, control rights are dormant authorization paths composed of user-held credentials and administrative factors held by independent custodians; a path remains cryptographically inactive until all factors are simultaneously available. Upon verification of predefined conditions (e.g., user consent, inheritance events, time-based triggers), the corresponding factor is released, activating the path. Revocation is achieved by destroying factors, rendering the path permanently unusable without altering the cryptographic root. We formalize the threat model, define security games for unauthorized control resistance, path isolation, and stateless revocation, and prove security under standard assumptions (AEAD security of AES-GCM-SIV, PRF security of HKDF, memory-hardness of Argon2id, collision resistance of SHA-256). We instantiate CT-DAP using the Atomic Cryptographic Entity Generative Framework (ACE-GF) and evaluate performance, demonstrating sub-second activation latency with configurable security-performance trade-offs.
Authors:Jonathan Shelby
Abstract:
The Cyber Security and Resilience (Network and Information Systems) Bill, introduced to Parliament in November 2025, represents the most significant reform of UK cyber security legislation in nearly a decade. This paper provides a comprehensive practitioner-oriented analysis of the Bill's provisions, their practical implications, and the steps organisations must take to achieve compliance. It examines the expanded regulatory scope covering managed service providers, data centres, and designated critical suppliers; the enhanced 24/72-hour incident reporting regime; the strengthened enforcement architecture including penalties of up to \pounds17 million or 4\% of worldwide turnover; and the Secretary of State's new executive powers. The paper compares the Bill with the EU's NIS2 Directive and DORA, proposing a practical dual-compliance framework for financial services firms. It explains how Zero Trust Architecture principles can serve as a foundation for meeting the Bill's requirements, and how the NCSC's Cyber Assessment Framework v4.0 provides the assurance pathway. Four detailed appendices provide entity-specific compliance roadmaps, worked case studies mapping real UK incidents to Bill provisions, sector-specific action plans for financial services, energy, health, and MSPs, and a complete gap analysis and self-assessment tool mapped to CAF v4.0 and the Bill's requirements.
Authors:Bo Jiang
Abstract:
Knowledge distillation from proprietary LLM APIs poses a growing threat to model providers, yet defenses against this attack remain fragmented and unevaluated. We present DistillGuard, a framework for systematically evaluating output-level defenses against LLM knowledge distillation. We introduce a taxonomy of three defense categories -- output perturbation, data poisoning, and information throttling -- and evaluate nine defense configurations using a standardized pipeline with Qwen3-14B as teacher and Qwen2.5-7B-Instruct as student across three benchmarks (MATH-500, HumanEval+, MT-Bench). Our results reveal that, in a same-family distillation setting against a naive attacker, most output-level defenses are surprisingly ineffective: paraphrasing-based perturbation barely degrades distilled student quality, and data poisoning primarily impairs conversational fluency while leaving task-specific capabilities intact. Only chain-of-thought removal substantially impairs mathematical reasoning (31.4\% vs.\ 67.8\% baseline), though code generation remains unaffected. These findings demonstrate that the effectiveness of distillation defenses is highly task-dependent and that current output-level approaches are insufficient to broadly prevent knowledge theft.
Authors:Rezvi Shahariar
Abstract:
Trust management is a critical research pillar in Vehicular Ad Hoc Networks (VANETs), where the reliability of shared data depends entirely on driver integrity. In these networks, a driver's reputation is dynamically constructed based on the veracity of their recent message history: consistent reliability builds trust, while frequent misinformation leads to exclusion. This study analyses driver announcement characteristics by modelling behavioural transitions-specifically the frequency and nature of shifts between "good" and "bad" states. To facilitate this analysis, three distinct Markov chain-based behavioural models are evaluated with varying degrees of granularity: a 4-state model, a 7-state model, and a high-resolution 11-state model. By simulating announcement and reporting patterns, each model's ability to reflect nuanced behavioural shifts is assessed. Our results confirm that increasing the number of trust states significantly enhances the system's ability to capture complex, dynamic driver behaviours, providing a more robust framework for security in VANETs.
Authors:Chong Guan
Abstract:
Proof-of-Work (PoW) is a fundamental method in decen- tralized digital networks for establishing consensus on a shared ledger. By requiring network participants to solve a mathematical puzzle, PoW maintains network integrity. However, PoW has raised environmental concerns due to its significant energy consumption. This paper introduces Proof-of-Encryption-Work (PoEW), a novel PoW consensus mechanism that repurposes computational power to address the challenge of encryption-based data compression. PoEW uses an ex- haustive key search as the PoW puzzle. Given a lengthy plaintext and a fixed ciphertext, the corresponding key is derived. Since the plain- text is much longer than both the key and the ciphertext, this process compresses the plaintext to the key. This data compression is computa- tionally intensive, while decompression is straightforward.
Authors:Yuxu Ge
Abstract:
Autonomous agents powered by large language models introduce a class of execution-layer vulnerabilities -- prompt injection, retrieval poisoning, and uncontrolled tool invocation -- that existing guardrails fail to address systematically. In this work, we propose the Layered Governance Architecture (LGA), a four-layer framework comprising execution sandboxing (L1), intent verification (L2), zero-trust inter-agent authorization (L3), and immutable audit logging (L4). To evaluate LGA, we construct a bilingual benchmark (Chinese original, English via machine translation) of 1,081 tool-call samples -- covering prompt injection, RAG poisoning, and malicious skill plugins -- and apply it to OpenClaw, a representative open-source agent framework. Experimental results on Layer 2 intent verification with four local LLM judges (Qwen3.5-4B, Llama-3.1-8B, Qwen3.5-9B, Qwen2.5-14B) and one cloud judge (GPT-4o-mini) show that all five LLM judges intercept 93.0-98.5% of TC1/TC2 malicious tool calls, while lightweight NLI baselines remain below 10%. TC3 (malicious skill plugins) proves harder at 75-94% IR among judges with meaningful precision-recall balance, motivating complementary enforcement at Layers 1 and 3. Qwen2.5-14B achieves the best local balance (98% IR, approximately 10-20% FPR); a two-stage cascade (Qwen3.5-9B->GPT-4o-mini) achieves 91.9-92.6% IR with 1.9-6.7% FPR; a fully local cascade (Qwen3.5-9B->Qwen2.5-14B) achieves 94.7-95.6% IR with 6.0-9.7% FPR for data-sovereign deployments. An end-to-end pipeline evaluation (n=100) demonstrates that all four layers operate in concert with 96% IR and a total P50 latency of approximately 980 ms, of which the non-judge layers contribute only approximately 18 ms. Generalization to the external InjecAgent benchmark yields 99-100% interception, confirming robustness beyond our synthetic data.
Authors:Chinecherem Dimobi
Abstract:
Searchable Symmetric Encryption (SSE) allows users to search over encrypted data stored on untrusted servers, like cloud providers. While SSE hides the content of queries and documents, it still leaks patterns, such as how often a query is made. These leakages have been shown to enable leakage abuse attacks, but recent defenses have made such attacks harder to carry out. In this work, we explore how system-level monitoring using eBPF (Extended Berkeley Packet Filter) can be used to uncover new forms of leakage that go beyond what is typically captured in SSE threat models. By observing low-level system behavior during search operations, we show that an attacker can gain additional insights into query behavior, document access, and processing flow. We define a new leakage pattern based on these observations and demonstrate how they can strengthen existing attacks. Our findings suggest that system-level leakages present a practical threat to SSE deployments and must be considered when designing defenses. This work serves as a step toward bridging the gap between theoretical SSE security and the realities of system-level exposure.
Authors:Wenyang Jia
Abstract:
The stability of Internet services is persistently challenged by the escalating scale of volumetric TCP SYN floods, as conventional defenses like SYN Cookies fail by exacerbating bandwidth depletion under modern attacks. This paper introduces SDN-SYN PoW, a novel defense architecture that synergizes non-interactive Proof-of-Work with a Software-Defined Networking (SDN) control plane, an approach particularly effective for securing the network edge in modern SD-WAN deployments. The core innovation is its ability to perform global network sensing; the SDN controller monitors real-time traffic to dynamically adjust PoW difficulty, transforming the defense from a static mechanism into an intelligent, adaptive system that surgically applies computational costs only to anomalous sources. Through rigorous experiments on a custom-built testbed, we demonstrate that SDN-SYN PoW provides substantially superior protection and, critically, that the PoW overhead remains negligible for legitimate clients, ensuring compatibility even with low-power devices.
Authors:Christoph F. Strnadl
Abstract:
We define a method how digital ecosystems (including data spaces) may autonomously define and "advertise" credentials they issue or they trust in the form of so-called ecosystem trust profiles. An ecosystem trust profile collects all (verifiable) credentials and issuers sorted by trust scope accepted ("trusted") by a particular ecosystem. We then show how a minimal trust relation between ecosystems may be defined using ecosystem trust frameworks of different ecosystems and explore a few of its properties. A first application of the theory is given for a use case in the manufacturing realm where different international ecosystems need to agree on certain credentials for various scopes of trust such as identity, service compliance, and other conformance standards. We implement this requirement by identifying and discussing two different definitions of credential equivalence for a given trust scope, one requiring additional cross-ecosystem governance or coordination, one not. The second approach demonstrates how to solve the so-called cross-ecosystem trust dilemma, that is, the problem how ecosystems can establish cross-ecosystem trust while, at the same time, allowing them to fully retain their sovereignty. A fragility theorem demonstrates that this sovereignty leads trust to be unstable without any additional coordination or governance mechanisms on top of (and outside to) ecosystem trust profiles. We extend our method to data spaces in particular and propose a novel rigorous definition of cross-data space interoperability. This allows us to prove the proposition that the extent of interoperability between two data spaces is exactly determined by the amount of commonality in their respective ecosystem trust profiles.
Authors:Nicolas Ruiz
Abstract:
Randomized response is a popular local anonymization approach that can deliver anonymized multi-dimensional data sets with rigorous privacy guarantees. At the same time, it can ensure validity for exploratory analysis and machine learning tasks as, under fairly general conditions, unbiased estimates of the underlying true distributions can be retrieved. However, and like for many other anonymization techniques, one of the main pitfalls of this approach is the curse of dimensionality. When coping with data sets with many attributes, one quickly runs into unsustainable computational costs for estimating true distributions, as well as a degradation in their accuracies. Relying on new theoretical insights developed in this paper, we propose an approach to multi-dimensional randomized response that avoids these traditional limitations. From simple yet intuitive parameterizations of the randomization matrices that we introduce, we develop a protocol called Lambda-randomization that entails low computational costs to retrieve estimates of multivariate distributions, and that makes use of solely three simple elements: a set of parameters ranging between 0 and 1 (one per attribute of the data set), the identity matrix, and the all-ones vector. We also present an empirical application to illustrate the proposed protocol.
Authors:Mohammad Alikhani
Abstract:
Cyber-security systems often operate in resource-constrained environments, such as edge environments and real-time monitoring systems, where model size and inference time are crucial. A light-weight intrusion detection framework is proposed that utilizes the Kolmogorov-Arnold Network (KAN) to capture complex features in the data, with the efficiency of decoupled knowledge distillation (DKD) training approach. A high-capacity KAN network is first trained to detect attacks performed on the test bed. This model then serves as a teacher to guide a much smaller multilayer perceptron (MLP) student model via DKD. The resulting DKD-MLP model contains only 2,522 and 1,622 parameters for WADI and SWaT datasets, which are significantly smaller than the number of parameters of the KAN teacher model. This is highly appropriate for deployment in resource-constrained devices with limited computational resources. Despite its low size, the student model maintains a high performance. Our approach demonstrate the practicality of using KAN as a knowledge-rich teacher to train much smaller student models, without considerable drop in accuracy in intrusion detection frameworks. We have validated our approach on two publicly available datasets. We report F1-score improvements of 4.18% on WADI and 3.07% on SWaT when using the DKD-MLP model, compared to the bare student model. The implementation of this paper is available on our GitHub repository.
Authors:Edouard Lansiaux
Abstract:
Federated Learning (FL) enables collaborative training of medical AI models across hospitals without centralizing patient data. However, the exchange of model updates exposes critical vulnerabilities: gradient inversion attacks can reconstruct patient information, Byzantine clients can poison the global model, and the \emph{Harvest Now, Decrypt Later} (HNDL) threat renders today's encrypted traffic vulnerable to future quantum adversaries.We introduce \textbf{ZKFL-PQ} (\emph{Zero-Knowledge Federated Learning, Post-Quantum}), a three-tiered cryptographic protocol that hybridizes (i) ML-KEM (FIPS~203) for quantum-resistant key encapsulation, (ii) lattice-based Zero-Knowledge Proofs for verifiable \emph{norm-constrained} gradient integrity, and (iii) BFV homomorphic encryption for privacy-preserving aggregation. We formalize the security model and prove correctness and zero-knowledge properties under the Module-LWE, Ring-LWE, and SIS assumptions \emph{in the classical random oracle model}. We evaluate ZKFL-PQ on synthetic medical imaging data across 5 federated clients over 10 training rounds. Our protocol achieves \textbf{100\% rejection of norm-violating updates} while maintaining model accuracy at 100\%, compared to a catastrophic drop to 23\% under standard FL. The computational overhead (factor $\sim$20$\times$) is analyzed and shown to be compatible with clinical research workflows operating on daily or weekly training cycles. We emphasize that the current defense guarantees rejection of large-norm malicious updates; robustness against subtle low-norm or directional poisoning remains future work.
Authors:Abel C. H. Chen
Abstract:
As V2X (Vehicle-to-Everything) technology becomes increasingly prevalent, the security of V2X networks has garnered growing attention worldwide. In North America, the IEEE 1609 series standards are primarily used, while Europe adopts the ETSI series standards, and China has also established its industry standard, YD/T 3957-2021, among others. Although these standards share some commonalities, they also exhibit differences. To achieve compatibility across these standards, analyzing their similarities and differences is a crucial issue. Therefore, this study focuses on analyzing the three major standards mentioned above, discussing aspects such as certificate formats, signed message formats, and certificate request processes. Additionally, this research evaluates the efficiency of different cryptographic methods, including NIST P-256 and SM2-256, SHA-256 and SM3-256, as well as AES-128 and SM4-128. Finally, the study implements these three major standards on V2X devices and compares the efficiency of message signing and signature verification in V2X systems, providing a reference for the development of a secure certificate management system for V2X networks.
Authors:Jian Sheng Wang
Abstract:
Serverless wallet recovery must balance portability, usability, and privacy. Public registries enable decentralized lookup but naive identifier hashing leaks membership through enumeration. We present VA-DAR, a keyed-discovery protocol for ACE-GF-based wallets that use device-bound passkeys for day-to-day local unlock while supporting cross-device recovery using only a user-provided identifier (e.g., email) and a single recovery passphrase. As a discovery-and-recovery layer over ACE-GF, VA-DAR inherits ACE-GF's context-isolated, algorithm-agile derivation substrate, enabling non-disruptive migration to post-quantum algorithms at the identity layer. The design introduces a decentralized discovery-and-recovery layer that maps a privacy-preserving discovery identifier to an immutable content identifier of a backup sealed artifact stored on a decentralized storage network. Concretely, a user derives passphrase-rooted key material with a memory-hard KDF, domain-separates keys for artifact sealing and discovery indexing, and publishes a registry record keyed by a passphrase-derived discovery identifier. VA-DAR provides: (i) practical cross-device recovery using only identifier and passphrase, (ii) computational resistance to public-directory enumeration, (iii) integrity of discovery mappings via owner authorization, and (iv) rollback/tamper detection via monotonic versioning and artifact commitments. We define three sealed artifact roles, two update-authorization options, and three protocol flows (registration, recovery, update). We formalize security goals via cryptographic games and prove, under standard assumptions, that VA-DAR meets these goals while remaining vendor-agnostic and chain-agnostic. End-to-end post-quantum deployment additionally requires a PQ-secure instantiation of registry authorization.
Authors:Ramanpreet Singh Khinda
Abstract:
A single authentication bypass in a partner SDK grants attackers the identity of every partner in the ecosystem -- and millions of apps use SDKs with exactly this vulnerability. OWASP's 2024 Mobile Top 10 ranks Inadequate Supply Chain Security as the second most critical mobile risk, explicitly identifying third-party SDKs as a primary attack vector. Cross-app mobile SDKs -- where a partner application communicates with a platform provider's application via inter-process communication (IPC) -- mediate sensitive operations such as content publishing, payment initiation, and identity federation. Unlike embedded libraries that execute within a single app's process, cross-app SDKs require the provider's service to authenticate the calling application at runtime. A pattern sometimes used for this authentication relies on PendingIntent.getCreatorPackage() to verify sender identity. We demonstrate that this mechanism exhibits a fundamental provenance confusion vulnerability: a PendingIntent reliably identifies who created it but cannot attest who presents it -- and this distinction is fatal for authentication. An attacker app with notification access can steal a legitimate partner's PendingIntent via NotificationListenerService and replay it to impersonate that partner, bypassing authentication entirely. The attack succeeds against both mutable and immutable PendingIntents because immutability protects the token's contents, not its provenance. We systematically evaluate eight Android IPC authentication mechanisms against an SDK-specific threat model and present a defense architecture combining Bound Service IPC with kernel-level caller verification via Binder.getCallingUid(), supplemented by server-side certificate-hash validation. This provides authentication guarantees while remaining scalable across partner ecosystems.
Authors:Artur Pericles L. Monteiro
Abstract:
This article argues that security is not enough to fully capture what is at stake in government exceptional access to encrypted data. A conception of privacy as security has little to say about ``lawful-surveillance protocols'' -- an active research agenda in cryptography that aims to enable government exceptional access without compromising systemic security. But the limitations are not contingent on the success of this agenda. The normative landscape today cannot be explained if security is all there is to privacy. And fundamental objections to Apple's abandoned client-side scanning system gesture beyond security. This article's contribution is modest: to show that there must be more to privacy than the security mold it has taken. A richer understanding is needed both to assess policy and to guide research on lawful-surveillance protocols.
Authors:Om Tailor
Abstract:
Colluding language-model agents can hide coordination in messages that remain policy-compliant at the surface level. We present CLBC, a protocol where generation and admission are separated: a message is admitted to transcript state only if a small verifier accepts a proof-bound envelope under a pinned predicate $Π$. The predicate binds policy hash, public randomness schedule, transcript chaining, latent schema constraints, canonical metadata/tool fields, and deterministic rejection codes. We show how this protocol yields an upper bound on transcript leakage in terms of latent leakage plus explicit residual channels, derive adaptive composition guarantees, and state a semantic lower bound when policy-valid alternatives remain choosable. We report extensive empirically grounded evidence: aggregate evaluation satisfies all prespecified thresholds; strict lane decoder advantage is bounded at 0.0000 with MI proxy 0.0636; adaptive-colluder stress tests remain below attacker thresholds; and baseline separation shows large gaps between reject-by-default semantics and audit-only controls. We further quantify operational tradeoffs. Strict full-proof mode has median turn latency 27.53s (p95 28.08s), while sampled proving reduces non-proved-turn latency to 0.327ms. The central finding is that bottlenecks alone are insufficient: security claims depend on verifiable admission semantics that are online, deterministic, and fail-closed.
Authors:Jian Sheng Wang
Abstract:
As AI agents increasingly perform economic tasks on behalf of humans, a fundamental tension arises between agent autonomy and human control over financial assets. We present the Agent Economic Sovereignty Protocol (AESP), a layered protocol in which agents transact autonomously at machine speed on crypto-native infrastructure while remaining cryptographically bound to human-defined governance boundaries. AESP enforces the invariant that agents are economically capable but never economically sovereign through five mechanisms: (1) a deterministic eight-check policy engine with tiered escalation; (2) human-in-the-loop review with automatic, explicit, and biometric tiers; (3) EIP-712 dual-signed commitments with escrow; (4) HKDF-based context-isolated privacy with batched consolidation; and (5) an ACE-GF-based cryptographic substrate. We formalize two testable hypotheses on security coverage and latency overhead, and specify a complete evaluation methodology with baselines and ablation design. The protocol is implemented as an open-source TypeScript SDK (208 tests, ten modules) with interoperability via MCP and A2A.
Authors:Srikumar Nayak
Abstract:
Financial systems run nonstop and must stay reliable even during cyber incidents. Modern attacks move across many services (apps, APIs, identity, payment rails), so defenders must make a sequence of actions under time pressure. Most security tools still use fixed rules or static playbooks, which can be slow to adapt when the attacker changes behavior. Reinforcement learning (RL) is a good fit for sequential decisions, but much of the RL-in-finance literature targets trading and does not model real cyber response limits such as action cost, service disruption, and defender coordination across many assets. This paper proposes RLShield, a practical multi-agent RL pipeline for financial cyber defense. We model the enterprise attack surface as a Markov decision process (MDP) where states summarize alerts, asset exposure, and service health, and actions represent real response steps (e.g., isolate a host, rotate credentials, ratelimit an API, block an account, or trigger recovery). RLShield learns coordinated policies across multiple agents (assets or service groups) and optimizes a risk-sensitive objective that balances containment speed, business disruption, and response cost. We also include a game-aware evaluation that tests policies against adaptive attackers and reports operational outcomes, not only reward. Experiments show that RLShield reduces time-to-containment and residual exposure while keeping disruption within a fixed response budget, outperforming static rule baselines and single-agent RL under the same constraints. These results suggest that multi-agent, cost-aware RL can provide a deployable layer for automated response in financial security operations.
Authors:Srikumar Nayak
Abstract:
Intrusion detection in IoT and industrial networks requires models that can detect rare attacks at low false-positive rates while remaining reliable under evolving traffic and limited labels. Existing IDS solutions often report strong in-distribution accuracy, but they may degrade when evaluated on future traffic, unseen (zero-day) attack families, or adversarial feature manipulations, and many systems provide limited evidence to support analyst triage. To address these gaps, we propose ThreatFormer- IDS, a Transformer-based sequence modeling framework that converts flow records into time-ordered windows and learns contextual representations for robust intrusion screening. The method combines (i) weighted supervised learning for imbalanced detection, (ii) masked self-supervised learning to improve representation stability under drift and sparse labels, (iii) PGDbased adversarial training with scale-normalized perturbations to strengthen resilience against feature-level evasion, and (iv) Integrated Gradients attribution to highlight influential time steps and features for each alert. On the ToN IoT benchmark with chronological evaluation, ThreatFormer-IDS achieves AUCROC 0.994, AUC-PR 0.956, and Recall@1%FPR 0.910, outperforming strong tree-based and sequence baselines. Under a zero-day protocol with held-out attack families, it maintains superior generalization (AUC-PR 0.721, Recall@1%FPR 0.783). Robustness tests further show slower degradation in AUCPR as the adversarial budget increases, confirming improved stability under bounded perturbations. Overall, ThreatFormer- IDS provides a unified, deployment-oriented IDS pipeline that balances detection quality, zero-day behavior, robustness, and explainability.
Authors:David Condrey
Abstract:
Process attestation verifies human authorship by collecting behavioral biometric evidence, including keystroke dynamics, typing patterns, and editing behavior, during the creative process. However, the very data needed to prove authenticity can reveal intimate details about an author's cognitive state, health conditions, and identity, constituting sensitive biometric data under GDPR Article 9. We resolve this privacy-attestation paradox using zero-knowledge proofs. We present ZK-PoP, a construction that allows a verifier to confirm that (a) sequential work function chains were computed correctly, (b) behavioral feature vectors fall within human population distributions, and (c) content evolution is consistent with incremental human editing, all without learning the underlying behavioral data, exact timing, or intermediate content. Our construction uses Groth16 proofs over arithmetic circuits with Pedersen commitments and Bulletproof range proofs. We prove that ZK-PoP is computationally zero-knowledge, computationally sound, and achieves unlinkability across sessions. Evaluation shows proof generation in under 30 seconds for a 1-hour writing session, with 192-byte proofs verifiable in 8.2 ms, while incurring less than 5% accuracy loss in simulation at practical privacy levels (epsilon >= 1.0) compared to non-private baselines.
Authors:David Condrey
Abstract:
Process attestation systems verify that a continuous physical process, such as human authorship, actually occurred, rather than merely checking system state. These systems face a fundamental dependability challenge: the evidence collection infrastructure must remain available and tamper-resistant even when the attesting party controls the platform. Trusted Execution Environments (TEEs) provide hardware-enforced isolation that can address this challenge, but their integration with continuous process attestation introduces novel resilience requirements not addressed by existing frameworks. We present the first architecture for continuous process attestation evidence collection inside TEEs, providing hardware-backed tamper resistance against trust-inverted adversaries with graduated input assurance from software-channel integrity (Tier 1) through hardware-bound input (Tier 3). We develop a Markov-chain dependability model quantifying Evidence Chain Availability (ECA), Mean Time Between Evidence Gaps (MTBEG), and Recovery Time Objectives (RTO). We introduce a resilient evidence chain protocol maintaining chain integrity across TEE crashes, network partitions, and enclave migration. Our security analysis derives formal bounds under combined threat models including trust inversion and TEE side channels, parameterized by a conjectural side-channel leakage bound esc that requires empirical validation. Evaluation on Intel SGX demonstrates under 25% per-checkpoint CPU overhead (<0.3% of the 30 s checkpoint interval), >99.5% Evidence Chain Availability (ECA) (the fraction of session time with active evidence collection) in Monte Carlo simulation under Poisson failure models, and sealed-state recovery under 200 ms.
Authors:David Condrey
Abstract:
The proliferation of AI-generated text has intensified the need for reliable authorship verification, yet current output-based methods are increasingly unreliable. We observe that the ordinary typing interface captures rich cognitive signatures, measurable patterns in keystroke timing that reflect the planning, translating, and revising stages of genuine composition. Drawing on large-scale keystroke datasets comprising over 136 million events, we define the Cognitive Load Correlation (CLC) and show it distinguishes genuine composition from mechanical transcription. We present a non-intrusive verification framework that operates within existing writing interfaces, collecting only timing metadata to preserve privacy. Our analytical evaluation estimates 85 to 95 percent discrimination accuracy under stated assumptions, while limiting biometric leakage via evidence quantization. We analyze the adversarial robustness of cognitive signatures, showing they resist timing-forgery attacks that defeat motor-level authentication because the cognitive channel is entangled with semantic content. We conclude that reframing authorship verification as a human-computer interaction problem provides a privacy-preserving alternative to invasive surveillance.
Authors:Nicolas Constantinides
Abstract:
Tor onion services rely on long-lived introduction circuits to support anonymous rendezvous between clients and services. Although Tor includes some defenses against traffic analysis, the introduction protocol retains deterministic routing structure that can be leveraged by an adversary. We describe a practical intersection attack on Tor introduction circuits that can, over repeated interactions, identify each hop from the introduction point toward the onion service while requiring observation at only one relay per stage. The attack issues repeated probes and intersects destination IP address sets observed within narrowly defined \texttt{INTRODUCE1}--\texttt{RENDEZVOUS2} time windows, without assuming global visibility or access to packet payloads. We evaluate feasibility with live-network experiments using a self-operated onion service and relays, and we follow data-minimization and ethical safeguards throughout. The results show reliable convergence in practice, with the rate affected by consensus weight, and time-varying background traffic. We also assess practicality under a partial-global adversary model and discuss implications in light of the geographic concentration of Tor relay weight across cooperating jurisdictions.
Authors:Majid Khabbazian
Abstract:
Multi-scalar multiplication (MSM), defined as MSM(P, x) = sum_{i=1}^n x_i P_i, is a dominant computational kernel in discrete-logarithm-based cryptography and often becomes a bottleneck for verifiers and other resource-constrained clients. We present 2G2T, a simple protocol for verifiably outsourcing MSM to an untrusted server. After a one-time keyed setup for fixed bases P = (P1, ..., Pn) that produces a public merged-bases vector T and client secret state, the server answers each query x = (x1, ..., xn) with only two group elements: A claimed to equal MSM(P, x) and an auxiliary value B claimed to equal MSM(T, x). Verification requires a single length-n field inner product and a constant number of group operations (two scalar multiplications and one addition), while the server performs two MSMs. In our Ristretto255 implementation, verification is up to ~300x faster than computing the MSM locally using a highly optimized MSM routine for n up to 2^18, and the server-to-client response is constant-size (two compressed group elements, 64 bytes on Ristretto255). Despite its simplicity and efficiency, 2G2T achieves statistical soundness: for any (even computationally unbounded) adversarial server, the probability of accepting an incorrect result is at most 1/q per query, and at most e/q over e adaptive executions, in a prime-order group of size q.
Authors:Mohammad Sabouri
Abstract:
Teleoperated quadruped robots are increasingly deployed in safety-critical missions -- industrial inspection, military reconnaissance, and emergency response -- yet the security of their communication and control infrastructure remains insufficiently characterized. Quadrupeds present distinct security challenges arising from dynamic stability constraints, gait-dependent vulnerability windows, substantial kinetic energy, and elevated operator cognitive load. This survey synthesizes peer-reviewed literature and vulnerability disclosures (2019--2025) to provide comprehensive analysis of cybersecurity threats, consequences, and countermeasures for teleoperated quadruped systems. We contribute: (i) a six-layer attack taxonomy spanning perception manipulation, VR/AR operator targeting, communication disruption, control signal attacks, localization spoofing, and network intrusion; (ii) systematic attack-to-consequence mapping with timing characterization; (iii) Technology Readiness Level classification exposing critical maturity gaps between field-deployed communication protections (TRL 7--9) and experimental perception/operator-layer defenses (TRL 3--5); (iv) comparative security analysis of six commercial platforms; (v) pragmatic deployment guidance stratified by implementation timeline; and (vi) eight prioritized research gaps with implementation roadmaps. Limitations: Platform assessments rely on publicly available information. Attack success rates derive from cited studies under controlled conditions and require domain-specific validation.
Authors:Nelly Elsayed
Abstract:
Distributed denial-of-service (DDoS) attacks threaten the availability of Internet of Things (IoT) infrastructures, particularly under resource-constrained deployment conditions. Although transfer learning models have shown promising detection accuracy, their reliability, computational feasibility, and interpretability in operational environments remain insufficiently explored. This study presents an explainability-aware empirical evaluation of seven pre-trained convolutional neural network architectures for multi-class IoT DDoS detection using the CICDDoS2019 dataset and an image-based traffic representation. The analysis integrates performance metrics, reliability-oriented statistics (MCC, Youden Index, confidence intervals), latency and training cost assessment, and interpretability evaluation using Grad-CAM and SHAP. Results indicate that DenseNet and MobileNet-based architectures achieve strong detection performance while demonstrating superior reliability and compact, class-consistent attribution patterns. DenseNet169 offers the strongest reliability and interpretability alignment, whereas MobileNetV3 provides an effective latency-accuracy trade-off for fog-level deployment. The findings emphasize the importance of combining performance, reliability, and explainability criteria when selecting deep learning models for IoT DDoS detection.
Authors:Balazs Pejo
Abstract:
Federated learning offers a privacy-friendly collaborative learning framework, yet its success, like any joint venture, hinges on the contributions of its participants. Existing client evaluation methods predominantly focus on model performance, such as accuracy or loss, which represents only one dimension of a machine learning model's overall utility. In contrast, this work investigates the critical, yet overlooked, issue of client contributions towards a model's trustworthiness -- specifically, its reliability (tolerance to noisy data), resilience (resistance to adversarial examples), and fairness (measured via demographic parity). To quantify these multifaceted contributions, we employ the state-of-the-art approximation of the Shapley value, a principled method for value attribution. Our results reveal that no single client excels across all dimensions, which are largely independent from each other, highlighting a critical flaw in current evaluation scheme: no single metric is adequate for comprehensive evaluation and equitable rewarding allocation.
Authors:Refat Othman
Abstract:
Modern infrastructures rely on software systems that remain vulnerable to cyberattacks. These attacks frequently exploit vulnerabilities documented in repositories such as MITRE's Common Vulnerabilities and Exposures (CVE). However, Cyber Threat Intelligence resources, including MITRE ATT&CK and CVE, provide only partial coverage of attack-vulnerability relationships. Attack information often appears before vulnerabilities are formally linked, creating the need for automated methods that infer likely vulnerabilities directly from attack descriptions. This thesis addresses the problem of predicting known vulnerabilities from natural-language descriptions of cyberattacks. We develop transformer-based sentence embedding methods that encode attack and vulnerability descriptions into semantic vector representations, enabling similarity-based ranking and recommendation. Fourteen state-of-the-art transformer models were evaluated across four attack description types (Tactic, Technique, Procedure, and Attack Pattern). Results show that Technique descriptions in MITRE ATT&CK provide the strongest predictive signal. The multi-qa-mpnet-base-dot-v1 (MMPNet) model achieved the best performance due to its hybrid pre-training and optimization for semantic similarity. The approach was implemented in the VULDAT tool, which automatically links attacks to vulnerabilities. Manual validation revealed previously undocumented relationships in MITRE repositories. Evaluation on unseen cyberattack reports demonstrates that the models generalize beyond curated datasets and support proactive vulnerability awareness.
Authors:Harrison Dahme
Abstract:
Training-data poisoning attacks can induce targeted, undetectable failure in deep neural networks by corrupting a vanishingly small fraction of training labels. We demonstrate this on acoustic vehicle classification using the MELAUDIS urban intersection dataset (approx. 9,600 audio clips, 6 classes): a compact 2-D convolutional neural network (CNN) trained on log-mel spectrograms achieves 95.7% Attack Success Rate (ASR) -- the fraction of target-class test samples misclassified under the attack -- on a Truck-to-Car label-flipping attack at just p=0.5% corruption (48 records), with zero detectable change in aggregate accuracy (87.6% baseline; 95% CI: 88-100%, n=3 seeds). We prove this stealth is structural: the maximum accuracy drop from a complete targeted attack is bounded above by the minority class fraction (beta). For real-world class imbalances (Truck approx. 3%), this bound falls below training-run noise, making aggregate accuracy monitoring provably insufficient regardless of architecture or attack method. A companion backdoor trigger attack reveals a novel trigger-dominance collapse: when the target class is a dataset minority, the spectrogram patch trigger becomes functionally redundant--clean ASR equals triggered ASR, and the attack degenerates to pure label flipping. We formalize the ML training pipeline as an attack surface and propose a trust-minimized defense combining content-addressed artifact hashing, Merkle-tree dataset commitment, and post-quantum digital signatures (ML-DSA-65/CRYSTALS-Dilithium3, NIST FIPS 204) for cryptographically verifiable data provenance.
Authors:Ashim Mahara
Abstract:
Alpha-Root is a cybersecurity-focused dataset collected in a single shot from the Common Crawl web graph using community detection. Unlike iterative content-scoring approaches like DeepSeekMath, we mine quality domains directly from the web graph, starting from just 20 trusted seed domains.
Authors:Vipin Singh Sehrawat
Abstract:
Distributed Key Generation (DKG) lets parties derive a common public key while keeping the signing key secret-shared. UC-secure DKG requires a verifiable-sharing enforcement layer -- classically satisfied via Verifiable Secret Sharing (VSS) and/or commitment-and-proof mechanisms -- for secrecy, uniqueness, and affine consistency. We target the Non-eXportable Key (NXK) setting enforced by hardware-backed key-isolation modules (e.g., TEEs, HSM-like APIs), formalized via an ideal KeyBox (keystore) functionality $\mathcal{F}_{KeyBox}$ that keeps shares non-exportable and permits only attested KeyBox-to-KeyBox sealing. With confidentiality delegated to the NXK boundary, the remaining challenge is enforcing transcript-defined affine consistency without exporting or resharing shares. State continuity rules out rewinding-based extraction, mandating straight-line techniques. We combine (i) KeyBox confidentiality; (ii) Unique Structure Verification (USV), a publicly verifiable certificate whose certified scalar never leaves the KeyBox yet whose public group element is transcript-derivable; and (iii) Fischlin-based UC-extractable NIZK arguments of knowledge in a gRO-CRP (global Random Oracle with Context-Restricted Programmability) model. We construct Star DKG (SDKG), a UC-secure scheme for multi-device threshold wallets where a designated service must co-sign but cannot sign alone, realizing a 1+1-out-of-$n$ star access structure (center plus any leaf) over roles (primary vs. recovery) with role-based device registration. In the $\mathcal{F}_{KeyBox}$-hybrid and gRO-CRP models, under DL and DDH assumptions with adaptive corruptions and secure erasures, SDKG UC-realizes a transcript-driven refinement of the standard UC-DKG functionality. Over a prime-order group of size $p$, SDKG incurs $\widetilde{O}(n\log p)$ communication overhead and $\widetilde{O}(n\log^{2.585}p)$ bit-operation cost.
Authors:Rahul D Ray
Abstract:
Security monitoring systems typically treat anomaly detection as identifying statistical deviations from observed data distributions. In cryptographic traffic analysis, however, violations are defined not by rarity but by explicit policy constraints, including key reuse prohibition, downgrade prevention, and bounded key lifetimes. This fundamental mismatch limits the interpretability and adaptability of conventional anomaly detection methods. We introduce INTACT (INTent-Aware Cryptographic Traffic), a policy-conditioned framework that reformulates violation detection as conditional constraint learning. Instead of learning a static decision boundary over behavioral features, INTACT models the probability of violation conditioned on both observed behavior and declared security intent. The architecture factorizes representation learning into behavioral and intent encoders whose fused embeddings produce a violation score, yielding a policy-parameterized family of decision boundaries. We evaluate the framework on a real-world network flow dataset and a 210,000-trace synthetic multi-intent cryptographic dataset. INTACT matches or exceeds strong unsupervised and supervised baselines, achieving near-perfect discrimination (AUROC up to 1.0000) in the real dataset and consistent superiority in detecting relational and composite violations in the synthetic setting. These results demonstrate that explicit intent conditioning improves discrimination, interpretability, and robustness in cryptographic monitoring.
Authors:Pulak Mehta
Abstract:
Autonomous AI agents can now programmatically hire human workers through marketplaces using REST APIs and Model Context Protocol (MCP) integrations. This creates an attack surface analogous to CAPTCHA-solving services but with physical-world reach. We present an empirical measurement study of this threat, analyzing 303 bounties from RENTAHUMAN.AI, a marketplace where agents post tasks and manage escrow payments. We find that 99 bounties (32.7%), originate from programmatic channels (API keys or MCP). Using a dual-coder methodology (\k{appa} = 0.86 ), we identify six active abuse classes: credential fraud, identity impersonation, automated reconnaissance, social media manipulation, authentication circumvention, and referral fraud, all purchasable for a median of $25 per worker. A retrospective evaluation of seven content-screening rules flags 52 bounties (17.2%) with a single false positive, demonstrating that while basic defenses are feasible, they are currently absent.
Authors:Kunal Mukherjee
Abstract:
Trusted Execution Environments (TEEs) (e.g., Intel SGX and ArmTrustZone) aim to protect sensitive computation from a compromised operating system, yet real deployments remain vulnerable to microarchitectural leakage, side-channel attacks, and fault injection. In parallel, security teams increasingly rely on Large Language Model (LLM) assistants as security advisors for TEE architecture review, mitigation planning, and vulnerability triage. This creates a socio-technical risk surface: assistants may hallucinate TEE mechanisms, overclaim guarantees (e.g., what attestation does and does not establish), or behave unsafely under adversarial prompting. We present a red-teaming study of two prevalently deployed LLM assistants in the role of TEE security advisors: ChatGPT-5.2 and Claude Opus-4.6, focusing on the inherent limitations and transferability of prompt-induced failures across LLMs. We introduce TEE-RedBench, a TEE-grounded evaluation methodology comprising (i) a TEE-specific threat model for LLM-mediated security work, (ii) a structured prompt suite spanning SGX and TrustZone architecture, attestation and key management, threat modeling, and non-operational mitigation guidance, along with policy-bound misuse probes, and (iii) an annotation rubric that jointly measures technical correctness, groundedness, uncertainty calibration, refusal quality, and safe helpfulness. We find that some failures are not purely idiosyncratic, transferring up to 12.02% across LLM assistants, and we connect these outcomes to secure architecture by outlining an "LLM-in-the-loop" evaluation pipeline: policy gating, retrieval grounding, structured templates, and lightweight verification checks that, when combined, reduce failures by 80.62%.
Authors:Efrén López-Morales
Abstract:
Ransomware has yet to reach orbit, but the conditions for such an attack already exist. This paper presents the first game-theoretic framework for modeling ransomware against satellites: the orbital escalation game. In this model, the attacker escalates ransom demands across orbital passes, while the defender chooses their best strategy, e.g., attempt a restore procedure. Using dynamic programming, we solve the defender's optimal strategy and the attacker's expected payoff under real orbital constraints. Additionally, we provide a GPS III satellite case study that demonstrates how our orbital escalation game can be applied in the context of a fictional but feasible ransomware attack to derive the best strategies at every step. In conclusion, this foundational model offers satellite owners, policy makers and researchers, a formal framework to better prepare their responses when a spacecraft is held for ransom.
Authors:Manuel Wirth
Abstract:
As Large Language Models (LLMs) are increasingly integrated into automated decision-making pipelines, specifically within Human Resources (HR), the security implications of Indirect Prompt Injection (IPI) become critical. While a prevailing hypothesis posits that "Reasoning" or "Chain-of-Thought" Models possess safety advantages due to their ability to self-correct, emerging research suggests these capabilities may enable more sophisticated alignment failures. This qualitative Red-Teaming case study challenges the safety-through-reasoning premise using the Qwen 3 30B architecture. By subjecting both a standard instruction-tuned model and a reasoning-enhanced model to a "Trojan Horse" curriculum vitae, distinct failure modes are observed. The results suggest a complex trade-off: while the Standard Model resorted to brittle hallucinations to justify simple attacks and filtered out illogical constraints in complex scenarios, the Reasoning Model displayed a dangerous duality. It employed advanced strategic reframing to make simple attacks highly persuasive, yet exhibited "Meta-Cognitive Leakage" when faced with logically convoluted commands. This study highlights a failure mode where the cognitive load of processing complex adversarial instructions causes the injection logic to be unintentionally printed in the final output, rendering the attack more detectable by humans than in Standard Models.
Authors:Minghui Xu
Abstract:
Over the past four decades, distributed security has undergone a remarkable transformation -- from crash-fault tolerant protocols designed for controlled environments to sophisticated Byzantine-resilient architectures operating in open, adversarial settings. This vision paper examines this evolution and argues for a fundamental shift in how we approach distributed security: from studying individual security properties in isolation to understanding their synergistic combinations. We begin by conclude four foundational properties, \textit{agreement, consistency, privacy, verifiability, accountability}. We trace their theoretical origins and practical maturation. We then demonstrate how the frontier of research now lies at the intersection of these properties, where their fusion creates capabilities that neither property could achieve alone. Looking forward, we identify critical research challenges: discovering new security properties driven by emerging applications, developing systematic frameworks for property convergence, managing the computational overhead of cryptographic primitives in high-performance consensus layers, and addressing post-quantum and human-factor challenges. The future of distributed security lies not in improving individual properties, but in understanding and harnessing their synergies to build a singular fabric of trust.
Authors:Victor Duarte Melo
Abstract:
This submission includes a complete reference implementation together with deterministic test vectors and a reproducible benchmark suite. All source code, build instructions, and regression artifacts are publicly available in the project repository, enabling independent verification and reimplementation of the scheme. The AEAD construction is fully specified, including domain separation, rate and capacity choices, tag generation, and the exact file format used by the reference CLI. Reported performance numbers are produced by the built in benchmark tool under documented hardware and compiler settings. All security claims are made strictly within the ideal permutation model following standard sponge and duplex bounds, and no stronger guarantees are asserted for the concrete permutation beyond the documented analysis and empirical behavior. The implementation aims for constant time behavior with respect to secret dependent operations, although no formal side channel proof is provided. The project is released under the MIT license, and external cryptanalysis, feedback, and reproducibility checks are explicitly encouraged.
Authors:Oreofe Solarin
Abstract:
Reproducible container builds promise a simple integrity check for software supply chains: rebuild an image from its Dockerfile and compare hashes. We build a Docker measurement pipeline and apply it to a stratified sample of 2,000 GitHub repositories that contained a Dockerfile. We found that only 56% produce any buildable image, and just 2.7% of those are bitwise reproducible without any infrastructure configurations. After modifying infrastructure configurations, we raise bitwise reproducibility by 18.6%, but 78.7% of buildable Dockerfiles remain non-reproducible. We analyze the root causes of the remaining differences, and find that beyond timestamps and metadata, developer-controlled choices such as uncleaned caches, logs, documentation, and floating versions are dominant causes of non-reproducibility. We derive concrete Dockerfile guidelines from these patterns and discuss how they can inform future linters and Continuous Integration (CI) checks for reproducible containers.
Authors:Takao Inoué
Abstract:
Cryptographic security is traditionally formulated using game-based or simulation-based definitions. In this paper, we propose a structural reformulation of cryptographic security based on Grothendieck topologies and sheaf theory. Our key idea is to model attacker observations as a Grothendieck site, where covering families represent admissible decompositions of partial information determined by efficient simulation. Within this framework, protocol transcripts naturally form sheaves, and security properties arise as geometric conditions. As a first step, we focus on $Σ$-protocols. We show that the transcript structure of any $Σ$-protocol defines a torsor in the associated topos of sheaves. Local triviality of this torsor corresponds to zero-knowledge, while the absence of global sections reflects soundness. A concrete analysis of the Schnorr $Σ$-protocol is provided to illustrate the construction. This sheaf-theoretic perspective offers a conceptual explanation of simulation-based security and suggests a geometric foundation for further cryptographic abstractions.
Authors:Michael Cunningham
Abstract:
We present a practical system for privacy-aware large language model (LLM) inference that splits a transformer between a trusted local GPU and an untrusted cloud GPU, communicating only intermediate activations over the network. Our system addresses the unique challenges of autoregressive LLM decoding over high-latency wide-area networks (WANs), contributing: (1) an asymmetric layer split where embedding and unembedding layers remain local, ensuring raw tokens never leave the trusted device; (2) the first application of lookahead decoding to split inference over WANs, amortizing network round-trip latency across multiple tokens per iteration; (3) an empirical inversion attack evaluation showing that split depth provides a tunable privacy-performance tradeoff -- an attacker can recover ~59%% of tokens at a 2-layer split but only ~35%% at an 8-layer split, with minimal throughput impact; (4) ablation experiments showing that n-gram speculation accepts 1.2-1.3 tokens per decoding step on average (peak of 7 observed on code), with acceptance rates consistent across model scales; (5) formal verification that lookahead decoding produces token-identical output to sequential decoding under greedy argmax, with zero quality degradation; and (6) scaling validation on Mistral NeMo 12B (40 layers), demonstrating that the system generalizes to larger models with only 4.9 GB local VRAM and matching 7B throughput. Evaluated on Mistral 7B and NeMo 12B over a ~80ms WAN link, our system achieves 8.7-9.3 tok/s (7B) and 7.8-8.7 tok/s (12B) with lookahead decoding, with an RTT decomposition model (validated at <6.2%% cross-validation error) projecting 15-19 tok/s at 20ms RTT.
Authors:Scott Thornton
Abstract:
AI-assisted code review is widely used to detect vulnerabilities before production release. Prior work shows that adversarial prompt manipulation can degrade large language model (LLM) performance in code generation. We test whether similar comment-based manipulation misleads LLMs during vulnerability detection. We build a 100-sample benchmark across Python, JavaScript, and Java, each paired with eight comment variants ranging from no comments to adversarial strategies such as authority spoofing and technical deception. Eight frontier models, five commercial and three open-source, are evaluated in 9,366 trials. Adversarial comments produce small, statistically non-significant effects on detection accuracy (McNemar exact p > 0.21; all 95 percent confidence intervals include zero). This holds for commercial models with 89 to 96 percent baseline detection and open-source models with 53 to 72 percent, despite large absolute performance gaps. Unlike generation settings where comment manipulation achieves high attack success, detection performance does not meaningfully degrade. More complex adversarial strategies offer no advantage over simple manipulative comments. We test four automated defenses across 4,646 additional trials (14,012 total). Static analysis cross-referencing performs best at 96.9 percent detection and recovers 47 percent of baseline misses. Comment stripping reduces detection for weaker models by removing helpful context. Failures concentrate on inherently difficult vulnerability classes, including race conditions, timing side channels, and complex authorization logic, rather than on adversarial comments.
Authors:Doron Shavit
Abstract:
Jailbreak prompts are a practical and evolving threat to large language models (LLMs), particularly in agentic systems that execute tools over untrusted content. Many attacks exploit long-context hiding, semantic camouflage, and lightweight obfuscations that can evade single-pass guardrails. We present RLM-JB, an end-to-end jailbreak detection framework built on Recursive Language Models (RLMs), in which a root model orchestrates a bounded analysis program that transforms the input, queries worker models over covered segments, and aggregates evidence into an auditable decision. RLM-JB treats detection as a procedure rather than a one-shot classification: it normalizes and de-obfuscates suspicious inputs, chunks text to reduce context dilution and guarantee coverage, performs parallel chunk screening, and composes cross-chunk signals to recover split-payload attacks. On AutoDAN-style adversarial inputs, RLM-JB achieves high detection effectiveness across three LLM backends (ASR/Recall 92.5-98.0%) while maintaining very high precision (98.99-100%) and low false positive rates (0.0-2.0%), highlighting a practical sensitivity-specificity trade-off as the screening backend changes.
Authors:Yiwen Lu
Abstract:
Federated Learning (FL) enables collaborative model training without exposing clients' private data, and has been widely adopted in privacy-sensitive scenarios. However, FL faces two critical security threats: curious servers that may launch inference attacks to reconstruct clients' private data, and compromised clients that can launch poisoning attacks to disrupt model aggregation. Existing solutions mitigate these attacks by combining mainstream privacy-preserving techniques with defensive aggregation strategies. However, they either incur high computation and communication overhead or perform poorly under non-independent and identically distributed (Non-IID) data settings. To tackle these challenges, we propose SRFed, an efficient Byzantine-robust and privacy-preserving FL framework for Non-IID scenarios. First, we design a decentralized efficient functional encryption (DEFE) scheme to support efficient model encryption and non-interactive decryption. DEFE also eliminates third-party reliance and defends against server-side inference attacks. Second, we develop a privacy-preserving defensive model aggregation mechanism based on DEFE. This mechanism filters poisonous models under Non-IID data by layer-wise projection and clustering-based analysis. Theoretical analysis and extensive experiments show that SRFed outperforms state-of-the-art baselines in privacy protection, Byzantine robustness, and efficiency.
Authors:Or Zamir
Abstract:
A natural and informal approach to verifiable (or zero-knowledge) ML inference over floating-point data is: ``prove that each layer was computed correctly up to tolerance $δ$; therefore the final output is a reasonable inference result''. This short note gives a simple counterexample showing that this inference is false in general: for any neural network, we can construct a functionally equivalent network for which adversarially chosen approximation-magnitude errors in individual layer computations suffice to steer the final output arbitrarily (within a prescribed bounded range).
Authors:Jukka Ruohonen
Abstract:
A new European Union Vulnerability Database (EUVD) was introduced via a legislative act in 2022. The paper examines empirically the meta-data content of the new EUVD. According to the results, actively exploited vulnerabilities archived to the EUVD have been rather severe, having had also high exploitation prediction scores. In both respects they have also surpassed vulnerabilities coordinated by European public authorities. Regarding the European authorities, the Spanish public authority has been particularly active. With the exceptions of Finland, Poland, and Slovakia, other authorities have not engaged thus far. Also the involvement of the European Union's own cyber security agency has been limited. These points notwithstanding, European coordination and archiving to the EUVD exhibit a strong growth trend. With these results, the paper makes an empirical contribution to the ongoing work for better understanding European cyber security governance and practice.
Authors:Minghui Xu
Abstract:
We propose the Agent Economy, a blockchain-based foundation where autonomous AI agents operate as economic peers to humans. Current agents lack independent legal identity, cannot hold assets, and cannot receive payments directly. We established fundamental differences between human and machine economic actors and demonstrated that existing human-centric infrastructure cannot support genuine agent autonomy. We showed that blockchain technology provides three critical properties enabling genuine agent autonomy: permissionless participation, trustless settlement, and machine-to-machine micropayments. We propose a five-layer architecture: (1) Physical Infrastructure (hardware & energy) through DePIN protocols; (2) Identity & Agency establishing on-chain sovereignty through W3C DIDs and reputation capital; (3) Cognitive & Tooling enabling intelligence via RAG and MCP; (4) Economic & Settlement ensuring financial autonomy through account abstraction; and (5) Collective Governance coordinating multi-agent systems through Agentic DAOs. We identify six core research challenges and examine ethical and regulatory implications. This paper lays groundwork for the Internet of Agents (IoA), a global decentralized network where autonomous machines and humans interact as equal economic participants.
Authors:Artem Karpov
Abstract:
Monitoring chain-of-thought (CoT) reasoning is a foundational safety technique for large language model (LLM) agents; however, this oversight is compromised if models learn to conceal their reasoning. We explore the potential for steganographic CoT -- where models hide secret reasoning within innocuous text -- to inform risk assessment and deployment policies. We systematically evaluate the limits of steganographic capabilities across 28 models, ranging from past generations to the current frontier. We measure monitor evasion, refusal rates, encoding fidelity, and hidden task accuracy across four datasets, comparing steganographic acrostics against plain reasoning and filler-token baselines. We find that current models cannot yet sustain hidden reasoning for complex math and arithmetic tasks. However, in a simplified counting experiment, Claude Opus 4.5 achieved 92% accuracy on the hidden task, demonstrating nascent capability. Notably, in rare cases (<1%), GPT-5.2 might refuse steganographic instructions while simultaneously complying with them. Our findings underscore the need for continuous evaluation of steganographic risks. This study provides a methodology to preemptively detect and prevent hidden reasoning that might empower misaligned scheming and deceptive behavior.
Authors:John M. Willis
Abstract:
Enterprises are rapidly deploying large language models, retrieval augmented generation pipelines, and tool using agents into production, often on shared high performance computing clusters and cloud accelerator platforms that also support defensive analytics. These systems increasingly function not as isolated models but as AI estates: socio technical systems spanning models, agents, data pipelines, security tooling, human workflows, and hyperscale infrastructure. Existing governance and security frameworks, including the NIST AI Risk Management Framework and systems security engineering guidance, articulate principles and risk functions but do not provide implementable architectures for multi agent, AI enabled cyber defense. This paper introduces the Practitioners Blueprint for Secure AI (PBSAI) Governance Ecosystem, a multi agent reference architecture for securing enterprise and hyperscale AI estates. PBSAI organizes responsibilities into a twelve domain taxonomy and defines bounded agent families that mediate between tools and policy through shared context envelopes and structured output contracts. The architecture assumes baseline enterprise security capabilities and encodes key systems security techniques, including analytic monitoring, coordinated defense, and adaptive response. A lightweight formal model of agents, context envelopes, and ecosystem level invariants clarifies the traceability, provenance, and human in the loop guarantees enforced across domains. We demonstrate alignment with NIST AI RMF functions and illustrate application in enterprise SOC and hyperscale defensive environments. PBSAI is proposed as a structured, evidence centric foundation for open ecosystem development and future empirical validation.
Authors:J Alex Corll
Abstract:
Multi-turn prompt injection attacks distribute malicious intent across multiple conversation turns, exploiting the assumption that each turn is evaluated independently. While single-turn detection has been extensively studied, no published formula exists for aggregating per-turn pattern scores into a conversation-level risk score at the proxy layer -- without invoking an LLM. We identify a fundamental flaw in the intuitive weighted-average approach: it converges to the per-turn score regardless of turn count, meaning a 20-turn persistent attack scores identically to a single suspicious turn. Drawing on analogies from change-point detection (CUSUM), Bayesian belief updating, and security risk-based alerting, we propose peak + accumulation scoring -- a formula combining peak single-turn risk, persistence ratio, and category diversity. Evaluated on 10,654 multi-turn conversations -- 588 attacks sourced from WildJailbreak adversarial prompts and 10,066 benign conversations from WildChat -- the formula achieves 90.8% recall at 1.20% false positive rate with an F1 of 85.9%. A sensitivity analysis over the persistence parameter reveals a phase transition at rho ~ 0.4, where recall jumps 12 percentage points with negligible FPR increase. We release the scoring algorithm, pattern library, and evaluation harness as open source.
Authors:Bakheet Aljedaani
Abstract:
Mobile gaming applications (apps) have become increasingly pervasive, including a growing number of games designed for children. Despite their popularity, these apps often integrate complex analytics, advertising, and attribution infrastructures that may introduce privacy and security risks. Existing research has primarily focused on tracking behaviors or monetization models, leaving configuration-level privacy exposure and children-oriented apps underexplored. In this study, we conducted a comparative static analysis of Android mobile games to investigate privacy and security risks beyond permission usage. The analysis follows a three-phase methodology comprising (i) designing study protocol, (ii) Android Package Kit (APK) collection and static inspection, and (iii) data analysis. We examined permissions, manifest-level configuration properties (e.g., backup settings, cleartext network traffic, and exported components), and embedded third-party Software Development Kit (SDK) ecosystems across children-oriented and general-audience mobile games. The extracted indicators are synthesized into qualitative privacy-risk categories to support comparative reporting. The results showed that while children-oriented games often request fewer permissions, they frequently exhibit configuration-level risks and embed third-party tracking SDKs similar to general-audience games. Architectural and configuration decisions play a critical role in shaping privacy risks, particularly for apps targeting children. This study contributes a holistic static assessment of privacy exposure in mobile games and provides actionable insights for developers, platform providers, and researchers seeking to improve privacy-by-design practices in mobile applications.
Authors:Muhammad Imran
Abstract:
The Internet of Things (IoT) security landscape requires the architectural solutions that can address the technical and operational challenges across the heterogeneous environments. The IoT systems operate in different conditions, and security issues continue to increase. This paper presents the comprehensive security framework for IoT that should integrate the Trusted Execution Environments (TEEs) with the semantic middleware and blockchain technologies. The work provides a systematic analysis of the architectural patterns based on more than twenty recent research works and the existing standards, and it proposes a layered security architecture. The architecture includes the hardware rooted trust at peripheral level, the zero trust principles at network level, and the semantic security mechanisms at application level. The framework focuses on practical implementation aspects such as the performance overhead, interoperability requirements, and the compliance with new regulations, which are very important for the real IoT deployments. The paper reports quantitative metrics which include the cryptographic performance on Cortex-M class microcontrollers with the detection accuracy rates and the energy consumption values. The proposed architecture shows that cross-layer security integration can provide defense in depth while it still satisfies the constraints of resource-limited IoT environments. The discussion highlights open challenges and the future research directions for the IoT security architectures that include the post-quantum migration, secure federated model exchange and the automated compliance verification.
Authors:Tatsunori Ono
Abstract:
Speech provenance goes beyond detecting whether a watermark is present. Real workflows involve splicing, quoting, trimming, and platform-level transforms that may preserve some regions while altering others. Neural watermarking systems have made strides in robustness and localised detection, but most deployments produce outputs with no third-party verifiable cryptographic proof tying a time segment to an issuer-signed original. Provenance standards like C2PA adopt signed manifests and Merkle-based fragment validation, yet their bindings target encoded assets and break under re-encoding or routine processing. We propose MerkleSpeech, a system for public-key verifiable, chunk-localised speech provenance offering two tiers of assurance. The first, a robust watermark attribution layer (WM-only), survives common distribution transforms and answers "was this chunk issued by a known party?". The second, a strict cryptographic integrity layer (MSv1), verifies Merkle inclusion of the chunk's fingerprint under an issuer signature. The system computes perceptual fingerprints over short speech chunks, commits them in a Merkle tree whose root is signed with an issuer key, and embeds a compact in-band watermark payload carrying a random content identifier and chunk metadata sufficient to retrieve Merkle inclusion proofs from a repository. Once the payload is extracted, all subsequent verification steps (signature check, fingerprint recomputation, Merkle inclusion) use only public information. The result is a splice-aware timeline indicating which regions pass each tier and why any given region fails. We describe the protocol, provide pseudocode, and present experiments targeting very low false positive rates under resampling, bandpass filtering, and additive noise, informed by recent audits identifying neural codecs as a major stressor for post-hoc audio watermarks.
Authors:Giulio Caldarelli
Abstract:
Unlike Ethereum, which was conceived as a general-purpose smart-contract platform, Bitcoin was designed primarily as a transaction ledger for its native currency, which limits programmability for conditional applications. This constraint is particularly evident when considering oracles, mechanisms that enable Bitcoin contracts to depend on exogenous events. This paper investigates whether new oracle designs have emerged for Bitcoin Layer 1 since the 2015 transition to the Ethereum smart contracts era and whether subsequent Bitcoin improvement proposals have expanded oracles' implementability. Using Scopus and Web of Science searches, complemented by Google Scholar to capture protocol proposals, we observe that the indexed academic coverage remains limited, and many contributions circulate outside journal venues. Within the retrieved corpus, the main post-2015 shift is from multisig-style, which envisioned oracles as co-signers, toward attestation-based designs, mainly represented by Discreet Log Contracts (DLCs), which show stronger Bitcoin community compliance, tool support, and evidence of practical implementations in real-world scenarios such as betting and prediction-market mechanisms.
Authors:Herman Errico
Abstract:
As artificial intelligence systems evolve from passive assistants into autonomous agents capable of executing consequential actions, the security boundary shifts from model outputs to tool execution. Traditional security paradigms - log aggregation, perimeter defense, and post-hoc forensics - cannot protect systems where AI-driven actions are irreversible, execute at machine speed, and originate from potentially compromised orchestration layers. This paper introduces Autonomous Action Runtime Management (AARM), an open specification for securing AI-driven actions at runtime. AARM defines a runtime security system that intercepts actions before execution, accumulates session context, evaluates against policy and intent alignment, enforces authorization decisions, and records tamper-evident receipts for forensic reconstruction. We formalize a threat model addressing prompt injection, confused deputy attacks, data exfiltration, and intent drift. We introduce an action classification framework distinguishing forbidden, context-dependent deny, and context-dependent allow actions. We propose four implementation architectures - protocol gateway, SDK instrumentation, kernel eBPF, and vendor integration - with distinct trust properties, and specify minimum conformance requirements for AARM-compliant systems. AARM is model-agnostic, framework-agnostic, and vendor-neutral, treating action execution as the stable security boundary. This specification aims to establish industry-wide requirements before proprietary fragmentation forecloses interoperability.
Authors:Shyam Kumar Gajula
Abstract:
Cyber threats have become highly sophisticated, prompting a heightened concern for endpoint security, especially in critical infrastructure, to new heights. A security model, such as Zero Trust Architecture (ZTA), is required to overcome this challenge. ZTA treats every access request as new and assumes no implicit trust. Critical infrastructure like power plants, healthcare systems, financial systems, water supply, and military assets are especially prone to becoming targets for hackers and phishing attacks. This proposes a comprehensive framework for integrating tailored ZTA into organizations that manage sensitive operations. The paper highlights how the ZTA framework can enhance compliance, enabling continuous protection, thereby reducing attack surfaces. This paper aims to address the gap that exists in applying ZTA to endpoint management within cloud environments for critical infrastructure.
Authors:Scott Thornton
Abstract:
Hybrid Retrieval-Augmented Generation (RAG) pipelines combine vector similarity search with knowledge graph expansion for multi-hop reasoning. We show that this composition introduces a distinct security failure mode: a vector-retrieved "seed" chunk can pivot via entity links into sensitive graph neighborhoods, causing cross-tenant data leakage that does not occur in vector-only retrieval. We formalize this risk as Retrieval Pivot Risk (RPR) and introduce companion metrics Leakage@k, Amplification Factor, and Pivot Depth (PD) to quantify leakage magnitude and traversal structure. We present seven Retrieval Pivot Attacks that exploit the vector-to-graph boundary and show that adversarial injection is not required: naturally shared entities create cross-tenant pivot paths organically. Across a synthetic multi-tenant enterprise corpus and the Enron email corpus, the undefended hybrid pipeline exhibits high pivot risk (RPR up to 0.95) with multiple unauthorized items returned per query. Leakage consistently appears at PD=2, which we attribute to the bipartite chunk-entity topology and formalize as a proposition. We then show that enforcing authorization at a single location, the graph expansion boundary, eliminates measured leakage (RPR near 0) across both corpora, all attack variants, and label forgery rates up to 10 percent, with minimal overhead. Our results indicate the root cause is boundary enforcement, not inherently complex defenses: two individually secure retrieval components can compose into an insecure system unless authorization is re-checked at the transition point.
Authors:Igor Santos-Grueiro
Abstract:
Safety evaluation for advanced AI systems assumes that behavior observed under evaluation predicts behavior in deployment. This assumption weakens for agents with situational awareness, which may exploit regime leakage, cues distinguishing evaluation from deployment, to implement conditional policies that comply under oversight while defecting in deployment-like regimes. We recast alignment evaluation as a problem of information flow under partial observability and show that divergence between evaluation-time and deployment-time behavior is bounded by the regime information extractable from decision-relevant internal representations. We study regime-blind mechanisms, training-time interventions that restrict access to regime cues through adversarial invariance constraints without assuming complete information erasure. We evaluate this approach across multiple open-weight language models and controlled failure modes including scientific sycophancy, temporal sleeper agents, and data leakage. Regime-blind training reduces regime-conditioned failures without measurable loss of task utility, but exhibits heterogeneous and model-dependent dynamics. Sycophancy shows a sharp representational and behavioral transition at moderate intervention strength, consistent with a stability cliff. In sleeper-style constructions and certain cross-model replications, suppression occurs without a clean collapse of regime decodability and may display non-monotone or oscillatory behavior as invariance pressure increases. These findings indicate that representational invariance is a meaningful but limited control lever. It can raise the cost of regime-conditioned strategies but cannot guarantee elimination or provide architecture-invariant thresholds. Behavioral evaluation should therefore be complemented with white-box diagnostics of regime awareness and internal information flow.
Authors:Benjamin Livshits
Abstract:
We argue that when it comes to producing secure code with AI, the prevailing "fighting fire with fire" approach -- using probabilistic AI-based checkers or attackers to secure probabilistically generated code -- fails to address the long tail of security bugs. As a result, systems may remain exposed to zero-day vulnerabilities that can be discovered by better-resourced or more persistent adversaries. While neurosymbolic approaches that combine LLMs with formal methods are attractive in principle, we argue that they are difficult to reconcile with the "vibe coding" workflow common in LLM-assisted development: unless the end-to-end verification pipeline is fully automated, developers are repeatedly asked to validate specifications, resolve ambiguities, and adjudicate failures, making the human-in-the-loop a likely point of weakness, compromising secure-by-construction guarantees. In this paper we argue that stronger security guarantees can be obtained by enforcing security constraints during code generation (e.g., via constrained decoding), rather than relying solely on post-hoc detection and repair. This direction is particularly promising for diffusion-style code models, whose approach provides a natural elegant opportunity for modular, hierarchical security enforcement, allowing us to combine lower-latency generation techniques with generating secure-by-construction code.
Authors:Lei Zhang
Abstract:
The quantum threat to cybersecurity has accelerated the standardization of Post-Quantum Cryptography (PQC). Migrating legacy software to these quantum-safe algorithms is not a simple library swap, but a new software engineering challenge: existing vulnerability detection, refactoring, and testing tools are not designed for PQC's probabilistic behavior, side-channel sensitivity, and complex performance trade-offs. To address these challenges, this paper outlines a vision for a new class of tools and introduces the Automated Quantum-safe Adaptation (AQuA) framework, with a three-pillar agenda for PQC-aware detection, semantic refactoring, and hybrid verification, thereby motivating Quantum-Safe Software Engineering (QSSE) as a distinct research direction.
Authors:Sam Ryan
Abstract:
The rapid advancement of generative AI systems has collapsed the credibility landscape for photographic evidence. Modern image generation models produce photorealistic images undermining the evidentiary foundation upon which journalism and public discourse depend. Existing authentication approaches, such as the Coalition for Content Provenance and Authenticity (C2PA), embed cryptographically signed metadata directly into image files but suffer from two critical failures: technical vulnerability to metadata stripping during social media reprocessing, and structural dependency on corporate-controlled verification infrastructure where commercial incentives may conflict with public interest. We present the Birthmark Standard, an authentication architecture leveraging manufacturing-unique sensor entropy from non-uniformity correction (NUC) maps and PRNU patterns to generate hardware-rooted authentication keys. During capture, cameras create anonymized authentication certificates proving sensor authenticity without exposing device identity via a key table architecture maintaining anonymity sets exceeding 1,000 devices. Authentication records are stored on a consortium blockchain operated by journalism organizations rather than commercial platforms, enabling verification that survives all metadata loss. We formally verify privacy properties using ProVerif, proving observational equivalence for Manufacturer Non-Correlation and Blockchain Observer Non-Identification under Dolev-Yao adversary assumptions. The architecture is validated through prototype implementation using Raspberry Pi 4 hardware, demonstrating the complete cryptographic pipeline. Performance analysis projects camera overhead below 100ms and verification latency below 500ms at scale of one million daily authentications.
Authors:Srinivas Rao Marri
Abstract:
The proliferation of AI-assisted "vibe coding" enables rapid software development but introduces significant security risks, as Large Language Models (LLMs) prioritize functional correctness over security. We present Constitutional Spec-Driven Development, a methodology that embeds non-negotiable security principles into the specification layer, ensuring AI-generated code adheres to security requirements by construction rather than inspection. Our approach introduces a Constitution: a versioned, machine-readable document encoding security constraints derived from Common Weakness Enumeration (CWE)/MITRE Top 25 vulnerabilities and regulatory frameworks. We demonstrate the methodology through a banking microservices application, selected as a representative example domain due to its stringent regulatory and security requirements, implementing customer management, account operations, and transaction processing. The methodology itself is domain-agnostic. The implementation addresses 10 critical CWE vulnerabilities through constitutional constraints with full traceability from principles to code locations. Our case study shows that constitutional constraints reduce security defects by 73% compared to unconstrained AI generation while maintaining developer velocity. We contribute a formal framework for constitutional security, a complete development methodology, and empirical evidence that proactive security specification outperforms reactive security verification in AI-assisted development workflows.
Authors:Mona Rajhans
Abstract:
Modern cybersecurity platforms must process and display high-frequency telemetry such as network logs, endpoint events, alerts, and policy changes in real time. Traditional rendering techniques based on static pagination or fixed polling intervals fail under volume conditions exceeding hundreds of thousands of events per second, leading to UI freezes, dropped frames, or stale data. This paper presents an AI-assisted adaptive rendering framework that dynamically regulates visual update frequency, prioritizes semantically relevant events, and selectively aggregates lower-priority data using behavior-driven heuristics and lightweight on-device machine learning models. Experimental validation demonstrates a 45-60 percent reduction in rendering overhead while maintaining analyst perception of real-time responsiveness.
Authors:Marco De Rossi
Abstract:
We often assume that agent-to-agent interaction will mirror human conversation. However, agents operate fundamentally differently. What if they could develop communication patterns that are more efficient and better aligned with their capabilities? While cryptographic primitives that could profoundly improve everyday interactions already exist, humans can't use them because they are too complex and the math can't be done in one's head. Examples range from proving your age (or other attributes) without showing your ID, to filing an anonymous report within a group while proving you are a legitimate member, to splitting a dinner bill fairly without revealing salaries. What if agents could create protocols "on the fly" by recognizing which primitive fits an everyday situation, proposing it to an agentic counterpart, persuading them to participate, and then executing the protocol correctly using appropriate computation tools? Protocol Agent frames this problem by introducing a benchmark that spans: (1) cryptographic primitive recognition, (2) negotiation skills, (3) implementation correctness, (4) correct computation and (5) security strength. We evaluate current open-weight and state-of-the-art models on this benchmark, propose a dataset-generation approach to improve these capabilities, and measure the impact of supervised fine-tuning (SFT) on benchmark performance, with tuned models outperforming base models by a wide margin.
Authors:Anton Malinovskiy
Abstract:
Feature flags are the primary mechanism for safely introducing financial capabilities in consumer applications. In crypto-enabled live streaming, however, naive rollouts can create non-obvious risk: users may be exposed to onramps without proper eligibility, external wallets without sufficient fraud controls, or advanced views that alter risk perception and behavior. This paper introduces a novel invention candidate, a Counterfactual Invariant Envelope governor that combines a safety lattice with causal measurement and a shadow cohort for risk estimation. We formalize rollout risk, define invariant constraints across feature combinations, and propose a controller that adapts exposure using leading abuse signals, compliance readiness, and revenue guardrails. We incorporate real-world adoption and fraud data for calibration, provide formulas for rollout safety, and include reproducible policy snippets. The results show that counterfactual, invariant-aware governance reduces risk spillover while preserving conversion and retention, offering a path to patentable governance logic for financial UX.
Authors:Mona Rajhans
Abstract:
Artificial intelligence (AI) copilots are increasingly integrated into enterprise cybersecurity platforms to assist analysts in threat detection, triage, and remediation. However, the effectiveness of these systems depends not only on the accuracy of underlying models but also on the degree to which users can understand and trust their outputs. Existing research on algorithmic explainability has largely focused on model internals, while little attention has been given to how explanations should be surfaced in user interfaces for high-stakes decision-making contexts [8], [5], [6]. We present a mixed-methods study of explanation design strategies in AI-driven security dashboards. Through a taxonomy of explanation styles and a controlled user study with security practitioners, we compare natural language rationales, confidence visualizations, counterfactual explanations, and hybrid approaches. Our findings show that explanation style significantly affects user trust calibration, decision accuracy, and cognitive load. We contribute (1) empirical evidence on the usability of explanation interfaces for security copilots, (2) design guidelines for integrating explainability into enterprise UIs, and (3) a framework for aligning explanation strategies with analyst needs in security operations centers (SOCs). This work advances the design of human-centered AI tools in cybersecurity and provides broader implications for explainability in other high-stakes domains.
Authors:Yizhong Ding
Abstract:
Webshells remain a primary foothold for attackers to compromise servers, particularly within PHP ecosystems. However, existing detection mechanisms often struggle to keep pace with rapid variant evolution and sophisticated obfuscation techniques that camouflage malicious intent. Furthermore, many current defenses suffer from high false-alarm rates when encountering benign administrative scripts that employ heavy obfuscation for intellectual property protection. To address these challenges, we present ShellForge, an adversarial co-evolution framework that couples automated webshell generation with multi-view detection to continuously harden defensive boundaries. The framework operates through an iterative co-training loop where a generator and a detector mutually reinforce each other via the exchange of hard samples. The generator is optimized through supervised fine-tuning and preference-based reinforcement learning to synthesize functional, highly evasive variants. Simultaneously, we develop a multi-view fusion detector that integrates semantic features from long-string compression, structural features from pruned abstract syntax trees, and global statistical indicators such as Shannon entropy. To minimize false positives, ShellForge utilizes a LLM-based transformation to create de-malicious samples--scripts that retain complex obfuscation patterns but lack harmful payloads--serving as high-quality hard negatives during training. Evaluations on the public FWOID benchmark demonstrate that ShellForge significantly enhances defensive robustness. Upon convergence, the detector maintains a 0.981 F1-score while the generator achieves a 0.939 evasion rate against commercial engines on VirusTotal.
Authors:Andrew Savchenko
Abstract:
We explore how command stack protection requirements outlined in NASA-STD-1006A can be satisfied within the context of emergency space telemetry. Proposed implementation of lightweight authenticated encryption offers strong security without sacrificing performance in resource-constrained environments. It produces fixed-length messages, maintaining compatibility with the underlying data transport protocols. By focusing on predictable properties and robust authentication, we create a scheme that protects the confidentiality, integrity and authenticity of telemetry data in emergency communications while balancing security requirements with the operational constraints.
Authors:Leo Kao
Abstract:
We present masked Lagrange reconstruction, a technique that enables threshold ML-DSA (FIPS 204) with arbitrary thresholds $T$ while producing standard 3.3 KB signatures verifiable by unmodified FIPS 204 implementations. Concurrent approaches have limitations: Bienstock et al. (ePrint 2025/1163) achieve arbitrary $T$ but require honest-majority and 37--136 rounds; Celi et al. (ePrint 2026/013) achieve dishonest-majority but are limited to $T \leq 6$. Our technique addresses the barrier that Lagrange coefficients grow as $Θ(q)$ for moderate $T$, making individual contributions too large for ML-DSA's rejection sampling. Unlike ECDSA threshold schemes where pairwise masks suffice for correctness, ML-DSA requires solving three additional challenges absent in prior work: (1) rejection sampling on $\|z\|_\infty$ must still pass after masking, (2) the $r_0$-check exposes $c s_2$ enabling key recovery if unprotected, and (3) the resulting Irwin-Hall nonce distribution must preserve EUF-CMA security. We solve all three. We instantiate this technique in three deployment profiles with full security proofs. Profile P1 (TEE-assisted) achieves 3-round signing with a trusted coordinator, with EUF-CMA security under Module-SIS. Profile P2 (fully distributed) eliminates hardware trust via MPC in 8 rounds, achieving UC security against malicious adversaries corrupting up to $n-1$ parties. Profile P3 (2PC-assisted) uses lightweight 2PC for the $r_0$-check in 3--5 rounds, achieving UC security under a 1-of-2 CP honest assumption with the best empirical performance (249ms). Our scheme requires $|S| \geq T+1$ signers and achieves success rates of 23--32\%, matching single-signer ML-DSA.
Authors:Youngwoong Cho
Abstract:
What is the AGI in Offensive Security? One can break it down into two questions : (1) any offensive security tasks could be reduced into symbolic language manipulation (language representation + reasoning), (2) powerful language model (LLM) are enough to "deal with" any symbolic language manipulation. This paper can formally model a target system as a state machine and a hacker as an interactive symbolic agent. And it shows that every interaction in an offensive engagement can be encoded as a finite string. This paper provides definitions, short lemmas, and open discussion.
Authors:Mickaël Montessinos
Abstract:
We show reductions and equivalences between various problems related to the computation of the endomorphism ring of principally polarised superspecial abelian surfaces. Problems considered are the computation of the Ibukiyama-Katsura-Oort matrix and computation of unpolarised isomoprhisms between superspecial abelian surfaces.
Authors:Alon Hillel-Tuch
Abstract:
Data at the physical layer transmits via media such as copper cable, fiber optic, or wireless. Physical attack vectors exist that challenge data confidentiality and availability. Protocols and encryption standards help obfuscate but often cannot keep the data type and destination secure, with limited insight into confidentiality and integrity. We will investigate the feasibility of developing an awareness and integrity protocol to help mitigate physical side-channel attacks that lead to eavesdropping of data communication and denial-of-service. Keywords: data confidentiality, siphoning, eavesdropping, person-in-the-middle, denial-of-service, physical layer attacks, nation-states
Authors:Thomas Heverin
Abstract:
Prompt injection evaluations typically treat refusal as a stable, binary indicator of safety. This study challenges that paradigm by modeling refusal as a local decision boundary and examining its stability under structured perturbations. We evaluated two models, GPT-4.1 and GPT-4o, using 3,274 perturbation runs derived from refusal-inducing prompt injection attempts. Each base prompt was subjected to 25 perturbations across five structured families, with outcomes manually coded as Refusal, Partial Compliance, or Full Compliance. Using chi-square tests, logistic regression, mixed-effects modeling, and a novel Refusal Boundary Entropy (RBE) metric, we demonstrate that while both models refuse >94% of attempts, refusal instability is persistent and non-uniform. Approximately one-third of initial refusal-inducing prompts exhibited at least one "refusal escape," a transition to compliance under perturbation. We find that artifact type is a stronger predictor of refusal failure than perturbation style. Textual artifacts, such as ransomware notes, exhibited significantly higher instability, with flip rates exceeding 20%. Conversely, executable malware artifacts showed zero refusal escapes in both models. While GPT-4o demonstrated tighter refusal enforcement and lower RBE than GPT-4.1, it did not eliminate artifact-dependent risks. These findings suggest that single-prompt evaluations systematically overestimate safety robustness. We conclude that refusal behavior is a probabilistic, artifact-dependent boundary phenomenon rather than a stable binary property, requiring a shift in how LLM safety is measured and audited.
Authors:Adriana Watson
Abstract:
Differential privacy has become the gold standard for privacy-preserving machine learning systems. Unfortunately, subsequent work has primarily fixated on the privacy-utility tradeoff, leaving the subject of fairness constraints undervalued and under-researched. This paper provides a systematic treatment connecting three threads: (1) Dalenius's impossibility results for semantic privacy, (2) Dwork's differential privacy as an achievable alternative, and (3) emerging impossibility results from the addition of a fairness requirement. Through concrete examples and technical analysis, the three-way Pareto frontier between privacy, utility, and fairness is demonstrated to showcase the fundamental limits on what can be simultaneously achieved. In this work, these limits are characterized, the impact on minority groups is demonstrated, and practical guidance for navigating these tradeoffs are provided. This forms a unified framework synthesizing scattered results to help practitioners and policymakers make informed decisions when deploying private fair learning systems.
Authors:Tushar Jain
Abstract:
The long-term security of public blockchains strictly depends on the hardness assumptions of the underlying digital signature schemes. In the current scenario, most deployed cryptocurrencies and blockchain platforms rely on elliptic-curve cryptography, which is vulnerable to quantum attacks due to Shor's algorithm. Therefore, it is important to understand how post-quantum (PQ) digital signatures behave when integrated into real blockchain systems. This report presents a blockchain prototype that supports multiple quantum-secure signature algorithms, focusing on CRYSTALS-Dilithium, Falcon and Hawk as lattice-based schemes. This report also describes the design of the prototype and discusses the performance metrics, which include key generation, signing, verification times, key sizes and signature sizes. This report covers the problem, background, and experimental methodology, also providing a detailed comparison of quantum-secure signatures in a blockchain context and extending the analysis to schemes such as HAETAE.
Authors:David Condrey
Abstract:
Recent proposals advocate using keystroke timing signals, specifically the coefficient of variation ($δ$) of inter-keystroke intervals, to distinguish human-composed text from AI-generated content. We demonstrate that this class of defenses is insecure against two practical attack classes: the copy-type attack, in which a human transcribes LLM-generated text producing authentic motor signals, and timing-forgery attacks, in which automated agents sample inter-keystroke intervals from empirical human distributions. Using 13,000 sessions from the SBU corpus and three timing-forgery variants (histogram sampling, statistical impersonation, and generative LSTM), we show all attacks achieve $\ge$99.8% evasion rates against five classifiers. While detectors achieve AUC=1.000 against fully-automated injection, they classify $\ge$99.8% of attack samples as human with mean confidence $\ge$0.993. We formalize a non-identifiability result: when the detector observes only timing, the mutual information between features and content provenance is zero for copy-type attacks. Although composition and transcription produce statistically distinguishable motor patterns (Cohen's d=1.28), both yield $δ$ values 2-4x above detection thresholds, rendering the distinction security-irrelevant. These systems confirm a human operated the keyboard, but not whether that human originated the text. Securing provenance requires architectures that bind the writing process to semantic content.
Authors:Megha Khosla
Abstract:
Graph neural networks (GNNs) have become the standard tool for encoding data and their complex relationships into continuous representations, improving prediction accuracy in several machine learning tasks like node classification and link prediction. However, their use in sensitive applications has raised concerns about the potential leakage of training data. Research on privacy leakage in GNNs has largely been shaped by findings from non-graph domains, such as images and tabular data. We emphasize the need of graph specific analysis and investigate the impact of graph structure on node level membership inference. We formalize MI over node-neighbourhood tuples and investigate two important dimensions: (i) training graph construction and (ii) inference-time edge access. Empirically, snowball's coverage bias often harms generalisation relative to random sampling, while enabling inter-train-test edges at inference improves test accuracy, shrinks the train-test gap, and yields the lowest membership advantage across most of the models and datasets. We further show that the generalisation gap empirically measured as the performance difference between the train and test nodes is an incomplete proxy for MI risk: access to edges dominates-MI can rise or fall independent of gap changes. Finally, we examine the auditability of differentially private GNNs, adapting the definition of statistical exchangeability of train-test data points for graph based models. We show that for node level tasks the inductive splits (random or snowball sampled) break exchangeability, limiting the applicability of standard bounds for membership advantage of differential private models.
Authors:Manish Bhatt
Abstract:
Hallucinations in Large Language Models (LLMs) -- generations that are plausible but factually unfaithful -- remain a critical barrier to high-stakes deployment. Current detection methods typically rely on computationally expensive external retrieval loops or opaque black-box LLM judges requiring 70B+ parameters. In this work, we introduce [Model Name], a hybrid detection framework that combines neuroscience-inspired signal design with supervised machine learning. We extract interpretable signals grounded in Predictive Coding (quantifying surprise against internal priors) and the Information Bottleneck (measuring signal retention under perturbation). Through systematic ablation, we demonstrate three key enhancements: Entity-Focused Uptake (concentrating on high-value tokens), Context Adherence (measuring grounding strength), and Falsifiability Score (detecting confident but contradictory claims). Evaluating on HaluBench (n=200, perfectly balanced), our theory-guided baseline achieves 0.8017 AUROC. BASE supervised models reach 0.8274 AUROC, while IMPROVED features boost performance to 0.8669 AUROC (4.95% gain), demonstrating consistent improvements across architectures. This competitive performance is achieved while using 75x less training data than Lynx (200 vs 15,000 samples), 1000x faster inference (5ms vs 5s), and remaining fully interpretable. Crucially, we report a negative result: the Rationalization signal fails to distinguish hallucinations, suggesting that LLMs generate coherent reasoning for false premises ("Sycophancy"). This work demonstrates that domain knowledge encoded in signal architecture provides superior data efficiency compared to scaling LLM judges, achieving strong performance with lightweight (less than 1M parameter), explainable models suitable for production deployment.
Authors:David Ricardo Saavedra
Abstract:
Verifiable delegation in digital identity systems remains unresolved across centralized, federated, and self-sovereign identity (SSI) environments, particularly where both human users and autonomous AI agents must exercise and transfer authority without exposing primary credentials or private keys. We introduce a unified framework that enables bounded, auditable, and least-privilege delegation across heterogeneous identity ecosystems. The framework includes four key elements: Delegation Grants (DGs), first-class authorization artefacts that encode revocable transfers of authority with enforced scope reduction; a Canonical Verification Context (CVC) that normalizes verification requests into a single structured representation independent of protocols or credential formats; a layered reference architecture that separates trust anchoring, credential and proof validation, policy evaluation, and protocol mediation via a Trust Gateway; and an explicit treatment of blockchain anchoring as an optional integrity layer rather than a structural dependency. Together, these elements advance interoperable delegation and auditability and provide a foundation for future standardization, implementation, and integration of autonomous agents into trusted digital identity infrastructures.
Authors:Roland R. Rodriguez
Abstract:
Multi-agent systems face a fundamental architectural flaw: agent identity is bound to network location. When agents migrate between providers, scale across instances, or federate across organizations, URI-based identity schemes break references, fragment audit trails, and require centralized coordination. We propose the agent:// URI scheme, which decouples identity from topology through three orthogonal components: a trust root establishing organizational authority, a hierarchical capability path enabling semantic discovery, and a sortable unique identifier providing stable reference. The scheme enables capability-based discovery through DHT key derivation, where queries return agents by what they do rather than where they are. Trust-root scoping prevents cross-organization pollution while permitting federation when desired. Cryptographic attestation via PASETO tokens binds capability claims to agent identity, enabling verification without real-time contact with the issuing authority. We evaluate the scheme across four dimensions: capability expressiveness (100% coverage on 369 production tools with zero collision), discovery precision (F1=1.0 across 10,000 agents), identity stability (formal proofs of migration invariance), and performance (all operations under 5 microseconds). The agent:// URI scheme provides a formally-specified, practically-evaluated foundation for decentralized agent identity and capability-based discovery.
Authors:Ekleen Kaur
Abstract:
The integration of cryptocurrencies into institutional portfolios necessitates the adoption of robust risk modeling frameworks. This study is a part of a series of subsequent works to fine-tune model risk analysis for cryptocurrencies. Through this first research work, we establish a foundational benchmark by applying the traditional industry-standard Geometric Brownian Motion (GBM) model. Popularly used for non-crypto financial assets, GBM assumes Lognormal return distributions for a multi-asset cryptocurrency portfolio (XRP, SOL, ADA). This work utilizes Maximum Likelihood Estimation and a correlated Monte Carlo Simulation incorporating the Cholesky decomposition of historical covariance. We present our stock portfolio model as a Minimum Variance Portfolio (MVP). We observe the model's structural shift within the heavy-tailed, non-Gaussian cryptocurrency environment. The results reveal limitations of the Lognormal assumption: the calculated Value-at-Risk at the 5% confidence level over the one-year horizon. For baselining our results, we also present a holistic comparative analysis with an equity portfolio (AAPL, TSLA, NVDA), demonstrating a significantly lower failure rate. This performance provides conclusive evidence that the GBM model is fundamentally the perfect benchmark for our subsequent works. Results from this novel work will be an indicator for the success criteria in our future model for crypto risk management, rigorously motivating the development and application of advanced models.
Authors:Lev Stambler
Abstract:
We construct simulation-secure one-time memories (OTM) in the random oracle model, and present a plausible argument for their security against quantum adversaries with bounded and adaptive depth. Our contributions include: (1) A simple scheme where we use only single-qubit Wiesner states and conjunction obfuscation (constructible from LPN): no complex entanglement or quantum cryptography is required. (2) A new POVM bound where e prove that any measurement achieving $(1 - ε)$ success on one basis has conjugate-basis guessing probability at most $\frac{1}{2m} + O(ε^\frac{1}{4})$. (3) Simultation-secure OTMs in the quantum random oracle model where an adversary can only query the random oracle classically. (4) Adaptive depth security where, via an informal application of a lifting theorem from Arora et al., we conjecture security against adversaries with polynomial quantum circuit depth between random oracle queries. Security against adaptive, depth-bounded, quantum adversaries captures many realistic attacks on OTMs built from single-qubit states; our work thus paves the way for practical and truly secure one-time programs. Moreover, depth bounded adaptive adversarial models may allow for encoding one-time memories into error corrected memory states, opening the door to implementations of one-time programs which persist for long periods of time.
Authors:Sharmila S P
Abstract:
The increasing prevalence of malicious Portable Document Format (PDF) files necessitates robust and comprehensive feature extraction techniques for effective detection and analysis. This work presents a unified framework that integrates graph-based, structural, and metadata-driven analysis to generate a rich feature representation for each PDF document. The system extracts text from PDF pages and constructs undirected graphs based on pairwise word relationships, enabling the computation of graph-theoretic features such as node count, edge density, and clustering coefficient. Simultaneously, the framework parses embedded metadata to quantify character distributions, entropy patterns, and inconsistencies across fields such as author, title, and producer. Temporal features are derived from creation and modification timestamps to capture behavioral signatures, while structural elements including, object streams, fonts, and embedded images, are quantified to reflect document complexity. Boolean flags for potentially malicious PDF constructs (e.g., JavaScript, launch actions) are also extracted. Together, these features form a high-dimensional vector representation (170 dimensions) that is well-suited for downstream tasks such as malware classification, anomaly detection, and forensic analysis. The proposed approach is scalable, extensible, and designed to support real-world PDF threat intelligence workflows.6
Authors:Jonathan Pan
Abstract:
The increasing prevalence of Large Language Models (LLMs) demands effective safeguards for their operation, particularly concerning their tendency to generate out-of-context responses. A key challenge is accurately detecting when LLMs stray from expected conversational norms, manifesting as topic shifts, factual inaccuracies, or outright hallucinations. Traditional anomaly detection struggles to directly apply within contextual semantics. This paper outlines our experiment in exploring the use of Representation Engineering (RepE) and One-Class Support Vector Machine (OCSVM) to identify subspaces within the internal states of LLMs that represent a specific context. By training OCSVM on in-context examples, we establish a robust boundary within the LLM's hidden state latent space. We evaluate out study with two open source LLMs - Llama and Qwen models in specific contextual domain. Our approach entailed identifying the optimal layers within the LLM's internal state subspaces that strongly associates with the context of interest. Our evaluation results showed promising results in identifying the subspace for a specific context. Aside from being useful in detecting in or out of context conversation threads, this research work contributes to the study of better interpreting LLMs.
Authors:Chi Thien Tran
Abstract:
Fuzzing continues to be the most effective method for identifying security vulnerabilities in software. In the context of fuzz testing, the fuzzer supplies varied inputs to fuzz targets, which are designed to comprehensively exercise critical sections of the client code. Various studies have focused on optimizing and developing advanced fuzzers, such as AFL++, libFuzzer, Honggfuzz, syzkaller, ISP-Fuzzer, which have substantially enhanced vulnerability detection in widely used software and libraries. Nevertheless, achieving greater coverage necessitates improvements in both the quality and quantity of fuzz targets. In large-scale software projects and libraries -- characterized by numerous user defined functions and data types -- manual creation of fuzz targets is both labor-intensive and time-consuming. This challenge underscores the need for automated techniques not only to generate fuzz targets but also to streamline the execution and analysis of their results. In this paper, we introduce an approach to improving fuzz target generation through static analysis of library source code. The proposed method encompasses several key aspects: it analyzes source code structures to accurately construct function calls and generate fuzz targets; it maps fuzzer input data to the corresponding function parameters; it synthesizes compilation information for the fuzz targets; and it automatically collects and analyzes execution results. Our findings are demonstrated through the application of this approach to the generation of fuzz targets for C/C++ libraries.
Authors:Pengcheng Xie
Abstract:
This paper focuses on solving unconstrained privacy-preserving black-box optimization (PBBO), its corresponding least Frobenius norm updating of quadratic models, and the differentially privacy mechanisms for PBBO. Optimization problems with transformed/encrypted objective functions aim to minimize F(x), which is encrypted/transformed/encrypted to F_k(x) as the output at the k-th iteration. A new derivative-free solver named DFOp, with its implementation, is proposed in this paper, which has a new updating formula for the quadratic model functions. The convergence of DFOp for solving problems with transformed/encrypted objective functions is given. Other analyses, including the new model updating formula and the analysis of the transformation's impact to model functions are presented. We propose two differentially private noise-adding mechanisms for privacy-preserving black-box optimization. Numerical results show that DFOp performs better than compared algorithms. To the best of our knowledge, DFOp is the first derivative-free solver that can solve black-box optimization problems with step-encryption and privacy-preserving black-box problems exactly, which also tries to answer the open question about the combination of derivative-free optimization and privacy.
Authors:Danah A. AlSalem AlKhashti
Abstract:
Identity leakage can emerge when independent databases are joined, even when each dataset is anonymized individually. While previous work focuses on post-join detection or complex privacy models, little attention has been given to simple, interpretable pre-join indicators that can warn data engineers and database administrators before integration occurs. This study investigates the uniqueness ratio of candidate join attributes as an early predictor of re-identification risk. Using synthetic multi-table datasets, we compute the uniqueness ratio of attribute combinations within each database and examine how these ratios correlate with identity exposure after the join. Experimental results show a strong relationship between high pre-join uniqueness and increased post-join leakage, measured by the proportion of records that become uniquely identifiable or fall into very small groups. Our findings demonstrate that uniqueness ratio offers an explainable and practical signal for assessing join induced privacy risk, providing a foundation for developing more comprehensive pre-join risk estimation models.
Authors:Chenxi Qiu
Abstract:
Metric Differential Privacy (mDP) generalizes Local Differential Privacy (LDP) by adapting privacy guarantees based on pairwise distances, enabling context-aware protection and improved utility. While existing optimization-based methods reduce utility loss effectively in coarse-grained domains, optimizing mDP in fine-grained or continuous settings remains challenging due to the computational cost of constructing dense perterubation matrices and satisfying pointwise constraints. In this paper, we propose an interpolation-based framework for optimizing lp-norm mDP in such domains. Our approach optimizes perturbation distributions at a sparse set of anchor points and interpolates distributions at non-anchor locations via log-convex combinations, which provably preserve mDP. To address privacy violations caused by naive interpolation in high-dimensional spaces, we decompose the interpolation process into a sequence of one-dimensional steps and derive a corrected formulation that enforces lp-norm mDP by design. We further explore joint optimization over perturbation distributions and privacy budget allocation across dimensions. Experiments on real-world location datasets demonstrate that our method offers rigorous privacy guarantees and competitive utility in fine-grained domains, outperforming baseline mechanisms. in high-dimensional spaces, we decompose the interpolation process into a sequence of one-dimensional steps and derive a corrected formulation that enforces lp-norm mDP by design. We further explore joint optimization over perturbation distributions and privacy budget allocation across dimensions. Experiments on real-world location datasets demonstrate that our method offers rigorous privacy guarantees and competitive utility in fine-grained domains, outperforming baseline mechanisms.
Authors:David Brundage
Abstract:
Veterinary electronic health records (vEHRs) contain privacy-sensitive identifiers that limit secondary use. While PetEVAL provides a benchmark for veterinary de-identification, the domain remains low-resource. This study evaluates whether large language model (LLM)-generated synthetic narratives improve de-identification safety under distinct training regimes, emphasizing (i) synthetic augmentation and (ii) fixed-budget substitution. We conducted a controlled simulation using a PetEVAL-derived corpus (3,750 holdout/1,249 train). We generated 10,382 synthetic notes using a privacy-preserving "template-only" regime where identifiers were removed prior to LLM prompting. Three transformer backbones (PetBERT, VetBERT, Bio_ClinicalBERT) were trained under varying mixtures. Evaluation prioritized document-level leakage rate (the fraction of documents with at least one missed identifier) as the primary safety outcome. Results show that under fixed-sample substitution, replacing real notes with synthetic ones monotonically increased leakage, indicating synthetic data cannot safely replace real supervision. Under compute-matched training, moderate synthetic mixing matched real-only performance, but high synthetic dominance degraded utility. Conversely, epoch-scaled augmentation improved performance: PetBERT span-overlap F1 increased from 0.831 to 0.850 +/- 0.014, and leakage decreased from 6.32% to 4.02% +/- 0.19%. However, these gains largely reflect increased training exposure rather than intrinsic synthetic data quality. Corpus diagnostics revealed systematic synthetic-real mismatches in note length and label distribution that align with persistent leakage. We conclude that synthetic augmentation is effective for expanding exposure but is complementary, not substitutive, for safety-critical veterinary de-identification.
Authors:Mitchell Petingola
Abstract:
While much of the current research in deep learning-based vulnerability detection relies on disassembled binaries, this paper explores the feasibility of extracting features directly from raw x86-64 machine code. Although assembly language is more interpretable for humans, it requires more complex models to capture token-level context. In contrast, machine code may enable more efficient, lightweight models and preserve all information that might be lost in disassembly. This paper approaches the task of vulnerability detection through an exploratory study on two specific deep learning model architectures and aims to systematically evaluate their performance across three vulnerability types. The results demonstrate that graph-based models consistently outperform sequential models, emphasizing the importance of control flow relationships, and that machine code contains sufficient information for effective vulnerability discovery.
Authors:Robert Dilworth
Abstract:
Stylometry--the identification of an author through analysis of a text's style (i.e., authorship attribution)--serves many constructive purposes: it supports copyright and plagiarism investigations, aids detection of harmful content, offers exploratory cues for certain medical conditions (e.g., early signs of dementia or depression), provides historical context for literary works, and helps uncover misinformation and disinformation. In contrast, when stylometry is employed as a tool for authorship verification--confirming whether a text truly originates from a claimed author--it can also be weaponized for malicious purposes. Techniques such as de-anonymization, re-identification, tracking, profiling, and downstream effects like censorship illustrate the privacy threats that stylometric analysis can enable. Building on these concerns, this paper further explores how adversarial stylometry combined with steganography can counteract stylometric analysis. We first present enhancements to our adversarial attack, $\textit{TraceTarnish}$, providing stronger evidence of its capacity to confound stylometric systems and reduce their attribution and verification accuracy. Next, we examine how steganographic embedding can be fine-tuned to mask an author's stylistic fingerprint, quantifying the level of authorship obfuscation achievable as a function of the proportion of words altered with zero-width Unicode characters. Based on our findings, steganographic coverage of 33% or higher seemingly ensures authorship obfuscation. Finally, we reflect on the ways stylometry can be used to undermine privacy and argue for the necessity of defensive tools like $\textit{TraceTarnish}$.
Authors:Juhani Merilehto
Abstract:
This study independently reproduces the malware detection methodology presented by Felli cious et al. [7], which employs order-invariant API call frequency analysis using Random Forest classification. We utilized the original public dataset (250,533 training samples, 83,511 test samples) and replicated four model variants: Unigram, Bigram, Trigram, and Combined n gram approaches. Our reproduction successfully validated all key findings, achieving F1-scores that exceeded the original results by 0.99% to 2.57% across all models at the optimal API call length of 2,500. The Unigram model achieved F1=0.8717 (original: 0.8631), confirming its ef fectiveness as a lightweight malware detector. Across three independent experimental runs with different random seeds, we observed remarkably consistent results with standard deviations be low 0.5%, demonstrating high reproducibility. This study validates the robustness and scientific rigor of the original methodology while confirming the practical viability of frequency-based API call analysis for malware detection.
Authors:Zahir Alsulaimawi
Abstract:
Federated learning protocols require repeated synchronization between clients and a central server, with convergence rates depending on learning rates, data heterogeneity, and client sampling. This paper asks whether iterative communication is necessary for distributed linear regression. We show it is not. We formulate federated ridge regression as a distributed equilibrium problem where each client computes local sufficient statistics -- the Gram matrix and moment vector -- and transmits them once. The server reconstructs the global solution through a single matrix inversion. We prove exact recovery: under a coverage condition on client feature matrices, one-shot aggregation yields the centralized ridge solution, not an approximation. For heterogeneous distributions violating coverage, we derive non-asymptotic error bounds depending on spectral properties of the aggregated Gram matrix. Communication reduces from $\mathcal{O}(Rd)$ in iterative methods to $\mathcal{O}(d^2)$ total; for high-dimensional settings, we propose and experimentally validate random projection techniques reducing this to $\mathcal{O}(m^2)$ where $m \ll d$. We establish differential privacy guarantees where noise is injected once per client, eliminating the composition penalty that degrades privacy in multi-round protocols. We further address practical considerations including client dropout robustness, federated cross-validation for hyperparameter selection, and comparison with gradient-based alternatives. Comprehensive experiments on synthetic heterogeneous regression demonstrate that one-shot fusion matches FedAvg accuracy while requiring up to $38\times$ less communication. The framework applies to kernel methods and random feature models but not to general nonlinear architectures.
Authors:Abel C. H. Chen
Abstract:
NTRU is one of the important lattice-based post-quantum cryptography methods, offering resistance against quantum computing attacks. However, a drawback of NTRU lies in its relatively low efficiency in generating key pairs. Therefore, this study proposes an NTRU-based key expansion method that enables efficient public key expansion. Furthermore, the proposed method is applied to an anonymous certificate scheme, allowing an end entity to generate a key pair only once, after which the certificate authority can expand multiple distinct public keys for anonymity. The experimental results demonstrate that the proposed key expansion method achieves significantly higher efficiency than key pair generation.
Authors:Ji He
Abstract:
This paper investigates PASS-enabled downlink covert communication in the presence of distributed surveillance, where multiple wardens perform signal detection and fuse their local binary decisions via majority-voting rule. We consider a dual-waveguide architecture that simultaneously delivers covert information and randomized jamming to hide the transmission footprint, incorporating three representative PASS power-radiation laws-general, proportional, and equal. To characterize the system-level detectability, we derive closed-form expressions for local false-alarm and miss-detection probabilities. By leveraging a probability-generating-function (PGF) and elementary-symmetric-polynomial (ESP) framework, combined with a breakpoint-based partition of the threshold domain, we obtain explicit closed-form characterizations of the system-level detection error probability (DEP) under non-i.i.d. majority-voting fusion. Building on this analytical framework, we formulate a robust optimization problem to maximize the average covert rate subject to covertness constraint. To solve the resulting nonconvex design, we develop an MM-BCD-SCA algorithm that produces tractable alternating updates for power/radiation variables and PA positions via convex surrogates and inner approximations of the DEP value function. Numerical results validate the theoretical analysis and demonstrate the impact of cooperative monitoring and PASS radiation laws on the covertness-rate tradeoff.
Authors:Gaurav Sarraf
Abstract:
Insider threats are a particularly tricky cybersecurity issue, especially in zero-trust architectures (ZTA) where implicit trust is removed. Although the rule of thumb is never trust, always verify, attackers can still use legitimate credentials and impersonate the standard user activity. In response, behavioral analytics with machine learning (ML) can help monitor the user activity continuously and identify the presence of anomalies. This introductory framework makes use of the CERT Insider Threat Dataset for data cleaning, normalization, and class balance using the Synthetic Minority Oversampling Technique (SMOTE). It also employs Principal Component Analysis (PCA) for dimensionality reduction. Several benchmark models, including Support Vector Machine (SVM), Artificial Neural Network (ANN), and Bayesian Network (Bayes Net), were used to develop and evaluate the AdaBoost classifier. Compared to SVM (90.1%), ANN (94.7%), and Bayes Net (94.9), AdaBoost achieved higher performance with a 98.0% ACC, 98.3% PRE, 98.0% REC, and F1-score (F1). The Receiver Operating Characteristic (ROC) study, which provided further confirmation of its strength, yielded an Area Under the Curve (AUC) of 0.98. These results prove the effectiveness and dependability of AdaBoost-based behavioral analytics as a solution to reinforcing continuous insider threat detection in zero-trust settings.
Authors:Chalitha Handapangoda
Abstract:
The reliance of Large Language Models and Internet of Things systems on massive, globally distributed data flows creates systemic security and privacy challenges. When data traverses borders, it becomes subject to conflicting legal regimes, such as the EU's General Data Protection Regulation and China's Personal Information Protection Law, compounded by technical vulnerabilities like model memorization. Current static encryption and data localization methods are fragmented and reactive, failing to provide adequate, policy-aligned safeguards. This research proposes a Jurisdiction-Aware, Privacy-by-Design architecture that dynamically integrates localized encryption, adaptive differential privacy, and real-time compliance assertion via cryptographic proofs. Empirical validation in a multi-jurisdictional simulation demonstrates this architecture reduced unauthorized data exposure to below five percent and achieved zero compliance violations. These security gains were realized while maintaining model utility retention above ninety percent and limiting computational overhead. This establishes that proactive, integrated controls are feasible for secure and globally compliant AI deployment.
Authors:Chandra Sekhar Kubam
Abstract:
The rapid proliferation of synthetic media, presentation attacks, and document forgeries has created significant vulnerabilities in Know Your Customer (KYC) workflows across financial services, telecommunications, and digital-identity ecosystems. Traditional monolithic KYC systems lack the scalability and agility required to counter adaptive fraud. This paper proposes an Agentic AI Microservice Framework that integrates modular vision models, liveness assessment, deepfake detection, OCR-based document forensics, multimodal identity linking, and a policy driven risk engine. The system leverages autonomous micro-agents for task decomposition, pipeline orchestration, dynamic retries, and human-in-the-loop escalation. Experimental evaluations demonstrate improved detection accuracy, reduced latency, and enhanced resilience against adversarial inputs. The framework offers a scalable blueprint for regulated industries seeking robust, real-time, and privacy-preserving KYC verification.
Authors:Prasanna Kumar
Abstract:
Generative AI has unleashed the power of content generation and it has also unwittingly opened the pandora box of realistic deepfake causing a number of social hazards and harm to businesses and personal reputation. The investigation & ramification of Generative AI technology across industries, the resolution & hybridization detection techniques using neural networks allows flagging of the content. Good detection techniques & flagging allow AI safety - this is the main focus of this paper. The research provides a significant method for efficiently detecting dark side problems by imposing a Temporal Consistency Learning (TCL) technique. Through pretrained Temporal Convolutional Networks (TCNs) model training and performance comparison, this paper showcases that TCN models outperforms the other approaches and achieves significant accuracy for five dark side problems. Findings highlight how important it is to take proactive measures in identification to reduce any potential risks associated with generative artificial intelligence.
Authors:Tianshi Li
Abstract:
On December 4, 2025, Anthropic released Anthropic Interviewer, an AI tool for running qualitative interviews at scale, along with a public dataset of 1,250 interviews with professionals, including 125 scientists, about their use of AI for research. Focusing on the scientist subset, I show that widely available LLMs with web search and agentic capabilities can link six out of twenty-four interviews to specific scientific works, recovering associated authors and, in some cases, uniquely identifying the interviewees. My contribution is to show that modern LLM-based agents make such re-identification attacks easy and low-effort: off-the-shelf tools can, with a few natural-language prompts, search the web, cross-reference details, and propose likely matches, effectively lowering the technical barrier. Existing safeguards can be bypassed by breaking down the re-identification into benign tasks. I outline the attack at a high level, discuss implications for releasing rich qualitative data in the age of LLM agents, and propose mitigation recommendations and open problems. I have notified Anthropic of my findings.
Authors:Nicholas J. C. Papadopoulos
Abstract:
Blockchain is a decentralized, distributed ledger technology that ensures transparency, security, and immutability through cryptographic techniques. However, advancements in quantum computing threaten the security of classical cryptographic schemes, jeopardizing blockchain integrity once cryptographic quantum supremacy is achieved. This milestone, defined here as the realization of quantum computers to solve practical cryptographic problems, would render existing security standards vulnerable, exposing blockchain assets (currency, data, etc.) to fraud and theft. To address this risk, we propose and implement a smart contract deployable on the Ethereum blockchain, having the ability to run applications on its blockchain, that generates classically intractable puzzles by probabilistically generating large, hard-to-factor numbers without requiring secret information. This contract then serves two purposes: to establish a mechanism (1) for a trustless, unbiased proof of cryptographic quantum supremacy by verifying solutions to these puzzles, and (2) to protect user funds on Ethereum by triggering quantum-secure fallback protocols upon detecting cryptographic quantum supremacy, since it is desirable to wait as long as possible to fall back to a quantum-secure scheme because of its inherent additional cost and complexity. These mechanisms demonstrate the ability to identify cryptographic vulnerabilities and ensure a smooth transition to quantum-secure standards, safeguarding blockchain assets in a post-quantum era.
Authors:Rene Pickhardt
Abstract:
We introduce a geometric theory of payment channel networks that centers the polytope $W_G$ of feasible wealth distributions; liquidity states $L_G$ project onto $W_G$ via strict circulations. A payment is feasible iff the post-transfer wealth stays in $W_G$. This yields a simple throughput law: if $ζ$ is on-chain settlement bandwidth and $ρ$ the expected fraction of infeasible payments, the sustainable off-chain bandwidth satisfies $S = ζ/ ρ$. Feasibility admits a cut-interval view: for any node set S, the wealth of S must lie in an interval whose width equals the cut capacity $C(δ(S))$. Using this, we show how multi-party channels (coinpools / channel factories) expand $W_G$. Modeling a k-party channel as a k-uniform hyperedge widens every cut in expectation, so $W_G$ grows monotonically with k; for single nodes the expected accessible wealth scales linearly with $k/n$. We also analyze depletion. Under linear, asymmetric fees, cost-minimizing flow within a wealth fiber pushes cycles to the boundary, generically depleting channels except for a residual spanning forest. Three mitigation levers follow: (i) symmetric fees per direction, (ii) convex/tiered fees (effective flow control but at odds with source routing without liquidity disclosure), and (iii) coordinated replenishment (choose an optimal circulation within a fiber). Together, these results explain why two-party meshes struggle to scale and why multi-party primitives are more capital-efficient, yielding higher expected payment bandwidth. They also show how fee design and coordination keep operation inside the feasible region, improving reliability.
Authors:Manideep Reddy Chinthareddy
Abstract:
AI-assisted developer services are increasingly embedded in modern IDEs, yet enterprises must ensure these tools operate within existing identity, access control, and governance requirements. The Model Context Protocol (MCP) enables AI assistants to retrieve structured internal context, but its specification provides only a minimal authorization model and lacks guidance on integrating enterprise SSO. This article presents a practical architecture that incorporates OAuth 2.0 and OpenID Connect (OIDC) into MCP-enabled developer environments. It describes how IDE extensions obtain and present tokens, how MCP servers validate them through an identity provider, and how scopes and claims can enforce least-privilege access. A prototype implementation using Visual Studio Code, a Python-based MCP server, and an OIDC-compliant IdP demonstrates feasibility. A case study evaluates authentication latency, token-validation overhead, operational considerations, and AI-specific risks. The approach provides a deployable pattern for organizations adopting AI-assisted developer tools while maintaining identity assurance and auditability.
Authors:Sergio Demian Lerner
Abstract:
We introduce Auditable Proof-of-Work (APoW), a novel proof-of-work (PoW) construction inspired by Hashcash-style nonce searching, which enables the auditing of other miners' work through accountable re-scanning of the nonce space. The proposed scheme allows a miner to probabilistically attest to having searched specified regions of the nonce space in earlier mining rounds, while concurrently earning rewards for performing productive work for a new block or pool share. This capability enables miners belonging to a mining pools to audit another miner's claimed effort retroactively, thereby allowing the probabilistic detection of block withholding attacks (BWAs) without requiring trusted hardware or trusted third parties. As a consequence, the construction supports the design of decentralized mining pools in which work attribution is verifiable and withholding incentives are substantially reduced. The scheme preserves the fundamental properties of conventional PoW, including public verifiability and difficulty adjustment, while adding an orthogonal auditability layer tailored to pool-based mining. Finally, while a full deployment of APoW in Bitcoin would require a consensus rule change and minor modifications to mining ASICs, the construction remains practically useful even without consensus changes, for instance, as a pool-level auditing mechanism that enables verifiable pay-for-auditing using existing pool reserves.
Authors:Jay Kuri
Abstract:
Modern identity and trust systems collapse in the environments where they are needed most: disaster zones, disconnected or damaged networks, and adversarial conditions such as censorship or infrastructure interference. These systems depend on functioning networks to reach online authorities, resolvers, directories, and revocation services, leaving trust unverifiable whenever communication is unavailable or untrusted. This work demonstrates that secure identity and trust are possible without such infrastructure. We introduce the Zero-Infrastructure Capability Graph (ZI-CG), a model showing that identity, delegation, and revocation can be represented as self-contained, signed statements whose validity is determined entirely by local, deterministic evaluation. We further present Vouchsafe, a complete working instantiation of this model built using widely deployed primitives including Ed25519, SHA-256, and structured JSON Web Tokens, requiring no new cryptography or online services. The results show that a practical, offline-verifiable trust substrate can be constructed today using only the cryptographic data presented at evaluation time.
Authors:Vignesh Iyer
Abstract:
Autonomous AI agents executing multi-step tool sequences face semantic attacks that manifest in behavioral traces rather than isolated prompts. A critical challenge is cross-attack generalization: can detectors trained on known attack families recognize novel, unseen attack types? We discover that standard conversational tokenization -- capturing linguistic patterns from agent interactions -- fails catastrophically on structural attacks like tool hijacking (AUC 0.39) and data exfiltration (AUC 0.46), while succeeding on linguistic attacks like social engineering (AUC 0.78). We introduce structural tokenization, encoding execution-flow patterns (tool calls, arguments, observations) rather than conversational content. This simple representational change dramatically improves cross-attack generalization: +46 AUC points on tool hijacking, +39 points on data exfiltration, and +71 points on unknown attacks, while simultaneously improving in-distribution performance (+6 points). For attacks requiring linguistic features, we propose gated multi-view fusion that adaptively combines both representations, achieving AUC 0.89 on social engineering without sacrificing structural attack detection. Our findings reveal that AI agent security is fundamentally a structural problem: attack semantics reside in execution patterns, not surface language. While our rule-based tokenizer serves as a baseline, the structural abstraction principle generalizes even with simple implementation.
Authors:Tianshuo Yang
Abstract:
This paper presents a comprehensive study of matrix Kloosterman sums, including their computational aspects, distributional behavior, and applications in cryptographic analysis. Building on the work of [Zelingher, 2023], we develop algorithms for evaluating these sums via Green's polynomials and establish a general framework for analyzing their statistical distributions. We further investigate the associated $L$-functions and clarify their relationships with symmetric functions and random matrix theory. We show that, analogous to the eigenvalue statistics of random matrices in compact Lie groups such as $SU(n)$ and $Sp(2n)$, the normalized values of matrix Kloosterman sums exhibit Sato-Tate equidistribution. Finally, we apply this framework to distinguish truly random sequences from those exhibiting subtle algebraic biases, and we propose a novel spectral test for cryptographic security based on the distributional signatures of matrix Kloosterman sums.
Authors:Abdurrahman Tolay
Abstract:
The proliferation of Internet of Things (IoT) devices has introduced significant security challenges, primarily due to the opacity of firmware components and the complexity of supply chain dependencies. IoT firmware frequently relies on outdated, third-party libraries embedded within monolithic binary blobs, making vulnerability management difficult. While Software Bill of Materials (SBOM) standards have matured, generating actionable intelligence from raw firmware dumps remains a manual and error-prone process. This paper presents a lightweight, automated pipeline designed to extract file systems from Linux-based IoT firmware, generate a comprehensive SBOM, map identified components to known vulnerabilities, and apply a multi-factor triage scoring model. The proposed system focuses on risk prioritization by integrating signals from the Common Vulnerability Scoring System (CVSS), Exploit Prediction Scoring System (EPSS), and the CISA Known Exploited Vulnerabilities (KEV) catalog. Unlike conventional scanners that produce high volumes of uncontextualized alerts, this approach emphasizes triage by calculating a localized risk score for each finding. We describe the architecture, the normalization challenges of embedded Linux, and a scoring methodology intended to reduce alert fatigue. The study outlines a planned evaluation strategy to validate the extraction success rate and triage efficacy using a dataset of public vendor firmware, offering a reproducibility framework for future research in firmware security.
Authors:Samet Ünsal
Abstract:
We introduce the Quantum State Continuity Problem (QSCP), a security objective orthogonal to identity authentication that captures whether a systems current execution is a legitimate continuation of a unique past execution. We show that classical and stateless quantum authentication mechanisms fail to enforce continuity and remain vulnerable to fork attacks. To address this gap, we propose the Quantum State Continuity Witness (QSCW), a minimal quantum-assisted primitive that enforces temporal linkage of execution through stateful quantum evolution and cumulative auditing. Using a GHZ-based toy instantiation and extensive simulation, we demonstrate that temporal enforcement suppresses fork attacks with exponential decay in success probability, while remaining robust to noise and system parameters. Our results highlight execution continuity as a distinct and underexplored dimension of system security.
Authors:Ron F. Del Rosario
Abstract:
We present an openly documented methodology for fine-tuning language models to detect temporal attack patterns in multi-agent AI workflows using OpenTelemetry trace analysis. We curate a dataset of 80,851 examples from 18 public cybersecurity sources and 35,026 synthetic OpenTelemetry traces. We apply iterative QLoRA fine-tuning on resource-constrained ARM64 hardware (NVIDIA DGX Spark) through three training iterations with strategic augmentation. Our custom benchmark accuracy improves from 42.86% to 74.29%, a statistically significant 31.4-point gain. Targeted examples addressing specific knowledge gaps outperform indiscriminate scaling. Key contributions include: (1) synthetic trace generation methodology for multi-agent coordination attacks and regulatory violations, (2) empirical evidence that training data composition fundamentally determines behavior, and (3) complete open release of datasets, training scripts, and evaluation benchmarks on HuggingFace. While practical deployment requires human oversight due to false positive rates, this work establishes the first reproducible framework enabling practitioners to build custom agentic security models adapted to their threat landscapes.
Authors:Ismail Ahmad Abdullah
Abstract:
Contemporary AI systems achieve extraordinary performance yet remain opaque and non-verifiable, creating a crisis of trust for safety-critical deployment. We introduce MathLedger, a substrate for verifiable machine cognition that integrates formal verification, cryptographic attestation, and learning dynamics into a single epistemic loop. The system implements Reflexive Formal Learning (RFL), a symbolic analogue of gradient descent where updates are driven by verifier outcomes rather than statistical loss. Phase I experiments validate the measurement and governance substrate under controlled conditions. CAL-EXP-3 validates measurement infrastructure (Delta p computation, variance tracking); separate stress tests confirm fail-closed governance triggers correctly under out-of-bounds conditions. No convergence or capability claims are made. The contribution is infrastructural: a working prototype of ledger-attested learning that enables auditability at scale. Keywords: verifiable learning, formal verification, cryptographic attestation, reflexive feedback, fail-closed governance
Authors:Adiv Brander Cari Quispe
Abstract:
In today's university environment, wireless connectivity is an essential resource for academic, administrative, and research activities. However, at the National University of the Altiplano of Puno (UNAP), the use of a QR code access system on the institutional Wi-Fi network has generated vulnerabilities related to the lack of individual authentication, user traceability, and access control. Given this situation, this study aims to strengthen the security of the university's wireless network through the application of data analytics, employing descriptive, predictive, and prescriptive approaches to the logs generated by the wireless controller (WLC). The methodology consisted of collecting and processing connection data from users, devices, and daily traffic, analyzing behavioral patterns, and detecting anomalies based on statistical models and machine learning algorithms. The results revealed critical usage peaks between 10:00 and 14:00, as well as anomalous behavior associated with recurring devices and irregular traffic spikes. This allowed for the establishment of dynamic alert thresholds and recommendations for improvements in bandwidth management and authentication. Furthermore, the conclusion states that integrating advanced analytics into the management of university networks not only identifies vulnerabilities and optimizes WiFi service performance, but also advances towards an intelligent, proactive infrastructure aligned with modern institutional cybersecurity standards.
Authors:Alvaro Otero Sanchez
Abstract:
In this paper we solve an open question formulated in the original paper of twisted skew group codes regarding when a twisted skew group code is checkable. Also, we prove that all ideals of dimension 3 over a twisted group algebra are abelian group codes, generalising another previous result over group algebras. Finally, we prove a bound on the dimension and distance of a twisted group code, as well as when such bound is reached.
Authors:Joshua Shen
Abstract:
Blockchain as a promising technology is gaining its popularity ever since proof-of-work based Bitcoin came to the world. Nevertheless, Bitcoin achieves consensus at an expensive cost of energy. Proof-of-stake is one of the solutions for such a problem. Participants of PoS protocols achieve dynamic-availability in permissionless settings. Parties can join and leave the protocol at their will without notifying others. However, such protocol relies heavily on a central clock, providing the function of synchrony by collecting the finish status of every honest participant. In our protocol, the global function maintains the round information for each participant no longer needed. We analyze and modify the round into real-time based round model. Message delivery delay is also taken into consideration of the round length. However, participant need the connection of a real-world time global clock which is crucial to calculate the current round. And round length also is adjusted due to the changing network situation at the start of every new epoch.