TNNLS2025

Abstract Paper Portal of IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 2025

PaperID: 1,

Authors: Siddharth Barve, Rashmi Jha

Affiliations: Department of Electrical and Computer Engineering (ECE), University of Cincinnati, Cincinnati, OH, USA

Abstract:
Many currently available deep neural network (DNN) accelerators are highly application specific and have focused on supervised learning. In addition, many accelerators have rigid architectures and algorithms that prevent them from adapting to dynamic environments. In this work, we propose a neuromorphic architecture implementing a self-organizing feature map (SOFM) using ferroelectric field-effect transistors (FeFETs) for in-memory error computation. The neuromorphic architecture takes inspiration from biological networks and is able to grow neurons to adapt to the application. Furthermore, it is able to modulate the distance between neurons to provide more fluidity to its topography. We demonstrate the ability of the network to adapt to various datasets and even exhibit lifelong learning and self-repair. We further demonstrate the architecture’s efficiency in terms of both power and speed as well as its robustness to device variability.
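
For readers unfamiliar with self-organizing feature maps, the following is a minimal software sketch of one textbook SOFM update step. It is not the FeFET-based architecture of the abstract above: the 1-D chain topology, the function names, and the `lr` and `radius` parameters are all assumptions made for illustration, and neuron growth is not modeled.

```python
def best_matching_unit(weights, x):
    """Return the index of the neuron whose weight vector is closest to x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda k: sq_dist(weights[k]))


def som_step(weights, x, lr=0.5, radius=1):
    """One SOFM update on a 1-D chain of neurons (hypothetical topology).

    The best-matching unit and every neuron within `radius` positions of it
    are moved a fraction `lr` of the way toward the input x, in place.
    """
    bmu = best_matching_unit(weights, x)
    for k, w in enumerate(weights):
        if abs(k - bmu) <= radius:
            weights[k] = [wi + lr * (xi - wi) for wi, xi in zip(w, x)]
    return bmu
```

Calling `som_step` repeatedly over a dataset pulls neighboring neurons toward similar inputs, which is what gives the map its topographic ordering.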

PaperID: 2,

Authors: Man-Sheng Chen, Jia-Qi Lin, Chang-Dong Wang, Dong Huang, Jian-Huang Lai

Affiliations: School of Computer Science and Engineering and Guangdong Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou, China; School of Mathematics (Zhuhai), Sun Yat-sen University, Guangzhou, China; Department of Computer Science, South China Agricultural University, Guangzhou, China; School of Computer Science and Engineering, Guangdong Key Laboratory of Information Security Technology, and the Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Sun Yat-sen University, Guangzhou, China

Title: Contrastive Ensemble Clustering

Abstract:
Ensemble clustering aims to combine different base clusterings into a better clustering than any individual one. In general, a co-association matrix depicting the pairwise affinity between different data samples is constructed by average fusion or weighted fusion of the connective matrices from multiple base clusterings. Despite their significant success, existing works fail to capture the global structure information from multiple noisy connective matrices. Meanwhile, the locality property of the resulting representation matrix cannot be explicitly preserved. In this article, we propose a novel contrastive ensemble clustering (CEC) method. Specifically, a consensus mapping model is designed for the discovery of the latent representation from the noisy observations with distinct confidences. Furthermore, a contrastive regularizer is dexterously formulated to refine the latent representation while preserving its locality property. Extensive experiments conducted on several benchmark datasets demonstrate the superiority of the proposed CEC method. To the best of our knowledge, this is the first work to explore the potential of latent representation learning and contrastive components for the ensemble clustering task.
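
The co-association construction mentioned in the abstract above can be sketched as follows. This illustrates only the conventional average or weighted fusion baseline, not the CEC method itself; the function names and the weight normalization are assumptions made for illustration.

```python
def connective_matrix(labels):
    """Entry (i, j) is 1.0 when samples i and j share a cluster label, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def co_association(base_clusterings, weights=None):
    """Fuse the connective matrices of several base clusterings.

    With weights=None this is average fusion; otherwise the (hypothetical)
    per-clustering weights are normalized to sum to one before fusing.
    """
    m = len(base_clusterings)
    n = len(base_clusterings[0])
    if weights is None:
        weights = [1.0] * m
    total = sum(weights)
    fused = [[0.0] * n for _ in range(n)]
    for labels, w in zip(base_clusterings, weights):
        cm = connective_matrix(labels)
        for i in range(n):
            for j in range(n):
                fused[i][j] += (w / total) * cm[i][j]
    return fused
```

Each entry of the fused matrix is the (weighted) fraction of base clusterings that place the two samples in the same cluster.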

PaperID: 3,

Authors: Fuzhi Wu, Jiasong Wu, Chen Zhang, Youyong Kong, Chunfeng Yang, Guanyu Yang, Huazhong Shu, Guy Carrault, Lotfi Senhadji

Affiliations: Laboratory of Image Science and Technology (LIST), the Key Laboratory of Computer Network and Information Integration, Ministry of Education, and Jiangsu Provincial Joint International Research Laboratory of Medical Information Processing, Southeast University, Nanjing, China; School of Information Engineering, Yangzhou University, Yangzhou, Jiangsu, China; Centre de Recherche en Information Biomédicale Sino-français (CRIBs), Inserm, LTSI-UMR , University of Rennes, Rennes, France

Title: Wavelet-Based Dual-Task Network

Abstract:
In image processing, wavelet transform (WT) offers multiscale image decomposition, generating a blend of low-resolution approximation images and high-resolution detail components. Drawing parallels to this concept, we view feature maps in convolutional neural networks (CNNs) as a similar mix, but uniquely within the channel domain. Inspired by multitask learning (MTL) principles, we propose a wavelet-based dual-task (WDT) framework. This novel framework employs WT in the channel domain to split a single task into two parallel tasks, thereby reforming traditional single-task CNNs into dynamic dual-task networks. Our WDT framework integrates seamlessly with various popular network architectures, enhancing their versatility and efficiency. It offers a more rational approach to resource allocation in CNNs, balancing between low-frequency and high-frequency information. Rigorous experiments on Cifar10, ImageNet, HMDB51, and UCF101 validate our approach’s effectiveness. Results reveal significant improvements in the performance of traditional CNNs on classification tasks, and notably, these enhancements are achieved with fewer parameters and computations. In summary, our work presents a pioneering step toward redefining the performance and efficiency of CNN-based tasks through WT.
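
As a rough illustration of applying a wavelet transform in the channel domain, the sketch below runs one level of the orthonormal Haar transform over the channel vector at a single spatial position. The abstract above does not say which wavelet the WDT framework uses or where the transform is placed, so the Haar choice and the function names are assumptions.

```python
import math


def haar_split_channels(channels):
    """Split a channel vector into low- and high-frequency halves (one Haar level).

    Adjacent channel pairs (a, b) map to (a + b) / sqrt(2) and (a - b) / sqrt(2).
    """
    if len(channels) % 2:
        raise ValueError("expects an even number of channels")
    s = 1.0 / math.sqrt(2.0)
    pairs = [(channels[i], channels[i + 1]) for i in range(0, len(channels), 2)]
    low = [s * (a + b) for a, b in pairs]
    high = [s * (a - b) for a, b in pairs]
    return low, high


def haar_merge_channels(low, high):
    """Inverse of haar_split_channels: rebuild the original channel vector."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.extend([s * (l + h), s * (l - h)])
    return out
```

In a dual-task setup of this kind, the two halves could feed two parallel branches, each with half the channels of the original feature map.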

PaperID: 4,

Authors: Amelia Sorrenti, Giovanni Bellitto, Federica Proietto Salanitri, Matteo Pennisi, Simone Palazzo, Concetto Spampinato

Affiliations: PeRCeiVe Lab Research Group, University of Catania, Catania, Italy

Title: Wake-Sleep Consolidated Learning

Abstract:
We propose wake-sleep consolidated learning (WSCL), a learning strategy leveraging complementary learning system (CLS) theory and the wake-sleep phases of the human brain to improve the performance of deep neural networks (DNNs) for visual classification tasks in continual learning (CL) settings. Our method learns continually via the synchronization between distinct wake and sleep phases. During the wake phase, the model is exposed to sensory input and adapts its representations, ensuring stability through a dynamic parameter freezing mechanism and storing episodic memories in a short-term temporary memory (similar to what happens in the hippocampus). During the sleep phase, the training process is split into nonrapid eye movement (NREM) and rapid eye movement (REM) stages. In the NREM stage, the model’s synaptic weights are consolidated using replayed samples from the short-term and long-term memory and the synaptic plasticity mechanism is activated, strengthening important connections and weakening unimportant ones. In the REM stage, the model is exposed to previously-unseen realistic visual sensory experience, and the dreaming process is activated, which enables the model to explore the potential feature space, thus preparing synapses for future knowledge. We evaluate the effectiveness of our approach on four benchmark datasets: CIFAR-10, CIFAR-100, Tiny-ImageNet, and FG-ImageNet. In all cases, our method outperforms the baselines and prior work, yielding a significant performance gain on continual visual classification tasks. Furthermore, we demonstrate the usefulness of all processing stages and the importance of dreaming to enable positive forward transfer (FWT). The code is available at: https://github.com/perceivelab/wscl.

PaperID: 5,

Authors: Ling Huang, Dong Huang, Han Zou, Yuefang Gao, Chang-Dong Wang, Philip S. Yu

Affiliations: College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA

Title: Knowledge-Reinforced Cross-Domain Recommendation

Abstract:
Over the past few years, cross-domain recommendation has gained great attention as a way to resolve the cold-start issue. Many existing cross-domain recommendation methods model a preference bridge between the source and target domains to transfer preferences through the overlapping users. However, when too few cross-domain users are available to bridge the two domains, the recommender system’s accuracy (ACC) and performance suffer. Therefore, in this article, we propose to create a link between the source and the target domains by leveraging a knowledge graph (KG) as auxiliary information, and propose a novel knowledge-reinforced cross-domain recommendation (KR-CDR) method. First, we construct a new cross-domain KG (CDKG) from the KGs that represent the source and target domains, respectively. Additionally, we employ reinforcement learning (RL) with meta learning on CDKG to discover meta-paths between the source and target domains. With these meta-paths, we obtain meta-path aggregated embedding vectors for cold-start users. Ultimately, the predicted rating can be acquired from the user meta-path aggregated embedding vector and item embedding vector. Experiments carried out on five real-world datasets show that the proposed method performs better than the state-of-the-art methods.

PaperID: 6,

Authors: Hongkun Dou, Junzhe Lu, Zeyu Li, Xiaoqing Zhong, Wen Yao, Lijun Yang, Yue Deng

Affiliations: School of Astronautics, Beihang University, Beijing, China; Institute of Telecommunication and Navigation Satellites, China Academy of Space Technology, Beijing, China; Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China; Institute of Artificial Intelligence, Beihang University, Beijing, China

Title: Score-Based Neural Processes

Abstract:
Neural processes (NPs) have recently emerged as a powerful meta-learning framework capable of making predictions based on an arbitrary number of context points. However, the learning of NPs and their variants is hindered by the need for explicit reliance on the log-likelihood of predictive distributions, which complicates the training process. To tackle this problem, we introduce score-based NP (SNP) models, drawing inspiration from recently developed score-based generative models (SGMs) that restore data from noise by reversing a perturbation process. With denoising score matching (DSM) techniques, the SNPs bypass the intractable log-likelihood calculations, learning parameterized score functions instead. We also demonstrate that score functions possess excellent attributes that enable us to represent a wide family of conditional distributions naturally. Moreover, as data points are inherently unordered, it is crucial to incorporate appropriate inductive biases into SNPs. To this end, we propose building blocks for parameterizing permutation equivariant score functions, which induce the SNPs with the desired properties. Through extensive experimentation on both synthetic and real-world datasets, our SNPs exhibit remarkable performance and outperform existing state-of-the-art NP approaches.

PaperID: 7,

Authors: Baochang Zhang, Runqi Wang, Xiaodi Wang, Jungong Han, Rongrong Ji

Affiliations: School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Department of Computer Science, Prifysgol Aberystwyth University, Aberystwyth, U.K.; School of Informatics, Xiamen University, Xiamen, China

Title: Modulated Convolutional Networks

Abstract:
While the deep convolutional neural network (DCNN) has achieved overwhelming success in various vision tasks, its heavy computational and storage overhead hinders its practical use on resource-constrained devices. Recently, compressing DCNN models has attracted increasing attention, and binarization-based schemes have become a popular research topic due to their high compression rate. In this article, we propose modulated convolutional networks (MCNs) to obtain binarized DCNNs with high performance. We introduce a new architecture in MCNs to efficiently fuse multiple features and achieve performance similar to that of the full-precision model. The calculation of MCNs is theoretically reformulated as a discrete optimization problem to build binarized DCNNs, which, for the first time, jointly considers the filter loss, center loss, and softmax loss in a unified framework. Our MCNs are generic and can decompose full-precision filters in DCNNs, e.g., conventional DCNNs, VGG, AlexNet, ResNets, or Wide-ResNets, into a compact set of binarized filters that are optimized based on a projection function and a new update rule during backpropagation. Moreover, we propose modulation filters (M-Filters) to recover filters from binarized ones, which lead to a specific architecture to calculate the network model. Our proposed MCNs substantially reduce the storage cost of convolutional filters by a factor of 32 with performance comparable to the full-precision counterparts, achieving much better performance than other state-of-the-art binarized models.
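
The factor-of-32 storage reduction quoted in the abstract above follows from replacing each 32-bit float with a single bit. The sketch below shows that arithmetic with plain sign binarization and bit packing. It does not implement the MCN projection function or M-Filters, and all function names are hypothetical.

```python
import struct


def binarize(weights):
    """Plain sign binarization: +1 for non-negative weights, -1 otherwise."""
    return [1 if w >= 0 else -1 for w in weights]


def pack_bits(signs):
    """Pack +1/-1 values into bytes, one bit per weight (bit set means +1)."""
    out = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s > 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bits(packed, n):
    """Recover the first n +1/-1 values from the packed bytes."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(n)]


def float32_bytes(weights):
    """Size in bytes of the same weights stored as 32-bit floats."""
    return len(struct.pack(f"{len(weights)}f", *weights))
```

For 64 weights, `float32_bytes` gives 256 bytes while `pack_bits(binarize(...))` gives 8 bytes, a ratio of 32.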

PaperID: 8,

Authors: Di Wang, Ping Wang, Zhong Ji, Xiaojun Yang, Hong-Yue Li

Affiliations: School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Tianjin Meteorological Observatory, Tianjin, China

Title: Conformal Loss-Controlling Prediction

Abstract:
Conformal prediction (CP) is a learning framework controlling prediction coverage of prediction sets, which can be built on any learning algorithm for point prediction. This work proposes a learning framework named conformal loss-controlling prediction, which extends CP to the situation where the value of a loss function needs to be controlled. Different from existing works about risk-controlling prediction sets and conformal risk control with the purpose of controlling the expected values of loss functions, the proposed approach in this article focuses on the loss for any test object, which is an extension of CP from miscoverage loss to some general loss. The controlling guarantee is proved under the assumption of exchangeability of data in finite-sample cases and the framework is tested empirically for classification with a class-varying loss and statistical postprocessing of numerical weather forecasting applications, which are introduced as point-wise classification and point-wise regression problems. All theoretical analysis and experimental results confirm the effectiveness of our loss-controlling approach.
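
As background for the abstract above, the following is a minimal sketch of standard split conformal prediction, which controls miscoverage only. It is not the loss-controlling extension the article proposes. The function names are hypothetical, and nonconformity scores are assumed to be "lower is more conforming".

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    Under exchangeability, a new example's score falls at or below this
    threshold with probability at least 1 - alpha. If the calibration set is
    too small for the requested level, the threshold is infinite.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(label_scores, threshold):
    """Keep every candidate label whose nonconformity score is within the threshold."""
    return {label for label, s in label_scores.items() if s <= threshold}
```

The point predictor enters only through the nonconformity scores, which is why the construction can be built on top of any learning algorithm.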

PaperID: 9,

Authors: Meng Liu, Ke Liang, Hao Yu, Lingyuan Meng, Siwei Wang, Sihang Zhou, Xinwang Liu

Affiliations: College of Computer Science and Technology, National University of Defense Technology, Changsha, China; Intelligent Game and Decision Laboratory, Beijing, China; College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China

Title: Multiview Temporal Graph Clustering

Abstract:
As an emerging task, temporal graph clustering (TGC) is committed to clustering nodes on temporal graphs through interaction sequence-based batch-processing patterns. These patterns allow for more flexibility in finding a balance between time and space requirements than adjacency matrix-based static graph clustering. However, as a new task, TGC still has important unresolved challenges, such as insufficient information. This challenge manifests itself in a variety of problems in real-world datasets, including missing features (eigenvalues are missing or even nonexistent), long-tail nodes (most inactive nodes have little interaction), and noisy data (data is subject to anomalies, errors, and sparsity). These problems occur before training, making it difficult for the model to train well with insufficient information. To solve the challenge, we propose a method that introduces multiview clustering (MVC) into TGC, called MVTGC. Our method aims to perform data augmentation on the temporal graph by constructing multiple views to increase the information richness. In particular, we utilize different techniques to model a certain part of the temporal graph to generate enhanced views focusing on different angles. These views are combined into training through early fusion and late fusion and ultimately enhance the model’s receptive field and information richness. Comparative experiments and a case study on real-world datasets demonstrate the significance and effectiveness of MVTGC, which achieves at most 10.48% performance improvement. The code and data are available at https://github.com/MGitHubL/MVTGC

PaperID: 10,

Authors: Nunzio A. Letizia, Nicola Novello, Andrea M. Tonello

Affiliations: Institute of Networked and Embedded Systems, University of Klagenfurt, Klagenfurt, Austria

Title: Copula Density Neural Estimation

Abstract:
Probability density estimation from observed data constitutes a central task in statistics. In this brief, we focus on the problem of estimating the copula density associated with any observed data, as it fully describes the dependence between random variables. We separate univariate marginal distributions from the joint dependence structure in the data, the copula itself, and we model the latter with a neural network-based method referred to as copula density neural estimation (CODINE). Results show that the novel learning approach is capable of modeling complex distributions and can be applied for mutual information estimation and data generation.
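
Separating the marginals from the dependence structure, as described in the abstract above, is commonly done by mapping each variable to rank-based pseudo-observations in (0, 1). The sketch below shows that standard transform. Whether CODINE uses exactly this preprocessing is an assumption, and the function name is hypothetical.

```python
def pseudo_observations(column):
    """Map one variable's samples to (0, 1) via their ranks: u_i = rank_i / (n + 1).

    Ties are broken by sample order. The result depends only on the ordering
    of the values, so any strictly increasing transform of the marginal
    leaves it unchanged.
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u
```

Applying the function to each variable in turn gives samples whose joint distribution approximates the copula, which a density estimator can then model.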

PaperID: 11,

Authors: Chao Ren, Rudai Yan, Huihui Zhu, Han Yu, Minrui Xu, Yuan Shen, Yan Xu, Ming Xiao, Zhao Yang Dong, Mikael Skoglund, Dusit Niyato, Leong Chuan Kwek

Affiliations: School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden; School of Electrical and Electronic Engineering, Nanyang Technological University, Jurong West, Singapore; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China; College of Computing and Data Science, Nanyang Technological University, Jurong West, Singapore; Centre for Quantum Technologies, National University of Singapore, Queenstown, Singapore; Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China

Title: Toward Quantum Federated Learning

Abstract:
Quantum federated learning (QFL) is an emerging interdisciplinary field that merges the principles of quantum computing (QC) and federated learning (FL), with the goal of leveraging quantum technologies to enhance privacy, security, and efficiency in the learning process. Currently, there is no comprehensive survey for this interdisciplinary field. This review offers a thorough, holistic examination of QFL. We aim to provide a comprehensive understanding of the principles, techniques, and emerging applications of QFL. We discuss the current state of research in this rapidly evolving field, identify challenges and opportunities associated with integrating these technologies, and outline future directions and open research questions. We propose a unique taxonomy of QFL techniques, categorized according to their characteristics and the quantum techniques employed. As the field of QFL continues to progress, we can anticipate further breakthroughs and applications across various industries, driving innovation and addressing challenges related to data privacy, security, and resource optimization. This review serves as a first-of-its-kind comprehensive guide for researchers and practitioners interested in understanding and advancing the field of QFL.

PaperID: 12,

Authors: Chaoyang Li, Heyan Chai, Yan Jia, Ning Hu, Qing Liao

Affiliations: Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong, China

Title: Multitask Causal Contrastive Learning

Abstract:
Multitask learning (MTL) aims to improve the performance of multiple tasks by sharing knowledge among multiple different tasks, which has attracted increasing interest and shown success in various fields. However, MTL often suffers from negative transfer since the model may utilize useless features and face interference among tasks’ optimization objectives. The utilization of useless features can be attributed to the confounding factors in multitask features, while the interference among tasks’ optimization objectives is due to inadequate measurement of the relationship among tasks. This article proposes a novel multitask causal contrastive learning (MT-CCL) approach to address the above problem. First, we propose a multitask causal inference method, which removes the confounding factors in features via task-aware causal intervention (TACI) and measures the relationship among tasks from a novel causal perspective by quantifying the intertask causal affinity. Then, we build a dual contrastive learning objective to help the model better learn useful features via intratask contrast (Intra-TCS) and mitigate interference among tasks’ optimization objectives via intertask contrast (Inter-TCS). Experiments demonstrate that MT-CCL achieves improved performance over state-of-the-art methods on Multi-MNIST, NYU-v2, CityScapes, and CelebA, verifying the effectiveness of intertask causal affinity.

PaperID: 13,

Authors: Shuyi Ji, Yifan Feng, Donglin Di, Shihui Ying, Yue Gao

Affiliations: BNRist, THUIBCS, BLBCI, School of Software, Tsinghua University, Beijing, China; Department of Mathematics, School of Science, Shanghai University, Shanghai, China

Title: Mode Hypergraph Neural Network

Abstract:
The hypergraph neural network (HGNN) is an emerging powerful tool for modeling and learning complex, high-order correlations among entities upon hypergraph structures. While existing HGNN-based approaches excel in modeling high-order correlations among data using hyperedges, they often have difficulties in distinguishing diverse semantics (e.g., bioactivities between drug and target in biological networks) of different correlations, making it challenging to learn accurate final representations. The underlying reason is that the specific semantic information of each hyperedge cannot be captured and distinguished during the modeling and learning process. To address this, we propose a mode HGNN (MHGNN) framework that extends the vanilla hypergraph structure by endowing hyperedges with mode information for encapsulating their semantics and then performs mode-aware high-order message passing upon mode hypergraph for achieving comprehensive node representations. Extensive evaluations on four real-world datasets under two representative tasks have demonstrated the outstanding performance of MHGNN against the state of the art.

PaperID: 14,

Authors: Yueyi Zhang, Jin Wang, Wenming Weng, Xiaoyan Sun, Zhiwei Xiong

Affiliations: MoE Key Laboratory of Brain-Inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China

Title: EGVD: Event-Guided Video Deraining

Abstract:
Recent research has explored leveraging event cameras, known for their prowess in capturing scenes with nonuniform motion, for video deraining, leading to performance improvements. However, the existing event-based method still faces the challenge that the complex spatiotemporal distribution disrupts temporal information fusion and complicates feature separation. This article proposes a novel end-to-end learning framework for video deraining that effectively extracts the rich dynamic information provided by the event stream. Our framework incorporates two key modules: an event-aware motion detection (EAMD) module that adaptively aggregates multiframe motion information using event-driven masks and a pyramidal adaptive selection module that separates background and rain layers by leveraging contextual priors from both event and conventional camera data. To facilitate efficient training, we introduce a real-world dataset of synchronized rainy videos and event streams. Extensive evaluations on both synthetic and real-world datasets demonstrate the superiority of our proposed method compared to state-of-the-art approaches. The code is available at https://github.com/booker-max/EGVD.

PaperID: 15,

Authors: Hyo-Seok Hwang, Yoojoong Kim, Junhee Seok

Affiliations: School of Electrical Engineering, Korea University, Seoul, Republic of Korea; School of Computer Science and Information Engineering, The Catholic University of Korea, Bucheon, Republic of Korea

Title: Generative Adversarial Soft Actor-Critic

Abstract:
In deep reinforcement learning (RL), learning stochastic and multimodal policies is crucial for various tasks, but most continuous control algorithms model the policy using a deterministic or unimodal Gaussian distribution. Despite being designed to inherently learn a stochastic policy under the maximum entropy RL framework, soft actor–critic (SAC) also uses a factorized Gaussian policy for tractable optimization, which not only restricts the expressiveness of the policy but also ignores correlations among the components of the action vector. In this article, we revisit the approach of employing normalizing flows for the SAC policy, justified by the change-of-variables theorem, and then propose a state-dependent nonvolume preserving (SD-NVP) architecture suitable for SAC learning. In addition, we introduce a generative adversarial SAC (GASAC) that implicitly defines and optimizes various divergences without calculating the normalization constant through a generative adversarial loss. Experimental results on a multigoal environment and the MuJoCo continuous control task suite demonstrate that GASAC models multimodal policies and learns policies more stably in terms of cumulative return.
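
The change-of-variables theorem invoked in the abstract above can be illustrated with the simplest possible flow, a 1-D affine map of a standard normal sample. This is only a sketch of the principle, not the SD-NVP architecture: in a state-dependent flow, `shift` and `log_scale` would be produced by a network from the state, which is an assumption not modeled here.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of N(0, 1) at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(a, shift, log_scale):
    """Log-density of a = shift + exp(log_scale) * z, with z ~ N(0, 1).

    Change of variables: log p(a) = log p_z(z) - log|da/dz|, where
    z = (a - shift) * exp(-log_scale) and log|da/dz| = log_scale.
    """
    z = (a - shift) * math.exp(-log_scale)
    return standard_normal_logpdf(z) - log_scale
```

Stacking several invertible layers gives a more expressive density by the same rule, with the log-determinant terms summed across layers.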

PaperID: 16,

Authors: Jun Chen, Jingyang Xiang, Tianxin Huang, Xiangrui Zhao, Yong Liu

Affiliations: National Special Education Resource Center for Children with Autism, Zhejiang Normal University, Hangzhou, China; Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China

Title: Hyperbolic Binary Neural Network

Abstract:
Binary neural network (BNN) converts full-precision weights and activations into their extreme 1-bit counterparts, making it particularly suitable for deployment on lightweight mobile devices. While BNNs are typically formulated as a constrained optimization problem and optimized in the binarized space, general neural networks are formulated as an unconstrained optimization problem and optimized in the continuous space. This article introduces the hyperbolic BNN (HBNN) by leveraging the framework of hyperbolic geometry to optimize the constrained problem. Specifically, we transform the constrained problem in hyperbolic space into an unconstrained one in Euclidean space using the Riemannian exponential map. On the other hand, we also propose the exponential parametrization cluster (EPC) method, which, compared with the Riemannian exponential map, shrinks the segment domain based on a diffeomorphism. This approach increases the probability of weight flips, thereby maximizing the information gain in BNNs. Experimental results on CIFAR10, CIFAR100, and ImageNet classification datasets with VGGsmall, ResNet18, and ResNet34 models illustrate the superior performance of our HBNN over state-of-the-art methods.
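
To make the idea of an exponential map concrete, the sketch below implements the widely used exponential map at the origin of the Poincaré ball with curvature parameter `c`, which sends any Euclidean vector to a point strictly inside the ball. The abstract above does not state which hyperbolic model or base point HBNN uses, so the Poincaré-ball choice and the function name are assumptions.

```python
import math


def exp_map_origin(v, c=1.0):
    """Exponential map at the origin of the Poincare ball of curvature -c.

    exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||)
    The direction of v is kept while its length is squashed below 1 / sqrt(c).
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]
```

Because the input vector is unconstrained, a gradient step can be taken in Euclidean space and the result mapped back onto the constrained set.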

PaperID: 17,

Authors: Yesong Xu, Shuo Chen, Jun Li, Jian Yang

Affiliations: School of Computer and Information, Anhui Polytechnic University, Wuhu, China; School of Intelligence Science and Technology, Nanjing University, Suzhou, China; PCA Laboratory, Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Laboratory of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Title: Metric Learning-Based Subspace Clustering

Abstract:
The self-expressive strategy has shown excellent capabilities in realizing low-dimensional representations of high-dimensional data for subspace clustering algorithms. The existing designs, however, are formulated on the linearization assumptions of the data, neglecting the precise characterization of linear relationships within samples. Considering that real-world data adheres to diverse distribution forms, it becomes impractical to first treat the samples as existing in a uniform linear space before finding an appropriate manifold space. To handle this challenge, we propose a novel self-expressive-based learning framework termed metric learning-based subspace clustering (MLSC). Particularly, we smoothly incorporate metric learning into the subspace clustering framework by introducing adaptive neighbors learning and defining a linearity-aware distance to discover the linear manifold space of the original data. We simultaneously utilize the generated representation of the linear structure as input for self-expressiveness to pursue an ideal similarity matrix, which establishes an essential connection with the linearization assumption of the self-expressive strategy. Furthermore, we theoretically demonstrate that our measure can accurately describe the level of linear correlation between instances. Finally, our tests demonstrate that the proposed MLSC attains competitive clustering results compared to state-of-the-art approaches on benchmark datasets.

PaperID: 18,

Authors: Tenghai Qiu, Shiguang Wu, Zhen Liu, Zhiqiang Pu, Jianqiang Yi, Yuqian Zhao, Biao Luo

Affiliations: School of Automation, Central South University, Changsha, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Cognition-Oriented Multiagent Reinforcement Learning

Abstract:
Inspired by psychological insights into individual behavior, we propose a novel cognition-oriented multiagent reinforcement learning (CORL) framework. CORL equips agents with two distinct types of cognition—situational and self-cognition—derived from local observations. To enhance the informativeness and precision of these cognition types, we introduce two information-theoretical regularizers: one to align situational cognition with the global state and the other to align self-cognition with each agent’s identity for improved role differentiation and team coordination. In addition, the centralized training and decentralized execution framework is adopted to train the policy network. Our simulations demonstrate that CORL effectively harnesses local observations for enriched cooperation, leading to pronounced performance improvements, particularly in challenging tasks.

PaperID: 19,

Authors: Dai Shi, Zhiqi Shao, Junbin Gao, Zhiyong Wang, Yi Guo

Affiliations: School of Computer, Data and Mathematical Sciences, Western Sydney University, Parramatta, NSW, Australia; Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Camperdown, NSW, Australia; School of Computer Science, The University of Sydney, Sydney, NSW, Australia

Title: Frameless Graph Knowledge Distillation

Abstract:
Knowledge distillation (KD) has shown great potential for transferring knowledge from a complex teacher model to a simple student model in which the heavy learning task can be accomplished efficiently and without losing too much prediction accuracy. Recently, many attempts have been made to apply the KD mechanism to graph representation learning models such as graph neural networks (GNNs) to accelerate the model’s inference speed via student models. However, many existing KD-based GNNs utilize a multilayer perceptron (MLP) as a universal approximator in the student model to imitate the teacher model’s process without considering the graph knowledge from the teacher model. In this work, we provide a KD-based framework on multiscaled GNNs, known as graph framelet, and prove that by adequately utilizing the graph knowledge in a multiscaled manner provided by graph framelet decomposition, the student model is capable of adapting to both homophilic and heterophilic graphs and has the potential of alleviating the oversquashing issue with a simple yet effective graph surgery. Furthermore, we show how the graph knowledge supplied by the teacher is learned and digested by the student model via both algebra and geometry. Comprehensive experiments show that our proposed model can achieve learning accuracy identical to or even surpassing that of the teacher model while maintaining a high speed of inference.

PaperID: 20,

Authors: Chao Wang, Jiaxuan Zhao, Lingling Li, Licheng Jiao, Fang Liu, Shuyuan Yang

Affiliations: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Xidian University, Xi’an, China

Title: Automatic Graph Topology-Aware Transformer

Abstract:
Existing efforts are dedicated to designing many topologies and graph-aware strategies for the graph Transformer, which greatly improve the model’s representation capabilities. However, manually determining the suitable Transformer architecture for a specific graph dataset or task requires extensive expert knowledge and laborious trials. This article proposes an evolutionary graph Transformer architecture search (EGTAS) framework to automate the construction of strong graph Transformers. We build a comprehensive graph Transformer search space with the micro-level and macro-level designs. EGTAS evolves graph Transformer topologies at the macro level and graph-aware strategies at the micro level. Furthermore, a surrogate model based on generic architectural coding is proposed to directly predict the performance of graph Transformers, substantially reducing the evaluation cost of evolutionary search. We demonstrate the efficacy of EGTAS across a range of graph-level and node-level tasks, encompassing both small-scale and large-scale graph datasets. Experimental results and ablation studies show that EGTAS can construct high-performance architectures that rival state-of-the-art manual and automated baselines.

PaperID: 21,

Authors: Qin Zhao, Peihan Wu, Gang Liu, Dongdong An, Jie Lian, MengChu Zhou

Affiliations: Shanghai Engineering Research Center of Intelligent Education and Big Data, Shanghai Normal University, Shanghai, China; School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou, China

Title: Sociological-Theory-Based Multitopic Self-Supervised Recommendation

Abstract:
Social relationships offer crucial supplementary information for recommendations by leveraging users’ social connections to gain insights into their preferences. However, prevalent social recommendation methods often grapple with the issues of sparsity and noise, which curtail their effectiveness. In addition, these methods overlook the intricacies of user interactions within social networks, which could provide invaluable information. To address these deficiencies, this article introduces a novel sociological-theory-based multitopic self-supervised recommendation method (SMSR). This method integrates user attitude information into the construction of social relationships and utilizes dynamic routing to identify and categorize topics, thereby mitigating the impact of social noise on recommendation accuracy. Furthermore, we reveal sophisticated higher order user relations within these topics by using motifs. By combining the light graph convolutional network with balance theory, SMSR efficiently aggregates information from diverse social relations to achieve strong performance. Moreover, we have devised and integrated four self-supervised signals, inspired by social theory and derived from heterogeneous graph analysis, to more effectively exploit the rich structural and semantic information inherent in social relationship graphs. Empirical results from extensive experiments on publicly available datasets underscore SMSR’s superiority over the state of the art.

PaperID: 22,

Authors: Naoyuki Terashita, Hiroki Ohashi, Satoshi Hara

Affiliations: Hitachi Ltd., Tokyo, Japan; Graduate School of Informatics and Engineering, University of Electro-Communications, Tokyo, Japan

Title: Data Cleansing for GANs

Abstract:
As the application of generative adversarial networks (GANs) expands, it becomes increasingly critical to develop a unified approach that improves performance across various generative tasks. One effective strategy that applies to any machine learning task is identifying harmful instances, whose removal improves the performance. While previous studies have successfully estimated these harmful training instances in supervised settings, their approaches are not easily applicable to GANs. The challenge lies in two requirements of the previous approaches that do not apply to GANs. First, previous approaches require that the absence of a training instance directly affects the parameters. However, in the training for GANs, the instances do not directly affect the generator’s parameters since they are only fed into the discriminator. Second, previous approaches assume that the change in loss directly quantifies the harmfulness of the instance to a model’s performance, while common types of GAN losses do not always reflect the generative performance. To overcome the first challenge, we propose influence estimation methods that use the Jacobian of the generator’s gradient with respect to the discriminator’s parameters (and vice versa). Such a Jacobian represents the indirect effect between two models: how removing an instance from the discriminator’s training changes the generator’s parameters. Second, we propose an instance evaluation scheme that measures the harmfulness of each training instance based on how a GAN evaluation metric [e.g., inception score (IS)] is expected to change by the instance’s removal. Furthermore, we demonstrate that removing the identified harmful instances significantly improves the generative performance on various GAN evaluation metrics. The code is available at https://github.com/hitachi-rd-cv/data-cleansing-for-gans.

PaperID: 23,

Authors: Zhongyi Que, Chih-Jen Lin

Affiliations: Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan

Title: One-Class SVM Probabilistic Outputs

Abstract:
One-class support vector machine (SVM) is an extension of SVM to handle unlabeled data. As a mature technique for outlier detection, one-class SVM has been widely used in many applications. However, similar to standard two-class SVM, the design of one-class SVM does not give probabilistic outputs. For two-class SVM, some methods have been proposed to effectively obtain probabilistic outputs, but due to the difficulty of having no label information, less attention has been paid to one-class SVM. Our aim in this work is to propose some practically viable techniques to generate probabilistic outputs for one-class SVM. We investigate existing methods for two-class SVM and explain why they may not be suitable for one-class SVM. Due to the lack of label information, we think a feasible setting is to have the probabilities mimic the decision values of the training data. Based on this principle, we propose several new methods. Detailed experiments on both artificial and real-world data demonstrate the effectiveness of the proposed methods.

PaperID: 24,

Authors: Luca Cosmo, Giorgia Minello, Alessandro Bicciato, Michael M. Bronstein, Emanuele Rodolà, Luca Rossi, Andrea Torsello

Affiliations: Dipartimento di Scienze Ambientali, Informatica e Statistica, Ca’ Foscari University of Venice, Venice, Italy; Department of Computer Science, University of Oxford, Oxford, U.K.; Dipartimento di Informatica, Sapienza University of Rome, Rome, Italy; Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

Title: Graph Kernel Neural Networks

Abstract:
The convolution operator at the core of many modern neural architectures can effectively be seen as performing a dot product between an input matrix and a filter. While this is readily applicable to data such as images, which can be represented as regular grids in the Euclidean space, extending the convolution operator to work on graphs proves more challenging, due to their irregular structure. In this article, we propose to use graph kernels, i.e., kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain. This allows us to define an entirely structural model that does not require computing the embedding of the input graph. Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability in terms of the structural masks that are learned during the training process, similar to what happens for convolutional masks in traditional convolutional neural networks (CNNs). We perform an extensive ablation study to investigate the model hyperparameters’ impact and show that our model achieves competitive performance on standard graph classification and regression datasets.
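
To make the notion of a graph kernel concrete, here is a minimal sketch of one of the simplest such kernels, an edge-label histogram kernel: each graph is mapped to a count vector over unordered endpoint-label pairs, and the kernel is the inner product of those vectors. The graph encoding (a node-to-label dict plus an edge list) and the function names are assumptions made for illustration; the abstract does not say which kernels the authors use.

```python
from collections import Counter

def edge_histogram(node_labels, edges):
    """Count edges by the unordered pair of labels at their endpoints."""
    return Counter(
        tuple(sorted((node_labels[u], node_labels[v]))) for u, v in edges
    )

def edge_histogram_kernel(graph_a, graph_b):
    """Inner product of the edge-label histograms of two graphs.

    Each graph is a (node_labels, edges) pair; this encoding is hypothetical.
    """
    hist_a = edge_histogram(*graph_a)
    hist_b = edge_histogram(*graph_b)
    # Counter returns 0 for label pairs absent from hist_b.
    return sum(count * hist_b[pair] for pair, count in hist_a.items())

# Two toy labelled path graphs on three nodes.
g1 = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
g2 = ({0: "C", 1: "O", 2: "O"}, [(0, 1), (1, 2)])
```

Here `g1` and `g2` share only one C-O edge, so the kernel value between them is smaller than the value of `g1` with itself.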

PaperID: 25,

Authors: Zebiao Hu, Jian Wang, Kai Zhang, Witold Pedrycz, Nikhil R. Pal

Affiliations: College of Control Science and Engineering, China University of Petroleum (East China), Qingdao, China; College of Science, China University of Petroleum (East China), Qingdao, China; College of Petroleum Engineering, China University of Petroleum, Qingdao, China; Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada; Techno India University, Kolkata, India

Title: Bi-Level Spectral Feature Selection

Abstract:
Unsupervised feature selection (UFS) aims to learn an indicator matrix relying on some characteristics of the high-dimensional data to identify the features to be selected. However, traditional unsupervised methods perform only at the feature level, i.e., they directly select useful features by feature ranking. Such methods do not pay any attention to the interaction information with other tasks such as classification, which severely degrades their feature selection performance. In this article, we propose a UFS method that also takes into account the classification level and selects features that perform well in both clustering and classification. To achieve this, we design a bi-level spectral feature selection (BLSFS) method, which combines the classification level and the feature level. More concretely, at the classification level, we first apply spectral clustering to generate pseudolabels, and then train a linear classifier to obtain the optimal regression matrix. At the feature level, we select useful features by maintaining the intrinsic structure of the data in the embedding space with the learned regression matrix from the classification level, which in turn guides classifier training. We utilize a balancing parameter to seamlessly bridge the classification and feature levels and construct a unified framework. A series of experiments on 12 benchmark datasets are carried out to demonstrate the superiority of BLSFS in both clustering and classification performance.

PaperID: 26,

Authors: Ping He, Xiaohua Xu, Suquan Chen

Affiliations: Department of Computer Science, Yangzhou University, Yangzhou, China

Title: Robust Supervised Spline Embedding

Abstract:
High-dimensional data present significant challenges such as inadequate sample size, abundance of noise, and the curse of dimensionality, which make many traditional classification algorithms inapplicable. To provide valid inference for such data, one must find a noise-free low-dimensional representation that preserves both the underlying manifold structure and discriminative information. However, the existing methods often fail to take full consideration of these requirements. In this article, we introduce a robust supervised spline embedding (RS2E) algorithm for high-dimensional classification. The proposed approach is highlighted in four aspects: 1) it preserves the class-aware submanifold structure in the thin plate spline embedding space; 2) it eliminates noise and outliers to recover the clean manifold by exploiting its intrinsic low complexity; 3) it separates the class-aware submanifolds by maximizing the distance between each data point and the marginal data points of other class-aware submanifolds; and 4) it applies the alternating direction method of multipliers with generalized power iteration to solve the objective function. Promising experimental results on real-world, generative adversarial network (GAN)-generated, and artificially corrupted datasets demonstrate that RS2E outperforms other supervised dimensionality reduction algorithms in terms of classification accuracy.

PaperID: 27,

Authors: Saeed Anwar, Nick Barnes, Lars Petersson

Affiliations: Imaging and Computer Vision Group, CSIRO and the School of Computer Science and Engineering, Australian National University, Canberra, ACT, Australia; Research School of Electrical, Energy and Materials Engineering, Australian National University, Canberra, ACT, Australia

Title: Attention-Based Real Image Restoration

Abstract:
Deep convolutional neural networks perform better on images containing spatially invariant degradations, also known as synthetic degradations; however, their performance is limited on real-degraded photographs and requires multiple-stage network modeling. To advance the practicability of restoration algorithms, this article proposes a novel single-stage blind real image restoration network ($\text{R}^2$Net) by employing a modular architecture. We use a residual on the residual structure to ease low-frequency information flow and apply feature attention to exploit the channel dependencies. Furthermore, the evaluation in terms of quantitative metrics and visual quality for four restoration tasks, i.e., denoising, super-resolution, raindrop removal, and JPEG compression on 11 real degraded datasets against more than 30 state-of-the-art algorithms, demonstrates the superiority of our $\text{R}^2$Net. We also present the comparison on three synthetically generated degraded datasets for denoising to showcase our method’s capability on synthetic denoising. The codes, trained models, and results are available on https://github.com/saeed-anwar/R2Net.

PaperID: 28,

Authors: Ben Yang, Xuetao Zhang, Jinghan Wu, Feiping Nie, Zhiping Lin, Fei Wang, Badong Chen

Affiliations: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, the National Engineering Research Center for Visual Information and Applications, and the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China; School of Computer Science and the School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, China; School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang, Singapore

Title: Fast Multiview Anchor-Graph Clustering

Abstract:
Owing to their high computational complexity, graph-based methods have limited applicability in large-scale multiview clustering tasks. To address this issue, many accelerated algorithms, especially anchor graph-based methods and indicator learning-based methods, have been developed and have achieved great success. Nevertheless, owing to the restrictions of the optimization strategy, these accelerated methods still need to approximate the discrete graph-cutting problem as a continuous spectral embedding problem and utilize different discretization strategies to obtain discrete sample categories. To avoid the loss of effectiveness and efficiency caused by the approximation and discretization, we establish a discrete fast multiview anchor graph clustering (FMAGC) model that first constructs an anchor graph of each view and then generates a discrete cluster indicator matrix by solving the discrete multiview graph-cutting problem directly. Since this discrete model is hard to solve with gradient descent-based methods, we propose a fast coordinate descent-based optimization strategy with linear complexity to solve it without approximating it as a continuous one. Extensive experiments on widely used normal and large-scale multiview datasets show that FMAGC can improve clustering effectiveness and efficiency compared to other state-of-the-art baselines.

PaperID: 29,

Authors: Ren Togo, Nao Nakagawa, Takahiro Ogawa, Miki Haseyama

Affiliations: Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan; Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan

Title: ConcVAE: Conceptual Representation Learning

Abstract:
Disentangled representation learning aims at obtaining an independent latent representation without supervisory signals. However, the independence of a representation does not guarantee interpretability to match human intuition in the unsupervised settings. In this article, we introduce conceptual representation learning, an unsupervised strategy to learn a representation and its concepts. An antonym pair forms a concept, which determines the semantically meaningful axes in the latent space. Since the connection between signifying words and signified notions is arbitrary in natural languages, the verbalization of data features makes the representation make sense to humans. We thus construct Conceptual VAE (ConcVAE), a variational autoencoder (VAE)-based generative model with an explicit process in which the semantic representation of data is generated via trainable concepts. In visual data, ConcVAE utilizes natural language arbitrariness as an inductive bias of unsupervised learning by using a vision-language pretraining, which can tell an unsupervised model what makes sense to humans. Qualitative and quantitative evaluations show that the conceptual inductive bias in ConcVAE effectively disentangles the latent representation in a sense-making manner without supervision. Code is available at https://github.com/ganmodokix/concvae.

PaperID: 30,

Authors: Xiangyu Li, Hunter Belcher, Hua Wang

Affiliations: Department of Computer Science, Colorado School of Mines, Golden, CO, USA

Title: Robust Spherical Laplacian Embedding

Abstract:
Laplacian embedding (LE) aims to project high-dimensional input data samples, which often contain nonlinear structures, into a low-dimensional space. However, existing distance functions used in the embedding space fail to provide discriminative representations for real-world datasets, especially those related to text analysis or image processing. Cosine similarity measurements are advantageous in dealing with sparse data but are fragile to the impact of outliers and noise from samples or data features. In this work, we propose robust spherical LE (RS-LE), a powerful method that builds the LE on a novel metric that unifies the Euclidean distance and cosine similarity in a spherical space using a robust $\ell_{p,p}$-norm. We introduce an efficient iterative algorithm, the proximal alternating linearized minimization (ALM) method, to solve the difficult optimization problem that arises from the new metric, which is neither convex nor smooth. This algorithm allows the solution to achieve both global and objective convergence. Our findings are presented in rigorous theoretical analysis and validated in experiments.
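
One reason Euclidean distance and cosine similarity can be unified on a sphere is the standard identity that, for unit-norm vectors, the squared Euclidean distance equals 2 minus twice the cosine similarity. The sketch below checks this identity numerically; it illustrates only this well-known relation, not the RS-LE metric or its $\ell_{p,p}$-norm, and the helper names are illustrative.

```python
import math

def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def squared_euclidean(u, v):
    """Squared Euclidean distance between u and v."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

u, v = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
lhs = squared_euclidean(normalize(u), normalize(v))  # distance on the sphere
rhs = 2.0 - 2.0 * cosine_similarity(u, v)            # 2 - 2 cos(theta)
```

For these two vectors the cosine similarity is 11/15, so both sides evaluate to 8/15 up to floating-point error.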

PaperID: 31,

Authors: Yanshan Xiao, Bo Liu, Zhifeng Hao

Affiliations: School of Computers, Guangdong University of Technology, Guangzhou, China; School of Automation, Guangdong University of Technology, Guangzhou, China; College of Science, Shantou University, Shantou, China

Title: Multi-Instance Nonparallel Tube Learning

Abstract:
In multi-instance nonparallel plane learning (NPL), the training set is comprised of bags of instances and the nonparallel planes are trained to classify the bags. Most of the existing multi-instance NPL methods are proposed based on a twin support vector machine (TWSVM). Similar to TWSVM, they use only a single plane to generalize the data occurrence of one class and do not sufficiently consider the boundary information, which may lead to the limitation of their classification accuracy. In this article, we propose a multi-instance nonparallel tube learning (MINTL) method. Distinguished from the existing multi-instance NPL methods, MINTL embeds the boundary information into the classifier by learning a large-margin-based $\epsilon$-tube for each class, such that the boundary information can be incorporated into refining the classifier and further improving the performance. Specifically, given a $K$-class multi-instance dataset, MINTL seeks $K$ $\epsilon$-tubes, one for each class. In multi-instance learning, each positive bag contains at least one positive instance. To build up the $\epsilon_k$-tube of class $k$, we require that each bag of class $k$ should have at least one instance included in the $\epsilon_k$-tube. Moreover, except for one instance included in the $\epsilon_k$-tube, the remaining instances in the positive bag may include positive instances or irrelevant instances, and their labels are unavailable. A large margin constraint is presented to assign the remaining instances either inside the $\epsilon_k$-tube or outside the $\epsilon_k$-tube with a large margin. Substantial experiments on real-world datasets have shown that MINTL obtains significantly better classification accuracy than the existing multi-instance NPL methods.
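
The bag-level requirement above (at least one instance of each bag of class k lies inside the tube of class k) can be sketched as follows. The sketch assumes the tube is the set of points whose linear score satisfies |w·x + b| ≤ ε, in the style of ε-insensitive formulations; the abstract does not give the exact definition, and the function names are hypothetical.

```python
def tube_distance(w, b, x):
    """Absolute value of the linear score w.x + b (assumed tube measure)."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)

def bag_touches_tube(w, b, eps, bag):
    """True if at least one instance in the bag lies inside the eps-tube."""
    return min(tube_distance(w, b, x) for x in bag) <= eps

# Hypothetical tube around the plane x_0 = 0 with half-width 0.5.
w, b, eps = [1.0, 0.0], 0.0, 0.5
bag_inside = [[2.0, 1.0], [0.3, 5.0]]    # second instance falls in the tube
bag_outside = [[2.0, 0.0], [-1.0, 1.0]]  # no instance falls in the tube
```

Only the closest instance matters for this check; how the remaining instances are pushed inside or outside the tube with a large margin is a separate constraint that this sketch does not model.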

PaperID: 32,

Authors: Zhangkai Wu, Longbing Cao, Lei Qi

Affiliations: Faculty of Engineering and Information Technology, University of Technology Sydney, Broadway, NSW, Australia; Data Science Lab, School of Computing, DataX Research Centre, Macquarie University, Sydney, NSW, Australia; Computer Science and Engineering, Southeast University, Nanjing, China

Title: eVAE: Evolutionary Variational Autoencoder

Abstract:
Variational autoencoders (VAEs) are challenged by the imbalance between representation inference and task fitting caused by surrogate loss. To address this issue, existing methods adjust their balance by directly tuning their coefficients. However, these methods suffer from a tradeoff uncertainty, i.e., nondynamic regulation over iterations and inflexible hyperparameters for learning tasks. Accordingly, we make the first attempt to introduce an evolutionary VAE (eVAE), building on the variational information bottleneck (VIB) theory and integrative evolutionary neural learning. eVAE integrates a variational genetic algorithm (VGA) into VAE with variational evolutionary operators, including variational mutation (V-mutation), crossover, and evolution. Its training mechanism synergistically and dynamically addresses and updates the learning tradeoff uncertainty in the evidence lower bound (ELBO) without additional constraints and hyperparameter tuning. Furthermore, eVAE presents an evolutionary paradigm to tune critical factors of VAEs and addresses the premature convergence and random search problem in integrating evolutionary optimization into deep learning. Experiments show that eVAE addresses the KL-vanishing problem for text generation with low reconstruction loss, generates all the disentangled factors with sharp images, and improves image generation quality. eVAE achieves better disentanglement, generation performance, and generation–inference balance than its competitors. Code available at: https://github.com/amasawa/eVAE.
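
For context on the tradeoff that existing methods tune by hand, here is a minimal sketch of a coefficient-weighted negative ELBO for a diagonal Gaussian posterior and a standard normal prior. The fixed coefficient `beta` is the kind of manually chosen hyperparameter the abstract refers to; the function names are illustrative, and eVAE's evolutionary update of this balance is not reproduced here.

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )

def weighted_negative_elbo(reconstruction_loss, mu, log_var, beta=1.0):
    """Reconstruction term plus a beta-weighted KL term.

    beta is a fixed, hand-tuned coefficient in this sketch (an assumption).
    """
    return reconstruction_loss + beta * gaussian_kl(mu, log_var)
```

With `beta=1.0` this is the usual negative ELBO; larger values weight the KL term more heavily and smaller values favor reconstruction.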

PaperID: 33,

Authors: Kahou Tam, Li Li, Bo Han, Cheng-Zhong Xu, Huazhu Fu

Affiliations: State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China; Department of Computer Science, Hong Kong Baptist University, Hong Kong, China; Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore

Title: Federated Noisy Client Learning

Abstract:
Federated learning (FL) collaboratively trains a shared global model depending on multiple local clients, while keeping the training data decentralized to preserve data privacy. However, standard FL methods ignore the noisy client issue, which may harm the overall performance of the shared model. We first investigate the critical issue caused by noisy clients in FL and quantify the negative impact of the noisy clients in terms of the representations learned by different layers. We have the following two key observations: 1) the noisy clients can severely impact the convergence and performance of the global model in FL and 2) the noisy clients can induce greater bias in the deeper layers than in the earlier layers of the global model. Based on the above observations, we propose federated noisy client learning (Fed-NCL), a framework that conducts robust FL with noisy clients. Specifically, Fed-NCL first identifies the noisy clients by estimating the data quality and model divergence. Then robust layerwise aggregation is proposed to adaptively aggregate the local models of each client to deal with the data heterogeneity caused by the noisy clients. We further perform label correction on the noisy clients to improve the generalization of the global model. Experimental results on various datasets demonstrate that our algorithm boosts the performances of different state-of-the-art systems with noisy clients. Our code is available at https://github.com/TKH666/Fed-NCL.
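
The idea of aggregating client models layer by layer can be sketched as a per-layer weighted average, in which a client suspected of being noisy receives a smaller weight in the layers where it is trusted least. The data layout and the weights below are hypothetical; the abstract does not specify how Fed-NCL computes its aggregation weights.

```python
def aggregate_layerwise(client_models, layer_weights):
    """Weighted average of client parameters, computed separately per layer.

    client_models: list of {layer_name: [float, ...]}, one dict per client.
    layer_weights: {layer_name: [weight for each client]} (hypothetical).
    """
    aggregated = {}
    for layer, weights in layer_weights.items():
        total = sum(weights)
        size = len(client_models[0][layer])
        aggregated[layer] = [
            sum(w * model[layer][i] for w, model in zip(weights, client_models))
            / total
            for i in range(size)
        ]
    return aggregated

# Client 1 is treated as noisy in the deeper "fc" layer only.
clients = [
    {"conv": [1.0, 2.0], "fc": [0.0]},
    {"conv": [3.0, 4.0], "fc": [10.0]},
]
weights = {"conv": [1.0, 1.0], "fc": [1.0, 0.0]}
global_model = aggregate_layerwise(clients, weights)
```

In this toy setup the shallow layer is a plain average of both clients, while the deeper layer ignores the client with zero weight.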

PaperID: 34,

Authors: Gairui Bai, Wei Xi, Xiaopeng Hong, Xinhui Liu, Yang Yue, Songwen Zhao

Affiliations: Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

Title: Robust and Rotation-Equivariant Contrastive Learning

Abstract:
Contrastive learning (CL) methods achieve great success by learning the invariant representation from various transformations. However, rotation transformations are considered harmful to CL and are rarely used, which results in failure when the objects show unseen orientations. This article proposes a representation focus shift network (RefosNet), which adds the rotation transformations to CL methods to improve the robustness of representation. First, the RefosNet constructs the rotation-equivariant mapping between the features of the original image and the rotated ones. Then, the RefosNet learns semantic-invariant representations (SIRs) based on explicitly decoupling the rotation-invariant features and the rotation-equivariant features. Moreover, an adaptive gradient passivation strategy is introduced to gradually shift the representation focus to invariant representations. This strategy can prevent catastrophic forgetting of the rotation equivariance, which is beneficial to the generalization of representations in both seen and unseen orientations. We adapt the baseline methods (i.e., “SimCLR” and “momentum contrast (MoCo) v2”) to work with RefosNet to verify the performance. Extensive experimental results show that our method achieves significant improvements on the task of recognition. On ObjectNet-13 with unseen orientations, RefosNet gains 7.12% in terms of classification accuracy compared with SimCLR. On datasets in seen orientation, the performance improves by 5.5% on ImageNet-100, 7.29% on STL10, and 1.93% on CIFAR10. In addition, RefosNet has strong generalization on Place205, PASCAL VOC, and Caltech 101. Our method has also achieved satisfactory results in image retrieval tasks.

PaperID: 35,

Authors: Yang Liu, Fang Liu, Licheng Jiao, Qianyue Bao, Shuo Li, Lingling Li, Xu Liu, Puhua Chen, Wenping Ma

Affiliations: School of Artificial Intelligence, the Key Laboratory of Intelligent Perception and Image Understanding, Ministry of Education, the International Research Center for Intelligent Perception and Computation, and the Joint International Research Laboratory of Intelligent Perception and Computation, Xidian University, Xi’an, China

Title: Chain-of-Situation Aware Progressive Inference Learning

Abstract:
The grounded situation recognition (GSR) task aims to recognize the structured semantics of an image to achieve “human-like” event understanding. Most previous studies primarily focus on the visual features of the situation, overlooking the step-by-step cognitive reasoning process that humans employ in complex task settings. Recently, the emergence of multimodal large language models (MLLMs) has provided novel directions for addressing complex problems. However, directly deploying MLLMs on the GSR task is suboptimal due to their tendency to exhibit “hallucination” issues. Additionally, fine-tuning MLLMs for the GSR task incurs high training costs. To address these challenges, inspired by human cognitive theory and the chain-of-thought (CoT) strategy, we propose the chain-of-situation progressive inference learning (CoS-PIL) framework, a lightweight approach that progressively completes verb prediction, noun prediction, and role grounding. The prediction of each step depends on the historical information of the previous step. Specifically, we first design situation prompts tailored to the GSR task and utilize MLLMs to analyze the input image and language prompts, generating heuristic response text for the current situation in the image. Instead of fine-tuning the MLLM, we activate the reasoning capabilities of the frozen MLLM and adapt its generated responses into three lightweight modules: CoS-Verb, CoS-Noun, and CoS-Ground. Considering that MLLMs may generate redundant content, we carefully design the chain-of-interest predictor (CoI-Predictor) to extract key information from the extensive response text and inject it into the model as prompts to enhance the performance. Extensive experiments on the challenging SWiG benchmark demonstrate that CoS-PIL outperforms other state-of-the-art methods. The code is publicly available at https://github.com/XDLiuyyy/CoS-PIL

PaperID: 36,

Authors: Jian Chen, Wenlong Shi, Wanyu Lin, Chen Wang, Wei Liu, Hailong Sun, Gaoyang Liu

Affiliations: Department of Computer Science, China University of Geosciences, Wuhan, China; Hubei Key Laboratory of Internet of Intelligence, School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China; Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China; School of Software, Beihang University, Beijing, China; School of Computing Science, Simon Fraser University, Burnaby, BC, Canada

Title: Unlearning Attacks for Regression Learning

Abstract:
Recently, machine unlearning has emerged as a popular method for efficiently erasing the impact of personal data in machine learning (ML) models upon the data owner’s removal request. However, few studies take into consideration the security concerns that may exist in the unlearning process. In this article, we propose the first unlearning attack, dubbed unlearning attack for regression learning (UnAR), to deliberately influence the predictive behavior of the target sample against regression learning models. The central concept of UnAR revolves around misleading the regression model into erasing the information associated with the influential samples for the target sample. Observing that the influential samples for target data are generally located far away from the regression plane, we propose two novel methods, known as influential sample selection (ISS) and influential sample unlearning (ISU), to identify and subsequently eliminate the lineage of the influential samples. By doing so, we can introduce substantial bias into the prediction pertaining to the target sample, deliberately manipulating the outcome to the user’s detriment. We extensively evaluate UnAR on five public datasets, and the experimental results indicate our attacks can achieve prediction deviations of over 35% by unlearning only 0.5% of the data as the influential samples.

PaperID: 37,

Authors: Xiaotong Li, Licheng Jiao, Lingling Li, Fang Liu, Hao Zhu, Xin Zhang, Xu Liu, Shuyuan Yang

Affiliations: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an, Shaanxi, China

Title: Complex Dual-Tree Pyramid Scattering Transformer

Abstract:
Attention-based transformer networks have recently played an increasingly important role in computer vision tasks. However, since pixel-by-pixel attention multiplication does not involve constraint assumptions such as spatial invariance, the computational complexity grows quadratically with the increase of input pixels. Therefore, this article proposes a complex pyramid scattering Transformer in dense scale space, which introduces sparse scattering constraints with a small number of wavelet basis parameters. It enhances the Transformer’s flexibility and sparsity in multiscale space and, to a certain extent, slows down the increase in computational complexity caused by multiresolution input. In addition, compared with the general single-tree real wavelet transform, the dual-tree complex scattering method improves the aliasing of the scattering attention layer and helps obtain a more robust feature representation. At the same time, the multihead stepwise pyramid scattering coupling mechanism helps increase the abundance of directional priors. We conduct experiments in image classification and video tracking scenarios and verify the reliability and superiority of our dual-tree complex pyramid scattering Transformer for visual tasks with different scale requirements. The performance is better than that of the baseline Transformer and other advanced wavelet scattering networks at the same parameter scale. The code is available at https://github.com/Dawn5786/CPSTFormer

PaperID: 38,

Authors: Chuanbin Zhang, Long Chen, Weiping Ding, Kai Zhao, Zhaoyin Shi, Yingxu Wang, Chun-Lung Philip Chen

Affiliations: School of Computer Science and Software, Zhaoqing University, Zhaoqing, China; Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China; School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China; School of Automation, Chongqing University, Chongqing, China; College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: Scale-Driven Tensor Representation-Based Multiview Clustering

Abstract:
Real-world data tends to exhibit an inherent hierarchical structure, providing a natural multiview perspective where features at different scales can be treated as distinct views. However, most existing multiview clustering algorithms primarily focus on the inter-sample relationships at a single level. These methods overlook the hierarchical structures present in the data and are specifically designed for native multiview data. This article introduces a comprehensive multiview clustering framework that transforms both typical data and images into a unified multiview feature representation. The framework allows for extracting multiscale features from the raw data and clustering different types of data with the same algorithm. A novel scale-driven pre-processing approach unifies the feature structure across various data types and explores local relationships among samples at multiple scales. Features at larger scales delineate the global cluster contours, while features at smaller scales reveal fine-grained local details. Subsequently, the proposed method learns the view-specific partitions from different scales of views and derives consensus features through tensor low-rank representation. By optimizing these consensus features, the approach effectively captures the precise cluster shapes from coarse to fine-grained levels. The final label indicator matrix is directly obtained from these consensus features. To demonstrate the effectiveness and versatility of the proposed method, we conducted experimental comparisons with state-of-the-art (SOTA) algorithms in both multiview clustering and image segmentation across diverse datasets. The source code and datasets are released at https://github.com/ChuanbinZhang/SDTR

PaperID: 39,

Authors: Jie Chen, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Yuwei Guo, Puhua Chen, Wenping Ma

Affiliations: Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education of China, International Research Center of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an, Shaanxi, China

Title: Heterogeneous Riemannian Few-Shot Learning Network

Abstract:
How to learn and accurately distinguish new concepts from few samples, as humans do, is a long-standing concern in artificial intelligence (AI). Studies in brain science and neuroscience have shown that human brain perception is based on nonlinear manifolds, and high-dimensional manifolds can facilitate concept learning in neural circuits. Based on this inspiration, in this paper, we propose a heterogeneous Riemannian few-shot learning network (HRFL-Net), which is the first few-shot learning method to perform end-to-end deep learning on heterogeneous Riemannian manifolds. Specifically, to enhance the geometric invariance of the image representation, the image features are projected into three heterogeneous Riemannian manifold spaces. Then, the implicit Riemannian kernel function maps the manifolds to the separable high-dimensional reproducing Hilbert space. It is assumed that the embedded kernel features of the complementary manifolds are mapped to the same common subspace. Thus, a novel neural network-based Riemannian metric learning method is designed to solve the subspace feature vectors by imposing orthogonal normalized projection, which overcomes the data extension limitation of the Riemannian metric. Finally, with the optimization objective of increasing the interclass distance and decreasing the intraclass distance in Hilbert space, the HRFL-Net is trained with end-to-end stochastic optimization, and the optimal aggregation subspace is learned during the gradient descent process. Thus, the proposed HRFL-Net can be easily generalized to challenging nonconvex data. The evaluation of four public datasets shows that the proposed HRFL-Net has significant superiority and also achieves competitive results compared with the state-of-the-art methods.

PaperID: 40,

Authors: Yuebin Xu, C. L. Philip Chen, Mengqi Wu, Tong Zhang

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Guangdong Provincial Key Laboratory of Computational AI Models and Cognitive Intelligence, South China University of Technology, Guangzhou, China

Title: SGB-Net: Scalable Graph Broad Network

Abstract:
Due to the complexity and self-evolutionary property of graph data in reality, graph learning methods require both validity to represent unstructured data and scalability to adapt to evolving graphs. However, current works have representation learning limitations on optimizable graph feature space due to the bottleneck of the structure depth. Moreover, they encounter a complete retraining process when graphs evolve, especially in the case without the assistance of new labels. To address the above issues, we propose a scalable graph broad network (SGB-Net), which contains three proposed modules: the graph feature broad transformation layer (GFBT layer) for enhancing graph embedding and two update algorithms (SGB-Net-U, SGB-Net-S) for endowing scalability. The GFBT layer aims to explicitly expand the graph feature space and broadly build the model. It constructs two expandable feature spaces in various graph scales to embed graphs discriminatively. SGB-Net-U is an exploratory method designed to tackle the label-free graph incremental learning (GIL) problem by leveraging unsupervised incremental knowledge to expand graph representation. SGB-Net-S endows scalability in classical incremental learning scenarios involving labels. Benefiting from its broad construction framework, SGB-Net not only enhances graph embeddings but also seamlessly adapts and improves performance in response to graph expansion without requiring retraining. In the experiments conducted on 15 benchmark datasets, SGB-Net outperforms state-of-the-art GNNs in terms of both effectiveness and scalability.

PaperID: 41,

Authors: Jingkai Ma, Shuang Bai

Affiliations: School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China

Title: Multi-View Part-Based Few-Shot Object Detection

Abstract:
Few-shot object detection (FSOD) aims to detect new object categories with a few annotated samples and has made great progress. However, compared with general object detection, object misclassification remains a serious issue in FSOD. This is because, in scenarios with few samples, the model struggles to learn sufficient discriminative information to represent the corresponding classes. To address this issue, we propose a multi-view part-based FSOD network (MPFSOD), which extracts more discriminative information from multiple views to generate highly discriminative parts that effectively characterize objects, thereby promoting the accurate detection of new categories. Specifically, we first propose a part-based detector (PBD), where task-aware object-level parts are generated and used to enhance the discriminativeness of the object representations. Based on the PBD, we then introduce an image-level multi-view fusion module (Img-MVF) and an instance-level multi-view modulation module (Inst-MVM). These two modules extract richer discriminative information hidden in multiple views of the target, further facilitating the generation of discriminative parts in PBD. Extensive experiments on PASCAL VOC and MS COCO demonstrate that our method significantly outperforms a strong baseline (by up to 11.2%) and previous state-of-the-art methods (by 4.3% on average).

PaperID: 42,

Authors: Huiqiang Chen, Tianqing Zhu, Wanlei Zhou, Wei Zhao

Affiliations: Faculty of Data Science, City University of Macau, Macau, China; Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China

Title: AFed: Algorithmic Fair Federated Learning

Abstract:
Federated learning (FL) has gained significant attention as it facilitates collaborative machine learning among multiple clients without centralizing their data on a server. FL ensures the privacy of participating clients by locally storing their data, which creates new challenges in fairness. Traditional debiasing methods assume centralized access to sensitive information, rendering them impractical for the FL setting. Additionally, FL is more susceptible to fairness issues than centralized machine learning due to the diverse client data sources that may be associated with group information. Therefore, training a fair model in FL without access to client local data is important and challenging. This article presents AFed, a straightforward, yet effective framework for promoting group fairness in FL. The core idea is to circumvent restricted data access by learning the global data distribution. This article proposes two approaches: AFed-G, which uses a conditional generator trained on the server side, and AFed-GAN, which improves upon AFed-G by training a conditional GAN on the client side. We augment the client data with the generated samples to help remove bias. Our theoretical analysis justifies the proposed methods, and empirical results on multiple real-world datasets demonstrate a substantial improvement in AFed over several baselines.

PaperID: 43,

Authors: Puhong Duan, Tianci Shan, Xudong Kang, Shutao Li

Affiliations: School of Robotics and the Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha, China; College of Electrical and Information Engineering and the Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha, China

Title: Spectral Super-Resolution in Frequency Domain

Abstract:
Spectral super-resolution aims to reconstruct a hyperspectral image (HSI) from its corresponding RGB image and has drawn much attention in the remote sensing field. Recent advances in the application of deep learning models for spectral super-resolution have demonstrated great potential. However, these methods work only in the spectral-spatial domain and rarely explore the potential properties of the frequency domain. In this work, we make the first attempt to address spectral super-resolution in the frequency domain. To effectively merge the frequency information into the super-resolution network, a spectral-spatial–frequency domain fusion network (SSFDF) is designed, which consists of three key parts: frequency-domain feature learning, spectral-spatial domain feature learning, and a feature fusion module. In more detail, a frequency-domain feature learning network is first exploited to extract the frequency-domain information of the input data. Then, a symmetric convolutional neural network (CNN) is developed to acquire the spectral-spatial features of the input data, where a parameter-sharing strategy is utilized to reduce network parameters. Finally, a feature fusion module is proposed to reconstruct the HSI. Comprehensive experiments on several datasets reveal that our method attains state-of-the-art reconstruction results compared with other spectral super-resolution techniques.

PaperID: 44,

Authors: Yanbei Liu, Yu Zhao, Zhitao Xiao, Lei Geng, Xiao Wang, Yanwei Pang, Jerry Chun-Wei Lin

Affiliations: School of Life Sciences, Tiangong University, Tianjin, China; Sinopec Catalyst (Tianjin) Company Ltd., Tianjin, China; School of Software, Beihang University, Beijing, China; School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Department of Distributed Systems and IT Devices, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland

Title: Multiscale Subgraph Adversarial Contrastive Learning

Abstract:
Graph contrastive learning (GCL), as a typical self-supervised learning paradigm, has achieved promising performance without labels and is gradually attracting much attention. Graph-level methods aim to learn a representation of each graph by contrasting two augmented graphs. Previous studies usually simply apply contrastive learning to keep the embeddings of augmented views from the same anchor graph (positive pairs) close to each other and to separate the embeddings of augmented views from different anchor graphs (negative pairs). However, it is well known that the structure of a graph is always complex and multiscale, which gives rise to a fundamental question: after graph augmentation, will the previous assumption still hold in reality? Through experimental analysis, we find that the semantic information of two augmented graphs from the same anchor graph may not be consistent, and whether two augmented graphs are positive or negative sample pairs is highly correlated with the multiscale structure of the graph. Based on this observation, we then propose a multiscale subgraph contrastive learning method, named MSSGCL, which can characterize the fine-grained semantic information. Specifically, we generate global and local views at different scales based on subgraph sampling and construct multiple contrastive relationships according to their semantic associations to provide richer self-supervised information. Furthermore, to improve the generalization performance of the model, we propose an extended model called MSSGCL++. It adopts an asymmetric structure to avoid pushing semantically similar negative samples far away. We further introduce adversarial training to perturb the augmented view and thus construct a more difficult self-supervised training task. Finally, a min-max saddle point problem is optimized and the “free” strategy is used to speed up the training process. Extensive experiments and parametric analysis on 16 real-world graph classification datasets confirm the effectiveness of our proposed approach. Compared with state-of-the-art (SOTA) methods, our method achieves improvements of 2% and 1.6% in unsupervised and transfer learning settings, respectively.

PaperID: 45,

Authors: Xingfu Wang, Wenxia Qi, Wenjie Yang, Wei Wang

Affiliations: CAS Key Laboratory of Space Manufacturing Technology, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing, China

Title: Cholesky Space for Brain-Computer Interfaces

Abstract:
Brain–computer interfaces (BCIs) based on electroencephalogram (EEG) enable direct interactions between the brain and external environments, with applications in medical rehabilitation, motor substitution, gaming, and entertainment. Traditional methods that model the non-Euclidean characteristics of EEG signals demonstrate robustness and high performance, but they suffer from significant computational costs and are typically restricted to a single BCI paradigm. This article addresses these limitations by utilizing a diffeomorphism from Riemannian manifolds to the Cholesky space, which simplifies the solution process and enables application across multiple BCI paradigms. Our proposed Cholesky space-based model, CSNet, achieves state-of-the-art (SOTA) performance in motor imagery (MI) decoding and emotion recognition and demonstrates competitive performance in error-related negativity (ERN) decoding, all without the need for data preprocessing. Furthermore, our runtime comparison shows that the Cholesky space method is more efficient than the method based on the Riemannian manifold as the matrix dimension increases. To enhance the interpretability of CSNet, we perform t-distributed stochastic neighbor embedding (t-SNE) visualization for MI, frequency band energy visualization for emotion recognition, and temporal importance visualization for ERN. The results indicate that CSNet effectively learns discriminative features, identifies important frequency bands, and focuses on important temporal features. The CSNet effectively captures the non-Euclidean characteristics of EEG signals across various BCI paradigms, while mitigating high computational costs, making it a promising candidate for future BCI algorithms. The code for this study is publicly available at: https://github.com/XingfuWang/CSNet.
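
To make the idea of working in Cholesky space concrete, the following is a minimal sketch and not the CSNet implementation. It assumes that each EEG trial is summarized by a symmetric positive-definite (SPD) covariance matrix and that SPD matrices are compared with a log-Cholesky style distance (Euclidean on the strictly lower triangle of the Cholesky factor, logarithmic on its diagonal). The abstract does not state which metric the authors use, so this choice and the function names `cholesky` and `log_cholesky_distance` are assumptions made for illustration.

```python
import math


def cholesky(spd):
    """Return the lower-triangular factor L with L @ L.T == spd."""
    n = len(spd)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Contribution of the columns already computed.
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(spd[i][i] - partial)
            else:
                low[i][j] = (spd[i][j] - partial) / low[j][j]
    return low


def log_cholesky_distance(spd_a, spd_b):
    """Distance between two SPD matrices measured on their Cholesky factors.

    Assumed metric: squared differences of the strictly lower-triangular
    entries plus squared differences of the logs of the diagonal entries.
    """
    la, lb = cholesky(spd_a), cholesky(spd_b)
    sq = 0.0
    for i in range(len(la)):
        for j in range(i):
            sq += (la[i][j] - lb[i][j]) ** 2
        sq += (math.log(la[i][i]) - math.log(lb[i][i])) ** 2
    return math.sqrt(sq)
```

Only one Cholesky factorization per matrix is needed here and no eigendecomposition or matrix logarithm, which is in line with the efficiency argument the abstract makes for larger matrix dimensions.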

PaperID: 46,

Authors: Nian Wang, Zhigao Cui, Aihua Li, Yihang Lu, Rong Wang, Feiping Nie

Affiliations: Xi’an Research Institute of High-Tech, Xi’an, China; School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, Shaanxi, China

Title: Structured Doubly Stochastic Graph-Based Clustering

Abstract:
Graph-based clustering is a hot topic in machine learning, whose effectiveness highly relies on the quality of the learned graph. Recent research has preferred to learn the nearest doubly stochastic approximation of a graph to suppress intercluster connections and enhance intracluster connections and thus improve clustering performance. However, the current paradigm is limited by three key problems: 1) it is restricted by a predefined graph; 2) the separated stages of the spectral decomposition-based approach (graph learning, spectral embedding learning, and cluster assignment by k-means) cause mismatch problems and randomness; and 3) the optimization of doubly stochastic conditions is generally achieved by the von Neumann successive projection (VNSP) lemma, which separates the conditions to form two subproblems for alternating optimization, converging only to a feasible solution. To solve these problems, in this article, a novel structured doubly stochastic graph-based clustering model termed SDSGC is proposed, which learns a structured doubly stochastic graph from data to directly provide cluster indicators. For optimization, a simple but effective augmented Lagrangian multiplier (ALM)-based method is proposed, which optimizes all the doubly stochastic conditions simultaneously to obtain the optimal solution. Experiments on one toy dataset and eight ad hoc noised face datasets have demonstrated that the proposed SDSGC is more robust to noise. Furthermore, a quantitative comparison on ten benchmarks has verified that our SDSGC achieves better clustering performance when compared with SOTA methods. The code is available at https://github.com/NianWang-HJJGCDX/SDSGC.git.

PaperID: 47,

Authors: Luca Savant Aira, Diego Valsesia, Enrico Magli

Affiliations: Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy

Title: Modeling Uncertainty for Gaussian Splatting

Abstract:
We present stochastic Gaussian splatting (SGS): the first framework for uncertainty estimation using Gaussian splatting (GS). GS recently advanced the novel-view synthesis field by achieving impressive reconstruction quality at a fraction of the computational cost of neural radiance fields (NeRFs). However, contrary to the latter, it still lacks the ability to provide information about the confidence associated with its outputs. To address this limitation, in this brief, we introduce a variational inference (VI)-based approach that seamlessly integrates uncertainty prediction into the common rendering pipeline of GS. In addition, we introduce the area under sparsification error (AUSE) as a new term in the loss function, enabling optimization of uncertainty estimation alongside image reconstruction. Experimental results on three different datasets demonstrate that our method outperforms existing approaches in terms of both image rendering quality and uncertainty estimation accuracy. Overall, our framework equips practitioners with valuable insights into the reliability of synthesized views, facilitating safer decision-making in real-world applications.
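
For readers unfamiliar with AUSE, the sketch below shows how the quantity is commonly computed as an evaluation metric. It is an illustration under stated assumptions, not the loss term used in SGS: the abstract does not give the authors' formulation, and a training loss would need a differentiable variant. Here the area is approximated by the mean gap between two sparsification curves on a uniform grid, with no normalization, and the function names and the `steps` parameter are hypothetical.

```python
def sparsification_curve(errors, ranking_scores, steps=10):
    """Mean error of the pixels kept after removing the top-ranked k/steps fraction.

    Pixels are ranked by `ranking_scores` in descending order, so the pixels
    believed to be worst are removed first.
    """
    order = sorted(range(len(errors)), key=lambda i: ranking_scores[i], reverse=True)
    sorted_errors = [errors[i] for i in order]
    n = len(sorted_errors)
    curve = []
    for k in range(steps):
        kept = sorted_errors[(n * k) // steps:]
        curve.append(sum(kept) / len(kept))
    return curve


def ause(errors, uncertainties, steps=10):
    """Approximate area between the uncertainty-ranked and oracle curves."""
    by_uncertainty = sparsification_curve(errors, uncertainties, steps)
    # The oracle ranks pixels by their true error, the best possible ordering.
    oracle = sparsification_curve(errors, errors, steps)
    gaps = [u - o for u, o in zip(by_uncertainty, oracle)]
    return sum(gaps) / steps
```

A value of zero means the predicted uncertainty orders the pixels exactly as the true error does; larger values indicate a poorer match between uncertainty and error.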

PaperID: 48,

Authors: Yazhou Ren, Jingyu Pu, Zhimeng Yang, Jie Xu, Guofeng Li, Xiaorong Pu, Philip S. Yu, Lifang He

Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA; Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA

Title: Deep Clustering: A Comprehensive Survey

Abstract:
Cluster analysis plays an indispensable role in machine learning and data mining. Learning a good data representation is crucial for clustering algorithms. Recently, deep clustering (DC), which can learn clustering-friendly representations using deep neural networks (DNNs), has been broadly applied in a wide range of clustering tasks. Existing surveys of DC mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this article, we provide a comprehensive survey of DC from the perspective of data sources. With different data sources, we systematically distinguish the clustering methods in terms of methodology, prior knowledge, and architecture. Concretely, DC methods are introduced according to four categories, i.e., traditional single-view DC, semi-supervised DC, deep multiview clustering (MVC), and deep transfer clustering. Finally, we discuss the open challenges and potential future opportunities in different fields of DC.

PaperID: 49,

Authors: Jing Sun, Shuo Chen, Cong Zhang, Yining Ma, Jie Zhang

Affiliations: School of Computer Science and Engineering, Nanyang Technological University, Jurong West, Singapore; Beijing Institute for General Artificial Intelligence, Beijing, China

Title: Decision-Making With Speculative Opponent Models

Abstract:
Opponent modeling has proven effective in enhancing the decision-making of the controlled agent by constructing models of opponent agents. However, existing methods often rely on access to the observations and actions of opponents, a requirement that is infeasible when such information is either unobservable or challenging to obtain. To address this issue, we introduce distributional opponent-aided multiagent actor–critic (DOMAC), the first speculative opponent modeling algorithm that relies solely on local information (i.e., the controlled agent’s observations, actions, and rewards). Specifically, the actor maintains a speculated belief about the opponents using the tailored speculative opponent models that predict the opponents’ actions using only local information. Moreover, DOMAC features distributional critic models that estimate the return distribution of the actor’s policy, yielding a more fine-grained assessment of the actor’s quality. This, in turn, more effectively guides the training of the speculative opponent models that the actor depends upon. Furthermore, we formally derive a policy gradient theorem with the proposed opponent models. Extensive experiments on eight challenging multiagent benchmark tasks within MPE, Pommerman, and the StarCraft multiagent challenge (SMAC) demonstrate that our DOMAC successfully models opponents’ behaviors and delivers superior performance against state-of-the-art (SOTA) methods with a faster convergence speed.

PaperID: 50,

Authors: Zheng Wang, Zhenwei Gao, Yang Yang, Guoqing Wang, Chengbo Jiao, Heng Tao Shen

Affiliations: College of Electronic and Information Engineering, Tongji University, Shanghai, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

Title: Geometric Matching for Cross-Modal Retrieval

Abstract:
Despite its significant progress, cross-modal retrieval still suffers from one-to-many matching cases, where a given query could match multiple semantic instances in another modality. However, existing approaches usually map heterogeneous data into the learned space as deterministic point vectors. In spite of their remarkable performance in matching the most similar instance, such deterministic point embeddings suffer from the insufficient representation of rich semantics in one-to-many correspondence. To address the limitations, we intuitively extend a deterministic point into a closed geometry and develop geometric representation learning methods for cross-modal retrieval. Thus, a set of points inside such a geometry could be semantically related to many candidates, and we could effectively capture the semantic uncertainty. We then introduce two types of geometric matching for one-to-many correspondence, i.e., point-to-rectangle matching (dubbed P2RM) and rectangle-to-rectangle matching (termed R2RM). The former treats all retrieved candidates as rectangles with zero volume (equivalent to points) and the query as a box, while the latter encodes all heterogeneous data into rectangles. Therefore, we could evaluate semantic similarity among heterogeneous data by the Euclidean distance from a point to a rectangle or the volume of intersection between two rectangles. Additionally, both strategies could be easily employed for off-the-shelf approaches and further improve the retrieval performance of baselines. Under various evaluation metrics, extensive experiments and ablation studies on several commonly used datasets, two for image-text matching and two for video-text retrieval, demonstrate the effectiveness and superiority of our methods.
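
The two geometric scores named in this abstract can be illustrated with axis-aligned boxes. The following is a minimal sketch rather than the authors' implementation: it assumes that each rectangle is stored as a pair of lower and upper corner vectors, and the function names `point_to_box_distance` and `box_intersection_volume` are hypothetical. How P2RM and R2RM turn these quantities into training losses is not described in the abstract and is not shown here.

```python
import math
from typing import Sequence


def point_to_box_distance(point: Sequence[float],
                          lower: Sequence[float],
                          upper: Sequence[float]) -> float:
    """Euclidean distance from a point to an axis-aligned box (0.0 if inside)."""
    sq = 0.0
    for p, lo, hi in zip(point, lower, upper):
        # Per-dimension gap: how far the coordinate lies outside [lo, hi].
        gap = max(lo - p, 0.0, p - hi)
        sq += gap * gap
    return math.sqrt(sq)


def box_intersection_volume(lower_a: Sequence[float], upper_a: Sequence[float],
                            lower_b: Sequence[float], upper_b: Sequence[float]) -> float:
    """Volume of the overlap between two axis-aligned boxes (0.0 if disjoint)."""
    volume = 1.0
    for la, ua, lb, ub in zip(lower_a, upper_a, lower_b, upper_b):
        overlap = min(ua, ub) - max(la, lb)
        if overlap <= 0.0:
            return 0.0
        volume *= overlap
    return volume
```

A candidate embedded as a point inside the query box has distance zero, so every such candidate matches the query equally well, which is one way to read the one-to-many behavior the abstract describes.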

PaperID: 51,

Authors: Yongsik Jin, Sang-Moon Lee

Affiliations: Robotics and Mobility Research Section, Electronics and Telecommunications Research Institute (ETRI), Daegu, South Korea; Cyber Physical Systems and Control Laboratory, School of Electronics Engineering, Kyungpook National University, Daegu, South Korea

Title: Sampled-Data State Estimation for LSTM

Abstract:
This article first introduces a sampled-data state estimator design method for continuous-time long short-term memory (LSTM) neural networks with irregularly sampled output. To this end, the structure of the LSTM is addressed to obtain its dynamic equation. As a result, the LSTM neural network is modeled as a continuous-time linear parameter-varying system that is dependent on the gate units. For this system, the sampled-data Luenberger- and Arcak-type state estimator design methods are presented in terms of linear matrix inequalities (LMIs) by using the properties of the gate units. Lastly, the proposed method is not only illustrated with a numerical example for analyzing absolute stability but also demonstrated in practice by applying it to a pre-trained behavior generation model of a robot manipulator.

PaperID: 52,

Authors: Kun Hu, Yingyuan Xiao, Wenguang Zheng, Wenxin Zhu, Ching-Hsien Hsu

Affiliations: School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; College of Basic Science, Tianjin Agricultural University, Tianjin, China; College of Information and Electrical Engineering, Asia University, Taichung, Taiwan

Title: Multiview Large Margin Distribution Machine

Abstract:
Margin distribution has been proven to play a crucial role in improving generalization ability. In recent studies, many methods are designed using large margin distribution machine (LDM), which combines margin distribution with support vector machine (SVM), such that a better performance can be achieved. However, these methods are usually proposed based on single-view data and ignore the connection between different views. In this article, we propose a new multiview margin distribution model, called MVLDM, which constructs both multiview margin mean and variance. Besides, a framework is proposed to achieve multiview learning (MVL). MVLDM provides a new way to explore the utilization of complementary information in MVL from the perspective of margin distribution and satisfies both the consistency principle and the complementarity principle. In the theoretical analysis, we used Rademacher complexity theory to analyze the consistency error bound and generalization error bound of the MVLDM. In the experiments, we constructed a new performance metric, the view consistency rate (VCR), for the characteristics of multiview data. The effectiveness of MVLDM was evaluated using both VCR and other traditional performance metrics. The experimental results show that MVLDM is superior to other benchmark methods.
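
As a small illustration of the two quantities that margin-distribution methods optimize, the sketch below computes the margin mean and margin variance of a linear scorer on one view. It is not the MVLDM formulation: the abstract does not state how the statistics are combined across views, so the code only evaluates them per view, and the function name `margin_statistics` and its parameters are hypothetical.

```python
from statistics import fmean, pvariance


def margin_statistics(weights, bias, samples, labels):
    """Return (margin mean, margin variance) of a linear scorer on one view.

    The margin of a sample x with label y in {-1, +1} is y * (w . x + b).
    """
    margins = [
        y * (sum(w * x for w, x in zip(weights, xs)) + bias)
        for xs, y in zip(samples, labels)
    ]
    return fmean(margins), pvariance(margins)
```

For multiview data, one could call the function once per view with that view's features and weights; a large mean with a small variance indicates that most samples sit comfortably on the correct side of the decision boundary.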

PaperID: 53,

Authors: Yang Tan, Enming Zhang, Yang Li, Shao-Lun Huang, Xiao-Ping Zhang

Affiliations: Shenzhen Key Laboratory of Ubiquitous Data Enabling, Shenzhen International Graduate School, Tsinghua University, Shenzhen, China

Title: Transferability-Guided Cross-Domain Cross-Task Transfer Learning

Abstract:
We propose two novel transferability metrics, fast optimal transport-based conditional entropy (F-OTCE) and joint correspondence OTCE (JC-OTCE), to evaluate how much the source model (task) can benefit the learning of the target task and to learn more generalizable representations for cross-domain cross-task transfer learning. Unlike the original OTCE metric that requires evaluating the empirical transferability on auxiliary tasks, our metrics are auxiliary-free such that they can be computed much more efficiently. Specifically, F-OTCE estimates transferability by first solving an optimal transport (OT) problem between source and target distributions and then using the optimal coupling to compute the negative conditional entropy (NCE) between the source and target labels. It can also serve as an objective function to enhance downstream transfer learning tasks including model finetuning and domain generalization (DG). Meanwhile, JC-OTCE improves the transferability accuracy of F-OTCE by including label distances in the OT problem, though it incurs additional computation costs. Extensive experiments demonstrate that F-OTCE and JC-OTCE outperform state-of-the-art auxiliary-free metrics by 21.1% and 25.8%, respectively, in correlation coefficient with the ground-truth transfer accuracy. By eliminating the training cost of auxiliary tasks, the two metrics reduce the total computation time of the previous method from 43 min to 9.32 and 10.78 s, respectively, for a pair of tasks. When applied in the model finetuning and DG tasks, F-OTCE shows significant improvements in the transfer accuracy in few-shot classification experiments, with up to 4.41% and 2.34% accuracy gains, respectively.
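
The two-step F-OTCE computation described in this abstract (solve an OT problem, then compute the NCE of the labels under the resulting coupling) can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses an entropy-regularized Sinkhorn solver with uniform sample weights and squared Euclidean costs, none of which the abstract specifies, and the names `sinkhorn_coupling`, `negative_conditional_entropy`, `eps`, and `n_iter` are hypothetical.

```python
import math
from collections import defaultdict


def sinkhorn_coupling(xs, xt, eps=1.0, n_iter=200):
    """Entropy-regularized OT coupling between two feature sets (uniform weights)."""
    n, m = len(xs), len(xt)
    # Gibbs kernel built from squared Euclidean costs.
    kernel = [[math.exp(-sum((a - b) ** 2 for a, b in zip(s, t)) / eps)
               for t in xt] for s in xs]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def negative_conditional_entropy(coupling, ys, yt):
    """Return -H(Y_t | Y_s) under the joint label distribution induced by the coupling."""
    joint = defaultdict(float)
    source_marginal = defaultdict(float)
    for i, row in enumerate(coupling):
        for j, mass in enumerate(row):
            joint[(ys[i], yt[j])] += mass
            source_marginal[ys[i]] += mass
    return sum(p * math.log(p / source_marginal[a])
               for (a, _), p in joint.items() if p > 0.0)
```

The score is at most zero. Values near zero mean that the source label almost determines the target label under the coupling, which is read as high transferability; more negative values mean the source labels carry little information about the target labels.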

PaperID: 54,

Authors: Yu Duan, Zhoumin Lu, Rong Wang, Xuelong Li, Feiping Nie

Affiliations: School of Computer Science, the School of Artificial Intelligence, Optics and Electronics (iOPEN), and the Key Laboratory of Intelligent Interaction and Applications (Ministry of Industry and Information Technology), Northwestern Polytechnical University, Xi’an, China

Title: Toward Balance Deep Semisupervised Clustering

Abstract:
The goal of balanced clustering is to partition data into distinct groups of equal size. Previous studies have attempted to address this problem by designing balanced regularizers or utilizing conventional clustering methods. However, these methods often rely solely on classic methods, which limits their performance, and primarily focus on low-dimensional data. Although neural networks exhibit effective performance on high-dimensional datasets, they struggle to effectively leverage prior knowledge for clustering with a balanced tendency. To overcome the above limitations, we propose deep semisupervised balanced clustering, which simultaneously learns clustering and generates balance-favorable representations. Our model is based on the autoencoder paradigm incorporating a semisupervised module. Specifically, we introduce a balance-oriented clustering loss and incorporate pairwise constraints into the penalty term as a pluggable module using the Lagrangian multiplier method. Theoretically, we ensure that the proposed model maintains a balanced orientation and provide a comprehensive optimization process. Empirically, we conducted extensive experiments on four datasets to demonstrate significant improvements in clustering performance and balanced measurements. Our code is available at https://github.com/DuannYu/BalancedSemi-TNNLS.

PaperID: 55,

Authors: Ekanut Sotthiwat, Liangli Zhen, Chi Zhang, Zengxiang Li, Rick Siow Mong Goh

Affiliations: Department of Computer Science, National University of Singapore, Queenstown, Singapore; Institute of High Performance Computing, Agency for Science Technology and Research (A*STAR), Connexis North Tower, Singapore; ENN Group, ENNEW Digital Research Institute, Beijing, China

Title: Generative Image Reconstruction From Gradients

Abstract:
In this article, we propose a method, generative image reconstruction from gradients (GIRG), for recovering training images from gradients in a federated learning (FL) setting, where privacy is preserved by sharing model weights and gradients rather than raw training data. Previous studies have shown the potential for revealing clients’ private information or even pixel-level recovery of training images from shared gradients. However, existing methods are limited to low-resolution images and small batch sizes (BSs) or require prior knowledge about the client data. GIRG utilizes a conditional generative model to reconstruct training images and their corresponding labels from the shared gradients. Unlike previous generative model-based methods, GIRG does not require prior knowledge of the training data. Furthermore, GIRG optimizes the weights of the conditional generative model to generate highly accurate “dummy” images instead of optimizing the input vectors of the generative model. Comprehensive empirical results show that GIRG is able to recover high-resolution images with large BSs and can even recover images from the aggregation of gradients from multiple participants. These results reveal the vulnerability of current FL practices and call for immediate efforts to prevent inversion attacks in gradient-sharing-based collaborative training.

PaperID: 56,

Authors: Shifei Ding, Chao Li, Xiao Xu, Lili Guo, Ling Ding, Xindong Wu

Affiliations: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China; College of Computing and Intelligence, Tianjin University, Tianjin, China; Research Center for Knowledge Engineering, Zhejiang Laboratory, Hangzhou, China

Title: Horizontal Federated Density Peaks Clustering

Abstract:
Density peaks clustering (DPC) is a popular clustering algorithm, which has been studied and favored by many scholars because of its simplicity, small number of parameters, and lack of iteration. However, previous improvements of DPC have not considered the leakage of private data, and the “Domino” effect caused by the misallocation of noncenters has not been effectively addressed. In view of the above shortcomings, a horizontal federated DPC (HFDPC) is proposed. First, HFDPC introduces the idea of horizontal federated learning and proposes a protection mechanism for client parameter transmission. Second, DPC is improved by using similar density chain (SDC) to alleviate the “Domino” effect caused by multiple local peaks in the flow pattern dataset. Finally, a novel data dimension reduction and image encryption are used to improve the effectiveness of data partitioning. The experimental results show that compared with DPC and some of its improvements, HFDPC has a certain degree of improvement in accuracy and speed.
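
For readers unfamiliar with the baseline, the sketch below shows classic (centralized) DPC only, not HFDPC or its similar density chain. The Gaussian-kernel density, the cutoff `d_c`, and the function name are assumptions chosen for illustration. The final loop, where each noncenter inherits the label of its nearest higher-density neighbor, is the step through which one misallocation propagates to every point chained behind it, i.e., the “Domino” effect mentioned above.

```python
import math

def density_peaks(points, d_c, n_clusters):
    """Classic DPC sketch: local density rho, separation delta, chained labels."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Gaussian-kernel local density; the self term exp(0) = 1 is subtracted.
    rho = [sum(math.exp(-(d / d_c) ** 2) for d in row) - 1.0 for row in dist]
    order = sorted(range(n), key=lambda i: -rho[i])  # decreasing density
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        # Nearest point among those ranked as denser than point i.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    gamma = [rho[i] * delta[i] for i in range(n)]
    gamma[order[0]] = math.inf  # the globally densest point is always a center
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each noncenter copies the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Because points are visited in order of decreasing density, the parent of every noncenter is already labeled when it is reached, so a single pass suffices and no iteration is needed.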

PaperID: 57,

Authors: Renyao Chen, Junye Lei, Hong Yao, Tailong Li, Shengwen Li

Affiliations: School of Computer Science, China University of Geosciences, Wuhan, China; School of Computer Science and the State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan, China; School of Future Technology, China University of Geosciences, Wuhan, China

Title: Anchor-Enhanced Geographical Entity Representation Learning

Abstract:
Geographical entity representation learning (GERL) aims to embed geographical entities into a low-dimensional vector space, which provides a generalized approach for utilizing geographical entities to serve various geographical intelligence applications. In practice, the spatial distribution of geographical entities is highly unbalanced; thus, it is challenging to embed them accurately. Previous GERL models treated all geographical entities uniformly, resulting in insufficient entity representations. To address this issue, this article proposes an anchor-enhanced GERL (AE-GERL) model, which utilizes the key informative entities as anchors to improve the representations of geographical entities. Specifically, AE-GERL develops an anchor selection algorithm to identify anchors from large-scale geographical entities based on their spatial distribution and entity types. To utilize anchors to guide geographical entities, AE-GERL constructs an anchor-enhanced graph to establish explicit connections between anchors and nonanchor entities. Finally, a graph neural network (GNN) based anchor to nonanchor node learning model is designed to impute missing information of nonanchor entities. Extensive experiments are conducted on four datasets, and the experimental results demonstrate that AE-GERL outperforms the baseline models in both sparse and dense scenarios. This study provides a methodological reference for embedding geographical entities in various geographical applications and also provides an effective approach to improve the performance of message-passing-based GNN models.

PaperID: 58,

Authors: Mohammad Askarizadeh, Alireza Morsali, Kim Khoa Nguyen

Affiliations: Department of Electrical Engineering, École de Technologie Supérieure (ÉTS), University of Quebec, Montreal, QC, Canada; Independent Researcher, Vancouver, BC, Canada

Title: Resource-Constrained Multisource Instance-Based Transfer Learning

Abstract:
In today’s machine learning (ML), the need for vast amounts of training data has become a significant challenge. Transfer learning (TL) offers a promising solution by leveraging knowledge across different domains/tasks, effectively addressing data scarcity. However, TL encounters computational and communication challenges in resource-constrained scenarios, and negative transfer (NT) can arise from specific data distributions. This article presents a novel focus on maximizing the accuracy of instance-based TL in multisource resource-constrained environments while mitigating NT, a key concern in TL. Previous studies have overlooked the impact of resource consumption in addressing the NT problem. To address these challenges, we introduce an optimization model named multisource resource-constrained optimized TL (MSOPTL), which employs a convex combination of empirical sources and target errors while considering feasibility and resource constraints. Moreover, we enhance one of the generalization error upper bounds in domain adaptation setting by demonstrating the potential to substitute the HΔH divergence with the Kullback–Leibler (KL) divergence. We utilize this enhanced error upper bound as one of the feasibility constraints of MSOPTL. Our suggested model can be applied as a versatile framework for various ML methods. Our approach is extensively validated in a neural network (NN)-based classification problem, demonstrating the efficiency of MSOPTL in achieving the desired trade-offs between TL’s benefits and associated costs. This advancement holds tremendous potential for enhancing edge artificial intelligence (AI) applications in resource-constrained environments.

PaperID: 59,

Authors: Chuan Ma, Yingwei Zhang, Chun-Yi Su

Affiliations: State Laboratory of Synthesis Automation of Process Industry, Northeastern University, Shenyang, China

Title: Graph-Based Multicentroid Nonnegative Matrix Factorization

Abstract:
Nonnegative matrix factorization (NMF) is a widely recognized approach for data representation. When it comes to clustering, NMF fails to handle data points located in complex geometries, as each sample cluster is represented by a single centroid. In this article, a novel multicentroid-based clustering method called graph-based multicentroid NMF (MCNMF) is proposed. First, the method constructs the neighborhood connection graph between data points and centroids so that each data point is represented by adjacent centroids, which preserves the local geometric structure. Second, the method constructs an undirected connected graph with centroids as nodes, in which the centroids are divided into different centroid clusters; on this basis, a novel data clustering method based on MCNMF is proposed. In addition, the membership index matrix is reconstructed based on the obtained centroid clusters, which solves the problem of membership identification of the final sample. Extensive experiments conducted on synthetic datasets and real benchmark datasets illustrate the effectiveness of the proposed MCNMF method. Compared with single-centroid-based methods, MCNMF obtains the best experimental results.

PaperID: 60,

Authors: Chunjiang Ge, Rui Huang, Mixue Xie, Zihang Lai, Shiji Song, Shuang Li, Gao Huang

Affiliations: Department of Automation, Tsinghua University, Beijing, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Department of Engineering Science, University of Oxford, Oxford, U.K.

Title: Domain Adaptation via Prompt Learning

Abstract:
Unsupervised domain adaptation (UDA) aims to adapt models learned from a well-annotated source domain to a target domain, where only unlabeled samples are given. Current UDA approaches learn domain-invariant features by aligning source and target feature spaces through statistical discrepancy minimization or adversarial training. However, these constraints could lead to the distortion of semantic feature structures and loss of class discriminability. In this article, we introduce a novel prompt learning paradigm for UDA, named domain adaptation via prompt learning (DAPrompt). In contrast to prior works, our approach learns the underlying label distribution for the target domain rather than aligning domains. The main idea is to embed domain information into prompts, a form of representation generated from natural language, which is then used to perform classification. This domain information is shared only by images from the same domain, thereby dynamically adapting the classifier according to each domain. By adopting this paradigm, we show that our model not only outperforms previous methods on several cross-domain benchmarks but also is very efficient to train and easy to implement.

PaperID: 61,

Authors: Zhiqiang Kou, Jing Wang, Yuheng Jia, Biao Liu, Xin Geng

Affiliations: School of Computer Science and Engineering and the Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, Southeast University, Nanjing, China

Title: Instance-Dependent Inaccurate Label Distribution Learning

Abstract:
Label distribution learning (LDL) is a novel learning paradigm that assigns a label distribution to each instance. Although many specialized LDL algorithms have been proposed, few of them have noticed that the obtained label distributions are generally inaccurate and noisy due to the difficulty of annotation. Besides, existing LDL algorithms have overlooked that the noise in the inaccurate label distributions generally depends on instances. In this article, we identify the instance-dependent inaccurate LDL (IDI-LDL) problem and propose a novel algorithm called low-rank and sparse LDL (LRS-LDL). First, we assume that the inaccurate label distribution consists of the ground-truth label distribution and instance-dependent noise. Then, we learn a low-rank linear mapping from instances to the ground-truth label distributions and a sparse mapping from instances to the instance-dependent noise. In the theoretical analysis, we establish a generalization bound for LRS-LDL. Finally, in the experiments, we demonstrate that LRS-LDL can effectively address the IDI-LDL problem and outperform existing LDL methods.

PaperID: 62,

Authors: Mehran Mazandarani, Jianfei Pan

Affiliations: Department of Mechatronics and Control Engineering, Shenzhen University, Shenzhen, China

Title: The Q-Fractionalism Reasoning Learning Method

Abstract:
As the title suggests, in this work, a modern machine learning method called the Q-fractionalism reasoning is introduced. The proposed method is founded upon a synergy of the Q-learning and fractional fuzzy inference systems (FFISs). Unlike other approaches, the Q-fractionalism reasoning not only incorporates the knowledge base to understand how to perform but also explores a reasoning mechanism from the fractional order to justify what it has performed. This method suggests that the agent choose actions aimed at the characterization of reasoning. In fact, the agent deals with states termed as primary and secondary fuzzy states. The primary fuzzy states are unobservable and uncertain, for which the agent chooses actions. However, the projection of primary fuzzy states onto the knowledge base results in secondary fuzzy states, which are observable by the agent, allowing it to detect primary fuzzy states with degrees of detectability. With a practical experiment implemented on a linear switched reluctance motor (LSRM), the results demonstrate that the application of the Q-fractionalism reasoning in the real-time position control of the LSRM leads to a remarkable improvement of about 70% in the accuracy of the control objective compared with a typical fuzzy inference system (FIS) under the same setting.

PaperID: 63,

Authors: Yen-Lung Lai, Zhe Jin

Affiliations: State Key Laboratory of Opto-Electronic Information Acquisition and Protection Technology, Anhui Provincial Key Laboratory of Secure Artificial Intelligence, Anhui Provincial International Joint Research Center for Advanced Technology in Medical Imaging, and the School of Artificial Intelligence, Anhui University, Hefei, China

Title: Wormhole Dynamics in Deep Neural Networks

Abstract:
This work investigates the generalization behavior of deep neural networks (DNNs), focusing on the phenomenon of “fooling examples,” where DNNs confidently classify inputs that appear random or unstructured to humans. To explore this phenomenon, we introduce an analytical framework based on maximum likelihood estimation (MLE), without adhering to conventional numerical approaches that rely on gradient-based optimization and explicit labels. Our analysis reveals that DNNs operating in an overparameterized regime exhibit a collapse in the output feature space. While this collapse improves network generalization, adding more layers eventually leads to a state of degeneracy, where the model learns trivial solutions by mapping distinct inputs to the same output, resulting in zero loss. Further investigation demonstrates that this degeneracy can be bypassed using our newly derived “wormhole” solution. The wormhole solution, when applied to arbitrary fooling examples, reconciles meaningful labels with random ones and provides a novel perspective on shortcut learning. These findings offer deeper insights into DNN generalization and highlight directions for future research on learning dynamics in unsupervised settings to bridge the gap between theory and practice.

PaperID: 64,

Authors: Yaxuan Hu, Jie Hua, Zhen Han, Hua Zou, Gang Wu, Zhongyuan Wang

Affiliations: National Engineering Research Center for Multimedia Software (NERCMS), School of Computer Science, Wuhan University, Wuhan, China; College of Cyber Security, Tarim University, Alar, China

Title: DiffusionMOT: A Diffusion-Based Multiple Object Tracker

Abstract:
Recently, researchers have introduced diffusion models into multiple object tracking (MOT) tasks. However, existing diffusion-based MOT methods, such as DiffusionTrack, have significant limitations, including frequent ID switching, reduced performance when tracking nonlinear motion objects, and long inference time. To this end, we propose a more effective diffusion-based multiple object tracker named DiffusionMOT. In particular, we propose a mixed intersection over union (IoU) and Re-Identification (ReID) method for trajectory matching, which effectively reduces incorrect matches. Meanwhile, we propose a secondary calibration method for trajectory boxes, improving the accuracy of the generated detection boxes. Moreover, we introduce the parallel sampling technique from the field of image generation into object tracking and propose a parallel sampling module to enhance the model’s inference speed while maintaining tracking accuracy. Furthermore, we design a pair-based two-stage matching (PTM) pipeline to more effectively utilize potential detection information. Extensive experiments on several public MOT benchmarks, including DanceTrack, SportsMOT, MOT20, and MOT17, demonstrate that our approach achieves state-of-the-art (SOTA) performance. The code and models are available at https://github.com/sad123-yx/DiffusionMOT
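
The idea of mixing IoU with ReID cues during trajectory matching can be illustrated with a small sketch. It is not taken from the DiffusionMOT code: the weighted-sum cost, the weight `w_iou`, the gate `max_cost`, the greedy assignment, and all function names are assumptions made for illustration (trackers commonly use an optimal assignment solver instead of a greedy one).

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def cosine(u, v):
    """Cosine similarity of two appearance (ReID) embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def match(tracks, detections, w_iou=0.5, max_cost=0.7):
    """Greedy matching on a mixed cost; items are (box, embedding) pairs."""
    pairs = []
    for ti, (tbox, temb) in enumerate(tracks):
        for di, (dbox, demb) in enumerate(detections):
            # Assumed mixing rule: weighted sum of IoU and appearance distances.
            cost = w_iou * (1.0 - iou(tbox, dbox)) + (1.0 - w_iou) * (1.0 - cosine(temb, demb))
            pairs.append((cost, ti, di))
    pairs.sort()
    used_t, used_d, matches = set(), set(), []
    for cost, ti, di in pairs:
        if cost > max_cost:
            break  # remaining pairs are all too dissimilar
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    return matches
```

When two objects overlap so that their boxes have nearly equal IoU with every candidate, the appearance term breaks the tie, which is one way such a mixed cost can reduce incorrect matches and ID switches.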

PaperID: 65,

Authors: Xinyu Su, Xiwen Wang, Dezhong Peng, Xiaomin Song, Huiming Zheng, Zhong Yuan

Affiliations: College of Computer Science, Sichuan University, Chengdu, China; Sichuan National Innovation New Vision UHD Video Technology Company Ltd., Chengdu, China

Title: Identifying Outliers via Local Granular-Ball Density

Abstract:
Existing density-based outlier detection methods process data at the single-granularity level of individual samples, requiring pairwise distance calculations between all samples and exhibiting high sensitivity to noise. The single-granularity-based processing paradigm fails to mine the information at multiple levels of granularity in data, and most of these methods ignore the potential uncertainty information in data, such as fuzziness, resulting in an inability to effectively detect potential outliers in data. As a novel granular computing method, Granular-Ball Computing (GBC) is characterized by its multi-granularity and robustness, which allows it to compensate for the above drawbacks. In this study, we propose local Granular-Ball Density-based Outlier (GBDO) detection to improve the performance of the density-based methods. In GBDO, we first identify the k-similarity Granular-Ball (GB) neighborhoods of each GB via the fuzzy relations among them. Subsequently, the local reachability similarity density of the GBs is calculated through the reachability similarity we defined. Finally, the local GB outlier factors of the samples are calculated based on the local reachability similarity density of the GBs. We adopt a multi-granularity processing paradigm using GBs as the basic units, which reduces computational complexity and improves robustness to noisy data by leveraging the multi-granularity nature of GBs. The experimental results demonstrate the effectiveness of GBDO by comparing it with state-of-the-art methods. The source code and datasets are publicly available at https://github.com/Mxeron/GBDO.

PaperID: 66,

Authors: Rudolf J. Szadkowski, Jan Faigl

Affiliations: Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University, Prague, Czechia

Title: Lifelong Active Inference of Gait Control

Abstract:
Sustaining the robot’s longevity becomes challenging in dynamic deployments characterized by new unknown environments and embodiments outside of the prior knowledge. Hence, the knowledge of robot-environment interactions needs to be continually updated for system adaptation. It can be implemented through self-verification as a continual comparison of predictions with observations using the predictive coding (PC) principle. The principle has been further extended into the active inference control (AIC) in biomimetic robotics to drive the control, state estimation, and model update. However, continually updating one model leads to catastrophic forgetting in the long term. Therefore, we propose an autonomously expanding self-verifying world model (WM) of sensorimotor dynamics utilized in model-based gait control. The model combines PC with the incremental knowledge representation based on the internal model (IM) principle. The proposed method is experimentally validated in virtual and real scenarios, where the hexapod walking robot has to recognize and adapt to leg paralysis and then recognize the recovery. The method generates novel behaviors in real time, improving the performance and outperforming the examined state-of-the-art methods. Furthermore, the robot’s decisions and gained knowledge are interpretable and promise further functional scalability.

PaperID: 67,

Authors: Jiaping Xiao, Rangya Zhang, Yuhang Zhang, Mir Feroskhan

Affiliations: School of Mechanical and Aerospace Engineering, Nanyang Technological University, Jurong West, Singapore

Title: Vision-Based Learning for Drones: A Survey

Abstract:
Drones, as advanced cyber-physical systems (CPSs), are undergoing a transformative shift with the advent of vision-based learning, a field that is rapidly gaining prominence due to its profound impact on drone autonomy and functionality. Unlike existing task-specific surveys, this work offers a comprehensive overview of vision-based learning for drones, emphasizing its pivotal role in enhancing their operational capabilities across various scenarios. First, the fundamental principles of vision-based learning are elucidated, demonstrating how it significantly improves drones’ visual perception and decision-making processes. Vision-based control methods are then categorized into indirect, semidirect, and end-to-end approaches from the perception-control perspective. Various applications of vision-based drones with learning capabilities are further explored, ranging from single-agent systems to more complex multiagent and heterogeneous system scenarios, while highlighting the challenges and innovations characterizing each domain. Finally, open questions and potential solutions are discussed to guide future research and development in this dynamic and rapidly evolving field. With the growth of large language models (LLMs) and embodied intelligence, vision-based learning for drones provides a promising yet challenging road toward achieving artificial general intelligence (AGI) in the 3-D physical world.

PaperID: 68,

Authors: Runhao Li, Yongming Chen, Zhenyu Weng, Zhiping Lin, Yap-Peng Tan

Affiliations: School of Electrical and Electronic Engineering, Nanyang Technological University, Jurong West, Singapore; Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, China

Title: Class-Specific Prompt Learning for Vision-Language Models

Abstract:
The use of learning prompts to adapt pretrained vision–language models (VLMs) for downstream tasks has gained significant attention due to its potential to reduce training costs compared to model fine-tuning through few-shot learning. Most existing methods rely on a universal prompt for all classes, as it generally delivers consistent performance across various datasets. However, a universal prompt cannot capture class-specific discriminative information. To overcome this limitation, we propose class-specific prompt learning (CPL). CPL represents the context of a prompt using two components: a base vector shared among all classes and a class-specific vector designed for individual classes. This method combines the generalization ability of the base context with the adaptability of the class-specific context. Furthermore, we introduce contrastive CPL, which enhances the ability of the prompt to capture discriminative features unique to each class. Also, we adopt the self-consistency loss to regularize the base context, enhancing its generalization ability. As a result, CPL effectively learns tailored prompts for each class. Extensive experiments demonstrate that CPL achieves superior performance over existing methods in both base-class classification and new class generalization.

PaperID: 69,

Authors: Shuo Chen, Chen Gong, Jun Li, Jian Yang

Affiliations: School of Intelligence Science and Technology, Nanjing University, Nanjing, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Abstract:
Self-supervised contrastive learning (CL) seeks to learn generalizable feature representations via the self-supervision of pairwise similarities, where existing CL approaches usually build definite similarity labels (e.g., positive or negative) for model training. Yet in practice, the same pair of instances may have opposite similarity labels in different scenarios, e.g., two interclass images from CIFAR-100 can be a similar pair in CIFAR-20. Learning with definite similarities can hardly obtain an ideal representation that simultaneously characterizes the similar and dissimilar patterns (e.g., the contexts and details) between each two instances. Therefore, pairwise similarities used for CL should be agnostic, and we argue that simultaneously considering both the similarity and dissimilarity for each data pair could learn more generalizable representations. To this end, we propose similarity-agnostic CL (SACL), which generalizes the instance discrimination strategy of conventional CL to a new multiobjective programming (MOP) form. In SACL, we build multiple projection layers with corresponding regularizers to constrain the distance matrix to have different sparsity in different objectives so that we can obtain alterable pairwise distances to capture both the similarity and dissimilarity between each pair of instances. We show that SACL can be equivalently converted to a single learning objective, easily solved by stochastic optimization with convergence guarantees. Theoretically, we prove a tighter error bound than conventional CL approaches; empirically, our method improves the downstream task performance for image, text, and graph data.

PaperID: 70,

Authors: Yongcan Luo, Jiahao Zheng, Zhengjie Yang, Ning Chen, Dapeng Oliver Wu

Affiliations: Department of Computer Science, City University of Hong Kong, Hong Kong, SAR, China

Title: Pleno-Alignment Framework for Stock Trend Prediction

Abstract:
Predicting stock trends is a highly rewarding but high-risk endeavor due to the complex interplay of market dynamics, irrational behaviors, and diverse sentiments. Previous studies have used time-series analysis on historical prices or sentiment analysis on textual information. However, these methods often fail to capture the dynamic interactions between text and time-series modalities and overlook the different perspectives embedded in textual data. To address these limitations, we propose the pleno-alignment framework (PAFrame) that enhances multimodal stock information through intermodal and intramodal alignment to capture market dynamics. Our framework first integrates textual and time-series data in a shared representation space to learn modal-invariant information. To tackle divergent sentiments in textual data, we employ a contrastive learning approach to extract abstract semantic meanings from objective and subjective perspectives, thereby improving the robustness of language representations. Finally, we use a hybrid approach that explicitly combines cross-attention mechanisms to create a unified representation and utilizes prompts to implicitly guide language models with numerical financial indicators for final prediction. Our comprehensive experiments on five real-world datasets show that PAFrame outperforms existing methods in predicting stock trends.

PaperID: 71,

Authors: Xiaoye Chen, Wensheng Gan, Zefeng Chen, Jian Zhu, Ruichu Cai, Philip S. Yu

Affiliations: School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China; College of Cyber Security, Jinan University, Guangzhou, China; Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA

Title: Toward Targeted Mining of RFM Patterns

Abstract:
In today’s era of information overload, leveraging data mining techniques to understand and analyze customer behavior has become essential for businesses. Among these techniques, the recency, frequency, and monetary value analysis model serves as a powerful tool for customer segmentation, enabling companies to identify high-value customers. However, traditional recency, frequency, and monetary (RFM) models do not focus on user-specific targets, often struggling to meet the increasing demands for personalization and efficiency. To address this challenge, this article introduces the concept of target RFM patterns, which must satisfy the three dimensions of recency, frequency, and utility while aligning with user interests. Based on this concept, we formulate the problem of mining target RFM patterns. More importantly, we define a mining order, called TaRFM order, and propose an efficient algorithm called TaRFM. This new algorithm is optimized through three pruning strategies based on the TaRFM order, which not only eliminates a significant number of invalid operations, thereby reducing pattern generation, but also accurately extracts all TaRFM patterns without requiring postprocessing techniques. Finally, extensive experiments conducted on multiple datasets demonstrate the accuracy and efficiency of the TaRFM algorithm.
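
As background for the three dimensions named above, the sketch below computes a classic per-customer RFM summary. It does not implement TaRFM, its mining order, or its pruning strategies; the transaction layout `(customer_id, date, amount)` and the function name are assumptions for illustration.

```python
from datetime import date

def rfm_table(transactions, today):
    """Map each customer to (recency in days, frequency, monetary total).

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    """
    table = {}
    for cust, day, amount in transactions:
        last, freq, money = table.get(cust, (day, 0, 0.0))
        table[cust] = (max(last, day), freq + 1, money + amount)
    return {c: ((today - last).days, f, m) for c, (last, f, m) in table.items()}

# Hypothetical usage with made-up transactions.
demo = rfm_table(
    [("alice", date(2025, 1, 1), 20.0),
     ("alice", date(2025, 1, 7), 50.0),
     ("bob", date(2024, 12, 1), 5.0)],
    today=date(2025, 1, 10),
)
```

A customer with a small recency value, a high frequency, and a high monetary total would typically be treated as high value; target RFM pattern mining additionally requires that the patterns align with user-specified interests.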

PaperID: 72,

Authors: Yuning Cui, Mingyu Liu, Wenqi Ren, Alois Knoll

Affiliations: School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China; School of Computation, Information and Technology, Technical University of Munich, Munich, Germany

Title: Modumer: Modulating Transformer for Image Restoration

Abstract:
Image restoration aims to recover clean images from degraded counterparts. While Transformer-based approaches have achieved significant advancements in this field, they are limited by high complexity and their inability to capture omni-range dependencies, hindering their overall performance. In this work, we develop Modumer for effective and efficient image restoration by revisiting the Transformer block and modulation design, which processes input through a convolutional block and projection layers and fuses features via elementwise multiplication. Specifically, within each unit of Modumer, we integrate the cascaded modulation design with the downsampled Transformer block to build the attention layers, enabling omni-kernel modulation and mapping inputs into high-dimensional feature spaces. Moreover, we introduce a bioinspired parameter-sharing mechanism to attention layers, which not only enhances efficiency but also improves performance. In addition, a dual-domain feed-forward network (DFFN) strengthens the representational power of the model. Extensive experimental evaluations demonstrate that the proposed Modumer achieves state-of-the-art performance across ten datasets in five single-degradation image restoration tasks, including image motion deblurring, deraining, dehazing, desnowing, and low-light enhancement. Moreover, the model exhibits strong generalization capabilities in all-in-one image restoration tasks. Additionally, it demonstrates competitive performance in composite-degradation image restoration.

PaperID: 73,

Authors: Chongsheng Zhang, George Almpanidis, Gaojuan Fan, Binquan Deng, Yanbo Zhang, Ji Liu, Aouaidjia Kamel, Paolo Soda, João Gama

Affiliations: Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China; Baidu Inc., Beijing, China; Department of Engineering, University Campus Bio-Medico di Roma, Rome, Italy; Laboratory of Artificial Intelligence and Decision Support, University of Porto, Porto, Portugal

Title: A Systematic Review on Long-Tailed Learning

Abstract:
Long-tailed data are a special type of multiclass imbalanced data with a very large number of minority/tail classes that have a very significant combined influence. Long-tailed learning (LTL) aims to build high-performance models on datasets with long-tailed distributions that can identify all the classes with high accuracy, in particular the minority/tail classes. It is a cutting-edge research direction that has attracted a remarkable amount of research effort in the past few years. In this article, we present a comprehensive survey of the latest advances in long-tailed visual learning. We first propose a new taxonomy for LTL, which consists of eight different dimensions, including data balancing, neural architecture, feature enrichment, logits adjustment, loss function, bells and whistles, network optimization, and posthoc processing techniques. Based on our proposed taxonomy, we present a systematic review of LTL methods, discussing their commonalities and alignable differences. We also analyze the differences between imbalance learning and LTL. Finally, we discuss prospects and future directions in this field.

PaperID: 74,

Authors: Renhao Huang, Hao Xue, Maurice Pagnucco, Flora D. Salim, Yang Song

Affiliations: School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia

Title: Vision-Based Multi-Future Trajectory Prediction: A Survey

Abstract:
Vision-based trajectory prediction is an important task that supports safe and intelligent behaviors in autonomous systems. Many advanced approaches have been proposed over the years with improved spatial and temporal feature extraction. However, human behavior is naturally diverse and uncertain. Given the past trajectory and surrounding environment information, an agent can have multiple plausible trajectories in the future. To tackle this problem, an essential task named multi-future trajectory prediction (MTP) has recently been studied. This task aims to generate a diverse, acceptable, and explainable distribution of future predictions for each agent. In this article, we present the first survey for MTP with our unique taxonomies and a comprehensive analysis of frameworks, datasets, and evaluation metrics. We also compare models on existing MTP datasets and conduct experiments on the ForkingPath dataset. Finally, we discuss multiple future directions that can help researchers develop novel MTP systems and other diverse learning tasks similar to MTP.

PaperID: 75,

Authors: Yabo Wang, Bo Qi, Xin Wang, Tongliang Liu, Daoyi Dong

Affiliations: State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Department of Automation, Tsinghua University, Beijing, China; Sydney AI Centre, The University of Sydney, Sydney, NSW, Australia; Australian Artificial Intelligence Institute, Faculty of Engineering and IT, The University of Technology Sydney, Sydney, NSW, Australia

Title: Power Characterization of Noisy Quantum Kernels

Abstract:
Quantum kernel methods have been widely recognized as one of the promising quantum machine learning (QML) algorithms that have the potential to achieve quantum advantages. However, their capabilities may be severely degraded by inevitable noise in the current noisy intermediate-scale quantum (NISQ) era. In this article, we theoretically characterize the power of noisy quantum kernels and demonstrate that under depolarizing noise, quantum kernel methods may only have very poor prediction capability, even when the generalization error is small. Specifically, we quantitatively describe the decrease in the prediction capability of noisy quantum kernels in terms of the rate of quantum noise, the size of training samples, the number of qubits, and the number of layers affected by quantum noise. Our results clearly demonstrate that for a given number of training samples, once the number of layers affected by noise exceeds some threshold, the prediction capability of noisy kernels is very poor. Thus, we provide a crucial warning about employing noisy quantum kernel methods for quantum computation, and the theoretical results can also serve as guidelines when developing practical quantum kernel algorithms for achieving quantum advantages.

PaperID: 76,

Authors: Liang Gao, Li Li, Yingwen Chen, Shaojing Fu, Dongsheng Wang, Siwei Wang, Cheng-Zhong Xu, Ming Xu

Affiliations: Defense Innovation Institute, Academy of Military Sciences, Beijing, China; IOTSC, University of Macau, Macau, China; College of Computer, National University of Defense Technology, Changsha, Hunan, China; Intelligent Game and Decision Lab, Academy of Military Sciences, Beijing, China

Title: Noise-Robust Federated Learning via Interclient Co-Distillation

Abstract:
Federated learning (FL) is a new learning paradigm that enables multiple clients to collaboratively train a high-performance model while preserving user privacy. However, the effectiveness of FL heavily relies on the availability of accurately labeled data, which can be challenging to obtain in real-world scenarios. To address this issue and robustly train shared models using distributed noisy labeled data, we propose FedDQ, a noise-robust FL framework that utilizes co-distillation and quality-aware aggregation techniques. FedDQ incorporates two key features: a noise-adaptive training strategy and an efficient label-correcting mechanism. The noise-adaptive training strategy relies on the estimation of labels’ noise levels to dynamically adjust clients’ training engagement, which mitigates the impact of wrong labels while efficiently exploring features from clean data. In addition, FedDQ designs a two-head network and employs it for co-distillation. The co-distillation strategy facilitates knowledge transfer among clients to share the representational capabilities. Besides, FedDQ enhances label correction to rectify improper labels through co-filtering and label correction. The experimental results demonstrate the effectiveness of FedDQ in improving model performance and handling noisy data challenges in FL settings. On the CIFAR-100 dataset with noisy labels, FedDQ exhibits a notable improvement of up to 32.4% compared to the baseline method.

PaperID: 77,

Authors: Arik Reuter, Anton Thielmann, Christoph Weisser, Benjamin Säfken, Thomas Kneib

Affiliations: Institute of Mathematics, Clausthal University of Technology, Clausthal-Zellerfeld, Germany; BASF, Ludwigshafen, Germany; Chair of Statistics, Georg-August-Universität Göttingen, Göttingen, Germany

Title: Probabilistic Topic Modeling With Transformer Representations

Abstract:
The field of topic modeling was mostly dominated by Bayesian graphical models during the last decade. With the rise of transformers in natural language processing, however, several successful models that rely on straightforward clustering approaches in transformer-based embedding spaces have emerged and consolidated the notion of topics as clusters of embedding vectors. We propose the transformer-representation neural topic model (TNTM), which combines the benefits of topic representations in transformer-based embedding spaces and probabilistic modeling. Therefore, this approach unifies the powerful and versatile notion of topics based on transformer embeddings with fully probabilistic modeling, as in models such as latent Dirichlet allocation (LDA). We utilize the variational autoencoder (VAE) framework for improved inference speed and modeling flexibility. Experimental results show that our proposed model achieves results on par with various state-of-the-art approaches in terms of embedding coherence while maintaining almost perfect topic diversity. The corresponding source code is available at: https://github.com/ArikReuter/TNTM.

PaperID: 78,

Authors: Yanru Sun, Zongxia Xie, Haoyu Xing, Hualong Yu, Qinghua Hu

Affiliations: College of Intelligence and Computing, Tianjin Key Laboratory of Machine Learning, Tianjin University, Tianjin, China

Title: PPGF: Probability Pattern-Guided Time Series Forecasting

Abstract:
Time series forecasting (TSF) is an essential branch of machine learning with various applications. Most methods for TSF focus on constructing different networks to extract better information and improve performance. However, practical application data contain different internal mechanisms, resulting in a mixture of multiple patterns. That is, a model fits different patterns with different accuracy and therefore produces different errors. In order to solve this problem, we propose an end-to-end framework, namely probability pattern-guided time series forecasting (PPGF). PPGF reformulates the TSF problem as a forecasting task guided by probabilistic pattern classification. First, we propose the grouping strategy to approach forecasting problems as classification and alleviate the impact of data imbalance on classification. Second, we predict the corresponding class interval to guarantee the consistency of classification and forecasting. In addition, true class probability (TCP) is introduced to pay more attention to the difficult samples to improve the classification accuracy. In detail, PPGF classifies the different patterns to determine which one the target value may belong to and estimates it accurately in the corresponding interval. To demonstrate the effectiveness of the proposed framework, we conduct extensive experiments on real-world datasets, and PPGF achieves significant performance improvements over several baseline methods. Furthermore, the effectiveness of TCP and the necessity of consistency between classification and forecasting are proved in the experiments. All data and code are available online: https://github.com/syrGitHub/PPGF.

PaperID: 79,

Authors: Ting Hu, Xiaotong Liu, Kai Ji, Yunwen Lei

Affiliations: School of Management, Xi’an Jiaotong University, Xi’an, China; Huawei Technologies Company Ltd., Shenzhen, China; Department of Mathematics, The University of Hong Kong, Hong Kong, China

Title: Convergence of Adaptive Stochastic Mirror Descent

Abstract:
In this article, we present a family of adaptive stochastic optimization methods, which are associated with mirror maps that are widely used to capture the geometric properties of optimization problems during iteration processes. The well-known adaptive moment estimation (Adam)-type algorithm falls into the family when the mirror maps take the form of temporal adaptation. In the context of convex objective functions, we show that with proper step sizes and hyperparameters, the average regret can achieve the convergence rate $\mathcal{O}(T^{-1/2})$ after $T$ iterations under some standard assumptions. We further improve it to $\mathcal{O}(T^{-1}\log T)$ when the objective functions are strongly convex. In the context of smooth objective functions (not necessarily convex), based on properties of the strongly convex differentiable mirror map, our algorithms achieve convergence rates of order $\mathcal{O}(T^{-1/2})$ up to a logarithmic term, requiring large or increasing hyperparameters that are coincident with practical usage of Adam-type algorithms. Thus, our work gives explanations for the selection of the hyperparameters in Adam-type algorithms’ implementation.
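
To make the link between mirror maps and Adam-type updates concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a time-varying quadratic mirror map psi_t(x) = 0.5 * sum_i h_{t,i} x_i^2, for which the mirror step argmin_x { eta_t <g_t, x> + D_{psi_t}(x, x_t) } has the closed form x_t - eta_t * g_t / h_t. The function name `adaptive_mirror_descent`, the choice of h_t as a bias-corrected root-mean-square of past gradients, and the step size eta / sqrt(t) are illustrative assumptions.

```python
import math
import random


def adaptive_mirror_descent(grad, x0, steps=2000, eta=0.5, beta2=0.999, eps=1e-8):
    """Mirror descent with a diagonal quadratic mirror map (hypothetical sketch).

    grad: callable returning a (possibly noisy) gradient at x.
    Returns the last iterate and the running average of the iterates.
    """
    x = list(x0)
    v = [0.0] * len(x)    # exponential moving average of squared gradients
    avg = [0.0] * len(x)  # running average of the iterates
    for t in range(1, steps + 1):
        g = grad(x)
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        # Diagonal of the mirror map at step t (bias-corrected, as in Adam).
        h = [math.sqrt(vi / (1 - beta2 ** t)) + eps for vi in v]
        step = eta / math.sqrt(t)
        # Closed-form mirror step for the quadratic mirror map.
        x = [xi - step * gi / hi for xi, gi, hi in zip(x, g, h)]
        avg = [a + (xi - a) / t for a, xi in zip(avg, x)]
    return x, avg


# Usage: noisy gradients of f(x) = sum_i (x_i - target_i)^2.
rng = random.Random(0)
target = [1.0, -2.0]


def noisy_grad(x):
    return [2 * (xi - ti) + rng.gauss(0.0, 0.1) for xi, ti in zip(x, target)]


x_last, x_avg = adaptive_mirror_descent(noisy_grad, [0.0, 0.0])
```

This sketch omits the first-moment (momentum) term of Adam, so it is closer to an RMSProp-style update; it only illustrates how an adaptive diagonal scaling can be read as a mirror map.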

PaperID: 80,

Authors: Guanxiong He, Zheng Wang, Liaoyuan Tang, Weizhong Yu, Feiping Nie, Xuelong Li

Affiliations: School of Artificial Intelligence, Optics and Electronics (iOPEN) and the Key Laboratory of Intelligent Interaction and Applications, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xian, China

Title: Reweighted-Boosting: A Gradient-Based Boosting Optimization Framework

Abstract:
Boosting is a well-established ensemble learning approach that aims to enhance overall performance by combining multiple weak learners with a linear combination structure. It operates on the principle of using new learners to compensate for the shortcomings of previous learners and is known for its ability to reduce computational resource requirements while mitigating the risks of overfitting. However, from the perspective of convex optimization, it becomes apparent that classical boosting methods often converge to local optima rather than global optima when minimizing the target loss due to their greedy strategy. In this article, we address the issue and propose a novel optimization framework for the boosting paradigm. Our framework focuses on refining the ensemble model by further minimizing the loss function through the reallocation of base learner weights, which results in a more robust and powerful learner. We have conducted experiments on various real-world and synthetic datasets, and our findings confirm that our Reweighted-Boosting model consistently outperforms its counterparts. It also exhibits an increased classification margin for the data, making it a valuable enhancement to original boosting algorithms.

PaperID: 81,

Authors: Yao Zhu, Yuefeng Chen, Xiaofeng Mao, Xiu Yan, Yue Wang, Wang Lu, Jindong Wang, Xiangyang Ji

Affiliations: Department of Automation, Tsinghua University, Beijing, China; Alibaba Group, Hangzhou, China; Meituan Group, Beijing, China; Microsoft Research Asia, Beijing, China

Title: Enhancing Few-Shot CLIP With Semantic-Aware Fine-Tuning

Abstract:
Learning generalized representations from limited training samples is crucial for applying deep neural networks in low-resource scenarios. Recently, methods based on contrastive language-image pretraining (CLIP) have exhibited promising performance in few-shot adaptation tasks. To avoid catastrophic forgetting and overfitting caused by few-shot fine-tuning, existing works usually freeze the parameters of CLIP pretrained on large-scale datasets, overlooking the possibility that some parameters might not be suitable for downstream tasks. To this end, we revisit CLIP’s visual encoder with a specific focus on its distinctive attention pooling layer, which performs a spatial weighted-sum of the dense feature maps. Given that dense feature maps contain meaningful semantic information, and different semantics hold varying importance for diverse downstream tasks (such as prioritizing semantics like ears and eyes in pet classification tasks rather than side mirrors), using the same weighted-sum operation for dense features across different few-shot tasks might not be appropriate. Hence, we propose fine-tuning the parameters of the attention pooling layer during the training process to encourage the model to focus on task-specific semantics. In the inference process, we perform residual blending between the features pooled by the fine-tuned and the original attention pooling layers to incorporate both the few-shot knowledge and the pretrained CLIP’s prior knowledge. We term this method semantic-aware fine-tuning (SAFE). SAFE is effective in enhancing the conventional few-shot CLIP and is compatible with the existing adapter approach (termed SAFE-A). Extensive experiments on 11 benchmarks demonstrate that both SAFE and SAFE-A significantly outperform the second-best method by +1.51% and +2.38% in the one-shot setting and by +0.48% and +1.37% in the four-shot setting, respectively.
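
The two operations named in the abstract, a spatial weighted sum of dense features and a residual blend of two pooled features, can be sketched as follows. This is a simplified illustration under stated assumptions, not CLIP's actual layer or the authors' code: the pooling here is single-head dot-product attention with a given query vector, and the names `attention_pool`, `residual_blend`, and the blending weight `alpha` are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(dense, query):
    """Spatial weighted sum of dense features (one vector per location).

    Weights come from scaled dot products with `query`; a real attention
    pooling layer would also apply learned projections (omitted here).
    """
    d = len(query)
    scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(d) for feat in dense]
    weights = softmax(scores)
    return [sum(w * feat[j] for w, feat in zip(weights, dense)) for j in range(d)]


def residual_blend(pooled_finetuned, pooled_original, alpha=0.5):
    """Convex blend of the fine-tuned and original pooled features (alpha is assumed)."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(pooled_finetuned, pooled_original)]


# Usage: three spatial locations with 2-D features, two different queries
# standing in for the original and the fine-tuned pooling parameters.
dense = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
orig = attention_pool(dense, [0.5, 0.5])
tuned = attention_pool(dense, [2.0, -1.0])
blended = residual_blend(tuned, orig, alpha=0.3)
```

Setting `alpha` to 0 recovers the pretrained pooling and setting it to 1 uses only the fine-tuned pooling; how the paper chooses this weight is not stated in the abstract.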

PaperID: 82,

Authors: Abdul Quadir, M. Sajid, Mohammad Tanveer

Affiliations: Department of Mathematics, Indian Institute of Technology Indore, Indore, India

Title: Granular Ball Twin Support Vector Machine

Abstract:
Twin support vector machine (TSVM) is an emerging machine learning model with versatile applicability in classification and regression endeavors. Nevertheless, TSVM confronts noteworthy challenges: 1) the imperative demand for matrix inversions presents formidable obstacles to its efficiency and applicability on large-scale datasets; 2) the omission of the structural risk minimization (SRM) principle in its primal formulation heightens the vulnerability to overfitting risks; and 3) the TSVM exhibits a high susceptibility to noise and outliers and also demonstrates instability when subjected to resampling. In view of the aforementioned challenges, we propose the granular ball TSVM (GBTSVM). GBTSVM takes granular balls (GBs), rather than individual data points, as inputs to construct a classifier. These GBs, characterized by their coarser granularity, exhibit robustness to resampling and reduced susceptibility to the impact of noise and outliers. We further propose a novel large-scale GBTSVM (LS-GBTSVM). LS-GBTSVM’s optimization formulation ensures two critical facets: 1) it eliminates the need for matrix inversions, streamlining the LS-GBTSVM’s computational efficiency; and 2) it incorporates the SRM principle through the incorporation of regularization terms, effectively addressing the issue of overfitting. The proposed LS-GBTSVM exemplifies efficiency, scalability for large datasets, and robustness against noise and outliers. We conduct a comprehensive evaluation of the GBTSVM and LS-GBTSVM models on benchmark datasets from UCI and KEEL, both with and without the addition of label noise, and compared with existing baseline models. Furthermore, we extend our assessment to the large-scale NDC datasets to establish the practicality of the proposed models in such contexts. Our experimental findings and rigorous statistical analyses affirm the superior generalization prowess of the proposed GBTSVM and LS-GBTSVM models compared to the baseline models. 
The source code of the proposed GBTSVM and LS-GBTSVM models is available at https://github.com/mtanveer1/GBTSVM.

PaperID: 83,

Authors: Huiliang Zhang, Ping Nie, Lijun Sun, Benoit Boulet

Affiliations: Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada; School of Electronics Engineering and Computer Science, Peking University, Beijing, China; Department of Civil Engineering, McGill University, Montreal, QC, Canada

Title: Nearest Neighbor Multivariate Time Series Forecasting

Abstract:
Multivariate time series (MTS) forecasting has a wide range of applications in both industry and academia. Recently, spatial-temporal graph neural networks (STGNNs) have gained popularity as MTS forecasting methods. However, current STGNNs can only use a finite length of MTS input data due to the computational complexity. Moreover, they lack the ability to identify similar patterns throughout the entire dataset and struggle with data that exhibit sparsely and discontinuously distributed correlations among variables over an extensive historical period, resulting in only marginal improvements. In this article, we introduce a simple yet effective k-nearest neighbor MTS forecasting (kNN-MTS) framework, which forecasts with a nearest neighbor retrieval mechanism over a large datastore of cached series, using representations from the MTS model for similarity search. This approach requires no additional training and scales to give the MTS model direct access to the whole dataset at test time, resulting in a highly expressive model that consistently improves performance and can extract sparsely distributed but similar patterns spanning multiple variables from the entire dataset. Furthermore, a hybrid spatial-temporal encoder (HSTEncoder) is designed for kNN-MTS, which can capture both long-term temporal and short-term spatial-temporal dependencies and is shown to provide accurate representations for kNN-MTS for better forecasting. Experimental results on several real-world datasets show a significant improvement in the forecasting performance of kNN-MTS. The quantitative analysis also illustrates the interpretability and efficiency of kNN-MTS, showing better application prospects and opening up a new path for efficiently using large datasets in MTS models.
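
A retrieval step of the kind described above can be sketched as follows. This is a minimal illustration, not the kNN-MTS implementation: the datastore layout (pairs of a representation key and the future values that followed it), the Euclidean distance, the softmax weighting with a `temperature`, and the interpolation weight `lam` between the retrieved forecast and the model forecast are all assumptions.

```python
import math


def knn_forecast(query, datastore, k=2, temperature=1.0):
    """Distance-weighted average of the futures of the k nearest cached keys.

    datastore: list of (key, value) pairs, where key is a representation
    vector and value is the future series observed after it.
    """
    nearest = sorted(datastore, key=lambda kv: math.dist(query, kv[0]))[:k]
    logits = [-math.dist(query, key) / temperature for key, _ in nearest]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    horizon = len(nearest[0][1])
    return [sum(w * value[h] for w, (_, value) in zip(weights, nearest))
            for h in range(horizon)]


def interpolate(model_pred, knn_pred, lam=0.5):
    """Blend the base model forecast with the retrieved forecast (lam is assumed)."""
    return [lam * a + (1 - lam) * b for a, b in zip(knn_pred, model_pred)]


# Usage: a tiny datastore with two cached patterns.
store = [([0.0, 0.0], [1.0, 1.0]), ([10.0, 10.0], [5.0, 5.0])]
retrieved = knn_forecast([0.1, 0.0], store, k=2)
final = interpolate([2.0, 2.0], retrieved, lam=0.5)
```

Because the datastore is only read at test time, a sketch like this needs no additional training, which matches the property claimed in the abstract; a practical system would replace the linear scan with an approximate nearest neighbor index.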

PaperID: 84,

Authors: Weiqing Yan, Shuochen Yao, Chang Tang, Wujie Zhou

Affiliations: School of Computer and Control Engineering, Yantai University, Yantai, China; School of Computer Science, China University of Geosciences, Wuhan, China; School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, Zhejiang, China

Title: Multiview Representation Learning via Information-Theoretic Optimization

Abstract:
Multiview data, characterized by rich features, are crucial in many machine learning applications. However, effectively extracting intraview features and integrating interview information present significant challenges in multiview learning (MVL). Traditional deep network-based approaches often involve learning multiple layers to derive latent representations. In these methods, the features of different classes are typically implicitly embedded rather than systematically organized. This lack of structure makes it challenging to explicitly map classes to independent principal subspaces in the feature space, potentially causing class overlap and confusion. Consequently, the capability of these representations to accurately capture the intrinsic structure of the data remains uncertain. In this article, we introduce an innovative multiview representation learning (MVRL) method that maximizes two information-theoretic metrics: intraview coding rate reduction and interview mutual information. Specifically, in the intraview representation learning, we aim to optimize feature representations by maximizing the coding rate difference between the entire dataset and individual classes. This process expands the feature representation space while compressing the representations within each class, resulting in more compact feature representations within each viewpoint. Subsequently, we align and fuse these view-specific features through space transformation and cross-sample fusion to achieve consistent representation across multiple views. Finally, we maximize information transmission to maintain consistency and correlation among data representations across views. By maximizing mutual information between the consensus representations and view-specific representations, our method ensures that the learned representations capture more concise intrinsic features and correlations among different views, thereby enhancing the performance and generalization ability of MVL.
Experiments show that the proposed methods have achieved excellent performance.
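
The "coding rate difference between the entire dataset and individual classes" can be illustrated with the commonly used coding rate R(Z, eps) = 0.5 * logdet(I + d / (n * eps^2) * Z Z^T) for n features of dimension d. The sketch below assumes this standard form; whether the paper uses exactly this expression, and the distortion value `eps`, are assumptions, and the function names are hypothetical.

```python
import math


def logdet_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def coding_rate(samples, eps=0.5):
    """R(Z, eps) for a list of n feature vectors of dimension d."""
    n, d = len(samples), len(samples[0])
    scale = d / (n * eps * eps)
    mat = [[(1.0 if i == j else 0.0) + scale * sum(z[i] * z[j] for z in samples)
            for j in range(d)] for i in range(d)]
    return 0.5 * logdet_spd(mat)


def rate_reduction(samples, labels, eps=0.5):
    """Rate of the whole set minus the size-weighted rates of each class."""
    n = len(samples)
    within = 0.0
    for c in set(labels):
        members = [z for z, y in zip(samples, labels) if y == c]
        within += len(members) / n * coding_rate(members, eps)
    return coding_rate(samples, eps) - within


# Usage: two classes lying on orthogonal directions.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
aligned = rate_reduction(feats, [0, 0, 1, 1])
mixed = rate_reduction(feats, [0, 1, 0, 1])
```

In this toy case the class-aligned labeling gives a strictly positive rate reduction, while the mixed labeling gives zero, which matches the intuition of expanding the overall feature space while compressing each class.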

PaperID: 85,

Authors: Yang Yue, Bingyi Kang, Xiao Ma, Qisen Yang, Gao Huang, Shiji Song, Shuicheng Yan

Affiliations: Department of Automation, Tsinghua University, Beijing, China; Sea AI Laboratory, Fusionopolis Place, Singapore

Title: Decoupled Prioritized Resampling for Offline RL

Abstract:
Offline reinforcement learning (RL) is challenged by the distributional shift problem. To tackle this issue, existing works mainly focus on designing sophisticated policy constraints between the learned policy and the behavior policy. However, these constraints are applied equally to well-performing and inferior actions through uniform sampling, which might negatively affect the learned policy. In this article, we propose offline decoupled prioritized resampling (ODPR), which designs specialized priority functions for the suboptimal policy constraint issue in offline RL and employs unique decoupled resampling for training stability. Through theoretical analysis, we show that the distinctive priority functions induce a provable improved behavior policy by modifying the distribution of the original behavior policy, and when constrained to this improved policy, a policy-constrained offline RL algorithm is likely to yield a better solution. We provide two practical implementations to balance computation and performance: one estimates priorities based on a fit value network [advantage-based ODPR (ODPR-A)] and the other utilizes trajectory returns [return-based ODPR (ODPR-R)] for quick computation. As a highly compatible plug-and-play component, ODPR is evaluated with five prevalent offline RL algorithms: behavior cloning (BC), twin delayed deep deterministic policy gradient + BC (TD3 + BC), OnestepRL, conservative Q-learning (CQL), and implicit Q-learning (IQL). Our experiments confirm that both ODPR-A and ODPR-R significantly improve performance across all baseline methods. Moreover, ODPR-A can be effective in some challenging settings, i.e., without trajectory information. Code and pretrained weights are available at https://github.com/yueyang130/ODPR.
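
The return-based variant can be sketched as a resampling of transitions in proportion to the return of the trajectory they came from. This is a minimal illustration and not the ODPR code: the min-max normalization, the additive constant `base`, and the function names are assumptions, and the decoupled resampling described in the abstract is not modeled here.

```python
import random


def return_based_priorities(trajectories, base=0.1):
    """Assign each transition a sampling probability from its trajectory return.

    trajectories: list of trajectories, each a list of (state, action, reward).
    Returns the flattened transitions and their normalized probabilities.
    """
    returns = [sum(r for (_, _, r) in traj) for traj in trajectories]
    lo, hi = min(returns), max(returns)
    span = (hi - lo) or 1.0  # avoid division by zero when all returns are equal
    transitions, weights = [], []
    for traj, g in zip(trajectories, returns):
        w = (g - lo) / span + base
        for tr in traj:
            transitions.append(tr)
            weights.append(w)
    total = sum(weights)
    return transitions, [w / total for w in weights]


def resample(transitions, probs, batch_size, rng):
    """Draw a training batch with replacement according to the priorities."""
    return rng.choices(transitions, weights=probs, k=batch_size)


# Usage: one low-return and one high-return trajectory.
data = [
    [("s0", "a0", 0.0), ("s1", "a1", 0.0)],
    [("s2", "a2", 1.0), ("s3", "a3", 2.0)],
]
trans, probs = return_based_priorities(data)
batch = resample(trans, probs, batch_size=8, rng=random.Random(0))
```

With these toy values, transitions from the higher-return trajectory are drawn far more often than under uniform sampling, so a policy constraint computed on the batch leans toward the better-performing behavior.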

PaperID: 86,

Authors: Marco Buzzelli, Simone Bianco

Affiliations: Department of Informatics Systems and Communication, University of Milano - Bicocca, Milan, Italy

Title: A Convolutional Framework for Color Constancy

Abstract:
We introduce a convolutional framework (CF) for computational color constancy, building upon the established low-level image feature-based framework, which utilized simple image statistics for illuminant estimation. Our framework expands upon this through an end-to-end learnable neural architecture. This adaptation enables the learning and usage of advanced filters that are not restricted to Gaussian kernels operating on individual color channels, thus generalizing the capabilities of the original framework. Additionally, our general framework supports deeper convolutional architectures, thus increasing its computational power. It can also be efficiently applied to estimate multiple spatially varying illuminants within a single scene. Our experimental results on standard datasets demonstrate that the CF outperforms the best methods in the low-level framework, improving the illuminant estimation accuracy by up to 34% for single illuminant estimation and 30% for multiple illuminants estimation. Additionally, our framework exhibits superior performance even when the number of training images is reduced. Finally, we document the inference speedup of our implementation reaching up to 30×, making the CF especially suitable for applications where efficiency is critical. Source code and trained models are available at: https://github.com/MarcoBauzz/convolutional-color-constancy
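
As an example of the simple image statistics used by low-level illuminant estimators, the sketch below computes a per-channel Minkowski mean of the pixel values (often called shades of gray; p = 1 gives the gray-world estimate and large p approaches max-RGB). It is an illustration of that classical family only, not the proposed CF, and it omits the Gaussian smoothing and derivative filters that the full low-level framework can apply. The function names and the default p are assumptions.

```python
import math


def shades_of_gray(pixels, p=6):
    """Estimate the illuminant color as a unit-norm RGB vector.

    pixels: list of [r, g, b] values in linear RGB.
    """
    n = len(pixels)
    est = [(sum(px[c] ** p for px in pixels) / n) ** (1.0 / p) for c in range(3)]
    norm = math.sqrt(sum(e * e for e in est)) or 1.0
    return [e / norm for e in est]


def correct(pixels, illuminant):
    """Diagonal (von Kries-style) correction toward a neutral illuminant."""
    neutral = 1.0 / math.sqrt(3.0)
    gains = [neutral / max(e, 1e-12) for e in illuminant]
    return [[px[c] * gains[c] for c in range(3)] for px in pixels]


# Usage: a flat gray patch seen under a greenish-blue cast.
image = [[0.2, 0.4, 0.4] for _ in range(4)]
illum = shades_of_gray(image)
balanced = correct(image, illum)
```

After correction the three channels of the patch are equal, which is the expected behavior when the scene is truly achromatic and the estimate is exact.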

PaperID: 87,

Authors: Lifeng Zhang, Xiangwei Zheng, Xuanchi Chen, Lizhen Cui

Affiliations: School of Information Science and Engineering, Shandong Normal University, Jinan, China; School of Software, Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China

Title: Three-Dimensional View Relationship-Based Context-Aware Emotion Recognition

Abstract:
Context-aware emotion recognition (CAER) leverages comprehensive scene information, including facial expressions, body postures, and contextual background. However, current studies predominantly rely on facial expressions, body postures, and global contextual features; the interaction between the agents (target individuals) and other objects in the scene is usually absent or incomplete. In this article, a three-dimensional view relationship-based CAER (TDRCer) method is proposed, which comprises two branches: the personal emotional branch (PEB) and the contextual emotional branch (CEB). First, PEB is designed for the extraction of facial expression features and body posture features from the agent. A vision transformer (ViT), pretrained by contrastive learning with a novel loss function combining Euclidean distance and cosine similarity, is applied to enhance the robustness of facial expression features. Meanwhile, the human body contour images extracted by semantic segmentation are fed into another ViT to extract body posture features. Second, CEB is constructed for the extraction of global contextual features and interactive relationships among objects in the scene. The images masked by the agents’ bodies are fed into a ViT to extract global contextual features. By leveraging both the gaze angle and depth map, a three-dimensional view graph (3DVG) is constructed to represent the interactive relationships between agents and objects in the scene. Then, a graph convolutional network is employed to extract interactive relationship features from the 3DVG. Finally, the multiplicative fusion strategy is applied to fuse the features of two branches, and the fused features are utilized to classify the emotions. TDRCer achieves an accuracy of 89.90% on the CAER-S dataset and a mean average precision (mAP) of 36.02% on the EMOTIons in context (EMOTIC) dataset. The code can be accessed at https://github.com/mengTender/TDRCer.

PaperID: 88,

Authors: Seung Park, Yong-Goo Shin

Affiliations: Biomedical Engineering, Chungbuk National University Hospital, Chungbuk National University College of Medicine, Cheongju-si, Chungcheongbuk-do, South Korea; Department of Electronics and Information Engineering, Korea University, Sejong-si, South Korea

Title: Rethinking Image Skip Connections in StyleGAN2

Abstract:
Various models based on StyleGAN have gained significant traction in the field of image synthesis, attributed to their robust training stability and superior performances. Within the StyleGAN framework, the adoption of image skip connection is favored over the traditional residual connection. However, this preference is just based on empirical observations; there has not been any in-depth mathematical analysis on it yet. To rectify this situation, this brief aims to elucidate the mathematical meaning of the image skip connection and introduce a groundbreaking methodology, termed the image squeeze connection, which significantly improves the quality of image synthesis. Specifically, we analyze the image skip connection technique to reveal its problem and introduce the proposed method which not only effectively boosts the GAN performance but also reduces the required number of network parameters. Extensive experiments on various datasets demonstrate that the proposed method consistently enhances the performance of state-of-the-art models based on StyleGAN. We believe that our findings represent a vital advancement in the field of image synthesis, suggesting a novel direction for future research and applications.

PaperID: 89,

Authors: Jiatai Wang, Zhiwei Xu, Xin Wang, Tao Li

Affiliations: College of Computer Science, Nankai University, Tianjin, China; Haihe Laboratory of Information Technology Application Innovation (ITAI), Tianjin, China; Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, NY, USA

Title: Toward Generalized Multistage Clustering: Multiview Self-Distillation

Abstract:
Existing multistage clustering methods independently learn the salient features from multiple views and then perform the clustering task. Particularly, multiview clustering (MVC) has attracted a lot of attention in multiview or multimodal scenarios. MVC aims at exploring common semantics and pseudo-labels from multiple views and clustering in a self-supervised manner. However, limited by noisy data and inadequate feature learning, such a clustering paradigm generates overconfident pseudo-labels that misguide the model to produce inaccurate predictions. Therefore, it is desirable to have a method that can correct this pseudo-label mistraction in multistage clustering to avoid bias accumulation. To alleviate the effect of overconfident pseudo-labels and improve the generalization ability of the model, this article proposes a novel multistage deep MVC framework where multiview self-distillation (DistilMVC) is introduced to distill dark knowledge of label distribution. Specifically, in the feature subspace at different hierarchies, we explore the common semantics of multiple views through contrastive learning and obtain pseudo-labels by maximizing the mutual information between views. Additionally, a teacher network is responsible for distilling pseudo-labels into dark knowledge, supervising the student network and improving its predictive capabilities to enhance its robustness. Extensive experiments on real-world multiview datasets show that our method has better clustering performance than the state-of-the-art (SOTA) methods.

PaperID: 90,

Authors: Cheng Qian, Xiaoxian Lao, Chunguang Li

Affiliations: College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China

Title: Reconstruction-Based Anomaly Localization via Knowledge-Informed Self-Training

Abstract:
Anomaly localization, which involves localizing anomalous regions within images, is a significant industrial task. Reconstruction-based methods are widely adopted for anomaly localization because of their low complexity and high interpretability. Most existing reconstruction-based methods only use normal samples to construct the model. If anomalous samples are appropriately utilized in the process of anomaly localization, the localization performance can be improved. However, usually only weakly labeled anomalous samples are available, which limits the improvement. In many cases, we can obtain some knowledge of anomalies summarized by domain experts. Taking advantage of such knowledge can help us better utilize the anomalous samples and thus further improve the localization performance. In this article, we propose a novel reconstruction-based method named knowledge-informed self-training (KIST), which integrates knowledge into a reconstruction model through self-training. Specifically, KIST utilizes weakly labeled anomalous samples in addition to the normal ones and exploits knowledge to yield pixel-level pseudolabels of the anomalous samples. Based on the pseudolabels, a novel loss that promotes the reconstruction of normal pixels while suppressing the reconstruction of anomalous pixels is used. We conduct experiments on different datasets and demonstrate the advantages of KIST over the existing reconstruction-based methods.

PaperID: 91,

Authors: Xiaoxie Zhu, Jinfeng Yi, Lijun Zhang

Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; Paradigm Inc., Beijing, China; National Key Laboratory for Novel Software Technology and the School of Artificial Intelligence, Nanjing University, Nanjing, China

Title: Continual Learning With Unknown Task Boundary

Abstract:
Most existing studies on continual learning (CL) consider the task-based setting, where task boundaries are known to learners during training. However, this setting may be impractical for real-world problems, where new tasks arrive with unnotified distribution shifts. In this article, we introduce a new boundary-unknown continual learning scenario called continuum incremental learning (CoIL), where the incremental unit may be a concatenation of several tasks or a subset of one task. To identify task boundaries, we design a continual out-of-distribution (OOD) detection method based on softmax probabilities, which can detect OOD samples for the latest learned task. Then, we incorporate it with continual learning approaches to solve the CoIL problem. Furthermore, we investigate the more challenging task-reappear setting and propose a method named continual learning with unknown task boundary (CLUTaB). CLUTaB first adopts in-distribution detection and OOD loss to determine whether a set of data is sampled from any learned distribution. Then, a two-step inference technique is designed to improve the continual learning performance. Experiments show that our methods work well with existing continual learning approaches and achieve good performance on CIFAR-100 and mini-ImageNet datasets.
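
The abstract above does not give the exact detection score, so the following is only a minimal sketch of one common softmax-based OOD rule (thresholding the maximum softmax probability). The function names and the threshold value 0.5 are assumptions made for illustration, not the CoIL or CLUTaB implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_ood(logits, threshold=0.5):
    """Flag a sample as OOD when its top softmax probability is low.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return max(softmax(logits)) < threshold


def ood_fraction(batch_logits, threshold=0.5):
    """Fraction of a batch flagged as OOD.

    A high fraction could be read as a hint that a task boundary
    has been crossed (an assumption of this sketch).
    """
    flags = [is_ood(logits, threshold) for logits in batch_logits]
    return sum(flags) / len(flags)
```

A confident prediction such as `[5.0, 0.0, 0.0]` stays in-distribution under this rule, while near-uniform logits such as `[0.1, 0.0, 0.05]` are flagged.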

PaperID: 92,

Authors: Zijian Ying, Qianmu Li, Zhichao Lian, Jun Hou, Tong Lin, Tao Wang

Affiliations: School of Cyber Science and Technology, Nanjing University of Science and Technology, Nanjing, China; Department of Social Science, Nanjing Vocational University of Industry Technology, Nanjing, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Title: Understanding Convolutional Neural Networks From Excitations

Abstract:
Saliency maps have proven to be a highly efficacious approach for explicating the decisions of convolutional neural networks (CNNs). However, extant methodologies predominantly rely on gradients, which constrain their ability to explicate complex models. Furthermore, such approaches are not fully adept at leveraging negative gradient information to improve interpretive veracity. In this study, we present a novel concept, termed positive and negative excitation (PANE), which enables the direct extraction of PANE for each layer, thus enabling complete layer-by-layer information utilization sans gradients. To organize these excitations into final saliency maps, we introduce a double-chain backpropagation procedure. A comprehensive experimental evaluation, encompassing both binary classification and multiclassification tasks, was conducted to gauge the effectiveness of our proposed method. Encouragingly, the results evince that our approach offers a significant improvement over the state-of-the-art methods in terms of salient pixel removal, minor pixel removal, and inconspicuous adversarial perturbation generation guidance. In addition, we verify the correlation between PANEs.

PaperID: 93,

Authors: Muhammad Anwar Ma'sum, Mahardhika Pratama, Edwin Lughofer, Lin Liu, Habibullah, Ryszard Kowalczyk

Affiliations: STEM, University of South Australia, Mawson Lakes, SA, Australia; Institute for Mathematical Methods in Medicine and Data Based Modeling, Johannes Kepler University, Linz, Austria

Title: Few-Shot Continual Learning via Flat-to-Wide Approaches

Abstract:
Existing approaches to continual learning (CL) call for a lot of samples in their training processes. Such approaches are impractical for many real-world problems with limited samples because of the overfitting problem. This article proposes a few-shot CL approach, termed flat-to-wide approach (FLOWER), where a flat-to-wide learning process that finds the flat-wide minima is proposed to address the catastrophic forgetting (CF) problem. The issue of data scarcity is overcome with a data augmentation approach that uses a ball-generator concept to restrict the sampling space to the smallest enclosing ball. Our numerical studies demonstrate the advantage of FLOWER, which achieves significantly improved performance over prior art, notably in the small base tasks. For further study, source codes of FLOWER, competitor algorithms, and experimental logs are shared publicly at https://github.com/anwarmaxsum/FLOWER.

PaperID: 94,

Authors: Hui Lin, Xiaopeng Hong, Zhiheng Ma, Yaowei Wang, Deyu Meng

Affiliations: School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an, China; Faculty of Computing, Harbin Institute of Technology, Harbin, China; Faculty of Computility Microelectronics, Shenzhen University of Advanced Technology, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China

Title: Multidimensional Measure Matching for Crowd Counting

Abstract:
This article addresses the challenge of scale variations in crowd-counting problems from a multidimensional measure-theoretic perspective. We start by formulating crowd counting as a measure-matching problem, based on the assumption that discrete measures can express the scattered ground truth and the predicted density map. In this context, we introduce the Sinkhorn counting loss and extend it to the semi-balanced form, which alleviates the problems including entropic bias, distance destruction, and amount constraints. We then model the measure matching under the multidimensional space, in order to learn the counting from both location and scale. To achieve this, we extend the traditional 2-D coordinate support to 3-D, incorporating an additional axis to represent scale information, where a pyramid-based structure will be leveraged to learn the scale value for the predicted density. Extensive experiments on four challenging crowd-counting datasets, namely, ShanghaiTech A, UCF-QNRF, JHU++, and NWPU have validated the proposed method. Code is released at https://github.com/LoraLinH/Multidimensional-Measure-Matching-for-Crowd-Counting.
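
For readers unfamiliar with the Sinkhorn iterations that the counting loss builds on, here is a minimal sketch of balanced entropic optimal transport between two small discrete measures. It does not reproduce the semi-balanced form or the 3-D (location plus scale) support described above; the function name, the regularization value `eps`, and the iteration count are assumptions made for this illustration.

```python
import math


def sinkhorn_plan(a, b, cost, eps=0.1, n_iter=200):
    """Entropic-regularized transport plan between measures a and b.

    a, b   : lists of non-negative masses with equal total mass
    cost   : len(a) x len(b) nested list of pairwise costs
    eps    : entropic regularization strength (hypothetical default)
    Returns the plan as a nested list P with P[i][j] >= 0.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternate scaling so that rows match a, then columns match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost <P, C> of a transport plan."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))
```

With two unit-spaced points carrying mass 0.5 each on both sides and squared-distance cost `[[0, 1], [1, 0]]`, the plan keeps almost all mass on the diagonal, so the transport cost is close to zero.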

PaperID: 95,

Authors: Jie Jiang, Xingjian He, Weining Wang, Hanqing Lu, Jing Liu

Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China

Title: Hierarchical Contrastive Learning for Semantic Segmentation

Abstract:
Recently, pixel-to-pixel contrastive learning in single-scale feature space has been widely studied in semantic segmentation to learn a unified feature expression for pixels of the same category. However, the unified representation is too extreme, and the receptive field of each single-scale pixel is limited, which is insufficient to reflect the representative features of the category. To address these problems, this article extends the single-scale feature space to a multiscale one and proposes a hierarchical contrastive learning (Hi-CL) method to explore pixel-to-component semantic relationships. First, we generate multiscale candidate samples by applying several pooling windows with different sizes on a feature map, where different windows may represent different parts of the objects in the image. Then, we prune the sample set through threshold-based criteria to select appropriate samples for feature representation learning. Finally, Hi-CL is performed to learn the pixel-to-component consistency with the pruned samples. Our method is easy to apply to existing semantic segmentation models and yields consistent improvements. Furthermore, we achieve state-of-the-art results on three popular benchmarks, including Cityscapes, ADE20K, and COCO Stuff datasets.
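
The first step described above, generating multiscale candidates with pooling windows of different sizes, can be sketched as follows. This assumes non-overlapping average pooling on a single-channel feature map; the pooling type, the window sizes `(1, 2, 4)`, and the function names are assumptions for illustration, and the threshold-based pruning and contrastive loss are not shown.

```python
def avg_pool2d(fmap, k):
    """Non-overlapping k x k average pooling over a 2-D nested list.

    Rows and columns that do not fill a complete window are dropped.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(0, h - k + 1, k):
        row = []
        for j in range(0, w - k + 1, k):
            window = [fmap[r][c] for r in range(i, i + k) for c in range(j, j + k)]
            row.append(sum(window) / (k * k))
        pooled.append(row)
    return pooled


def multiscale_candidates(fmap, window_sizes=(1, 2, 4)):
    """Map each window size to its pooled feature map.

    Larger windows summarize larger parts of an object, which is the
    intuition behind pixel-to-component comparison.
    """
    return {k: avg_pool2d(fmap, k) for k in window_sizes}
```

On a 4 x 4 map, window size 1 returns the pixels themselves, size 2 returns four component-level candidates, and size 4 returns a single candidate covering the whole map.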

PaperID: 96,

Authors: Zhiyuan Zhang, Haoxuan Li, Chengjie Ke, Jun Chen, Xin Tian

Affiliations: Electronic Information School, Wuhan University, Wuhan, China; School of Automation, China University of Geosciences, Wuhan, China

Title: Deep Variational Network for Blind Pansharpening

Abstract:
Deep-learning-based methods play an important role in pansharpening, which uses panchromatic images to enhance the spatial resolution of multispectral images while maintaining spectral features. However, most existing methods mainly consider only one fixed degradation in the training process. Therefore, their performance may drop significantly when the degradation of testing data is unknown (blind) and different from the training data, which is common in real-world applications. To address this issue, we propose a deep variational network for blind pansharpening, named VBPN, which integrates degradation estimation and image fusion into a whole Bayesian framework. First, by taking the noise and blurring parameters of the multispectral image with the noise parameters of the panchromatic image as hidden variables, we parameterize the approximate posterior distribution for the fusion problem using neural networks. Since all parameters in this posterior distribution are explicitly modeled, the degradation parameters of the multispectral image and the panchromatic image are easily estimated. Furthermore, we design VBPN to be composed of degradation estimation and image fusion subnetworks, which can optimize the fusion results guided by the variational inference according to the testing data. As a result, the blind pansharpening performance can be improved. In general, VBPN has good interpretability and generalization ability by combining the advantages of model-based and deep-learning-based approaches. Experiments on simulated and real datasets show that VBPN can achieve state-of-the-art fusion results.

PaperID: 97,

Authors: Jingwei Xin, Nannan Wang, Xinrui Jiang, Jie Li, Xiaoyu Wang, Xinbo Gao

Affiliations: State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University, Xi’an, Shaanxi, China; Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China; School of Electronic Engineering, Xidian University, Xi’an, Shaanxi, China; School of Computer Science and Technology, University of Science and Technology of China, Hefei, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: Rectified Binary Network for Single-Image Super-Resolution

Abstract:
Binary neural network (BNN) is an effective approach to reduce the memory usage and the computational complexity of full-precision convolutional neural networks (CNNs), which has been widely used in the field of deep learning. However, BNNs and real-valued models have different properties, making it difficult to draw on the experience of CNN composition to develop BNNs. In this article, we study the application of binary networks to the single-image super-resolution (SISR) task, in which the network is trained for restoring original high-resolution (HR) images. Generally, the distribution of features in the network for SISR is more complex than that in recognition models, in order to preserve the abundant image information, e.g., texture, color, and details. To enhance the representation ability of BNN, we explore a novel activation-rectified inference (ARI) module that achieves a more complete representation of features by combining observations from different quantitative perspectives. The activations are divided into several parts with different quantification intervals and are inferred independently. This allows the binary activations to retain more image detail and yield finer inference. In addition, we further propose an adaptive approximation estimator (AAE) for gradually learning the accurate gradient estimation interval in each layer to alleviate the optimization difficulty. Experiments conducted on several benchmarks show that our approach is able to learn a binary SISR model with superior performance over the state-of-the-art methods. The code will be released at https://github.com/jwxintt/Rectified-BSR.

PaperID: 98,

Authors: Rui Zhu, Yalong Bai, Ting Yao, Jingen Liu, Zhenglong Sun, Tao Mei, Chang Wen Chen

Affiliations: School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China; Du Xiaoman, Beijing, China; HiDream.ai, Beijing, China; Amazon, New York, NY, USA; Department of Computing, The Hong Kong Polytechnic University, Hong Kong, Hung Hom, China

Title: Teaching Masked Autoencoder With Strong Augmentations

Abstract:
Masked autoencoder (MAE) has been regarded as a capable self-supervised learner for various downstream tasks. Nevertheless, the model still lacks high-level discriminability, which results in poor linear probing performance. In view of the fact that strong augmentation plays an essential role in contrastive learning, can we capitalize on strong augmentation in MAE? The difficulty originates from the pixel uncertainty caused by strong augmentation that may affect the reconstruction, and thus, directly introducing strong augmentation into MAE often hurts the performance. In this article, we delve into the potential of strong augmented views to enhance MAE while maintaining MAE’s advantages. To this end, we propose a simple yet effective masked Siamese autoencoder (MSA) model, which consists of a student branch and a teacher branch. The student branch inherits MAE’s advanced architecture, and the teacher branch treats the unmasked strong view as an exemplary teacher to impose high-level discrimination onto the student branch. We demonstrate that our MSA can improve the model’s spatial perception capability and, therefore, globally favors interimage discrimination. Empirical evidence shows that the model pretrained by MSA provides superior performance across different downstream tasks. Notably, linear probing on frozen features extracted from MSA leads to 6.1% gains over MAE on ImageNet-1k. Fine-tuning (FT) the network on the VQAv2 task achieves 67.4% accuracy, outperforming the supervised method DeiT by 1.6% and MAE by 1.2%. Codes and models are available at https://github.com/KimSoybean/MSA.

PaperID: 99,

Authors: Hantao Zhou, Rui Yang, Yachao Zhang, Haoran Duan, Yawen Huang, Runze Hu, Xiu Li, Yefeng Zheng

Affiliations: Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; Department of Computer Science, Durham University, Durham, U.K.; Tencent Jarvis Laboratory, Shenzhen, China; School of Information and Electronics, Beijing Institute of Technology, Beijing, China

Title: UniHead: Unifying Multi-Perception for Detection Heads

Abstract:
The detection head constitutes a pivotal component within object detectors, tasked with executing both classification and localization functions. Regrettably, the commonly used parallel head often lacks omni perceptual capabilities, such as deformation perception (DP), global perception (GP), and cross-task perception (CTP). Despite numerous methods attempting to enhance these abilities from a single aspect, achieving a comprehensive and unified solution remains a significant challenge. In response to this challenge, we develop an innovative detection head, termed UniHead, to unify three perceptual abilities simultaneously. More precisely, our approach: 1) introduces DP, enabling the model to adaptively sample object features; 2) proposes a dual-axial aggregation transformer (DAT) to adeptly model long-range dependencies, thereby achieving GP; and 3) devises a cross-task interaction transformer (CIT) that facilitates interaction between the classification and localization branches, thus aligning the two tasks. As a plug-and-play method, the proposed UniHead can be conveniently integrated with existing detectors. Extensive experiments on the COCO dataset demonstrate that our UniHead can bring significant improvements to many detectors. For instance, the UniHead can obtain +2.7 AP gains in RetinaNet, +2.9 AP gains in FreeAnchor, and +2.1 AP gains in GFL. The code is available at https://github.com/zht8506/UniHead.

PaperID: 100,

Authors: Yuzhong Chen, Zhenxiang Xiao, Yi Pan, Lin Zhao, Haixing Dai, Zihao Wu, Changhe Li, Tuo Zhang, Changying Li, Dajiang Zhu, Tianming Liu, Xi Jiang

Affiliations: Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; Department of Computer Science, University of Georgia, Athens, GA, USA; School of Automation, Northwestern Polytechnical University, Xi’an, China; Department of Agricultural and Biological Engineering, University of Florida, Gainesville, FL, USA; Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, USA

Title: Mask-Guided Vision Transformer for Few-Shot Learning

Abstract:
Learning with little data is challenging but often inevitable in various application scenarios where the labeled data are limited and costly. Recently, few-shot learning (FSL) has gained increasing attention because of its ability to generalize prior knowledge to new tasks that contain only a few samples. However, for data-intensive models such as vision transformer (ViT), current fine-tuning-based FSL approaches are inefficient in knowledge generalization and, thus, degrade the downstream task performance. In this article, we propose a novel mask-guided ViT (MG-ViT) to achieve an effective and efficient FSL on the ViT model. The key idea is to apply a mask on image patches to screen out the task-irrelevant ones and to guide the ViT to focus on task-relevant and discriminative patches during FSL. Particularly, MG-ViT only introduces an additional mask operation and a residual connection, enabling the inheritance of parameters from pretrained ViT without any other cost. To optimally select representative few-shot samples, we also include an active learning-based sample selection method to further improve the generalizability of MG-ViT-based FSL. We evaluate the proposed MG-ViT on classification, object detection, and segmentation tasks using gradient-weighted class activation mapping (Grad-CAM) to generate masks. The experimental results show that the MG-ViT model significantly improves the performance and efficiency compared with general fine-tuning-based ViT and ResNet models, providing novel insights and a concrete approach toward generalizing data-intensive and large-scale deep learning models for FSL.

PaperID: 101,

Authors: Yan Zeng, Ruichu Cai, Fuchun Sun, Libo Huang, Zhifeng Hao

Affiliations: School of Mathematics and Statistics, Beijing Technology and Business University, Beijing, China; School of Computer Science, Guangdong University of Technology, Guangzhou, China; Department of Computer Science and Technology, Tsinghua University, Beijing, China; Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China; College of Science, Shantou University, Shantou, China

Title: A Survey on Causal Reinforcement Learning

Abstract:
While reinforcement learning (RL) achieves tremendous success in sequential decision-making problems of many domains, it still faces key challenges of data inefficiency and the lack of interpretability. Interestingly, many researchers have recently leveraged insights from the causality literature, bringing forth flourishing works that unify the merits of causality and address the challenges of RL. As such, it is of great necessity and significance to collate these causal RL (CRL) works, offer a review of CRL methods, and investigate the potential functionality from causality toward RL. In particular, we divide the existing CRL approaches into two categories according to whether their causality-based information is given in advance or not. We further analyze each category in terms of the formalization of different models, including the Markov decision process (MDP), partially observed MDP (POMDP), multiarmed bandits (MABs), imitation learning (IL), and dynamic treatment regime (DTR). Each of them represents a distinct type of causal graphical illustration. Moreover, we summarize the evaluation metrics and open sources, and we discuss emerging applications, along with promising prospects for the future development of CRL.

PaperID: 102,

Authors: Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Xiaopeng Hong, Yongjian Wu, Rongrong Ji

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Youtu Laboratory, Tencent, Shanghai, China

Title: Image Captioning via Dynamic Path Customization

Abstract:
This article explores a novel dynamic network for vision and language (V&L) tasks, where the inferring structure is customized on the fly for different inputs. Most previous state-of-the-art (SOTA) approaches are static and handcrafted networks, which not only heavily rely on expert knowledge but also ignore the semantic diversity of input samples, therefore resulting in suboptimal performance. To address these issues, we propose a novel Dynamic Transformer Network (DTNet) for image captioning, which dynamically assigns customized paths to different samples, leading to discriminative yet accurate captions. Specifically, to build a rich routing space and improve routing efficiency, we introduce five types of basic cells and group them into two separate routing spaces according to their operating domains, i.e., spatial and channel. Then, we design a Spatial-Channel Joint Router (SCJR), which endows the model with the capability of path customization based on both spatial and channel information of the input sample. To validate the effectiveness of our proposed DTNet, we conduct extensive experiments on the MS-COCO dataset and achieve new SOTA performance on both the Karpathy split and the online test server. The source code is publicly available at https://github.com/xmu-xiaoma666/DTNet.

PaperID: 103,

Authors: Claire M. Postlethwaite, Peter Ashwin, Matthew D. Egbert

Affiliations: Department of Mathematics, University of Auckland, Auckland, New Zealand; Department of Mathematics and Statistics, University of Exeter, Exeter, U.K.; Department of Computer Science, University of Auckland, Auckland, New Zealand

Title: A Continuous Time Dynamical Turing Machine

Abstract:
Continuous time recurrent neural networks (CTRNNs) are systems of coupled ordinary differential equations (ODEs) inspired by the structure of neural networks in the brain. CTRNNs are known to be universal dynamical approximators: given a large enough system, the parameters of a CTRNN can be tuned to produce output that is arbitrarily close to that of any other dynamical system. However, in practice, both designing systems of CTRNN to have a certain output, and the reverse—understanding the dynamics of a given system of CTRNN—can be nontrivial. In this article, we describe a method for embedding any specified Turing machine in its entirety into a CTRNN. As such, we describe in detail a continuous time dynamical system that performs arbitrary discrete-state computations. We suggest that in acting as both a continuous time dynamical system and as a computer, the study of such systems can help refine and advance the debate concerning the Computational Hypothesis that cognition is a form of computation and the Dynamical Hypothesis that cognitive systems are dynamical systems.

PaperID: 104,

Authors: Xusheng Zhao, Qiong Dai, Xu Bai, Jia Wu, Hao Peng, Huailiang Peng, Zhengtao Yu, Philip S. Yu

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; Department of Computing, Macquarie University, Sydney, NSW, Australia; School of Cyber Science and Technology, Beihang University, Beijing, China; Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA

Title: Reinforced GNNs for Multiple Instance Learning

Abstract:
Multiple instance learning (MIL) trains models from bags of instances, where each bag contains multiple instances, and only bag-level labels are available for supervision. The application of graph neural networks (GNNs) in capturing intrabag topology effectively improves MIL. Existing GNNs usually require filtering low-confidence edges among instances and adapting graph neural architectures to new bag structures. However, such asynchronous adjustments to structure and architecture are tedious and ignore their correlations. To tackle these issues, we propose a reinforced GNN framework for MIL (RGMIL), pioneering the exploitation of multiagent deep reinforcement learning (MADRL) in MIL tasks. MADRL enables the flexible definition or extension of factors that influence bag graphs or GNNs and provides synchronous control over them. Moreover, MADRL explores structure-to-architecture correlations while automating adjustments. Experimental results on multiple MIL datasets demonstrate that RGMIL achieves the best performance with excellent explainability. The code and data are available at https://github.com/RingBDStack/RGMIL.

PaperID: 105,

Authors: Jifeng Guo, Zhulin Liu, C. L. Philip Chen

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: An Incremental-Self-Training-Guided Semi-Supervised Broad Learning System

Abstract:
The broad learning system (BLS) has recently been applied in numerous fields. However, it is mainly a supervised learning system and thus not suitable for specific practical applications with a mixture of labeled and unlabeled data. Although a manifold regularization-based semi-supervised BLS exists, its performance still requires improvement, because its assumption is not always applicable. Therefore, this article proposes an incremental-self-training-guided semi-supervised BLS (ISTSS-BLS). Distinct from traditional self-training, where all unlabeled data are labeled simultaneously, incremental self-training (IST) obtains unlabeled data incrementally from an established sorted list based on the distance between the data and their cluster center. During iterative learning, a small portion of labeled data is first used to train BLS. The system recursively self-updates its structure and meta-parameters using: 1) the double-restricted mechanism and 2) the dynamic neuron-incremental mechanism. The double-restricted mechanism helps prevent the introduction of incorrect pseudo-labeled samples, and the dynamic neuron-incremental mechanism guides the self-updating of the network structure effectively based on the training accuracy of the labeled data. These strategies guarantee a parsimonious model during the update. Besides, a novel metric, the accuracy-time ratio (A/T), is proposed to evaluate the model’s performance comprehensively regarding time and accuracy. In experimental verifications, ISTSS-BLS performs outstandingly on 11 datasets. Specifically, the IST is compared with traditional self-training on data of three scales, saving up to 52.02% learning time. In addition, ISTSS-BLS is compared with different state-of-the-art alternatives, and all results indicate that it possesses significant advantages in performance.
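
The incremental ordering described above (unlabeled samples drawn from a list sorted by distance to their cluster center) can be sketched as follows. This assumes Euclidean distance and precomputed cluster assignments; the function names and the batch size are assumptions for illustration, and the double-restricted and neuron-incremental mechanisms are not shown.

```python
import math


def sort_by_center_distance(points, centers, assignments):
    """Return sample indices ordered from closest to farthest
    from their assigned cluster center (Euclidean distance)."""
    def distance(i):
        return math.dist(points[i], centers[assignments[i]])
    return sorted(range(len(points)), key=distance)


def incremental_batches(order, batch_size):
    """Yield successive slices of the sorted index list, so that
    samples nearest their centers are pseudo-labeled first."""
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]
```

With a batch size of 2, the first batch holds the two samples nearest their centers, and later batches reach progressively more ambiguous samples.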

PaperID: 106,

Authors: Keke Tang, Yuexin Ma, Dingruibo Miao, Peng Song, Zhaoquan Gu, Zhihong Tian, Wenping Wang

Affiliations: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China; School of Information Science and Technology, ShanghaiTech University, Shanghai, China; State Key Laboratory of Remote Sensing Information Engineering for Surveying and Mapping, Wuhan University, Wuhan, China; Information Systems Technology and Design Pillar, Singapore University of Technology and Design, Tampines, Singapore; Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; Department of Visualization, Texas A&M University, College Station, TX, USA

Title: Decision Fusion Networks for Image Classification

Abstract:
Convolutional neural networks, in which each layer receives features from the previous layer(s) and then aggregates/abstracts higher level features from them, are widely adopted for image classification. To avoid information loss during feature aggregation/abstraction and fully utilize lower layer features, we propose a novel decision fusion module (DFM) that makes an intermediate decision based on the features in the current layer and then fuses its results with the original features before passing them to the next layers. This decision is devised to determine an auxiliary category corresponding to the category at a higher hierarchical level, which can, thus, serve as category-coherent guidance for later layers. Therefore, by stacking a collection of DFMs into a classification network, the generated decision fusion network is explicitly formulated to progressively aggregate/abstract more discriminative features guided by these decisions and then refine the decisions based on the newly generated features in a layer-by-layer manner. Comprehensive results on four benchmarks validate that the proposed DFM can bring significant improvements for various common classification networks at a minimal additional computational cost and is superior to the state-of-the-art decision fusion-based methods. In addition, we demonstrate the generalization ability of the DFM to object detection and semantic segmentation.

PaperID: 107,

Authors: Bingrong Xu, Zhigang Zeng, Cheng Lian, Zhengming Ding

Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China; School of Automation, Wuhan University of Technology, Wuhan, China; Department of Computer Science, Tulane University, New Orleans, LA, USA

Title: Generative Mixup Networks for Zero-Shot Learning

Abstract:
Zero-shot learning casts light on lacking unseen class data by transferring knowledge from seen classes via a joint semantic space. However, the distributions of samples from seen and unseen classes are usually imbalanced. Many zero-shot learning methods fail to obtain satisfactory results in the generalized zero-shot learning task, where seen and unseen classes are all used for the test. Also, irregular structures of some classes may result in inappropriate mapping from visual feature space to semantic attribute space. A novel generative mixup network with semantic graph alignment is proposed in this article to mitigate such problems. To be specific, our model first attempts to synthesize samples conditioned with class-level semantic information as the prototype to recover the class-based feature distribution from the given semantic description. Second, the proposed model explores a mixup mechanism to augment training samples and improve the generalization ability of the model. Third, a triplet gradient matching loss is developed to guarantee the class invariance to be more continuous in the latent space, and it can help the discriminator distinguish the real and fake samples. Finally, a similarity graph is constructed from semantic attributes to capture the intrinsic correlations and guide the feature generation process. Extensive experiments conducted on several zero-shot learning benchmarks from different tasks show that the proposed model can achieve superior performance over the state-of-the-art generalized zero-shot learning methods.

PaperID: 108,

Authors: Shijie Li, Yun Liu, Juergen Gall

Affiliations: Department of Information Systems and Artificial Intelligence, University of Bonn, Bonn, Germany; Eidgenössische Technische Hochschule (ETH) Zurich, Zürich, Switzerland

Title: Rethinking 3-D LiDAR Point Cloud Segmentation

Abstract:
Many point-based semantic segmentation methods have been designed for indoor scenarios, but they struggle when applied to point clouds captured by a light detection and ranging (LiDAR) sensor in an outdoor environment. To make these methods efficient and robust enough to handle LiDAR data, we introduce the general concept of reformulating 3-D point-based operations such that they can operate in the projection space. We show by means of three point-based methods that the reformulated versions are between 300 and 400 times faster and achieve higher accuracy. Furthermore, we demonstrate that the concept of reformulating 3-D point-based operations allows the design of new architectures that unify the benefits of point-based and image-based methods. As an example, we introduce a network that integrates reformulated 3-D point-based operations into a 2-D encoder-decoder architecture that fuses the information from different 2-D scales. We evaluate the approach on four challenging datasets for semantic LiDAR point cloud segmentation and show that combining reformulated 3-D point-based operations with 2-D image-based operations achieves very good results on all four datasets.

PaperID: 109,

Authors: Zhuang Shao, Jungong Han, Demetris Marnerides, Kurt Debattista

Affiliations: Warwick Manufacturing Group, University of Warwick, Coventry, U.K.; Department of Computer Science, Aberystwyth University, Aberystwyth, U.K.

Title: Region-Object Relation-Aware Dense Captioning via Transformer

Abstract:
Dense captioning provides detailed captions of complex visual scenes. While a number of successes have been achieved in recent years, there are still two broad limitations: 1) most existing methods adopt an encoder–decoder framework in which the contextual information is sequentially encoded using long short-term memory (LSTM), whose forget gate mechanism makes it vulnerable when dealing with long sequences; and 2) the vast majority of prior work considers regions of interest (RoIs) equally important, thus failing to focus on more informative regions. The consequence is that the generated captions cannot highlight important contents of the image, which does not seem natural. To overcome these limitations, in this article, we propose a novel end-to-end transformer-based dense image captioning architecture, termed the transformer-based dense captioner (TDC). TDC learns the mapping between images and their dense captions via a transformer, prioritizing more informative regions. To this end, we present a novel unit, named the region-object correlation score unit (ROCSU), to measure the importance of each region, where the relationships between detected objects and the region, alongside the confidence scores of detected objects within the region, are taken into account. Extensive experimental results and ablation studies on the standard dense-captioning datasets demonstrate the superiority of the proposed method over the state-of-the-art methods.

PaperID: 110,

Authors: Yu Xue, Xiaolong Han, Ferrante Neri, Jiafeng Qin, Danilo Pelusi

Affiliations: School of Software, Nanjing University of Information Science and Technology, Nanjing, China; Department of Computer Science, Nature Inspired Computing and Engineering Research Group, University of Surrey, Guildford, U.K.; Faculty of Communication Science, University of Teramo, Teramo, Italy

Title: A Gradient-Guided Evolutionary Neural Architecture Search

Abstract:
Neural architecture search (NAS) is a popular method that can automatically design deep neural network structures. However, designing a neural network using NAS is computationally expensive. This article proposes a gradient-guided evolutionary NAS (GENAS) to design convolutional neural networks (CNNs) for image classification. GENAS is a hybrid algorithm that combines evolutionary global and local search operators to evolve a population of subnets sampled from a supernet. Each candidate architecture is encoded as a table describing which operations are associated with the edges between nodes signifying feature maps. In addition, the evolutionary optimization uses novel crossover and mutation operators to manipulate the subnets using the proposed tabular encoding. Every n generations, the candidate architectures undergo a local search inspired by differentiable NAS. GENAS is designed to overcome the limitations of both evolutionary and gradient descent NAS. This algorithmic structure enables the performance assessment of the candidate architecture without retraining, thus limiting the NAS calculation time. Furthermore, subnet individuals are decoupled during evaluation to prevent strong coupling of operations in the supernet. The experimental results indicate that the searched structures achieve test errors of 2.45%, 16.86%, and 23.9% on the CIFAR-10, CIFAR-100, and ImageNet datasets, respectively, and that the search costs only 0.26 GPU days on a graphics card. GENAS can effectively expedite the training and evaluation processes and obtain high-performance network structures.

PaperID: 111,

Authors: Archer Moore, Heejung Shim, Jingge Zhu, Mingming Gong

Affiliations: School of Mathematics and Statistics, Faculty of Science, The University of Melbourne, Melbourne, VIC, Australia; Department of Electrical and Electronic Engineering, Faculty of Engineering and Information Technology, The University of Melbourne, Melbourne, VIC, Australia

Title: Semi-Supervised Learning Under General Causal Models

Abstract:
Semi-supervised learning (SSL) aims to train a machine learning (ML) model using both labeled and unlabeled data. While unlabeled data have been used in various ways to improve prediction accuracy, the reason why unlabeled data can help is not fully understood. One interesting and promising direction is to understand SSL from a causal perspective. In light of the independent causal mechanisms (ICM) principle, the unlabeled data can be helpful when the label causes the features but not vice versa. However, the causal relations between the features and labels can be complex in real-world applications. In this article, we propose an SSL framework that works with general causal models in which the variables have flexible causal relations. More specifically, we explore the causal graph structures and design corresponding causal generative models which can be learned with the help of unlabeled data. The learned causal generative model can generate synthetic labeled data for training a more accurate predictive model. We verify the effectiveness of our proposed method by empirical studies on both simulated and real data.

PaperID: 112,

Authors: Xi Xiao, Hailong Ma, Guojun Gan, Qing Li, Bin Zhang, Shutao Xia

Affiliations: Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; Department of Mathematics, University of Connecticut, Storrs, CT, USA; Future Network Research Institute, Southern University of Science and Technology, Shenzhen, China; Cyberspace Security Research Center, Peng Cheng Laboratory, Shenzhen, China

Title: Robust k-Means-Type Clustering for Noisy Data

Abstract:
Data clustering is a fundamental machine learning task that seeks to categorize a dataset into homogeneous groups. However, real data usually contain noise, which poses significant challenges to clustering algorithms. In this article, motivated by how the k-means algorithm is derived from a Gaussian mixture model (GMM), we propose a robust k-means-type algorithm, named k-means-type clustering based on t-distribution (KMTD), by assuming that the data points are drawn from a special multivariate t-mixture model (TMM). Compared to the Gaussian distribution, the t-distribution has a fatter tail, which makes the proposed algorithm more robust to noise. Like the k-means algorithm, the proposed algorithm is simpler than those based on a full TMM. Both synthetic and real data are used to illustrate the proposed algorithm’s performance and efficiency. The experimental results demonstrate that the proposed algorithm runs faster than other sophisticated algorithms and, in most cases, achieves higher accuracy than the other algorithms.

PaperID: 113,

Authors: Andrea Ceni, Claudio Gallicchio

Affiliations: Department of Computer Science, University of Pisa, Pisa, Italy

Title: Edge of Stability Echo State Network

Abstract:
Echo state networks (ESNs) are time series processing models working under the echo state property (ESP) principle. The ESP is a notion of stability that imposes an asymptotic fading of the memory of the input. On the other hand, the resulting inherent architectural bias of ESNs may lead to an excessive loss of information, which in turn harms the performance in certain tasks with long short-term memory requirements. To bring together the fading memory property and the ability to retain as much memory as possible, in this article, we introduce a new ESN architecture called the Edge of Stability ESN (ES2N). The introduced ES2N model is based on defining the reservoir layer as a convex combination of a nonlinear reservoir (as in the standard ESN) and a linear reservoir that implements an orthogonal transformation. By virtue of a thorough mathematical analysis, we prove that the whole eigenspectrum of the Jacobian of the ES2N map can be contained in an annular neighborhood of a complex circle of controllable radius. This property is exploited to tune the ES2N’s dynamics close to the edge-of-chaos regime by design. Remarkably, our experimental analysis shows that the ES2N model can reach the theoretical maximum short-term memory capacity (MC). At the same time, in comparison to conventional reservoir approaches, ES2N is shown to offer an excellent trade-off between memory and nonlinearity, as well as a significant improvement of performance in autoregressive nonlinear modeling and real-world time series modeling.
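The convex combination described in the abstract can be illustrated with a small state-update sketch. The update form, the parameter names (`beta`, `rho`, `omega`), the `tanh` nonlinearity, and the toy 2-D rotation used as the orthogonal matrix `O` are assumptions made for illustration; the abstract itself only states that a nonlinear reservoir and an orthogonal linear reservoir are convexly combined.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def es2n_step(x, u, W, W_in, O, beta=0.1, rho=0.9, omega=1.0):
    """One reservoir update: beta * nonlinear branch + (1 - beta) * orthogonal branch.

    Assumed form: x_new = beta * tanh(rho * W x + omega * W_in u) + (1 - beta) * O x
    """
    pre = [rho * r + omega * i for r, i in zip(matvec(W, x), matvec(W_in, u))]
    nonlinear = [math.tanh(p) for p in pre]   # standard ESN-style branch
    linear = matvec(O, x)                     # norm-preserving linear branch
    return [beta * n + (1.0 - beta) * l for n, l in zip(nonlinear, linear)]


if __name__ == "__main__":
    theta = 0.3
    O = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]  # 2-D rotation, hence orthogonal
    W = [[0.5, -0.2], [0.1, 0.3]]
    W_in = [[1.0], [-1.0]]
    x = [1.0, 0.0]
    for u in ([0.5], [-0.2], [0.1]):
        x = es2n_step(x, u, W, W_in, O)
    print(x)
```

With `beta = 0` the update reduces to a pure rotation that preserves the state norm, and with `beta = 1` it reduces to a standard nonlinear reservoir update; intermediate values interpolate between the two regimes.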

PaperID: 114,

Authors: Aleksandr Cariow, Galina Cariowa

Affiliations: Faculty of Computer Science and Information Technology, West Pomeranian University of Technology Szczecin, Szczecin, Poland

Title: Reduced-Complexity Algorithms for Tessarine Neural Networks

Abstract:
This brief presents the results of synthesizing efficient algorithms for implementing the basic data-processing macro operations used in tessarine-valued neural networks. These macro operations primarily include the multiplication of two tessarines, the inner product of two tessarine-valued vectors, and the multiplication of one tessarine by a set of different tessarines. When synthesizing the discussed algorithms, we use the fact that tessarine multiplications can be interpreted as matrix–vector products. In each of these cases, the matrices have a specific block structure, which allows them to be efficiently factorized. This factorization provides a reduction in the multiplicative complexity of computing the product of two tessarines. In what follows, we use the optimized tessarine multiplication algorithm to synthesize reduced-complexity algorithms for the inner product of two tessarine-valued vectors as well as for multiple tessarine multiplication. Also, to further decrease the computational complexity of the inner product of two tessarine-valued vectors and multiple tessarine multiplication, we take into account that the selective sets of operations in the computations of all partial matrix–vector multiplications participating in these macro operations are the same. Accounting for this fact provides an additional reduction in computational complexity.
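To make the matrix–vector interpretation concrete, the sketch below multiplies two tessarines a0 + a1·i + a2·j + a3·k (assuming the convention i² = −1, j² = +1, k = ij) through a 4×4 block-structured matrix, and cross-checks the result against a well-known decomposition into two complex products. The decomposition shown is a textbook identity chosen for illustration; it is not claimed to be the factorization derived in the brief, and all function names are hypothetical.

```python
def tessarine_matrix(a):
    """4x4 matrix of left-multiplication by tessarine a = (a0, a1, a2, a3).

    The matrix has the block form [[A, B], [B, A]], where A and B are the
    2x2 real representations of the complex numbers a0 + a1*i and a2 + a3*i.
    """
    a0, a1, a2, a3 = a
    return [
        [a0, -a1, a2, -a3],
        [a1, a0, a3, a2],
        [a2, -a3, a0, -a1],
        [a3, a2, a1, a0],
    ]


def multiply_naive(a, b):
    """Matrix-vector product: 16 real multiplications."""
    return tuple(sum(m * x for m, x in zip(row, b)) for row in tessarine_matrix(a))


def multiply_via_complex(a, b):
    """Same product using two complex multiplications.

    Writing a = z1 + z2*j and b = w1 + w2*j with j*j = 1 gives
    a*b = (z1*w1 + z2*w2) + (z1*w2 + z2*w1)*j, which follows from
    p = (z1 + z2)*(w1 + w2) and m = (z1 - z2)*(w1 - w2).
    """
    z1, z2 = complex(a[0], a[1]), complex(a[2], a[3])
    w1, w2 = complex(b[0], b[1]), complex(b[2], b[3])
    p = (z1 + z2) * (w1 + w2)
    m = (z1 - z2) * (w1 - w2)
    r1, r2 = (p + m) / 2, (p - m) / 2
    return (r1.real, r1.imag, r2.real, r2.imag)


if __name__ == "__main__":
    a, b = (1, 2, 3, 4), (5, 6, 7, 8)
    print(multiply_naive(a, b))        # (-18, 68, -18, 60)
    print(multiply_via_complex(a, b))  # (-18.0, 68.0, -18.0, 60.0)
```

The block pattern [[A, B], [B, A]] is what makes such a matrix amenable to factorization: it can be diagonalized into the blocks A + B and A − B, which is the matrix form of the two complex products above.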

PaperID: 115,

Authors: Ainur Zhaikhan, Ali H. Sayed

Affiliations: Adaptive Systems Laboratory, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

Title: Graph Exploration for Effective Multiagent Q-Learning

Abstract:
This article proposes an exploration technique for multiagent reinforcement learning (MARL) with graph-based communication among agents. We assume that the individual rewards received by the agents are independent of the actions of the other agents, while their policies are coupled. In the proposed framework, neighboring agents collaborate to estimate the uncertainty about the state–action space in order to execute more efficient explorative behavior. Different from existing works, the proposed algorithm does not require counting mechanisms and can be applied to continuous-state environments without requiring complex conversion techniques. Moreover, the proposed scheme allows agents to communicate in a fully decentralized manner with minimal information exchange. For continuous-state scenarios, each agent needs to exchange only a single parameter vector. The performance of the algorithm is verified with theoretical results for discrete-state scenarios and with experiments for the continuous ones.

PaperID: 116,

Authors: Xinhang Wan, Jiyuan Liu, Xinbiao Gan, Xinwang Liu, Siwei Wang, Yi Wen, Tianjiao Wan, En Zhu

Affiliations: School of Computer, National University of Defense Technology, Changsha, China; College of Systems Engineering, National University of Defense Technology, Changsha, Hunan, China; Intelligent Game and Decision Laboratory, Beijing, China

Title: One-Step Multi-View Clustering With Diverse Representation

Abstract:
Multi-view clustering has attracted broad attention due to its capacity to utilize consistent and complementary information among views. Although tremendous progress has been made recently, most existing methods suffer from high complexity, preventing them from being applied to large-scale tasks. Multi-view clustering via matrix factorization is a representative approach to addressing this issue. However, most such methods map the data matrices into a fixed dimension, limiting the model’s expressiveness. Moreover, a range of methods suffer from a two-step process, i.e., multimodal learning and the subsequent k-means, inevitably causing a suboptimal clustering result. In light of this, we propose a one-step multi-view clustering with diverse representation (OMVCDR) method, which incorporates multi-view learning and k-means into a unified framework. Specifically, we first project original data matrices into various latent spaces to attain comprehensive information and auto-weight them in a self-supervised manner. Then, we directly use the information matrices under diverse dimensions to obtain consensus discrete clustering labels. The unified work of representation learning and clustering boosts the quality of the final results. Furthermore, we develop an efficient optimization algorithm with proven convergence to solve the resultant problem. Comprehensive experiments on various datasets demonstrate the promising clustering performance of our proposed method. The code is publicly available at https://github.com/wanxinhang/OMVCDR.

PaperID: 117,

Authors: Maria Nareklishvili, Marius Geitle

Affiliations: Department of Economics, University of Oslo, Oslo, Norway; Department of Computer Science and Communication, Ostfold University College, Halden, Norway

Title: Deep Ensemble Transformers for Dimensionality Reduction

Abstract:
We propose deep ensemble transformers (DETs), a fast, scalable approach for dimensionality reduction problems. This method leverages the power of deep neural networks and employs cascade ensemble techniques as its fundamental feature extraction tool. To handle high-dimensional data, our approach employs a flexible number of intermediate layers sequentially. These layers progressively transform the input data into decision tree predictions. To further enhance prediction performance, the output from the final intermediate layer is fed through a feed-forward neural network architecture for final prediction. We derive an upper bound on the disparity between the generalization error and the empirical error and demonstrate that it converges to zero. This highlights the generalizability of our method to parameter estimation and feature selection problems. In our experimental evaluations, DETs outperform existing models in terms of prediction accuracy, representation learning ability, and computational time. Specifically, the method achieves over 95% accuracy on gene expression data and can be trained on average 50% faster than traditional artificial neural networks (ANNs).

PaperID: 118,

Authors: An Yang, Ying Liu, Chunguang Li, Qinyuan Ren

Affiliations: College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China; College of Control Science and Engineering, Zhejiang University, Hangzhou, China

Title: Deeply Supervised Block-Wise Neural Architecture Search

Abstract:
Neural architecture search (NAS) has shown great promise in automatically designing neural network models. Recently, block-wise NAS has been proposed to alleviate the deep coupling problem between architectures and weights that exists in the well-known weight-sharing NAS, by training the huge weight-sharing supernet block by block. However, the existing block-wise NAS methods, which resort to either a supervised distillation or a self-supervised contrastive learning scheme to enable block-wise optimization, incur massive computational cost. To be specific, the former introduces an external high-capacity teacher model, while the latter involves a supernet-scale momentum model and requires a long training schedule. Considering this, in this work, we propose a resource-friendly deeply supervised block-wise NAS (DBNAS) method. In the proposed DBNAS, we construct a lightweight deeply supervised module after each block to enable a simple supervised learning scheme and leverage ground-truth labels to indirectly supervise optimization of each block progressively. In addition, the deeply supervised module is specifically designed as a structural and functional condensation of the supernet, which establishes global awareness for progressive block-wise optimization and helps search for promising architectures. Experimental results show that the DBNAS method takes less than 1 GPU day to search out promising architectures on the ImageNet dataset with a smaller GPU memory footprint than the other block-wise NAS works. The best-performing model among the searched DBNAS family achieves 75.6% Top-1 accuracy on ImageNet, which is competitive with the state-of-the-art NAS models. Moreover, our DBNAS family models also achieve good transfer performance on CIFAR-10/100, as well as two downstream tasks: object detection and semantic segmentation.

PaperID: 119,

Authors: Pourya Shamsolmoali, Masoumeh Zareapoor, Swagatam Das, Eric Granger, Salvador García

Affiliations: School of Communication and Electronic Engineering, East China Normal University, Shanghai, China; Department of Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Electronics and Communication Sciences Unit, Indian Statistical Institute, Kolkata, India; Department of Systems Engineering, École de technologie supérieure, Université du Québec, Montreal, QC, Canada; Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain

Title: Hybrid Gromov-Wasserstein Embedding for Capsule Learning

Abstract:
Capsule networks (CapsNets) aim to parse images into a hierarchy of objects, parts, and their relationships using a two-step process involving part–whole transformation and hierarchical component routing. However, this hierarchical relationship modeling is computationally expensive, which has limited the wider use of CapsNet despite its potential advantages. The current state of CapsNet models primarily focuses on comparing their performance with capsule baselines, falling short of achieving the same level of proficiency as deep convolutional neural network (CNN) variants in intricate tasks. To address this limitation, we present an efficient approach for learning capsules that surpasses canonical baseline models and even demonstrates superior performance compared with high-performing convolution models. Our contribution can be outlined in two aspects: first, we introduce a group of subcapsules onto which an input vector is projected. Subsequently, we present the hybrid Gromov–Wasserstein (HGW) framework, which initially quantifies the dissimilarity between the input and the components modeled by the subcapsules, followed by determining their alignment degree through optimal transport (OT). This innovative mechanism capitalizes on new insights into defining alignment between the input and subcapsules, based on the similarity of their respective component distributions. This approach enhances CapsNets’ capacity to learn from intricate, high-dimensional data while retaining their interpretability and hierarchical structure. Our proposed model offers two distinct advantages: 1) its lightweight nature facilitates the application of capsules to more intricate vision tasks, including object detection; and 2) it outperforms baseline approaches in these demanding tasks. Our empirical findings illustrate that HGW capsules (HGWCapsules) exhibit enhanced robustness against affine transformations, scale effectively to larger datasets, and surpass CNN and CapsNet models across various vision tasks.

PaperID: 120,

Authors: Paul Jungmann, Julia Poray, Akash Kumar

Affiliations: GlobalFoundries Dresden Module One LLC & Company KG, Dresden, Germany; Chair for Processor Design, Center for Advancing Electronics Dresden (CfAED), Dresden, Germany

Title: Analytical Uncertainty Propagation in Neural Networks

Abstract:
The use of machine-learning techniques, such as neural networks, is common in a large variety of domains. Estimating the certainty of a predicted value is important when precise information is required. Nevertheless, the forward propagation of uncertainty in machine-learning models is poorly understood. In general, providing error bars for measurements (measurement uncertainty) is crucial when high precision is needed for decision-making. The objective of this work is the development of an analytical method for aleatoric uncertainty forward propagation in neural networks, based on the analytical uncertainty propagation well known from physics and engineering. As a result, the method gives provably correct results. A further benefit is that the method does not require a different training procedure, but only needs the weights and biases of the neural network and is computationally inexpensive. The analytical method is applied to real-world examples from the semiconductor industry (regression and image classification). Its usefulness is demonstrated by the provided examples, which show how meaningful error bars are when machine learning is used for decision-making.
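As a rough illustration of what analytical forward propagation of input uncertainty can look like, the sketch below applies textbook first-order (Gaussian) error propagation to one dense layer followed by a `tanh` activation, using only the layer's weights and biases. The abstract does not give the authors' formulas, so the linearization, the function names, and the toy numbers here are assumptions.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def propagate_linear(W, b, mean, cov):
    """y = W x + b  =>  mean_y = W mean_x + b,  cov_y = W cov_x W^T (exact)."""
    out_mean = [sum(w * m for w, m in zip(row, mean)) + bi for row, bi in zip(W, b)]
    out_cov = matmul(matmul(W, cov), transpose(W))
    return out_mean, out_cov


def propagate_tanh(mean, cov):
    """First-order approximation: cov_y ~= J cov_x J with J = diag(1 - tanh(mean)^2)."""
    out_mean = [math.tanh(m) for m in mean]
    d = [1.0 - t * t for t in out_mean]  # derivative of tanh at the mean
    n = len(d)
    out_cov = [[d[i] * cov[i][j] * d[j] for j in range(n)] for i in range(n)]
    return out_mean, out_cov


if __name__ == "__main__":
    W, b = [[1.0, 2.0]], [0.0]
    mean, cov = [1.0, 1.0], [[0.25, 0.0], [0.0, 1.0]]
    m, c = propagate_linear(W, b, mean, cov)
    m, c = propagate_tanh(m, c)
    print(m, math.sqrt(c[0][0]))  # output mean and its one-sigma error bar
```

The linear step is exact for any input covariance, while the activation step is only accurate when the input uncertainty is small relative to the curvature of the nonlinearity.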

PaperID: 121,

Authors: Boyu Dong, Dong Chen, Yu Wu, Siliang Tang, Yueting Zhuang

Affiliations: College of Computer Science and Technology, Zhejiang University, Hangzhou, China; School of Computer Science, Princeton University, Princeton, NJ, USA

Title: FADngs: Federated Learning for Anomaly Detection

Abstract:
With the increasing demand for data privacy, federated learning (FL) has gained popularity for various applications. Most existing FL works focus on the classification task, overlooking scenarios where anomaly detection may also require privacy preservation. Traditional anomaly detection algorithms cannot be directly applied to the FL setting due to false and missing detection issues. Moreover, with common aggregation methods used in FL (e.g., averaging model parameters), the global model cannot keep the capacities of local models in discriminating anomalies deviating from local distributions, which further degrades the performance. To address the aforementioned challenges, we propose Federated Anomaly Detection with Noisy Global Density Estimation and Self-supervised Ensemble Distillation (FADngs). Specifically, FADngs aligns the knowledge of data distributions from each client by sharing processed density functions. In addition, FADngs trains local models in an improved contrastive learning way that learns more discriminative representations specific to anomaly detection based on the shared density functions. Furthermore, FADngs aggregates capacities by ensemble distillation, which distills the knowledge learned from different distributions to the global model. Our experiments demonstrate that the proposed method significantly outperforms state-of-the-art federated anomaly detection methods. We also empirically show that the shared density function is privacy-preserving. The code for the proposed method is provided for research purposes at https://github.com/kanade00/Federated_Anomaly_detection.

PaperID: 122,

Authors: Yi Liu, De Cheng, Dingwen Zhang, Shoukun Xu, Jungong Han

Affiliations: School of Computer Science and Artificial Intelligence, the Aliyun School of Big Data, the School of Software, and CNPC-CZU Innovation Alliance, Changzhou University, Changzhou, Jiangsu, China; School of Telecommunications Engineering, Xidian University, Xi’an, Shaanxi, China; School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi, China; Department of Computer Science, University of Sheffield, Sheffield, U.K.

Title: Capsule Networks With Residual Pose Routing

Abstract:
Capsule networks (CapsNets) are known to be difficult to extend to deeper architectures, which are desirable for high performance in the deep learning era, due to the complexity of capsule routing algorithms. In this article, we present a simple yet effective capsule routing algorithm, termed residual pose routing. Specifically, the higher-layer capsule pose is obtained by an identity mapping on the adjacent lower-layer capsule pose. Such simple residual pose routing has two advantages: 1) reducing the routing computation complexity and 2) avoiding gradient vanishing due to its residual learning framework. On top of that, we explicitly reformulate the capsule layers by building a residual pose block. Stacking multiple such blocks results in deep residual CapsNets (ResCaps) with a ResNet-like architecture. Results on MNIST, AffNIST, SmallNORB, and CIFAR-10/100 show the effectiveness of ResCaps for image classification. Furthermore, we successfully extend our residual pose routing to large-scale real-world applications, including 3-D object reconstruction and classification, and 2-D saliency dense prediction. The source code has been released at https://github.com/liuyi1989/ResCaps.

PaperID: 123,

Authors: Vasco Lopes, Luís A. Alexandre

Affiliations: NOVA LINCS, Universidade da Beira Interior, Covilhã, Portugal

Title: Toward Less Constrained Macro-Neural Architecture Search

Abstract:
Networks found with neural architecture search (NAS) achieve state-of-the-art performance in a variety of tasks, outperforming human-designed networks. However, most NAS methods rely heavily on human-defined assumptions that constrain the search: the architecture’s outer skeleton, number of layers, parameter heuristics, and search spaces. In addition, common search spaces consist of repeatable modules (cells) instead of fully exploring the architecture’s search space by designing entire architectures (macro-search). Imposing such constraints requires deep human expertise and restricts the search to predefined settings. In this article, we propose less constrained macro-neural architecture search (LCMNAS), a method that pushes NAS to less constrained search spaces by performing macro-search without relying on predefined heuristics or bounded search spaces. LCMNAS introduces three components for the NAS pipeline: 1) a method that leverages information about well-known architectures to autonomously generate complex search spaces based on weighted directed graphs (WDGs) with hidden properties; 2) an evolutionary search strategy that generates complete architectures from scratch; and 3) a mixed-performance estimation approach that combines information about architectures at the initialization stage and lower fidelity estimates to infer their trainability and capacity to model complex functions. We present experiments on 14 different datasets showing that LCMNAS is capable of generating both cell and macro-based architectures with minimal GPU computation and state-of-the-art results. Moreover, we conduct extensive studies on the importance of different NAS components in both cell and macro-based settings. The code for reproducibility is publicly available at https://github.com/VascoLopes/LCMNAS.

PaperID: 124,

Authors: Alaa Tharwat, Wolfram Schenck

Affiliations: Center for Applied Data Science (CfADS), Hochschule Bielefeld–University of Applied Sciences and Arts, Bielefeld, Germany

Title: Active Learning for Handling Missing Data

Abstract:
Recently, the massive growth of IoT devices and Internet data, which are widely used in many applications, including industry and healthcare, has dramatically increased the amount of free unlabeled data collected. However, this unlabeled data is useless if we want to learn supervised machine learning models. The expensive and time-consuming labeling process makes the problem even more challenging. Here, the active learning (AL) technique provides a solution by labeling small but highly informative and representative data, which guarantees a high degree of generalizability over space and improves classification performance with data we have never seen before. The task is more difficult when the active learner has no predefined knowledge, such as initial training data, and when the obtained data is incomplete (i.e., contains missing values). In previous studies, the missing data are first imputed. Then, the active learner selects from the available unlabeled data, regardless of whether the points were originally observed or imputed. However, selecting inaccurately imputed data points would negatively affect the active learner and prevent it from selecting informative and/or representative points, thus reducing the overall classification performance of the prediction models. This motivated us to introduce a novel query selection strategy that accounts for imputation uncertainty when querying new points. For this purpose, we first introduce a novel multiple imputation method that considers feature importance in selecting the most promising feature groups for missing value estimation. This multiple imputation method provides the ability to quantify the imputation uncertainty of each imputed data point. Furthermore, in each of the two phases of the proposed active learner (exploration and exploitation), imputation uncertainty is taken into account to reduce the probability of selecting points with high imputation uncertainty. We tested the effectiveness of the proposed active learner on different binary and multiclass datasets with different missing rates.

PaperID: 125,

Authors: Zehua Sun, Yonghui Xu, Yong Liu, Wei He, Lanju Kong, Fangzhao Wu, Yali Jiang, Lizhen Cui

Affiliations: Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR) and the Software School, Shandong University, Jinan, China; Nanyang Technological University, Jurong West, Singapore; Microsoft Research Asia, Beijing, China

Title: A Survey on Federated Recommendation Systems

Abstract:
Federated learning has recently been applied to recommendation systems to protect user privacy. In federated learning settings, recommendation systems can train recommendation models by collecting the intermediate parameters instead of the real user data, which greatly enhances user privacy. In addition, federated recommendation systems (FedRSs) can cooperate with other data platforms to improve recommendation performance while meeting regulatory and privacy constraints. However, FedRSs face many new challenges such as privacy, security, heterogeneity, and communication costs. While significant research has been conducted in these areas, gaps in the survey literature still exist. In this article, we: 1) summarize some common privacy mechanisms used in FedRSs and discuss the advantages and limitations of each mechanism; 2) review several novel attacks and defenses against security; 3) summarize some approaches to address heterogeneity and communication cost problems; 4) introduce some realistic applications and public benchmark datasets for FedRSs; and 5) present some prospective research directions for the future. This article can help researchers and practitioners understand the research progress in these areas.

PaperID: 126,

Authors: Delong Chen, Zhao Wu, Fan Liu, Zaiquan Yang, Shaoqiu Zheng, Ying Tan, Erjin Zhou

Affiliations: College of Computer and Information, Hohai University, Nanjing, China; MEGVII Research, Beijing, China; Nanjing Research Institute of Electronic Engineering, Nanjing, China; Department of Machine Intelligence, School of EECS, Peking University, Beijing, China

Title: ProtoCLIP: Prototypical Contrastive Language Image Pretraining

Abstract:
Contrastive language image pretraining (CLIP) has received widespread attention since its learned representations can be transferred well to various downstream tasks. During the training process of the CLIP model, the InfoNCE objective aligns positive image–text pairs and separates negative ones. We show an underlying representation grouping effect during this process: the InfoNCE objective indirectly groups semantically similar representations together via randomly emerged within-modal anchors. Based on this understanding, in this article, prototypical contrastive language image pretraining (ProtoCLIP) is introduced to enhance such grouping by boosting its efficiency and increasing its robustness against the modality gap. Specifically, ProtoCLIP sets up prototype-level discrimination between image and text spaces, which efficiently transfers higher level structural knowledge. Furthermore, prototypical back translation (PBT) is proposed to decouple representation grouping from representation alignment, resulting in effective learning of meaningful representations under a large modality gap. PBT also enables us to introduce additional external teachers with richer prior language knowledge. ProtoCLIP is trained with an online episodic training strategy, which means it can be scaled up to unlimited amounts of data. We trained our ProtoCLIP on Conceptual Captions (CC) and achieved a +5.81% ImageNet linear probing improvement and a +2.01% ImageNet zero-shot classification improvement. On the larger YFCC-15M dataset, ProtoCLIP matches the performance of CLIP with 33% of the training time.
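
As a reference point for the InfoNCE objective mentioned above, here is a minimal sketch of a symmetric InfoNCE loss over a batch of paired embeddings. It illustrates the baseline CLIP-style objective only, not ProtoCLIP's prototype-level loss; the function name, the assumption that inputs are already L2-normalized, and the temperature value of 0.07 are illustrative choices.

```python
import math

def info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss for paired, L2-normalized embeddings.

    Row i of image_embs and row i of text_embs form the positive pair;
    all other combinations in the batch act as negatives.
    """
    n = len(image_embs)
    # Similarity logits scaled by the temperature (assumed value).
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in text_embs]
        for img in image_embs
    ]

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_z - row[i]  # the positive sits on the diagonal
        return total / n

    columns = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))

aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each image is most similar to its own caption and large when the pairing is scrambled, which is the alignment and separation behavior described in the abstract.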

PaperID: 127,

Authors: Kaihang Jiang, Wai Keung Wong, Xiaozhao Fang, Jiaxing Li, Jianyang Qin, Shengli Xie

Affiliations: School of Fashion and Textiles, The Hong Kong Polytechnic University, Hong Kong, China; School of Automation, Guangdong University of Technology, Guangzhou, China; Institute of Artificial Intelligence, Guangzhou University, Guangzhou, China; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; School of Automation and the Guangdong Hong Kong Macao Joint Laboratory for Smart Discrete Manufacturing, Guangdong University of Technology, Guangzhou, China

Title: Random Online Hashing for Cross-Modal Retrieval

Abstract:
In the past decades, supervised cross-modal hashing methods have attracted considerable attention due to their high search efficiency on large-scale multimedia databases. Many of these methods leverage semantic correlations among heterogeneous modalities by constructing a similarity matrix or building a common semantic space with the collective matrix factorization method. However, in the existing methods, the similarity matrix may sacrifice scalability and cannot preserve more semantic information in the hash codes. Meanwhile, the matrix factorization methods cannot embed the main modality-specific information into hash codes. To address these issues, we propose a novel supervised cross-modal hashing method called random online hashing (ROH) in this article. ROH proposes a linear bridging strategy to simplify the pairwise similarity factorization problem into a linear optimization one. Specifically, a bridging matrix is introduced to establish a bidirectional linear relation between hash codes and labels, which preserves more semantic similarities in hash codes and significantly reduces the semantic distances between hash codes of samples with similar labels. Additionally, a novel maximum eigenvalue direction (MED) embedding method is proposed to identify the direction of maximum eigenvalue for the original features and preserve critical information in modality-specific hash codes. Finally, to handle real-time data dynamically, an online structure is adopted to deal with newly arriving data chunks without considering pairwise constraints. Extensive experimental results on three benchmark datasets demonstrate that the proposed ROH outperforms several state-of-the-art cross-modal hashing methods.

PaperID: 128,

Authors: Chengzhi Cao, Xueyang Fu, Yurui Zhu, Zhijing Sun, Zheng-Jun Zha

Affiliations: School of Information Science and Technology, University of Science and Technology of China, Hefei, China

Title: Event-Driven Video Restoration With Spiking-Convolutional Architecture

Abstract:
With high temporal resolution, high dynamic range, and low latency, event cameras have made great progress in numerous low-level vision tasks. To help restore low-quality (LQ) video sequences, most existing event-based methods employ convolutional neural networks (CNNs) to extract sparse event features without considering the spatial sparse distribution or the temporal relation in neighboring events. This leads to insufficient use of spatial and temporal information from events. To address this problem, we propose a new spiking-convolutional network (SC-Net) architecture to facilitate event-driven video restoration. Specifically, to properly extract the rich temporal information contained in the event data, we utilize a spiking neural network (SNN) to suit the sparse characteristics of events and capture temporal correlation in neighboring regions; to make full use of spatial consistency between events and frames, we adopt CNNs to transform sparse events into an extra brightness prior that is aware of detailed textures in video sequences. In this way, both the temporal correlation in neighboring events and the mutual spatial information between the two types of features are fully explored and exploited to accurately restore detailed textures and sharp edges. The effectiveness of the proposed network is validated in three representative video restoration tasks: deblurring, super-resolution, and deraining. Extensive experiments on synthetic and real-world benchmarks demonstrate that our method performs better than existing competing methods.

PaperID: 129,

Authors: Yunyun Wang, Weiwen Zheng, Qinghao Li, Songcan Chen

Affiliations: Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Computer Science and Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China; College of Computer Science and Technology, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Title: Dual-Correction-Adaptation Network for Noisy Knowledge Transfer

Abstract:
Unsupervised domain adaptation (UDA) promotes target learning via a single-directional transfer from a label-rich source domain to an unlabeled target domain, while the reverse adaptation from target to source has not yet been jointly considered. In real teaching practice, a teacher helps students learn and also benefits from the students, and such a virtuous cycle inspires us to explore dual-directional transfer between domains. In fact, target pseudo-labels predicted by the source model commonly involve noise due to model bias; moreover, the source domain usually contains innate noise, which inevitably aggravates target noise, leading to noise amplification. Transfer from target to source exploits target knowledge to rectify the adaptation, consequently enables better source transfer, and establishes a virtuous transfer circle. To this end, we propose a dual-correction–adaptation network (DualCAN), in which adaptation and correction cycle between domains, such that learning in both domains can be boosted gradually. To the best of our knowledge, this is the first naive attempt at dual-directional adaptation. Empirical results validate DualCAN with remarkable performance gains, particularly for extremely noisy tasks (e.g., approximately +10% on $D \rightarrow A$ of Office-31 with 40% label corruption).

PaperID: 130,

Authors: Shiye Lei, Fengxiang He, Yancheng Yuan, Dacheng Tao

Affiliations: Sydney AI Centre and School of Computer Science, Faculty of Engineering, The University of Sydney, Darlington, NSW, Australia; Artificial Intelligence and its Applications Institute, School of Informatics, University of Edinburgh, Edinburgh, U.K; Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR

Title: Understanding Deep Learning via Decision Boundary

Abstract:
This article discovers that neural networks (NNs) with lower decision boundary (DB) variability have better generalizability. Two new notions, algorithm DB variability and $(\epsilon, \eta)$-data DB variability, are proposed to measure the DB variability from the algorithm and data perspectives. Extensive experiments show significant negative correlations between the DB variability and the generalizability. From the theoretical view, two lower bounds based on algorithm DB variability are proposed and do not explicitly depend on the sample size. We also prove an upper bound of order $\mathcal{O}\left(1/\sqrt{m} + \epsilon + \eta \log(1/\eta)\right)$ based on data DB variability. The bound is convenient to estimate without the requirement of labels and does not explicitly depend on the network size, which is usually prohibitively large in deep learning.

PaperID: 131,

Authors: Rui Lin, Jason Chun Lok Li, Jiajun Zhou, Binxiao Huang, Jie Ran, Ngai Wong

Affiliations: Distributed and Parallel Software Laboratory, Huawei, Hong Kong; Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong

Title: Lite It Fly: An All-Deformable-Butterfly Network

Abstract:
Most deep neural networks (DNNs) consist fundamentally of convolutional and/or fully connected layers, wherein the linear transform can be cast as the product between a filter matrix and a data matrix obtained by arranging feature tensors into columns. The recently proposed deformable butterfly (DeBut) decomposes the filter matrix into generalized, butterfly-like factors, thus achieving network compression orthogonal to the traditional ways of pruning or low-rank decomposition. This work reveals an intimate link between DeBut and a systematic hierarchy of depthwise and pointwise convolutions, which explains the empirically good performance of DeBut layers. By developing an automated DeBut chain generator, we show for the first time the viability of homogenizing a DNN into all DeBut layers, thus achieving extreme sparsity and compression. Various examples and hardware benchmarks verify the advantages of All-DeBut networks. In particular, we show it is possible to compress a PointNet to <5% of its parameters with a <5% accuracy drop, a record not achievable by other compression schemes.

PaperID: 132,

Authors: Honghui Wang, Yifan Pu, Shiji Song, Gao Huang

Affiliations: Department of Automation, Tsinghua University, Beijing, China

Title: Advancing Generalization in PINNs Through Latent-Space Representations

Abstract:
Physics-informed neural networks (PINNs) have made significant strides in modeling dynamical systems governed by partial differential equations (PDEs). However, their generalization capabilities across varying scenarios remain limited. To overcome this limitation, we propose physics-informed dynamics representation learner (PiDo), a novel physics-informed neural PDE solver designed to generalize effectively across diverse PDE configurations, including varying initial conditions, PDE coefficients, and training-time horizons. PiDo exploits the shared underlying structure of dynamical systems with different properties by projecting PDE solutions into a latent space using auto-decoding. It then learns the dynamics of these latent representations, conditioned on the PDE coefficients. Despite its promise, integrating latent dynamics models within a physics-informed framework poses challenges due to the optimization difficulties associated with physics-informed losses. To address these challenges, we introduce a novel approach that diagnoses and mitigates these issues within the latent space. This strategy employs straightforward yet effective regularization techniques, enhancing both the temporal extrapolation performance and the training stability of PiDo. We validate PiDo on a range of benchmarks, including 1-D combined equations and 2-D Navier–Stokes equations. In addition, we demonstrate the transferability of its learned representations to downstream applications such as long-term integration and inverse problems.

PaperID: 133,

Authors: Guo-Sen Xie, Junyi Li, Ting Guo, Xiangbo Shu, Fang Zhao, Zheng Zhang, Ling Shao

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; Data Science and Technology, North University of China, Taiyuan, China; School of Intelligence Science and Technology, Nanjing University, Suzhou, China; Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, China; UCAS-Terminus AI Laboratory, University of Chinese Academy of Sciences, Beijing, China

Title: Attribute Prompt Alignment Network for Zero-Shot Learning

Abstract:
In the vanilla zero-shot learning (ZSL) paradigm, category attributes are the key to generalizable knowledge transfer from seen to unseen classes. By contrast, the current contrastive language-image pretraining (CLIP) model relies on category names to achieve a more general ZSL-like prediction. When vanilla ZSL meets general CLIP, however, most existing methods on both sides struggle to benefit from each other. In this brief, we resort to attribute prompt tuning (APT) to improve the knowledge transferability from the pretrained CLIP model to the downstream ZSL framework in pursuit of desirable feature representations. Our approach, termed attribute prompt alignment network (APAN), leverages APT for cross-network feature alignment (CFA). In this way, we can investigate the effects of CLIP on the vanilla ZSL task in the era of large models through the two-branch APAN architecture. Specifically, APT takes as input the templates of class attribute descriptions to produce attribute prompts, which are further used to guide the localization of visual regions across two frozen feature extraction networks through a visual-semantic interaction attention. This enables APAN to progressively refine and align these cross-network features, resulting in generalizable feature representations that can capture fine-grained attribute information. For CFA, we simply introduce a prediction alignment loss that constrains the predictions from these two cross-network visual features. Experimental results on three benchmark datasets demonstrate that APAN outperforms the state-of-the-art methods by absorbing generalizable knowledge from CLIP models.

PaperID: 134,

Authors: Xun Wang, Jingmian Wang, Zhuzhong Qian, Bolei Zhang

Affiliations: State Key Laboratory for Novel Software Technology and School of Computer Science, Nanjing University, Nanjing, China; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China

Title: Online Adaptable Offline RL With Guidance Model

Abstract:
Reinforcement learning (RL) has emerged as a promising approach across various applications, yet its reliance on repeated trial-and-error learning to develop effective policies from scratch poses significant challenges for deployment in scenarios where interaction is costly or constrained. In this work, we investigate the offline-to-online RL paradigm, wherein policies are initially pretrained using offline historical datasets and subsequently fine-tuned with a limited amount of online interaction. Previous research has suggested that efficient offline pretraining is crucial for achieving optimal final performance. However, it is challenging to incorporate appropriate conservatism to prevent the overestimation of out-of-distribution (OOD) data while maintaining adaptability for online fine-tuning. To address these issues, we propose an effective offline RL algorithm that integrates a guidance model to introduce suitable conservatism and ensure seamless adaptability to online fine-tuning. Our rigorous theoretical analysis and extensive experimental evaluations demonstrate better performance of our novel algorithm, underscoring the critical role played by the guidance model in enhancing its efficacy.

PaperID: 135,

Authors: Pengfei Jiao, Yuanqi Liu, Yinghui Wang, Huijun Tang, Zhidong Zhao, Shirui Pan

Affiliations: School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China; National Key Laboratory of Information Systems Engineering, Beijing Institute of Control and Electronic Technology, Beijing, China; School of Information and Communication Technology, Griffith University, Southport, QLD, Australia

Title: Interactive Graph Learning for Multilevel Network Alignment

Abstract:
The task of network alignment aims to identify corresponding nodes across multiple networks, with applications in various fields such as social network analysis and bioinformatics. Traditional methods typically focus on the topological structure of networks at a specific level, but they may overlook important properties exhibited by many networks, such as scale-free properties and specific power-law structures often found in social networks. Consequently, these methods fail to effectively capture and utilize such information, leading to misalignment. In this article, we propose a network alignment framework that incorporates both topological and attribute information from multiple levels in the network, including homogeneity, power-law, and higher order structures. We introduce a Euclidean hyperbolic interactive graph learning method specifically designed for modeling power-law structures in networks, aiming to improve the accuracy of network alignment. To evaluate the effectiveness of our proposed method, we conduct experiments on several real-world datasets. The results demonstrate that our approach achieves higher accuracy compared to other advanced baselines.

PaperID: 136,

Authors: Jiabin Liu, Bo Wang, Yuping Zhang, Huadong Wang, Biao Li, Xin Shen, Gang Kou

Affiliations: School of Information and Electronics, Beijing Institute of Technology, Beijing, China; School of Information Technology and Management, University of International Business and Economics, Beijing, China; Beijing ModelBest Intelligent Technology Company Ltd., Beijing, China; School of Business Administration, Faculty of Business Administration, Southwestern University of Finance and Economics, Chengdu, China; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Xiangjiang Laboratory, School of Digital Media Engineering and Humanities, Hunan University of Technology and Business, Changsha, China

Title: Progressive Training for Learning From Label Proportions

Abstract:
Learning from label proportions (LLP), which aims to learn an instance-level classifier using proportion-based grouped training data, has garnered increasing attention in the field of machine learning. Existing deep learning-based LLP methods employ end-to-end pipelines to derive proportional loss functions via the Kullback–Leibler (KL) divergence between bag-level prior and posterior class distributions. However, the optimal solutions of these methods often struggle to conform to the given proportions, inevitably leading to degradation in the final classification performance. In this article, we address this issue by proposing a novel progressive training method for LLP, termed PT-LLP, which strives to meet the proportion constraints from the bag level to the instance level. Specifically, we first train a model by using the existing KL-divergence-based LLP methods that are consistent with bag-level proportion information. Then, we impose additional constraints on strict proportion consistency to the classifier to further move toward a more ideal solution by reformulating it as a constrained optimization problem, which can be efficiently solved using optimal transport (OT) algorithms. In particular, knowledge distillation is employed as a transition stage to transfer the bag-level information to the instance level using a teacher–student framework. Finally, our framework is model-agnostic and demonstrates significant performance improvements through extensive experiments on different datasets when incorporated into other deep LLP methods as the first training stage.
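
The bag-level KL-divergence loss used by existing deep LLP methods can be sketched as follows. This illustrates only the first-stage proportion loss, not PT-LLP's optimal transport or distillation stages; the function name, the direction of the divergence (given proportions against the mean predicted posterior), and the `eps` smoothing constant are assumptions.

```python
import math

def proportion_loss(bag_probs, bag_proportions, eps=1e-12):
    """KL(p || q) for one bag.

    p: given class proportions of the bag (the bag-level prior).
    q: mean of the instance-level predicted class probabilities (posterior).
    """
    n = len(bag_probs)
    num_classes = len(bag_proportions)
    mean_pred = [sum(probs[c] for probs in bag_probs) / n for c in range(num_classes)]
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(bag_proportions, mean_pred)
        if p > 0
    )

confident = [[0.9, 0.1], [0.1, 0.9]]
uninformative = [[0.5, 0.5], [0.5, 0.5]]
print(proportion_loss(confident, [0.5, 0.5]), proportion_loss(uninformative, [0.5, 0.5]))
```

Both prediction sets in the usage example match the bag proportions equally well, which shows why a bag-level loss alone leaves instance-level predictions underdetermined.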

PaperID: 137,

Authors: Xiangmin Han, Rundong Xue, Jingxi Feng, Yifan Feng, Shaoyi Du, Jun Shi, Yue Gao

Affiliations: School of Software, BNRist, THUIBCS, BLBCI, Tsinghua University, Beijing, China; Institute of Artificial Intelligence and Robotics, College of Artificial Intelligence, Xi’an Jiaotong University, Xi’an, China; School of Communication and Information Engineering, Shanghai University, Shanghai, China

Title: Hypergraph Foundation Model for Brain Disease Diagnosis

Abstract:
The goal of the hypergraph foundation model (HGFM) is to learn an encoder based on the hypergraph computational paradigm through self-supervised pretraining on high-order correlation structures, enabling the encoder to rapidly adapt to various downstream tasks in scenarios where no labeled data or only a small amount of labeled data are available. The initial exploratory work has been applied to brain disease diagnosis tasks. However, existing methods primarily rely on graph-based approaches to learn low-order correlation patterns between brain regions in brain networks, neglecting the modeling and learning of complex correlations between different brain diseases and patients. This article proposes an HGFM for brain disease diagnosis, which conducts multidimensional pretraining tasks to explore latent cross-dimensional high-order correlation patterns on various brain disease datasets. HGFM is a high-order correlation-driven foundation model for brain disease diagnosis and effectively improves prediction performance. Specifically, HGFM first performs brain functional network link prediction tasks on individual brain networks and group interaction network link prediction tasks on group brain networks, constructing an HGFM for brain disease diagnosis. In downstream tasks, it makes predictions for different brain disease diagnosis tasks through few-shot fine-tuning. The proposed method is evaluated on functional magnetic resonance imaging (fMRI) data from 4409 patients across four brain diseases. Results show that it outperforms existing state-of-the-art methods in all brain disease diagnosis tasks, demonstrating its potential value in clinical applications.

PaperID: 138,

Authors: Mengzhou Gao, Xinxun Zhang, Pengfei Jiao, Tianpeng Li, Zhidong Zhao

Affiliations: School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China; College of Intelligence and Computing, Tianjin University, Tianjin, China

Title: DVGMAE: Self-Supervised Dynamic Variational Graph Masked Autoencoder

Abstract:
Although contrastive self-supervised learning (SSL) on dynamic graphs has achieved significant success, its heavy reliance on data augmentation and training tricks has been a persistent pain point. Generative SSL, especially masked autoencoders (MAEs), has recently produced promising results and can avoid these issues. However, MAEs on dynamic graphs remain largely unexplored due to the following challenges: 1) how to design an effective masking strategy for dynamic graphs and 2) how to design a decoder that retains temporal dependency when graphs are perturbed. In this article, we propose DVGMAE, a novel dynamic variational graph masked autoencoder model to solve these challenges. DVGMAE simultaneously captures the evolving behaviors and topological features via an innovative masking strategy and an elaborate decoder. Specifically, we first implement a temporal-aware masking strategy on the edges of each snapshot based on the updated probabilities derived from historical mask information. This strategy mitigates potential masking bias in dynamic graphs. We then design a globally enhanced decoder to recover the temporal and spatial information of each snapshot. Extensive experiments demonstrate that DVGMAE outperforms the existing state-of-the-art on various tasks across different datasets.

PaperID: 139,

Authors: Feng Xiao, Youqing Wang, S. Joe Qin, Jicong Fan

Affiliations: Chinese University of Hong Kong, Shenzhen, China; Beijing University of Chemical Technology, Beijing, China; Lingnan University, Hong Kong, SAR, China

Title: Semi-Supervised Anomaly Detection Using Restricted Distribution Transformation

Abstract:
Anomaly detection (AD) is typically regarded as an unsupervised learning task, where the training data either do not contain any anomalous samples or contain only a few unlabeled anomalous samples. In fact, in many real scenarios such as fault diagnosis and disease detection, a small number of anomalous samples labeled by domain experts are often available during the training phase, which makes semi-supervised AD (SAD) more appealing, though the related study is quite limited. Existing semi-supervised AD methods directly add optimization terms of anomalous samples to the optimization objective of unsupervised AD (UAD), where the effects of the limited labeled anomalous data on the optimization process become trivial and they cannot fully contribute to the detection task. To address this shortcoming, in this work, we propose a novel semi-supervised AD method that fully uses the limited labeled anomalous data to further boost detection performance. The proposed method learns a nonlinear transformation that projects normal data into a compact target distribution and simultaneously projects exposed anomalous samples into another target distribution, where the two target distributions do not overlap. This goal is difficult to achieve because of the scarcity of anomalous samples. To address this problem, we propose to generate a large number of intermediate samples interpolating between normal and anomalous data and project them into a third target distribution lying between the aforementioned two target distributions. Empirical results on multiple benchmarks with varying domains demonstrate the superiority of our method over existing supervised and semi-supervised AD methods.
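
The generation of intermediate samples between normal and anomalous data can be sketched as below. The abstract does not state the interpolation scheme, so linear (convex) interpolation with a fixed set of mixing weights is an assumption, and `interpolate_samples` is a hypothetical helper name.

```python
def interpolate_samples(normal, anomalous, weights=(0.25, 0.5, 0.75)):
    """Convex combinations between every normal/anomalous pair.

    Assumption: intermediate samples are (1 - w) * x_normal + w * x_anomalous.
    A few labeled anomalies can thus yield many intermediate samples.
    """
    return [
        [(1 - w) * x + w * a for x, a in zip(n_vec, a_vec)]
        for n_vec in normal
        for a_vec in anomalous
        for w in weights
    ]

# One normal point and one anomaly produce three intermediate points.
print(interpolate_samples([[0.0, 0.0]], [[4.0, 8.0]]))
```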

PaperID: 140,

Authors: Shiben Liu, Huijie Fan, Qiang Wang, Xi'ai Chen, Zhi Han, Yandong Tang

Affiliations: State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China

Title: Diverse Representations Embedding for Lifelong Person Re-Identification

Abstract:
Lifelong person re-identification (LReID) aims to continuously learn from sequential data streams, enabling cross-camera matching of individuals over time. A critical challenge in LReID lies in balancing the preservation of previously acquired knowledge with the incremental acquisition of new information, due to task-level gaps and limited representation capacity. Conventional methods relying on CNN backbones struggle to fully capture the diverse perspectives of each instance, leading to suboptimal model performance. To tackle these limitations, we propose a diverse representation embedding (DRE) framework that balances preserving old knowledge with adapting to new information. Specifically, our DRE incorporates a robust Transformer-based backbone that utilizes maximum embedding (ME) and multiple class tokens to generate overlapping representations for each instance. To further enhance the model’s representation capacity, we design an adaptive constraint module (ACM), which performs integration and discrimination operations on overlapping representations to yield diverse yet discriminative representations. Furthermore, we propose two strategies: knowledge update (KU) and knowledge preservation (KP), implemented within the adjustment and learner models, respectively. The KU strategy enhances the learner model’s ability to adapt to new information by leveraging prior knowledge from the adjustment model. The KP strategy ensures the retention of historical knowledge while maintaining the model’s adaptability. Extensive experiments validate that our DRE surpasses state-of-the-art approaches across large-scale, occluded, and holistic datasets, demonstrating significant performance gains. Our code is available at https://github.com/LiuShiBen/DRE

PaperID: 141,

Authors: Meng Xu, Xinhong Chen, Guanyi Zhao, Zihao Wen, Weiwei Fu, Jianping Wang

Affiliations: Department of Computer Science, City University of Hong Kong, Hong Kong, SAR, China

Title: Neighboring State-Aware Policy for Deep Reinforcement Learning

Abstract:
Deep reinforcement learning (DRL) methods, which train a policy to obtain the sequence of actions required to complete a task, have achieved remarkable success across diverse applications. It is a long-standing open issue in the DRL community to make the trained policy gradually approach the theoretically globally optimal policy, and existing research has also explored several challenges, such as exploration-exploitation, to improve the quality of the obtained policy. However, most DRL methods rely solely on the current state for decision-making, leading to short-sightedness and suboptimal learning. To overcome this, we propose a neighboring state-aware policy that enhances existing DRL methods by incorporating a neighboring state sequence in the decision-making process. Specifically, our approach saves multiple past and future states and concatenates them as the neighboring state sequence, along with the current state, and inputs them to the actor to generate an action during the training process. This global perspective, provided by neighboring states, is similar to human decision-making and helps the agent better understand state evolution, leading to improved policy learning. We present two specific implementations of our approach and demonstrate through extensive experiments that it effectively enhances ten representative DRL methods across nine tasks, based on three metrics, including return.
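
The construction of the actor input from a neighboring state sequence can be sketched as follows. The window sizes, the clamping of indices at trajectory boundaries, and the assumption that future states are read from a stored trajectory during training are all illustrative choices not specified in the abstract.

```python
def neighboring_state_input(trajectory, t, n_past=2, n_future=2):
    """Concatenate states t - n_past .. t + n_future into one flat actor input.

    trajectory: list of state vectors from a stored episode (assumed available
    during training, e.g., from a replay buffer). Indices that fall outside the
    trajectory are clamped to the first or last state.
    """
    last = len(trajectory) - 1
    window = [
        trajectory[min(max(i, 0), last)]
        for i in range(t - n_past, t + n_future + 1)
    ]
    return [value for state in window for value in state]

episode = [[0], [1], [2], [3]]
print(neighboring_state_input(episode, 1, n_past=1, n_future=1))
```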

PaperID: 142,

Authors: Tengfei Yan, Jiankai Tu, Chunguang Li, Fan Zhang

Affiliations: College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China

Title: Topology-Preserved Information Bottleneck for Multiview Anomaly Detection

Abstract:
Anomaly detection (AD) techniques are widely used in various fields. Existing techniques primarily focus on learning a normal region from single-view data, which may not be suitable for multiview data that provide more comprehensive information from multiple perspectives. Therefore, AD techniques designed for multiview data are necessary. A straightforward approach is to concatenate the features learned from multiple single-view data into a joint representation to conduct AD. However, this may overlook the inevitable overlaps between views, potentially masking view-specific information due to the repetitive calculations of these overlaps. Among the various possible methods, one way to address this is to compress redundant information while maintaining comprehensive information across views. Following this approach, in this article, we leverage the principle of information bottleneck (IB) to extract concise and comprehensive representations for multiview data. However, it is problematic to directly use these representations for AD, since the multiview fusion process may disturb the intrinsic structure of the original data. That is, samples distributed at the edges/center of the original normal data distribution are mapped closer to the center/edges. This might cause abnormal samples (close to the normal data at the edges) to be incorrectly mapped into the normal region during inference. In the AD scenario, the absence of abnormal training samples makes it infeasible to preserve this structure using supervised information. In this article, we design a topology-preserved regularization that constrains the latent representations in an unsupervised manner to preserve the original data’s intrinsic structure and thereby improve the AD performance. Overall, we propose a topology-preserved multiview information bottleneck (TMVIB) feature extraction method to extract concise, comprehensive, and topology-preserved latent representations from multiview data. Interestingly, we find that the TMVIB feature extraction method itself can be viewed as a regularized anomaly detector, allowing it to output anomaly scores directly. Experiments on synthetic and real-world multiview datasets demonstrate the effectiveness of the proposed TMVIB.

PaperID: 143,

Authors: Longkun Guo, Chaoqi Jia, Kewen Liao, Zhigang Lu, Minhui Xue

Affiliations: School of Mathematics and Statistics, Fuzhou University, Fuzhou, China; School of Accounting, Information Systems and Supply Chain, RMIT University, Melbourne, VIC, Australia; School of Information Technology, Deakin University, Burwood, VIC, Australia; School of Data, Computer and Mathematical Science, Western Sydney University, Kingswood, NSW, Australia; CSIRO’s Data, Sydney, NSW, Australia

Title: Near-Optimal Algorithms for Instance-Level Constrained k-Center Clustering

Abstract:
Many practical applications impose a new challenge of utilizing instance-level background knowledge (e.g., subsets of similar or dissimilar data points) within their input data to improve clustering results. In this work, we build on the widely adopted k-center clustering, modeling its input instance-level background knowledge as must-link (ML) and cannot-link (CL) constraint sets, and formulate the constrained k-center problem. Given the long-standing challenge of developing efficient algorithms for constrained clustering problems, we first derive an efficient approximation algorithm for constrained k-center at the best possible approximation ratio of 2 with linear programming (LP)-rounding technology. Recognizing the limitations of LP-rounding algorithms including high runtime complexity and challenges in parallelization, we subsequently develop a greedy algorithm that does not rely on the LP and can be efficiently parallelized. This algorithm also achieves the same approximation ratio 2 but with lower runtime complexity. Lastly, we empirically evaluate our approximation algorithm against baselines on various real datasets, validating our theoretical findings and demonstrating significant advantages of our algorithm in terms of clustering cost, quality, and runtime complexity.
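
For orientation, the ratio-2 guarantee mentioned above is also what the classic greedy (farthest-first) heuristic achieves for the unconstrained k-center problem. The sketch below shows only that textbook baseline, with no must-link or cannot-link handling; it is not the constrained algorithm of this paper. The function name and the sample points are hypothetical.

```python
import math


def greedy_k_center(points, k):
    """Farthest-first traversal for unconstrained k-center.

    Returns (center_indices, radius), where radius is the largest distance
    from any point to its nearest chosen center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [0]  # the first center is arbitrary; index 0 is used here
    # dist[i] is the distance from point i to its nearest chosen center
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < k:
        # pick the point that is currently farthest from all centers
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return centers, max(dist)


# Hypothetical toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 8.0)]
centers, radius = greedy_k_center(points, 3)
```

Each iteration costs one pass over the data, so the sketch runs in O(nk) distance evaluations.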

PaperID: 144,

Authors: Like Xin, Wanqi Yang, Lei Wang, Ming Yang

Affiliations: School of Mathematical Sciences, Nanjing Normal University, Nanjing, China; School of Computer and Electronic Information, Nanjing Normal University, Nanjing, China; School of Computing and Information Technology, University of Wollongong, Wollongong, NSW, Australia

Title: Multilevel Reliable Guidance for Unpaired Multiview Clustering

Abstract:
In this article, we address the challenging problem of unpaired multiview clustering (UMC), which aims to achieve effective joint clustering using unpaired samples observed across multiple views. Traditional incomplete multiview clustering (IMC) methods typically rely on paired samples to capture complementary information between views. However, such strategies become impractical in UMC due to the absence of paired samples. Although some researchers have attempted to address this issue by preserving consistent cluster structures across views, effectively mining such consistency remains challenging when the cluster structures have low confidence. Therefore, we propose a novel method, multilevel reliable guidance for UMC (MRG-UMC), which integrates multilevel clustering and reliable view guidance to learn consistent and confident cluster structures from three perspectives. Specifically, inner-view multilevel clustering exploits high-confidence sample pairs across different levels to reduce the impact of boundary samples, resulting in more confident cluster structures. Synthesized-view alignment leverages a synthesized view to mitigate cross-view discrepancies and promote consistency. Cross-view guidance employs a reliable view guidance strategy to enhance the clustering confidence of poorly clustered views. These three modules are jointly optimized across multiple levels to achieve consistent and confident cluster structures. Furthermore, theoretical analyses verify the effectiveness of MRG-UMC in enhancing clustering confidence. Extensive experimental results show that MRG-UMC outperforms state-of-the-art UMC methods, achieving an average NMI improvement of 12.95% on multiview datasets. The source code is available at https://anonymous.4open.science/r/MRG-UMC-5E20

PaperID: 145,

Authors: Hao Nan Sheng, Zhi-Yong Wang, Hing Cheung So

Affiliations: Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China

Title: Robust Rank-One Matrix Completion via Explicit Regularizer

Abstract:
In robust matrix completion (MC), the Welsch function, also referred to as the maximum correntropy criterion with Gaussian kernel, has been widely employed. However, it suffers from the drawback of down-weighting normal data. This work is the first to uncover the explicit regularizer (ER) for the Welsch function based on the multiplicative form of half-quadratic (HQ) minimization. Leveraging this discovery, we develop a new function called t-Welsch, also with ER, which provides unity weight to normal data and exhibits stronger robustness against large-magnitude outliers compared to Huber’s weight. We apply the t-Welsch to rank-one matching pursuit, enabling accurate and robust low-rank matrix recovery without the need for rank information or singular value decomposition (SVD). The resultant MC algorithm is realized via block coordinate descent (BCD), and its convergence and computational complexity are analyzed. Experiments are conducted using synthetic random data, as well as real-world images with salt-and-pepper noise and multiple-input multiple-output (MIMO) radar signals in the presence of Gaussian mixture disturbances. In all three scenarios, the proposed algorithm outperforms the state-of-the-art robust MC methods in terms of recovery accuracy. The code is available at https://github.com/ShuDun23/t-Welsch-and-RAR1MC.
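
To make the down-weighting remark concrete, the sketch below compares the reweighting factor implied by the Welsch loss with Huber's weight. It assumes one common scaling of the Welsch loss, (sigma^2 / 2)(1 - exp(-r^2 / sigma^2)), whose weight is exp(-r^2 / sigma^2). The abstract does not give the paper's exact parameterization, and t-Welsch itself is not reproduced here. Function names and default parameters are hypothetical.

```python
import math


def welsch_loss(r, sigma=1.0):
    """Welsch loss under one common scaling (an assumption, see lead-in)."""
    return 0.5 * sigma ** 2 * (1.0 - math.exp(-((r / sigma) ** 2)))


def welsch_weight(r, sigma=1.0):
    """Weight psi(r) / r, where psi is the derivative of welsch_loss."""
    return math.exp(-((r / sigma) ** 2))


def huber_weight(r, delta=1.0):
    """Huber's weight: 1 inside the threshold, delta / |r| outside."""
    a = abs(r)
    return 1.0 if a <= delta else delta / a


# Hypothetical residuals: a small "normal" one and a large outlier.
for r in (0.5, 10.0):
    print(r, welsch_weight(r), huber_weight(r))
```

For the small residual, the Welsch weight is already below 1 while Huber's weight is exactly 1. For the large residual, Huber's weight decays only as 1/|r| while the Welsch weight is essentially zero. These are the two behaviors the abstract contrasts.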

PaperID: 146,

Authors: Jialong Wang, Zheng Wang, Zhiguo Gong

Affiliations: Department of Computer and Information Science, State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China; School of Computer Science, Shanghai Jiao Tong University, Shanghai, China

Title: Multihop Reconstruction for Generalized Zero-Shot Node Classification

Abstract:
Graphs in the real world keep evolving with the integration of new nodes, and it is often infeasible to manually label all the new nodes promptly. In this case, graph learning algorithms can come in handy and perform classification on these newly emerging nodes. Typically, if unseen classes exist (i.e., no training samples from these classes), one can perform zero-shot learning (ZSL) or generalized ZSL (GZSL). During testing, ZSL aims to classify samples within unseen classes, whereas GZSL aims to classify samples within both seen and unseen classes, which is even more challenging. In our previous work, we proposed a decomposed graph prototype network (DGPN) to decompose the graph convolution operation for handling the zero-shot node classification (ZNC) problem. However, DGPN is not well-suited for the generalized ZNC (GZNC) problem. To this end, in this article, we propose a novel graph generative model, multihop reconstruction graph autoencoder (MHR-GAE). Unlike DGPN, MHR-GAE utilizes a multihop encoder with class semantic descriptions (CSDs) (as condition signals) to reconstruct the information and generate nodes of unseen classes. Thus, it can handle both the ZNC and GZNC problems and obtain competitive performance. We evaluate our model on real-world datasets, and the experimental results demonstrate that MHR-GAE outperforms other baseline methods.

PaperID: 147,

Authors: Xue Qiao, Shuang Gu, Jiayuan Cheng, Chen Peng, Zhiwei Xiong, Hong Shen, Gan Jiang

Affiliations: Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China; Aerospace Information Research Institute, Suzhou, China

Title: Fine-Grained Entity Recognition via Large Language Models

Abstract:
Fine-grained entity recognition (FGER) attracts increasing attention in information extraction and many other natural language understanding applications. However, it is a quite challenging problem for a specific domain due to the lack of specific-domain labeled data. To address this challenge, recent advancements in language modeling such as generative pretrained transformer (GPT) offer promising alternatives. Since large language models (LLMs) can be used for various tasks, such as text generation, summarization, and information extraction without labeled data, we incorporated them into the FGER field. Nonetheless, when too many verbose labels are fed to LLMs simultaneously, LLMs occasionally generate content that diverges from user input, contradicts previously generated context, or misaligns with established world knowledge, also called the “hallucination” phenomenon. In this article, we propose a new method called FGER-GPT to address these issues. Our approach leverages multiple inference chains and incorporates a hierarchical strategy for recognizing fine-grained entities, resulting in a significant performance boost. Importantly, neither coarse-grained nor fine-grained entity annotations are used in our proposed approach, which avoids the heavy labor consumption of labeling. Extensive experiments conducted on widely used datasets have demonstrated that the proposed FGER-GPT achieves competitive performance compared to state-of-the-art approaches in low-resource scenarios, highlighting its feasibility for real-world applications.

PaperID: 148,

Authors: Shenglun Chen, Xinzhu Ma, Hong Zhang, Haojie Li, Baoli Sun, Zhihui Wang

Affiliations: School of Software, Dalian University of Technology, Dalian, China; Shanghai Laboratory, Shanghai, China; College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China; DUT-RU International School of Information Science and Engineering, Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian, China

Title: Real-Time Depth Completion With Multimodal Feature Alignment

Abstract:
As a key problem in computer vision, depth completion aims to recover dense depth maps from sparse ones [generally derived from light detection and ranging (LiDAR)]. Most methods introduce synchronous RGB images and leverage multimodal fusion to integrate multimodal features from these modalities to describe the complete scene. However, their different natural characteristics lead to inconsistency in features, potentially impacting the effectiveness of multimodal feature fusion. To address this issue, we propose a feature alignment network (FANet) that introduces an alignment scheme to enhance the consistency between multimodal features. This scheme aligns the modality-invariant semantic context, which is invariant to changes in modality and represents the correlation between a pixel and its surroundings. Specifically, we first design an asymmetric context extraction (ACE) module to extract modality-invariant semantic contexts from multimodal features within limited GPU memory, and then pull them closer to improve consistency. Crucially, our alignment scheme is only applied during the training phase, and no additional computation cost is incurred in the inference phase. Moreover, we introduce a simple yet effective refinement module to refine estimated results via residual learning based on intermediate depth maps and sparse depth maps. Extensive experiments on KITTI and VOID datasets demonstrate that our method achieves competitive performance against typical real-time methods. In addition, we embed the proposed alignment scheme and refinement module into other methods to demonstrate their effectiveness.

PaperID: 149,

Authors: Jun Fu, Xianrui Ji, Dexiong Chen, Guosheng Hu, Shuang Li, Xiating Feng

Affiliations: State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China; School of Future Technologies, Northeastern University, Shenyang, China; Max Planck Institute of Biochemistry, Martinsried, Germany; Oosto, Belfast, U.K.; School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; State Key Laboratory of Intelligent Deep Metal Mining and Equipment, Northeastern University, Shenyang, China

Title: AdvMixUp: Adversarial MixUp Regularization for Deep Learning

Abstract:
Deep neural networks (DNNs) have shown significant progress in many application fields. However, overfitting remains a significant challenge in their development. While existing data-augmentation techniques such as MixUp have been successful in preventing overfitting, they often fail to generate hard mixed samples near the decision boundary, impeding model optimization. In this article, we present adversarial MixUp (AdvMixUp), a novel sample-dependent method for regularizing DNNs. AdvMixUp addresses this issue by incorporating adversarial training (AT) to create sample-dependent and feature-level interpolation masks, generating more challenging mixed samples. These virtual samples enable DNNs to learn more robust features, ultimately reducing overfitting. Empirical evaluations on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet demonstrate that AdvMixUp outperforms existing MixUp variants.
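
As background, vanilla MixUp forms a virtual sample as a convex combination of two training samples and their labels, with the mixing coefficient drawn from a Beta distribution. The sketch below shows only this baseline with a single scalar coefficient. The adversarially learned, feature-level masks of AdvMixUp are not reproduced, and the function name and `alpha` default are hypothetical.

```python
import random


def mixup(x_i, y_i, x_j, y_j, alpha=1.0, rng=random):
    """Vanilla MixUp on two samples given as lists of floats.

    y_i and y_j are one-hot (or soft) label vectors. Returns the mixed
    input, the mixed label, and the sampled coefficient lam.
    """
    lam = rng.betavariate(alpha, alpha)  # lam lies in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y, lam


# Hypothetical two-feature, two-class example with a seeded generator.
rng = random.Random(0)
x_mix, y_mix, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
```

Because the same scalar `lam` is applied to every feature, the mixed samples lie on the line segment between the two inputs. The abstract argues that this scheme often fails to produce hard samples near the decision boundary.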

PaperID: 150,

Authors: Jakub Kwiatkowski, Krzysztof Krawiec

Affiliations: Institute of Computing Science, Poznań University of Technology, Poznań, Poland

Title: Staged Self-Supervised Learning for Raven Progressive Matrices

Abstract:
This study presents and investigates abstract compositional transformers (ACTs), a class of deep learning (DL) architectures based on the transformer blueprint, designed to handle abstract reasoning tasks that require completing spatial visual patterns. We combine ACTs with choice-making modules and apply them to Raven progressive matrices (RPMs), logical puzzles that require selecting the correct image from the available answers. We devise a number of ACT variants, train them in several modes and with additional augmentations, subject them to ablations, demonstrate their data scalability, and analyze their behavior and latent representations that emerged in the process. Using self-supervision allows us to successfully train ACTs on relatively small training sets, mitigate several biases identified in RPMs in past studies, and achieve SotA results on the two most popular RPM benchmarks.

PaperID: 151,

Authors: Yuheng Wang, Zhenping Lan, Yanguo Sun, Nan Wang, Jiansong Li, Xincheng Yang

Affiliations: Department of Electronic Information, Dalian Polytechnic University, Dalian, China; Information Center, The Second Hospital of Dalian Medical University, Dalian, China

Title: Few-Shot Learning Based on Multimodal Information Processing

Abstract:
Few-shot learning aims to develop models with strong generalization capabilities using a small number of training samples. However, most learning methods rely solely on the visual features of a few samples to represent entire categories, leading to poor category representativeness. In contrast, humans can utilize multimodal information to learn category features, thereby making them more representative. Hence, this article emulates the human multimodal learning mechanism by integrating visual features with textual information, thereby facilitating the model’s acquisition of more representative and robust category features. Specifically, this article introduces a novel multimodal fusion mechanism—the visual-semantic fusion selection mechanism (VSFSM)—which comprises a fusion selection module (FS-Module) and a category enhancement module (CE-Module). These two modules collaboratively enhance the model’s classification performance. The FS-Module aligns and fuses semantic information with visual features across both channel and spatial dimensions, performing feature selection and reconstruction. This process not only generates representative category features but also mitigates the impact of noise. The CE-Module guides the model to emphasize category-specific features in the query images, ultimately yielding representative visual-semantic category features while reducing the interference of noise in the query images. Additionally, to better facilitate few-shot learning, this article introduces a novel objective loss function for optimized training. Extensive comparative and ablation experiments conducted on multiple datasets further validate the effectiveness of the proposed method.

PaperID: 152,

Authors: Weicai Li, Tiejun Lv, Wei Ni, Jingbo Zhao, Ekram Hossain, H. Vincent Poor

Affiliations: School of Information and Communication Engineering, Beijing University of Posts and Telecommunications (BUPT), Beijing, China; School of Electrical and Data Engineering, University of Technology Sydney, Ultimo, NSW, Australia; Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, Canada; Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, USA

Title: Route-and-Aggregate Decentralized Federated Learning Under Communication Errors

Abstract:
Decentralized federated learning (D-FL) allows clients to aggregate learning models locally, offering flexibility and scalability. Existing D-FL methods use gossip protocols, which are inefficient when not all nodes in the network are D-FL clients. This article puts forth a new D-FL strategy, termed route-and-aggregate (R&A) D-FL, where participating clients exchange models with their peers through established routes (as opposed to flooding) and adaptively normalize their aggregation coefficients to compensate for communication errors. The impact of routing and imperfect links on the convergence of R&A D-FL is analyzed, revealing that convergence is minimized when routes with the minimum end-to-end (E2E) packet error rates (PERs) are employed to deliver models. Our analysis is experimentally validated through three image classification tasks and two next-word prediction tasks, utilizing widely recognized datasets and models. R&A D-FL outperforms the flooding-based D-FL method in terms of training accuracy by 35% in our tested ten-client network, and shows strong synergy between D-FL and networking. In another test with ten D-FL clients, the training accuracy of R&A D-FL with communication errors approaches that of the ideal centralized federated learning (C-FL) without communication errors, as the number of routing nodes (i.e., nodes that do not participate in the training of D-FL) rises to 28.
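
The route selection criterion above can be illustrated with a shortest-path search. Assuming packet errors on different links are independent (an assumption made here, not stated in the abstract), the E2E PER of a route is 1 minus the product of the per-link success probabilities. Minimizing it is then equivalent to a shortest-path problem with link weight -log(1 - PER). The sketch below uses Dijkstra's algorithm; the function name, graph encoding, and example PERs are hypothetical.

```python
import heapq
import math


def min_per_route(links, src, dst):
    """Find the route with minimum end-to-end packet error rate.

    links maps an undirected link (u, v) to its PER in [0, 1).
    Returns (route_as_node_list, e2e_per).
    """
    graph = {}
    for (u, v), per in links.items():
        w = -math.log(1.0 - per)  # additive cost; 0 for an error-free link
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == dst:
            break
        if cost > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            new_cost = cost + w
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if dst not in best:
        raise ValueError("no route between src and dst")
    route = [dst]
    while route[-1] != src:
        route.append(prev[route[-1]])
    route.reverse()
    return route, 1.0 - math.exp(-best[dst])


# Hypothetical network: the two-hop route beats the lossy direct link.
links = {("A", "B"): 0.3, ("A", "C"): 0.1, ("C", "B"): 0.1}
route, e2e_per = min_per_route(links, "A", "B")
```

In this toy network the direct link has PER 0.3, while the route through C has E2E PER 1 - 0.9 * 0.9 = 0.19, so the relayed route is preferred.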

PaperID: 153,

Authors: Yongri Piao, Zhi Wang, Tingwei Liu, Jihao Yin, Miao Zhang, Huchuan Lu

Affiliations: School of Information and Communication Engineering, Dalian University of Technology, Dalian, China; International School of Information and Software Engineering, Dalian University of Technology, Dalian, China

Title: Learning Discriminative Representation for Co-Salient Object Detection

Abstract:
Co-salient object detection (CoSOD) is the task of identifying and emphasizing the common salient objects in a collection of images. Current CoSOD frameworks often extract features and model interimage relations separately. Although these methods achieve promising performance in many scenes, separating feature extraction from relation modeling falls short of obtaining discriminative features for co-salient objects, resulting in suboptimal performance, especially in some complex and cluttered real-world scenes. In this article, we introduce a novel CoSOD framework to unify feature extraction and interimage relation modeling. We design an early token interaction module (ETIM) that bridges information flow between branches to simultaneously realize feature extraction and interimage information interaction. To further enhance our network’s capability to distinguish co-salient objects from other irrelevant foreground objects, we introduce a pixel-to-group contrastive (PGC) learning method. This approach aids in eliminating the need for additional interaction modules while preserving features’ discriminative power for co-salient objects. Our proposed CoSOD framework includes only a backbone embedded with ETIM, a decoder without interaction modules, and a projection head used only during the training phase. Extensive experiments on three challenging benchmarks, that is, CoCA, CoSOD3k, and Cosal2015, demonstrate that our proposed method can outperform current leading-edge models and achieve a new state of the art. The source code is available at https://github.com/zhiwang98/LDRNet

PaperID: 154,

Authors: Zhiqiang Pang, Hong Wang, Qi Xie, Deyu Meng, Zongben Xu

Affiliations: School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an, Shaanxi, China; School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China

Title: TRG-Net: An Interpretable and Controllable Rain Generator

Abstract:
Exploring and modeling the rain generation mechanism is critical for augmenting paired data to ease the training of rainy image processing models. Most of the conventional methods handle this task in an artificial physical rendering manner, through elaborately designing fundamental elements constituting rains. These kinds of methods, however, are over-dependent on human subjectivity, which limits their adaptability to real rains. In contrast, recent deep learning (DL) methods have achieved great success by training a neural network-based generator from pre-collected rainy image data. However, current methods usually design the generator in a “closed box” manner, increasing the learning difficulty and data requirements. To address these issues, this study proposes a novel DL-based rain generator, which fully takes the physical generation mechanism underlying rains into consideration and well encodes the learning of the fundamental rain factors (i.e., shape, orientation, length, width, and sparsity) explicitly into the deep network. Its significance lies in that the generator not only elaborately designs essential elements of the rain to simulate expected rains, like conventional artificial strategies, but also finely adapts to complicated and diverse practical rainy images, like DL methods. By rationally adopting the filter parameterization technique, the proposed rain generator is finely controllable with respect to rain factors and able to learn the distribution of these factors purely from data without the need for rain factor labels. Our unpaired generation experiments demonstrate that the rain generated by the proposed rain generator is not only of higher quality but also more effective for deraining and downstream tasks compared to current state-of-the-art rain generation methods. Besides, the paired data augmentation experiments, including both in-distribution and out-of-distribution (OOD), further validate the diversity of samples generated by our model for in-distribution deraining and OOD generalization tasks.

PaperID: 155,

Authors: Guozhi Liu, Weiwei Lin, Tiansheng Huang, Fang Shi, Wentai Wu, Li Shen

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China; College of Mathematics and Informatics, South China Agricultural University, Guangzhou, Guangdong, China; Department of Computer Science, College of Information Science and Technology, Jinan University, Guangzhou, China; School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China

Title: AdaptiveFL: Communication-Adaptive Federated Learning Under Dynamic Bandwidth

Abstract:
Federated learning (FL) is a distributed machine learning paradigm that enables heterogeneous devices to train a model collaboratively. Recognizing communication as a bottleneck in FL, existing communication-efficient solutions, such as HeteroFL and LotteryFL, utilize gradient sparsification to reduce communication costs. However, existing solutions fail to address the dynamic bandwidth issue in which the bandwidth of each client is constantly changing throughout the training process. In this article, we propose AdaptiveFL, a communication-adaptive FL framework, considering the dynamic constraints of bandwidth. The design of AdaptiveFL follows two key steps: 1) in each round, each device selects a best-fit sub-model for communication per currently available bandwidth; and 2) to guarantee the performance of each sub-model sent under dynamic bandwidth constraints, AdaptiveFL employs a local training method that enables each device to train a “tailorable” local model, which can be tailored to any sparsity with competitive accuracy. We compare AdaptiveFL with several communication-efficient SOTA methods and demonstrate that AdaptiveFL outperforms other baselines by a large margin.

PaperID: 156,

Authors: Chen Li, Zeyi Liu, Xiao He, Pengyu Han

Affiliations: Department of Automation, Tsinghua University, Beijing, China; Shanghai BYD Company Ltd., Shanghai, China

Title: Factorization-Based Broad Learning System With Time-Dependent Structure

Abstract:
In response to the increasing complexity of tasks in artificial intelligence, broad learning systems (BLSs) have emerged as essential tools, especially given the limitations of deep neural networks, such as their extensive training and computational demands. This study addresses the computational inefficiencies and numerical instabilities inherent in traditional BLS when handling complex tasks in dynamic environments. To mitigate these challenges, we propose an enhanced version of BLS incorporating QR factorization (QRF), referred to as QRBLS, which is known for improving numerical stability. This framework replaces the traditional method of computing output weights, which typically relies on the Moore-Penrose pseudoinverse. The primary contribution of this article is the integration of QRF into the BLS architecture, thereby improving stability when processing large-scale datasets. QRBLS also features a dynamic updating mechanism that adjusts model parameters efficiently with new data, enabling continuous learning without the need for full-model re-evaluation. In addition, a time-dependent structure (TDS) enhances the model’s responsiveness to temporal data changes, increasing its utility in dynamic environments. Validation through numerical experiments demonstrated that QRBLS outperformed traditional BLS, exhibiting superior stability and adaptability in handling data anomalies and rapid updates. The integration of QRF and TDS significantly improves the adaptability and computational efficiency of BLS, providing a robust solution for large scale and dynamic AI applications. QRBLS effectively addresses challenges related to numerical instability and continuous learning, offering practical improvements in real-world settings.
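
The general idea of replacing a pseudoinverse with a QR factorization can be sketched as follows. For a full-column-rank matrix A = QR, the least-squares output weights solve the triangular system R w = Q^T y, so no explicit pseudoinverse is formed. This is a generic illustration using modified Gram-Schmidt in pure Python, not the QRBLS formulation of the article. The function name and the toy data are hypothetical.

```python
import math


def qr_least_squares(A, y):
    """Solve min_w ||A w - y||_2 via QR (modified Gram-Schmidt).

    A is a list of m rows with n entries each (m >= n, full column rank).
    y is a list of m targets. Returns the n weights.
    """
    m, n = len(A), len(A[0])
    # Q is stored as a list of n column vectors, initialized to A's columns
    Q = [[A[i][j] for i in range(m)] for j in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j):
            # project out the component along the already-orthonormal q_i
            R[i][j] = sum(q * v for q, v in zip(Q[i], Q[j]))
            Q[j] = [v - R[i][j] * q for q, v in zip(Q[i], Q[j])]
        R[j][j] = math.sqrt(sum(v * v for v in Q[j]))
        if R[j][j] == 0.0:
            raise ValueError("A does not have full column rank")
        Q[j] = [v / R[j][j] for v in Q[j]]
    # back substitution on the upper-triangular system R w = Q^T y
    qty = [sum(q * t for q, t in zip(Q[j], y)) for j in range(n)]
    w = [0.0] * n
    for j in reversed(range(n)):
        s = qty[j] - sum(R[j][k] * w[k] for k in range(j + 1, n))
        w[j] = s / R[j][j]
    return w


# Hypothetical toy problem: fit y = 1 + 2 * t at t = 0, 1, 2.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
w = qr_least_squares(A, y)
```

A production implementation would more likely use Householder reflections from a numerical library. Gram-Schmidt is used here only to keep the sketch dependency-free.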

PaperID: 157,

Authors: Xiang Hao, Chenxiang Ma, Qu Yang, Jibin Wu, Kay Chen Tan

Affiliations: Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong, SAR, China; Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore; Department of Data Science and Artificial Intelligence, the Department of Computing, and the Research Center of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong, SAR, China; Department of Data Science and Artificial Intelligence and the Research Center of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong, SAR, China

Title: Toward Ultralow-Power Neuromorphic Speech Enhancement With Spiking-FullSubNet

Abstract:
Speech enhancement (SE) is critical for improving speech intelligibility and quality in various audio devices. In recent years, deep learning-based methods have significantly improved SE performance, but they often come with a high computational cost, which is prohibitive for a large number of edge devices, such as headsets and hearing aids. This work proposes an ultralow-power SE system based on the brain-inspired spiking neural network (SNN) called Spiking-FullSubNet. Spiking-FullSubNet follows a full-band and subband fusion approach to effectively capture both global and local spectral information. To enhance the efficiency of computationally expensive subband modeling, we introduce a frequency partitioning method inspired by the sensitivity profile of the human peripheral auditory system. Furthermore, we introduce a novel spiking neuron model that can dynamically control the input information integration and forgetting, enhancing the multiscale temporal processing capability of SNN, which is critical for speech denoising. Experiments conducted on the recent Intel Neuromorphic Deep Noise Suppression (N-DNS) Challenge dataset show that the Spiking-FullSubNet surpasses state-of-the-art (SOTA) methods by large margins in terms of both speech quality and energy efficiency metrics. Notably, our system won the championship of the Intel N-DNS Challenge (algorithmic track), opening up a myriad of opportunities for ultralow-power SE at the edge. Our source code and model checkpoints are publicly available at github.com/haoxiangsnr/spiking-fullsubnet

PaperID: 158,

Authors: Xinyu Chen, Jian Wang, Jie Yang, Chao Zhang, Dacheng Tao

Affiliations: School of Mathematical Sciences, Dalian University of Technology, Dalian, China; College of Science, China University of Petroleum (East China), Qingdao, China; College of Computing and Data Science, Nanyang Technological University, Jurong West, Singapore

Title: (δ, ε)-K Segmentation for Characterizing Well-Clusterable Sets

Abstract:
Kleinberg (2002) introduced three axioms to formalize the behavior of clustering algorithms and showed that no clustering algorithm satisfies all of them. However, this result is usually inconsistent with practical experience with clustering algorithms. In this article, we reformulate these axioms to fill the gap and verify the existence of clustering algorithms satisfying the modified axioms when the point set is well-clusterable. In particular, the concept of (δ, ε)-K segmentation is proposed to characterize the set that has the potential to be clustered well. Then, we verify the existence and the uniqueness of the (δ, ε)-K segmentation of a set. Next, we demonstrate that the (δ, ε)-K segmentation is compatible with K-means, Min-Cut, DBSCAN, and several clustering internal evaluation (CIE) indexes. In addition, the ratio δ/ε can be treated as a measure not only for characterizing well-clusterable sets but also for evaluating the performance of clustering results.

PaperID: 159,

Authors: Jinjing Shi, Ren-Xin Zhao, Wenxuan Wang, Shichao Zhang, Xuelong Li

Affiliations: School of Electronic Information, Central South University, Changsha, China; School of Computer Science and Engineering, Central South University, Changsha, China; Guangxi Key Laboratory of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, China; Institute of Artificial Intelligence (TeleAI), China Telecom Corporation Ltd., Beijing, P. R. China

Title: QSAN: A Near-Term Achievable Quantum Self-Attention Network

Abstract:
The self-attention mechanism (SAM) is good at capturing the intrinsic connection between features to dramatically boost the performance of machine learning models. Nevertheless, many current quantum machine learning (QML) models are not equipped with SAM, which confines their expansion to massive high-dimensional quantum data. To address the above problems, a quantum SAM (QSAM) consisting of a quantum logic similarity (QLS)-based quantum bit self-attention score matrix (QBSASM) is introduced to augment the data representation of SAM exponentially. According to QSAM, the framework and quantum circuits of a one-step achievable quantum self-attention network (QSAN) are designed to consider measurement times compression fully. Moreover, a prototype of quantum coordinates is presented during the design process to describe the mathematical relationship between the output bits and the control bits to facilitate the programming. Ultimately, MNIST binary classification experiments on the PennyLane platform and comparisons with cutting-edge QML models demonstrate that QSAN converges about 1.7× and 2.3× faster than hardware-efficient ansatz and quantum approximate optimization algorithm (QAOA) ansatz, respectively, with similar parameter configurations and 100% prediction accuracy, which indicates that it has a better learning capability. In the CIFAR-10 classification experiments, QSAN achieves high prediction accuracy at a small scale relative to classical machine learning models. Predictably, QSAN elevates the efficiency of QML models and lays the foundation for future quantum computers to perform machine learning on massive amounts of data while promoting the advancement of quantum computer vision and other fields.

PaperID: 160,

Authors: Haonan Xin, Danyang Wu, Jitao Lu, Rong Wang, Feiping Nie, Xuelong Li

Affiliations: School of Artificial Intelligence, Optics and Electronics (iOPEN) and the Key Laboratory of Intelligent Interaction and Applications (Ministry of Industry and Information Technology), Northwestern Polytechnical University, Xi’an, Shaanxi, China; College of Information Engineering, Northwest A&F University, Xianyang, China; School of Computer Science, the School of Artificial Intelligence, Optics and Electronics (iOPEN), and the Key Laboratory of Intelligent Interaction and Applications (Ministry of Industry and Information Technology), Northwestern Polytechnical University, Xi’an, Shaanxi, China; Institute of Artificial Intelligence, China Telecom Corporation Ltd., Beijing, China

Title: Multiview Clustering via Block Diagonal Graph Filtering

Abstract:
Graph-based multiview clustering methods have gained significant attention in recent years. In particular, incorporating graph filtering into these methods allows for the exploration and utilization of both feature and topological information, resulting in a commendable improvement in clustering accuracy. However, these methods still exhibit several limitations: 1) the graph filters are predetermined, which disconnects them from the subsequent clustering task and 2) the separability of the filtered features is poor, which may make them unsuitable for clustering. To mitigate these issues, we propose Multiview Clustering via Block Diagonal Graph Filtering (MvC-BDGF), which can learn cluster-friendly graph filters. Specifically, we design a block diagonal graph filter with localized characteristics, which makes the filtered features highly discriminative. The MvC-BDGF model seamlessly integrates the learning of graph filters with the acquisition of consensus graphs, forming a unified framework. This integration allows the model to obtain optimal filters and simultaneously acquire the corresponding clustering labels. To solve the optimization problem in the MvC-BDGF model, an iterative solver based on the coordinate descent method is devised. Finally, extensive experiments on benchmark datasets demonstrate the effectiveness and superiority of the proposed model. The code is available at https://github.com/haonanxin/MvC-BDGF_code.
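
For readers unfamiliar with graph filtering, the following is a minimal sketch of a predetermined low-pass graph filter, the kind of fixed filter the abstract contrasts with learned ones. It is not the MvC-BDGF filter: the form H = (I + D^{-1/2} A D^{-1/2}) / 2, the function name `low_pass_filter`, and the toy graph are assumptions chosen for illustration.

```python
import math


def low_pass_filter(adjacency, features, order=1):
    """Apply H = (I + D^-1/2 A D^-1/2) / 2 to the features `order` times.

    Illustrative fixed filter only; the filter form is an assumption and
    is not taken from the MvC-BDGF paper.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    # Build the filter matrix H entry by entry.
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            norm = math.sqrt(degree[i] * degree[j])
            a_hat = adjacency[i][j] / norm if norm > 0 else 0.0
            H[i][j] = 0.5 * ((1.0 if i == j else 0.0) + a_hat)
    # Repeated filtering smooths features along the edges of the graph.
    out = [list(row) for row in features]
    n_cols = len(out[0])
    for _ in range(order):
        out = [
            [sum(H[i][k] * out[k][c] for k in range(n)) for c in range(n_cols)]
            for i in range(n)
        ]
    return out


# Toy graph with two disconnected pairs of nodes: (0, 1) and (2, 3).
A = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
X = [[1.0], [3.0], [10.0], [14.0]]
filtered = low_pass_filter(A, X)
print(filtered)  # [[2.0], [2.0], [12.0], [12.0]]
```

In this toy case the filter pulls connected nodes toward a common value while leaving the two components apart, which is the smoothing effect that makes filtered features easier to cluster.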

PaperID: 161,

Authors: Xijia Tang, Chao Xu, Hong Tao, Xiaoyu Ma, Chenping Hou

Affiliations: College of Science, National University of Defense Technology, Changsha, China

Title: Confidence-Based PU Learning With Instance-Dependent Label Noise

Abstract:
Positive and unlabeled (PU) learning, which trains binary classifiers using only PU data, has attracted considerable attention in recent years. Traditional PU learning often assumes that all the positive samples are labeled accurately. Nevertheless, for reasons such as sample ambiguity and insufficient algorithms, label noise is almost unavoidable in this scenario. Current PU algorithms neglect the label noise issue in the positive set, which in practical applications is often biased toward certain instances rather than being uniformly distributed. We define this important but understudied problem as PU learning with instance-dependent label noise (PUIDN). To eliminate the adverse impact of IDN, we leverage confidence scores for each instance in the positive set, which establish the connection between samples and labels without any assumption on the noise distribution. Then, we propose an unbiased estimator of the classification risk that considers both label and confidence information and can be computed directly from PUIDN data along with their confidence scores. Moreover, our classification framework integrates an alternating iterative optimization strategy based on the correlation between different confidence information, thereby alleviating the additional requirement for training data. Theoretically, we derive a generalization error bound for our proposed method. Experimentally, the effectiveness of our approach is demonstrated through various types of numerical results.

PaperID: 162,

Authors: Lanlan Feng, Yipeng Liu, Ziming Liu, Ce Zhu

Affiliations: School of Information and Communication Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China

Title: Online Nonconvex Robust Tensor Principal Component Analysis

Abstract:
Robust tensor principal component analysis (RTPCA) based on tensor singular value decomposition (t-SVD) separates the low-rank component and the sparse component from multiway data. For streaming data, online RTPCA (ORTPCA) processes tensor data sequentially, where the low-rank component is updated based on the latest estimation and the newly arrived sample. It enhances both computation and storage efficiency. However, in most of the existing ORTPCA methods, the relaxation from the tensor multirank to the convex tensor nuclear norm (TNN) may introduce modeling error, which leads to an unavoidable loss of tracking accuracy. In this article, a tensor Schatten-p norm (0 < p < 1) is applied to provide a tighter approximation of the tensor rank. A lemma is derived to divide the Schatten-p norm into terms that can be updated in an online way. Based on it, the corresponding online nonconvex RTPCA (ONRTPCA) method is proposed for efficient tensor subspace tracking. Moreover, we incorporate a dynamic forgetting window into ONRTPCA to adaptively track varying subspaces. In addition, this article provides convergence analysis and complexity analysis. Experimental results on synthetic data and real-world video data show that our proposed method achieves superior subspace tracking accuracy in comparison with a series of state-of-the-art methods while maintaining a high convergence speed and low memory requirement.
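
To illustrate why a Schatten-p penalty with 0 < p < 1 can track the rank more closely than the nuclear norm, here is a minimal sketch on a plain list of singular values. It is the matrix-style quantity, not the t-SVD-based tensor norm of the article; the singular values and the helper names `schatten_p_power` and `numerical_rank` are hypothetical.

```python
def schatten_p_power(singular_values, p):
    """Return sum_i sigma_i**p, the p-th power of the Schatten-p (quasi-)norm.

    p = 1 gives the nuclear norm; as p decreases toward 0, each nonzero
    term sigma_i**p approaches 1, so the sum approaches the rank.
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return sum(s ** p for s in singular_values if s > 0)


def numerical_rank(singular_values, tol=1e-12):
    """Count the singular values that exceed a small tolerance."""
    return sum(1 for s in singular_values if s > tol)


# Hypothetical singular values, for illustration only.
sigma = [4.0, 1.0, 0.25, 0.0]

rank = numerical_rank(sigma)            # 3
nuclear = schatten_p_power(sigma, 1.0)  # 4 + 1 + 0.25 = 5.25
half = schatten_p_power(sigma, 0.5)     # 2 + 1 + 0.5 = 3.5

print(rank, nuclear, half)
```

For these values the p = 0.5 surrogate (3.5) sits much closer to the rank (3) than the nuclear norm (5.25) does, at the price of a nonconvex objective.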

PaperID: 163,

Authors: Eunho Koo, Tongseok Lim

Affiliations: Department of Big Data Convergence, Chonnam National University, Gwangju, Republic of Korea; Mitch Daniels School of Business, Purdue University, West Lafayette, IN, USA

Title: Node Classification in Networks via Simplicial Interactions

Abstract:
In the node classification task, it is natural to presume that densely connected nodes tend to exhibit similar attributes. Given this, it is crucial to first define what constitutes a dense connection and to develop a reliable mathematical tool for assessing node cohesiveness. In this article, we propose a probability-based objective function for semi-supervised node classification that takes advantage of higher order networks’ capabilities. The proposed function reflects the philosophy aligned with the intuition behind classifying within higher order networks, as it is designed to reduce the likelihood of nodes interconnected through higher order networks bearing different labels. In addition, we propose the stochastic block tensor model (SBTM) as a graph generation model designed specifically to address a significant limitation of the traditional stochastic block model (SBM), which does not adequately represent the distribution of higher order structures in real networks. We evaluate the objective function using networks generated by the SBTM, which include both balanced and imbalanced scenarios. Furthermore, we present an approach that integrates the objective function with graph neural network (GNN)-based semi-supervised node classification methodologies, aiming for additional performance gains. Our results demonstrate that in challenging classification scenarios—characterized by a low probability of homo-connections, a high probability of hetero-connections, and limited prior node information—models based on the higher order network outperform pairwise interaction-based models. Furthermore, the experimental results suggest that integrating our proposed objective function with existing GNN-based node classification approaches enhances the classification performance by efficiently learning higher order structures distributed in the network.

PaperID: 164,

Authors: Lanjihong Ma, Yao-Xiang Ding, Peng Zhao, Zhi-Hua Zhou

Affiliations: National Key Laboratory for Novel Software Technology and the School of Artificial Intelligence, Nanjing University, Nanjing, China; State Key Laboratory of Computer Aided Design and Computer Graphics, College of Computer Science and Technology, Zhejiang University, Hangzhou, China

Title: Learning Objective Adaptation by Correlation-Based Model Reuse

Abstract:
In open-environment machine learning (open ML), the learning objectives can vary according to specific real-world requirements. Models tailored for initial objectives may not be appropriate for the varied objectives. Retraining models from scratch for every single objective can be computationally intensive. Therefore, it is desirable to reuse models trained on the original objectives to help learn under the varied objectives. To this end, it is essential to characterize the objective correlations to better reuse the models. Previous works only consider the relative importance between pairs of previous and varied objectives, also known as previous-varied objective correlations, ignoring correlations among the original objectives themselves. In this article, we demonstrate the importance of cross-original objective correlations. We propose a novel approach that employs the optimal transport technique to model correlations across all previous and varied objectives and then facilitates model reuse by utilizing learned transportation discrepancies to incorporate model reusabilities. Our empirical results show that our approach significantly outperforms existing benchmarks and captures the underlying objective structure well, validating the importance of accurate objective correlation modeling for learning with varied objectives.

PaperID: 165,

Authors: Xuyang Zhong, Chen Liu

Affiliations: Department of Computer Science, City University of Hong Kong, Hong Kong, SAR, China

Title: Toward Mitigating Architecture Overfitting on Distilled Datasets

Abstract:
Dataset distillation (DD) methods have demonstrated remarkable performance for neural networks trained with very limited training data. However, a significant challenge arises in the form of architecture overfitting: the distilled training dataset synthesized by a specific network architecture (i.e., training network) yields poor performance when used to train other network architectures (i.e., test networks), especially when the test networks have a larger capacity than the training network. This article introduces a series of approaches to mitigate this issue. Among them, DropPath turns the large model into an implicit ensemble of its subnetworks, and knowledge distillation (KD) ensures that each subnetwork acts similarly to the small but well-performing teacher network. These methods, characterized by their smoothing effects, significantly mitigate architecture overfitting. We conduct extensive experiments to demonstrate the effectiveness and generality of our methods. In particular, across various scenarios involving different tasks and different sizes of distilled data, our approaches significantly mitigate architecture overfitting. Furthermore, our approaches achieve comparable or even superior performance when the test network is larger than the training network. Codes are available at https://github.com/CityU-MLO/mitigate_architecture_overfitting.
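
The implicit-ensemble view of DropPath can be sketched as follows. This is a minimal single-sample illustration of the generic stochastic-depth idea, not the authors' implementation: the function names, the list-based features, and the toy branch are assumptions.

```python
import random


def drop_path(branch_output, drop_prob, training, rng=random):
    """Randomly zero a whole residual branch during training.

    Surviving branches are rescaled by 1 / (1 - drop_prob) so that the
    expected output matches the evaluation-time output.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must lie in [0, 1]")
    if not training or drop_prob == 0.0:
        return list(branch_output)
    if rng.random() < drop_prob:
        # The branch is skipped, so the block reduces to the identity path.
        return [0.0] * len(branch_output)
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob for v in branch_output]


def residual_block(x, branch, drop_prob, training, rng=random):
    """Compute x + drop_path(branch(x)) for one sample."""
    out = drop_path(branch(x), drop_prob, training, rng)
    return [xi + oi for xi, oi in zip(x, out)]


def double(v):
    """Hypothetical toy branch that doubles every feature."""
    return [2.0 * t for t in v]


x = [1.0, 2.0]
print(residual_block(x, double, drop_prob=0.5, training=False))  # [3.0, 6.0]
print(residual_block(x, double, drop_prob=0.5, training=True))   # random
```

Each training step samples a different subset of active branches, so the full network behaves like an ensemble of shallower subnetworks that share weights.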

PaperID: 166,

Authors: José D. Huerta-Morales, Chenglong You, Omar S. Magaña-Loaiza, Shi-Hai Dong, Roberto de J. León-Montiel, Mario Alan Quiroz-Juárez

Affiliations: Instituto de Ciencias Nucleares, Universidad Nacional Autónoma de México, Ciudad de México, Mexico; Department of Physics and Astronomy, Quantum Photonics Laboratory, Louisiana State University, Baton Rouge, LA, USA; Research Center for Quantum Physics, Huzhou University, Huzhou, China; Centro de Física Aplicada y Tecnología Avanzada, Universidad Nacional Autónoma de México, Querétaro, Mexico

Title: Smart Machine Vision for Universal Spatial-Mode Reconstruction

Abstract:
Structured light beams, in particular, those carrying orbital angular momentum (OAM), have gained a lot of attention due to their potential for enlarging the transmission capabilities of communication systems. However, the use of OAM-carrying light in communications faces two major problems, namely distortions introduced during propagation in disordered media, such as the atmosphere or optical fibers, and the large divergence that high-order OAM modes experience. While the use of nonorthogonal modes may offer a way to circumvent the divergence of high-order OAM fields, artificial intelligence (AI) algorithms have shown promise for solving the mode-distortion issue. Unfortunately, current AI-based algorithms make use of large-amount data-handling protocols that generally lead to large processing time and high power consumption. Here, we show that a low-power, low-cost image sensor can act as an artificial neural network that simultaneously detects and reconstructs distorted OAM-carrying beams. We demonstrate the capabilities of our device by reconstructing (with a 95% efficiency) individual Vortex, Laguerre-Gaussian (LG), and Bessel modes, as well as hybrid (nonorthogonal) coherent superpositions of such modes. Our work provides a potentially useful basis for the development of low-power-consumption, light-based communication devices.

PaperID: 167,

Authors: Yuxun Qu, Yongqiang Tang, Chenyang Zhang, Xiangrui Cai, Xiaojie Yuan, Wensheng Zhang

Affiliations: College of Computer Science, Nankai University, Tianjin, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; College of Computer Science, VCIP, TMCC, TBI Center, Nankai University, Tianjin, China

Title: Dual-Space Contrastive Learning for Open-World Semi-Supervised Classification

Abstract:
Despite recent progress in semi-supervised learning (SSL), its scalability remains limited in realistic scenarios where unseen classes may appear in the unlabeled data. To address this challenge, open-world SSL (OWSSL) has been proposed in recent years and has attracted much attention. One core difficulty in OWSSL is to enhance the representative ability for unlabeled samples, especially for those in novel classes. More recently, several works have introduced contrastive learning into OWSSL and achieved impressive performance. However, they mainly conduct contrastive learning in either the feature space or the prediction space alone, without thoroughly exploring the information available in both spaces. In this study, we propose a novel method to handle OWSSL tasks via dual-space contrastive learning (DSCL). DSCL contains two modules: intraspace contrastive learning and interspace contrastive learning. In the intraspace module, we bridge the two spaces with a learnable classifier and impose contrastive learning in the dual spaces, such that the category discriminative information can be effectively utilized to improve the representative ability. In the interspace module, to further enhance the utilization of complementary information from the dual spaces, we introduce neighborhood information from the feature space to enhance predictive learning and meanwhile utilize the cluster structure from the prediction space to improve the intraclass compactness of the features. Compared with state-of-the-art competitors, the proposed DSCL achieves superior performance on the popular benchmarks, i.e., CIFAR100, ImageNet100, CIFAR10, CUB-200, and Scar.

PaperID: 168,

Authors: Jun Wang, Chuang Sun, Asoke K. Nandi, Ruqiang Yan, Xuefeng Chen

Affiliations: Department of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China

Title: Brain-Inspired Meta-Learning for Few-Shot Bearing Fault Diagnosis

Abstract:
Deep learning has attracted much attention in bearing fault diagnosis because of its high precision and end-to-end modules. However, in real industrial scenarios, some complex mechanical structures and working environments hinder data collection and fault reproduction, which makes bearing fault diagnosis with few samples a practical but challenging issue. As a data-driven approach, the standard deep learning method cannot extract features from a few samples due to overfitting. Neuroscience research has shown that the learning mechanism of the biological brain is more adaptable to learning tasks with few samples. Motivated by this, we propose a brain-inspired meta-learning (BIML) strategy for diagnosing few-shot bearing faults. Specifically, we design a brain-like learning algorithm for spiking neural networks (SNNs) based on the biological nervous system’s learning mechanism and introduce a meta-learning strategy to apply it to the bearing fault diagnosis task with few samples. Experimental results show that BIML outperforms existing few-shot bearing fault diagnosis methods. Subsequently, we conduct a theoretical analysis of the effectiveness of the BIML strategy and verify our analysis through experiments.

PaperID: 169,

Authors: Gang Wang, Yufei Chen

Affiliations: School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China; School of Computer Science and Technology, Tongji University, Shanghai, China

Title: Two-View Correspondence Learning With Local Consensus Transformer

Abstract:
Correspondence learning is a crucial component in multiview geometry and computer vision. The presence of heavy outliers (mismatches) makes the matching problem highly challenging. In this article, we revisit the benefits of local consensus (LC) in traditional feature matching and introduce the concept of LC to design a trainable neural network capable of capturing the underlying correspondences. This network is named the LC transformer (LCT) and is specifically tailored for wide-baseline stereo applications. Our network architecture comprises three distinct operations. To establish the neighbor topology, we employ a dynamic graph-based embedding layer as the initial step. Subsequently, these local topologies serve as guidance for the multihead self-attention layer, enabling it to extract a more extensive contextual understanding through channel attention (CA). Following this, order-aware graph pooling is applied to extract the global context information from the embedded LC. The ablation study reveals that PointNet-like learning models can, indeed, benefit from the incorporation of LC. The proposed model achieves state-of-the-art performance in two challenging settings, namely, the YFCC100M outdoor and SUN3D indoor environments, even in the presence of more than 90% outliers.

PaperID: 170,

Authors: Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wengang Zhou, Houqiang Li

Affiliations: Department of Electrical Engineering and Information Science, University of Science and Technology of China, Hefei, China; Tencent YouTu Laboratory, Shanghai, China

Title: SinKD: Sinkhorn Distance Minimization for Knowledge Distillation

Abstract:
Knowledge distillation (KD) has been widely adopted to compress large language models (LLMs). Existing KD methods investigate various divergence measures including the Kullback-Leibler (KL), reverse KL (RKL), and Jensen-Shannon (JS) divergences. However, due to limitations inherent in their assumptions and definitions, these measures fail to deliver effective supervision when little distribution overlap exists between the teacher and the student. In this article, we show that the aforementioned KL, RKL, and JS divergences, respectively, suffer from issues of mode-averaging, mode-collapsing, and mode-underestimation, which deteriorate logits-based KD for diverse natural language processing (NLP) tasks. We propose the Sinkhorn KD (SinKD) that exploits the Sinkhorn distance to ensure a nuanced and precise assessment of the disparity between the distributions of teacher and student models. Besides, thanks to the properties of the Sinkhorn metric, we dispense with sample-wise KD, which restricts the perception of divergences to each teacher-student sample pair. Instead, we propose a batch-wise reformulation to capture the geometric intricacies of distributions across samples in the high-dimensional space. A comprehensive evaluation on GLUE and SuperGLUE, in terms of comparability, validity, and generalizability, highlights the superiority of our method over state-of-the-art (SOTA) methods on all kinds of LLMs with encoder-only, encoder-decoder, and decoder-only architectures. Codes and models are available at https://github.com/2018cx/SinKD.
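
As background on the Sinkhorn distance itself, the following minimal sketch runs the standard Sinkhorn iterations between two small discrete distributions. It is a sample-wise toy, not the batch-wise SinKD loss: the cost matrix |i - j|, the regularization strength, and the teacher and student vectors are all assumptions.

```python
import math


def sinkhorn_distance(a, b, cost, reg=0.5, n_iters=500):
    """Entropy-regularized optimal transport cost between histograms a and b.

    Returns (transport_cost, plan). `reg` is the entropic regularization
    strength; smaller values approach the unregularized transport cost.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan


# Hypothetical teacher and student output distributions over three classes.
teacher = [0.7, 0.2, 0.1]
student = [0.1, 0.2, 0.7]
# Assumed ground cost between class indices; SinKD may define it differently.
cost = [[float(abs(i - j)) for j in range(3)] for i in range(3)]

distance, plan = sinkhorn_distance(teacher, student, cost)
self_distance, _ = sinkhorn_distance(teacher, teacher, cost)
print(distance, self_distance)
```

Unlike KL-type divergences, the value depends on how far probability mass must move under the ground cost, so it still varies smoothly when the two distributions barely overlap.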

PaperID: 171,

Authors: Yanbei Liu, Lu Yu, Shichuan Zhao, Xiao Wang, Lei Geng, Zhitao Xiao, Shuai Ma, Yanwei Pang

Affiliations: School of Life Sciences, Tiangong University, Tianjin, China; School of Electronics and Information Engineering, Tiangong University, Tianjin, China; School of Software, Beihang University, Beijing, China; School of Computer Science and Engineering, Beihang University, Beijing, China; School of Electrical and Information Engineering, Tianjin University, Tianjin, China

Title: Graph Neural Networks With Adaptive Confidence Discrimination

Abstract:
Graph neural networks (GNNs) have demonstrated remarkable success for semisupervised node classification. However, these GNNs are still limited to the conventional semisupervised framework and cannot fully leverage the potential value of large numbers of unlabeled samples. The pseudolabeling method in semisupervised learning (SSL) is widely recognized because it can clearly leverage unlabeled samples. Nevertheless, the existing pseudolabeling methods usually utilize a fixed threshold for all classes and only use a portion of unlabeled samples (ones with high prediction confidence), which leads to class imbalance and low data utilization. To solve these problems, we propose GNNs with adaptive confidence discrimination (ACDGNN) to fully utilize unlabeled samples for facilitating semisupervised node classification. Specifically, an adaptive confidence discrimination module is designed to divide all unlabeled nodes into two subsets by comparing their confidence scores with the adaptive confidence threshold at each training epoch. Then, different constraint strategies are employed for the nodes in the two subsets. Unlabeled nodes with high confidence are used to iteratively expand the label set, while ones with low confidence learn discriminative features by applying contrastive learning. Validated by extensive experiments, the proposed ACDGNN delivers significant accuracy gains over the previous SOTAs: an average improvement of 2.0% on all datasets and 5.7% on the Flickr dataset in particular.
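
The confidence-based split of unlabeled nodes can be sketched as below. The abstract does not state how the adaptive threshold is computed, so this sketch simply takes per-class thresholds as an input; the function name `split_by_confidence`, the per-class form, and all numbers are assumptions.

```python
def split_by_confidence(probabilities, thresholds):
    """Split unlabeled nodes into high- and low-confidence subsets.

    probabilities[n][c] is the predicted probability of class c for node n.
    thresholds[c] is the current threshold for class c (assumed to be
    recomputed elsewhere at every training epoch).
    Returns (high, low): high holds (node, pseudo_label) pairs and low holds
    the indices of the remaining nodes.
    """
    high, low = [], []
    for node, probs in enumerate(probabilities):
        # The pseudo-label is the most probable class.
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= thresholds[label]:
            high.append((node, label))
        else:
            low.append(node)
    return high, low


# Hypothetical predictions for three unlabeled nodes and two classes.
predictions = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
class_thresholds = [0.8, 0.7]

high, low = split_by_confidence(predictions, class_thresholds)
print(high)  # [(0, 0), (2, 1)]
print(low)   # [1]
```

In the scheme the abstract describes, the high-confidence pairs would extend the label set, while the low-confidence nodes would be routed to a contrastive objective instead of being discarded.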

PaperID: 172,

Authors: Dayang Wang, Feng-Lei Fan, Bojian Hou, Hao Zhang, Zhen Jia, Boce Zhang, Rongjie Lai, Hengyong Yu, Fei Wang

Affiliations: Department of Electrical and Computer Engineering, University of Massachusetts, Lowell, MA, USA; Weill Cornell Medicine, Cornell University, New York City, NY, USA; Department of Food Science and Human Nutrition, University of Florida, Gainesville, FL, USA; Department of Mathematics, Rensselaer Polytechnic Institute, Troy, NY, USA

Title: Manifoldron: Direct Space Partition via Manifold Discovery

Abstract:
A neural network (NN) with the widely used ReLU activation has been shown to partition the sample space into many convex polytopes for prediction. However, the parametric way in which an NN and other machine learning models partition the space has imperfections, e.g., the compromised interpretability of complex models, the inflexibility in decision boundary construction due to the generic character of the model, and the risk of being trapped in shortcut solutions. In contrast, although nonparameterized models can admirably avoid or downplay these issues, they are usually insufficiently powerful, either due to oversimplification or the failure to accommodate the manifold structures of data. In this context, we first propose a new type of machine learning model, referred to as the Manifoldron, that directly derives decision boundaries from data and partitions the space via manifold structure discovery. Then, we systematically analyze the key characteristics of the Manifoldron such as its manifold characterization capability and its link to NNs. The experimental results on four synthetic examples, 20 public benchmark datasets, and one real-world application demonstrate that the proposed Manifoldron performs competitively compared to the mainstream machine learning models. We have shared our code at https://github.com/wdayang/Manifoldron for free download and evaluation.

PaperID: 173,

Authors: Zhenbo Huang, Jing Zhao, Shiliang Sun

Affiliations: School of Computer Science and Technology, East China Normal University, Shanghai, China; Department of Automation, Shanghai Jiao Tong University, Shanghai, China

Title: De-Pessimism Offline Reinforcement Learning via Value Compensation

Abstract:
Offline reinforcement learning (RL) has been widely used in practice due to its efficient data utilization, but it still faces the challenge of training vulnerability caused by policy deviation. Existing offline RL methods that add policy constraints or perform conservative Q-value estimation are pessimistic, making the learned policy suboptimal. In this article, we address the pessimism problem by focusing on accurate Q-value estimation. We propose the de-pessimism (DEP) operator to estimate Q values using the optimal Bellman operator or the compensation operator according to whether the actions are in the behavior support set. The compensation operator qualitatively determines the positive or negative nature of out-of-distribution (OOD) actions based on their performance compared with the behavior actions. It leverages differences in state values to compensate for the Q value of positive OOD actions, thereby alleviating pessimism. We theoretically demonstrate the convergence of DEP and its effectiveness in policy improvement. To further advance the practical application, we integrate DEP into the soft actor-critic (SAC) algorithm, yielding the value-compensated de-pessimism offline RL (DoRL-VC). Experimentally, DoRL-VC achieves state-of-the-art (SOTA) performance across MuJoCo locomotion, Maze 2-D, and challenging Adroit tasks, illustrating the efficacy of DEP in mitigating pessimism.

PaperID: 174,

Authors: Gui-Bin Bian, Xiang-Rong Tang, Ruichen Ma, Qiang Ye, Zhen Li, Ming-Yang Zhang, Han Ren, Yu-Peng Zhai

Affiliations: School of Automation, Beijing Information Science and Technology University, Beijing, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Few-Human-Interaction Reinforcement Learning for Autonomous Transbronchial Intervention

Abstract:
Transbronchial interventional surgery presents challenges because the pathways are winding and convoluted and the instrument is prone to compression and friction. Current autonomous planning struggles to reach deeper bronchial positions and finds it hard to consider multiple conflicting goals simultaneously. This article introduces an innovative planning scheme with preference weights to achieve smooth, frictionless, and collision-free autonomous transbronchial intervention with a continuum robot (CR). A few-human-interaction twin-delayed deep deterministic policy gradient (FHITD3) generated from surgeon preference guidance is proposed, which determines the optimal strategy for the motion of the CR. Preference knowledge is generated through interaction between a human and a few diverse samples. An abstract actuator space description is proposed for the posture and position representation of the CR during movement within the bronchus. A contact motion analysis strategy is proposed to calculate the real-time attitude of the CR in contact with the bronchus. In addition, an oscillation suppression approach is proposed to address the CR’s unsmooth distal end trajectory. Simulated experiments show that the CR autonomously completes intervention tasks with a smooth and stable trajectory, reducing distal end oscillation by over 45%. It reaches a target endpoint within the fourth-level bronchus (approximately 5 mm diameter) with over 90% probability.

PaperID: 175,

Authors: Yaohang Wu, Renwei Dian, Shutao Li

Affiliations: College of Electrical and Information Engineering and the Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha, China; School of Robotics, Hunan University, Changsha, China

Title: Multistage Spatial-Spectral Fusion Network for Spectral Super-Resolution

Abstract:
Spectral super-resolution (SSR) aims to restore a hyperspectral image (HSI) from a single RGB image, a task in which deep learning has shown impressive performance. However, the majority of the existing deep-learning-based SSR methods inadequately address the modeling of spatial-spectral features in HSI. That is to say, they sufficiently capture only one of the two, either the spatial correlations or the spectral self-similarity, which results in a loss of discriminative spatial-spectral features and hence limits the fidelity of the reconstructed HSI. To solve this issue, we propose a novel SSR network dubbed multistage spatial-spectral fusion network (MSFN). From the perspective of network design, we build a multistage Unet-like architecture that differentially captures the multiscale features of HSI both spatially and spectrally. It consists of two types of self-attention mechanism, which enable the proposed network to achieve comprehensive global modeling of HSI. From the perspective of feature alignment, we design the spatial fusion module (SpatialFM) and spectral fusion module (SpectralFM), aiming to preserve the comprehensively captured spatial correlations and spectral self-similarity. In this manner, the multiscale features can be better fused and the accuracy of the reconstructed HSI can be significantly enhanced. Quantitative and qualitative experiments on the two largest SSR datasets (i.e., NTIRE2022 and NTIRE2020) demonstrate that our MSFN outperforms the state-of-the-art SSR methods. The code implementation will be uploaded at https://github.com/Matsuri247/MSFN-for-Spectral-Super-Resolution.

PaperID: 176,

Authors: Zhiwen Cao, Xijiong Xie

Affiliations: School of Information Science and Engineering, Ningbo University, Ningbo, China

Title: Partition-Level Tensor Learning-Based Multiview Unsupervised Feature Selection

Abstract:
Multiview unsupervised feature selection is an emerging direction in the machine learning community because of its ability to identify informative patterns and reduce the dimensionality of multiview data. Although numerous methods have been proposed and shown to be effective, they have some limitations: 1) most existing algorithms fail to improve the model performance along the view dimension; 2) they rarely incorporate more discriminative partition information; and 3) the negative effects of marginal samples are not considered. To solve these problems, we propose a novel method termed partition-level tensor learning-based multiview unsupervised feature selection (PTFS). The proposed method optimizes a low-rank constrained tensor assembled from the inner products of base partition matrices. By doing so, PTFS simultaneously leverages the high-order view correlation and indirectly integrates discriminative partition information. Besides, a statistic-based adaptive self-paced strategy is introduced to ensure that confident samples are prioritized for training the model. Moreover, an effective alternating optimization method is designed to solve the resulting optimization problem. Extensive experiments on ten datasets demonstrate the effectiveness and efficiency of the proposed method compared to the state-of-the-art methods. The code is available at https://github.com/HdTgon/2023-TNNLS-PTFS.

PaperID: 177,

Authors: Hao Li, Wei Wang, Mengzhu Wang, Huibin Tan, Long Lan, Zhigang Luo, Xinwang Liu, Kenli Li

Affiliations: College of Computer, National University of Defense Technology, Changsha, China; School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China; College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China

Title: STFormer: Spatial-Temporal-Aware Transformer for Video Instance Segmentation

Abstract:
Video instance segmentation (VIS) is a challenging task, requiring handling object classification, segmentation, and tracking in videos. Existing Transformer-based VIS approaches have shown remarkable success, combining encoded features and instance queries as decoder inputs. However, their decoder inputs are low-resolution due to computational cost, resulting in a loss of fine-grained information, sensitivity to background interference, and poor handling of small objects. Moreover, the queries are randomly initialized without location information, hindering convergence efficiency and accurate object instance localization. To address these issues, we propose a novel VIS approach, STFormer, with a spatial-temporal feature aggregation (STFA) module and spatial-temporal-aware Transformer (STT). Specifically, STFA obtains robust high-resolution masked features efficiently for the decoder, while STT’s location-guided instance query (LGIQ) improves initial instance queries. STFormer preserves more fine-grained information, improves convergence efficiency, and localizes object instance features accurately. Extensive experiments on YouTube-VIS 2019, YouTube-VIS 2021, and OVIS datasets show that STFormer outperforms mainstream VIS methods.

PaperID: 178,

Authors: Xiang Chen, Min Liu, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li, Yaonan Wang, Hang Zhang

Affiliations: College of Electrical and Information Engineering, Hunan University, Changsha, China; Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, USA; Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY, USA; Department of Mechanical and Aerospace Engineering, New York University, New York, NY, USA; Institute of Cyber Science and Technology, Shanghai Jiao Tong University, Shanghai, China; Department of Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA

Title: Spatially Covariant Image Registration With Text Prompts

Abstract:
Medical images are often characterized by their structured anatomical representations and spatially inhomogeneous contrasts. Leveraging anatomical priors in neural networks can greatly enhance their utility in resource-constrained clinical settings. Prior research has harnessed such information for image segmentation, yet progress in deformable image registration has been modest. Our work introduces textSCF, a novel method that integrates spatially covariant filters and textual anatomical prompts encoded by visual-language models, to fill this gap. This approach optimizes an implicit function that correlates text embeddings of anatomical regions to filter weights. textSCF not only boosts computational efficiency but can also retain or improve registration accuracy. By capturing the contextual interplay between anatomical regions, it offers impressive interregional transferability and the ability to preserve structural discontinuities during registration. textSCF’s performance has been rigorously tested on intersubject brain magnetic resonance imaging (MRI) and abdominal computerized tomography (CT) registration tasks, outperforming existing state-of-the-art models in the MICCAI Learn2Reg 2021 challenge and leading the leaderboard. In abdominal registrations, textSCF’s larger model variant improved the Dice score by 11.3% over the second-best model, while its smaller variant maintained similar accuracy but with an 89.13% reduction in network parameters and a 98.34% decrease in computational operations.

PaperID: 179,

Authors: Xinye Wang, Lei Duan, Lili Guan, Jiaxuan Xu, Chengxin He

Affiliations: School of Computer Science, Sichuan University, Chengdu, China; National Engineering Research Center for Biomaterials, Sichuan University, Chengdu, China

Title: Learning Decision Boundaries for Multidimensional Anomaly Detection

Abstract:
In this article, we study the problem of identifying outliers from multiple dimensions, which we term multidimensional anomaly detection. This task assumes that each sample corresponds to heterogeneous discriminative spaces, where each space characterizes distinct semantic information along one dimension. Consequently, the abnormalities exhibited by a sample will vary under these semantically different dimensions. In contrast to traditional anomaly detection, multidimensional anomaly detection offers a more holistic assessment of a sample’s abnormalities. However, the heterogeneity of discriminative spaces makes the outputs from different dimensions incomparable, which is the major difficulty in designing multidimensional anomaly detection methods. This article introduces a novel model, maximum margin multidimensional anomaly detection (ALOE), specifically tailored for multidimensional anomaly detection. ALOE constructs a convex optimization problem with nonlinear constraints. The primary objective is to simultaneously learn multiple decision boundaries, utilizing the maximum margin principle and covariance regularization, while distinguishing between outliers and normal samples under multiple dimensions by capturing the correlation among multiple dimensions. To obtain the optimal decision boundary under each dimension, we devise an alternating optimization method for this convex optimization problem. To validate the effectiveness of ALOE, we conduct extensive experiments on 12 real-world datasets, comparing its performance against 34 anomaly detection methods. The experimental results demonstrate the superior performance of ALOE.

PaperID: 180,

Authors: Wenbiao Yan, Jihua Zhu, Yiyang Zhou, Jinqian Chen, Haozhe Cheng, Kun Yue, Qinghai Zheng

Affiliations: School of Software, Xi’an Jiaotong University, Xi’an, China; Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming, China; College of Computer and Data Science, Fuzhou University, Fuzhou, China

Title: Neighbor-Based Completion for Addressing Incomplete Multiview Clustering

Abstract:
Driven by the complementarity and consistency inherent in multiview data, multiview clustering (MVC) has garnered widespread attention in various domains. Real-world data often encounters the issue of missing information, leading to a surge of interest in the domain of incomplete MVC (IMVC). Despite existing approaches having made significant progress in addressing IMVC, two significant challenges persist: 1) many alignment-based methodologies tend to overlook the topological relationships among instances and 2) the view representations based on completion lack reconstructive properties, casting doubt on their alignment with the actual view representations. In response, we present a novel approach termed neighbor-based completion for addressing IMVC (NBIMVC), which capitalizes on the topological information among instances and the consistent information across views. Specifically, our method uses autoencoders to learn feature representations for each view and leverages nearest-neighbor relationships between unique and complete instances to complete missing features in missing views. Subsequently, we enforce hard negative alignment constraints on complete paired instances in the feature space. Finally, we ensure the consistency of views in the semantic space by employing cluster information and a shared clustering network, which facilitates the final multiview categories output and effectively resolves the IMVC problem. Extensive experimental evaluations validate the efficacy of our proposed method, showcasing comparable or superior performance to existing approaches.

PaperID: 181,

Authors: Kaili Xiang, Ruotong Ming, Siyu Chen, Frank L. Lewis

Affiliations: International Joint Laboratory on Safety and Control of Autonomous Unmanned Systems of Ministry of Education, and the School of Automation, Chongqing University, Chongqing, China; GRASP Laboratory, University of Pennsylvania, Philadelphia, PA, USA; UTA Research Institute, University of Texas at Arlington, Fort Worth, TX, USA

Title: Neuroadaptive Control With Enhanced Stability and Reliability

Abstract:
The performance of neural network (NN)-driven control systems hinges on the reliability and functionality of the NN unit in the controller. Maintaining the compact set condition for NN training signals (inputs) during operation is crucial for preserving the NN’s universal learning and approximation capabilities, yet this requirement is often overlooked in existing studies. This article introduces a constraint transformation-based design method that ensures excitation signals always originate from a fixed region, regardless of initial conditions. By meeting the compactness condition required by the universal approximation theorem, this approach safeguards the functionality of the NN-driven control unit. Additionally, a decaying damping rate is employed to enable the tracking error to asymptotically converge to zero, rather than being ultimately uniformly bounded (UUB). To further ensure robust operation even if the NN underperforms due to an insufficient number of neurons or violation of the compact set condition, a new control strategy is developed based on the worst case behavior of NNs. This “fail-secure” mechanism significantly enhances the reliability of the NN-based control scheme. The effectiveness and benefits of the proposed method are confirmed through numerical simulations, demonstrating its potential to substantially improve the robustness and performance of NN-driven control systems.

PaperID: 182,

Authors: Jiawen Zhu, Xin Chen, Haiwen Diao, Shuai Li, Jun-Yan He, Chenyang Li, Bin Luo, Dong Wang, Huchuan Lu

Affiliations: School of Information and Communication Engineering, Dalian University of Technology, Dalian, China; Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; Tongyi Lab, Alibaba Group, Shenzhen, China

Title: Exploring Dynamic Transformer for Efficient Object Tracking

Abstract:
The speed-precision tradeoff is a critical problem in visual object tracking, as it typically requires low latency and is deployed on resource-constrained platforms. Existing solutions for efficient tracking primarily focus on lightweight backbones or modules, which, however, come at a sacrifice in precision. In this article, inspired by dynamic network routing, we propose DyTrack, a dynamic transformer framework for efficient tracking. Real-world tracking scenarios exhibit varying levels of complexity. We argue that a simple network is sufficient for easy video frames, while more computational resources should be assigned to difficult ones. DyTrack automatically learns to configure proper reasoning routes for different inputs, thereby improving the utilization of the available computational budget and achieving higher performance at the same running speed. We formulate instance-specific tracking as a sequential decision problem and incorporate terminating branches to intermediate layers of the model. Furthermore, we propose a feature recycling mechanism to maximize computational efficiency by reusing the outputs of predecessors. Additionally, a target-aware self-distillation strategy is designed to enhance the discriminating capabilities of early-stage predictions by mimicking the representation patterns of the deep model. Extensive experiments demonstrate that DyTrack achieves promising speed-precision tradeoffs with only a single model. For instance, DyTrack obtains 64.9% area under the curve (AUC) on LaSOT with a speed of 256 fps.
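
The idea of attaching terminating branches to intermediate layers can be illustrated with a minimal early-exit sketch. This is not the DyTrack implementation: the function name `early_exit_forward`, the toy stages and heads, and the fixed confidence threshold are all assumptions made for illustration, whereas the abstract describes the routing as learned.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def early_exit_forward(
    x: Vector,
    stages: Sequence[Callable[[Vector], Vector]],
    exit_heads: Sequence[Callable[[Vector], Tuple[float, float]]],
    threshold: float = 0.9,
) -> Tuple[float, int]:
    """Run stages in order and stop at the first confident exit head.

    Each exit head returns (prediction, confidence). The function returns
    the prediction and the number of stages that were actually executed.
    """
    if len(stages) != len(exit_heads) or not stages:
        raise ValueError("need one exit head per stage, and at least one stage")
    features = x
    prediction = 0.0
    for depth, (stage, head) in enumerate(zip(stages, exit_heads), start=1):
        features = stage(features)        # builds on the previous stage's output
        prediction, confidence = head(features)
        if confidence >= threshold:       # easy input: terminate early
            return prediction, depth
    return prediction, len(stages)        # hard input: full depth was used


# Hypothetical toy stand-ins: each stage adds 1 to every feature, and the
# head's confidence grows with the mean feature value.
def toy_stage(v: Vector) -> Vector:
    return [a + 1.0 for a in v]


def toy_head(v: Vector) -> Tuple[float, float]:
    mean = sum(v) / len(v)
    return mean, min(1.0, mean / 3.0)


stages = [toy_stage] * 4
heads = [toy_head] * 4
easy = early_exit_forward([2.0, 2.0], stages, heads, threshold=0.9)
hard = early_exit_forward([-5.0, -5.0], stages, heads, threshold=0.9)
```

In this toy run the "easy" input exits after one stage while the "hard" input uses all four, which is the kind of per-input compute allocation the abstract argues for.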

PaperID: 183,

Authors: Mingxiang Liao, Fang Wan, Zonghao Guo, Qixiang Ye

Affiliations: School of Electronic, Electrical, and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China

Title: Hierarchical AttentionShift for Pointly Supervised Instance Segmentation

Abstract:
Pointly supervised instance segmentation (PSIS) remains a challenging task when appearance variances across object parts cause semantic inconsistency. In this article, we propose a hierarchical AttentionShift approach, to solve the semantic inconsistency issue through exploiting the hierarchical nature of semantics and the flexibility of key-point representation. The estimation of hierarchical attention is defined upon key-point sets. The representative key points are iteratively estimated spatially and in the feature space to capture the fine-grained semantics and cover the full object extent. Hierarchical AttentionShift is performed at instance, part, and fine-grained levels, optimizing object semantics while promoting the conventional self-attention activation to hierarchical activation with local refinement. Experiments on PASCAL VOC 2012 Aug and MS-COCO 2017 benchmarks show that hierarchical AttentionShift improves the state-of-the-art (SOTA) method by 10.4% and 7.0% upon mean average precision (mAP)50, respectively. When applying hierarchical AttentionShift to the segment anything model (SAM), 9.4% AP improvement on the COCO test-dev is achieved. Hierarchical AttentionShift provides a fresh insight to regularize the self-attention mechanism for fine-grained vision tasks. The code is available at github.com/MingXiangL/AttentionShift.

PaperID: 184,

Authors: Joseph Kelley, Martin T. Hagan

Affiliations: School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK, USA

Title: Adaptive Multistep Prediction With Sequence-to-Sequence (Seq2Seq) Models

Abstract:
This brief demonstrates for the first time that the sequence-to-sequence (Seq2Seq) model is an adaptive multistep predictor. The Seq2Seq model is fixed-weight adaptive, which means that the model can adapt to time-varying behaviors without having to update its weights and biases. Instead, the learning algorithm is embedded into the recurrent neural network (RNN) decoder. This brief examines the Seq2Seq model’s ability to adapt to time-varying behaviors using both simulated and experimental data, and it also identifies a mechanism within the model that enables the adaptation.

PaperID: 185,

Authors: Dan Li, Haibao Wang, Shihui Ying

Affiliations: School of Mathematics and Information Sciences, Yantai University, Yantai, China; Department of Intelligent Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan; Shanghai Institute of Applied Mathematics and Mechanics and the School of Mechanics and Engineering Science, Shanghai University, Shanghai, China

Title: Multiview Representation Learning With One-to-Many Dynamic Relationships

Abstract:
Integrating information from multiple views to obtain potential representations with stronger expressive ability has received significant attention in practical applications. Most existing algorithms usually focus on learning either the consistent or complementary representation of views and, subsequently, integrate one-to-one corresponding sample representations between views. Although these approaches yield effective results, they do not fully exploit the information available from multiple views, limiting the potential for further performance improvement. In this article, we propose an unsupervised multiview representation learning method based on sample relationships, which enables the one-to-many fusion of intraview and interview information. Due to the heterogeneity of views, we mainly face the following two challenges: 1) the discrepancy in the dimensions of data across different views and 2) the characterization and utilization of sample relationships across these views. To address these two issues, we adopt two modules: the dimension consistency relationship enhancement module and the multiview graph learning module. Specifically, the relationship enhancement module addresses the discrepancy in data dimensions across different views and dynamically selects data dimensions for each sample that bolster intraview relationships. The multiview graph learning module devises a novel multiview adjacency matrix to capture both intraview and interview sample relationships. To achieve one-to-many fusion and obtain multiview representations, we employ the graph autoencoder structure. Furthermore, we extend the proposed architecture to the supervised case. We conduct extensive experiments on various real-world multiview datasets, focusing on clustering and multilabel classification tasks, to evaluate the effectiveness of our method. The results demonstrate that our approach significantly improves performance compared to existing methods, highlighting the potential of leveraging sample relationships for multiview representation learning. Our code is released at https://github.com/lilidan-orm/one-to-many-multiview.

PaperID: 186,

Authors: Xingyu Zhao, Yuexuan An, Lei Qi, Xin Geng

Affiliations: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; School of Computer Science and Engineering, Southeast University, Nanjing, China

Title: Scalable Label Distribution Learning for Multi-Label Classification

Abstract:
Multi-label classification (MLC) refers to the problem of tagging a given instance with a set of relevant labels. Most existing MLC methods are based on the assumption that the correlation of two labels in each label pair is symmetric, which is violated in many real-world scenarios. Moreover, most existing methods design learning processes associated with the number of labels, which makes their computational complexity a bottleneck when scaling up to a large-scale output space. To tackle these issues, we propose a novel method named scalable label distribution learning (SLDL) for MLC, which can describe different labels as distributions in a latent space, where the label correlation is asymmetric and the dimension is independent of the number of labels. Specifically, SLDL first converts labels into continuous distributions within a low-dimensional latent space and leverages an asymmetric metric to establish the correlation between different labels. Then, it learns the mapping from the feature space to the latent space, so that the computational complexity is no longer related to the number of labels. Finally, SLDL leverages a nearest neighbor-based strategy to decode the latent representations and obtain the final predictions. Extensive experiments illustrate that SLDL achieves very competitive classification performance with little computational consumption.

PaperID: 187,

Authors: Xue Zhang, Xiaohan Zhang, Jiangtao Wang, Jiacheng Ying, Zehua Sheng, Heng Yu, Chunguang Li, Hui-Liang Shen

Affiliations: College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China; School of Computer Science, University of Nottingham Ningbo China, Ningbo, China

Title: TFDet: Target-Aware Fusion for RGB-T Pedestrian Detection

Abstract:
Pedestrian detection plays a critical role in computer vision as it contributes to ensuring traffic safety. Existing methods that rely solely on RGB images suffer from performance degradation under low-light conditions due to the lack of useful information. To address this issue, recent multispectral detection approaches have combined thermal images to provide complementary information and have obtained enhanced performance. Nevertheless, few approaches focus on the negative effects of false positives (FPs) caused by noisy fused feature maps. In contrast, we comprehensively analyze the impacts of FPs on detection performance and find that enhancing feature contrast can significantly reduce these FPs. In this article, we propose a novel target-aware fusion strategy for multispectral pedestrian detection, named TFDet. The target-aware fusion strategy employs a fusion-refinement paradigm. In the fusion phase, we reveal the parallel- and cross-channel similarities in RGB and thermal features and learn an adaptive receptive field to collect useful information from both features. In the refinement phase, we use a segmentation branch to discriminate the pedestrian features from the background features. We propose a correlation-maximum loss function to enhance the contrast between the pedestrian features and background features. As a result, our fusion strategy highlights pedestrian-related features and suppresses unrelated ones, generating more discriminative fused features. TFDet achieves state-of-the-art performance on two multispectral pedestrian benchmarks, KAIST and LLVIP, with absolute gains of 0.65% and 4.1% over the previous best approaches, respectively. TFDet can be easily extended to multiclass object detection scenarios. It outperforms the previous best approaches on two multispectral object detection benchmarks, FLIR and M3FD, with absolute gains of 2.2% and 1.9%, respectively. Importantly, TFDet has comparable inference efficiency to the previous approaches and has remarkably good detection performance even under low-light conditions, which is a significant advancement for ensuring road safety. The code will be made publicly available at https://github.com/XueZ-phd/TFDet.git.

PaperID: 188,

Authors: Botao Dong, Longyang Huang, Ning Pang, Hongtian Chen, Weidong Zhang

Affiliations: Department of Automation, Shanghai Jiao Tong University, Shanghai, China

Title: Historical Decision-Making Regularized Maximum Entropy Reinforcement Learning

Abstract:
The challenge of the exploration-exploitation dilemma persists in off-policy reinforcement learning (RL) algorithms, impeding the improvement of policy performance and sample efficiency. To tackle this challenge, a novel historical decision-making regularized maximum entropy (HDMRME) RL algorithm is developed to strike the balance between exploration and exploitation. Built upon the maximum entropy RL framework, the historical decision-making regularization method is proposed to enhance the exploitation capability of RL policies. The theoretical analysis involves proving the convergence of HDMRME, investigating the tradeoff between exploration and exploitation of HDMRME, examining the disparity between the Q-function learned through HDMRME and the classic one, and analyzing the suboptimality of the trained policy. The performance of HDMRME is evaluated across various continuous-action control tasks from Mujoco and OpenAI Gym platforms. Comparative experiments demonstrate that HDMRME exhibits superior sample efficiency and achieves more competitive performance compared with other state-of-the-art RL algorithms.
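
For orientation, the standard maximum entropy RL objective that such methods build upon can be written as follows. This is the generic textbook form, not the HDMRME objective: the abstract does not specify the historical decision-making regularizer, so it is deliberately left out, and the symbols ($\alpha$ for the temperature, $\rho_\pi$ for the state-action distribution induced by $\pi$) are conventional choices rather than the authors' notation.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \bigl[ r(s_t, a_t) + \alpha \, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \bigr],
\qquad
\mathcal{H}\bigl(\pi(\cdot \mid s)\bigr)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)} \bigl[ \log \pi(a \mid s) \bigr].
```

A larger $\alpha$ rewards more stochastic policies and therefore more exploration; setting $\alpha = 0$ recovers the conventional expected-return objective.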

PaperID: 189,

Authors: Chaoqin Huang, Haoyan Guan, Aofan Jiang, Ya Zhang, Michael W. Spratling, Xinchao Wang, Yanfeng Wang

Affiliations: Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China; Department of Informatics, King’s College London, London, U.K.; Department of Electrical and Computer Engineering, National University of Singapore, Queenstown, Singapore

Title: Few-Shot Anomaly Detection via Category-Agnostic Registration Learning

Abstract:
Most existing anomaly detection (AD) methods require a dedicated model for each category. Such a paradigm, despite its promising results, is computationally expensive and inefficient, thereby failing to meet the requirements of real-world applications. Inspired by how humans detect anomalies, namely by comparing a query image to known normal ones, this article proposes a novel few-shot AD (FSAD) framework. Using a training set of normal images from various categories, registration, which aims to align normal images of the same category, is leveraged as the proxy task for self-supervised category-agnostic representation learning. At test time, an image and its corresponding support set, consisting of a few normal images from the same category, are supplied, and anomalies are identified by comparing the registered features of the test image to its corresponding support image features. Such a setup enables the model to generalize to novel test categories. It is, to the best of our knowledge, the first FSAD method that requires no model fine-tuning for novel categories, enabling a single model to be applied to all categories. Extensive experiments demonstrate the effectiveness of the proposed method. In particular, it improves the current state-of-the-art (SOTA) for FSAD by 11.3% and 8.3% on the MVTec and MPDD benchmarks, respectively. The source code is available at https://github.com/Haoyan-Guan/CAReg.

PaperID: 190,

Authors: Guang Yang, Zheng Xu, Jing Huo, Shangdong Yang, Tianyu Ding, Xingguo Chen, Yang Gao

Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing, China; Applied Sciences Group, Microsoft, Redmond, WA, USA

Title: State Abstraction via Deep Supervised Hash Learning

Abstract:
State abstraction is a widely used technique in reinforcement learning (RL) that compresses the state space to accelerate learning algorithms. However, designing an effective abstraction function in large-scale or high-dimensional state space problems remains a significant challenge. In this brief, we present a novel state abstraction method based on deep supervised hash learning (DSH) and provide a theoretical analysis of its near-optimal property. Furthermore, by leveraging the DSH-based representation as the optimization objective, we propose a direct and concise optimization method based on the target value. In addition, we construct an auxiliary learning task for state abstraction that can be combined with various RL algorithms. In particular, we apply the DSH-based state abstraction to both deep Q-learning (DQN) and soft actor-critic (SAC). Extensive experiments are conducted on Atari and several classic control benchmarks to evaluate the effectiveness of the DSH-based state abstraction method, showing that our method surpasses existing state abstraction algorithms in performance.

PaperID: 191,

Authors: Songze Li, Tonghua Su, Xu-Yao Zhang, Zhongjie Wang

Affiliations: Faculty of Computing, Harbin Institute of Technology, Harbin, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Continual Learning With Knowledge Distillation: A Survey

Abstract:
The foremost challenge in continual learning is to mitigate catastrophic forgetting, allowing a model to retain knowledge of previous tasks while learning new tasks. Knowledge distillation (KD), a form of regularization, has gained significant attention for its ability to maintain a model’s performance on previous tasks by mimicking the outputs of earlier models during the learning of new tasks, thus reducing forgetting. This article offers a comprehensive survey of continual learning methods employing KD within the realm of image classification. We provide a detailed analysis of how KD is utilized in continual learning methods, categorizing its application into three distinct paradigms. Besides, we classify these methods based on the type of knowledge source used and thoroughly examine how KD consolidates memory in continual learning from the perspective of loss functions. In addition, we have conducted extensive experiments on CIFAR-100, TinyImageNet, and ImageNet-100 across ten KD-integrated continual learning methods to analyze the role of KD in continual learning, and we have further discussed its effectiveness in other continual learning tasks. Our extensive experimental evidence demonstrates that KD plays a crucial role in mitigating forgetting in continual learning and substantiates that, when used with data replay, classification bias adversely affects the effectiveness of KD, whereas employing a separated softmax loss can significantly enhance its efficacy.
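
As a concrete picture of "mimicking the outputs of earlier models", the following is a minimal sketch of a response-based distillation loss in its commonly used temperature-softened form. It is not taken from the survey: the function names, the temperature value, and the example logits are illustrative assumptions, and individual continual learning methods combine this term with other losses in different ways.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened outputs, scaled by temperature**2."""
    p = softmax(teacher_logits, temperature)   # previous model (teacher)
    q = softmax(student_logits, temperature)   # current model (student)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Hypothetical logits of the previous-task model on one sample.
old_model_logits = [2.0, 0.5, -1.0]
no_drift = distillation_loss(old_model_logits, old_model_logits)
drift = distillation_loss([0.0, 0.0, 0.0], old_model_logits)
```

The loss is zero when the new model reproduces the old model's outputs and grows as the outputs drift, which is how the term discourages forgetting while new tasks are learned.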

PaperID: 192,

Authors: Mingcheng Dai, Daniel W. C. Ho, Baoyong Zhang, Deming Yuan, Shengyuan Xu

Affiliations: School of Automation, Nanjing University of Science and Technology, Nanjing, Jiangsu, China; Department of Mathematics, City University of Hong Kong, Kowloon Tong, Hong Kong

Title: Distributed Online Convex Optimization With Statistical Privacy

Abstract:
We focus on the problem of distributed online constrained convex optimization with statistical privacy in multiagent systems. The participating agents aim to collaboratively minimize the cumulative system-wide cost while a passive adversary corrupts some of them. The passive adversary collects information from corrupted agents and attempts to estimate the private information of the uncorrupted ones. In this scenario, we adopt a correlated perturbation mechanism with globally balanced property to cover the local information of agents to enable privacy preservation. This work is the first attempt to integrate such a mechanism into the distributed online (sub)gradient descent algorithm, and then a new algorithm called privacy-preserving distributed online convex optimization (PP-DOCO) is designed. It is proved that the designed algorithm provides a statistical privacy guarantee for uncorrupted agents and achieves an expected regret in $\mathcal{O}(\sqrt{K})$ for convex cost functions, where $K$ denotes the time horizon. Furthermore, an improved expected regret in $\mathcal{O}(\log(K))$ is derived for strongly convex cost functions. The obtained results are equivalent to the best regret scalings achieved by state-of-the-art algorithms. The privacy bound is established to describe the level of statistical privacy using the notion of Kullback–Leibler divergence (KLD). In addition, we observe that a tradeoff exists between our algorithm’s expected regret and statistical privacy. Finally, the effectiveness of our algorithm is validated by simulation results.
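
One way to picture a correlated perturbation with a globally balanced property is noise that hides each agent's local value but cancels when summed over the network. The sketch below assumes exactly that reading, which the abstract does not spell out; the pairwise exchange scheme, the function name `balanced_perturbations`, and the Gaussian noise are illustrative assumptions rather than the PP-DOCO construction.

```python
import random
from typing import List, Optional, Sequence, Tuple


def balanced_perturbations(
    num_agents: int,
    edges: Sequence[Tuple[int, int]],
    scale: float = 1.0,
    seed: Optional[int] = None,
) -> List[float]:
    """Draw per-agent perturbations that sum to zero over the whole network.

    For every edge (i, j), agent i sends a random number r_ij to j and agent j
    sends r_ji to i. Each agent's perturbation is (received - sent), so every
    random number is added once and subtracted once across the network.
    """
    rng = random.Random(seed)
    perturbation = [0.0] * num_agents
    for i, j in edges:
        r_ij = rng.gauss(0.0, scale)   # number agent i sends to agent j
        r_ji = rng.gauss(0.0, scale)   # number agent j sends to agent i
        perturbation[i] += r_ji - r_ij
        perturbation[j] += r_ij - r_ji
    return perturbation


# Hypothetical 4-agent ring network.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
noise = balanced_perturbations(4, ring, scale=1.0, seed=0)
total = sum(noise)
```

Because the perturbations cancel in aggregate, adding them to local states would mask individual values without shifting the network-wide sum, up to floating-point rounding.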

PaperID: 193,

Authors: Chaohua Shi, Mingrui Zhu, Nannan Wang, Xinbo Gao

Affiliations: State Key Laboratory of Integrated Services Networks (ISN), School of Telecommunications Engineering, Xidian University, Xi’an, Shaanxi, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: PStyle-3D: Example-Based 3-D-Aware Portrait Style Domain Adaptation

Abstract:
The creation of high-quality artistic portraits is a critical and desirable task in the field of computer vision. While recent 3-D generative models have achieved impressive results in generating images with view consistency and intricate 3-D shapes, their application for generating artistic portraits is often more challenging than that of 2-D generative models due to the potentially destructive impact of 3-D structures on human faces. This article introduces a novel approach that leverages a meticulously designed domain feature extraction module to extract the specific feature information from both the source natural face domain and the target artistic portrait domain. These extracted features are seamlessly integrated into a 3-D representation, generating multiview consistent 3-D artistic portraits. To better fuse the features of the source and target domains, we propose a new module for domain adaptation. This module adds a path to the style path established by StyleGAN to introduce the artistic portrait domain information and regulate the target domain’s feature information in $\mathcal{S}$ space. Our domain adaptation module is implemented in each StyleBlock of the 3-D representation generator to integrate the target domain information with the original facial information. Experimental results demonstrate that our approach generates high-quality 3-D artistic portraits that outperform existing approaches in preserving 3-D geometric information and multiview consistency.

PaperID: 194,

Authors: Ugochukwu Ejike Akpudo, Yongsheng Gao, Jun Zhou, Andrew Lewis

Affiliations: School of Engineering and Built Environment, Griffith University, Nathan, QLD, Australia; School of Information and Communication Technology, Griffith University, Nathan, QLD, Australia; Institute for Integrated and Intelligent Systems, Griffith University, Nathan, QLD, Australia

Title: TraNCE: Transformative Nonlinear Concept Explainer for CNNs

Abstract:
Convolutional neural networks (CNNs) have succeeded remarkably in various computer vision tasks. However, they are not intrinsically explainable. While feature-level understanding of CNNs reveals where the models looked, concept-based explainability methods provide insights into what the models saw. However, their assumption of linear reconstructability of image activations fails to capture the intricate relationships within these activations. Their fidelity-only approach to evaluating global explanations also presents a new concern. For the first time, we address these limitations with the novel transformative nonlinear concept explainer (TraNCE) for CNNs. Unlike linear reconstruction assumptions made by existing methods, TraNCE captures the intricate relationships within the activations. This study presents three original contributions to the CNN explainability literature: 1) an automatic concept discovery mechanism based on variational autoencoders (VAEs), whose transformative concept discovery process enhances the identification of meaningful concepts from image activations; 2) a visualization module that leverages the Bessel function to create a smooth transition between prototypical image pixels, revealing not only what the CNN saw but also what the CNN avoided, thereby mitigating the challenges of concept duplication as documented in previous works; and 3) a new metric, the faith score, which integrates both coherence and fidelity for comprehensive evaluation of explainer faithfulness and consistency. Based on the investigations on publicly available datasets, we prove that a valid decomposition of a high-dimensional image activation should follow a nonlinear reconstruction, contributing to the explainer’s efficiency. We also demonstrate quantitatively that, besides accuracy, consistency is crucial for the meaningfulness of concepts and human trust. The code is available at https://github.com/daslimo/TrANCE.

PaperID: 195,

Authors: Bowen Duan, Shiming Chen, Yufei Guo, Guo-Sen Xie, Weiping Ding, Yisong Wang

Affiliations: School of Computer Science and Technology and the Institute for Artificial Intelligence, Guizhou University, Guiyang, Guizhou, China; Department of Computer Vision, Mohamed bin Zayed University of AI, Abu Dhabi, United Arab Emirates; Intelligent Science and Technology Academy, CASIC, Beijing, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China

Title: Visual-Semantic Graph Matching Net for Zero-Shot Learning

Abstract:
Zero-shot learning (ZSL) aims to leverage additional semantic information to recognize unseen classes. To transfer knowledge from seen to unseen classes, most ZSL methods learn a shared embedding space by simply aligning visual embeddings with semantic prototypes. However, methods trained under this paradigm often struggle to learn a robust embedding space because they align the two modalities in an isolated manner among classes, which ignores the crucial class relationship during the alignment process. To address the aforementioned challenges, this article proposes a visual-semantic graph matching net (VSGMN), which leverages semantic relationships among classes to aid in visual-semantic embedding. VSGMN uses a graph build net (GBN) and a graph matching net (GMN) to achieve two-stage visual-semantic alignment. Specifically, GBN first uses an embedding-based approach to build visual and semantic graphs in the semantic space and align the embedding with its prototype for first-stage alignment. In addition, to supplement unseen class relationships in these graphs, GBN also builds the unseen class nodes based on semantic relationships. In the second stage, GMN continuously integrates neighbor and cross-graph information into the constructed graph nodes and aligns the node relationships between the two graphs under the class relationship constraint. Extensive experiments on three benchmark datasets demonstrate that VSGMN achieves superior performance in both conventional and generalized ZSL (GZSL) scenarios. The implementation of our VSGMN and experimental results are available on GitHub: https://github.com/dbwfd/VSGMN.

PaperID: 196,

Authors: Zhihan Ning, Zhixing Jiang, David Zhang

Affiliations: School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, Guangdong, China; Institute of Oceanographic Instrumentation, Qilu University of Technology (Shandong Academy of Sciences), Qingdao, China; School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, China

Title: Exploiting Meta-Learned Confidences for Imbalanced Multilabel Learning

Abstract:
Multilabel learning deals with datasets where each sample is associated with multiple labels. It is commonly assumed that label correlations should be well exploited to build an effective multilabel classifier. Moreover, the class imbalance problem occurs in many multilabel datasets and should be tackled to reduce the classification bias. While many multilabel learning methods have been proposed, research on imbalanced multilabel learning (IMLL) is relatively deficient. To address these issues, we exploit the value of meta-learned confidences, i.e., the prediction confidences iteratively updated over the out-of-bag samples, for IMLL. First, such meta-confidences can be fused to the original feature space to learn high-order label correlations. Second, meta-confidences can be used to calibrate the prediction results to alleviate class imbalance. Motivated by these, we propose an ensemble learning method named meta-confidence ensemble (MCE) for IMLL. Specifically, MCE iteratively makes bootstrap replicates of the multilabel training set, leverages the out-of-bag samples to generate meta-confidences, and fuses them to the original feature space to learn label correlations. A sparse projection method is presented to avoid overfitting and improve the ensemble diversity. Finally, the prediction result of an unseen sample is determined by the calibrated plurality vote of MCE’s base classifiers. Extensive experiments demonstrated the effectiveness and superiority of MCE for IMLL. Codes have been made publicly available at https://github.com/CUHKSZ-NING/MCEClassifier.
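The bootstrap and out-of-bag steps described above can be sketched generically as follows. This is a minimal illustration of bootstrap resampling and an uncalibrated plurality vote, not the MCE implementation: the function names are hypothetical, and the meta-confidence generation, sparse projection, and vote calibration from the abstract are not modeled.

```python
import random

def bootstrap_with_oob(num_samples, seed=0):
    """Sample indices with replacement; return (in-bag, out-of-bag) indices."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(num_samples) for _ in range(num_samples)]
    chosen = set(in_bag)
    # Out-of-bag samples are those never drawn into this replicate.
    out_of_bag = [i for i in range(num_samples) if i not in chosen]
    return in_bag, out_of_bag

def plurality_vote(votes):
    """Return the most frequent label; ties go to the smallest label."""
    counts = {}
    for v in votes:
        counts[v] = counts.get(v, 0) + 1
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)

in_bag, out_of_bag = bootstrap_with_oob(10)
```

In an ensemble of this kind, each base classifier would be trained on its in-bag indices and evaluated on its out-of-bag indices, which gives held-out predictions without a separate validation split.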

PaperID: 197,

Authors: Jiamin Liu, Heng Lian

Affiliations: School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, China; City University of Hong Kong Shenzhen Research Institute, Shenzhen, China

Title: Kernel-Based Decentralized Policy Evaluation for Reinforcement Learning

Abstract:
We investigate the decentralized nonparametric policy evaluation problem within reinforcement learning (RL), focusing on scenarios where multiple agents collaborate to learn the state-value function using sampled state transitions and privately observed rewards. Our approach centers on a regression-based multistage iteration technique employing infinite-dimensional gradient descent (GD) within a reproducing kernel Hilbert space (RKHS). To make computation and communication more feasible, we employ Nyström approximation to project this space into a finite-dimensional one. We establish statistical error bounds to describe the convergence of value function estimation, marking the first instance of such analysis within a fully decentralized nonparametric framework. We compare the regression-based method to the kernel temporal difference (TD) method in some numerical studies.
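For readers unfamiliar with the Nyström step mentioned above, its standard textbook form is recalled below. The symbols (kernel k, n samples x_i, m landmark points \tilde x_j, coefficients \alpha_j) are notation assumed here for illustration and may differ from the article's.

```latex
% Functions are restricted to the span of m landmark kernel sections:
f(x) = \sum_{j=1}^{m} \alpha_j \, k(\tilde x_j, x), \qquad m \ll n .

% Correspondingly, the n-by-n kernel matrix is replaced by a rank-m surrogate:
K \approx K_{nm} \, K_{mm}^{\dagger} \, K_{mn}, \qquad
(K_{nm})_{ij} = k(x_i, \tilde x_j), \quad
(K_{mm})_{ij} = k(\tilde x_i, \tilde x_j), \quad
K_{mn} = K_{nm}^{\top} .
```

Under this restriction each agent only needs to store and exchange the m coefficients \alpha_j rather than an infinite-dimensional function, which is what makes the computation and communication finite-dimensional.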

PaperID: 198,

Authors: Chengyu Wang, Shan Zhao, Tianwei Yan, Shezheng Song, Wentao Ma, Kuien Liu, Meng Wang

Affiliations: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China; College of Computer, National University of Defense Technology, Changsha, China; School of Information and Artificial Intelligence, College of Computer Science, Anhui Agricultural University, Hefei, China; Academy of Cyber, Beijing, China

Title: Hierarchical Label-Enhanced Contrastive Learning for Chinese NER

Abstract:
Recently, character–word lattice structures have achieved promising results for Chinese named entity recognition (NER), reducing word segmentation errors and increasing word boundary information for character sequences. However, constructing the lattice structure is complex and time-consuming, so these lattice-based models usually suffer from low inference speed. Moreover, the quality of the lexicon affects the accuracy of the NER model: noise words can confuse NER, and limited coverage of the lexicon can cause lattice-based models to degenerate into partial character-based models. In this article, we propose a hierarchical label-enhanced contrastive learning (HLCL) method for Chinese NER. Instead of relying on the lattice structure, HLCL offers an alternative solution to robustly integrate entity boundary and type information with the help of both label semantics and contrastive learning. HLCL is empowered by two techniques: 1) sentence-level contrastive learning (SCL) to model global mutual information between two different modalities (e.g., labels and sentences) and 2) token-level contrastive learning (TCL) to close the gap between representations of different characters (e.g., label-enhanced characters and original characters), resulting in local mutual information. With the well-designed contrastive learning scheme and the concise model during inference, HLCL can fully leverage the transferable label semantics and achieves a high inference speed. Experiments on four Chinese NER datasets show that HLCL obtains excellent efficiency as well as performance compared with existing lattice-based approaches.

PaperID: 199,

Authors: Yongjeong Oh, Jaeho Lee, Christopher G. Brinton, Yo-Seb Jeon

Affiliations: Department of Electrical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, Gyeongbuk, Republic of Korea; Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA

Title: Communication-Efficient Split Learning via Adaptive Feature-Wise Compression

Abstract:
This article proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate features and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: 1) adaptive feature-wise dropout and 2) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-100, and CelebA datasets demonstrate that SplitFC outperforms state-of-the-art SL frameworks by significantly reducing communication overheads while maintaining high accuracy.
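The two compression strategies can be illustrated with a simplified sketch. The rules below (keep probability proportional to a column's standard deviation, and uniform quantization over a vector's own range) are assumptions chosen for illustration; the abstract does not give SplitFC's actual probability rule or its closed-form optimal quantization levels, and all function names are hypothetical.

```python
import random
import statistics

def keep_probabilities(columns, keep_ratio=0.5):
    """Give columns with a larger standard deviation a higher keep probability."""
    stds = [statistics.pstdev(col) for col in columns]
    total = sum(stds)
    if total == 0:
        return [keep_ratio] * len(columns)
    budget = keep_ratio * len(columns)  # expected number of kept columns
    return [min(1.0, budget * s / total) for s in stds]

def kept_columns(probabilities, seed=0):
    """Randomly decide which columns survive dropout."""
    rng = random.Random(seed)
    return [j for j, p in enumerate(probabilities) if rng.random() < p]

def uniform_quantize(vector, levels):
    """Quantize a vector to `levels` evenly spaced values over its own range."""
    lo, hi = min(vector), max(vector)
    if hi == lo or levels < 2:
        return list(vector)
    step = (hi - lo) / (levels - 1)
    return [lo + round((x - lo) / step) * step for x in vector]

probs = keep_probabilities([[1, 1, 1], [0, 10, 20]])
kept = kept_columns(probs)
quantized = uniform_quantize([0.0, 0.4, 1.0], levels=3)
```

A constant column carries no per-sample information, so in this sketch it receives keep probability zero, while the widely dispersed column is always kept.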

PaperID: 200,

Authors: Sajjad Kachuee, Mohammad Sharifkhani

Affiliations: Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran

Title: Latency Adjustable Transformer Encoder for Language Understanding

Abstract:
Adjusting the latency, power, and accuracy of natural language understanding models is a desirable objective of an efficient architecture. This article proposes an efficient Transformer architecture that adjusts the inference computational cost adaptively with a desired inference latency speedup. In the fine-tuning phase, the proposed method detects less important hidden sequence elements (word-vectors) and eliminates them in each encoder layer using a proposed attention context contribution (ACC) metric. After the fine-tuning phase, with the novel offline-tuning property, the inference latency of the model can be adjusted in a wide range of inference speedup selections without any further training. Extensive experiments reveal that most word-vectors in higher Transformer layers contribute less to subsequent layers, allowing their removal to improve inference latency. Experimental results on various language understanding, text generation, and instruction tuning tasks and benchmarks demonstrate the approach’s effectiveness across diverse datasets, with minimal impact on the input’s global context. The technique improves the time-to-first-token (TTFT) of Llama3 by up to 2.9×, with a minor performance drop. The suggested approach posits that in large language models (LLMs), although the complete network is necessary for training, it can be truncated during the fine-tuning phase.
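The idea of eliminating less important word-vectors in a layer can be sketched as below. The importance score used here (total attention a token receives across all queries) is a stand-in assumption; the abstract does not define the ACC metric, and `prune_tokens` is a hypothetical name.

```python
def prune_tokens(attention, keep):
    """Return the indices of the `keep` tokens that receive the most attention.

    attention[i][j] is the weight that query token i places on key token j.
    """
    n = len(attention)
    # Column sums: how much attention each token receives overall.
    received = [sum(attention[i][j] for i in range(n)) for j in range(n)]
    ranked = sorted(range(n), key=lambda j: received[j], reverse=True)
    # Restore the original order so the sequence stays in reading order.
    return sorted(ranked[:keep])

attention = [
    [0.7, 0.2, 0.1],
    [0.6, 0.1, 0.3],
    [0.5, 0.1, 0.4],
]
kept = prune_tokens(attention, keep=2)
```

Varying `keep` per layer at inference time is one way to trade accuracy for latency without retraining.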

PaperID: 201,

Authors: Xiaohui Wei, Haibo Liu, Puhong Duan, Shutao Li

Affiliations: College of Information Science and Engineering, Hunan Normal University, Changsha, China; College of Robotics, Hunan University, Changsha, China; College of Electrical and Information Engineering, Changsha, China

Title: Dual-Structural Bipartite Graph Learning for Multiview Clustering

Abstract:
The bipartite graph (BiG) has been proven to be efficient in handling massive multiview data for clustering. However, how to regulate the structural information of view-specific anchors and view-shared BiG is still open and needs to be further studied. Hence, a novel dual-structural BiG learning (DsBiGL) method is proposed in this article. It transforms BiG learning into a joint optimization problem of IntrA-view and InteR-view subspace learning (IASL and IRSL) with structural constraints, such as k-nearest neighbor (KNN) and low-rank constraints. On one hand, IASL uses the KNN and view-specific low-rank constraints to enhance the discriminativeness of view-specific anchors. On the other hand, IRSL uses an adaptive weighting strategy to obtain view-shared BiG directly from multiview samples, where the KNN and view-shared low-rank constraints are adopted to encode local connectivity and cluster information between samples. Note that IASL and IRSL are integrated into a unified optimization model, which ensures the interactive enhancement of view-specific anchor representation and view-shared BiG learning. Finally, an algorithm based on iterative optimization is designed to solve the proposed DsBiGL model. Experimental results on various multiview datasets have demonstrated the superiority of DsBiGL in terms of clustering results when compared with competing methods.

PaperID: 202,

Authors: Yao Lyu, Xiangteng Zhang, Shengbo Eben Li, Jingliang Duan, Letian Tao, Qing Xu, Lei He, Keqiang Li

Affiliations: School of Vehicle and Mobility, Tsinghua University, Beijing, China; School of Vehicle and Mobility and the College of Artificial Intelligence, Tsinghua University, Beijing, China; School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China; State Key Laboratory of Intelligent Green Vehicle and Mobility, Tsinghua University, Beijing, China

Title: Conformal Symplectic Optimization for Stable Reinforcement Learning

Abstract:
Training deep reinforcement learning (RL) agents necessitates overcoming the highly unstable nonconvex stochastic optimization inherent in the trial-and-error mechanism. To tackle this challenge, we propose a physics-inspired optimization algorithm called relativistic adaptive gradient descent (RAD), which enhances long-term training stability. By conceptualizing neural network (NN) training as the evolution of a conformal Hamiltonian system, we present a universal framework for transferring long-term stability from conformal symplectic integrators to iterative NN updating rules, where the choice of kinetic energy governs the dynamical properties of resulting optimization algorithms. By utilizing relativistic kinetic energy, RAD incorporates principles from special relativity and limits parameter updates below a finite speed, effectively mitigating abnormal gradient influences. In addition, RAD models NN optimization as the evolution of a multiparticle system where each trainable parameter acts as an independent particle with an individual adaptive learning rate. We prove RAD’s sublinear convergence under general nonconvex settings, where smaller gradient variance and larger batch sizes contribute to tighter convergence. Notably, RAD degrades to the well-known adaptive moment estimation (ADAM) algorithm when its speed coefficient is chosen as one and symplectic factor as a small positive value. Experimental results show RAD outperforming nine baseline optimizers with five RL algorithms across twelve environments, including standard benchmarks and challenging scenarios. Notably, RAD achieves up to a 155.1% performance improvement over ADAM in Atari games, showcasing its efficacy in stabilizing and accelerating RL training.

PaperID: 203,

Authors: Jianpeng Chen, Yawen Ling, Jie Xu, Yazhou Ren, Shudong Huang, Xiaorong Pu, Zhifeng Hao, Philip S. Yu, Lifang He

Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; College of Computer Science, Sichuan University, Chengdu, China; Department of Mathematics, College of Science, Shantou University, Shantou, China; Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA; Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA

Title: Variational Graph Generator for Multiview Graph Clustering

Abstract:
Multiview graph clustering (MGC) methods are increasingly being studied due to the explosion of multiview data with graph structural information. The critical point of MGC is to better utilize view-specific and view-common information in features and graphs of multiple views. However, existing works have an inherent limitation that they are unable to concurrently utilize the consensus graph information across multiple graphs and the view-specific feature information. To address this issue, we propose a variational graph generator for MGC (VGMGC). Specifically, a novel variational graph generator is proposed to extract common information among multiple graphs. This generator infers a reliable variational consensus graph based on a priori assumption over multiple graphs. Then, a simple yet effective graph encoder in conjunction with the multiview clustering objective is presented to learn the desired graph embeddings for clustering, which embeds the inferred view-common graph and view-specific graphs together with features. Finally, theoretical results illustrate the rationality of the VGMGC by analyzing the uncertainty of the inferred consensus graph with the information bottleneck (IB) principle. Extensive experiments demonstrate the superior performance of our VGMGC over state-of-the-art methods (SOTAs). The source code is publicly available at: https://github.com/cjpcool/VGMGC.

PaperID: 204,

Authors: Xiaobo Shen, Yinfan Chen, Weiwei Liu, Yuhui Zheng, Quan-Sen Sun, Shirui Pan

Affiliations: School of Computer and Engineering, Nanjing University of Science and Technology, Nanjing, China; School of Computer Science, Wuhan University, Wuhan, China; College of Computer, Qinghai Normal University, Xining, China; School of Information and Communication Technology, Griffith University, Southport, QLD, Australia

Title: Graph Convolutional Multi-Label Hashing for Cross-Modal Retrieval

Abstract:
Cross-modal hashing encodes different modalities of multimodal data into a low-dimensional Hamming space for fast cross-modal retrieval. In multi-label cross-modal retrieval, multimodal data are often annotated with multiple labels, and some labels, e.g., “ocean” and “cloud,” often co-occur. However, existing cross-modal hashing methods overlook label dependency, which is crucial for improving performance. To fill this gap, this article proposes graph convolutional multi-label hashing (GCMLH) for effective multi-label cross-modal retrieval. Specifically, GCMLH first generates a word embedding of each label and develops a label encoder to learn highly correlated label embeddings via a graph convolutional network (GCN). In addition, GCMLH develops a feature encoder for each modality and a feature fusion module to generate highly semantic features via GCN. GCMLH uses a teacher-student learning scheme to transfer knowledge from the teacher modules, i.e., the label encoder and feature fusion module, to the student module, i.e., the feature encoder, such that the learned hash codes can well exploit multi-label dependency and multimodal semantic structure. Extensive empirical results on several benchmarks demonstrate the superiority of the proposed method over existing state-of-the-art methods.

PaperID: 205,

Authors: Chang Liu, Bohao Li, Mengnan Shi, Xiaozhong Chen, Qixiang Ye, Xiangyang Ji

Affiliations: Department of Automation, Tsinghua University, Beijing, China; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China

Title: Explicit Margin Equilibrium for Few-Shot Object Detection

Abstract:
Under low data regimes, few-shot object detection (FSOD) transfers related knowledge from base classes with sufficient annotations to novel classes with limited samples in a two-step paradigm, including base training and balanced fine-tuning. In base training, the learned embedding space needs to be dispersed with large class margins to facilitate novel class accommodation and avoid feature aliasing, whereas in balanced fine-tuning it needs to be properly concentrated with small margins to represent novel classes precisely. Although the discrimination and representation dilemma has stimulated substantial progress, the equilibrium of class margins within the embedding space is still being actively explored. In this study, we propose a class margin optimization scheme, termed explicit margin equilibrium (EME), by explicitly leveraging the quantified relationship between base and novel classes. EME first maximizes base-class margins to reserve adequate space to prepare for novel class adaptation. During fine-tuning, it quantifies the interclass semantic relationships by calculating the equilibrium coefficients based on the assumption that novel instances can be represented by linear combinations of base-class prototypes. EME finally reweights margin loss using equilibrium coefficients to adapt base knowledge for novel instance learning with the help of instance disturbance (ID) augmentation. As a plug-and-play module, EME can also be applied to few-shot classification. Consistent performance gains upon various baseline methods and benchmarks validate the generality and efficacy of EME. The code is available at github.com/Bohao-Lee/EME.

PaperID: 206,

Authors: Yu-Xuan Zhang, Zhengchun Zhou, Xingxing He, Avik Ranjan Adhikary, Bapi Dutta

Affiliations: School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China; School of Mathematics, Southwest Jiaotong University, Chengdu, China; Department of Computer Science, University of Jaén, Jaén, Spain

Title: Data-Driven Knowledge Fusion for Deep Multi-Instance Learning

Abstract:
Multi-instance learning (MIL) is a widely applied technique in practical applications that involve complex data structures. MIL can be broadly categorized into two types: traditional methods and those based on deep learning. These approaches have yielded significant results, especially regarding their problem-solving strategies and experiment validation, providing valuable insights for researchers in the MIL field. However, considerable knowledge is often trapped within the algorithm, leading to subsequent MIL algorithms that rely solely on the model’s data fitting to predict unlabeled samples. This results in a significant loss of knowledge and impedes the development of more powerful models. In this article, we propose a novel data-driven knowledge fusion for deep MIL (DKMIL) algorithm. DKMIL adopts a completely different idea from existing deep MIL methods by analyzing the decision-making of key samples in the dataset (referred to as the data-driven) and using the knowledge fusion module designed to extract valuable information from these samples to assist the model’s learning. In other words, this module serves as a new interface between data and the model, providing strong scalability and enabling prior knowledge from existing algorithms to enhance the model’s learning ability. Furthermore, to adapt the downstream modules of the model to more knowledge-enriched features extracted from the data-driven knowledge fusion (DDKF) module, we propose a two-level attention (TLA) module that gradually learns shallow- and deep-level features of the samples to achieve more effective classification. We will prove the scalability of the knowledge fusion module and verify the efficiency of the proposed architecture by conducting experiments on 62 datasets across five categories.

PaperID: 207,

Authors: Jiayi Tang, Long Zhao, Xinwang Liu

Affiliations: Department of Cardiovascular Surgery, and the Bio-Med Informatics Research Center & Clinical Research Center, Second Affiliated Hospital, Army Medical University, Chongqing, China; Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; School of Computer, National University of Defense Technology, Changsha, China

Title: Incomplete Multiview Clustering Based on Consensus Information

Abstract:
In contrast to traditional single-view clustering methods, multiview clustering (MVC) approaches aim to extract, analyze, and integrate structural information from diverse perspectives, providing a more comprehensive understanding of internal data structures. However, with an increasing number of views, maintaining the integrity of view information becomes challenging, giving rise to incomplete MVC (IMVC) methods. While existing IMVC methods have shown notable performance on many incomplete multiview (IMV) databases, they still grapple with two key shortcomings: 1) they treat the information of each view as a whole, disregarding the differences among samples within each view; and 2) they rely on eigenvalue and eigenvector operations on the view matrix, limiting their scalability for large-scale samples and views. To overcome these limitations, we propose a novel multiview clustering with consistent information (IMVC-CI) of sample points. Our method explores the multiview information set of sample points to extract consensus structural information and subsequently restores unknown information in each view. Importantly, our approach operates independently on individual sample points, eliminating the need for eigenvalue and eigenvector operations on the view information matrix and facilitating parallel computation. This significantly enhances algorithmic efficiency and mitigates challenges associated with dimensionality. Experimental results on various public datasets demonstrate that our algorithm outperforms state-of-the-art IMVC methods in terms of clustering performance and computational efficiency. The code for our article has been uploaded to https://github.com/PhdJiayiTang/IMVC-CI.git.

PaperID: 208,

Authors: Yunqi Huang, Chang Liu, Bohao Li, Hai-Jun Huang, Ronghui Zhang, Wei Ke, Xiaojun Jing

Affiliations: School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China; Department of Automation, Tsinghua University, Beijing, China; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China; Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China

Title: Frequency-Aware Divide-and-Conquer for Efficient Real Noise Removal

Abstract:
Deep-learning-based approaches have achieved remarkable progress for complex real scenario denoising, yet their accuracy-efficiency tradeoff, which is particularly critical for mobile devices, is still understudied. As real noise is unevenly distributed relative to the underlying signals in different frequency bands, we introduce a frequency-aware divide-and-conquer strategy to develop a frequency-aware denoising network (FADN). FADN is materialized by stacking frequency-aware denoising blocks (FADBs), in which a denoised image is progressively predicted by a series of frequency-aware noise dividing and conquering operations. For noise dividing, FADBs decompose the noisy and clean image pairs into low- and high-frequency representations via a wavelet transform (WT) followed by an invertible network and recover the final denoised image by integrating the denoised information from different frequency bands. For noise conquering, the separated low-frequency representation of the noisy image is kept as clean as possible by the supervision of the clean counterpart, while the high-frequency representation combining the estimated residual from the successive FADB is purified under the corresponding accompanied supervision for residual compensation. Since our FADN progressively and pertinently denoises from frequency bands, the accuracy-efficiency tradeoff can be controlled as required through the number of FADBs. Experimental results on the SIDD, DND, and NAM datasets show that our FADN outperforms the state-of-the-art methods by improving the peak signal-to-noise ratio (PSNR) and decreasing the model parameters. The code is released at https://github.com/NekoDaiSiki/FADN.
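The "noise dividing" step relies on an invertible split into low- and high-frequency parts. A minimal 1-D sketch of such a split, using an unnormalized one-level Haar transform (pairwise averages and differences), is given below. The choice of Haar and the 1-D setting are assumptions for illustration; the abstract does not specify which wavelet FADN uses, and the invertible network is not modeled.

```python
def haar_split(signal):
    """One-level Haar transform of an even-length 1-D signal.

    Returns (low, high): pairwise averages and pairwise half-differences.
    """
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high

def haar_merge(low, high):
    """Invert haar_split exactly."""
    out = []
    for a, d in zip(low, high):
        out.extend([a + d, a - d])
    return out

low, high = haar_split([4.0, 2.0, 5.0, 5.0])
restored = haar_merge(low, high)
```

Because the split is exactly invertible, each band can be processed separately and the result recombined without losing information from the transform itself.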

PaperID: 209,

Authors: Yu Hu, Fei Qi, Yiu-Ming Cheung, Hongmin Cai

Affiliations: Department of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Department of Computer Science, Hong Kong Baptist University, Kowloon Tsai, Hong Kong

Title: Discriminating Tensor Spectral Clustering for High-Dimension-Low-Sample-Size Data

Abstract:
Tensor spectral clustering (TSC) is a recently proposed approach to robustly group data into underlying clusters. Unlike the traditional spectral clustering (SC), which merely uses pairwise similarities of data in an affinity matrix, TSC aims at exploring their multiwise similarities in an affinity tensor to achieve better performance. However, the performance of TSC highly relies on the design of multiwise similarities, and how to design them remains unclear, especially for high-dimension-low-sample-size (HDLSS) data. To this end, this article proposes a discriminating TSC (DTSC) for HDLSS data. Specifically, DTSC uses the proposed discriminating affinity tensor that encodes the pair-to-pair similarities, which are particularly constructed by the anchor-based distance. HDLSS asymptotic analysis shows that the proposed affinity tensor can explicitly differentiate samples from different clusters when the feature dimension is large. This theoretical property allows DTSC to improve the clustering performance on HDLSS data. Experimental results on synthetic and benchmark datasets demonstrate the effectiveness and robustness of the proposed method in comparison to several baseline methods.

PaperID: 210,

Authors: Yueyang Pi, Yilin Wu, Yang Huang, Yongquan Shi, Shiping Wang

Affiliations: College of Computer and Data Science, Fuzhou University, Fuzhou, China

Title: Inhomogeneous Diffusion-Induced Network for Multiview Semi-Supervised Classification

Abstract:
The challenges posed by heterogeneous data in practical applications have made multiview semi-supervised classification a focus of attention for researchers. While several graph-based approaches have been suggested for this task, they tend to use homogeneous feature propagation, leading to even diffusion of node information to their neighbors. However, this diffusion strategy results in nodes acquiring information of equal proportion from dissimilar samples. In this article, we propose a solution to address these issues by introducing a graph diffusion-induced network for multiview semi-supervised classification. By formulating a discretized partial differential equation on a manifold, we derive a nonlinear and inhomogeneous diffusion equation to govern information propagation on the graph. Then, we investigate the impact of various nonlinear activation functions on random switching edge directions and their suppressive effects on information diffusion between different nodes. In addition, the cross-view consistency under the semi-supervised scenarios is defined and guaranteed for better information fusion. The comprehensive experimental results demonstrate the superiority of the proposed method compared with state-of-the-art approaches. The effectiveness of the proposed approach in handling diverse and heterogeneous data showcases its potential for advancing multiview semi-supervised classification techniques.
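
To make the contrast between homogeneous and inhomogeneous propagation concrete, here is a small sketch of one explicit diffusion step on a graph. It is not the authors' equation: it assumes a standard Perona-Malik-style edge-stopping function `exp(-(dist / kappa)^2)`, and the names `inhomogeneous_diffusion_step`, `tau`, and `kappa` are hypothetical.

```python
import numpy as np

def inhomogeneous_diffusion_step(x, adj, tau=0.1, kappa=1.0):
    """One explicit step of nonlinear diffusion on a graph.

    x   : (n, d) node features
    adj : (n, n) symmetric nonnegative edge weights
    The flow along each edge is damped when its endpoints are dissimilar
    (assumed Perona-Malik-style rule, not the paper's derivation).
    """
    diff = x[None, :, :] - x[:, None, :]   # diff[i, j] = x_j - x_i
    dist = np.linalg.norm(diff, axis=-1)   # pairwise feature distance
    g = np.exp(-(dist / kappa) ** 2)       # near 0 for dissimilar nodes
    w = adj * g                            # effective, data-dependent weights
    return x + tau * np.einsum("ij,ijd->id", w, diff)

# Two tight groups of nodes on a fully connected graph.
x = np.array([[0.0], [1.0], [10.0], [11.0]])
adj = np.ones((4, 4)) - np.eye(4)
y_inhom = inhomogeneous_diffusion_step(x, adj)
# Homogeneous baseline: every neighbor contributes equally.
y_homog = x + 0.1 * (adj @ x - adj.sum(axis=1, keepdims=True) * x)
```

In this toy case the homogeneous step drags node 0 strongly toward the distant group, while the inhomogeneous step mostly mixes each node with its similar neighbor, which is the behavior the abstract motivates.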

PaperID: 211,

Authors: Liping Deng, Wen-Sheng Chen, Binbin Pan, Mingqing Xiao

Affiliations: Department of Mathematics, University of California at Riverside, Riverside, CA, USA; School of Mathematical Sciences, Shenzhen University, Shenzhen, China; School of Mathematical and Statistical Sciences, Southern Illinois University Carbondale, Carbondale, IL, USA

Title: Hyperparameter Recommendation Integrated With Convolutional Neural Network

Abstract:
Hyperparameter recommendation via meta-learning has shown great promise in various studies. The main challenge for meta-learning is how to develop an effective meta-learner (learning algorithm) that can capture the intrinsic relationship between dataset characteristics and the empirical performance of hyperparameters. Existing meta-learners are mostly based on traditional machine-learning models that only learn data representations with a single layer, which are incapable of learning complex features from the data and often cannot capture those properties deeply embedded in data. To address this issue, in this article, we propose hyperparameter recommendation approaches by integrating the learning model with convolutional neural networks (CNNs). Specifically, we first formulate the recommendation task as a regression problem, where dataset characteristics are treated as predictors and the historical performance of hyperparameters as responses. We establish a CNN-based learning model with feature selection capability to serve as the regressor. We then develop a convolutional denoising autoencoder (ConvDAE) that can leverage the spatial structure of the entire hyperparameter performance space and evaluate the performance of hyperparameters via denoising when the performance of partial hyperparameters is available under the multidimensional framework. To make our approach flexible in applications, we establish a comprehensive two-branch CNN model that can utilize both dataset characteristics and partial evaluations to make effective recommendations. We conduct extensive experiments on 400 real classification problems and the well-known SVM. Our proposed approaches outperform existing meta-learning baselines as well as various search algorithms, demonstrating the high effectiveness of deep learning for hyperparameter recommendation.

PaperID: 212,

Authors: Bin-Bin Jia, Jun-Ying Liu, Min-Ling Zhang

Affiliations: College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou, China

Title: Instance-Specific Loss-Weighted Decoding for Decomposition-Based Multiclass Classification

Abstract:
Multiclass classification problems are often addressed by decomposing them into a set of binary classification tasks. A critical step in this approach is the effective aggregation of predictions from each decomposed binary classifier to yield the final multiclass prediction, a process known as decoding. Existing studies have ignored the varying generalization ability of each binary classifier across different samples during decoding, potentially leading to suboptimal performance. In this article, we propose an instance-specific loss-weighted (ILW) decoding strategy that gauges the generalization ability of each binary classifier for one specific sample based on its neighboring samples. This estimated generalization ability is then used to adjust the importance of the binary classifier in determining the sample’s final prediction. Experimental results validate the effectiveness of the ILW decoding strategy. Furthermore, we demonstrate that softmax regression can be reinterpreted as a one-versus-rest (OvR) decomposition-based multiclass classification algorithm, enabling the application of our decoding strategy to enhance its performance. Comparative studies clearly demonstrate the superiority of the improved softmax regression over its traditional counterpart.
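
The decoding idea can be sketched as follows. This is one possible reading under explicit assumptions, not the authors' algorithm: a one-versus-rest code matrix, the exponential loss, Euclidean k-nearest neighbors, and weights proportional to `exp(-local loss)`. The helper names `local_weights` and `weighted_decode` and all toy numbers are hypothetical.

```python
import numpy as np

def exp_loss(margin):
    """Exponential loss of a signed margin (an assumed choice of loss)."""
    return np.exp(-margin)

def local_weights(x, X_train, y_train, scores_train, code, k=3):
    """Weight each binary classifier by its loss on the k nearest training samples of x."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dist)[:k]
    margins = code[y_train[nn]] * scores_train[nn]   # (k, n_classifiers)
    local_loss = exp_loss(margins).mean(axis=0)
    w = np.exp(-local_loss)                          # low local loss -> high weight
    return w / w.sum()

def weighted_decode(scores_x, code, w):
    """Pick the class whose codeword has the smallest weighted loss."""
    class_loss = (w * exp_loss(code * scores_x)).sum(axis=1)
    return int(np.argmin(class_loss))

# One-versus-rest code matrix for 3 classes: +1 on the diagonal, -1 elsewhere.
code = 2.0 * np.eye(3) - 1.0
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
y_train = np.array([0, 0, 0, 1])
# Binary classifier 2 wrongly fires on the three class-0 samples near the query.
scores_train = np.array([[2.0, -2.0, 2.0]] * 3 + [[-2.0, 2.0, -2.0]])

x = np.array([0.05, 0.05])
scores_x = np.array([0.5, -1.0, 1.5])   # classifier 2 is confidently wrong again

w = local_weights(x, X_train, y_train, scores_train, code)
pred_uniform = weighted_decode(scores_x, code, np.full(3, 1.0 / 3.0))
pred_ilw = weighted_decode(scores_x, code, w)
```

In this toy setup uniform weighting follows the unreliable classifier and predicts class 2, whereas the neighbor-based weights nearly mute that classifier and the decoder returns class 0.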

PaperID: 213,

Authors: Like Xin, Wanqi Yang, Lei Wang, Ming Yang

Title: Unpaired Multiview Clustering via Reliable View Guidance

Abstract:
This article focuses on unpaired multiview clustering (UMC), a challenging problem, where paired observed samples are unavailable across multiple views. The goal is to perform effective joint clustering using the unpaired observed samples in all views. In incomplete multiview clustering (IMC), existing methods typically rely on sample pairing between views to capture their complementarity. However, this is not applicable in the case of UMC. Hence, we aim to extract the consistent cluster structure across views. In UMC, two challenging issues arise: the uncertain cluster structure due to the lack of labels and the uncertain pairing relationship due to the absence of paired samples. We assume that the view with a good cluster structure is the reliable view, which acts as a supervisor to guide the clustering of the other views. With the guidance of reliable views, a more certain cluster structure of these views is obtained while achieving alignment between the reliable views and the other views. Then, we propose reliable view guided UMC with one reliable view (RG-UMC) and reliable view guided UMC with multiple reliable views (RGs-UMC). Specifically, we design alignment modules with one reliable view and multiple reliable views, respectively, to adaptively guide the optimization process. Also, we utilize the compactness module to enhance the relationship of samples within the same cluster. Meanwhile, an orthogonal constraint is applied to the latent representation to obtain discriminative features. Extensive experiments show that both RG-UMC and RGs-UMC outperform the best state-of-the-art method by an average of 24.14% and 29.42% in normalized mutual information (NMI), respectively.

PaperID: 214,

Authors: Zhicheng Cai, Xiaohan Ding, Qiu Shen, Xun Cao

Affiliations: School of Electronic Science and Engineering, Nanjing University, Nanjing, China; Tencent AI Laboratory, Shenzhen, China

Title: RefConv: Reparameterized Refocusing Convolution for Powerful ConvNets

Abstract:
We propose reparameterized refocusing convolution (RefConv) as a replacement for regular convolutional layers, which is a plug-and-play module to improve the performance without any inference costs. Specifically, given a pretrained model, RefConv applies a trainable Refocusing Transformation to the basis kernels inherited from the pretrained model to establish connections among the parameters. For example, a depthwise RefConv can relate the parameters of a specific channel of convolution kernel to the parameters of the other kernel, i.e., make them refocus on the other parts of the model they have never attended to, rather than focus on the input features only. From another perspective, RefConv augments the priors of existing model structures by utilizing the representations encoded in the pretrained parameters as the priors and refocusing on them to learn novel representations, thus further enhancing the representational capacity of the pretrained model. The experimental results validated that RefConv can improve multiple convolutional neural network (CNN)-based models by a clear margin on image classification (up to 1.47% higher top-1 accuracy on ImageNet), object detection, semantic segmentation, and adversarial attacks without introducing any extra inference costs or altering the original model structure. Further studies demonstrated that RefConv can strengthen the spatial skeletons of kernels, reduce the redundancy of channels, and smooth the loss landscape, which explains its effectiveness.

PaperID: 215,

Authors: Liang Zhao, Qiongjie Xie, Zhengtao Li, Songtao Wu, Yi Yang

Affiliations: School of Software Technology, Dalian University of Technology, Dalian, China; School of Reliability and Systems Engineering, Beihang University, Beijing, China

Title: Dynamic Graph Guided Progressive Partial View-Aligned Clustering

Abstract:
In recent years, there has been a growing focus on multiview data, driven by its rich complementary and consistent information, which has the potential to significantly enhance the performance of downstream tasks. Although many multiview clustering (MVC) methods have achieved promising results by integrating the information of multiple views to learn the consistent representation or consistent graph, these methods typically require complete and entirely accurate correspondences between multiview data, which is challenging to fulfill in practice, leading to the problem of partially view-aligned clustering (PVC). To tackle it, we propose a novel method, called dynamic graph guided progressive partial view-aligned clustering (DGPPVC), in this article. To the best of our knowledge, this could be the first work to employ a graph convolutional network (GCN) to address the problem of PVC, which explores GCN with a dynamic adjacency matrix to reduce unreliable alignments and locate the feature representation with consistent graph structure. In particular, DGPPVC develops an end-to-end framework that encompasses graph construction, feature representation learning, and alignment relationship learning, in which the three parts mutually influence and benefit each other. Moreover, DGPPVC adopts a novel alignment learning strategy that progresses from simplicity to complexity, enabling the step-by-step acquisition of unknown correspondences between different modalities. By giving priority to simple instance pairs, a variant of Jaccard similarities is designed to identify more reliable and complex alignments progressively. During the gradual learning process of alignment relationships, the graph structure matrix is continually and dynamically optimized, thus acquiring a greater variety of graph information between different views. Experiments on several real-world datasets show that our method performs promisingly compared with the state-of-the-art methods in partially view-aligned clustering.

PaperID: 216,

Authors: Jiao Li, Liangxiao Jiang, Wenjun Zhang

Affiliations: School of Computer Science, China University of Geosciences, Wuhan, China

Title: Label Consistency-Based Ground Truth Inference for Crowdsourcing

Abstract:
In crowdsourcing scenarios, we can obtain each instance’s multiple noisy labels from different crowd workers and then infer its unknown ground truth via a ground truth inference method. However, to the best of our knowledge, the existing ground truth inference methods always attempt to aggregate multiple noisy labels into a single consensus label as the ground truth. In this article, we aim to explore a new strategy, i.e., label selection, which directly selects the label of the highest quality worker as the ground truth. To this end, we propose a label consistency-based ground truth inference (LCGTI) method. In LCGTI, we argue that high-quality workers should have a low bias with other workers in labeling the same instances and a low variance with themselves in labeling similar instances. To estimate the bias, we calculate the label consistency of different workers on the same instances. To estimate the variance, we calculate the label consistency of the same worker on similar instances. Finally, we combine these two components to calculate the labeling quality of each worker on the inferred instance and perform label selection instead of label aggregation to achieve inference. The experimental results on 34 simulated and two real-world datasets show that LCGTI significantly outperforms all the other state-of-the-art label aggregation-based ground truth inference methods.

PaperID: 217,

Authors: Yanyan Shao, Dongyan Guo, Ying Cui, Zhenhua Wang, Liyan Zhang, Jianhua Zhang

Affiliations: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China; College of Information Engineering, Northwest A&F University, Xianyang, China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China

Title: Graph Attention Network for Context-Aware Visual Tracking

Abstract:
Siamese-network-based trackers convert general object tracking into a similarity matching task between a template and a search region. Using convolutional feature cross correlation (Xcorr) for similarity matching, a large number of Siamese trackers have been proposed and have achieved great success. However, due to the predefined size of the target feature, these trackers suffer from either retaining much background information or losing important foreground information. Moreover, the global matching between the target and search region also largely neglects the part-level structural information and the contextual information of the target. To tackle the aforementioned obstacles, in this article, we propose a simple context-aware Siamese graph attention network, which establishes part-to-part correspondence between the Siamese branches with a complete bipartite graph. The object information from the template is propagated to the search region via a graph attention mechanism. With such a design, a target-aware template input is enabled to replace the prefixed template region, which can adaptively fit the size and aspect ratio variations in different objects. Based on it, we further construct a context-aware feature matching mechanism to embed both the target and the contextual information in the search region. Experiments on challenging benchmarks including GOT-10k, TrackingNet, LaSOT, VOT2020, and OTB-100 demonstrate that the proposed SiamGAT outperforms many state-of-the-art trackers and achieves leading performance. Code is available at: https://git.io/SiamGAT.

PaperID: 218,

Authors: Xiaolong Fan, Maoguo Gong, Yue Wu, Mingyang Zhang, Hao Li, Xiangming Jiang

Affiliations: School of Electronic Engineering, Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China; School of Computer Science and Technology, Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China

Title: CCGIB: A Cross-Channel Graph Information Bottleneck Principle

Abstract:
The empirical studies of most existing graph neural networks (GNNs) broadly take the original node feature and adjacency relationship as single-channel input, ignoring the rich information of multiple graph channels. To circumvent this issue, the multichannel graph analysis framework has been developed to fuse graph information across channels. How to model and integrate shared (i.e., consistency) and channel-specific (i.e., complementarity) information is a key issue in multichannel graph analysis. In this article, we propose a cross-channel graph information bottleneck (CCGIB) principle to maximize the agreement for common representations and the disagreement for channel-specific representations. Under this principle, we formulate the consistency and complementarity information bottleneck (IB) objectives. To enable optimization, a viable approach involves deriving a variational lower bound and a variational upper bound (VarUB) of the mutual information terms and then optimizing these variational bounds to find approximate solutions. However, obtaining the lower bounds of cross-channel mutual information objectives proves challenging through direct utilization of variational approximation, primarily due to the independence of the distributions. To address this challenge, we leverage the inherent property of joint distributions and subsequently derive variational bounds to effectively optimize these information objectives. Extensive experiments on graph benchmark datasets demonstrate the superior effectiveness of the proposed method.

PaperID: 219,

Authors: Fusheng Hao, Liu Liu, Fuxiang Wu, Qieshi Zhang, Jun Cheng

Affiliations: Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; School of Artificial Intelligence and the State Key Laboratory of Software Development Environment, Beihang University, Beijing, China

Title: Class-Irrelevant Feature Removal for Few-Shot Image Classification

Abstract:
Most existing few-shot image classification methods employ global pooling to aggregate class-relevant local features in a data-driven manner. Due to the difficulty and inaccuracy in locating class-relevant regions in complex scenarios, as well as the large semantic diversity of local features, the class-irrelevant information could reduce the robustness of the representations obtained by performing global pooling. Meanwhile, the scarcity of labeled images exacerbates the difficulties of data-hungry deep models in identifying class-relevant regions. These issues severely limit deep models’ few-shot learning ability. In this work, we propose to remove the class-irrelevant information by making local features class relevant, thus bypassing the big challenge of identifying which local features are class irrelevant. The resulting class-irrelevant feature removal (CIFR) method consists of three phases. First, we employ the masked image modeling strategy to build an understanding of images’ internal structures that generalizes well. Second, we design a semantic-complementary feature propagation module to make local features class relevant. Third, we introduce a weighted dense-connected similarity measure, based on which a loss function is formulated to fine-tune the entire pipeline, with the aim of further enhancing the semantic consistency of the class-relevant local features. Visualization results show that CIFR achieves the removal of class-irrelevant information by making local features related to classes. Comparison results on four benchmark datasets indicate that CIFR yields very promising performance.

PaperID: 220,

Authors: Guangzhi Ma, Jie Lu, Zhen Fang, Feng Liu, Guangquan Zhang

Affiliations: Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia

Title: Multiview Classification Through Learning From Interval-Valued Data

Abstract:
The classification problem concerning crisp-valued data has been well resolved. However, interval-valued data, where all of the observations’ features are described by intervals, are also a common data type in real-world scenarios. For example, the data extracted by many measuring devices are not exact numbers but intervals. In this article, we focus on a highly challenging problem called learning from interval-valued data (LIND), where we aim to learn a classifier with high performance on interval-valued observations. First, we obtain the estimation error bound of the LIND problem based on the Rademacher complexity. Then, we give a theoretical analysis to show the strengths of multiview learning on classification problems, which inspires us to construct a new algorithm, called the multiview interval information extraction (Mv-IIE) approach, for improving classification accuracy on interval-valued data. Experimental comparisons with several baselines on both synthetic and real-world datasets illustrate the superiority of the proposed framework in handling interval-valued data. Moreover, we describe an application of Mv-IIE in which data privacy leakage is prevented by transforming crisp-valued (raw) data into interval-valued data.

PaperID: 221,

Authors: Taiheng Liu, Guang-Zhong Cao, Zhaoshui He, Shengli Xie, Xiuqin Deng

Affiliations: Guangdong Key Laboratory of Electromagnetic Control and Intelligent Robots, College of Mechatronics and Control Engineering, the College of Physics and Optoelectronic Engineering, and the National Key Laboratory of Green and Long-Life Engineering in Extreme Environment (Shenzhen), Shenzhen University, Shenzhen, China; Guangdong Key Laboratory of Electromagnetic Control and Intelligent Robots, College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen, China; School of Automation, Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, and the Key Laboratory for IoT Intelligent Information Processing and System Integration of Ministry of Education, Guangdong University of Technology, Guangzhou, China; School of Applied Mathematics, Guangdong University of Technology, Guangzhou, China

Title: Multimodal Fusion Network for 3-D Lane Detection

Abstract:
3-D lane detection is a challenging task due to the diversity of lanes, occlusion, dazzle light, and so on. Traditional methods usually use highly specialized handcrafted features and carefully designed postprocessing to detect them. However, these methods are based on strong assumptions and a single modality, so they are not easily scalable and perform poorly. In this article, a multimodal fusion network (MFNet) is proposed that uses multihead nonlocal attention and a feature pyramid for 3-D lane detection. It includes three parts: a multihead deformable transformation (MDT) module, a multidirectional attention feature pyramid fusion (MA-FPF) module, and a top-view lane prediction (TLP) module. First, MDT is presented to learn and mine multimodal features from RGB images, depth maps, and point cloud data (PCD) for achieving optimal lane feature extraction. Then, MA-FPF is designed to fuse multiscale features to prevent lane features from vanishing as the network deepens. Finally, TLP is developed to estimate 3-D lanes and predict their position. Experimental results on the 3-D lane synthetic and ONCE-3DLanes datasets demonstrate that the proposed MFNet outperforms the state-of-the-art methods in qualitative and quantitative analyses as well as visual comparisons.

PaperID: 222,

Authors: Yingsong Cheng, Xinya Wang, Yong Ma, Xiaoguang Mei, Minghui Wu, Jiayi Ma

Affiliations: Electronic Information School, Wuhan University, Wuhan, China

Title: General Hyperspectral Image Super-Resolution via Meta-Transfer Learning

Abstract:
Recent advances in deep learning-based methods have led to significant progress in the hyperspectral super-resolution (SR). However, the scarcity and the high dimension of data have hindered further development since deep models require sufficient data to learn stable patterns. Moreover, the huge domain differences between hyperspectral image (HSI) datasets pose a significant challenge in generalizability. To address these problems, we present a general hyperspectral SR framework via meta-transfer learning (MTL). We randomly sample various spectral ranges for SR tasks during MTL, allowing the model to accumulate diverse task experiences. Additionally, we implement a task schedule to gradually expand the number of bands, bridging the significant domain differences between datasets. By leveraging multiple datasets, we are able to achieve better performance and greater generalizability, making it applicable under various circumstances. Meanwhile, as a general framework, our scheme can be applied to existing methods to obtain performance improvements. In addition, we design an advanced network architecture based on the multifusion features to further improve the performance. Experiments demonstrate that our method not only achieves superior performance in both qualitative and quantitative terms but also can adapt robustly to a new and difficult sample, where few epochs can yield quite considerable results.

PaperID: 223,

Authors: Bin Gu, Hilal AlQuabeh, William de Vazelhes, Zhouyuan Huo, Heng Huang

Affiliations: School of Artificial Intelligence, Jilin University, Changchun, China; Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA; Department of Computer Science, University of Maryland, College Park, MD, USA

Title: Stagewise Training With Exponentially Growing Training Sets

Abstract:
In the world of big data, training large-scale machine learning problems has gained considerable attention. Numerous innovative optimization strategies have been presented in recent years to accelerate the large-scale training process. However, the possibility of further accelerating the training process of various optimization algorithms remains an unresolved subject. To begin addressing this difficult problem, we exploit the research finding that when training data are independent and identically distributed, the learning problem on a smaller dataset is not significantly different from the original one. Building on that, we propose a stagewise training technique that grows the size of the training set exponentially while solving nonsmooth subproblems. We demonstrate that our stagewise training via exponentially growing the size of the training sets (STEGS) is compatible with a large number of proximal gradient descent and gradient hard thresholding (GHT) techniques. Interestingly, we demonstrate that STEGS can greatly reduce overall complexity while maintaining statistical accuracy or even surpassing the intrinsic error introduced by GHT approaches. In addition, we analyze the effect of the training data growth rate on the overall complexity. The practical results of applying $\ell_{2,1}$- and $\ell_0$-norms to a variety of large-scale real-world datasets not only corroborate our theories but also demonstrate the benefits of our STEGS framework.
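
The stagewise schedule can be illustrated with a short sketch. It assumes an l1-regularized least-squares problem solved by plain proximal gradient descent (soft thresholding), which is only one member of the family the abstract mentions, and it assumes the rows are already in random (i.i.d.) order so that a prefix is a fair subsample. The names `stagewise_train`, `n0`, `growth`, and `iters_per_stage` are hypothetical, and the stopping rule per stage is a fixed iteration count rather than whatever criterion the paper analyzes.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gd(X, y, w, lam, iters):
    """Proximal gradient descent on (1/2n)||Xw - y||^2 + lam * ||w||_1."""
    n = X.shape[0]
    step = n / (np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the smooth part
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

def stagewise_train(X, y, lam=0.05, n0=32, growth=2, iters_per_stage=20):
    """Solve on a small prefix first, then warm-start on exponentially larger prefixes."""
    n, d = X.shape
    w = np.zeros(d)
    m = min(n0, n)
    sizes = []
    while True:
        w = prox_gd(X[:m], y[:m], w, lam, iters_per_stage)
        sizes.append(m)
        if m == n:
            break
        m = min(growth * m, n)
    return w, sizes

# Synthetic sparse regression problem (toy data, not from the paper).
rng = np.random.default_rng(0)
n, d = 1024, 20
w_true = np.zeros(d)
w_true[:3] = [2.0, -3.0, 1.5]
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)
w_hat, sizes = stagewise_train(X, y)
```

Most iterations run on small subsets, where each gradient is cheap, and only the last stage touches the full dataset, starting from a solution that is already close.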

PaperID: 224,

Authors: Lei Yuan, Lihe Li, Ziqian Zhang, Fuxiang Zhang, Cong Guan, Yang Yu

Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Title: Multiagent Continual Coordination via Progressive Task Contextualization

Abstract:
Cooperative multiagent reinforcement learning (MARL) has attracted significant attention and has the potential for many real-world applications. Prior work mainly focuses on facilitating the coordination ability from different aspects (e.g., nonstationarity and credit assignment) in single-task or multitask scenarios, ignoring the stream of tasks that appear in a continual manner. This omission leaves continual coordination an unexplored territory, with neither a problem formulation nor efficient algorithms designed for it. Toward tackling this issue, this article proposes an approach, multiagent continual coordination via progressive task contextualization (MACPro). The key point lies in obtaining a factorized policy, using shared feature extraction layers but separated independent task heads, each specializing in a specific class of tasks. The task heads can be progressively expanded based on the learned task contextualization. Moreover, to cater to the popular centralized training with decentralized execution (CTDE) paradigm in MARL, each agent learns to predict and adopt the most relevant policy head based on local information in a decentralized manner. We show in multiple multiagent benchmarks that existing continual learning methods fail, while MACPro is able to achieve close-to-optimal performance. Further results also demonstrate the effectiveness of MACPro from multiple aspects, such as high generalization ability.

PaperID: 225,

Authors: Xiangkun He, Jianye Hao, Xu Chen, Jun Wang, Xuewu Ji, Chen Lv

Affiliations: School of Mechanical and Aerospace Engineering, Nanyang Technological University, Jurong West, Singapore; College of Intelligence and Computing, Tianjin University, Tianjin, China; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Department of Computer Science, University College London, London, U.K.; State Key Laboratory of Automotive Safety and Energy, Tsinghua University, Beijing, China

Title: Robust Multiobjective Reinforcement Learning Considering Environmental Uncertainties

Abstract:
Numerous real-world decision or control problems involve multiple conflicting objectives whose relative importance (preference) is required to be weighed in different scenarios. While Pareto optimality is desired, environmental uncertainties (e.g., environmental changes or observational noises) may mislead the agent into performing suboptimal policies. In this article, we present a novel multiobjective optimization paradigm, robust multiobjective reinforcement learning (RMORL) considering environmental uncertainties, to train a single model that can approximate robust Pareto-optimal policies across the entire preference space. To enhance policy robustness against environmental changes, an environmental disturbance is modeled as an adversarial agent across the entire preference space via incorporating a zero-sum game into a multiobjective Markov decision process (MOMDP). Additionally, we devise an adversarial defense technique against observational perturbations, which ensures that policy variations, perturbed by adversarial attacks on state observations, remain within bounds under any specified preferences. The proposed technique is assessed in five multiobjective environments with continuous action spaces, showcasing its effectiveness through comparisons with competitive baselines, which encompass classical and state-of-the-art schemes.

PaperID: 226,

Authors: Jia-Qi Lin, Man-Sheng Chen, Xi-Ran Zhu, Chang-Dong Wang, Haizhang Zhang

Affiliations: School of Mathematics (Zhuhai), Sun Yat-sen University, Zhuhai, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China

Title: Dual Information Enhanced Multiview Attributed Graph Clustering

Abstract:
Multiview attributed graph clustering is an important approach to partition multiview data based on the attribute characteristics and adjacency matrices from different views. Some attempts have been made using graph neural networks (GNNs), which have achieved promising clustering performance. Despite this, few of them pay attention to the inherent specific information embedded in multiple views. Meanwhile, they are incapable of recovering the latent high-level representation from the low-level ones, greatly limiting the downstream clustering performance. To fill these gaps, a novel dual information enhanced multiview attributed graph clustering (DIAGC) method is proposed in this article. Specifically, the proposed method introduces the specific information reconstruction (SIR) module to disentangle the explorations of the consensus and specific information from multiple views, which enables the graph convolutional network (GCN) to capture the more essential low-level representations. Besides, the contrastive learning (CL) module maximizes the agreement between the latent high-level representation and low-level ones and enables the high-level representation to satisfy the desired clustering structure with the help of the self-supervised clustering (SC) module. Extensive experiments on several real-world benchmarks demonstrate the effectiveness of the proposed DIAGC method compared with the state-of-the-art baselines.

PaperID: 227,

Authors: Hongmin Liu, Yuefeng Cai, Bin Fan, Jinglin Xu

Affiliations: School of Intelligence Science and Technology, Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing, China

Title: ScalableTrack: Scalable One-Stream Tracking via Alternating Learning

Abstract:
Transformer-based one-stream trackers are widely used to extract features and perform information interaction for visual object tracking. However, current one-stream trackers have fixed computational dimensions between different stages, which limits the network’s ability to learn context clues and global representations, resulting in a decrease in the ability to distinguish between targets and backgrounds. To address this issue, a new scalable one-stream tracking framework, ScalableTrack, is proposed. It unifies feature extraction and information integration by intrastage mutual guidance, leveraging the scalability of target-oriented features to enhance object sensitivity and obtain discriminative global representations. In addition, we bridge interstage contextual cues by introducing an alternating learning strategy and solve the arrangement problem of the two modules. The alternating learning strategy uses alternate stacks of feature extraction and information interaction to focus on tracked objects and prevent catastrophic forgetting of target information between different stages. Experiments on eight challenging benchmarks (TrackingNet, GOT-10k, VOT2020, UAV123, LaSOT, LaSOT_ext, OTB100, and TC128) show that ScalableTrack outperforms state-of-the-art (SOTA) methods with better generalization and global representation ability.

PaperID: 228,

Authors: Mingcai Chen, Yu Zhao, Bing He, Zongbo Han, Junzhou Huang, Bingzhe Wu, Jianhua Yao

Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; Tencent AI Lab, Shenzhen, China; Computer Science and Engineering Department, University of Texas at Arlington, Arlington, TX, USA

Title: Learning With Noisy Labels Over Imbalanced Subpopulations

Abstract:
Learning with noisy labels (LNL) has attracted significant attention from the research community. Many recent LNL methods rely on the assumption that clean samples tend to have a “small loss.” However, this assumption often fails to generalize to some real-world cases with imbalanced subpopulations, that is, training subpopulations that vary in sample size or recognition difficulty. Therefore, recent LNL methods face the risk of misclassifying those “informative” samples (e.g., hard samples or samples in the tail subpopulations) as noisy samples, leading to poor generalization performance. To address this issue, we propose a novel LNL method to deal with noisy labels and imbalanced subpopulations simultaneously. It first leverages sample correlation to estimate samples’ clean probabilities for label correction and then utilizes corrected labels for distributionally robust optimization (DRO) to further improve the robustness. Specifically, in contrast to previous works using classification loss as the selection criterion, we introduce a feature-based metric that takes the sample correlation into account for estimating samples’ clean probabilities. Then, we refurbish the noisy labels using the estimated clean probabilities and the pseudo-labels from the model’s predictions. With refurbished labels, we use DRO to train the model to be robust to subpopulation imbalance. Extensive experiments on a wide range of benchmarks demonstrate that our technique can consistently improve state-of-the-art (SOTA) robust learning paradigms against noisy labels, especially when encountering imbalanced subpopulations. We provide our code at https://github.com/chenmc1996/LNL-IS.

PaperID: 229,

Authors: Qi Liu, Yanjie Li, Xiongtao Shi, Ke Lin, Yuecheng Liu, Yunjiang Lou

Affiliations: Guangdong Key Laboratory of Intelligent Morphing Mechanisms and Adaptive Robotics and the School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen, China; Huawei Noah’s Ark Lab, Shenzhen, China

Title: Distributional Policy Gradient With Distributional Value Function

Abstract:
In this article, we propose a distributional policy-gradient method based on distributional reinforcement learning (RL) and policy gradient. Conventional RL algorithms typically estimate the expectation of return, given a state–action pair. In contrast, distributional RL algorithms consider the return as a random variable and estimate the return distribution, which can characterize the probability of different returns resulting from environmental uncertainties. Thus, the return distribution provides more valuable information than its expectation, leading to superior policies in general. Although distributional RL has been investigated widely in value-based RL methods, very few policy-gradient methods take advantage of distributional RL. To bridge this research gap, we propose a distributional policy-gradient method by introducing a distributional value function to the policy gradient (DVDPG). We estimate the distribution of policy gradient instead of the expectation estimated in conventional policy-gradient RL methods. Furthermore, we propose two policy-gradient value sampling mechanisms to do policy improvement. First, we propose a distribution-probability-sampling method that samples the policy-gradient value according to the quantile probability of return distribution. Second, a uniform sample mechanism is proposed. With our sample mechanisms, the proposed distributional policy-gradient method enhances the stochasticity of the policy gradient, improving the exploration efficiency and helping to avoid falling into local optimal solutions. In sparse-reward tasks, the distribution-probability-sampling method outperforms the uniform sample mechanism. In dense-reward tasks, the two sample mechanisms perform similarly. Moreover, we show that the conventional policy-gradient method is a special case of the proposed method. Experimental results on various sparse-reward and dense-reward OpenAI-gym tasks illustrate the efficiency of the proposed method, which outperforms baselines in almost all environments.

PaperID: 230,

Authors: Yuexuan An, Hui Xue, Xingyu Zhao, Ning Xu, Pengfei Fang, Xin Geng

Affiliations: School of Computer Science and Engineering, Southeast University, Nanjing, China

Title: Leveraging Bilateral Correlations for Multi-Label Few-Shot Learning

Abstract:
Multi-label few-shot learning (ML-FSL) refers to the task of tagging previously unseen images with a set of relevant labels, given a small number of training examples. Modeling the correlations between instances and labels, formulated in the existing methods, allows us to extract more available knowledge from limited examples. However, they simply explore the instance and label correlations with a uniform importance assumption without considering the discrepancy of importance in different instances or labels, making the utilization of instance and label correlations a bottleneck for ML-FSL. To tackle the issue, we propose a unified framework named bilateral correlation reconstruction (BCR) to enable the network to effectively mine underlying instance and label correlations with varying importance information from both instance-to-label and label-to-instance perspectives. Specifically, from the instance-to-label perspective, we refine prototypes per category by reweighting each image with its specific instance-importance degree extracted from the similarity between the instance and the corresponding category. From the label-to-instance perspective, we smooth labels for each image by recovering latent label-importance by considering the integrated topology of all samples in a task. Experimental results on multiple benchmarks validate that BCR could outperform existing ML-FSL methods by large margins.
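As a rough illustration of the instance-to-label idea (reweighting the images of a category by how similar each one is to that category before forming the prototype), the sketch below may help. It is not the BCR formulation: the softmax weighting over cosine similarities, the `temperature` parameter, and all function names are assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Plain (uniform-importance) prototype: the coordinate-wise mean."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def reweighted_prototype(embeddings, temperature=1.0):
    """Refine a category prototype with per-instance importance weights.

    Assumption: importance is a softmax over each instance's cosine
    similarity to the uniform mean prototype of the category.
    """
    base = mean_vector(embeddings)
    sims = [cosine(e, base) / temperature for e in embeddings]
    top = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    weights = [x / total for x in exps]
    prototype = [
        sum(w * e[d] for w, e in zip(weights, embeddings))
        for d in range(len(base))
    ]
    return prototype, weights


# Two similar support embeddings and one outlier for the same category.
prototype, weights = reweighted_prototype([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
```

In this toy case the outlier receives the smallest weight, so the refined prototype moves toward the two mutually consistent instances compared with the uniform mean.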

PaperID: 231,

Authors: Meijun Fu, Xiaomin Wang, Jun Wang, Zhang Yi

Affiliations: School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China; School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China; School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, China; Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China

Title: Prototype Bayesian Meta-Learning for Few-Shot Image Classification

Abstract:
Meta-learning aims to leverage prior knowledge from related tasks to enable a base learner to quickly adapt to new tasks with limited labeled samples. However, traditional meta-learning methods have limitations as they provide an optimal initialization for all new tasks, disregarding the inherent uncertainty induced by few-shot tasks and impeding task-specific self-adaptation initialization. In response to this challenge, this article proposes a novel probabilistic meta-learning approach called prototype Bayesian meta-learning (PBML). PBML focuses on meta-learning variational posteriors within a Bayesian framework, guided by prototype-conditioned prior information. Specifically, to capture model uncertainty, PBML treats both meta- and task-specific parameters as random variables and integrates their posterior estimates into hierarchical Bayesian modeling through variational inference (VI). During model inference, PBML employs Laplacian estimation to approximate the integral term over the likelihood loss, deriving a rigorous upper bound for generalization errors. To enhance the model’s expressiveness and enable task-specific adaptive initialization, PBML proposes a data-driven approach to model the task-specific variational posteriors. This is achieved by designing a generative model structure that incorporates prototype-conditioned task-dependent priors into the random generation of task-specific variational posteriors. Additionally, by performing latent embedding optimization, PBML decouples the gradient-based meta-learning from the high-dimensional variational parameter space. Experimental results on benchmark datasets for few-shot image classification illustrate that PBML attains state-of-the-art or competitive performance when compared to other related works. Versatility studies demonstrate the adaptability and applicability of PBML in addressing diverse and challenging few-shot tasks. Furthermore, ablation studies validate the performance gains attributed to the inference and model components.

PaperID: 232,

Authors: Arash Vahabpour, Tianyi Wang, Qiujing Lu, Omead Pooladzandi, Vwani Roychowdhury

Affiliations: Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, CA, USA

Title: Diverse Imitation Learning via Self-Organizing Generative Models

Abstract:
Imitation learning (IL) is a well-known problem in the field of Markov decision process (MDP), where one is given multiple demonstration trajectories generated by expert(s), and the goal is to replicate the hidden expert-policies so that when the MDP is run independently, it generates trajectories close to the demonstrated ones. IL is one of the most useful tools used in building versatile robots that can learn from examples. This task becomes particularly challenging when the expert exhibits a mixture of behavior modes. Prior work has introduced latent variables to model variations of the expert policy. However, our experiments show that the existing works do not exhibit appropriate imitation of individual modes. To tackle this problem, we first draw inspiration from the well-known classical technique of self-organizing maps (SOMs) and introduce an encoder-free generative model—referred to as the self-organizing generative (SOG) model—for learning multimodal data distributions from samples. We then apply SOG for behavior cloning (BC)—a framework that learns deterministic policies without considering the environment—to accurately distinguish and imitate different modes. Then, we integrate it with generative adversarial IL (GAIL)—a framework that learns policies while considering the environment—to make the learning robust toward compounding errors at unseen states. We show that our method significantly outperforms the state of the art across multiple experiments within the MuJoCo simulator, including locomotion and robotic manipulation tasks.

PaperID: 233,

Authors: Changhao Chen, Stefano Rosa, Chris Xiaoxuan Lu, Bing Wang, Niki Trigoni, Andrew Markham

Affiliations: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China; Istituto Italiano di Tecnologia (IIT), Genoa, Italy; School of Informatics, The University of Edinburgh, Edinburgh, U.K.; Department of Computer Science, University of Oxford, Oxford, U.K.

Title: Learning Selective Sensor Fusion for State Estimation

Abstract:
Autonomous vehicles and mobile robotic systems are typically equipped with multiple sensors to provide redundancy. By integrating the observations from different sensors, these mobile agents are able to perceive the environment and estimate system states, e.g., locations and orientations. Although deep learning (DL) approaches for multimodal odometry estimation and localization have gained traction, they rarely focus on the issue of robust sensor fusion, a necessary consideration to deal with noisy or incomplete sensor observations in the real world. Moreover, current deep odometry models suffer from a lack of interpretability. To this end, we propose SelectFusion, an end-to-end selective sensor fusion module that can be applied to useful pairs of sensor modalities, such as monocular images and inertial measurements, depth images, and light detection and ranging (LIDAR) point clouds. Our model is a uniform framework that is not restricted to a specific modality or task. During prediction, the network is able to assess the reliability of the latent features from different sensor modalities and to estimate trajectory at both scale and global pose. In particular, we propose two fusion modules, a deterministic soft fusion and a stochastic hard fusion, and offer a comprehensive study of the new strategies compared with trivial direct fusion. We extensively evaluate all fusion strategies both on public datasets and on progressively degraded datasets that present synthetic occlusions, noisy and missing data, and time misalignment between sensors, and we investigate the effectiveness of the different fusion strategies in attending to the most reliable features, which in itself provides insights into the operation of the various models.
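To make the notion of a deterministic soft fusion more concrete, here is a minimal sketch in which a sigmoid mask, computed from the concatenated features of two modalities, rescales each feature before it is passed on. This is only one plausible reading of the abstract: the linear mask layer, the `weight` and `bias` parameters, and the function names are assumptions for illustration and are not taken from SelectFusion.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_fusion(feat_a, feat_b, weight, bias):
    """Reweight concatenated features with a deterministic mask in (0, 1).

    feat_a, feat_b: latent feature vectors from two sensor modalities.
    weight, bias:   hypothetical parameters of a linear layer that maps the
                    concatenated features to one mask logit per feature.
    """
    concat = list(feat_a) + list(feat_b)
    if len(weight) != len(concat) or len(bias) != len(concat):
        raise ValueError("weight and bias need one row/entry per fused feature")
    mask = []
    for row, b in zip(weight, bias):
        logit = sum(w * x for w, x in zip(row, concat)) + b
        mask.append(sigmoid(logit))
    # Features judged unreliable get a mask near 0 and are suppressed.
    fused = [m * x for m, x in zip(mask, concat)]
    return fused, mask


# With zero weights and a strongly negative bias on the second modality,
# its features are almost entirely masked out.
zero_weights = [[0.0] * 4 for _ in range(4)]
fused, mask = soft_fusion([1.0, 2.0], [3.0, 4.0], zero_weights, [0.0, 0.0, -50.0, -50.0])
```

A stochastic hard fusion would instead sample a binary keep-or-drop decision per feature; that variant is not sketched here.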

PaperID: 234,

Authors: Andong Lu, Cun Qian, Chenglong Li, Jin Tang, Liang Wang

Affiliations: Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and the Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China; Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and the Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Artificial Intelligence, Anhui University, Hefei, China; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Duality-Gated Mutual Condition Network for RGBT Tracking

Abstract:
Low-quality modalities contain not only a lot of noisy information but also some discriminative features in RGB-Thermal (RGBT) tracking. However, the potential of low-quality modalities is not well explored in existing RGBT tracking algorithms. In this work, we propose a novel duality-gated mutual condition network to fully exploit the discriminative information of all modalities while suppressing the effects of data noise. Specifically, we design a mutual condition module, which takes the discriminative information of a modality as the condition to guide feature learning of target appearance in another modality. Such a module can effectively enhance target representations of all modalities even in the presence of low-quality modalities. To improve the quality of conditions and further reduce data noise, we propose a duality-gated mechanism and integrate it into the mutual condition module. To deal with the tracking failure caused by sudden camera motion, which often occurs in RGBT tracking, we design a resampling strategy based on optical flow. It does not increase much computational cost since we perform optical flow calculation only when the model prediction is unreliable and then execute resampling when the sudden camera motion is detected. Extensive experiments on four RGBT tracking benchmark datasets show that our method performs favorably against the state-of-the-art tracking algorithms.

PaperID: 235,

Authors: Ying Zhao, Shuang Li, Rui Zhang, Chi Harold Liu, Weipeng Cao, Xizhao Wang, Song Tian

Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; Big Data Institute, Shenzhen University, Shenzhen, China; TravelSky Technology Limited, Beijing, China

Title: Semantic Correlation Transfer for Heterogeneous Domain Adaptation

Abstract:
Heterogeneous domain adaptation (HDA) is expected to achieve effective knowledge transfer from a label-rich source domain to a heterogeneous target domain with scarce labeled data. Most prior HDA methods strive to align the cross-domain feature distributions by learning domain invariant representations without considering the intrinsic semantic correlations among categories, which inevitably results in suboptimal adaptation performance across domains. Therefore, to address this issue, we propose a novel semantic correlation transfer (SCT) method for HDA, which not only matches the marginal and conditional distributions between domains to mitigate the large domain discrepancy, but also transfers the category correlation knowledge underlying the source domain to the target by maximizing the pairwise class similarity across source and target. Technically, the domainwise and classwise centroids (prototypes) are first computed and aligned according to the feature embeddings. Then, based on the derived classwise prototypes, we leverage the cosine similarity of each pair of classes in both domains to transfer the supervised source semantic correlation knowledge among different categories to the target effectively. As a result, the feature transferability and category discriminability can be simultaneously improved during the adaptation process. Comprehensive experiments and ablation studies on standard HDA tasks, such as text-to-image, image-to-image, and text-to-text, have demonstrated the superiority of our proposed SCT against several state-of-the-art HDA methods.
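The prototype and class-similarity steps described above can be sketched roughly as follows. Classwise prototypes are computed as feature means, a class-by-class cosine-similarity matrix is built per domain, and the two matrices are compared. The mean-squared gap used for the comparison, and all function names, are assumptions for illustration rather than the SCT objective.

```python
import math


def class_prototypes(features, labels):
    """Classwise centroids: the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def class_similarity(prototypes):
    """Pairwise cosine similarity between class prototypes (C x C matrix)."""
    classes = sorted(prototypes)
    return [[cosine(prototypes[a], prototypes[b]) for b in classes] for a in classes]


def correlation_gap(sim_source, sim_target):
    """Assumed discrepancy: mean squared difference of the two matrices."""
    n = len(sim_source)
    total = sum(
        (sim_source[i][j] - sim_target[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


# Source features are 2-D and target features are 3-D, as in a heterogeneous
# setting; the class-similarity matrices are both 2 x 2 and can be compared.
source = class_similarity(class_prototypes([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0, 0, 1]))
target = class_similarity(class_prototypes([[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]], [0, 1]))
gap = correlation_gap(source, target)
```

One convenient property of this construction is that the similarity matrices have one row and column per class, so they remain comparable even when the two domains use feature spaces of different dimensionality.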

PaperID: 236,

Authors: Hongxiang Jiang, Xiaoyan Luo, Jihao Yin, Huazhu Fu, Fuxiang Wang

Affiliations: School of Aerospace, Beihang University, Beijing, China; Institute of High Performance Computing, Agency for Science, Technology and Research, Fusionopolis, Singapore; School of Electronics and Information Engineering, Beihang University, Beijing, China

Title: Orthogonal Subspace Representation for Generative Adversarial Networks

Abstract:
Disentanglement learning aims to separate explanatory factors of variation so that different attributes of the data can be well characterized and isolated, which promotes efficient inference for downstream tasks. Mainstream disentanglement approaches based on generative adversarial networks (GANs) learn interpretable data representation. However, most typical GAN-based works lack discussion of the latent subspace, causing insufficient consideration of the variation of independent factors. Although some recent research analyzes the latent space on pretrained GANs for image editing, it does not emphasize learning representation directly from the subspace perspective. Appropriate subspace properties could facilitate corresponding feature representation learning to satisfy the independent variation requirements of the obtained explanatory factors, which is crucial for better disentanglement. In this work, we propose a unified framework for ensuring disentanglement, which fully investigates latent subspace learning (SL) in GAN. The novel GAN-based architecture explores orthogonal subspace representation (OSR) on vanilla GAN, named OSRGAN. To guide a subspace with strong correlation, less redundancy, and robust distinguishability, our OSR includes three stages: self-latent-aware, orthogonal subspace-aware, and structure representation-aware. First, the self-latent-aware stage promotes a latent subspace that is strongly correlated with the data space to discover interpretable factors, but with poor independence of variation. Second, the following orthogonal subspace-aware stage adaptively learns some 1-D linear subspaces spanned by a set of orthogonal bases in the latent space. There is less redundancy between them, expressing the corresponding independence. Third, the structure representation-aware stage aligns the projection on the orthogonal subspace and the latent variables. Accordingly, feature representation in each linear subspace can be distinguishable, enhancing the independent expression of interpretable factors. In addition, we design an alternating optimization step, achieving a tradeoff training of OSRGAN on different properties. Although it strictly constrains orthogonality, the loss weight coefficient of distinguishability induced by orthogonality could be adjusted and balanced with correlation constraint. To elucidate, this tradeoff training prevents our OSRGAN from overemphasizing any property and damaging the expressiveness of the feature representation. It takes into account both interpretable factors and their independent variation characteristics. Meanwhile, alternating optimization could keep the cost and efficiency of forward inference unchanged and will not burden the computational complexity. In theory, we clarify the significance of OSR, which brings better independence of factors, along with interpretability as correlation could converge to a high range faster. Moreover, through the convergence behavior analysis, including the objective functions under different constraints and the evaluation curve with iterations, our model demonstrates enhanced stability and definitely converges toward a higher peak for disentanglement. To depict the performance in downstream tasks, we compare against the state-of-the-art GAN-based and even VAE-based approaches on different datasets. Our OSRGAN achieves higher disentanglement scores on FactorVAE, SAP, MIG, and VP metrics. All the experimental results illustrate that our novel GAN-based framework has considerable advantages on disentanglement.

PaperID: 237,

Authors: Ruoyu Zhao, Mingrui Zhu, Nannan Wang, Xinbo Gao

Affiliations: State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University, Xi’an, Shaanxi, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: Few-Shot Face Stylization via GAN Prior Distillation

Abstract:
Face stylization has made notable progress in recent years. However, when training on limited data, the performance of existing approaches significantly declines. Although some studies have attempted to tackle this problem, they either failed to achieve the few-shot setting (fewer than 10) or can only get suboptimal results. In this article, we propose GAN Prior Distillation (GPD) to enable effective few-shot face stylization. GPD contains two models: a teacher network with GAN Prior and a student network that fulfills end-to-end translation. Specifically, we adapt the teacher network trained on large-scale data in the source domain to the target domain using a handful of samples, where it can learn the target domain’s knowledge. Then, we can achieve few-shot augmentation by generating source domain and target domain images simultaneously with the same latent codes. We propose an anchor-based knowledge distillation module that can fully use the difference between the training and the augmented data to distill the knowledge of the teacher network into the student network. The trained student network achieves excellent generalization performance with the absorption of additional knowledge. Qualitative and quantitative experiments demonstrate that our method achieves better results than state-of-the-art approaches in a few-shot setting.

PaperID: 238,

Authors: Jingcai Guo, Han Wang, Yuanyuan Xu, Wenchao Xu, Yufeng Zhan, Yuxia Sun, Song Guo

Affiliations: Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Automation, Beijing Institute of Technology, Beijing, China; Department of Computer Science, Jinan University, Guangzhou, China; Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Sai Kung, New Territories, Hong Kong

Title: Multimodal Dual-Embedding Networks for Malware Open-Set Recognition

Abstract:
Malware open-set recognition (MOSR) is an emerging research domain that aims at jointly classifying malware samples from known families and detecting the ones from novel unknown families. Existing works mostly rely on a well-trained classifier considering the predicted probabilities of each known family with a threshold-based detection to achieve the MOSR. However, our observation reveals that the feature distributions of malware samples are extremely similar to each other even between known and unknown families. Thus, the obtained classifier may produce overly high probabilities of testing unknown samples toward known families and degrade the model performance. In this article, we propose the multimodal dual-embedding networks, dubbed MDENet, to take advantage of comprehensive malware features from different modalities to enhance the diversity of malware feature space, which is more representative and discriminative for downstream recognition. Concretely, we first generate a malware image for each observed sample based on its numeric features using our proposed numeric encoder with a redesigned multiscale CNN structure, which can better explore their statistical and spatial correlations. Besides, we propose to organize tokenized malware features into a sentence for each sample considering its behaviors and dynamics, and utilize language models as the textual encoder to transform it into a representable and computable textual vector. Such parallel multimodal encoders can fuse the above two components to enhance the feature diversity. Last, to further guarantee the open-set recognition (OSR), we dually embed the fused multimodal representation into one primary space and an associated sub-space, i.e., discriminative and exclusive spaces, with contrastive sampling and ρ-bounded enclosing sphere regularizations, which are used for classification and detection, respectively. Moreover, we also enrich our previously proposed large-scale malware dataset MAL-100 with multimodal characteristics and contribute an improved version dubbed MAL-100+. Experimental results on the widely used malware dataset Mailing and the proposed MAL-100+ demonstrate the effectiveness of our method.

PaperID: 239,

Authors: Shuo Yu, Huafei Huang, Yanming Shen, Pengfei Wang, Qiang Zhang, Ke Sun, Honglong Chen

Affiliations: School of Computer Science and Technology and the Key Laboratory of Social Computing and Cognitive Intelligence of Ministry of Education, Dalian University of Technology, Dalian, China; School of Software, Dalian University of Technology, Dalian, China; College of Control Science and Engineering, China University of Petroleum (East China), Qingdao, China

Title: Formulating and Representing Multiagent Systems With Hypergraphs

Abstract:
Graph-learning methods, especially graph neural networks (GNNs), have shown remarkable effectiveness in handling non-Euclidean data and have achieved great success in various scenarios. Existing GNNs are primarily based on message-passing schemes, that is, aggregating information from neighboring nodes. However, the diversity and complexity of complex systems from real-world circumstances are not sufficiently taken into account. In these cases, the individual should be treated as an agent, with the ability to perceive their surroundings and interact with other individuals, rather than just be viewed as nodes in existing graph approaches. Additionally, the pairwise interactions used in existing methods also lack the expressiveness for the higher-order complex relations among multiple agents, thus limiting the performance in various tasks. In this work, we propose a Multiagent Hypergraph Force-learning method dubbed MHGForce. First, we formalize the multiagent system (MAS) and illustrate its connection to graph learning. Then, we propose a generalized multiagent hypergraph-learning framework. In this framework, we integrate message-passing and force-based interactions to devise a pluggable method. The method empowers graph approaches to excel in downstream tasks while effectively maintaining structural information in the representations. Experimental results on the Cora, Citeseer, Cora-CA, Zoo, and NTU2012 datasets in node classification demonstrate the effectiveness and generality of our proposed method. We also discuss the characteristics of the MHGForce and explore its role through parametric analysis and visualization. Finally, we give a discussion, conclude our work, and propose future directions.

PaperID: 240,

Authors: Riccardo Rosati, Luca Romeo, Víctor Manuel Vargas, Pedro Antonio Gutiérrez, Emanuele Frontoni, César Hervás-Martínez

Affiliations: Department of Information Engineering, Università Politecnica delle Marche, Ancona, Italy; Department of Economics and Law, Università degli Studi di Macerata, Macerata, Italy; Department of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain; Department of Political Science, Communication and International Relations, Università degli Studi di Macerata, Macerata, Italy

Title: Learning Ordinal-Hierarchical Constraints for Deep Learning Classifiers

Abstract:
Real-world classification problems may disclose different hierarchical levels where the categories are displayed in an ordinal structure. However, no specific deep learning (DL) models simultaneously learn hierarchical and ordinal constraints while improving generalization performance. To fill this gap, we propose the introduction of two novel ordinal–hierarchical DL methodologies, namely, the hierarchical cumulative link model (HCLM) and hierarchical–ordinal binary decomposition (HOBD), which are able to model the ordinal structure within different hierarchical levels of the labels. In particular, we decompose the hierarchical–ordinal problem into local and global graph paths that may encode an ordinal constraint for each hierarchical level. Thus, we frame this problem as simultaneously minimizing global and local losses. Furthermore, the ordinal constraints are set by two approaches [ordinal binary decomposition (OBD) and cumulative link model (CLM)] within each global and local function. The effectiveness of the proposed approach is measured on four real use-case datasets concerning industrial, biomedical, computer vision, and financial domains. The extracted results demonstrate a statistically significant improvement over state-of-the-art nominal, ordinal, and hierarchical approaches.
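For readers unfamiliar with the cumulative link model (CLM) mentioned above, the following sketch shows the standard logit-link form for a single ordinal level: a scalar score is compared against increasing thresholds, P(y <= k) = sigmoid(theta_k - score), and class probabilities are differences of consecutive cumulative values. It does not reproduce the hierarchical HCLM of the paper; the function names and the choice of the logit link are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clm_probabilities(score, thresholds):
    """Ordinal class probabilities from a logit-link cumulative link model.

    score:      scalar output of the network for one sample.
    thresholds: strictly increasing cut points theta_1 < ... < theta_{K-1}.
    Returns K probabilities, one per ordered class.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # P(y <= k) for k = 1..K-1, followed by P(y <= K) = 1.
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities


# Three ordered classes; a score midway between the cut points favors the middle class.
probs = clm_probabilities(0.0, [-1.0, 1.0])
```

Because all classes share one score and differ only in their thresholds, probability mass can move only between neighboring classes as the score changes, which is how the ordinal constraint is encoded.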

PaperID: 241,

Authors: Pin Zhang, Wenhan Dong, Ming Cai, Shengde Jia, Zipeng Wang

Affiliations: College of Aeronautics Engineering, Air Force Engineering University, Xi’an, China; College of Mechatronic Engineering and Automation, National University of Defense Technology, Changsha, China; Faculty of Information Technology, Beijing Laboratory of Smart Environmental Protection, Beijing Key Laboratory of Computational Intelligence and Intelligent System, and Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China

Title: MEOL: A Maximum-Entropy Framework for Options Learning

Abstract:
Options, the temporally extended courses of action that can be taken at varying time scales, provide a concrete, key framework for learning levels of temporal abstraction in hierarchical tasks. While methods for learning options end-to-end are well researched, how to explore good options and actions simultaneously is still challenging. We address this issue by maximizing reward augmented with the entropies of both the option-selection and action-selection policies in options learning. To this end, we derive a novel optimization objective by reformulating options learning from the perspective of probabilistic inference and propose a soft options iteration method to guarantee convergence to the optimum. In implementation, we propose an off-policy algorithm called the maximum-entropy options critic (MEOC) and evaluate it on a series of continuous control benchmarks. Comparative results demonstrate that our method outperforms baselines in efficiency and final result on most benchmarks, and that its superiority and robustness are most pronounced on complex tasks. Ablation studies further show that entropy maximization for hierarchical exploration improves learning performance through efficient option specialization and multimodality at the action level.
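As a rough illustration of the entropy-augmented reward described in this abstract, the sketch below adds weighted Shannon entropies of an option-selection distribution and an action-selection distribution to a scalar reward. It is not the authors' MEOC implementation: the function names, the discrete distributions, and the temperature weights `alpha_option` and `alpha_action` are all assumptions made for the example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def augmented_reward(reward, option_probs, action_probs,
                     alpha_option=0.1, alpha_action=0.1):
    """Reward plus weighted entropies of the option and action policies.

    alpha_option and alpha_action are hypothetical temperature weights;
    the abstract does not state how the two entropy terms are balanced.
    """
    return (reward
            + alpha_option * entropy(option_probs)
            + alpha_action * entropy(action_probs))
```

A deterministic policy contributes no bonus, while a uniform policy contributes the largest bonus, which is what encourages exploration at both levels.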

PaperID: 242,

Authors: Xiaoyan Zhao, Min Yang, Qiang Qu, Ruifeng Xu

Affiliations: Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, SAR, China; Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Harbin Institute of Technology (Shenzhen), Shenzhen, China

Title: Few-Shot Relation Extraction With Automatically Generated Prompts

Abstract:
Relation extraction (RE) tends to struggle when supervised training data is scarce and difficult to collect. In this article, we elicit relational and factual knowledge from large pretrained language models (PLMs) for few-shot RE (FSRE) with prompting techniques. Concretely, we automatically generate a diverse set of natural language templates and modulate the PLM’s behavior through these prompts for FSRE. To mitigate the template bias, which leads to instability in few-shot learning, we propose a simple yet effective template regularization network (TRN) to prevent deep networks from over-fitting uncertain templates and thus stabilize the FSRE models. TRN alleviates the template bias with three mechanisms: 1) an attention mechanism over mini-batch to weight each template; 2) a ranking regularization mechanism to regularize the attention weights and constrain the importance of uncertain templates; and 3) a template calibration module with two calibrating techniques to modify the uncertain templates in the lowest-ranked group. Experimental results on two benchmark datasets (i.e., FewRel and NYT) show that our model has robust superiority over strong competitors. For reproducibility, we will release our code and data upon the publication of this article.

PaperID: 243,

Authors: Mengzhu Wang, Junze Liu, Ge Luo, Shanshan Wang, Wei Wang, Long Lan, Ye Wang, Feiping Nie

Affiliations: College of Artificial Intelligence, Hebei University of Technology, Tianjin, China; School of Physics, Changchun University of Science and Technology, Changchun, China; College of Computer, Fudan University, Shanghai, China; Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, China; School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China; College of Computer, National University of Defense Technology, Changsha, China; School of Artificial Intelligence, Northwestern Polytechnical University, Xi’an, China

Title: Smooth-Guided Implicit Data Augmentation for Domain Generalization

Abstract:
The training process of a domain generalization (DG) model involves utilizing one or more interrelated source domains to attain optimal performance on an unseen target domain. Existing DG methods often use auxiliary networks or require high computational costs to improve the model’s generalization ability by incorporating a diverse set of source domains. In contrast, this work proposes a method called Smooth-Guided Implicit Data Augmentation (SGIDA) that operates in the feature space to capture the diversity of source domains. To amplify the model’s generalization capacity, a distance metric learning (DML) loss function is incorporated. Additionally, rather than depending on deep features, the suggested approach employs logits produced from cross entropy (CE) losses with infinite augmentations. A theoretical analysis shows that logits are effective in estimating distances defined on original features, and the proposed approach is thoroughly analyzed to provide a better understanding of why logits are beneficial for DG. Moreover, to increase the diversity of the source domain, a sampling-based method called smooth is introduced to obtain semantic directions from interclass relations. The effectiveness of the proposed approach is demonstrated through extensive experiments on widely used DG, object detection, and remote sensing datasets, where it achieves significant improvements over existing state-of-the-art methods across various backbone networks.

PaperID: 244,

Authors: Fuli Wang, Karelia Pena-Pena, Wei Qian, Gonzalo R. Arce

Affiliations: Institute for Financial Services Analytics, University of Delaware, Newark, DE, USA; Department of Electrical and Computer Engineering, University of Delaware, Newark, DE, USA; Department of Applied Economics and Statistics, University of Delaware, Newark, DE, USA

Title: T-HyperGNNs: Hypergraph Neural Networks via Tensor Representations

Abstract:
Hypergraph neural networks (HyperGNNs) are a family of deep neural networks designed to perform inference on hypergraphs. HyperGNNs follow either a spectral or a spatial approach, in which a convolution or message-passing operation is conducted based on a hypergraph algebraic descriptor. While many HyperGNNs have been proposed and achieved state-of-the-art performance on broad applications, there have been limited attempts at exploring high-dimensional hypergraph descriptors (tensors) and joint node interactions carried by hyperedges. In this article, we depart from hypergraph matrix representations and present a new tensor-HyperGNN (T-HyperGNN) framework with cross-node interactions (CNIs). The T-HyperGNN framework consists of T-spectral convolution, T-spatial convolution, and T-message-passing HyperGNNs (T-MPHN). The T-spectral convolution HyperGNN is defined under the t-product algebra that closely connects to the spectral space. To improve computational efficiency for large hypergraphs, we localize the T-spectral convolution approach to formulate the T-spatial convolution and further devise a novel tensor-message-passing algorithm for practical implementation by studying a compressed adjacency tensor representation. Compared to the state-of-the-art approaches, our T-HyperGNNs preserve intrinsic high-order network structures without any hypergraph reduction and model the joint effects of nodes through a CNI layer. These advantages of our T-HyperGNNs are demonstrated in a wide range of real-world hypergraph datasets. The implementation code is available at https://github.com/wangfuli/T-HyperGNNs.git.

PaperID: 245,

Authors: Renwei Dian, Yuanye Liu, Shutao Li

Affiliations: School of Robotics, Hunan University, Changsha, China; School of Robotics and the Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha, China; College of Electrical and Information Engineering and the Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha, China

Title: Spectral Super-Resolution via Deep Low-Rank Tensor Representation

Abstract:
Spectral super-resolution has attracted the attention of more researchers for obtaining hyperspectral images (HSIs) in a simpler and cheaper way. Although many convolutional neural network (CNN)-based approaches have yielded impressive results, most of them ignore the low-rank prior of HSIs resulting in huge computational and storage costs. In addition, the ability of CNN-based methods to capture the correlation of global information is limited by the receptive field. To surmount the problem, we design a novel low-rank tensor reconstruction network (LTRN) for spectral super-resolution. Specifically, we treat the features of HSIs as 3-D tensors with low-rank properties due to their spectral similarity and spatial sparsity. Then, we combine canonical-polyadic (CP) decomposition with neural networks to design an adaptive low-rank prior learning (ALPL) module that enables feature learning in a 1-D space. In this module, there are two core modules: the adaptive vector learning (AVL) module and the multidimensionwise multihead self-attention (MMSA) module. The AVL module is designed to compress an HSI into a 1-D space by using a vector to represent its information. The MMSA module is introduced to improve the ability to capture the long-range dependencies in the row, column, and spectral dimensions, respectively. Finally, our LTRN, mainly cascaded by several ALPL modules and feedforward networks (FFNs), achieves high-quality spectral super-resolution with fewer parameters. To test the effect of our method, we conduct experiments on two datasets: the CAVE dataset and the Harvard dataset. Experimental results show that our LTRN not only is as effective as state-of-the-art methods but also has fewer parameters. The code is available at https://github.com/renweidian/LTRN.

PaperID: 246,

Authors: Xiaowei Zhao, Dongming Wu, Feiping Nie, Weizhong Yu, Chen Zhao, Xuelong Li

Affiliations: State Key Laboratory of Electromechanical Integrated Manufacturing of High-Performance Electronic Equipments, the Center for Complex Systems, and the School of Mechano-Electronic Engineering, Xidian University, Xi’an, Shaanxi, China; School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, Shaanxi, China; School of Artificial Intelligence, Optics and Electronics (iOPEN) and the Key Laboratory of Intelligent Interaction and Applications, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi’an, Shaanxi, China; School of Computer Science and the Embedded Systems Integration Engineering Research Center, Ministry of Education, Northwestern Polytechnical University, Xi’an, Shaanxi, China

Title: Nonlinear Locality-Preserving Projections With Dynamic Graph Learning

Abstract:
The affinity graph is regarded as a mathematical representation of the local manifold structure. The performance of locality-preserving projections (LPPs) and its variants is tied to the quality of the affinity graph. However, there are two drawbacks in current approaches. First, the pre-designed graph is inconsistent with the actual distribution of data. Second, linear projection can damage the nonlinear manifold structure. In this article, we propose a nonlinear dimensionality reduction model, named deep locality-preserving projections (DLPPs), to solve these problems simultaneously. The model consists of two loss functions, each employing deep autoencoders (AEs) to extract discriminative features. In the first loss function, the affinity relationships among samples in the intermediate layer are determined adaptively according to the distances between samples. Since the features of samples are obtained by nonlinear mapping, the manifold structure can be kept in the low-dimensional space. Additionally, the learned affinity graph is able to avoid the influence of noisy and redundant features. In the second loss function, the affinity relationships among samples in the last layer (also called the reconstruction layer) are learned. This strategy enables denoised samples to have a good manifold structure. By integrating these two functions, our proposed model minimizes the mismatch of the manifold structure between samples in the denoising space and the low-dimensional space, while reducing sensitivity to the initial weights of the graph. Extensive experiments on toy and benchmark datasets have been conducted to verify the effectiveness of our proposed model.

PaperID: 247,

Authors: Enrico Picco, Alessandro Lupo, Serge Massar

Affiliations: Laboratoire d’Information Quantique, CP , Université Libre de Bruxelles (ULB), Brussels, Belgium

Title: Deep Photonic Reservoir Computer for Speech Recognition

Abstract:
Speech recognition is a critical task in the field of artificial intelligence (AI) and has witnessed remarkable advancements thanks to large and complex neural networks, whose training process typically requires massive amounts of labeled data and computationally intensive operations. An alternative paradigm, reservoir computing (RC), is energy efficient and is well adapted to implementation in physical substrates, but exhibits limitations in performance when compared with more resource-intensive machine learning algorithms. In this work, we address this challenge by investigating different architectures of interconnected reservoirs, all falling under the umbrella of deep RC (DRC). We propose a photonic-based deep reservoir computer and evaluate its effectiveness on different speech recognition tasks. We show specific design choices that aim to simplify the practical implementation of a reservoir computer while simultaneously achieving high-speed processing of high-dimensional audio signals. Overall, with the present work, we hope to help the advancement of low-power and high-performance neuromorphic hardware.

PaperID: 248,

Authors: Zhenyu Zhou, Lei Luo, Tianrui Liu, Qing Liao, Xinwang Liu, En Zhu

Affiliations: School of Computer Science, National University of Defense Technology, Changsha, China; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China

Title: Category Alignment Mechanism for Few-Shot Image Classification

Abstract:
While humans can excel at image classification tasks by comparing a few images, existing metric-based few-shot classification methods are still not well adapted to novel tasks. Performance declines rapidly when encountering new patterns, as feature embeddings cannot effectively encode discriminative properties. Moreover, existing matching methods inadequately utilize support set samples, focusing only on comparing query samples to category prototypes without exploiting contrastive relationships across categories for discriminative features. In this work, we propose a method where query samples select their most category-representative features for matching, making feature embeddings adaptable and category-related. We introduce a category alignment mechanism (CAM) to align query image features with different categories. CAM ensures features chosen for matching are distinct and strongly correlated to intra- and inter-contrastive relationships within categories, making extracted features highly related to their respective categories. CAM is parameter-free, requires no extra training to adapt to new tasks, and adjusts features for matching when task categories change. We also implement a cross-validation-based feature selection technique for support samples, generating more discriminative category prototypes. We implement two versions of inductive and transductive inference and conduct extensive experiments on six datasets to demonstrate the effectiveness of our algorithm. The results indicate that our method consistently yields performance improvements on benchmark tasks and surpasses the current state-of-the-art methods.

PaperID: 249,

Authors: Yunjie Tian, Lingxi Xie, Jiemin Fang, Jianbin Jiao, Qixiang Ye, Qi Tian

Affiliations: School of Electronic, Electrical, and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China; Huawei Inc., Shenzhen, China; Institute of Artificial Intelligence and the School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China

Title: Exploring Complicated Search Spaces With Interleaving-Free Sampling

Abstract:
Conventional neural architecture search (NAS) algorithms typically work on search spaces with short-distance node connections. We argue that such designs, though safe and stable, are obstacles to exploring more effective network architectures. In this brief, we explore search algorithms on a complicated search space with long-distance connections and show that existing weight-sharing search algorithms fail due to the existence of interleaved connections (ICs). Based on this observation, we present a simple-yet-effective algorithm, termed interleaving-free neural architecture search (IF-NAS). We further design a periodic sampling strategy to construct subnetworks during the search procedure, preventing ICs from emerging in any of them. In the proposed search space, IF-NAS outperforms both random sampling and previous weight-sharing search algorithms by significant margins. It also generalizes well to microcell-based spaces. This study emphasizes the importance of macrostructure, and we look forward to further efforts in this direction. The code is available at github.com/sunsmarterjie/IFNAS.

PaperID: 250,

Authors: Huan Tian, Bo Liu, Tianqing Zhu, Wanlei Zhou, Philip S. Yu

Affiliations: Centre for Cyber Security and Privacy and the School of Computer Science, University of Technology Sydney, Ultimo, NSW, Australia; Faculty of Data Science, City University of Macau, Taipa, Macau; Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA

Title: MultiFair: Model Fairness With Multiple Sensitive Attributes

Abstract:
While existing fairness interventions show promise in mitigating biased predictions, most studies concentrate on single-attribute protections. Although a few methods consider multiple attributes, they either require additional constraints or prediction heads, incurring high computational overhead or jeopardizing the stability of the training process. More critically, they consider per-attribute protection approaches, raising concerns about fairness gerrymandering where certain attribute combinations remain unfair. This work aims to construct a neutral domain containing fused information across all subgroups and attributes. It delivers fair predictions as the fused input contains neutralized information for all considered attributes. Specifically, we adopt mixup operations to generate samples with fused information. However, our experiments reveal that directly adopting the operations leads to degraded prediction results. The excessive mixup operations result in unrecognizable training data. To this end, we design three distinct mixup schemes that balance information fusion across attributes while retaining distinct visual features critical for training valid models. Extensive experiments with multiple datasets and up to eight sensitive attributes demonstrate that the proposed MultiFair method can deliver fairness protections for multiple attributes while maintaining valid prediction results.
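For readers unfamiliar with the mixup operation mentioned in this abstract, the sketch below shows the standard convex combination of two samples. It is only the generic operation, not any of the three MultiFair mixup schemes, which the abstract does not specify; the function name `mixup` and the plain-list inputs are assumptions for illustration.

```python
def mixup(x_a, x_b, lam):
    """Standard mixup: convex combination of two equal-length feature vectors.

    lam is the mixing coefficient in [0, 1]; lam = 1 returns x_a unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x_a) != len(x_b):
        raise ValueError("inputs must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
```

With `lam` near 0.5 the two inputs are blended evenly, which is consistent with the abstract's observation that excessive mixing can leave training data unrecognizable.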

PaperID: 251,

Authors: Peipei Yuan, Xinge You, Hong Chen, Yingjie Wang, Qinmu Peng, Bin Zou

Affiliations: School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China; College of Science and the Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, China; College of Control Science and Engineering, China University of Petroleum (East China), Qingdao, Shandong, China; Faculty of Mathematics and Statistics, Hubei Key Laboratory of Applied Mathematics, Hubei University, Wuhan, China

Title: Sparse Additive Machine With the Correntropy-Induced Loss

Abstract:
Sparse additive machines (SAMs) have shown competitive performance on variable selection and classification in high-dimensional data due to their representation flexibility and interpretability. However, existing methods often employ unbounded or nonsmooth functions as surrogates of the 0–1 classification loss, which may degrade performance on data with outliers. To alleviate this problem, we propose a robust classification method, named SAM with the correntropy-induced loss (CSAM), by integrating the correntropy-induced loss (C-loss), the data-dependent hypothesis space, and the weighted $\ell_{q,1}$-norm regularizer ($q \geq 1$) into additive machines. In theory, the generalization error bound is estimated via a novel error decomposition and concentration estimation techniques, which shows that the convergence rate $\mathcal{O}(n^{-1/4})$ can be achieved under proper parameter conditions. In addition, the theoretical guarantee on variable selection consistency is analyzed. Experimental evaluations on both synthetic and real-world datasets consistently validate the effectiveness and robustness of the proposed approach.
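To make the "bounded and smooth" property of the C-loss concrete, the sketch below evaluates a commonly used form of the correntropy-induced loss on the classification margin $t = y f(x)$, namely $\beta\,[1 - \exp(-(1 - t)^2 / (2\sigma^2))]$ with $\beta = [1 - \exp(-1/(2\sigma^2))]^{-1}$. Whether CSAM uses exactly this normalization is an assumption; the function name `c_loss` and the default `sigma` are illustrative.

```python
import math


def c_loss(margin, sigma=1.0):
    """Correntropy-induced loss on the margin t = y * f(x).

    beta normalizes the loss so that c_loss(0) equals 1; this normalization
    is a common convention and is assumed here, not taken from the abstract.
    """
    beta = 1.0 / (1.0 - math.exp(-1.0 / (2.0 * sigma ** 2)))
    return beta * (1.0 - math.exp(-((1.0 - margin) ** 2) / (2.0 * sigma ** 2)))
```

Unlike the hinge or exponential loss, the value saturates at `beta` for badly misclassified points, so a single outlier cannot dominate the empirical risk.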

PaperID: 252,

Authors: Jianxin Lin, Yongqiang Tang, Junping Wang, Wensheng Zhang

Affiliations: State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Constrained Maximum Cross-Domain Likelihood for Domain Generalization

Abstract:
As a recent noticeable topic, domain generalization aims to learn a generalizable model on multiple source domains, which is expected to perform well on unseen test domains. Great efforts have been made to learn domain-invariant features by aligning distributions across domains. However, existing works are often designed based on some relaxed conditions which are generally hard to satisfy and fail to realize the desired joint distribution alignment. In this article, we propose a novel domain generalization method, which originates from an intuitive idea that a domain-invariant classifier can be learned by minimizing the Kullback–Leibler (KL)-divergence between posterior distributions from different domains. To enhance the generalizability of the learned classifier, we formalize the optimization objective as an expectation computed on the ground-truth marginal distribution. Nevertheless, it also presents two obvious deficiencies, one of which is the side-effect of entropy increase in KL-divergence and the other is the unavailability of ground-truth marginal distributions. For the former, we introduce a term named maximum in-domain likelihood to maintain the discrimination of the learned domain-invariant representation space. For the latter, we approximate the ground-truth marginal distribution with source domains under a reasonable convex hull assumption. Finally, a constrained maximum cross-domain likelihood (CMCL) optimization problem is deduced, by solving which the joint distributions are naturally aligned. An alternating optimization strategy is carefully designed to approximately solve this optimization problem. Extensive experiments on four standard benchmark datasets, i.e., Digits-DG, PACS, Office-Home, and miniDomainNet, highlight the superior performance of our method.
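The core quantity in this abstract, the KL divergence between class posteriors predicted under two domains, can be sketched for discrete posteriors as follows. This is only the textbook divergence, not the CMCL objective; the function name and the smoothing constant `eps` are assumptions introduced for the example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes.

    eps is a hypothetical smoothing constant that avoids division by zero
    when q assigns zero probability to a class.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)
```

The divergence is zero when the two posteriors coincide and grows as they disagree, which is why minimizing it across domains pushes the classifier toward domain invariance.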

PaperID: 253,

Authors: Erdal Akin

Affiliations: Computer Engineering Department, Bitlis Eren University, Bitlis, Türkiye

Title: Deep Reinforcement Learning-Based Multirestricted Dynamic-Request Transportation Framework

Abstract:
Unmanned aerial vehicles (UAVs) are used in many areas, and their usage is increasing constantly. Their popularity, therefore, maintains its importance in the technology world. In parallel with the development of technology, human standards and surroundings should improve as well. This study is motivated by the possibility of timely delivery of urgent medical requests in emergency situations. Using UAVs to deliver urgent medical requests can be very effective due to their flexible maneuverability and low cost. However, off-the-shelf UAVs suffer from limited payload capacity and battery constraints. In addition, urgent requests may arrive at uncertain times, and delivering within a short time may be crucial. To address this issue, we propose a novel framework that considers the limitations of the UAVs and dynamically requested packages. These previously unknown packages have source–destination pairs and delivery time intervals. Furthermore, we utilize deep reinforcement learning (DRL) algorithms, namely deep Q-network (DQN), proximal policy optimization (PPO), and advantage actor–critic (A2C), to cope with the unknown environment and requests. The comprehensive experimental results demonstrate that the PPO algorithm has a faster and more stable training performance than the other DRL algorithms in two different environmental setups. We also implement an extended version of a brute-force (BF) algorithm, assuming that all requests and environments are known in advance. The PPO algorithm achieves a success rate very close to that of the BF algorithm.

PaperID: 254,

Authors: Jie Lin, Ting-Zhu Huang, Xi-Le Zhao, Teng-Yu Ji, Qibin Zhao

Affiliations: School of Mathematics, Southwest Jiaotong University, Chengdu, Sichuan, China; School of Mathematical Sciences/Research Center for Image and Vision Computing, University of Electronic Science and Technology of China, Chengdu, Sichuan, China; School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an, Shaanxi, China; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

Title: Tensor Robust Kernel PCA for Multidimensional Data

Abstract:
Recently, the tensor nuclear norm (TNN)-based tensor robust principal component analysis (TRPCA) has achieved impressive performance in multidimensional data processing. The underlying assumption in TNN is the low-rankness of frontal slices of the tensor in the transformed domain (e.g., Fourier domain). However, the low-rankness assumption is usually violated for real-world multidimensional data (e.g., video and image) due to their intrinsically nonlinear structure. How to effectively and efficiently exploit the intrinsic structure of multidimensional data remains a challenge. In this article, we first suggest a kernelized TNN (KTNN) by leveraging the nonlinear kernel mapping in the transform domain, which faithfully captures the intrinsic structure (i.e., implicit low-rankness) of multidimensional data and is computed at a lower cost by introducing the kernel trick. Armed with KTNN, we propose a tensor robust kernel PCA (TRKPCA) model for handling multidimensional data, which decomposes the observed tensor into an implicit low-rank component and a sparse component. To tackle the nonlinear and nonconvex model, we develop an efficient alternating direction method of multipliers (ADMM)-based algorithm. Extensive experiments on real-world applications collectively verify that TRKPCA achieves superiority over the state-of-the-art RPCA methods.

PaperID: 255,

Authors: Jiahao Qi, Zhiqiang Gong, Xingyue Liu, Chen Chen, Ping Zhong

Affiliations: National Key Laboratory of Science and Technology on Automatic Target Recognition, National University of Defense Technology, Changsha, China; National Innovation Institute of Defense Technology, Chinese Academy of Military Science, Beijing, China

Title: Masked Spatial-Spectral Autoencoders Are Excellent Hyperspectral Defenders

Abstract:
Deep learning (DL) methodology has contributed substantially to the development of the hyperspectral image (HSI) analysis community. However, it also makes HSI analysis systems vulnerable to adversarial attacks. To this end, we propose a masked spatial–spectral autoencoder (MSSA) in this article under self-supervised learning theory, for enhancing the robustness of HSI analysis systems. First, a masked sequence attention learning (MSAL) module is introduced to promote the inherent robustness of HSI analysis systems along the spectral channel. Then, we develop a graph convolutional network (GCN) with a learnable graph structure to establish global pixel-wise combinations. In this way, the attack effect is dispersed across all the related pixels in each combination, and better defense performance is achievable in the spatial aspect. Finally, to improve the defense transferability and address the problem of limited labeled samples, MSSA employs spectra reconstruction as a pretext task and fits the datasets in a self-supervised manner. Comprehensive experiments over three benchmarks verify the effectiveness of MSSA in comparison with the state-of-the-art hyperspectral classification methods and representative adversarial defense strategies.

PaperID: 256,

Authors: Baoyao Yang, Pong C. Yuen, Yiqun Zhang, An Zeng

Affiliations: School of Computers, Guangdong University of Technology, Guangzhou, China; Department of Computer Science, Hong Kong Baptist University, Kowloon Tsai, Hong Kong

Title: Allosteric Feature Collaboration for Model-Heterogeneous Federated Learning

Abstract:
Although federated learning (FL) has achieved outstanding results in privacy-preserved distributed learning, the setting of model homogeneity among clients restricts its wide application in practice. This article investigates a more general case, namely, model-heterogeneous FL (M-hete FL), where client models are independently designed and can be structurally heterogeneous. M-hete FL faces new challenges in collaborative learning because the parameters of heterogeneous models cannot be directly aggregated. In this article, we propose a novel allosteric feature collaboration (AlFeCo) method, which interchanges knowledge across clients and collaboratively updates heterogeneous models on the server. Specifically, an allosteric feature generator is developed to reveal task-relevant information from multiple client models. The revealed information is stored in the client-shared and client-specific codes. We exchange client-specific codes across clients to facilitate knowledge interchange and generate allosteric features that are dimensionally variable for model updates. To promote information communication between different clients, a dual-path (model–model and model–prediction) communication mechanism is designed to supervise the collaborative model updates using the allosteric features. Client models are fully communicated through the knowledge interchange between models and between models and predictions. We further provide theoretical evidence and convergence analysis to support the effectiveness of AlFeCo in M-hete FL. The experimental results show that the proposed AlFeCo method not only performs well on classical FL benchmarks but also is effective in model-heterogeneous federated antispoofing. Our code is publicly available at https://github.com/ybaoyao/AlFeCo.

PaperID: 257,

Authors: Yupei Zhang, Yifei Wang, Yuxin Li, Yunan Xu, Shuangshuang Wei, Shuhui Liu, Xuequn Shang

Affiliations: School of Computer Science, Northwestern Polytechnical University, Xi’an, China

Title: Federated Discriminative Representation Learning for Image Classification

Abstract:
Acquiring large datasets to improve the performance of deep models has become one of the most critical problems in representation learning (RL), and addressing it is the core promise of the emerging paradigm of federated learning (FL). However, most current FL models concentrate on seeking an identical model for isolated clients and thus fail to make full use of the data specificity between clients. To enhance the classification performance of each client, this study introduces FDRL, a federated discriminative RL model, which partitions the data features of each client into a global subspace and a local subspace. More specifically, FDRL learns a global representation for federated communication between the isolated clients, which captures common features from all protected datasets via model sharing, and local representations for personalization in each client, which preserve client-specific features via model differentiation. Toward this goal, FDRL in each client trains a shared submodel for federated communication and, meanwhile, an unshared submodel for locality preservation; the two submodels partition the client-feature space by maximizing their differences and are followed by a linear model fed with the combined features for image classification. The proposed model is implemented with neural networks and optimized iteratively between the server, which computes the global model, and the clients, which learn the local classifiers. Thanks to the powerful capability of local feature preservation, FDRL leads to more discriminative data representations than the compared FL models. Experimental results on public datasets demonstrate that our FDRL benefits from the subspace partition and achieves better performance on federated image classification than the state-of-the-art FL models.

PaperID: 258,

Authors: Yilin Shao, Long Sun, Licheng Jiao, Xu Liu, Fang Liu, Lingling Li, Shuyuan Yang

Affiliations: Key Laboratory of Intelligent Perception and Image Understanding, Ministry of Education of China, International Research Center of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an, China

Title: CoT: Contourlet Transformer for Hierarchical Semantic Segmentation

Abstract:
The Transformer–convolutional neural network (CNN) hybrid learning approach is gaining traction for balancing deep and shallow image features for hierarchical semantic segmentation. However, such approaches still face a contradiction between comprehensive semantic understanding and meticulous detail extraction. To solve this problem, this article proposes a novel Transformer–CNN hybrid hierarchical network, dubbed contourlet transformer (CoT). In the CoT framework, the semantic representation process of the Transformer is unavoidably peppered with sparsely distributed points that, while not desired, demand finer detail. Therefore, we design a deep detail representation (DDR) structure to investigate their fine-grained features. First, through contourlet transform (CT), we distill the high-frequency directional components from the raw image, yielding localized features that accommodate the inductive bias of CNN. Second, a CNN deep sparse learning (DSL) module takes them as input to represent the underlying detailed features. This memory- and energy-efficient learning method can keep the same sparse pattern between input and output. Finally, the decoder hierarchically fuses the detailed features with the semantic features in an image reconstruction-like fashion. Experiments demonstrate that CoT achieves competitive performance on three benchmark datasets: PASCAL Context [57.21% mean intersection over union (mIoU)], ADE20K (54.16% mIoU), and Cityscapes (84.23% mIoU). Furthermore, we conducted robustness studies to validate its resistance against various sorts of corruption. Our code is available at: https://github.com/yilinshao/CoT-Contourlet-Transformer.

PaperID: 259,

Authors: Lihao Liu, Angelica I. Avilés-Rivero, Carola-Bibiane Schönlieb

Affiliations: Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, U.K.

Title: Contrastive Registration for Unsupervised Medical Image Segmentation

Abstract:
Medical image segmentation is an important task in medical imaging, as it serves as the first step for clinical diagnosis and treatment planning. While major success has been reported using supervised deep learning techniques, they assume a large and well-representative labeled set. This is a strong assumption in the medical domain, where annotations are expensive, time-consuming, and subject to human bias. To address this problem, unsupervised segmentation techniques have been proposed in the literature. Yet, none of the existing unsupervised segmentation techniques reach accuracies that come close to those of state-of-the-art supervised segmentation methods. In this work, we present a novel optimization model framed in a new convolutional neural network (CNN)-based contrastive registration architecture for unsupervised medical image segmentation called CLMorph. The core idea of our approach is to exploit image-level registration and feature-level contrastive learning to perform registration-based segmentation. First, we propose an architecture to capture the image-to-image transformation mapping via registration for unsupervised medical image segmentation. Second, we embed a contrastive learning mechanism in the registration architecture to enhance the discriminative capacity of the network at the feature level. We show that our proposed CLMorph technique mitigates the major drawbacks of existing unsupervised techniques. We demonstrate, through numerical and visual experiments, that our technique substantially outperforms the current state-of-the-art unsupervised segmentation methods on two major medical image datasets.

PaperID: 260,

Authors: Francisco Munguia-Galeano, Ah-Hwee Tan, Ze Ji

Affiliations: School of Engineering, Cardiff University, Cardiff, U.K.; School of Computing and Information Systems, Singapore Management University, Bras Basah, Singapore

Title: Deep Reinforcement Learning With Explicit Context Representation

Abstract:
Though reinforcement learning (RL) has shown an outstanding capability for solving complex computational problems, most RL algorithms lack an explicit method that would allow learning from contextual information. On the other hand, humans often use context to identify patterns and relations among elements in the environment, along with how to avoid taking wrong actions. However, what may seem like an obviously wrong decision from a human perspective could take hundreds of steps for an RL agent to learn to avoid. This article proposes a framework for discrete environments called Iota explicit context representation (IECR). The framework involves representing each state using contextual key frames (CKFs), which can then be used to extract a function that represents the affordances of the state; in addition, two loss functions are introduced with respect to the affordances of the state. The novelty of the IECR framework lies in its capacity to extract contextual information from the environment and learn from the CKFs’ representation. We validate the framework by developing four new algorithms that learn using context: Iota deep Q-network (IDQN), Iota double deep Q-network (IDDQN), Iota dueling deep Q-network (IDuDQN), and Iota dueling double deep Q-network (IDDDQN). Furthermore, we evaluate the framework and the new algorithms in five discrete environments. We show that all the algorithms, which use contextual information, converge in around 40,000 training steps of the neural networks, significantly outperforming their state-of-the-art equivalents.

PaperID: 261,

Authors: Qiyuan Zhang, Shu Leng, Xiaoteng Ma, Qihan Liu, Xueqian Wang, Bin Liang, Yu Liu, Jun Yang

Affiliations: School of Mechatronics Engineering, Harbin Institute of Technology, Harbin, China; Department of Automation, Tsinghua University, Beijing, China

Title: CVaR-Constrained Policy Optimization for Safe Reinforcement Learning

Abstract:
Current constrained reinforcement learning (RL) methods guarantee constraint satisfaction only in expectation, which is inadequate for safety-critical decision problems. Since a constraint satisfied in expectation still leaves a high probability of exceeding the cost threshold, solving constrained RL problems with high probabilities of satisfaction is critical for RL safety. In this work, we consider the safety criterion as a constraint on the conditional value-at-risk (CVaR) of cumulative costs, and propose the CVaR-constrained policy optimization algorithm (CVaR-CPO) to maximize the expected return while ensuring agents pay attention to the upper tail of constraint costs. According to the bound on the CVaR-related performance between two policies, we first reformulate the CVaR-constrained problem in augmented state space using the state extension procedure and the trust-region method. CVaR-CPO then derives the optimal update policy by applying the Lagrangian method to the constrained optimization problem. In addition, CVaR-CPO utilizes the distribution of constraint costs to provide an efficient quantile-based estimation of the CVaR-related value function. We conduct experiments on constrained control tasks to show that the proposed method can produce behaviors that satisfy safety constraints, and achieve comparable performance to most safe RL (SRL) methods.
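The gap between an expectation constraint and a CVaR constraint can be seen in a small worked example. This is a minimal sketch using a plain empirical tail-mean estimator, not the quantile-based value-function estimator used by CVaR-CPO; the function name `empirical_cvar`, the sample costs, and the threshold of 5 are hypothetical.

```python
import math
from typing import List


def empirical_cvar(costs: List[float], alpha: float) -> float:
    """Mean of the worst ceil((1 - alpha) * n) sampled costs, i.e. the upper tail."""
    if not costs or not 0.0 <= alpha < 1.0:
        raise ValueError("need at least one cost and 0 <= alpha < 1")
    k = max(1, math.ceil((1.0 - alpha) * len(costs)))
    tail = sorted(costs, reverse=True)[:k]
    return sum(tail) / k


# Nine cheap episodes and one expensive one (hypothetical cumulative costs).
costs = [1.0] * 9 + [11.0]
threshold = 5.0

mean_cost = sum(costs) / len(costs)          # expectation-based criterion
cvar_90 = empirical_cvar(costs, alpha=0.9)   # tail-based criterion

print(mean_cost <= threshold)  # expectation constraint is satisfied
print(cvar_90 <= threshold)    # CVaR constraint is violated
```

With these samples the mean cost sits well below the threshold, yet the worst 10% of episodes exceed it, which is the situation a CVaR constraint is meant to rule out.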

PaperID: 262,

Authors: Susanna Yu Gordleeva, Yuliya Tsybina, Mikhail Krivonosov, Ivan Yu. Tyukin, Victor B. Kazantsev, Alexey Zaikin, Alexander N. Gorban

Affiliations: Department of Neurotechnologies, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia; Department of Mathematics, King’s College London, London, U.K.

Title: Situation-Based Neuromorphic Memory in Spiking Neuron-Astrocyte Network

Abstract:
Mammalian brains operate in very special surroundings: to survive they have to react quickly and effectively to the pool of stimuli patterns previously recognized as danger. Many learning tasks often encountered by living organisms involve a specific set-up centered around a relatively small set of patterns presented in a particular environment. For example, at a party, people recognize friends immediately, without deep analysis, just by seeing a fragment of their clothes. This set-up with reduced “ontology” is referred to as a “situation.” Situations are usually local in space and time. In this work, we propose that neuron-astrocyte networks provide a network topology that is effectively adapted to accommodate situation-based memory. In order to illustrate this, we numerically simulate and analyze a well-established model of a neuron-astrocyte network, which is subjected to stimuli conforming to the situation-driven environment. Three pools of stimuli patterns are considered: external patterns, patterns from the situation associative pool regularly presented to the network and learned by the network, and patterns already learned and remembered by astrocytes. Patterns from the external world are added to and removed from the associative pool. Then, we show that astrocytes are structurally necessary for an effective function in such a learning and testing set-up. To demonstrate this we present a novel neuromorphic computational model for short-term memory implemented by a two-net spiking neural-astrocytic network. Our results show that such a system tested on synthesized data with selective astrocyte-induced modulation of neuronal activity provides an enhancement of retrieval quality in comparison to standard spiking neural networks trained via Hebbian plasticity only. We argue that the proposed set-up may offer a new way to analyze, model, and understand neuromorphic artificial intelligence systems.

PaperID: 263,

Authors: Pengpeng Zeng, Haonan Zhang, Lianli Gao, Xiangpeng Li, Jin Qian, Heng Tao Shen

Affiliations: Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Chengdu, China; Future Media Center and the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China

Title: Visual Commonsense-Aware Representation Network for Video Captioning

Abstract:
Generating consecutive descriptions for videos, that is, video captioning, requires taking full advantage of visual representation along with the generation process. Existing video captioning methods focus on an exploration of spatial–temporal representations and their relationships to produce inferences. However, such methods only exploit the superficial association contained in a video itself without considering the intrinsic visual commonsense knowledge that exists in a video dataset, which may hinder their ability to reason over such knowledge and produce accurate descriptions. To address this problem, we propose a simple yet effective method, called visual commonsense-aware representation network (VCRN), for video captioning. Specifically, we construct a Video Dictionary, a plug-and-play component, obtained by clustering all video features from the total dataset into multiple cluster centers without additional annotation. Each center implicitly represents a visual commonsense concept in a video domain, which is utilized in our proposed visual concept selection (VCS) component to obtain a video-related concept feature. Next, a concept-integrated generation (CIG) component is proposed to enhance caption generation. Extensive experiments on three public video captioning benchmarks (MSVD, MSR-VTT, and VATEX) demonstrate that our method achieves state-of-the-art performance, indicating its effectiveness. In addition, our method is integrated into an existing video question answering (VideoQA) method and improves its performance, which further demonstrates the generalization capability of our method. The source code has been released at https://github.com/zchoi/VCRN.

PaperID: 264,

Authors: Lingbo Liu, Mengmeng Liu, Guanbin Li, Ziyi Wu, Junfan Lin, Liang Lin

Affiliations: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China

Title: Road Network-Guided Fine-Grained Urban Traffic Flow Inference

Abstract:
Accurate inference of fine-grained traffic flow from coarse-grained flow is an emerging yet crucial problem, whose solution can greatly reduce the number of required traffic monitoring sensors and thus save cost. In this work, we note that traffic flow has a high correlation with the road network, which was either completely ignored or simply treated as an external factor in previous works. To address this problem, we propose a novel road-aware traffic flow magnifier (RATFM) that explicitly exploits the prior knowledge of road networks to fully learn the road-aware spatial distribution of fine-grained traffic flow. Specifically, a multidirectional 1-D convolutional layer is first introduced to extract the semantic feature of the road network. Subsequently, we incorporate the road network feature and coarse-grained flow feature to regularize the short-range spatial distribution modeling of road-relative traffic flow. Furthermore, we take the road network feature as a query to capture the long-range spatial distribution of traffic flow with a transformer architecture. Benefiting from the road-aware inference mechanism, our method can generate high-quality fine-grained traffic flow maps. Extensive experiments on three real-world datasets show that the proposed RATFM outperforms state-of-the-art models under various scenarios. Our code and datasets are released at https://github.com/luimoli/RATFM.

PaperID: 265,

Authors: Feng-Lei Fan, Mengzhou Li, Fei Wang, Rongjie Lai, Ge Wang

Affiliations: Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY, USA; AI-Based X-ray Imaging System (AXIS) Laboratory, Biomedical Imaging Center, Rensselaer Polytechnic Institute, Troy, NY, USA; Department of Mathematics, Rensselaer Polytechnic Institute, Troy, NY, USA

Title: On Expressivity and Trainability of Quadratic Networks

Abstract:
Inspired by the diversity of biological neurons, quadratic artificial neurons can play an important role in deep learning models. The type of quadratic neuron of interest here replaces the inner-product operation in the conventional neuron with a quadratic function. Despite the promising results achieved so far by networks of quadratic neurons, important issues remain not well addressed. Theoretically, the superior expressivity of a quadratic network over either a conventional network or a conventional network with quadratic activation is not fully elucidated, which makes the use of quadratic networks not well grounded. In practice, although a quadratic network can be trained via generic backpropagation, it can be subject to a higher risk of collapse than its conventional counterpart. To address these issues, we first apply spline theory and a measure from algebraic geometry to give two theorems that demonstrate better model expressivity of a quadratic network than the conventional counterpart with or without quadratic activation. Then, we propose an effective training strategy referred to as referenced linear initialization (ReLinear) to stabilize the training process of a quadratic network, thereby unleashing its full potential in the associated machine learning tasks. Comprehensive experiments on popular datasets are performed to support our findings and confirm the performance of quadratic deep learning. We have shared our code at https://github.com/FengleiFan/ReLinear.
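To make the idea of replacing the inner product with a quadratic function concrete, here is a minimal sketch of one possible quadratic neuron. The abstract does not give the functional form, so the expression (w_r·x + b_r)(w_g·x + b_g) + w_b·(x⊙x) + c is an assumption, as are all parameter names. The initialization shown (w_g = 0, b_g = 1, w_b = 0, c = 0) is likewise only an illustration of how a quadratic neuron can be made to start out as a conventional linear neuron; the actual ReLinear procedure is not described in the abstract.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def quadratic_neuron(x, w_r, b_r, w_g, b_g, w_b, c) -> float:
    """Assumed form: (w_r.x + b_r) * (w_g.x + b_g) + w_b.(x*x) + c, before activation."""
    squared = [xi * xi for xi in x]
    return (dot(w_r, x) + b_r) * (dot(w_g, x) + b_g) + dot(w_b, squared) + c


x = [1.0, 2.0]
w_r, b_r = [0.5, -1.0], 0.25

# Output of a conventional (inner-product) neuron with the same w_r and b_r.
linear_out = dot(w_r, x) + b_r

# With the quadratic terms switched off, the quadratic neuron reproduces it.
init_out = quadratic_neuron(
    x, w_r, b_r, w_g=[0.0, 0.0], b_g=1.0, w_b=[0.0, 0.0], c=0.0
)

# A generic setting where the quadratic terms are active.
generic_out = quadratic_neuron(
    x, [1.0, 0.0], 0.0, [0.0, 1.0], 0.0, [1.0, 1.0], 1.0
)
```

Starting from a point where the quadratic neuron coincides with a linear one lets training begin from familiar behavior and grow the quadratic terms gradually, which is one plausible reading of why such an initialization would stabilize training.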

PaperID: 266,

Authors: Zheng Zhang, Qingrui Zhang, Bo Zhu, Xiaohan Wang, Tianjiang Hu

Affiliations: School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen, China

Title: EASpace: Enhanced Action Space for Policy Transfer

Abstract:
Formulating expert policies as macro actions promises to alleviate the long-horizon issue via structured exploration and efficient credit assignment. However, traditional option-based multipolicy transfer methods suffer from inefficient exploration of macro action’s length and insufficient exploitation of useful long-duration macro actions. In this article, a novel algorithm named enhanced action space (EASpace) is proposed, which formulates macro actions in an alternative form to accelerate the learning process using multiple available suboptimal expert policies. Specifically, EASpace formulates each expert policy into multiple macro actions with different execution times. All the macro actions are then integrated into the primitive action space directly. An intrinsic reward, which is proportional to the execution time of macro actions, is introduced to encourage the exploitation of useful macro actions. The corresponding learning rule that is similar to intraoption Q-learning is employed to improve the data efficiency. Theoretical analysis is presented to show the convergence of the proposed learning rule. The efficiency of EASpace is illustrated by a grid-based game and a multiagent pursuit problem. The proposed algorithm is also implemented in physical systems to validate its effectiveness.
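The construction described above, where each expert policy becomes several macro actions of different execution times that sit alongside the primitive actions, can be sketched as follows. This is an illustrative sketch only: the `Action` record, the function names, the candidate durations, and the reward scale are hypothetical, and giving primitive actions zero bonus is an assumption. The intraoption-style learning rule is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    primitive: Optional[int]  # index of a primitive action, or None for a macro action
    expert: Optional[int]     # index of the expert policy to follow, or None for a primitive
    duration: int             # number of environment steps the action runs


def build_enhanced_action_space(
    n_primitive: int, n_experts: int, durations: List[int]
) -> List[Action]:
    """Primitive actions plus one macro action per (expert, execution time) pair."""
    actions = [Action(primitive=a, expert=None, duration=1) for a in range(n_primitive)]
    for e in range(n_experts):
        for d in durations:
            actions.append(Action(primitive=None, expert=e, duration=d))
    return actions


def intrinsic_reward(action: Action, scale: float = 0.01) -> float:
    """Bonus proportional to a macro action's execution time; zero for primitives."""
    return scale * action.duration if action.expert is not None else 0.0


space = build_enhanced_action_space(n_primitive=4, n_experts=2, durations=[2, 4, 8])
print(len(space))  # 4 primitive actions + 2 experts x 3 durations
```

Because the macro actions live in the same flat action set as the primitives, a standard value-based agent can select among them directly, and the duration-proportional bonus nudges it toward longer macro actions when they are useful.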

PaperID: 267,

Authors: Shiye Lei, Fengxiang He, Haowen Chen, Dacheng Tao

Affiliations: Sydney AI Centre and the School of Computer Science, Faculty of Engineering, The University of Sydney, Darlington, NSW, Australia; Artificial Intelligence and its Applications Institute, School of Informatics, University of Edinburgh, Edinburgh, U.K.; Department of Mathematics, ETH Zürich, Zürich, Switzerland

Title: Attentive Learning Facilitates Generalization of Neural Networks

Abstract:
This article studies the generalization of neural networks (NNs) by examining how a network changes when trained on a training sample with or without out-of-distribution (OoD) examples. If the network’s predictions are less influenced by fitting OoD examples, then the network learns attentively from the clean training set. A new notion, dataset-distraction stability, is proposed to measure this influence. Extensive CIFAR-10/100 experiments across different architectures (VGG, ResNet, WideResNet, and ViT) and optimizers show a negative correlation between the dataset-distraction stability and generalizability. With the distraction stability, we decompose the learning process on the training set $\mathcal{S}$ into multiple learning processes on the subsets of $\mathcal{S}$ drawn from simpler distributions, i.e., distributions of smaller intrinsic dimensions (IDs), and furthermore, a tighter generalization bound is derived. Through attentive learning, miraculous generalization in deep learning can be explained and novel algorithms can also be designed.

PaperID: 268,

Authors: Jie Lian, Lizhi Wang, He Sun, Hua Huang

Affiliations: School of Computer Science, Beijing Institute of Technology, Beijing, China; Key Laboratory of Computational Optical Imaging Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, Beijing Normal University, Beijing, China

Title: GT-HAD: Gated Transformer for Hyperspectral Anomaly Detection

Abstract:
Hyperspectral anomaly detection (HAD) aims to distinguish between the background and anomalies in a scene, which has been widely adopted in various applications. Deep neural network (DNN)-based methods have emerged as the predominant solution, wherein the standard paradigm is to discern the background and anomalies based on the error of self-supervised hyperspectral image (HSI) reconstruction. However, current DNN-based methods cannot guarantee correspondence between the background, anomalies, and reconstruction error, which limits the performance of HAD. In this article, we propose a novel gated transformer network for HAD (GT-HAD). Our key observation is that the spatial–spectral similarity in HSI can effectively distinguish between the background and anomalies, which aligns with the fundamental definition of HAD. Consequently, we develop GT-HAD to exploit the spatial–spectral similarity during HSI reconstruction. GT-HAD consists of two distinct branches that model the features of the background and anomalies, respectively, with content similarity as constraints. Furthermore, we introduce an adaptive gating unit to regulate the activation states of these two branches based on a content-matching method (CMM). Extensive experimental results demonstrate the superior performance of GT-HAD. The original code is publicly available at https://github.com/jeline0110/GT-HAD, along with a comprehensive benchmark of state-of-the-art HAD methods.

PaperID: 269,

Authors: Yulu Fu, Yuting Li, Qiong Huang, Jinrong Cui, Jie Wen

Affiliations: College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, Shenzhen, China

Title: Anchor Graph Network for Incomplete Multiview Clustering

Abstract:
Incomplete multiview clustering (IMVC) has received extensive attention in recent years. However, existing works still have several shortcomings: 1) some works ignore the correlation of sample pairs in the global structural distribution; 2) many methods are computationally expensive and thus are not applicable to large-scale incomplete data clustering tasks; and 3) some methods ignore the refinement of the bipartite graph structure. To address the above issues, we propose a novel anchor graph network for IMVC, which includes a generative model and a similarity metric network. Concretely, the method uses a generative model to construct bipartite graphs, which can mine latent global structure distributions of sample pairs. We then use a graph convolutional network (GCN) with the constructed bipartite graphs to learn the structural embeddings. Notably, the introduction of bipartite graphs can greatly reduce the computational complexity and thus enable our model to handle large-scale data. Unlike previous works based on bipartite graphs, our method employs bipartite graphs to guide the learning process in GCNs. In addition, an innovative adaptive learning strategy that can construct robust bipartite graphs is incorporated into our method. Extensive experiments demonstrate that our method achieves comparable or superior performance relative to the state-of-the-art methods.

PaperID: 270,

Authors: Lei Zhai, Yitong Li, Zhixi Feng, Shuyuan Yang, Hao Tan

Affiliations: School of Artificial Intelligence, Xidian University, Xi’an, China

Title: Learning Cross-Domain Features With Dual-Path Signal Transformer

Abstract:
The past decade has witnessed the rapid development of deep neural networks (DNNs) for automatic modulation classification (AMC). However, most of the available works learn signal features from only a single domain via DNNs, which is not reliable enough to work in uncertain and complex electromagnetic environments. In this brief, a new cross-domain signal transformer (CDSiT) is proposed for AMC to explore the latent association between different domains of signals. By constructing a signal fusion bottleneck (SFB), CDSiT can implicitly fuse and classify signal features with complementary structures in different domains. Extensive experiments are performed on RadioML2016.10A and RadioML2018.01A, and the results show that CDSiT outperforms its counterparts, particularly for some modulation modes that were previously difficult to classify. Through ablation experiments, we also verify the effectiveness of each module in CDSiT.

PaperID: 271,

Authors: Yu Jiang, Cong Hua, Yifan Feng, Yue Gao

Affiliations: College of Computer Science and Technology, Jilin University, Changchun, China; Key Laboratory for Information System Security, Ministry of Education, Beijing National Research Center for Information Science and Technology, School of Software, Tsinghua University, Beijing, China

Title: Hierarchical Set-to-Set Representation for 3-D Cross-Modal Retrieval

Abstract:
Three-dimensional in-domain retrieval has recently achieved significant success, but 3-D cross-modal retrieval still faces problems and challenges. Existing methods only rely on a simple global feature (GF), which overlooks the local information of complex 3-D objects and the connections between similar local features across complex multimodal instances. To tackle this issue, we propose a hierarchical set-to-set representation (HSR) and a corresponding hierarchical similarity that incorporates global-to-global and local-to-local similarity metrics. Specifically, we employ feature extractors for each modality to learn both GFs and local feature sets. We then project these features into their respective common space and use bilinear pooling to generate compact-set features that maintain the invariant for set-to-set similarity measurement. To facilitate effective hierarchical similarity measurement, we design an operation to combine the GF and the compact-set feature to generate the hierarchical representation for 3-D cross-modal retrieval, which preserves hierarchical similarity measurement. To optimize the framework, we adopt the joint loss functions, including cross-modal center loss (CMCL), mean square loss, and cross-entropy loss, to reduce the cross-modal discrepancy for each instance and minimize the distances between the instances in the same category. Experimental results demonstrate that our method outperforms the state-of-the-art methods on the 3-D cross-modal retrieval task on both ModelNet10 and ModelNet40 datasets.

PaperID: 272,

Authors: Ruiwen Yuan, Yongqiang Tang, Yajing Wu, Wensheng Zhang

Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Clustering Enhanced Multiplex Graph Contrastive Representation Learning

Abstract:
Multiplex graph representation learning has attracted considerable attention due to its powerful capacity to depict multiple relation types between nodes. Previous methods generally learn representations of each relation-based subgraph and then aggregate them into final representations. Despite the enormous success, they commonly encounter two challenges: 1) the latent community structure is overlooked and 2) consistent and complementary information across relation types remains largely unexplored. To address these issues, we propose a clustering-enhanced multiplex graph contrastive representation learning model (CEMR). In CEMR, by formulating each relation type as a view, we propose a multiview graph clustering framework to discover the potential community structure, which promotes representations to incorporate global semantic correlations. Moreover, under the proposed multiview clustering framework, we develop cross-view contrastive learning and cross-view cosupervision modules to explore consistent and complementary information in different views, respectively. Specifically, the cross-view contrastive learning module equipped with a novel negative pairs selecting mechanism enables the view-specific representations to extract common knowledge across views. The cross-view cosupervision module exploits the high-confidence complementary information in one view to guide low-confidence clustering in other views by contrastive learning. Comprehensive experiments on four datasets confirm the superiority of our CEMR when compared to the state-of-the-art rivals.

PaperID: 273,

Authors: Wenqian Dong, Yihan Yang, Jiahui Qu, Yunsong Li, Yufei Yang, Xiuping Jia

Affiliations: State Key Laboratory of Integrated Service Network, Xidian University, Xi’an, China; School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China; School of Engineering and Information Technology, University of New South Wales, Canberra, ACT, Australia

Title: Feature Pyramid Fusion Network for Hyperspectral Pansharpening

Abstract:
Hyperspectral (HS) pansharpening aims at fusing an observed HS image with a panchromatic (PAN) image to produce an image with the high spectral resolution of the former and the high spatial resolution of the latter. Most existing convolutional neural network (CNN)-based pansharpening methods reconstruct the desired high-resolution image from the encoded low-resolution (LR) representation. However, the encoded LR representation captures semantic information of the image and is inadequate for reconstructing fine details. The main objective of this article is to effectively extract high-resolution and LR representations for high-resolution image reconstruction. To this end, we propose a feature pyramid fusion network (FPFNet) for pansharpening, which permits the network to extract multiresolution representations from PAN and HS images in two branches. The PAN branch starts from the high-resolution stream that maintains the spatial resolution of the PAN image and gradually adds LR streams in parallel. The structure of the HS branch remains highly consistent with that of the PAN branch, but starts with the LR stream and gradually adds high-resolution streams. The representations with corresponding resolutions of PAN and HS branches are fused and gradually upsampled in a coarse-to-fine manner to reconstruct the high-resolution HS image. Experimental results on three datasets demonstrate the significant superiority of the proposed FPFNet over the state-of-the-art methods in terms of both qualitative and quantitative comparisons.

PaperID: 274,

Authors: Like Xin, Wanqi Yang, Lei Wang, Ming Yang

Title: Selective Contrastive Learning for Unpaired Multi-View Clustering

Abstract:
In this article, we investigate a novel but insufficiently studied issue, unpaired multi-view clustering (UMC), where no paired observed samples exist in multi-view data, and the goal is to leverage the unpaired observed samples in all views for effective joint clustering. Existing methods in incomplete multi-view clustering usually utilize the sample pairing relationship between views to connect the views for joint clustering, but unfortunately, it is invalid for the UMC case. Therefore, we strive to mine a consistent cluster structure between views and propose an effective method, namely selective contrastive learning for UMC (scl-UMC), which needs to solve the following two challenging issues: 1) uncertain clustering structure under no supervision information and 2) uncertain pairing relationship between the clusters of views. Specifically, for the first one, we design an inner-view (IV) selective contrastive learning module to enhance the clustering structures and alleviate the uncertainty, which selects confident samples near the cluster centroids to perform contrastive learning in each view. For the second one, we design a cross-view (CV) selective contrastive learning module to first iteratively match the clusters between views and then tighten the matched clusters. Also, we utilize mutual information to further enhance the correlation of the matched clusters between views. Extensive experiments show the efficiency of our methods for UMC, compared with the state-of-the-art methods.

PaperID: 275,

Authors: Bohao Li, Chang Liu, Mengnan Shi, Xiaozhong Chen, Xiangyang Ji, Qixiang Ye

Affiliations: School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China; Department of Automation, Tsinghua University, Beijing, China

Title: Proposal Distribution Calibration for Few-Shot Object Detection

Abstract:
Adapting object detectors learned with sufficient supervision to novel classes under low data regimes is charming yet challenging. In few-shot object detection (FSOD), the two-step training paradigm is widely adopted to mitigate the severe sample imbalance, i.e., holistic pre-training on base classes, then partial fine-tuning in a balanced setting with all classes. Since unlabeled instances are suppressed as backgrounds in the base training phase, the learned region proposal network (RPN) is prone to produce biased proposals for novel instances, resulting in dramatic performance degradation. Unfortunately, the extreme data scarcity aggravates the proposal distribution bias, hindering the region of interest (RoI) head from evolving toward novel classes. In this brief, we introduce a simple yet effective proposal distribution calibration (PDC) approach to neatly enhance the localization and classification abilities of the RoI head by recycling its localization ability endowed in base training and enriching high-quality positive samples for semantic fine-tuning. Specifically, we sample proposals based on the base proposal statistics to calibrate the distribution bias and impose additional localization and classification losses upon the sampled proposals for fast expanding the base detector to novel classes. Experiments on the commonly used Pascal VOC and MS COCO datasets with explicit state-of-the-art performances justify the efficacy of our PDC for FSOD. Code is available at github.com/Bohao-Lee/PDC.

PaperID: 276,

Authors: Yu Duan, Feiping Nie, Huimin Chen, Zhanxuan Hu, Rong Wang, Xuelong Li

Affiliations: School of Telecommunications Engineering, Xidian University, Xi’an, China; School of Computer Science, the School of Artificial Intelligence, Optics and Electronics (iOPEN), and the Key Laboratory of Intelligent Interaction and Applications, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi’an, Shaanxi, China; School of Information Science and Technology, Yunnan Normal University, Kunming, China; Institute of Artificial Intelligence (TeleAI), China Telecom, Beijing, China

Title: Hyperbolic Hierarchical Representation Learning for Generalized Category Discovery

Abstract:
This study addresses the problem of generalized category discovery (GCD), an advanced and challenging semi-supervised learning scenario that deals with unlabeled data from both known and novel categories. Although recent research has effectively engaged with this issue, these studies typically map features into Euclidean space, which fails to maintain the latent semantic hierarchy of the training samples effectively. This limitation restricts the exploration of more detailed and rich information and degrades the performance in discovering new categories. The emerging field of hyperbolic representation learning suggests that hyperbolic geometry could be advantageous for extracting semantic information to tackle this problem. Motivated by this, we propose hyperbolic hierarchical representation learning for GCD (HypGCD). Specifically, HypGCD enhances representations in hyperbolic space, building upon the Euclidean space representation from two perspectives: instance-class level and instance-instance level. At the instance-class level, HypGCD endeavors to construct well-defined clusters, with each sample forming a robust hierarchical cluster structure. Concurrently, at the instance-instance level, HypGCD anticipates that a subset of samples will display a tree-like structure in local space, which aligns more closely with real-world scenarios. Finally, HypGCD optimizes the Euclidean and hyperbolic spaces jointly to obtain refined features. Additionally, we show that HypGCD is exceptionally effective, achieving state-of-the-art (SOTA) results on several datasets. The code is available at https://github.com/DuannYu/HypGCD

PaperID: 277,

Authors: Kayol Soares Mayer, Jonathan Aguiar Soares, Ariadne A. Cruz, Dalton Soares Arantes

Affiliations: Department of Communications, School of Electrical and Computer Engineering, Digital Communications Laboratory (ComLab), Universidade Estadual de Campinas (UNICAMP), Campinas, São Paulo, Brazil

Title: Adaptive Learning Rate Methods for Complex-Valued Neural Networks

Abstract:
Artificial neural networks (ANNs) have become a popular tool in digital signal processing (DSP). Among the widespread ANN architectures, complex-valued neural networks (CVNNs) have been extensively studied in image processing and telecommunications. Unlike their real-valued counterparts, CVNNs can handle signals directly in the complex domain. Due to this capability, CVNNs usually exhibit higher accuracy and improved convergence compared to real-valued neural networks (RVNNs). Despite their improved performance in several applications, CVNNs still lag behind RVNNs in terms of learning techniques and heuristics. In this context, we propose adaptive learning rate approaches for CVNNs, extending the well-known adaptive gradient (AdaGrad), root-mean-square propagation (RMSProp), AdaMax, AMSGrad, softplus AMSGrad (SAMSGrad), Nesterov-accelerated adaptive moment estimation (Nadam), and DiffGrad to the complex domain. Computational complexities of the proposed optimizers are analyzed for CVNN architectures. Results are compared in terms of mean-squared-error convergence.
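
As an illustration of what extending an adaptive learning rate method to the complex domain can look like, the following is a minimal sketch of an RMSProp-style update for a single complex weight. It assumes the running second moment is a real number built from the squared magnitude of the complex gradient, which is one common choice and not necessarily the formulation used in the article. The helper name `complex_rmsprop_step`, the toy loss, and all hyperparameter values are hypothetical.

```python
def complex_rmsprop_step(w, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp-style update for a single complex weight (hypothetical helper)."""
    # Assumption: the second-moment estimate is real, built from |grad|^2.
    v = beta * v + (1.0 - beta) * abs(grad) ** 2
    # The complex gradient keeps its phase; only its magnitude is rescaled.
    w = w - lr * grad / (v ** 0.5 + eps)
    return w, v


# Toy problem: minimise |w - target|^2 for a complex scalar w.
target = 1.0 + 2.0j
w, v = 0.0 + 0.0j, 0.0
for _ in range(2000):
    grad = w - target  # derivative of the loss with respect to conj(w)
    w, v = complex_rmsprop_step(w, grad, v)
final_error = abs(w - target)
```

Because the step is normalised by the gradient magnitude, the iterate in this toy setting settles into a small neighbourhood of the target whose size is governed by the learning rate rather than converging exactly.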

PaperID: 278,

Authors: Yuchen Xiao, Lei Yuan, Lihe Li, Ziqian Zhang, Yi-Chen Li, Yang Yu

Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Title: Generalizable Offline Multiobjective Reinforcement Learning via Preference-Conditioned Diffuser

Abstract:
Multiobjective reinforcement learning (MORL) addresses sequential decision-making problems with multiple objectives by learning policies optimized for diverse preferences. While traditional methods necessitate costly online interaction with the environment, recent approaches leverage static datasets containing precollected trajectories, making offline MORL the preferred choice for real-world applications. However, existing offline MORL techniques suffer from limited expressiveness and poor generalization on out-of-distribution (OOD) preferences. To overcome these limitations, we propose diffusion-based MORL (DiffMORL), a generalizable diffusion-based planning framework for MORL. Leveraging the strong expressiveness and generation capability of diffusion models, DiffMORL further boosts its generalization through offline data mixup, which mitigates the memorization phenomenon and facilitates feature learning by data augmentation. By training on the augmented data, DiffMORL is able to condition on a given preference, whether in-distribution or OOD, to plan the desired trajectory and extract the corresponding action. Evaluations conducted on the datasets for MORL (D4MORL) benchmark demonstrate that DiffMORL achieves state-of-the-art results across nearly all tasks. Notably, it surpasses the best baseline on 14 out of 18 metrics for OOD generalization, underscoring its remarkable generalization ability in offline MORL scenarios.

PaperID: 279,

Authors: Martin Krutsky, Gustav Sír

Affiliations: Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University, Prague, Czech Republic

Title: Geometric Deep Learning for the Rubik's Cube Group

Abstract:
The Rubik’s cube, a widely recognized combinatorial puzzle with an astronomically vast state space, has been the subject of various research experiments with neural networks used as heuristic estimators to navigate the state-space exploration. However, prior efforts have overlooked the intriguing symmetries inherent to this domain. Drawing on geometric deep learning principles, this article introduces a novel neural architecture that explicitly leverages these symmetries, grounded in a rigorous group-theoretical analysis. The design of the proposed symmetry-invariant model is then validated empirically through an innovative universal procedure for detecting model symmetry invariance. Finally, experimental results demonstrate that the symmetry-aware neural architecture exhibits enhanced generalization and problem-solving efficacy compared with the state of the art.

PaperID: 280,

Authors: Zechao Li, Hao Tang, Zhimao Peng, Guo-Jun Qi, Jinhui Tang

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; Laboratory for MAchine Perception and Learning (MAPLE Maple-lab.net), Orlando, FL, USA

Title: Knowledge-Guided Semantic Transfer Network for Few-Shot Image Recognition

Abstract:
Deep learning-based models have been shown to outperform human beings in many computer vision tasks with massive available labeled training data in learning. However, humans have an amazing ability to easily recognize images of novel categories by browsing only a few examples of these categories. In this case, few-shot learning comes into being to make machines learn from extremely limited labeled examples. One possible reason why human beings can well learn novel concepts quickly and efficiently is that they have sufficient visual and semantic prior knowledge. Toward this end, this work proposes a novel knowledge-guided semantic transfer network (KSTNet) for few-shot image recognition from a supplementary perspective by introducing auxiliary prior knowledge. The proposed network jointly incorporates vision inferring, knowledge transferring, and classifier learning into one unified framework for optimal compatibility. A category-guided visual learning module is developed in which a visual classifier is learned based on the feature extractor along with the cosine similarity and contrastive loss optimization. To fully explore prior knowledge of category correlations, a knowledge transfer network is then developed to propagate knowledge information among all categories to learn the semantic-visual mapping, thus inferring a knowledge-based classifier for novel categories from base categories. Finally, we design an adaptive fusion scheme to infer the desired classifiers by effectively integrating the above knowledge and visual information. Extensive experiments are conducted on two widely used Mini-ImageNet and Tiered-ImageNet benchmarks to validate the effectiveness of KSTNet. Compared with the state of the art, the results show that the proposed method achieves favorable performance with minimal bells and whistles, especially in the case of one-shot learning.

PaperID: 281,

Authors: Shifei Ding, Wei Du, Ling Ding, Jian Zhang, Lili Guo, Bo An

Affiliations: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China; College of Intelligence and Computing, Tianjin University, Tianjin, China; School of Computer Science and Engineering, Nanyang Technological University, Singapore

Title: Multiagent Reinforcement Learning With Graphical Mutual Information Maximization

Abstract:
Communication learning is an important research direction in the multiagent reinforcement learning (MARL) domain. Graph neural networks (GNNs) can aggregate the information of neighbor nodes for representation learning. In recent years, several MARL methods leverage GNN to model information interactions between agents to coordinate actions and complete cooperative tasks. However, simply aggregating the information of neighboring agents through GNNs may not extract enough useful information, and the topological relationship information is ignored. To tackle this difficulty, we investigate how to efficiently extract and utilize the rich information of neighbor agents as much as possible in the graph structure, so as to obtain high-quality expressive feature representation to complete the cooperation task. To this end, we present a novel GNN-based MARL method with graphical mutual information (MI) maximization to maximize the correlation between input feature information of neighbor agents and output high-level hidden feature representations. The proposed method extends the traditional idea of MI optimization from graph domain to multiagent system, in which the MI is measured from two aspects: agent features information and agent topological relationships. The proposed method is agnostic to specific MARL methods and can be flexibly integrated with various value function decomposition methods. Considerable experiments on various benchmarks demonstrate that the performance of our proposed method is superior to the existing MARL methods.

PaperID: 282,

Authors: Yujiao Hu, Yuan Yao, Jinchao Chen, Zhihao Wang, Qingmin Jia, Yan Pan

Affiliations: School of Data Science and Artificial Intelligence, Chang’an University, Xi’an, China; School of Computer Science, Northwestern Polytechnical University, Xi’an, China; Future Network Research Center, Purple Mountain Laboratories, Nanjing, China; Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, China

Title: Solving Scalable Multiagent Routing Problems With Reinforcement Learning

Abstract:
Multiagent routing problems, arising from practical applications such as logistics, transportation, and emergency response, face challenges due to the exponential growth of the search space with increasing problem scales. This article proposes RouteMaker to address the often-overlooked multiagent routing problems involving multiple dedicated depots. RouteMaker leverages a role-interaction-based graph neural network (RIGNN) to realize effective location assignments and integrates an advanced planner to plan a travel path for each agent. RouteMaker is trained on small-scale problems and can produce approximately optimal solutions that are comparable or superior to those of the best heuristic baselines. Notably, the learned RouteMaker generalizes seamlessly to large-scale problems and real-world problems without the need for fine-tuning, delivering significantly higher quality solutions in relatively less time. For scenarios involving 40 agents and 1000 locations, RouteMaker achieves over 600× speed improvement and more than 88% cost reduction, compared with the representative classical heuristic solver (ORTools).

PaperID: 283,

Authors: Haodong Zhang, Tao Ren, Yifan Wang, Fanchun Meng, Wei Ju, Ying Tian

Affiliations: Software College, Northeastern University, Shenyang, China; School of Information Technology and Management, University of International Business and Economics, Beijing, China; College of Computer Science, Sichuan University, Chengdu, China; Department of Otorhinolaryngology, First Affiliated Hospital of China Medical University, Shenyang, China

Title: Cluster-Aware Few-Shot Molecular Property Prediction With Factor Disentanglement

Abstract:
Molecular property prediction plays a crucial role in drug discovery, but is always challenged by the limited number of effective labels. Compared with existing methods, we argue that the auxiliary properties of the molecule and the heterogeneous structure of different property prediction tasks have always been ignored. In this article, we propose a novel framework termed Meta-DREAM for few-shot molecular property prediction, which tailors to learning the transferable knowledge within different clusters of tasks. Specifically, we first construct a heterogeneous molecule relation graph (HMRG) with molecule–property and molecule–molecule relations to utilize many-to-many correlations between properties and molecules. The meta-learning episode can, then, be reformulated as a subgraph of HMRG. Next, we propose a disentangled graph encoder to explicitly discriminate the underlying factors of the task. In addition, we introduce a soft clustering module to group each factorized task representation into appropriate clusters and preserve knowledge generalization within a cluster and customization among clusters. In this way, each disentangled factor serves as a cluster-aware parameter gate for the task-specific meta-learner. Extensive experiments on five commonly used molecular datasets show that Meta-DREAM consistently outperforms existing state-of-the-art methods and verifies the effectiveness of each module.

PaperID: 284,

Authors: Xiaoyan Kui, Haonan Yan, Qinsong Li, Min Zhang, Liming Chen, Beiji Zou

Affiliations: School of Computer Science and Engineering, Central South University, Changsha, China; Big Data Institute, Central South University, Changsha, China; Department of Mathematics and Computer Science, École, Lyon, France

Title: ChebMixer: Efficient Graph Representation Learning With MLP Mixer

Abstract:
Graph neural networks (GNNs) have achieved remarkable success in learning graph representations, especially graph Transformers, which have recently shown superior performance on various graph mining tasks. However, the graph Transformer generally treats nodes as tokens, which results in quadratic complexity regarding the number of nodes during self-attention computation. The graph multilayer perceptron (MLP) mixer addresses this challenge using the efficient MLP Mixer technique from computer vision. However, the time-consuming process of extracting graph tokens limits its performance. In this article, we present ChebMixer, a novel graph MLP Mixer that uses fast Chebyshev polynomial-based spectral filtering to extract a sequence of tokens. First, we produce multiscale representations of graph nodes via fast Chebyshev polynomial-based spectral filtering. Next, we consider each node’s multiscale representations as a sequence of tokens and refine the node representation with an effective MLP Mixer. Finally, we aggregate the multiscale representations of nodes through Chebyshev interpolation. Owing to the powerful representation capabilities and fast computational properties of the MLP Mixer, we can quickly extract more informative node representations to improve the performance of downstream tasks. The experimental results demonstrate significant improvements in various scenarios, ranging from homogeneous and heterophilic graph node classification to medical image segmentation. Compared with NAGphormer, the average performance improved by 1.45% on homogeneous graphs and 4.15% on heterophilic graphs. Compared with VM-UNet, the average performance improved by 1.39% on medical image segmentation tasks. We will release the source code after this article is accepted.
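
The first step described above, producing multiscale node representations with Chebyshev polynomial filtering, can be sketched with the standard three-term recurrence T_0(L)x = x, T_1(L)x = Lx, T_k(L)x = 2L T_{k-1}(L)x - T_{k-2}(L)x. The sketch below is illustrative only: it assumes a rescaled normalized Laplacian with the largest eigenvalue taken as 2, uses a three-node path graph as toy input, and the function names are hypothetical rather than taken from the ChebMixer code.

```python
import math


def matvec(matrix, vector):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def chebyshev_tokens(l_scaled, signal, order):
    """Return [T_0(L)x, ..., T_order(L)x] using the Chebyshev recurrence."""
    tokens = [list(signal)]
    if order >= 1:
        tokens.append(matvec(l_scaled, signal))
    for _ in range(2, order + 1):
        lt = matvec(l_scaled, tokens[-1])
        tokens.append([2.0 * a - b for a, b in zip(lt, tokens[-2])])
    return tokens


# Path graph 0 - 1 - 2 with degrees (1, 2, 1).
# Assumption: rescaled Laplacian L - I = -D^{-1/2} A D^{-1/2} (lambda_max = 2).
s = 1.0 / math.sqrt(2.0)
l_scaled = [
    [0.0, -s, 0.0],
    [-s, 0.0, -s],
    [0.0, -s, 0.0],
]
signal = [1.0, 0.0, 0.0]  # one scalar feature per node
tokens = chebyshev_tokens(l_scaled, signal, order=2)

# Token sequence of node i: one entry per polynomial order.
node_sequences = [[tokens[k][i] for k in range(len(tokens))] for i in range(3)]
```

Each node ends up with a short sequence of values, one per polynomial order, which is the kind of token sequence a mixer layer could then process. Only sparse matrix-vector products are needed, so no eigendecomposition of the Laplacian is required.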

PaperID: 285,

Authors: Yuanchao Su, Lianru Gao, Antonio Plaza, Xu Sun, Mengying Jiang, Guang Yang

Affiliations: College of Geomatics, Xi’an University of Science and Technology, Xi’an, China; Key Laboratory of Computational Optical Imaging Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; Department of Technology of Computers and Communications, Escuela Politécnica, Hyperspectral Computing Laboratory, University of Extremadura, Cáceres, Spain; Department of Computer and Information Science, University of Macau, Macau, China; Bioengineering/Imperial-X, Imperial College London, London, U.K.

Title: SRViT: Self-Supervised Relation-Aware Vision Transformer for Hyperspectral Unmixing

Abstract:
Vision transformer (ViT) has recently been a popular topic in the foundation model field, taking advantage of its strong scalability and outstanding representation capabilities. As a deep model, ViT introduces a new architecture for achieving hyperspectral image (HSI) unmixing. However, traditional ViTs overlook pixel-level spatial continuity by partitioning the input image into nonoverlapping fixed-size patches. This approach disrupts local structural relationships and hinders the model’s ability to capture fine-grained spatial dependencies, resulting in suboptimal feature representation for dense prediction tasks in unmixing. To address these challenges, this article proposes the development of a self-supervised relation-aware ViT (SRViT). SRViT incorporates a self-embedded module comprising encoders, a pixel-level position encoder (PLPE), a self-supervised contrastive mechanism (SCM), and a decoder. The self-embedded module and PLPE preserve local correlations in HSI across different views, facilitating cross-view learning through SCM to ensure generalization. In addition, the decoder incorporates Kronecker-factored approximate curvature (K-FAC) to capture the local geometric structure of spectral information. Ultimately, SRViT learns endmembers and fractional abundance as the unmixing result. The effectiveness and competitiveness of SRViT have been systematically validated through comparative experiments, demonstrating its superior performance. The source code is available at the following link: https://github.com/yuanchaosu/TNNLS-SRViT

PaperID: 286,

Authors: Lin Zhao, Xiao Chen, Eric Z. Chen, Yikang Liu, Terrence Chen, Shanhui Sun

Affiliations: United Imaging Intelligence, Boston, MA, USA

Title: Retrieval-Augmented Few-Shot Medical Image Segmentation With Foundation Models

Abstract:
Medical image segmentation is crucial for clinical decision-making, but the scarcity of annotated data presents significant challenges. Few-shot segmentation (FSS) methods show promise but often require training on the target domain and struggle to generalize across different modalities. Similarly, adapting foundation models such as the segment anything model (SAM) for medical imaging has limitations, including the need for fine-tuning and domain-specific adaptation. To address these issues, we propose a novel method that adapts DINOv2 and SAM 2 for retrieval-augmented few-shot medical image segmentation. Our approach uses DINOv2’s feature as query to retrieve similar samples from limited annotated data, which are then encoded as memories and stored in memory bank. With the memory attention mechanism of SAM 2, the model leverages these memories as conditions to generate accurate segmentation of the target image. We evaluated our framework on three medical image segmentation tasks, demonstrating superior performance and generalizability across various modalities without the need for any retraining or fine-tuning. Overall, this method offers a practical and effective solution for few-shot medical image segmentation and holds significant potential as a valuable annotation tool in clinical applications.

PaperID: 287,

Authors: Wanyi Ning, Jingyu Wang, Qi Qi, Haifeng Sun, Daixuan Cheng, Cong Liu, Lei Zhang, Zirui Zhuang, Jianxin Liao

Affiliations: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China; China Mobile Research Institute, Beijing, China; China United Network Communications Corporation, Beijing, China

Title: Federated Fine-Tuning on Heterogeneous LoRAs With Error-Compensated Aggregation

Abstract:
Federated learning (FL) has recently been applied to the parameter-efficient fine-tuning (PEFT) of large language models (LLMs). While promising, client resource heterogeneity has imposed the challenge of the “bucket effect” to FL, where model configuration must cater to the client with the fewest resources. To tackle this issue, heterogeneous low-rank adaptation (LoRA) has recently emerged in FL, which enables clients to do local fine-tuning with different LoRA ranks. However, existing works in this area typically adopt zero-padding, stacking, or singular value decomposition (SVD) for LoRA aggregation, which often incur precision loss or significant overhead, limiting their practicality. In this article, we propose ECLoRA, a novel method for federated fine-tuning with heterogeneous LoRA settings across clients. ECLoRA employs randomized SVD (RSVD) to dramatically reduce aggregation overhead while introducing an error compensation (EC) mechanism that incorporates the decomposition error from previous rounds to improve aggregation precision. Extensive experiments on four widely used foundation models across six public tasks demonstrate the effectiveness of ECLoRA. Specifically, ECLoRA is: (1) accurate, significantly improving the final model performance; (2) fast, accelerating convergence with an average speedup of 1.54× to 3.01×; and (3) practical, reducing aggregation time by approximately 40× compared to classical SVD.
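
To make the precision loss mentioned above concrete, the following toy sketch compares two ways of aggregating LoRA updates from clients with different ranks. It assumes that zero-padding is followed by averaging the factor matrices B and A separately, and it compares the result with the average of the full products BA. This only illustrates the aggregation gap; it is not the ECLoRA algorithm, and all matrices and helper names are hypothetical.

```python
def matmul(a, b):
    """Dense matrix product on nested Python lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def zero_pad(b, a, rank):
    """Pad B (d x r) with zero columns and A (r x d) with zero rows up to `rank`."""
    r = len(a)
    b_pad = [row + [0.0] * (rank - r) for row in b]
    a_pad = a + [[0.0] * len(a[0]) for _ in range(rank - r)]
    return b_pad, a_pad


def average(mats):
    """Element-wise mean of equally shaped matrices."""
    n = len(mats)
    return [[sum(m[i][j] for m in mats) / n for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]


# Client 1 uses rank 1, client 2 uses rank 2 (toy 2 x 2 weight update).
b1, a1 = [[1.0], [0.0]], [[1.0, 0.0]]
b2, a2 = [[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 2.0]]
b1p, a1p = zero_pad(b1, a1, rank=2)

# Average of the full updates versus product of the averaged factors.
ideal = average([matmul(b1, a1), matmul(b2, a2)])
naive = matmul(average([b1p, b2]), average([a1p, a2]))
gap = [[ideal[i][j] - naive[i][j] for j in range(2)] for i in range(2)]
```

In this example the two aggregates differ in one entry, because the product of averaged factors is in general not the average of the products. A residual of this kind is the sort of aggregation error that a compensation scheme could carry over to later rounds.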

PaperID: 288,

Authors: Ye Wang, Jingjing Wang, Ruijie Zhu, Hang Fu, Jianrui Chen, C. L. Philip Chen

Affiliations: School of Cyber Science and Technology, Beihang University, Beijing, China; School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: Facilitating Multiagent Coordination Relying on Graph Information Representation

Abstract:
The popular multiagent reinforcement learning (MARL) methods primarily focus on exploring the capability of value functions to facilitate multiagent coordination. These MARL methods, following the centralized training with decentralized execution (CTDE) paradigm, tend to design ingenious network architectures while overlooking the impact of coordination through expanding local observation information. To tackle this deficiency, we model the multiagent systems (MASs) as a graph and use a graph neural network (GNN) to extract rich information between one agent and the others efficiently. Moreover, we propose a multigraph-neural-network information representation (MGIR) method that uses the power of GNN in local observation information extraction, enabling the acquisition of higher quality information. Specifically, multiple GNNs are used during centralized training to characterize the MAS from different perspectives and extract representations of latent variables. During distributed execution, these latent variables are leveraged to expand local observation information. Extensive comparative experiments substantiate that our proposed MGIR demonstrates superior coordination performance when compared with baseline methods. In addition, it can be flexibly integrated into various value function decomposition methods of MARL.

PaperID: 289,

Authors: Changxin Zhang, Xinglong Zhang, Yixing Lan, Hao Gao, Xin Xu

Affiliations: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China

Title: Game-Theoretic Constrained Policy Optimization for Safe Reinforcement Learning

Abstract:
Safe reinforcement learning (RL) aims to optimize the task performance with safety guarantees. One common modeling scheme to study safe RL problems is the constrained Markov decision process (CMDP). However, current safe RL methods within the CMDP framework face challenges in tradeoffs among various objectives and gradient conflicts of policy updating. To cope with these challenges, this article presents a novel safe RL approach called game-theoretic constrained policy optimization (GCPO). The proposed approach formulates the CMDP problem as a general-sum Markov game with multiple players, where a task player seeks to maximize the reward objective, while constraint players aim to minimize constraint objectives until they are fulfilled. By doing so, GCPO adopts the learning mode with multiple subpolicies, each aligned with a distinct objective, that collectively constitute the overall behavior of the agent. The learning convergence of the GCPO can be ensured with the contraction mapping to the Nash equilibrium. Furthermore, a novel dominant timescale update rule is presented for multiplayer policy learning to guarantee constraint satisfaction. The learning convergence and constraint satisfaction of GCPO are theoretically analyzed. Consequently, GCPO eliminates the necessity of tuning tradeoff parameters and mitigates gradient conflicts during multiobjective policy updating. Experimental results show that GCPO outperforms state-of-the-art safe RL algorithms in a quadrotor trajectory tracking task and various high-dimensional robot locomotion benchmarks. Moreover, GCPO exhibits robustness to diverse scales of task rewards and constraint costs without the need for intricate tradeoffs.

PaperID: 290,

Authors: Jie Chen, Hua Mao, Yuanbiao Gou, Xi Peng

Affiliations: College of Computer Science, Sichuan University, Chengdu, China; Department of Computer and Information Sciences, Northumbria University, Newcastle, U.K.

Title: Hierarchical Sparse Representation Clustering for High-Dimensional Data Streams

Abstract:
Data stream clustering reveals patterns within continuously arriving, potentially unbounded data sequences. Numerous data stream algorithms have been proposed to cluster data streams. The existing data stream clustering algorithms still face significant challenges when addressing high-dimensional data streams. First, it is intractable to measure the similarities among high-dimensional data objects via Euclidean distances when constructing and merging microclusters. Second, these algorithms are highly sensitive to the noise contained in high-dimensional data streams. In this article, we propose a hierarchical sparse representation clustering (HSRC) framework for clustering high-dimensional data streams. HSRC first employs a sparse representation-based technique to learn an affinity matrix for data objects in individual landmark windows with a fixed size, where the number of neighboring data objects is automatically selected. The sparse representation-based technique ensures that highly correlated data samples within clusters are grouped together. Then, HSRC applies a spectral clustering technique to the affinity matrix to generate microclusters. These microclusters are subsequently merged into macroclusters based on their sparse similarity degrees (SSDs). In addition, HSRC introduces sparsity residual values (SRVs) to adaptively select representative data objects from the current landmark window. These representatives serve as dictionary samples for the next landmark window. Finally, HSRC refines each macrocluster through fine-tuning. In particular, HSRC enables the detection of outliers in high-dimensional data streams via the associated SRVs. The experimental results obtained on several benchmark datasets demonstrate the effectiveness and robustness of the proposed HSRC framework.

PaperID: 291,

Authors: Lei Deng, Xiao-Yang Liu, Danny H. K. Tsang

Affiliations: Internet of Things Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China; Department of Electrical Engineering, Columbia University, New York, NY, USA

Title: Real-Time Network Latency Estimation With Pretrained Generative Models

Abstract:
Network latency estimation is critical for network performance monitoring and management. However, with the escalating demand for real-time performance monitoring and rapid network adjustments in contemporary networks, existing latency estimation methodologies fall short of meeting the need for instantaneous estimation. In this article, we propose a pretrained generative model-based scheme (PGM) for real-time network latency estimation. PGM operates in two stages. First, we employ a pretrained generative model to relax the low-rank constraint typically associated with latency matrix completion (MC). The pretrained generative model well learns the low-rank characteristics of latency matrices in the pretraining stage and can map a condensed latent representation to the matrix space. Second, instead of directly optimizing the matrix, we turn to optimizing the latent representation. Leveraging the low-rank structure achieved by the pretrained generative model simplifies our optimization process, enabling real-time estimation. We also provide a theoretical recovery guarantee to reveal the error bound of PGM. Experimental results on real-world datasets show that the proposed scheme can achieve accurate latency estimation within 50 ms while maintaining the relative squared error (RSE) of estimation at no more than 0.11 (as evidenced using the PlanetLab dataset).

PaperID: 292,

Authors: Wen Zhang, Jiangpeng Zhao, Lean Yu, Song Wang

Affiliations: College of Economics and Management, Beijing University of Technology, Beijing, China; Business School, Sichuan University, Chengdu, China; Lassonde School of Engineering, York University, Toronto, ON, Canada

Title: GMM Enhanced Anchor-Based Spectral Clustering for Large-Scale Data

Abstract:
Anchor-based methods use anchors to produce an affinity matrix of objects, improving the scalability of traditional spectral clustering (SC). Nevertheless, the membership heterogeneity of objects inside a cluster, which lowers the quality of anchors and hurts the clustering accuracy, is commonly neglected by existing anchor-based algorithms. To address this problem, this article proposes a novel approach that adopts the Gaussian mixture model (GMM) to enhance anchor-based SC for large-scale data in a two-stage divide-and-conquer manner. In the first stage, a GMM fitted with the expectation-maximization (EM) algorithm is employed to divide the objects into two categories, prior-consistent objects and prior-uncertain objects, to account for the membership heterogeneity of objects. In the second stage, anchor-based SC is conducted on the prior-uncertain objects by sampling the anchors from the Gaussian components derived from the first stage. Then, the clusters produced in the second stage are aligned with those Gaussian components by maximizing the membership of objects with respect to clusters. The computational complexity of the proposed GMM-SC approach is much lower than that of anchor-based SC. The experiments on large-scale datasets also validate the superiority of the proposed GMM-SC approach over state-of-the-art techniques.

PaperID: 293,

Authors: Simin Li, Ruixiao Xu, Jingqiao Xiu, Yuwei Zheng, Pu Feng, Yuqing Ma, Bo An, Yaodong Yang, Xianglong Liu

Affiliations: State Key Laboratory of Complex and Critical Software Environment, Beihang University, Beijing, China; State Key Laboratory of Software Development Environment, Beihang University, Beijing, China; School of Computing, National University of Singapore, Singapore; College of Computing and Data Science, Nanyang Technological University, Singapore; Institute of Artificial Intelligence, Peking University, Beijing, China

Title: Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization

Abstract:
In cooperative multi-agent reinforcement learning (MARL), ensuring robustness against cooperative agents making unpredictable or worst-case adversarial actions is crucial for real-world deployment. In multi-agent settings, each agent may be perturbed or unperturbed, leading to an exponential increase in potential threat scenarios as the number of agents grows. Existing robust MARL methods either enumerate or approximate all possible threat scenarios, leading to intense computation and insufficient robustness. In contrast, humans develop robust behaviors by maintaining a general level of caution rather than preparing for every possible threat. Inspired by human decision making, we frame robust MARL as a control-as-inference problem and implicitly optimize worst-case robustness across all threat scenarios through off-policy evaluation. Specifically, we introduce mutual information regularization as robust regularization (MIR3), which maximizes a lower bound on robustness during routine training, serving as a kind of caution for MARL without adversarial inputs. Further insights show that MIR3 acts as an information bottleneck, preventing agents from over-reacting to others and aligning policies with robust action priors. In the presence of worst-case adversaries, MIR3 significantly surpasses baseline methods in robustness and training efficiency while maintaining cooperative performance in StarCraft II, quadrotor swarm control, and robot swarm control. When deploying the robot swarm control algorithm in the real world, our method also outperforms the best baseline by 14.29% in reward. See code and demo videos at https://github.com/DIG-Beihang/MIR3

PaperID: 294,

Authors: Wentao Li, Lingwei Wei, Witold Pedrycz, Weiping Ding, Chao Zhang, Tao Zhan, Shuyin Xia

Affiliations: College of Artificial Intelligence, Southwest University, Chongqing, China; Department of Measurement and Control Systems, Silesian University of Technology (SUT), Gliwice, Poland; School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China; School of Computer and Information Technology, Shanxi University, Taiyuan, China; School of Mathematics and Statistics, Southwest University, Chongqing, China; Key Laboratory of Cyberspace Big Data Intelligent Security Ministry of Education, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: Granular-Ball Regeneration Clustering With Principle of Justifiable Granularity

Abstract:
Classical clustering algorithms such as k-means face limitations in handling clusters with heterogeneous shapes, densities, and sizes, while exhibiting sensitivity to initial centroid selection. To overcome these challenges, this article proposes a novel clustering framework based on regenerated granular balls (RGGB) with the principle of justifiable granularity. Unlike existing granular-ball (GB) techniques that overemphasize purity criteria at the expense of uncontrolled ball sizes, RGGB dynamically adjusts granularity levels through iterative regeneration, achieving an optimal balance between detailed data representation and computational efficiency. This adaptability enhances stability in capturing data similarities while mitigating sensitivity to initialization. To validate the method, we integrate RGGB with a novel k-nearest neighbor (KNN) classifier using regenerated GBs to evaluate classification performance and demonstrate practical applications. Experiments on diverse public and realistic datasets demonstrate that the RGGB-based KNN algorithm consistently outperforms existing techniques, including traditional KNN and other methods, marking a promising advance in clustering and classification tasks.

PaperID: 295,

Authors: Zhen Peng, Yunfan Wang, Qika Lin, Bin Shi, Chen Chen, Bo Dong, Chao Shen

Affiliations: School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China; Department of Computer Science, University of Virginia, Charlottesville, VA, USA; Saw Swee Hock School of Public Health, National University of Singapore, Science Drive , Singapore; School of Distance Education, Xi’an Jiaotong University, Xi’an, China; School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an, China

Title: End-to-End Abnormal Subgraph Detection via Subgraph-Level Contrastive Learning

Abstract:
Abnormal subgraph (AS) detection plays a significant role in ensuring the security of many high-impact domains. Unlike node anomaly detection, identifying subgraph anomalies is extremely challenging due to the exponentially large subgraph space caused by various combinations of nodes and edges. Moreover, in the absence of supervisory signals, how to quantify the abnormality of subgraphs poses another pressing challenge. Traditional methods typically rely on handcrafted subgraph anomaly measures, making it hard to handle potential unknown anomalies with limited prior knowledge. Recent deep learning-based techniques are predominantly designed to discover individual node anomalies, which could be suboptimal for AS detection because they do not consider the collaborative behaviors between nodes in the subgraph. In fact, existing studies have put very little effort into this task, and even dedicated performance evaluation metrics are not yet available. To address the above challenges and promote related research, in this article, we propose an end-to-end unsupervised subgraph anomaly detection framework (EndSubG), which jointly models subgraph partition and AS detection as a whole instead of treating them as two separate stages. Specifically, EndSubG uncovers potential AS boundaries that violate the homophily assumption by modeling the edge existence probability, then achieves anomaly-aware graph embedding and subgraph partition based on the refined topology. By forming a coarsened subgraph network, EndSubG picks out subgraph anomalies by learning the “subgraph-vicinity” matching patterns. Additionally, we design an evaluation metric, weighted normalized mutual information centered on AS (AS-WNMI), specifically for subgraph anomaly detection; it is a variant of vanilla NMI and quantifies detection performance from both subgraph partition and anomaly recognition.
The experimental results on synthetic and real-world datasets corroborate the superiority of EndSubG in terms of area under the curve (AUC), average precision (AP), and AS-WNMI. We also provide an intuitive analysis of the detected subgraphs through visualization for better understanding.

PaperID: 296,

Authors: Hong Liu, Xiuxiu Qiu, Yiming Shi, Miao Xu, Zelin Zang, Zhen Lei

Affiliations: School of Information and Electrical Engineering, Academy of Edge Intelligence, Hangzhou City University, Hangzhou, China; College of Information Engineering, Zhejiang University of Technology, Hangzhou, China; Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China; Centre for Artificial Intelligence and Robotics (CAIR), HKISI-CAS, Hong Kong, China; School of Engineering, Westlake University, Hangzhou, China

Title: Deep Multimanifold Transformation-Based Multivariate Time Series Fault Detection

Abstract:
Unsupervised fault detection in multivariate time series (MTS) plays a vital role in ensuring the stable operation of complex systems. Traditional methods often assume that normal data follow a single Gaussian distribution and identify anomalies as deviations from this distribution. However, this simplified assumption fails to capture the diversity and structural complexity of real-world time series, which can lead to misjudgments and reduced detection performance in practical applications. To address this issue, we propose a new method that combines a neighborhood-driven data augmentation strategy with a multimanifold representation learning framework. By incorporating information from local neighborhoods, the augmentation module can simulate contextual variations of normal data, enhancing the model’s adaptability to distributional changes. In addition, we design a structure-aware feature learning approach that encourages natural clustering of similar patterns in the feature space while maintaining sufficient distinction between different operational states. Extensive experiments on several public benchmark datasets demonstrate that our method achieves superior performance in terms of both accuracy and robustness, showing strong potential for generalization and real-world deployment.

PaperID: 297,

Authors: Heng Zhou, Zhenxi Zhang, Chengyang Li, Chunna Tian, Yongqiang Xie, Zhongbo Li, Xiao-Jun Wu

Affiliations: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China; School of Electronic Engineering, Xidian University, Xi’an, China; College of Artificial Intelligence, China University of Petroleum (Beijing), Beijing, China; Institute of Systems Engineering, AMS, Beijing, China

Title: Deformation-Resilient Multigranularity Learning for Unaligned RGB-T Semantic Segmentation

Abstract:
RGB–Thermal semantic segmentation (SS) aims to combine visible-light and thermal images to determine the semantic category for each pixel and create an object mask. While existing methods typically rely on well-aligned RGB–T image pairs, real-world RGB–T pairs are often unaligned, and pixel-by-pixel alignment is both challenging and time-consuming. To address this critical issue, we introduce a new unaligned RGB–T SS benchmark and propose the deformation-resilient multigranularity learning (DML) method. DML explores the spatial consistency and modal complementarity of RGB–T and mitigates the interference of warped modalities by aligning multimodal features through a coarse-to-fine multigranularity strategy. Specifically, DML constructs a deformation-aware complementary feature enhancer (DCFE), which consists of deformation-aware feature alignment (DFA) and complementary feature aggregation (CFA) modules. DFA enhances the spatial alignment of RGB–T by estimating the deformation field of warped features. Then, CFA aggregates complementary contexts of modal differences across multiple scales to produce deformation-resilient and robust RGB–T feature representations. Finally, we design the multigranularity mask refinement engine (MMFE), which combines class-agnostic saliency prediction (CSP) and class-aware edge generation (CEG) auxiliary tasks to provide useful boundary and positional cues for SS decoders. The MMFE enhances semantic alignment and interclass separability, yielding object masks with sharp boundaries. Quantitative and qualitative experiments on aligned and unaligned datasets validate the effectiveness of the proposed DML, which consistently outperforms existing methods designed for aligned RGB–T data. The new unaligned RGB–T SS benchmark and code are available at https://github.com/VisionVerse/Unaligned-RGBT-Semantic-Segmentation

PaperID: 298,

Authors: Haoen Huang, Zhigang Zeng, Jun Wang

Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China; Department of Computer Science and the Department of Data Science, City University of Hong Kong, Kowloon Tong, Hong Kong

Title: A Trust-Region Projection Neural Network for Nonlinear Programming

Abstract:
The trust-region method and projection neural networks are two branches of optimization approaches with different operational principles and characteristics. In this article, a trust-region projection neural network (TRPNN) is proposed by integrating the trust-region method and projection neural networks. TRPNN is a discrete-time neurodynamic optimization model that inherits the exploration–exploitation capability of the trust-region method and the local search capability of projection neural networks. TRPNN is theoretically proven to be convergent to a Karush–Kuhn–Tucker (KKT) point of nonlinear programming problems. The efficacy of TRPNNs leveraged in a collaborative neurodynamic framework is numerically demonstrated for global optimization in the presence of nonconvexity in objective functions or constraints.

PaperID: 299,

Authors: Hengrong Ju, Yang Lu, Weiping Ding, Wei Zhang, Xibei Yang

Affiliations: School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China; School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China

Title: Multigranularity Information Fused Contrastive Learning With Multiview Clustering

Abstract:
Contrastive learning has emerged as a mainstream approach in multiview clustering (MVC) due to its superior representation learning capabilities. Traditional contrastive multiview learning methods extract both low- and high-level information from raw data. However, only high-level information is utilized for clustering. Since both types of information are essential for effective clustering, this limitation hampers performance. Moreover, effectively quantifying the importance of different views remains a critical challenge in contrastive MVC. Additionally, the absence of structural information during clustering further weakens clustering performance. To address these issues, this article proposes multigranularity (MG) information fused contrastive learning with MVC (MGCMVC). Inspired by the concept of MG, low- and high-level features are reconstructed into fine- and coarse-granularity features. First, an MG adaptive weighting sample-level contrastive learning mechanism is introduced to fuse MG features, enhancing clustering performance and mitigating the degradation caused by variations in view quality. Second, a structure-oriented cluster-level contrastive learning approach is designed to preserve structural information and enforce cross-view clustering consistency. Extensive experiments on ten widely used datasets demonstrate that MGCMVC achieves state-of-the-art performance. The source code is available at https://github.com/Luyangabc/MGCMVC

PaperID: 300,

Authors: Wuque Cai, Hongze Sun, Qianqian Liao, Jiayi He, Duo Chen, Dezhong Yao, Daqing Guo

Affiliations: Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for NeuroInformation, China–Cuba Belt and Road Joint Laboratory on Neurotechnology and Brain-Apparatus Communication, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; School of Artificial Intelligence, Chongqing University of Education, Chongqing, China

Title: Robust Spatiotemporal Prototype Learning for Spiking Neural Networks

Abstract:
Spiking neural networks (SNNs) leverage their spike-driven nature to achieve high energy efficiency, positioning them as a promising alternative to traditional artificial neural networks (ANNs). The spiking decoder, a crucial component for output, significantly affects the performance of SNNs. However, current rate coding schemes for decoding of SNNs often lack robustness and do not have a training framework suitable for robust learning, while alternatives to rate coding generally produce worse overall performance. To address these challenges, we propose spatiotemporal prototype (STP) learning for SNNs, which uses multiple learnable binarized prototypes for distance-based decoding. In addition, we introduce a cotraining framework that jointly optimizes prototypes and model parameters, enabling mutual adaptation of the two components. STP learning clusters feature centers through supervised learning to ensure effective aggregation around the prototypes, while maintaining enough spacing between prototypes to handle noise and interference. This dual capability results in superior stability and robustness. On eight benchmark datasets with diverse challenges, the STP-SNN model achieves performance comparable to or superior to state-of-the-art methods. Notably, STP learning demonstrates exceptional robustness and stability in multitask experiments. Overall, these findings reveal that STP learning is an effective means of improving the performance and robustness of SNNs.
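
Distance-based decoding with several binarized prototypes per class can be pictured with the short sketch below. It is only an illustration under assumptions not stated in the abstract: the output spike pattern is flattened to a binary vector, distance is the Hamming distance, and the predicted class is the one owning the nearest prototype. The function name `decode` and the array shapes are hypothetical.

```python
import numpy as np

def decode(spikes, prototypes):
    """Return the class whose closest prototype is nearest to the spike pattern.

    spikes:     binary vector of shape (D,)
    prototypes: binary array of shape (num_classes, P, D), P prototypes per class
    """
    # Hamming distance from the spike pattern to every prototype: shape (num_classes, P)
    dists = np.sum(spikes[None, None, :] != prototypes, axis=-1)
    # Keep each class's best (smallest) distance, then pick the best class.
    return int(np.argmin(dists.min(axis=1)))

prototypes = np.array([
    [[0, 0, 0, 0], [1, 0, 0, 0]],  # class 0
    [[1, 1, 1, 1], [0, 1, 1, 1]],  # class 1
])
print(decode(np.array([1, 1, 0, 1]), prototypes))
print(decode(np.array([0, 0, 0, 1]), prototypes))
```

Keeping prototypes of different classes far apart in Hamming distance means a few flipped spikes do not change the nearest prototype, which matches the abstract's point about spacing between prototypes.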

PaperID: 301,

Authors: Hongdou Yao, Pengfei Han, Jun Chen, Zheng Wang, Yansheng Qiu, Xiao Wang, Yimin Wang, Xiaoyu Chai, Chenglong Cao, Wei Jin

Affiliations: College of Electronic Engineering, National University of Defense Technology, Hefei, China; School of Cybersecurity and the School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, Shaanxi, China; National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China; School of Information Engineering, Zhongnan University of Economics and Law, Wuhan, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China

Title: MonOri: Orientation-Guided PnP for Monocular 3-D Object Detection

Abstract:
Monocular 3-D object detection is a challenging task in the field of autonomous driving and has made great progress. However, current monocular image methods tend to incorporate additional information such as pseudolabels to improve algorithm performance while overlooking the geometric relationship between the object’s keypoints, resulting in low performance for occluded object detection. To address this issue, we observe that introducing the orientation information of objects into the 3-D detection pipeline can help improve the detection performance of occluded objects. This article presents MonOri, an orientation-guided perspective-n-point (PnP) method for monocular 3-D object detection, which uses the object’s orientation to guide keypoint optimization. Considering that objects in the scene exhibit different deformations, we design the feature aggregation detection module (FADM), which consists of the feature focus fusion module (FFFM) and the CondConv detection module (CCDM). First, FFFM highlights signals from irregularly occluded objects and effectively models features of elongated and small-sized objects, enhancing the model’s ability to recognize such objects in complex scenes. Then, the CCDM is designed to improve the network’s ability to regress object keypoint locations under occlusion while minimizing the network’s computational overhead. Finally, considering that the unoccluded portions of occluded objects are closely related to the orientation of the objects, an orientation-guided keypoints’ selection module (OGKSM) is proposed to enhance the accuracy of keypoint position optimization and of the inference of the object’s spatial location.
Experimental results indicate that the MonOri method achieves competitive results; they also demonstrate that introducing orientation information into the PnP algorithm to estimate the object’s spatial position mitigates the impact of occlusion on object detection, thus improving the recognition rate of occluded objects. Our code is available at https://github.com/DL-YHD/MonOri

PaperID: 302,

Authors: Qing Tian, Jiangsen Yu, Yi Zhao, Wen Li, Zhen Lei

Affiliations: School of Software, Nanjing University of Information Science and Technology, Nanjing, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China

Title: Evidential Deep Learning for Open-Set Active Domain Adaptation

Abstract:
Open-set domain adaptation (OSDA) seeks to transfer knowledge from a labeled source domain to an unlabeled target domain containing novel classes. Traditional OSDA methods rarely account for the uncertainty in predictions and typically require additional training overhead. Evidential deep learning (EDL) transforms the model’s predictions from point estimates to distributions over the probability simplex by replacing the standard softmax output of classification neural networks with Dirichlet distributions. Considering the presence of out-of-distribution novel classes in OSDA and the additional overhead of existing methods, we propose EDL for open-set active domain adaptation (EOSADA). Leveraging EDL, we construct an open-set classifier and employ a two-round selection strategy guided by the data uncertainty of target domain samples and semantic similarity scores with known classes. This strategy balances the selection of samples from known and novel classes while identifying informative samples, thereby maximizing the performance of the model in OSDA scenarios without modifying the model structure and while using only a limited annotation budget. Extensive experiments demonstrate the superiority of our approach.
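
To make the EDL idea concrete, the sketch below shows the commonly used formulation in which network outputs are turned into non-negative evidence, the Dirichlet parameters are evidence plus one, and a scalar uncertainty is the number of classes divided by the Dirichlet strength. This is the generic EDL recipe, not necessarily the exact one used in EOSADA; the ReLU evidence function and the name `dirichlet_from_logits` are assumptions chosen for illustration.

```python
import numpy as np

def dirichlet_from_logits(logits):
    """Map raw network outputs to Dirichlet parameters, expected probabilities,
    and a per-sample uncertainty score (generic EDL formulation).

    logits: array of shape (batch, num_classes)
    """
    evidence = np.maximum(logits, 0.0)           # assumed evidence function: ReLU
    alpha = evidence + 1.0                       # Dirichlet concentration parameters
    strength = alpha.sum(axis=1, keepdims=True)  # total Dirichlet strength S
    probs = alpha / strength                     # expected class probabilities
    uncertainty = logits.shape[1] / strength[:, 0]  # u = K / S, in (0, 1]
    return alpha, probs, uncertainty

logits = np.array([
    [9.0, 0.0, 0.0],  # strong evidence for class 0
    [0.0, 0.0, 0.0],  # no evidence at all
])
alpha, probs, uncertainty = dirichlet_from_logits(logits)
print(probs)
print(uncertainty)
```

A sample with no evidence for any known class receives the maximum uncertainty of 1, which is the kind of signal an open-set classifier or an active-learning selection rule could use to flag candidate novel-class samples.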

PaperID: 303,

Authors: Ge Zhang, Zhenyu Yang, Jia Wu, Pengfei Jiao, Jian Yang

Affiliations: School of Computer Science and Technology, Donghua University, Shanghai, China; School of Computing, Macquarie University, Sydney, NSW, Australia; School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China

Title: Enhancing Graph Neural Networks for Out-of-Distribution Graph Detection

Abstract:
Graph neural networks (GNNs) have shown promise in graph classification tasks, but they struggle to identify out-of-distribution (OOD) graphs often encountered in real-world scenarios, posing a significant obstacle for their open-world deployment. Due to the unpredictable nature of the various distributions to which OOD graphs adhere, the challenge of OOD graph detection lies in enabling models to capture distribution differences between in-distribution (ID) and OOD graphs. Current methods often introduce a subset of OOD patterns, such as synthetic OOD graphs, to facilitate learning the discrimination between ID and OOD graphs. However, these OOD patterns may not sufficiently encapsulate the entire range of OOD graphs, leading to inadequate learning of the distribution differences between ID and OOD graphs. In this article, we propose a novel OOD graph detection algorithm, ODGNN. The ODGNN does not expose GNNs to any OOD patterns during model training, thus reducing bias toward specific types of OOD graph samples and enhancing OOD graph detection. The algorithm differentiates graphs by evaluating whether the input graphs conform to established ID graph class-conditioned distributions. Specifically, during model training, the ODGNN integrates a Gaussian encoder into GNNs to characterize ID graph classes using distinct class-conditioned distributions. During inference, OOD graphs are mapped to a representation space distant from ID graphs due to their divergence from any known class-conditioned distribution. Extensive experiments conducted on real-world datasets validate the effectiveness of the ODGNN in enhancing OOD detection performance across various GNN-based graph classification models. The ODGNN also demonstrates superior performance compared to state-of-the-art OOD graph detection competitors.

PaperID: 304,

Authors: Da Ding, Youquan Wang, Haicheng Tao, Jia Wu, Jie Cao

Affiliations: College of Computer Science and Software Engineering, Hohai University, Nanjing, Jiangsu, China; School of Computer and Artificial Intelligence, Nanjing University of Finance and Economics, Nanjing, Jiangsu, China; School of Computing, Macquarie University, Sydney, NSW, Australia; School of Management, Hefei University of Technology, Hefei, Anhui, China

Title: A Dual-Discriminator Generative Adversarial Network for Anomaly Detection

Abstract:
Multivariate time series anomaly detection has shown potential in various fields, such as finance, aerospace, and security. The fuzzy definition of data anomalies, the complexity of data patterns, and the scarcity of abnormal data samples pose significant challenges to anomaly detection. Researchers have extensively employed autoencoders (AEs) and generative adversarial networks (GANs) in studying time series anomaly detection methods. However, because they rely on reconstruction error, AE-based anomaly detection algorithms need more effective regularization methods and are otherwise susceptible to overfitting. Meanwhile, GAN-based anomaly detection algorithms require high-quality training data, which significantly limits their practical deployment. We propose a novel GAN based on a dual-discriminator structure to address these issues. The model first processes the data with the generator to obtain the reconstruction error and then calculates pseudo-labels to divide the data into two categories. One data category is input into the first discriminator, where a smaller loss between the data and its reconstructed counterpart is better. The other data category is input into the second discriminator, where a larger loss between the data and its reconstructed counterpart is better. Through this process, the model can effectively constrain the generator, retaining information on normal data during data reconstruction while discarding information on abnormal data. In experiments on multiple benchmark datasets, the proposed GAN based on a dual-discriminator structure achieved good results in anomaly detection, outperforming several advanced methods. Additionally, the model performed well on practical transformer data.

PaperID: 305,

Authors: Runze Yang, Longbing Cao, Jianxun Li, Jie Yang

Affiliations: Department of Automation, Shanghai Jiao Tong University, Shanghai, China; School of Computing, Macquarie University, Sydney, NSW, Australia

Title: Variational Hierarchical N-BEATS Model for Long-Term Time-Series Forecasting

Abstract:
Long-term time-series forecasting (LTSF) is gaining increasing attention due to its significant challenges and real-world applications. However, existing studies underexplore the role of hierarchical timestamp information in LTSF. We find this information crucial, as neglecting it may lead to the loss of broader perspectives necessary for understanding hierarchical effects, such as weekly and yearly patterns. Therefore, we propose VH-NBEATS, an interpretable variational hierarchical model that extends the N-BEATS architecture to address the challenges outlined above. VH-NBEATS consists of two blocks: the hierarchical timestamp block and the harmonic seasonal block, which are designed to capture hierarchical seasonal and trending effects. To tackle the high variability often observed in time series, VH-NBEATS incorporates a variational autoencoder (VAE), significantly enhancing the standard deterministic approach. Experiments on seven real-world datasets demonstrate state-of-the-art (SOTA) performance for LTSF. We also show that the hierarchical timestamp block can be plugged into other methods, such as PatchTST, Informer, and DLinear, for better performance.

PaperID: 306,

Authors: Qian Zhao, Gang Li, Bin He, Runjie Shen

Affiliations: Shanghai Research Institute for Intelligent Autonomous Systems and the National Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University, Shanghai, China; College of Electronics and Information Engineering, Tongji University, Shanghai, China

Title: Deep Learning for Low-Light Vision: A Comprehensive Survey

Abstract:
Visual recognition in low-light environments is a challenging problem since degraded images result from the stacking of multiple degradations (noise, low light, blur, etc.). It has received extensive attention from academia and industry in the era of deep learning. Existing surveys focus on low-light image enhancement (LLIE) methods and normal-light visual recognition methods, while few comprehensive surveys cover low-light-related vision tasks. This article provides a comprehensive survey of the latest advancements in low-light vision, including methods, datasets, and evaluation metrics, in two aspects: visual quality-driven and recognition quality-driven. On the visual quality-driven aspect, we survey a large number of very recent LLIE methods. On the recognition quality-driven aspect, we survey low-light object detection techniques in the deep learning era using a more intuitive categorization method. Furthermore, a quantitative benchmarking of different methods is conducted on several widely adopted low-light vision-related datasets. Finally, we discuss the challenges that exist in low-light vision and future directions worth exploring. We provide a public website that will continue to track developments in this promising field.

PaperID: 307,

Authors: Jianan Li, Huan Chen, Wangcai Zhao, Rui Chen, Tingfa Xu

Affiliations: Beijing Institute of Technology, Beijing, China; Shenzhen Haixing Intelligent Driving Technology, Shenzhen, China

Title: Mixed-Granularity Implicit Representation for Continuous Hyperspectral Compressive Reconstruction

Abstract:
Hyperspectral images (HSIs) are crucial across numerous fields but are hindered by the long acquisition times associated with traditional spectrometers. The coded aperture snapshot spectral imaging (CASSI) system mitigates this issue through a compression technique that accelerates the acquisition process. However, reconstructing HSIs from compressed data presents challenges due to fixed spatial and spectral resolution constraints. This study introduces a novel method using implicit neural representation (INR) for continuous HSI reconstruction. We propose the mixed-granularity implicit representation (MGIR) framework, which includes a hierarchical spectral–spatial implicit encoder (HSSIE) for efficient multiscale implicit feature extraction. This is complemented by a mixed-granularity local feature aggregator (MGLFA) that adaptively integrates local features across scales, combined with a decoder that merges coordinate information for precise reconstruction. By leveraging INRs, the MGIR framework enables reconstruction at any desired spatial–spectral resolution, significantly enhancing the flexibility and adaptability of the CASSI system. Extensive experimental evaluations confirm that our model produces reconstructed images at arbitrary resolutions and matches the state-of-the-art methods across varying spectral–spatial compression ratios (CRs). The code will be released at https://github.com/chh11/MGIR.

PaperID: 308,

Authors: Xing Wei, Zhaoxin Ji, Fan Yang, Chong Zhao, Bin Wen, Yang Lu

Affiliations: School of Computer and Information, Hefei University of Technology, Hefei, China

Title: Unsupervised Domain Adaptation via Bidirectional Transmission Generator Self-Training

Abstract:
Unsupervised domain adaptation (UDA) aims to transfer knowledge from the labeled source domain to the fully unlabeled target domain, thus improving the classification performance of the target domain. Recently, self-training methods have shown their effectiveness on UDA. They iteratively train on target data using the generated target pseudo-labels. However, the feature space for generating pseudo-labels contains a large amount of source information, which traps the model in the source domain, making it challenging for the generator to learn discriminative features of the target domain. In this article, we propose a self-training domain adaptation (DA) model with bidirectional transmission generators (BDTGs). Specifically, we design a bidirectional transmission structure for generators, using exponential moving average (EMA) as the bridge between two generators. The structure has two advantages: 1) by transmitting weight parameters to each other during the training process, it promotes the shift of the feature space, thereby alleviating the difficulty of the model in adapting to target domain features and 2) the transmission disturbs the classification boundary and is able to expose unreliable target samples near the boundary. We design a cosine similarity-based filter to identify such samples, to reduce the influence of noisy pseudo-labels with incorrect semantic information on the model. Extensive experiments conducted on five benchmark UDA datasets show that our approach has superior classification performance.
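
The abstract does not give the update rule, so the following is only a minimal sketch of how an EMA bridge between two generators and a cosine similarity score could be written. The function names `ema_update` and `cosine_similarity`, the decay value, and the representation of weights as flat lists are all assumptions made for illustration; they are not the authors' implementation.

```python
import math

def ema_update(target_weights, source_weights, decay=0.99):
    """Move target weights toward source weights with an exponential moving average.

    decay close to 1 keeps most of the target; (1 - decay) is taken from the source.
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(target_weights, source_weights)]

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical bidirectional step: each generator receives weights from the other.
gen_a = [0.0, 1.0]
gen_b = [1.0, 0.0]
gen_a = ema_update(gen_a, gen_b)
gen_b = ema_update(gen_b, gen_a)
```

A filter in this spirit could, for example, discard target samples whose feature similarity to their pseudo-label prototype falls below a chosen threshold; the threshold and the notion of a prototype are likewise assumptions here.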

PaperID: 309,

Authors: Zhewei Zhang, Yujun Cheng, Junyu Shen, Xuejing Li, Shengjin Wang

Affiliations: Department of Electronic Engineering, Tsinghua University, Beijing, China; School of Intelligence Science and Technology (Institute of Artificial Intelligence), University of Science and Technology Beijing, Beijing, China; School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China; School of Safety and Emergency Management Engineering, Taiyuan University of Technology, Taiyuan, China

Title: Probably Approximately Correct Bayes Meta-Learning With Parameterized-Bounded Guarantees

Abstract:
In meta-learning, the learner extracts knowledge from the observed tasks and quickly adapts to unseen future tasks. We provide a novel and rigorously analyzed probably approximately correct Bayes (PAC-Bayes) meta-learning method with parameterized bounds, which learns a posterior distribution from given priors and the data samples. The proposed method is designed to improve generalization stability with tighter bound guarantees. We rigorously prove that the proposed PAC-Bayes bound of the meta-learner is tighter than previous work under a given condition. An explicit theoretical analysis of the generalization errors is also given based on the proposed meta-learning method. Using the proposed bound, we deduce an optimal objective function of the meta-learner that should be minimized during the meta-training process. We validate our theoretical hypothesis by conducting experiments in synthetic and real-world environments for meta-learning. Both rigorous proofs and experimental results reveal that our method yields state-of-the-art performance under a variety of meta-learning tasks in terms of accuracy and uncertainty robustness.

PaperID: 310,

Authors: Jing-Cheng Pang, Tian Xu, Shengyi Jiang, Yu-Ren Liu, Yang Yu

Affiliations: National Key Laboratory for Novel Software Technology and the School of Artificial Intelligence, Nanjing University, Nanjing, China; University of Hong Kong, Hong Kong, China

Title: Reinforcement Learning With Sparse-Executing Action via Sparsity Regularization

Abstract:
Reinforcement learning (RL) has demonstrated impressive performance in decision-making tasks like embodied control, autonomous driving, and financial trading. In many decision-making tasks, the agents often encounter the problem of executing actions under limited budgets. However, classic RL methods typically overlook the challenges posed by such sparse-executing actions. They operate under the assumption that all actions can be taken an unlimited number of times, both in the formulation of the problem and in the development of effective algorithms. To tackle the issue of limited action execution in RL, this article first formalizes the problem as a sparse action Markov decision process (SA-MDP), in which specific actions in the action space can only be executed a limited number of times. Then, we propose a policy optimization algorithm, Action Sparsity REgularization (ASRE), which adaptively handles each action with a distinct preference. ASRE operates through two steps. First, ASRE evaluates action sparsity by constrained action sampling. Following this, ASRE incorporates the sparsity evaluation into policy learning by way of an action distribution regularization. We provide a theoretical analysis that validates the convergence of ASRE to a regularized optimal value function. Experiments on tasks with known sparse-executing actions, where classical RL algorithms struggle to train policies efficiently, show that ASRE effectively constrains the action sampling and outperforms baselines. Moreover, we show that ASRE can generally improve the performance in Atari games, demonstrating its broad applicability.

PaperID: 311,

Authors: Yabin Zhang, Jiehong Lin, Ruihuang Li, Kui Jia, Lei Zhang

Affiliations: Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China

Title: Point-DAE: Denoising Autoencoders for Self-Supervised Point Cloud Learning

Abstract:
Masked autoencoder (MAE) has demonstrated its effectiveness in self-supervised point cloud learning. Considering that masking is a kind of corruption, in this work we explore a more general denoising autoencoder for point cloud learning (Point-DAE) by investigating more types of corruptions beyond masking. Specifically, we degrade the point cloud with certain corruptions as input, and learn an encoder-decoder model to reconstruct the original point cloud from its corrupted version. Three corruption families (i.e., density/masking, noise, and affine transformation) and a total of 14 corruption types are investigated with traditional non-Transformer encoders. Besides the popular masking corruption, we identify another effective corruption family, i.e., affine transformation. The affine transformation disturbs all points globally, which is complementary to the masking corruption where some local regions are dropped. We also validate the effectiveness of affine transformation corruption with the Transformer backbones, where we decompose the reconstruction of the complete point cloud into the reconstructions of detailed local patches and rough global shape, alleviating the position leakage problem in the reconstruction. Extensive experiments on tasks of object classification, few-shot learning, robustness testing, part segmentation, and 3-D object detection validate the effectiveness of the proposed method. The codes are available at https://github.com/YBZh/Point-DAE
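
To make the three corruption families concrete, here is a small sketch with one simple member of each (random point dropping, Gaussian jitter, and a rotation about the z-axis). The function names, parameters, and the choice of these particular corruptions are assumptions for illustration; the abstract does not list the 14 corruption types, and random dropping is used here in place of region-based masking for brevity.

```python
import math
import random

def mask_points(points, drop_ratio, rng):
    """Density/masking corruption: keep a random subset of the points."""
    keep = max(1, round(len(points) * (1.0 - drop_ratio)))
    return rng.sample(points, keep)

def add_noise(points, sigma, rng):
    """Noise corruption: jitter every coordinate with zero-mean Gaussian noise."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]

def rotate_z(points, angle):
    """Affine corruption: rotate all points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

rng = random.Random(0)
cloud = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
masked = mask_points(cloud, 0.5, rng)     # some points removed, the rest untouched
jittered = add_noise(cloud, 0.01, rng)    # every point perturbed slightly
rotated = rotate_z(cloud, math.pi / 2)    # every point moved by one global transform
```

The contrast between `mask_points` (which removes points but leaves the rest in place) and `rotate_z` (which moves every point) mirrors the complementarity the abstract describes. A denoising autoencoder would take a corrupted cloud as input and be trained to reconstruct the original `cloud`.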

PaperID: 312,

Authors: Yiming Yang, Weipeng Hu, Qiaolin He, Haifeng Hu

Affiliations: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China; School of Electrical and Electronic Engineering (EEE), Nanyang Technological University, Jurong West, Singapore

Title: Dynamic Modality-Camera-Invariant Clustering for Unsupervised Visible-Infrared Person Re-Identification

Abstract:
Unsupervised learning visible–infrared person re-identification (USL-VI-ReID) offers a more flexible and cost-effective alternative compared to supervised methods. This field has gained increasing attention due to its promising potential. Existing methods simply cluster modality-specific samples and employ strong association techniques to achieve instance-to-cluster or cluster-to-cluster cross-modality associations. However, they ignore cross-camera differences, leading to noticeable issues with excessive splitting of identities. Consequently, this undermines the accuracy and reliability of cross-modal associations. To address these issues, we propose a novel dynamic modality–camera-invariant clustering (DMIC) framework for USL-VI-ReID. Specifically, our DMIC naturally integrates modality–camera-invariant expansion (MIE), dynamic neighborhood clustering (DNC), and hybrid modality contrastive learning (HMCL) into a unified framework, which eliminates both the cross-modality and cross-camera discrepancies in clustering. MIE fuses intermodal and intercamera distance coding to bridge the gaps between modalities and cameras at the clustering level. DNC employs two dynamic search strategies to refine the network’s optimization objective, transitioning from improving discriminability to enhancing cross-modal and cross-camera generalizability. Moreover, HMCL is designed to optimize instance- and cluster-level distributions. Memories for intramodality and intermodality training are updated using randomly selected samples, facilitating real-time exploration of modality-invariant representations. Extensive experiments have demonstrated that our DMIC addresses the limitations present in current clustering approaches and achieves competitive performance, which significantly reduces the performance gap with supervised methods.

PaperID: 313,

Authors: Wuhao Wang, Zhiyong Chen, Lepeng Zhang

Affiliations: School of Engineering, The University of Newcastle, Callaghan, NSW, Australia; Department of Computer and Information Science, Linköping University, Linköping, Sweden

Title: Multistate Temporal Difference Target for Model-Free Reinforcement Learning

Abstract:
Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value function estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both immediate rewards and the estimated value of subsequent states. We propose an enhanced multistate TD (MSTD) target that utilizes multiple subsequent states for a more accurate value function estimation compared to traditional TD learning, which relies on a single subsequent state. Building on this new MSTD concept, we develop actor-critic algorithms that include the management of replay buffers in two modes and integrate with deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Numerical experiment results demonstrate that algorithms employing the MSTD target improve learning performance compared to traditional methods. In addition, we analyze the convergence of Q-learning with MSTD.
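
For reference, the traditional single-state TD target mentioned above can be written in a few lines. The second function is the classical n-step return, shown only as one well-known way of looking further ahead than one state; the abstract does not define the MSTD target, which may differ from it. Function names and default values are illustrative assumptions.

```python
def td_target(reward, next_value, gamma=0.99, done=False):
    """One-step TD target: immediate reward plus discounted value of the next state."""
    return reward + (0.0 if done else gamma * next_value)

def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Classical n-step return (not the paper's MSTD target).

    Sums the discounted rewards of n consecutive steps and bootstraps from the
    estimated value of the state reached after the last of them.
    """
    target = bootstrap_value
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target

one_step = td_target(reward=1.0, next_value=2.0, gamma=0.5)
two_step = n_step_target([1.0, 1.0], bootstrap_value=4.0, gamma=0.5)
```

With a single reward, `n_step_target` reduces to `td_target`, which shows how the one-step case is the special case that relies on a single subsequent state.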

PaperID: 314,

Authors: Zixin Zhang, Fan Qi, Changsheng Xu

Affiliations: School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Knowledge-Guided Label Distribution Calibration for Federated Affective Computing

Abstract:
The federated learning (FL) paradigm can substantially alleviate the rising public concern about data privacy in affective computing. However, conventional FL methods perform poorly due to the uniqueness of the task, as the personalized emotion data vary from client to client. To resolve the privacy-utility paradox, this work proposes a framework that largely improves federated affective computing (FAC) by calibrating the global feature space and communicating privacy-agnostic auxiliary information. The framework consists of two components. First, an emotion hemisphere (EH) representation structure is proposed, which utilizes emotional prior knowledge to unify the global emotion feature space of different clients. Second, the server uses the normalized parameter importance matrix to guide the model aggregation. It retains crucial parameters for individual local models, thereby alleviating the slow convergence problem in the global model caused by the skewed label distribution. The proposed framework yields significant performance gains, and extensive experiments on three emotion datasets demonstrate the effectiveness and the practicality of our approach.

PaperID: 315,

Authors: Hui Chen, Hengyu Liu, Zhangkai Wu, Xuhui Fan, Longbing Cao

Affiliations: School of Computing, Macquarie University, Sydney, Australia; Department of Computer Science, Aalborg University, Aalborg, Denmark; Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia

Title: FedSI: Federated Subnetwork Inference for Efficient Uncertainty Quantification

Abstract:
While deep neural network (DNN)-based personalized federated learning (PFL) is in demand for addressing data heterogeneity and shows promising performance, existing methods for federated learning (FL) lack efficient systematic uncertainty quantification. Bayesian DNN-based PFL is usually criticized for either oversimplified model structures or high computational and memory costs. In this article, we introduce FedSI, a novel Bayesian DNN-based subnetwork inference (SI) PFL framework. FedSI is simple and scalable, leveraging Bayesian methods to incorporate systematic uncertainties effectively. It implements a client-specific SI mechanism, selects network parameters with large variance to be inferred through posterior distributions, and fixes the rest as deterministic ones. FedSI achieves fast and scalable inference while preserving the systematic uncertainties to the fullest extent. Extensive experiments on four different benchmark datasets demonstrate that FedSI outperforms existing Bayesian and non-Bayesian FL baselines in heterogeneous FL scenarios.

PaperID: 316,

Authors: Wenjue He, Zheng Zhang, Xiaofeng Zhu

Affiliations: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China; School of Computer Science and Technology, Hainan University, Haikou, Hainan, China

Title: Dual-Correlation-Guided Anchor Learning for Scalable Incomplete Multi-View Clustering

Abstract:
Efficiently learning informative yet compact representations from heterogeneous data remains challenging in incomplete multi-view clustering (IMC). The prevalent resource-efficient IMC models excel in constructing small-size anchors for fast similarity learning and data partition. However, existing anchor-based methods still suffer from shared deficiencies: 1) unstable and less informative anchor generation by random anchor selection or clueless learning and 2) imbalanced coherence and versatility capabilities of the learned anchors among different views. To mitigate these issues, we propose a novel dual-correlation-guided anchor learning (DCGA) method for scalable IMC, which learns informative anchor spaces to simultaneously incorporate both intra-view and inter-view correlations. Specifically, the intra-view anchor space is constructed and stabilized by compressing the view-specific data under the guidance of the conceived anchors as a bottleneck (A3B) strategy, with a rigorous theoretical analysis. Importantly, we, for the first time, build an unsupervised anchor learning scheme for incomplete multi-view data under the guidance of the bottleneck of information flow with the well-defined information bottleneck (IB) principle. As such, our model can simultaneously eliminate information redundancy and preserve the versatile knowledge derived from each view. Moreover, to ensure the coherence of the learned anchors, an informative anchor constraint (IAC) is imposed to align the anchor spaces across different views. Extensive experiments on seven datasets against 11 state-of-the-art IMC methods validate the effectiveness and efficiency of our method. Code is available at https://github.com/DarrenZZhang/TNNLS25-DCGA

PaperID: 317,

Authors: Liang-Bo Ning, Zuo-Wei Zhang, Weiping Ding, Dian Shao, Yining Zhu

Affiliations: School of Automation, Northwestern Polytechnical University, Xi’an, China; School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China; Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China; Computer Science Department, Northwestern Polytechnical University, Xi’an, China

Title: Multilevel Distribution Alignment for Multisource Universal Domain Adaptation

Abstract:
The multisource universal domain adaptation (MSUDA) relaxes the constraints between the source and target domains, enabling the transfer of knowledge between domains without any restrictions on the number of source domains and the existence of unknown (private) categories. However, identifying the unknown samples in the target domain is extremely challenging since there are no available samples with the same label in source domains. Another immense challenge lies in extracting domain-invariant features for knowledge transfer since there are distribution discrepancies between each source and target domain. In this article, we propose the multirepresentation DA network (MRDAN) to classify the unlabeled targets by harnessing multiple source domains with nonidentical label sets. First, we propose a threshold-free conflict-based predictions with uncertainty (CPU) module, which comprehensively mines the complementary knowledge from different source domains to identify both known and unknown samples simultaneously. To accurately extract the domain-invariant features for recognizing known and unknown samples, a multilevel distribution alignment (MLDA) strategy is introduced to decrease the distribution discrepancy between multiple domains with nonidentical category spaces progressively. Finally, comprehensive experiments conducted on three commonly used datasets demonstrate the effectiveness of the proposed MRDAN in recognizing both known and unknown samples.

PaperID: 318,

Authors: Na Li, Chunyi Zhou, Yansong Gao, Hui Chen, Zhi Zhang, Boyu Kuang, Anmin Fu

Affiliations: School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China; Department of Computer Science and Software Engineering, The University of Western Australia, Perth, WA, Australia; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Title: Machine Unlearning: Taxonomy, Metrics, Applications, Challenges, and Prospects

Abstract:
Personal digital data is a critical asset, and governments worldwide have enforced laws and regulations to protect data privacy. Data users have been endowed with the “right to be forgotten” (RTBF) of their data. In the course of machine learning (ML), the RTBF requires a model provider to delete user data and its subsequent impact on ML models upon user requests. Machine unlearning (MU) has emerged to address this and has garnered ever-increasing attention from both industry and academia. Specifically, MU allows model providers to eliminate the influence of unlearned data without retraining the model from scratch, ensuring the model behaves as if it never encountered this data. While the area has developed rapidly, there is a lack of comprehensive surveys to capture the latest advancements. Recognizing this shortage, we conduct an extensive exploration to map the landscape of MU, including the (fine-grained) taxonomy of unlearning algorithms under centralized and distributed settings, the debate on approximate unlearning, verification and evaluation metrics, and challenges and solutions across various applications. We also focus on the motivations, challenges, and specific methods for deploying unlearning in large language models (LLMs), as well as the potential attacks targeting unlearning processes. The survey concludes by outlining potential directions for future research, hoping to serve as a beacon for interested scholars.

PaperID: 319,

Authors: Debo Cheng, Jiuyong Li, Lin Liu, Ziqi Xu, Weijia Zhang, Jixue Liu, Thuc Duy Le

Affiliations: UniSA STEM, University of South Australia, Adelaide, SA, Australia; School of Computing Technologies, RMIT University, Melbourne, VIC, Australia; School of Information and Physical Sciences, University of Newcastle, Callaghan, NSW, Australia

Title: Disentangled Representation Learning for Causal Inference With Instruments

Abstract:
Latent confounders are a fundamental challenge for inferring causal effects from observational data. The instrumental variable (IV) approach is a practical way to address this challenge. Existing IV-based estimators need a known IV or other strong assumptions, such as the existence of two or more IVs in the system, which limits the application of the IV approach. In this article, we consider a relaxed requirement, which assumes there is an IV proxy in the system without knowing which variable is the proxy. We propose a variational autoencoder (VAE)-based disentangled representation learning method to learn an IV representation from a dataset with latent confounders and then utilize the IV representation to obtain an unbiased estimation of the causal effect from the data. Extensive experiments on synthetic and real-world data have demonstrated that the proposed algorithm outperforms the existing IV-based estimators and VAE-based estimators.
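
As background for the IV approach, the sketch below simulates data with a latent confounder and compares a naive regression slope with the classical ratio (Wald) IV estimator, which assumes the instrument is known. This is the standard textbook estimator, not the VAE-based method proposed in the paper; the data-generating process, coefficients, and function names are assumptions chosen for illustration.

```python
import random

def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def simulate(n, true_effect, seed=0):
    """Toy data: u confounds treatment w and outcome y; z affects y only through w."""
    rng = random.Random(seed)
    z, w, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                      # latent confounder (unobserved)
        zi = rng.gauss(0.0, 1.0)                     # instrument
        wi = zi + u + rng.gauss(0.0, 0.5)            # treatment
        yi = true_effect * wi + 3.0 * u + rng.gauss(0.0, 0.5)  # outcome
        z.append(zi)
        w.append(wi)
        y.append(yi)
    return z, w, y

def naive_estimate(w, y):
    """Regression slope of y on w; biased when a confounder is omitted."""
    return covariance(w, y) / covariance(w, w)

def iv_estimate(z, w, y):
    """Ratio (Wald) IV estimator: cov(z, y) / cov(z, w)."""
    return covariance(z, y) / covariance(z, w)

z, w, y = simulate(20000, true_effect=2.0)
naive = naive_estimate(w, y)   # pulled away from 2.0 by the confounder
iv = iv_estimate(z, w, y)      # close to the true effect of 2.0
```

The estimator relies on knowing which variable is the instrument, which is exactly the requirement the article relaxes by learning an IV representation instead.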

PaperID: 320,

Authors: Sajad Darabi, Piotr Bigaj, Dawid Majchrowski, Artur Kasymov, Pawel M. Morkisz, Alex Fit-Florea

Affiliations: NVIDIA, Santa Clara, CA, USA

Title: A Framework for Large-Scale Synthetic Graph Dataset Generation

Abstract:
Recently, there has been increasing interest in developing and deploying deep graph learning algorithms for various tasks, such as fraud detection and recommender systems. However, there is a limited number of publicly available graph-structured datasets, most of which are small compared with production-sized applications or limited in their application domain. In this work, we tackle this shortcoming by proposing a synthetic graph generation tool that enables scaling datasets to production-size graphs with trillions of edges and billions of nodes. The proposed method comprises a series of parametric models that can either be randomly initialized or fit to proprietary datasets. These models can then be released to researchers to study graph methods on the synthetic data, facilitating prototype development and novel applications. We demonstrate the generalizability of the framework across various datasets, mimicking their structural and feature distributions, as well as the ability to scale them to varying sizes, demonstrating their usefulness for benchmarking and model development. Code can be found on GitHub.

PaperID: 321,

Authors: Zhiang Liu, Yang Liu, Yongchun Fang

Affiliations: Institute of Robotics and Automatic Information Systems (IRAIS), College of Artificial Intelligence, and Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China

Title: Diffusion Model-Based Path Follower for a Salamander-Like Robot

Abstract:
Salamander-like robots, renowned for their versatile locomotion, present unique challenges in the development of effective path-following controllers due to their distinctive movement patterns and complex body structures. Conventional path-following controllers, while effective for various bionic robots, struggle with the intricate modeling for salamander-like robots and often require laborious manual tuning. Conversely, learning-based methods offer promising alternatives but face issues such as reliance on environmental interactions, short-sighted prediction, and irrational design of state space and reward function. To overcome these limitations, this article proposes a diffusion model-based hierarchical control framework that treats path tracking as a sequence generation problem. The diffusion model’s capability to model joint distributions of state, action, and reward sequences enables it to outperform other learning-based approaches in efficient data utilization, stable training, and long-horizon dependency modeling. Our framework integrates a high-level policy driven by guided diffusion with a low-level controller for parsing commands into executable movements via inverse kinematics, reducing the action space and improving learning efficiency. In addition, we design a more reasonable state space and reward function tailored to the path-following task, addressing shortcomings in prior learning-based controllers. Furthermore, we optimize the diffusion model (DM) by developing lightweight network architectures and incorporating advanced attention mechanisms, to ensure its practical deployment on physical robots with limited computational resources, without compromising performance. Extensive simulations and real-world experiments demonstrate the framework’s effectiveness, efficiency, and robustness in diverse path-following tasks for salamander-like robots, marking a significant advancement in the control of biomimetic robots.

PaperID: 322,

Authors: Bosheng Qin, Juncheng Li, Siliang Tang, Yueting Zhuang

Affiliations: College of Computer Science and Technology, Zhejiang University, Hangzhou, China

Title: DBA: Efficient Transformer With Dynamic Bilinear Low-Rank Attention

Abstract:
Many studies have aimed to improve Transformer model efficiency using low-rank-based methods that compress sequence length with predetermined or learned compression matrices. However, these methods fix compression coefficients for tokens in the same position during inference, ignoring sequence-specific variations. They also overlook the impact of hidden state dimensions on efficiency gains. To address these limitations, we propose dynamic bilinear low-rank attention (DBA), an efficient and effective attention mechanism that compresses sequence length using input-sensitive dynamic compression matrices. DBA achieves linear time and space complexity by jointly optimizing sequence length and hidden state dimension while maintaining state-of-the-art performance. Specifically, we demonstrate through experiments and the properties of low-rank matrices that sequence length can be compressed with compression coefficients dynamically determined by the input sequence. In addition, we illustrate that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, thereby introducing only a small amount of error. DBA optimizes the attention mechanism through bilinear forms that consider both the sequence length and hidden state dimension. Moreover, the theoretical analysis substantiates that DBA excels at capturing high-order relationships in cross-attention problems. Experimental results across different tasks with varied sequence length conditions demonstrate that DBA achieves state-of-the-art performance compared to several robust baselines. DBA also maintains higher processing speed and lower memory usage, highlighting its efficiency and effectiveness across diverse applications.

PaperID: 323,

Authors: Guanghui Zhu, Zhennan Zhu, Hongyang Chen, Chunfeng Yuan, Yihua Huang

Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; Research Center for Data Hub and Security, Zhejiang Lab, Hangzhou, China

Title: HAGNN: Hybrid Aggregation for Heterogeneous Graph Neural Networks

Abstract:
Heterogeneous graph neural networks (GNNs) have been successful in handling heterogeneous graphs. In existing heterogeneous GNNs, the meta-path plays an essential role. However, recent work pointed out that a simple homogeneous graph model without a meta-path can also achieve comparable results, which calls into question the necessity of a meta-path. In this article, we first present the intrinsic difference between meta-path-based and meta-path-free models, i.e., how to select neighbors for node aggregation. Then, we propose a novel framework to utilize the rich type of semantic information in heterogeneous graphs comprehensively, namely, hybrid aggregation for heterogeneous GNNs (HAGNN). The core of HAGNN is to leverage the meta-path neighbors and the directly connected neighbors simultaneously for node aggregations. HAGNN divides the overall aggregation process into two phases: meta-path-based intratype aggregation and meta-path-free intertype aggregation. During the intratype aggregation phase, we propose a new data structure called a fused meta-path graph and perform structural semantic aware aggregation on it. Finally, we combine the embeddings generated by each phase. Compared with existing heterogeneous GNN models, HAGNN can take full advantage of the heterogeneity in heterogeneous graphs. Extensive experimental results on node classification, node clustering, and link prediction tasks show that HAGNN outperforms the existing models, demonstrating the effectiveness and efficiency of HAGNN.

PaperID: 324,

Authors: Xu Chen, Yahong Han, Changlin Li, Xiaojun Chang, Yifan Sun, Yi Yang

Affiliations: College of Intelligence and Computing, Tianjin Key Laboratory of Machine Learning, Tianjin University, Tianjin, China; Australian Artificial Intelligence Institute, University of Technology Sydney, Ultimo, NSW, Australia; Baidu Research, Beijing, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China

Title: A Static-Dynamic Composition Framework for Efficient Action Recognition

Abstract:
The dynamic inference, which adaptively allocates computational budgets for different samples, is a prevalent approach for achieving efficient action recognition. Current studies primarily focus on a data-efficient regime that reduces spatial or temporal redundancy, or their combination, by selecting partial video data, such as clips, frames, or patches. However, these approaches often utilize fixed and computationally expensive networks. From a different perspective, this article introduces a novel model-efficient regime that addresses network redundancy by dynamically selecting a partial network in real time. Specifically, we acknowledge that different channels of the neural network inherently contain redundant semantics either spatially or temporally. Therefore, by decreasing the width of the network, we can enhance efficiency while compromising the feature capacity. To strike a balance between efficiency and capacity, we propose the static-dynamic composition (SDCOM) framework, which comprises a static network with a fixed width and a dynamic network with a flexible width. In this framework, the static network extracts the primary feature with essential semantics from the input frame and simultaneously evaluates the gap toward achieving a comprehensive feature representation. Based on these evaluation results, the dynamic network activates a minimal width to extract a supplementary feature that fills the identified gap. We optimize the dynamic feature extraction through the employment of the slimmable network mechanism and a novel meta-learning scheme introduced in this article. Empirical analysis reveals that by combining the primary feature with an extremely lightweight supplementary feature, we can accurately recognize a large majority of frames (76%–92%). As a result, our proposed SDCOM significantly enhances recognition efficiency. For instance, on ActivityNet, FCVID, and Mini-Kinetics datasets, SDCOM saves 90% of the baseline’s floating point operations (FLOPs) while achieving comparable or superior accuracy when compared with state-of-the-art methods.

PaperID: 325,

Authors: Pingping Pan, Yunjian Zhang, You Li, Yishan Ye, Wei He, Yutao Zhu, Renzhong Guo

Affiliations: Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China; Shenzhen Kaihong Digital Industry Development Company Ltd., Shenzhen, China; School of Opto-Electronic and Communication Engineering, Xiamen University of Technology, Xiamen, China; School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen, China; China Branch of BRICS Institute of Future Networks, Shenzhen, China; Research Institute for Smart Cities and the School of Architecture and Urban Planning, Shenzhen University, Shenzhen, China

Title: Interpretable Optimization-Inspired Deep Network for Off-Grid Frequency Estimation

Abstract:
The accuracy of on-grid frequency estimation methods suffers from the quantization error of discrete grids. In this article, a deep unfolded network for off-grid frequency estimation is proposed, dubbed OGFreq. OGFreq involves two kinds of variables: the batch-oriented dictionary for the frequency-domain transform, and the instance-specific on-grid frequencies and off-grid biases. As the dictionary is required to be universally applicable among all observed signals, network layers are designed and network weights are updated to approximate the transform bases in a data-driven way. Besides, instance-specific on-grid frequencies and off-grid biases are solved by unfolding the iterative soft-threshold algorithm (ISTA). In addition, the instance-specific hyperparameters for sparsity in ISTA are obtained by an encoder-decoder soft-threshold (EDS) module with the attention mechanism. In this way, the dictionary, on-grid frequency, and off-grid bias are learned in a unified data-driven framework. Numerical experiments show that OGFreq obtains a 4% lower false negative rate (FNR) when the SNR is 20 dB. Moreover, the computational complexity is one order of magnitude lower than that of the iteration-based off-grid frequency estimation methods. Finally, the robustness of OGFreq is discussed when extended to impulse noise and damped signals.
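
For readers unfamiliar with the ISTA iterations that the abstract says are unfolded, the following is a minimal sketch of classical ISTA for the real-valued problem min_x 0.5*||Ax - y||^2 + lam*||x||_1. It is not the OGFreq network: the dictionary `A` is fixed rather than learned, there is no off-grid bias, and the names `soft_threshold`, `matvec`, `ista`, `lam`, and `step` are illustrative assumptions.

```python
def soft_threshold(v, t):
    """Elementwise soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return [(abs(x) - t) * (1.0 if x > 0 else -1.0) if abs(x) > t else 0.0
            for x in v]


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def ista(A, y, lam, step, n_iter=200):
    """Classical ISTA; `step` should not exceed 1 / (largest eigenvalue of A^T A)."""
    At = [list(col) for col in zip(*A)]  # transpose of A
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [p - q for p, q in zip(matvec(A, x), y)]  # Ax - y
        grad = matvec(At, residual)                           # A^T (Ax - y)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)],
                           step * lam)
    return x
```

In a deep unfolded network, each loop iteration would typically become one layer, with quantities such as the step size and threshold learned from data instead of fixed by hand.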

PaperID: 326,

Authors: Xinhang Wan, Jiyuan Liu, Hao Yu, Qian Qu, Ao Li, Xinwang Liu, Ke Liang, Zhibin Dong, En Zhu

Affiliations: School of Computer, National University of Defense Technology, Changsha, China; School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China

Title: Contrastive Continual Multiview Clustering With Filtered Structural Fusion

Abstract:
Multiview clustering thrives in applications where views are collected in advance by extracting consistent and complementary information among views. However, it overlooks scenarios where data views are collected sequentially, i.e., real-time data. Due to privacy issues or memory burden, previous views become unavailable over time in these situations. Some methods have been proposed to handle this but are trapped in a stability-plasticity dilemma. Specifically, these methods undergo catastrophic forgetting of prior knowledge when a new view is attained. Such a catastrophic forgetting problem (CFP) makes the consistent and complementary information hard to obtain and degrades the clustering performance. To tackle this, we propose a novel method termed contrastive continual multiview clustering with filtered structural fusion (CCMVC-FSF). Precisely, considering that data correlations play a vital role in clustering and prior knowledge ought to guide the clustering process of a new view, we develop a data buffer to store filtered structural information and utilize it to guide the generation of a robust partition matrix via contrastive learning. Additionally, to address the high complexity involved in acquiring and storing structural information, we propose a sampling strategy called clustering then sample. Furthermore, we theoretically connect CCMVC-FSF with semisupervised learning and knowledge distillation. Extensive experiments exhibit the excellence of the proposed method. Our code is publicly available at https://github.com/wanxinhang/CCMVC-FSF/.

PaperID: 327,

Authors: Heng Tian, Ziqiu Chi, Zhe Wang, Wei Guo, Mengping Yang, Xinlei Xu

Affiliations: Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, and the Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China

Title: Transductive Parameter-Free Propagation Framework for Few-Shot Distribution Rectification

Abstract:
Few-shot learning (FSL) is challenging due to the scarce labeled novel-class data. Researchers have to train the embedding function with auxiliary base-class data to obtain the novel-class embeddings. However, the domain gap makes the novel-class embedding unsatisfactory, as the novel class and the base class are disjoint. Recent studies show that embedding rectification has great potential; they introduce miscellaneous variants that achieve similar performance. Nonetheless, while each method demonstrates unique strengths, they often address distinct challenges in isolation, limiting their applicability in more complex or diverse scenarios. In this article, we take a closer look at these methods and hypothesize that a general embedding rectification framework is more essential to the model’s performance. To verify this hypothesis, we propose: 1) a distribution propagation (DisP) layer that distinguishes the inter-class margin and increases intra-class aggregation, performing the task-level rectification; and 2) a prototype propagation (ProtoP) layer that moves the prototype toward the ideal class center, applying the prototype-query level rectification. Our framework aims to maximize the actual data distribution. Although pseudo-labeling proves effective in achieving this goal, a significant challenge is ensuring the reliable retention of only high-confidence predictions. To overcome this, we introduce a distribution-based pseudo-labeling method, pseudo-query upgrade (PseQUp), that provides more reliable pseudo-labeling samples without relying on confidence scores. We evaluate the proposed method in both transfer learning and meta-learning scenarios. Empirical experiments demonstrate the applicability and plug-and-play ability of the proposed methods.

PaperID: 328,

Authors: Wenqi Yang, Chang Tang, Xinwang Liu, Guanghui Yue, Yuanyuan Liu, Changqing Zhang, En Zhu

Affiliations: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China; School of Software Engineering, Huazhong University of Science and Technology, Wuhan, China; School of Computer, National University of Defense Technology, Changsha, China; School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen, China; School of Computer Science, China University of Geosciences, Wuhan, China; College of Intelligence and Computing, Tianjin University, Tianjin, China

Title: Smooth Multiple Kernel k-Means via Underlying Graph Filtering

Abstract:
Clustering has attracted more and more attention as one of the most fundamental techniques in the field of unsupervised learning. To deal with nonlinear problems, clustering methods have been extended to kernel versions. As a traditional kernel clustering algorithm, multiple kernel k-means (MKKM) aims to learn clustering results from a consensus kernel obtained by optimally combining a set of predefined kernels. However, we observe that the existing MKKM algorithm and its variants insufficiently consider the noise existing in the kernel space and the underlying structure of kernelized data points. To this end, we propose a novel smooth MKKM via underlying graph filtering (SMKKM-UGF) to learn the smooth representations of kernelized data points through their nearby nodes in the underlying graph. In particular, different from the common graph filter, we jointly update the graph filter while learning the smooth kernel, so that the graph filter can be guaranteed to adapt to the updating kernel space constantly. Besides, an iterative algorithm with proven convergence is designed to solve the resultant optimization problem. Extensive experiments have been performed on numerous benchmark datasets, and the results demonstrate the superiority of the proposed SMKKM-UGF over other state-of-the-art clustering methods. The demo code of this work is publicly available at https://github.com/wqyang23/SMKKM-UGF.git.

PaperID: 329,

Authors: Yuhao Zhou, Yuxin Tian, Mingjia Shi, Yuanxi Li, Yanan Sun, Qing Ye, Jiancheng Lv

Affiliations: College of Computer Science and the Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Sichuan University, Chengdu, China; Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA

Title: E-3SFC: Communication-Efficient Federated Learning With Double-Way Features Synthesizing

Abstract:
The exponential growth in model sizes has significantly increased the communication burden in federated learning (FL). Existing methods to alleviate this burden by transmitting compressed gradients often face high compression errors, which slow down the model’s convergence. To simultaneously achieve high compression effectiveness and lower compression errors, we study the gradient compression problem from a novel perspective. Specifically, we propose a systematic algorithm termed extended single-step synthetic features compressing (E-3SFC), which consists of three subcomponents, i.e., the single-step synthetic features compressor (3SFC), a double-way compression (DWC) algorithm, and a communication budget scheduler (BS). First, we regard the process of gradient computation of a model as decompressing gradients from corresponding inputs, while the inverse process is considered as compressing the gradients. Based on this, we introduce a novel gradient compression method termed 3SFC, which utilizes the model itself as a decompressor, leveraging training priors such as model weights and objective functions. The 3SFC compresses raw gradients into tiny synthetic features in a single-step simulation, incorporating error feedback (EF) to minimize overall compression errors. To further reduce communication overhead, 3SFC is extended to E-3SFC, allowing DWC and dynamic communication budget scheduling. Our theoretical analysis under both strongly convex and nonconvex conditions demonstrates that 3SFC achieves linear and sublinear convergence rates with aggregation noise. Extensive experiments across six datasets and six models reveal that 3SFC outperforms the state-of-the-art methods by up to 13.4% while reducing communication costs by 111.6 times. These findings suggest that 3SFC can significantly enhance communication efficiency in FL without compromising model performance.
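
The error feedback (EF) mechanism mentioned in the abstract can be illustrated with a minimal sketch. Note the substitution: the compressor below is a plain top-k sparsifier used as a stand-in, not the synthetic-feature compressor of 3SFC, and the names `top_k` and `ErrorFeedbackCompressor` are hypothetical. The point is only to show how the part of a gradient that compression discards is carried over and added to the next round.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


class ErrorFeedbackCompressor:
    """Stand-in compressor with error feedback (top-k instead of 3SFC)."""

    def __init__(self, dim, k):
        self.residual = [0.0] * dim  # compression error accumulated so far
        self.k = k

    def compress(self, grad):
        # Add back what earlier rounds failed to transmit.
        corrected = [g + r for g, r in zip(grad, self.residual)]
        sent = top_k(corrected, self.k)
        # Remember what was dropped this round for the next call.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return sent
```

With `k=1`, a coordinate that is repeatedly too small to be sent accumulates in `residual` until it becomes the largest entry and is finally transmitted.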

PaperID: 330,

Authors: Bin Sun, Zuxiang Long, Ziyu Ma, Shutao Li

Affiliations: College of Electrical and Information Engineering and the Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha, China

Title: Cascade Fusion and Correlation Enhancement for Knowledge Distillation

Abstract:
Knowledge distillation (KD) improves the performance of a compact student network by transferring learned knowledge from a cumbersome teacher network. In the existing approaches, the multiscale feature knowledge is transferred via densely connected paths, which increases the optimization difficulty. Moreover, correlations among the labels are neglected despite their capability to enhance the intraclass similarity of samples. To solve these issues, we propose cascade fusion and correlation enhancement for KD (CC-KD). The multiscale feature knowledge is transferred via much simpler paths, which are constructed by fusing features of different scales with cross-scale attention (CSA) in a cascade manner, thereby reducing the optimization difficulty. On the other hand, the relational knowledge of teacher logits is further enhanced by correlations of the corresponding labels, so that the student can produce more similar logits for the samples in the same category. Extensive experimental results on five public datasets (i.e., CIFAR100/10, ImageNet, RAF-DB, and FERPlus) indicate superior performance of the proposed method over several state-of-the-art (SOTA) methods. More specifically, our method obtains an accuracy of 71.70% on ImageNet and achieves a new record of 90.20% on RAF-DB with fewer calculations and parameters.
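
As background for the logit-level transfer the abstract builds on, the following is a minimal sketch of the classical temperature-scaled distillation loss (KL divergence between softened teacher and student distributions, scaled by T^2). It is the standard baseline formulation, not CC-KD: it contains neither cross-scale attention nor label-correlation enhancement, and the function names and the default temperature are illustrative assumptions.

```python
import math


def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, T=4.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)  # teacher targets
    q = softmax(student_logits, T)  # student predictions
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The loss is zero when the student reproduces the teacher logits (up to a constant shift) and grows as the two softened distributions diverge.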

PaperID: 331,

Authors: Vittorio Mazzia, Alessandro Pedrani, Andrea Caciolai, Kay Rottmann, Davide Bernardi

Affiliations: Amazon AGI, Amazon, Seattle, WA, USA

Title: A Survey on Knowledge Editing of Neural Networks

Abstract:
Deep neural networks are becoming increasingly pervasive in academia and industry, matching and surpassing human performance in a wide variety of fields and related tasks. However, just like humans, even the largest artificial neural networks (ANNs) make mistakes, and once-correct predictions can become invalid as the world progresses in time. Augmenting datasets with samples that account for mistakes or up-to-date information has become a common workaround in practical applications. However, the well-known phenomenon of catastrophic forgetting poses a challenge in achieving precise changes in the implicitly memorized knowledge of neural network parameters, often requiring a full model retraining to achieve desired behaviors. That is expensive, unreliable, and incompatible with the current trend of large self-supervised pretraining, making it necessary to find more efficient and effective methods for adapting neural network models to changing data. To address this need, knowledge editing (KE) is emerging as a novel area of research that aims to enable reliable, data-efficient, and fast changes to a pretrained target model, without affecting model behaviors on previously learned tasks. In this survey, we provide a brief review of this recent artificial intelligence field of research. We first introduce the problem of editing neural networks, formalize it in a common framework, and differentiate it from better-known branches of research such as continuous learning. Next, we provide a review of the most relevant KE approaches and datasets proposed so far, grouping works under four different families: regularization techniques, meta-learning, direct model editing, and architectural strategies. Finally, we outline some intersections with other fields of research and potential directions for future works.

PaperID: 332,

Authors: Brian B. Moser, Arundhati S. Shanbhag, Federico Raue, Stanislav Frolov, Sebastian Palacio, Andreas Dengel

Affiliations: German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany

Title: Diffusion Models, Image Super-Resolution, and Everything: A Survey

Abstract:
Diffusion models (DMs) have disrupted the image super-resolution (SR) field and further closed the gap between image quality and human perceptual preferences. They are easy to train and can produce very high-quality samples that exceed the realism of those produced by previous generative methods. Despite their promising results, they also come with new challenges that need further research: high computational demands, comparability, lack of explainability, color shifts, and more. Unfortunately, entry into this field is overwhelming because of the abundance of publications. To address this, we provide a unified recount of the theoretical foundations underlying DMs applied to image SR and offer a detailed analysis that underscores the unique characteristics and methodologies within this domain, distinct from broader existing reviews in the field. This article articulates a cohesive understanding of DM principles and explores current research avenues, including alternative input domains, conditioning techniques, guidance mechanisms, corruption spaces, and zero-shot learning approaches. By offering a detailed examination of the evolution and current trends in image SR through the lens of DMs, this article sheds light on the existing challenges and charts potential future directions, aiming to inspire further innovation in this rapidly advancing area.

PaperID: 333,

Authors: Pengyu Zhang, Hao Ju, Weihua He, Yaoyuan Wang, Ziyang Zhang, Shengming Li, Dong Wang, Huchuan Lu, Xu Jia

Affiliations: School of Information and Communication Engineering, Dalian University of Technology, Dalian, China; Department of Precision Instrument, Tsinghua University, Beijing, China; Advanced Computing and Storage Laboratory, Huawei Technologies Company Ltd., Beijing, China; School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, China; School of Artificial Intelligence, Dalian University of Technology, Dalian, China

Title: Event-Assisted Recurrent Network for Arbitrary-Temporal-Scale Blurry Image Unfolding

Abstract:
Recovering a sequence of latent sharp frames from a motion-blurred image is a challenging task. The bio-inspired event camera, which produces an event stream with high temporal resolution, has been exploited to promote the recovery performance. However, recovering sharp sequences with arbitrary temporal scales has been ignored for a long time. Existing works can only recover a fixed number of latent frames from a blurry image once they are trained. In this work, we propose an event-assisted blurry image unfolding framework that can work across arbitrary temporal scales. A bi-directional recurrent network is employed to encode events corresponding to each latent frame, which gathers information over all events in the exposure time. Features of both the blurry image and events are fused together and fed to a bi-directional latent sequence decoder (BiLSD) to produce a sequence of latent sharp frames. Extensive experiments show that the proposed method not only performs favorably against state-of-the-art methods in recovering a fixed number of frames from a blurry image but can be well generalized to arbitrary-temporal-scale blurry image unfolding.

PaperID: 334,

Authors: Yiming Wu, Chenduo Ying, Ning Zheng, Wen-An Zhang, Shanying Zhu

Affiliations: School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China; State Key Laboratory of Industrial Control Technology and the College of Control Science and Engineering, Zhejiang University, Hangzhou, China; Department of Automation, Zhejiang University of Technology, Hangzhou, China; Department of Automation, Shanghai Jiao Tong University, Shanghai, China

Title: Whole-Process Privacy-Preserving and Sybil-Resilient Consensus for Multiagent Networks

Abstract:
This article is concerned with the co-design of a privacy-preserving and resilient consensus protocol for a class of multiagent networks (MANs), where the information exchanges over communication networks among the agents suffer from eavesdropping and Sybil attacks. First, we introduce a new attack model in which an adversarial agent could launch a Sybil attack, generating a large number of spurious entities in the network, thereby gaining disproportionate influence. In this communication framework, a whole-process privacy-preserving mechanism is designed that is capable of protecting both initial and current states of agents. Then, instead of identifying and mitigating Sybil nodes as existing methods require, a degree-based mean-subsequence-reduced (D-MSR) resilient strategy is implemented, showcasing its significant properties: 1) ensuring the effectiveness of the aforementioned privacy protection strategy; 2) allowing the network to contain Sybil nodes without elimination; and 3) reaching consensus among the normal agents. Finally, several numerical simulations are provided to validate the effectiveness of the proposed results.
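
To give a feel for the mean-subsequence-reduced idea that D-MSR builds on, the following is a minimal sketch of one update step of a classical MSR-style rule with equal weights: a node discards up to `f` of the largest neighbor values above its own state and up to `f` of the smallest below it, then averages what remains. It is not the degree-based variant of the article and includes no privacy mechanism; the function name `msr_update` and the parameter `f` are illustrative assumptions.

```python
def msr_update(own, neighbor_values, f):
    """One equal-weight MSR-style step for a single agent.

    own: the agent's current state.
    neighbor_values: states received from neighbors (possibly adversarial).
    f: number of extreme values to discard on each side.
    """
    higher = sorted(v for v in neighbor_values if v > own)
    lower = sorted(v for v in neighbor_values if v < own)
    equal = [v for v in neighbor_values if v == own]
    kept_higher = higher[:max(len(higher) - f, 0)]  # drop the f largest
    kept_lower = lower[min(f, len(lower)):]         # drop the f smallest
    kept = [own] + equal + kept_lower + kept_higher
    return sum(kept) / len(kept)
```

Because the agent always keeps its own state, the update is well defined even when every received value is discarded.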

PaperID: 335,

Authors: Mingyuan Li, Tong Jia, Hao Wang, Bowen Ma, Hui Lu, Shuyang Lin, Da Cai, Dongyue Chen

Affiliations: College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China; College of Information Science and Engineering and the Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang, Liaoning, China

Title: AO-DETR: Anti-Overlapping DETR for X-Ray Prohibited Items Detection

Abstract:
Prohibited item detection in X-ray images is one of the most essential and highly effective methods widely employed in various security inspection scenarios. Considering the significant overlapping phenomenon in X-ray prohibited item images, we propose an anti-overlapping detection transformer (AO-DETR) based on one of the state-of-the-art (SOTA) general object detectors, DETR with improved denoising anchor boxes (DINO). Specifically, to address the feature coupling issue caused by overlapping phenomena, we introduce the category-specific one-to-one assignment (CSA) strategy to constrain category-specific object queries in predicting prohibited items of fixed categories, which can enhance their ability to extract features specific to prohibited items of a particular category from the overlapping foreground-background features. To address the edge blurring problem caused by overlapping phenomena, we propose the look forward densely (LFD) scheme, which improves the localization accuracy of reference boxes in mid-to-high-level decoder layers and enhances the ability to locate blurry edges of the final layer. Similar to DINO, our AO-DETR provides two different versions with distinct backbones, tailored to meet diverse application requirements. Extensive experiments on the PIXray, OPIXray, and HIXray datasets demonstrate that the proposed method surpasses the SOTA object detectors, indicating its potential applications in the field of prohibited item detection. The source code will be available at: https://github.com/Limingyuan001/AO-DETR.

PaperID: 336,

Authors: Yuyan Ruan, Dawei Yang, Ziqi Tang, An-ran Ran, Jiguang Wang, Carol Y. Cheung, Hao Chen

Affiliations: Department of Chemical and Biological Engineering, Division of Life Science, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Sai Kung, Hong Kong, SAR; Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong, SAR; SIAT-HKUST Joint Laboratory of Cell Evolution and Digital Health, HKUST Shenzhen-Hong Kong Collaborative Innovation Research Institute, Shenzhen, China; Department of Computer Science and Engineering and the Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Sai Kung, Hong Kong, SAR

Title: Reference-Based OCT Angiogram Super-Resolution With Learnable Texture Generation

Abstract:
Optical coherence tomography angiography (OCTA) can visualize retinal microvasculature and is important to qualitatively and quantitatively identify potential biomarkers for different retinal diseases. However, the resolution of optical coherence tomography (OCT) angiograms inevitably decreases when increasing the field-of-view (FOV) given a fixed acquisition time. To address this issue, we propose a novel reference-based super-resolution (RefSR) framework to preserve the resolution of the OCT angiograms while increasing the scanning area. Specifically, textures from the normal RefSR pipeline are used to train a learnable texture generator (LTG), which is designed to generate textures according to the input. The key difference between the proposed method and traditional RefSR models is that the textures used during inference are generated by the LTG instead of being searched from a single reference (Ref) image. Since the LTG is optimized throughout the whole training process, the available texture space is significantly enlarged and no longer limited to a single Ref image, but extends to all textures contained in the training samples. Moreover, our proposed LTGNet does not require a Ref image at the inference phase, thereby becoming invulnerable to the selection of the Ref image. Both experimental and visual results show that LTGNet has competitive performance and robustness compared with state-of-the-art methods, indicating good reliability and promise in real-life deployment. The source code is available at https://github.com/RYY0722/LTGNet.

PaperID: 337,

Authors: Xiaoliang Hu, Pengcheng Guo, Yadong Li, Guangyu Li, Zhen Cui, Jian Yang

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Title: TVDO: Tchebycheff Value-Decomposition Optimization for Multiagent Reinforcement Learning

Abstract:
In cooperative multiagent reinforcement learning (MARL), centralized training with decentralized execution (CTDE) has recently attracted more attention due to the physical demand. However, the main dilemma therein is the inconsistency between jointly-trained policies and individually executed actions. In this article, we propose a factorized Tchebycheff value-decomposition optimization (TVDO) method to overcome this inconsistency. In particular, a nonlinear Tchebycheff aggregation function is formulated to realize the global optimum by tightly constraining the upper bound of individual action-value bias, which is inspired by the Tchebycheff method of multiobjective optimization (MOO). We theoretically prove that, under no extra limitations, the factorized value decomposition with Tchebycheff aggregation satisfies the sufficiency and necessity of individual-global-max (IGM), which guarantees the consistency between the global and individual optimal action-value function. Empirically, in the climb and penalty game, we verify that TVDO precisely expresses the global-to-individual value decomposition with a guarantee of policy consistency. Meanwhile, we evaluate TVDO in the StarCraft multiagent challenge (SMAC) benchmark, and extensive experiments demonstrate that TVDO achieves a significant performance superiority over some SOTA MARL baselines.
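
The Tchebycheff method of multiobjective optimization that the abstract cites as inspiration can be sketched as follows. This is the classical weighted Tchebycheff scalarization, max_i w_i * |f_i - z_i|, applied to a toy set of candidate objective vectors; it is not the TVDO aggregation function itself, and the names `tchebycheff`, `pick_best`, `weights`, and `ideal` are illustrative assumptions.

```python
def tchebycheff(objectives, weights, ideal):
    """Weighted Tchebycheff scalarization: the largest weighted gap to the ideal point."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))


def pick_best(candidates, weights, ideal):
    """Return the candidate objective vector with the smallest scalarized value."""
    return min(candidates, key=lambda c: tchebycheff(c, weights, ideal))
```

Because the scalarization penalizes only the worst weighted deviation, minimizing it bounds every individual deviation from above, which is the property the abstract refers to when it speaks of constraining the upper bound of individual bias.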

PaperID: 338,

Authors: Liangming Chen, Long Jin, Mingsheng Shang

Affiliations: Chongqing Key Laboratory of Edge AI Computing, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China

Title: Efficient Loss Landscape Reshaping for Convolutional Neural Networks

Abstract:
Theoretical and empirical evidence highlights a positive correlation between the flatness of loss landscapes around minima and generalization. However, most current approaches that seek to find flat minima either incur high computational costs or struggle to balance generalization, training stability, and convergence. This work proposes reshaping the loss landscape to induce the optimizer toward flat regions, an approach that has negligible computational costs and does not compromise training stability, convergence, or efficiency. We focus on nonlinear, loss-dependent reshaping functions underpinned by theoretical insights to reshape the loss landscape. To design these functions, we first identify where and how these functions should be applied. With the aid of recently developed tools in stochastic optimization, theoretical analysis shows that steepening the low-loss landscape improves the rate of sharp minimum escape while flattening the high- and ultralow-loss landscapes enhances training stability and optimization performance, respectively. Simulations and experiments reveal that the subtly designed reshaping functions not only induce optimizers to find flat minima and improve generalization performance but also stabilize training, promote optimization, and maintain efficiency. Our approach is evaluated on image classification, adversarial robustness, and natural language processing (NLP) tasks and achieves significant improvement in generalization performance with negligible computational cost. We believe that the new perspective introduced in this work will broadly impact the field of deep neural network training. The code is available at https://github.com/LongJin-lab/LLR.

PaperID: 339,

Authors: Wenyan Pan, Wentao Ma, Tongqing Zhou, Shan Zhao, Lichuan Gu, Guolong Shi, Zhihua Xia

Affiliations: School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, China; College of Computer, National University of Defense Technology, Changsha, China; School of Computer and Information Engineering, Hefei University of Technology, Hefei, China; College of Cyber Security, Jinan University, Guangzhou, China

Title: Dual-Decoupling With Frequency-Spatial Domains for Image Manipulation Localization

Abstract:
Leveraging trace-rich features within embedded spaces has been established as effective in image manipulation localization (IML). Nevertheless, the feature of manipulated traces frequently comprises substantial redundant information only loosely related to IML tasks. This complexity has hindered existing methods in fully comprehending the essence of trace features. In light of this challenge, we introduce a novel decoupling representation learning network (DRN) tailored for IML. The DRN excels at decoupling intricate multidomain information and transforming it into representations directly pertinent to IML objectives. This is achieved through a meticulously designed frequency decoupling representation learning module (FDM) and spatial decoupling representation learning module (SDM). Specifically, the FDM operates by acquiring distinct low and high-frequency components to effectively decouple redundant information. The decoupled high-frequency components are then harnessed as intricate trace complements, enhancing the overall aggregation process. In addition, the redundant information is expertly separated into authentic and manipulated representations through the use of channel activation maps in SDM. Through extensive experimentation on three public benchmarks including CASIA, NIST, and Coverage, our method consistently demonstrates superior performance and enhanced robustness compared with existing state-of-the-art methods.

PaperID: 340,

Authors: Yuting Wang, Rong Wang, Feiping Nie, Xuelong Li

Affiliations: School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, China; Institute of Artificial Intelligence (TeleAI), China Telecom Corporation Ltd., Beijing, China

Title: Fast Multiview Semi-Supervised Classification With Optimal Bipartite Graph

Abstract:
As data collection becomes increasingly facile and descriptions of data grow more diverse, exploring heterogeneous multiview data is becoming essential. Extracting valuable insights from vast multiview datasets is profoundly meaningful, as it can leverage the diversity of multiple features to improve classification accuracy. As is well known, semi-supervised learning (SSL) utilizes a limited set of labeled samples to train models when addressing label scarcity. However, although the existing multiview semi-supervised algorithms can accomplish the classification task, they often struggle with high complexity and lack interpretability, so more transparent, low-complexity approaches are worth studying. Besides, the interplay between graph structure and multiview consistency enables a deeper understanding of underlying data patterns, but challenges persist in optimizing the graph and ensuring scalability. In this article, we propose a fast multiview semi-supervised algorithm based on anchor graph (BGFMS), which improves the classification performance. It can significantly reduce the computational complexity by converting the label prediction of the original data into the forecast for a few anchor points and avoids additional processing procedures. Extensive experimental results on a synthetic dataset and different real datasets validate the effectiveness and efficiency of our algorithm.

PaperID: 341,

Authors: Zhenchen Li, Xu Yang, Yanchao Zhang, Shaofeng Zeng, Jingbin Yuan, Jiazheng Liu, Zhiyong Liu, Hua Han

Affiliations: Key Laboratory of Brain Cognition and Brain-Inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Deep Graph Reinforcement Learning for Solving Multicut Problem

Abstract:
The multicut problem, also known as correlation clustering, is a classic combinatorial optimization problem that aims to optimize graph partitioning given only node (dis)similarities on edges. It serves as an elegant generalization for several graph partitioning problems and has found successful applications in various areas such as data mining and computer vision. However, the multicut problem with an exponentially large number of cycle constraints proves to be NP-hard, and existing solvers either suffer from exponential complexity or often give unsatisfactory solutions due to inflexible heuristics driven by hand-designed mechanisms. In this article, we propose a deep graph reinforcement learning method to solve the multicut problem within a combinatorial decision framework involving sequential edge contractions. The customized subgraph neural network adapts to the dynamically edge-contracted graph environment by extracting bilevel connected features from both contracted and original graphs. Our method can learn to infer feasible multicut solutions end-to-end toward optimization of the multicut objective in a data-driven manner. More specifically, by exploring the decision space adaptively, it implicitly gains heuristic knowledge from topological patterns of instances and thereby generates more targeted heuristics overcoming the short-sightedness inherent in the hand-designed ones. During testing, the learned heuristics iteratively contract graphs to construct high-quality solutions within polynomial time. Extensive experiments on synthetic and real-world multicut instances show the superiority of our method over existing combinatorial solvers, while also maintaining a certain level of out-of-distribution generalization ability.
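The sequential edge-contraction framework mentioned above can be illustrated with a minimal sketch. It does not reproduce the learned policy of the paper; as a stand-in it uses a simple hand-designed greedy rule (always contract the most attractive remaining edge), which is exactly the kind of heuristic the abstract aims to improve on. The function names, the sign convention (positive weight means the endpoints prefer to be merged), and the toy graph are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[int, int]


def greedy_contraction_multicut(num_nodes: int, weights: Dict[Edge, float]) -> List[int]:
    """Label nodes by repeatedly contracting the edge with the largest positive weight.

    Assumed convention: weight > 0 is attractive (merge), weight < 0 is repulsive (cut).
    """
    # Each cluster id maps to the original nodes it currently contains.
    members: Dict[int, Set[int]] = {i: {i} for i in range(num_nodes)}
    # Weights between current clusters, keyed by an unordered pair of cluster ids.
    w: Dict[FrozenSet[int], float] = {}
    for (u, v), cost in weights.items():
        if u != v:
            key = frozenset((u, v))
            w[key] = w.get(key, 0.0) + cost

    while w:
        key, best = max(w.items(), key=lambda kv: kv[1])
        if best <= 0:
            break  # no attractive edge left: contracting further cannot help
        a, b = sorted(key)
        members[a] |= members.pop(b)  # contract: merge cluster b into cluster a
        del w[key]
        # Re-attach every edge that touched b to a, summing parallel edges.
        for other in list(w):
            if b in other:
                (c,) = other - {b}
                value = w.pop(other)
                merged = frozenset((a, c))
                w[merged] = w.get(merged, 0.0) + value

    labels = [0] * num_nodes
    for cluster_id, nodes in members.items():
        for node in nodes:
            labels[node] = cluster_id
    return labels


def multicut_cost(weights: Dict[Edge, float], labels: List[int]) -> float:
    """Multicut objective: sum of the weights of edges whose endpoints are separated."""
    return sum(cost for (u, v), cost in weights.items() if labels[u] != labels[v])


# Hypothetical toy instance with four nodes.
toy_weights = {(0, 1): 2.0, (1, 2): -3.0, (2, 3): 1.5, (0, 2): 0.5}
toy_labels = greedy_contraction_multicut(4, toy_weights)
toy_cost = multicut_cost(toy_weights, toy_labels)
```

Because every contraction keeps the partition consistent, any stopping point yields a feasible multicut; a learned policy would replace only the rule that picks which edge to contract next.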

PaperID: 342,

Authors: Can Xu, Le Hui, Jin Xie, Jian Yang

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; School of Electronics and Information and Shaanxi Key Laboratory of Information Acquisition and Processing, Northwestern Polytechnical University, Xi’an, China; State Key Laboratory for Novel Software Technology and the School of Intelligence Science and Technology, Nanjing University, Nanjing, China

Title: Weakly Supervised Object Localization With Progressive Activation Diffusion

Abstract:
Weakly supervised object localization (WSOL) aims to locate objects with only image-level labels. Previous works mainly follow the framework of class activation map (CAM), which discovers the objects by estimating the contribution of each pixel position to the category prediction. However, most of them overlook the pixel-level spatial and semantic contextual correlation, resulting in: 1) limited activation ranges that only highlight the most discriminative parts rather than the entire object and 2) low activation values for some foreground parts, especially regions near the boundary between foreground and background. To alleviate this issue, we propose an activation diffusion network (ADNet) to progressively refine both the range and value of activations on the localization map. Specifically, a context propagation module is first developed to learn the top-down spatial dependency between adjacent feature maps, which helps back-propagate the activation from the discriminative part to its surroundings for more complete objects. Then, a diffusion probability distillation module (DPDM) is proposed, which transfers the pixel-level semantic correlation emerging in the image generation process to the localization map generation in a teacher-student learning manner. This helps boost the value of the activated foreground region and stimulates the value of neighboring inactivated foreground positions to sharpen the object boundary. Experiments on various datasets and backbones demonstrate the superiority of our ADNet over state-of-the-art (SOTA) methods in object localization and segmentation, yielding 82.2% and 62.2% Top-1 Loc on Caltech-UCSD Birds-200-2011 (CUB) and ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) datasets and 76.6% pixel average precision (PxAP) on OpenImages dataset. Qualitative results also show that we can achieve a more complete and consistent activation covering the whole object.

PaperID: 343,

Authors: En-Hui Yang, Shayan Mohajer Hamidi, Linfeng Ye, Renhao Tan, Beverly Yang

Affiliations: Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada; NBK Institute of Mining Engineering, The University of British Columbia, Vancouver, BC, Canada

Title: Conditional Mutual Information Constrained Deep Learning for Classification

Abstract:
The concepts of conditional mutual information (CMI) and normalized CMI (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN) in the output probability distribution space of the DNN, where CMI and the ratio between CMI and NCMI represent the intraclass concentration and interclass separation of the DNN, respectively. By using NCMI to evaluate popular DNNs pretrained over CIFAR-100 and ImageNet in the literature, it is shown that their validation accuracies are more or less inversely proportional to their NCMI values. Based on this observation, the standard deep learning (DL) framework is further modified to minimize the standard cross entropy (CE) function subject to an NCMI constraint, yielding CMI constrained DL (CMIC-DL). A novel alternating learning algorithm is proposed to solve such a constrained optimization problem. Extensive experimental results show that DNNs trained within CMIC-DL outperform the state-of-the-art models trained within the standard DL and other loss functions in the literature in terms of both accuracy and robustness against adversarial attacks. In addition, visualizing the evolution of the learning process through the lens of CMI and NCMI is also advocated.
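As a rough illustration of the intraclass concentration measure, the sketch below estimates CMI empirically from a set of output probability vectors. It relies on the standard identity I(X; Ŷ | Y) = E[KL(P(Ŷ|X) || P(Ŷ|Y))] under the Markov chain Y → X → Ŷ, with P(Ŷ|Y=y) estimated as the mean output distribution of class y. The function names and toy data are assumptions, and NCMI is not computed here because its normalizing term is not specified in the abstract.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats, using the convention 0 * log(0 / q) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def empirical_cmi(probs: List[Sequence[float]], labels: List[Hashable]) -> float:
    """Average KL divergence between each output distribution and its class centroid."""
    by_class: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
    for p, y in zip(probs, labels):
        by_class[y].append(p)

    n = len(probs)
    total = 0.0
    for rows in by_class.values():
        # Class centroid: mean output distribution over samples of this class.
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        # Dividing by n weights each class by its empirical frequency.
        total += sum(kl_divergence(p, centroid) for p in rows) / n
    return total


# Hypothetical toy outputs: class 0 is scattered, class 1 is perfectly concentrated.
toy_probs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5]]
toy_labels = [0, 0, 1, 1]
toy_cmi = empirical_cmi(toy_probs, toy_labels)
```

A smaller value means the outputs of each class cluster more tightly around their class centroid, which is the sense in which CMI measures concentration.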

PaperID: 344,

Authors: Dan Zhang, Tong Zhang, C. L. Philip Chen, Tao Zhang

Affiliations: College of Mechanical and Electrical Engineering, Dalian Minzu University, Dalian, China; Guangdong Provincial Key Laboratory of AI Large Model and Intelligent Cognition, the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Computer Science and Engineering College, South China University of Technology, Guangzhou, China

Title: Broad Learning System Based on Fractional Feature Optimization

Abstract:
The broad learning system (BLS) has demonstrated excellent performance in terms of both speed and accuracy in tasks such as image classification. In BLS, the feature nodes predominantly utilize linear features, and sparse representation is mainly employed in the feature optimization component. The robustness of these features to different data needs to be improved. Although many improved BLS algorithms address feature optimization, none is currently based on fractional calculus. This article proposes BLS-FC, a novel data classification and regression method that seamlessly combines BLS and fractional calculus. Fractional calculus describes the properties of data between integer orders and has memory properties. The fractional Fourier transform (Frft) also carries both time-domain and frequency-domain information. First, Frft is added to the broad learning feature node extraction to enrich the node features, which is called BLS-Frft. Second, fractional calculus is integrated into the BLS-Frft sparse representation feature optimization, and the feature representation capability is enhanced by fractional differential memory. This part is called BLS-FS. Finally, to solve the problem of unstable features of random fractional order subspaces, a fractional order multiscale feature interaction based on BLS-Frft is proposed, which is called BLS-MF. Experimental results across various classification and regression datasets demonstrate the superior performance of the proposed method.

PaperID: 345,

Authors: Jiaxu Leng, Jia Wang, Mengjingcheng Mo, Ji Gan, Wen Lu, Xinbo Gao

Affiliations: Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Software Technology, Dalian University of Technology, Dalian, China; School of Electronic Engineering, Xi'an University of Electronic Science and Technology, Xi'an, China

Title: Difficulty-Guided Variant Degradation Learning for Blind Image Super-Resolution

Abstract:
Recent blind super-resolution (BSR) methods have been explored to handle unknown degradations and achieve impressive performance. However, the prevailing assumption in most BSR methods is the spatial invariance of degradation kernels across the entire image, which leads to significant performance declines when faced with spatially variant degradations caused by object motion or defocusing. Additionally, these methods do not account for the human visual system's tendency to focus differently on areas of varying perceptual difficulty, as they uniformly process each pixel during reconstruction. To cope with these issues, we propose a difficulty-guided variant degradation learning network for BSR, named difficulty-guided degradation learning (DDL)-BSR, which explores the relationship between reconstruction difficulty and degradation estimation. Accordingly, the proposed DDL-BSR consists of three customized networks: reconstruction difficulty prediction (RDP), space-variant degradation estimation (SDE), and degradation and difficulty-informed reconstruction (DDR). Specifically, RDP learns the reconstruction difficulty with the proposed reconstruction-distance supervision. Then, SDE is designed to estimate space-variant degradation kernels according to the difficulty map. Finally, both degradation kernels and reconstruction difficulty are fed into DDR, which uses these two kinds of prior knowledge to guide super-resolution (SR). Experimental analysis on various synthetic datasets demonstrates that DDL-BSR consistently surpasses state-of-the-art (SOTA) methods, producing SR images with enhanced realism and texture quality. Code is available at https://github.com/JiaWang0704/DDL-BSR.

PaperID: 346,

Authors: Zizhou Wang, Yan Wang, Yangqin Feng, Jiawei Du, Yong Liu, Rick Siow Mong Goh, Liangli Zhen

Affiliations: Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Fusionopolis Way, Singapore

Title: Continuous Disentangled Joint Space Learning for Domain Generalization

Abstract:
Domain generalization (DG) aims to learn a model on one or multiple observed source domains that can generalize to unseen target test domains. Previous approaches have focused on extracting domain-invariant information from multiple source domains, but domain-specific information is also closely tied to semantics in individual domains and is not well-suited for generalization to the target domain. In this article, we propose a novel DG method called continuous disentangled joint space learning (CJSL), which leverages both domain-invariant and domain-specific information for more effective DG. The key idea behind CJSL is to formulate and learn a continuous joint space (CJS) for domain-specific representations from source domains through iterative feature disentanglement. This learned CJS can then be used to simulate domain-specific representations for test samples from a mixture of multiple domains via Monte Carlo sampling during the inference stage. Unlike existing approaches, which exploit domain-invariant feature vectors only or aim to learn a universal domain-specific feature extractor, we simulate domain-specific representations via sampling the latent vectors in the learned CJS for the test sample to fully use the power of multiple domain-specific classifiers for robust prediction. Empirical results demonstrate that CJSL outperforms 19 state-of-the-art (SOTA) methods on seven benchmarks, indicating the effectiveness of our proposed method.

PaperID: 347,

Authors: Yan Xiao, Yaochu Jin, Bin Wang, Yan Zhang, Kuangrong Hao, Haizhou Li

Affiliations: College of Information Engineering, Shanghai Maritime University, Shanghai, China; School of Engineering, Westlake University, Hangzhou, China; Department of Electrical and Computer Engineering, National University of Singapore, Queenstown, Singapore; Engineering Research Center of Digitized Textile and Apparel Technology, Ministry of Education, College of Information Science and Technology, Donghua University, Shanghai, China

Title: Zero-Shot Relation Classification Through Inference on Category Attributes

Abstract:
The goal of relationship classification (RC) is to predict the semantic relationship between two entities in a given sentence. With the advent of deep learning and pretrained language models, RC research has progressed by leaps and bounds. However, the current studies are focused mainly on predicting semantic relationships from a predefined set. How to recognize unseen relationships remains a challenge, which is also known as the zero-shot RC (ZSRC) task. Some ZSRC-related methods directly map relationship categories to numerical indices, constraining the model’s ability to autonomously infer and understand these relationships, while others rely heavily on manual definitions. To address these issues and inspired by the way humans reason when performing RC tasks, we propose a new framework to handle the ZSRC task through inference on category attributes (ICA). The main idea of ICA is to detect the semantic relationship between premises, which are RC sentences, and hypotheses, which are relational sentences of entities created by templates. Specifically, instead of manual design, we introduce two hypothesis templates derived from the label words (LWs) and descriptions (LDs) associated with each relationship. These templates are used to automatically convert the RC data into the textual entailment (TE) format. Furthermore, they are fine-tuned with a pretrained TE model, facilitating the acquisition of relational knowledge and enabling the generalization of semantic reasoning rules learned from seen classes to unseen classes. Moreover, to implement multirelationship semantic inference for all unseen classes, we propose an entailment difference mechanism to enhance the reasoning capability of the model. Besides the current ZSRC test setting, we also examine our method in an even more challenging setting to deal with data scarcity in real-world applications. The outstanding performance of ICA on the FewRel and Wiki-ZSL datasets demonstrates its effectiveness in the ZSRC task.

PaperID: 348,

Authors: Zhao Kang, Xuanting Xie, Bingheng Li, Erlin Pan

Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

Title: CDC: A Simple Framework for Complex Data Clustering

Abstract:
In today’s data-driven digital era, the amount and complexity of collected data, such as multiview, non-Euclidean, and multirelational data, are growing exponentially or even faster. Clustering, which extracts valid knowledge from data without supervision, is extremely useful in practice. However, existing methods are independently developed to handle one particular challenge at the expense of the others. In this work, we propose a simple but effective framework for complex data clustering (CDC) that can efficiently process different types of data with linear complexity. We first use graph filtering (GF) to fuse geometric structure and attribute information. We then reduce complexity with high-quality anchors that are adaptively learned via a novel similarity-preserving (SP) regularizer. We illustrate the cluster-ability of our proposed method theoretically and experimentally. In particular, we deploy CDC to graph data of size 111 M.

PaperID: 349,

Authors: Kehui Ding, Jun Shu, Deyu Meng, Zongben Xu

Affiliations: School of Mathematics and Statistics and Ministry of Education Key Laboratory of Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an, Shaanxi, China

Title: Improve Noise Tolerance of Robust Loss via Noise-Awareness

Abstract:
Robust loss minimization is an important strategy for robust learning with noisy labels. Current approaches for designing robust losses involve the introduction of noise-robust factors, i.e., hyperparameters, to control the trade-off between noise robustness and learnability. However, finding suitable hyperparameters for different datasets with noisy labels is a challenging and time-consuming task. Moreover, existing robust loss methods usually assume that all training samples share common hyperparameters, which are independent of instances. This limits the ability of these methods to distinguish the individual noise properties of different samples and overlooks the varying contributions of diverse training samples in helping models understand underlying patterns. To address the above issues, we propose to equip robust losses with instance-dependent hyperparameters to improve their noise tolerance with a theoretical guarantee. To set such instance-dependent hyperparameters for a robust loss, we propose a meta-learning method that adaptively learns a hyperparameter prediction function, called noise-aware-robust-loss-adjuster (NARL-Adjuster). Through mutual amelioration between the hyperparameter prediction function and the classifier parameters in our method, both can be simultaneously refined and coordinated to attain solutions with good generalization capability. We integrate four SOTA robust loss functions with our algorithm, and comprehensive experiments substantiate the general applicability and effectiveness of the proposed method in both its noise tolerance and performance. Meanwhile, the explicit parameterized structure makes the meta-learned prediction function transferable and plug-and-play for unseen datasets with noisy labels. Specifically, we transfer our meta-learned NARL-Adjuster to unseen tasks, including several real noisy datasets, and achieve better performance than the conventional hyperparameter tuning strategy, even with carefully tuned hyperparameters.

PaperID: 350,

Authors: Fangfang Li, Quanxue Gao, Qianqian Wang, Ming Yang, Cheng Deng

Affiliations: School of Telecommunications Engineering, Xidian University, Xi’an, China; College of Mathematical Sciences, Harbin Engineering University, Harbin, Heilongjiang, China; School of Electronic Engineering, Xidian University, Xi’an, China

Title: Tensorized Soft Label Learning Based on Orthogonal NMF

Abstract:
Recently, there has been strong interest in multiview high-dimensional data collected through cross-domain or various feature extraction mechanisms. Nonnegative matrix factorization (NMF) is an effective method for clustering these high-dimensional data with clear physical significance. However, existing multiview clustering based on NMF only measures the difference between the elements of the coefficient matrix without considering the spatial structure relationship between the elements. These methods also often require postprocessing to achieve clustering, making the algorithms unstable. To address these issues, we propose minimizing the Schatten p-norm of the tensor that consists of the coefficient matrices of different views. This approach considers each element’s spatial structure in the coefficient matrices, which is crucial for effectively capturing complementary information presented in different views. Furthermore, we apply orthogonal constraints to the cluster index matrix to make it sparse and provide a strong interpretation of the clustering. This allows us to obtain the cluster label directly without any postprocessing. To distinguish the importance of different views, we utilize adaptive weights to assign varying weights to each view. We introduce an unsupervised optimization scheme to solve the model and analyze its computational complexity. Through comprehensive evaluations on six benchmark datasets and comparisons with several multiview clustering algorithms, we empirically demonstrate the superiority of our proposed method.

PaperID: 351,

Authors: Ruida Xi, Nianchang Huang, Changzhou Lai, Qiang Zhang, Jungong Han

Affiliations: State Key Laboratory of Electromechanical Integrated Manufacturing of High-Performance Electronic Equipments, and the Center for Complex Systems, School of Mechano-Electronic Engineering, Xidian University, Xi’an, Shaanxi, China; State Key Laboratory of Electromechanical Integrated Manufacturing of High-Performance Electronic Equipments, the Center for Complex Systems, School of Mechano-Electronic Engineering, Xidian University, Xi’an, Shaanxi, China; Department of Automation, Tsinghua University, Beijing, China

Title: FMCNet+: Feature-Level Modality Compensation for Visible-Infrared Person Re-Identification

Abstract:
For visible-infrared person re-identification (VI-ReID), current models that compensate modality-specific information strive to generate missing modality images from existing ones to bridge the cross-modality discrepancies. However, those generated images often suffer from low quality due to the significant modality gap and include interfering information, e.g., inconsistent colors, thus severely degrading the subsequent VI-ReID performance. Alternatively, we propose a feature-level modality compensation network, i.e., FMCNet+, for VI-ReID in this article as an improved version of our previous work (FMCNet). The core of FMCNet+ is to compensate for the missing modality-specific information at the feature level, rather than at the image level, enabling our model to generate more person-related and discriminative modality-specific features for VI-ReID. Concretely, FMCNet+ aims to progressively generate missing modality-specific features by fully exploring the relationships among single-modality features, modality-shared features, and modality-specific features, instead of directly generating them in a generative adversarial manner as in the previous FMCNet. To this end, three modules, i.e., single-modality feature decomposition (SFD), modality characteristic dictionary learning (MCDL), and missing modality-specific feature compensation (MMFC), are incorporated in FMCNet+. Experimental results demonstrate the superiority of our proposed FMCNet+ over existing models, especially those that compensate for modality-specific information at the image level. Our findings highlight the necessity of feature-level modality compensation in VI-ReID. Our code and pre-trained models will be released on https://github.com/jssyzsfzy/FMCNet_series.

PaperID: 352,

Authors: Yunhao Gao, Mengmeng Zhang, Wei Li, Ran Tao

Affiliations: School of Information and Electronics, Beijing Institute of Technology, Beijing, China

Title: Distribution-Independent Domain Generalization for Multisource Remote Sensing Classification

Abstract:
The availability of multisource remote sensing data provides the possibility for comprehensive observation. Convolutional neural networks (CNNs) naturally integrate multisource feature extractors and classifiers into an end-to-end multilayer design. However, CNNs assume that data are independent and identically distributed. In practice, it is not always possible to access the labels or even the data of the testing scenes. Therefore, CNN-based methods have exposed their limited generalization ability. To solve this issue, a feature-distribution-independent network (FDINet) is designed for multisource remote sensing cross-domain classification without feature alignment and decoupling operations. On one hand, an elegantly designed baseline is used for extracting multisource cross-domain features. The baseline extracts the common line and texture features through shallow weight-sharing networks. More importantly, the modality prediction probability is used to measure the similarity between the source domains and the target domains, thereby improving cross-domain collaboration capabilities. On the other hand, the sharpness-aware feature discriminating (SAFD) strategy is developed for model optimization. Specifically, the generalization ability is improved by minimizing the sharpness of local optima. To avoid the decrease in feature discrimination caused by the gradient conflict between sharpness and overall loss, the discrimination constraints are designed to balance feature discrimination and generalization ability. Comprehensive experiments are conducted on two datasets, which demonstrate that the proposed FDINet outperforms other competitors in terms of quantitative and qualitative analyses.
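The idea of minimizing sharpness can be sketched with a generic sharpness-aware minimization (SAM) style update. This is not the SAFD strategy itself, whose discrimination constraints are not detailed in the abstract; it only shows the common two-pass pattern: perturb the weights toward the locally worst direction, then descend using the gradient taken at the perturbed point. The function name, the hyperparameters `lr` and `rho`, and the quadratic toy loss are all assumptions.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(w: Vector, grad_fn: Callable[[Vector], Vector],
             lr: float = 0.1, rho: float = 0.05) -> Vector:
    """One SAM-style update on plain Python lists of weights."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # already at a stationary point; nothing to perturb
    # First pass: move to the approximate worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Second pass: descend from the original weights using the perturbed gradient.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_grad(w: Vector) -> Vector:
    """Gradient of the hypothetical loss 0.5 * ||w||^2, which is w itself."""
    return list(w)


w_next = sam_step([3.0, 4.0], toy_grad, lr=0.1, rho=0.05)
```

Using the gradient at the perturbed point penalizes solutions whose loss rises quickly nearby, which tends to favor flatter minima.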

PaperID: 353,

Authors: Yubin Xiao, Di Wang, Boyang Li, Huanhuan Chen, Wei Pang, Xuan Wu, Hao Li, Dong Xu, Yanchun Liang, You Zhou

Affiliations: Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China; Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly, Nanyang Technological University, Nanyang Avenue, Singapore; College of Computing and Data Science, Nanyang Technological University, Nanyang Avenue, Singapore; School of Computer Science, University of Science and Technology of China, Hefei, China; School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, U.K.; College of Computer, National University of Defense Technology, Changsha, Hunan, China; Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA; School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, China

Title: Reinforcement Learning-Based Nonautoregressive Solver for Traveling Salesman Problems

Abstract:
The traveling salesman problem (TSP) is a well-known combinatorial optimization problem (COP) with broad real-world applications. Recently, neural networks (NNs) have gained popularity in this research area because, as shown in the literature, they provide strong heuristic solutions to TSPs. Compared to autoregressive neural approaches, nonautoregressive (NAR) networks exploit inference parallelism to increase inference speed but suffer from comparatively low solution quality. In this article, we propose a novel NAR model named NAR4TSP, which incorporates a specially designed architecture and an enhanced reinforcement learning (RL) strategy. To the best of our knowledge, NAR4TSP is the first TSP solver that successfully combines RL and NAR networks. The key lies in the incorporation of NAR network output decoding into the training process. NAR4TSP efficiently represents TSP-encoded information as rewards and seamlessly integrates it into RL strategies, while maintaining consistent TSP sequence constraints during both training and testing phases. Experimental results on both synthetic and real-world TSPs demonstrate that NAR4TSP outperforms five state-of-the-art (SOTA) models in terms of solution quality, inference speed, and generalization to unseen scenarios.

PaperID: 354,

Authors: Jinyang Liu, Shutao Li, Lishan Tan, Renwei Dian

Title: Denoiser Learning for Infrared and Visible Image Fusion

Abstract:
Infrared image (IR) and visible image (VI) fusion creates fused images that contain richer information and offer improved visual effects. Existing methods generally use manually designed operators, such as intensity and gradient operators, to mine the image information. However, it is hard for them to achieve a complete and accurate description of information, which limits the image fusion performance. To this end, a novel information measurement method is proposed to achieve IR and VI fusion. Its core idea is to guide a generator in achieving image fusion by learning the denoisers. Specifically, by using denoisers to restore fusion images with different noise interference to source images, a mutual competition relationship is formed between denoisers, which helps the generator thoroughly explore the data specificity of the source images and guides it to achieve more accurate feature representation. In addition, a semantic adaptive measurement loss function is proposed to constrain the generator, which fuses semantic information adaptively by considering the semantic information density of different source images. Quantitative and qualitative experiments on three public datasets show that the proposed method achieves higher quality information fusion and faster fusion speed than advanced methods.

PaperID: 355,

Authors: Fan Zhang, Huiying Liu, Qing Cai, Chun-Mei Feng, Binglu Wang, Shanshan Wang, Junyu Dong, David Zhang

Affiliations: School of Energy and Electrical Engineering, Chang’an University, Xi’an, Shaanxi, China; School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi, China; School of Information Science and Engineering, Ocean University of China, Qingdao, Shandong, China; Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore; Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; School of Data Science, Shenzhen Research Institute of Artificial Intelligence and Robotics, and the Linklogis Joint Laboratory of Computer Vision and Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen, Guangdong, China

Title: Federated Cross-Incremental Self-Supervised Learning for Medical Image Segmentation

Abstract:
Federated cross learning has shown impressive performance in medical image segmentation. However, it encounters the catastrophic forgetting issue caused by data heterogeneity across different clients, which is particularly pronounced when pixelwise labels are also scarce. In this article, we propose a novel federated cross-incremental self-supervised learning method, coined FedCSL, which not only enables any client in the federation to learn incrementally yet effectively from others without inducing knowledge forgetting or requiring massive labeled samples, but also preserves maximum data privacy. Specifically, to overcome the catastrophic forgetting issue, a novel cross-incremental collaborative distillation (CCD) mechanism is proposed, which distills explicit knowledge learned from previous clients to subsequent clients based on secure multiparty computation (MPC). Besides, an effective retrospect mechanism is designed to rearrange the training sequence of clients per round, further releasing the power of CCD by enforcing interclient knowledge propagation. In addition, to alleviate the need for large-scale densely annotated pretraining medical datasets, we also propose a two-stage training framework, in which the federated cross-incremental self-supervised pretraining paradigm first extracts robust yet general image-level patterns across multi-institutional data silos via a novel round-robin distributed masked image modeling (MIM) pipeline; then, the resulting visual concepts, e.g., semantics, are transferred to the federated cross-incremental supervised fine-tuning paradigm, favoring various cross-silo medical image segmentation tasks. The experimental results on public datasets demonstrate the effectiveness of the proposed method as well as its consistently superior performance over most state-of-the-art methods, both quantitatively and qualitatively.

PaperID: 356,

Authors: Lin Ma, Liang Hu, Yonghao Li, Weiping Ding, Wanfu Gao

Affiliations: Department of Computer Science and Technology, Jilin University, Changchun, Jilin, China; School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics, Chengdu, Sichuan, China; School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China

Title: MI-MCF: A Mutual Information-Based Multilabel Causal Feature Selection

Abstract:
Multilabel causal feature selection has attracted extensive attention in recent years. Current multilabel causal feature selection algorithms typically employ existing Markov Blanket (MB) search methods for the initial construction of the MB, followed by further optimization. These methods generally treat labels and features as equally weighted nodes during the MB construction process. However, the search for spouse sets often involves extensive conditional independence (CI) tests, which are time-consuming. Furthermore, they fail to consider the distinct contributions of labels and features to the target nodes. Information theory is often used to evaluate the contributions of nodes. Inspired by this, we carry out a theoretical investigation into the causal relationships within multilabel datasets and propose the mutual information-based multilabel causal feature selection (MI-MCF) method. First, MI-MCF employs MI and conditional MI (CMI) instead of CI tests when constructing the MB of labels, without incurring significant time overhead. Then, MI-MCF uses MI to compare the contributions of features and labels to the target nodes. This helps identify which nodes should be retained when recovering features hindered by strong label correlation. Finally, MI-MCF eliminates spurious nodes through a symmetry check. Experiments on real-world datasets demonstrate that MI-MCF can autonomously determine the optimal number of selected features and consistently outperforms the compared methods. The code is available at https://github.com/malinjlu/MI-MCF.
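
As a rough illustration of the quantities this abstract relies on, the sketch below estimates MI and CMI from counts for discrete variables. It is not the authors' implementation (their code is at the URL above); the function names and the plug-in count-based estimator are assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) = sum_z p(z) * I(X;Y | Z=z)."""
    n = len(zs)
    total = 0.0
    for z, count_z in Counter(zs).items():
        idx = [i for i in range(n) if zs[i] == z]
        # MI restricted to the samples where Z takes the value z.
        total += (count_z / n) * mutual_information(
            [xs[i] for i in idx], [ys[i] for i in idx]
        )
    return total
```

A CMI close to zero suggests that X and Y are conditionally independent given Z, which is the role a CI test would otherwise play.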

PaperID: 357,

Authors: Guokai Hao, Yuanzheng Li, Yang Li, Lin Jiang, Zhigang Zeng

Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China; School of Electrical Engineering, Northeast Electric Power University, Jilin City, China; Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool, U.K.

Title: Lyapunov-Based Safe Reinforcement Learning for Microgrid Energy Management

Abstract:
The rapid development of renewable energy sources (RESs) has led to their increased integration into microgrids (MGs), emphasizing the need for safe and efficient energy management in MG operations. We investigate the methods of MG energy management, primarily categorized into model-based and model-free approaches. Due to a lack of incremental knowledge, model-based methods need to be reengineered for new scenarios during the optimization process, leading to reduced computational efficiency. In contrast, model-free methods can obtain incremental knowledge via trial and error in the training phase and output energy management schemes rapidly. However, ensuring the safety of the scheme during the training phase poses significant challenges. To address these challenges, we propose a safe reinforcement learning (SRL) framework. The proposed SRL framework initially includes a safety assessment optimization model (SAOM) to evaluate scheme constraints and refine unsafe schemes for ensuring MG safety. Subsequently, based on SAOM, the MG energy management problem is formulated as an assess-based constrained Markov decision process (A-CMDP), enabling SRL to be applied to this problem. After that, we adopt a Lyapunov-based safety policy optimization for agent policy learning to ensure that policy updates are confined within a safe boundary, theoretically ensuring the safety of the MG throughout the learning process. Numerical studies highlight the superior performance of our proposed method. Specifically, the SRL framework effectively learns an energy management policy, ensures MG safety, and demonstrates outstanding outcomes in the economic operation of the MG.

PaperID: 358,

Authors: Yang Liu, Xinshuo Wang, Xinbo Gao, Jungong Han, Ling Shao

Affiliations: School of Telecommunications Engineering, Xidian University, Xi’an, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China; Department of Computer Science, The University of Sheffield, Yorkshire, U.K.; UCAS-Terminus AI Laboratory, University of Chinese Academy of Sciences, Beijing, China

Title: Concept-Aware Graph Convolutional Network for Compositional Zero-Shot Learning

Abstract:
Compositional zero-shot learning (CZSL) aims to identify unobservable compositional concepts with prior knowledge of known primitives (attributes and objects). Due to distribution differences between seen and unseen components, existing methods for CZSL often ignore intrinsic variations between primitives and suffer from domain bias problems. To address this challenge, we propose a concept-aware graph convolutional network (GCN) that utilizes cross-attentions to extract features unique to attributes and objects from paired concept-sharing inputs. The proposed model utilizes the cosine similarity between visual features and synthetic embeddings to estimate the feasibility score for each unseen composition. This score is then employed as a weight in the graph adjacency matrix. Additionally, the proposed model incorporates the Earth mover’s distance (EMD) to further limit the concept of learning interest in disentanglers. Experimental results on three challenging benchmark datasets, namely UT-Zappos 50K, C-GQA, and MIT-States, demonstrate that the proposed model outperforms prior work in both closed- and open-world CZSL (OW-CZSL).

PaperID: 359,

Authors: Jintang Bian, Yixiang Lin, Xiaohua Xie, Chang-Dong Wang, Lingxiao Yang, Jian-Huang Lai, Feiping Nie

Affiliations: School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, China; School of Computer Science, School of Artificial Intelligence, Optics and Electronics (iOPEN) and the Key Laboratory of Intelligent Interaction and Applications, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi’an, China

Title: Multilevel Contrastive Multiview Clustering With Dual Self-Supervised Learning

Abstract:
Multiview clustering (MVC) aims to integrate multiple related but different views of data to achieve more accurate clustering performance. Contrastive learning has found many applications in MVC due to its successful performance in unsupervised visual representation learning. However, existing MVC methods based on contrastive learning overlook the potential of high-similarity nearest neighbors as positive pairs. In addition, these methods do not capture the multilevel (i.e., cluster, instance, and prototype levels) representational structure that naturally exists in multiview datasets. These limitations could further hinder the structural compactness of learned multiview representations. To address these issues, we propose a novel end-to-end deep MVC method called multilevel contrastive MVC (MCMC) with dual self-supervised learning (DSL). Specifically, we first treat the nearest neighbors of an object from the latent subspace as the positive pairs for multiview contrastive loss, which improves the compactness of the representation at the instance level. Second, we perform multilevel contrastive learning (MCL) on clusters, instances, and prototypes to capture the multilevel representational structure underlying the multiview data in the latent space. In addition, we learn consistent cluster assignments for MVC by adopting a DSL method to associate structural representations at different levels. Evaluation experiments show that MCMC achieves intracluster compactness, intercluster separability, and higher accuracy (ACC) in clustering performance. Our code is available at https://github.com/bianjt-morning/MCMC.

PaperID: 360,

Authors: Yi Liu, Song Guo, Jie Zhang, Yufeng Zhan, Qihua Zhou, Yingchun Wang

Affiliations: Department of Computing, The Hong Kong Polytechnic University, Hong Kong, SAR, China; Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, SAR, China; Beijing Institute of Technology, Beijing, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; Huawei Technologies Company Ltd., Hangzhou, China

Title: Feature Correlation-Guided Knowledge Transfer for Federated Self-Supervised Learning

Abstract:
Extensive attention has been paid to the application of self-supervised learning (SSL) approaches to federated learning (FL) to tackle the label scarcity problem. Previous works on federated SSL (FedSSL) generally fall into two categories: parameter-based model aggregation or data-based feature sharing to achieve knowledge transfer among multiple unlabeled clients. Despite the progress, they inevitably rely on some assumptions, such as homogeneous models or the existence of an additional public dataset, which hinder the universality of the training frameworks for more general scenarios (e.g., unlabeled clients with heterogeneous models). Therefore, in this article, we propose a novel and general method named federated self-supervised learning with feature-correlation-based aggregation (FedFoA) to tackle the above limitations. By exchanging feature correlation instead of model parameters or feature mappings, our approach reduces the discrepancies among local representation learning processes, thus promoting collaboration between heterogeneous clients. A factorization-based method is designed to extract the cross-feature relation matrix from local representations, which serves as a knowledge medium for the aggregation phase. We demonstrate that FedFoA is a heterogeneity-supportive and privacy-preserving training framework and is readily compatible with state-of-the-art FedSSL methods. Extensive empirical experiments demonstrate that our proposed approach outperforms the state-of-the-art methods by a significant margin.

PaperID: 361,

Authors: Tongcun Liu, Xukai Bao, Jiaxin Zhang, Kai Fang, Hailin Feng

Affiliations: College of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou, China; State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China

Title: Enhancing Session-Based Recommendation With Multi-Interest Hyperbolic Representation Networks

Abstract:
Session-based recommendation (SBR) aims to predict the next item a user might click within an ongoing session, without relying on user profiles or historical data. Modern approaches typically use graph networks to learn item embeddings in Euclidean space via graph convolution operations. However, they often struggle to capture the diversity of user interactions within short, hierarchically structured sessions, which is essential for accurate predictions in SBR. To tackle these challenges, we propose a multi-interest hyperbolic representation network (MIHRN) to enhance the performance of SBR by adeptly modeling both intricate high-order spatial structures and sequence relationships among items in hyperbolic geometry space. Specifically, we use a hyperbolic hypergraph neural network to exploit the high-order spatial relationships and local clustering structures inherent within sessions. Subsequently, a multiaspect interest representation module is designed to articulate the diversity of user interests. Extensive experiments on three real-world datasets demonstrate that the proposed method achieves performance improvements of 23.81%, 14.81%, and 36.84%, respectively, under the P@10 metric.
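
For readers unfamiliar with hyperbolic embeddings, the sketch below computes the standard geodesic distance in the Poincaré ball, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The abstract does not say which hyperbolic model MIHRN uses, so the choice of the Poincaré ball and the function name are assumptions made purely for illustration.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Distances grow rapidly as either point approaches the boundary.
    return math.acosh(
        1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    )
```

Two points that are the same Euclidean distance apart are farther apart in this metric when they sit near the boundary, which is why such spaces are often used for hierarchically structured data.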

PaperID: 362,

Authors: Hao Dong, Gaëtan Frusque, Yue Zhao, Eleni N. Chatzi, Olga Fink

Affiliations: Department of Civil, Environmental and Geomatic Engineering, ETH Zürich, Zürich, Switzerland; Chair of Intelligent Maintenance and Operation Systems, EPFL, Lausanne, Switzerland; Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, USA

Title: NNG-Mix: Improving Semi-Supervised Anomaly Detection With Pseudo-Anomaly Generation

Abstract:
Anomaly detection (AD) is essential in identifying rare and often critical events in complex systems, finding applications in fields such as network intrusion detection, financial fraud detection, and fault detection in infrastructure and industrial systems. While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised AD. Semi-supervised and supervised approaches can leverage such labeled data, resulting in improved performance. In this article, rather than proposing a new semi-supervised or supervised approach for AD, we introduce a novel algorithm for generating additional pseudo-anomalies on the basis of the limited labeled anomalies and a large volume of unlabeled data. This serves as an augmentation to facilitate the detection of new anomalies. Our proposed algorithm, named nearest neighbor Gaussian mix-up (NNG-Mix), efficiently integrates information from both labeled and unlabeled data to generate pseudo-anomalies. We compare the performance of this novel algorithm with commonly applied augmentation techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various existing semi-supervised and supervised AD algorithms on the original training data along with the generated pseudo-anomalies. Through extensive experiments on 57 benchmark datasets in ADBench, reflecting different data types, we demonstrate that NNG-Mix outperforms other data augmentation methods. It yields significant performance improvements compared to the baselines trained exclusively on the original training data. Notably, NNG-Mix yields up to 16.4%, 8.8%, and 8.0% improvements on Classical, CV, and NLP datasets in ADBench. Our source code is available at https://github.com/donghao51/NNG-Mix.

PaperID: 363,

Authors: Ziyi Zhang, Mingxuan Ouyang, Wanyu Lin, Hao Lan, Lei Yang

Affiliations: School of Software Engineering, South China University of Technology, Guangzhou, China; Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong, China; Department of Computer Science and Technology, Tsinghua University, Beijing, China

Title: Debiasing Graph Representation Learning Based on Information Bottleneck

Abstract:
Graph representation learning has shown superior performance in numerous real-world applications, such as finance and social networks. Nevertheless, most existing works might make discriminatory predictions due to insufficient attention to fairness in their decision-making processes. This oversight has prompted a growing focus on fair representation learning. Among recent explorations of fair representation learning, prior works based on adversarial learning usually suffer from unstable or counterproductive performance. To achieve fairness in a stable manner, we present the design and implementation of graph representation learning based on fairness information bottleneck (GRAFair), a new framework based on a variational graph autoencoder (VGAE). The crux of GRAFair is the conditional fairness bottleneck (CFB), where the objective is to capture the trade-off between the utility of representations and sensitive information of interest. By applying variational approximation, we can make the optimization objective tractable. In particular, GRAFair can be trained to produce informative representations of tasks while containing little sensitive information without adversarial training. Experiments on various real-world datasets demonstrate the effectiveness of our proposed method in terms of fairness, utility, robustness, and stability.

PaperID: 364,

Authors: Benjamin Devillers, Léopold Maytié, Rufin VanRullen

Affiliations: CerCo, CNRS UMR , Université de Toulouse, Toulouse, France

Title: Semi-Supervised Multimodal Representation Learning Through a Global Workspace

Abstract:
Recent deep learning models can efficiently combine inputs from different modalities (e.g., images and text) and learn to align their latent representations or to translate signals from one domain to another (as in image captioning or text-to-image generation). However, current approaches mainly rely on brute-force supervised training over large multimodal datasets. In contrast, humans (and other animals) can learn useful multimodal representations from only sparse experience with matched cross-modal data. Here, we evaluate the capabilities of a neural network architecture inspired by the cognitive notion of a “global workspace” (GW): a shared representation for two (or more) input modalities. Each modality is processed by a specialized system (pretrained on unimodal data and subsequently frozen). The corresponding latent representations are then encoded to and decoded from a single shared workspace. Importantly, this architecture is amenable to self-supervised training via cycle-consistency: encoding-decoding sequences should approximate the identity function. For various pairings of vision-language modalities and across two datasets of varying complexity, we show that such an architecture can be trained to align and translate between two modalities with very little need for matched data (from four to seven times less than a fully supervised approach). The GW representation can be used advantageously for downstream classification and cross-modal retrieval tasks and for robust transfer learning. Ablation studies reveal that both the shared workspace and the self-supervised cycle-consistency training are critical to the system’s performance.

PaperID: 365,

Authors: Xiangjie Kong, Wenyi Zhang, Hui Wang, Mingliang Hou, Xin Chen, Xiaoran Yan, Sajal K. Das

Affiliations: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China; School of Software, Dalian University of Technology, Dalian, China; Research Institute of Artificial Intelligence, Zhejiang Lab, Hangzhou, China; Department of Computer Science, Missouri University of Science and Technology, Rolla, MO, USA

Title: Federated Graph Anomaly Detection via Contrastive Self-Supervised Learning

Abstract:
Attribute graph anomaly detection aims to identify nodes that significantly deviate from the majority of normal nodes, and has received increasing attention due to the ubiquity and complexity of graph-structured data in various real-world scenarios. However, current mainstream anomaly detection methods are primarily designed for centralized settings, which may pose privacy leakage risks in certain sensitive situations. Although federated graph learning offers a promising solution by enabling collaborative model training in distributed systems while preserving data privacy, a practical challenge arises as each client typically possesses a limited amount of graph data. Consequently, naively applying federated graph learning directly to anomaly detection tasks in distributed environments may lead to suboptimal performance results. We propose a federated graph anomaly detection framework via contrastive self-supervised learning (CSSL) [federated CSSL anomaly detection framework (FedCAD)] to address these challenges. FedCAD updates anomaly node information between clients via federated learning (FL) interactions. First, FedCAD uses pseudo-label discovery to determine the anomaly node of the client preliminarily. Second, FedCAD employs a local anomaly neighbor embedding aggregation strategy. This strategy enables the current client to aggregate the neighbor embeddings of anomaly nodes from other clients, thereby amplifying the distinction between anomaly nodes and their neighbor nodes. Doing so effectively sharpens the contrast between positive and negative instance pairs within contrastive learning, thus enhancing the efficacy and precision of anomaly detection through such a learning paradigm. Finally, the efficiency of FedCAD is demonstrated by experimental results on four real graph datasets.

PaperID: 366,

Authors: Chao Li, Jia Ning, Han Hu, Kun He

Affiliations: School of Computer Science, Huazhong University of Science and Technology, Wuhan, China; Microsoft Research Asia, Beijing, China

Title: Adaptive Channel Allocation for Robust Differentiable Architecture Search

Abstract:
Differentiable architecture search (DARTS) has attracted much attention due to its simplicity and significant improvement in efficiency. However, the excessive accumulation of skip connections when the number of training epochs becomes large makes it suffer from weak stability and low robustness, thus limiting its practical applications. Many works have attempted to restrict the accumulation of skip connections by indicators or manual design. These methods, however, are susceptible to human priors and hyperparameters. In this work, we suggest a more subtle and direct approach that no longer explicitly searches for skip connections in the search stage, based on the paradox that skip connections were proposed to guarantee the performance of very deep networks, but the networks in the search stage of DARTS are actually very shallow. Instead, by introducing channel importance ranking and a channel allocation strategy, skip connections are implicitly searched and automatically refilled into unimportant channels in the evaluation stage. Our method, dubbed the adaptive channel allocation (ACA) strategy, is a general-purpose approach for DARTS, which works universally across DARTS variants without introducing human priors, indicators, or hyperparameters. Extensive experiments on various datasets and DARTS variants verify that the ACA strategy is the most effective among existing methods in improving robustness and dealing with the collapse issue when the number of training epochs becomes large.

PaperID: 367,

Authors: Yifan Wang, Xiao Luo, Chong Chen, Xian-Sheng Hua, Ming Zhang, Wei Ju

Affiliations: School of Information Technology and Management, University of International Business and Economics, Beijing, China; Department of Computer Science, University of California at Los Angeles, Los Angeles, CA, USA; Terminus Group, Beijing, China; School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University-Anker Embodied AI Laboratory, Peking University, Beijing, China; College of Computer Science, Sichuan University, Chengdu, China

Title: DisenSemi: Semi-Supervised Graph Classification via Disentangled Representation Learning

Abstract:
Graph classification is a critical task in numerous multimedia applications, where graphs are employed to represent diverse types of multimedia data, including images, videos, and social networks. Nevertheless, in the real world, labeled graph data are always limited or scarce. To address this issue, we focus on the semi-supervised graph classification task, which involves both supervised and unsupervised models learning from labeled and unlabeled data. In contrast to recent approaches that transfer the entire knowledge from the unsupervised model to the supervised one, we argue that an effective transfer should only retain the relevant semantics that align well with the supervised task. In this article, we introduce a novel framework termed DisenSemi, which learns disentangled representations for semi-supervised graph classification. Specifically, a disentangled graph encoder is proposed to generate factorwise graph representations for both supervised and unsupervised models. Then, we train the two models via a supervised objective and mutual information (MI)-based constraints, respectively. To ensure the meaningful transfer of knowledge from the unsupervised encoder to the supervised one, we further define an MI-based disentangled consistency regularization between the two models and identify the corresponding rationale that aligns well with the current graph classification task. Experiments conducted on various publicly available datasets demonstrate the effectiveness of our DisenSemi.

PaperID: 368,

Authors: Tianhao Zhao, Xiaoyang Guo, Yutian Lin, Bo Du

Affiliations: School of Computer Science, Hubei Luojia Laboratory, Wuhan University, Wuhan, China

Title: MixIR: Mixing Input and Representations for Contrastive Learning

Abstract:
Recently, contrastive learning has shown significant progress in learning visual representations from unlabeled data. The core idea is to train the backbone to be invariant to different augmentations of an instance. While most methods only maximize the feature similarity between two augmented samples, we further generate more challenging training samples and force the model to keep predicting the aggregated representation on these hard samples. In this article, we propose MixIR, a mixture-based approach built upon the traditional Siamese network. On the one hand, we input two augmented images of an instance to the backbone and obtain the aggregated representation by performing an elementwise maximum of the two features. On the other hand, we take the mixture of these augmented images as input and expect the model prediction to be close to the aggregated representation. In this way, the model can access more varied data samples of an instance and keep predicting invariant representations for them. Thus, the learned model is more discriminative compared with previous contrastive learning methods. Extensive experiments on large-scale datasets show that MixIR steadily improves the baseline and achieves competitive results with state-of-the-art methods. Our code is available at https://github.com/happytianhao/MixIR.
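
The two branches described in this abstract can be sketched on toy vectors as follows. This is only an illustration under stated assumptions, not the authors' implementation (see the URL above): the linear-plus-ReLU `backbone`, the linear interpolation used to mix the two views, the mixing weight `lam`, and the cosine-based loss are all hypothetical choices made for this example.

```python
import math


def backbone(x, weights):
    """Toy stand-in for an encoder: a linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def cosine(a, b, eps=1e-12):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b + eps)


def mixir_style_loss(view1, view2, weights, lam=0.5):
    """Loss pulling the prediction on the mixed input toward the aggregated target."""
    f1, f2 = backbone(view1, weights), backbone(view2, weights)
    # Aggregated representation: elementwise maximum of the two view features.
    target = [max(p, q) for p, q in zip(f1, f2)]
    # Mixed input: assumed here to be a linear interpolation of the two views.
    mixed = [lam * p + (1.0 - lam) * q for p, q in zip(view1, view2)]
    prediction = backbone(mixed, weights)
    return 1.0 - cosine(prediction, target)
```

In an actual training loop the target would typically be treated as a constant (no gradient), which this gradient-free sketch does not model.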

PaperID: 369,

Authors: Jintao Huang, Chuangquan Chen, Chi-Man Vong, Yiu-Ming Cheung

Affiliations: Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China; School of Electronics and Information Engineering, Wuyi University, Jiangmen, China; Department of Computer and Information Science, University of Macau, Macau, SAR, China

Title: Broad Multitask Learning System With Group Sparse Regularization

Abstract:
The broad learning system (BLS), featuring lightweight design, incremental extension, and strong generalization capabilities, has been successful in its applications. Despite these advantages, BLS struggles in multitask learning (MTL) scenarios because of its limited ability to handle multiple complex tasks simultaneously: existing BLS models cannot adequately capture and leverage essential information across tasks, which decreases their effectiveness and efficacy in MTL scenarios. To address these limitations, we propose an innovative MTL framework explicitly designed for BLS, named group sparse regularization for broad multitask learning system using related task-wise (BMtLS-RG). This framework combines a task-related BLS learning mechanism with a group sparse optimization strategy, significantly boosting BLS’s ability to generalize in MTL environments. The task-related learning component harnesses task correlations to enable shared learning and optimize parameters efficiently. Meanwhile, the group sparse optimization approach helps minimize the effects of irrelevant or noisy data, thus enhancing the robustness and stability of BLS in navigating complex learning scenarios. To address the varied requirements of MTL challenges, we present two additional variants of BMtLS-RG: BMtLS-RG with sharing parameters of feature mapped nodes (BMtLS-RGf), which integrates a shared feature mapping layer, and BMtLS-RGf and enhanced nodes (BMtLS-RGfe), which further includes an enhanced node layer atop the shared feature mapping structure. These adaptations provide customized solutions tailored to the diverse landscape of MTL problems. We compare BMtLS-RG with state-of-the-art (SOTA) MTL and BLS algorithms through comprehensive experimental evaluation across multiple practical MTL and UCI datasets. BMtLS-RG outperformed SOTA methods in 97.81% of classification tasks and achieved optimal performance in 96.00% of regression tasks, demonstrating its superior accuracy and robustness. Furthermore, BMtLS-RG exhibited satisfactory training efficiency, outperforming existing MTL algorithms by 8.04–42.85 times.

PaperID: 370,

Authors: Ganchao Tan, Zengyu Wan, Yang Wang, Yang Cao, Zheng-Jun Zha

Affiliations: Department of Automation, University of Science and Technology of China, Hefei, China

Title: Tackling Event-Based Lip-Reading by Exploring Multigrained Spatiotemporal Clues

Abstract:
Automatic lip-reading (ALR) is the task of recognizing words based on visual information obtained from the speaker’s lip movements. In this study, we introduce event cameras, a novel type of sensing device, for ALR. Event cameras offer both technical and application advantages over conventional cameras for ALR due to their higher temporal resolution, less redundant visual information, and lower power consumption. To recognize words from the event data, we propose a novel multigrained spatiotemporal feature learning framework, which is capable of perceiving fine-grained spatiotemporal features from microsecond time-resolved event data. Specifically, we first convert the event data into event frames of multiple temporal resolutions to avoid losing too much visual information at the event representation stage. Then, they are fed into a multibranch subnetwork where the branch operating on low-rate frames can perceive spatially complete but temporally coarse features, while the branch operating on high-rate frames can perceive spatially coarse but temporally fine features. Thus, fine-grained spatial and temporal features can be simultaneously learned by integrating the features perceived by different branches. Furthermore, to model the temporal relationships in the event stream, we design a temporal aggregation subnetwork to aggregate the features perceived by the multibranch subnetwork. In addition, we collect two event-based lip-reading datasets (DVS-Lip and DVS-LRW100) for the study of the event-based lip-reading task. Experimental results demonstrate the superiority of the proposed model over the state-of-the-art event-based action recognition models and video-based lip-reading models.
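
The first step described above, turning an event stream into frames at several temporal resolutions, can be sketched as follows. The event tuple layout `(t, x, y, polarity)`, the equal-width time bins, and the signed-count accumulation are assumptions made for this example; the abstract does not specify the authors' exact event representation.

```python
def events_to_frames(events, duration, num_bins, height, width):
    """Accumulate (t, x, y, polarity) events into num_bins signed-count frames."""
    frames = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    for t, x, y, polarity in events:
        # Map the timestamp to a bin; clamp so t == duration lands in the last bin.
        b = min(int(t * num_bins / duration), num_bins - 1)
        frames[b][y][x] += 1 if polarity > 0 else -1
    return frames


# The same toy stream at two temporal resolutions.
events = [(0.0, 0, 0, 1), (0.4, 1, 0, 1), (0.6, 0, 1, -1), (0.9, 1, 1, 1)]
low_rate = events_to_frames(events, duration=1.0, num_bins=1, height=2, width=2)
high_rate = events_to_frames(events, duration=1.0, num_bins=4, height=2, width=2)
```

With one bin, every event lands in a single spatially dense frame; with four bins, each frame is sparser but the ordering of events in time is preserved, which mirrors the trade-off between the low-rate and high-rate branches.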

PaperID: 371,

Authors: Xiaolong Tang, Shuo Ye, Yufeng Shi, Tianheng Hu, Qinmu Peng, Xinge You

Affiliations: School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China; th Research Institute of CETC, Shijiazhuang, China

Title: Filter Pruning Based on Information Capacity and Independence

Abstract:
Filter pruning has gained widespread adoption for compressing and speeding up convolutional neural networks (CNNs). However, existing approaches are still far from practical application due to biased filter selection and heavy computation cost. This article introduces a new filter pruning method that selects filters in an interpretable, multiperspective, and lightweight manner. Specifically, we evaluate the contributions of filters from both individual and overall perspectives. For the amount of information contained in each filter, a new metric called information capacity is proposed. Inspired by information theory, we utilize the interpretable entropy to measure the information capacity and develop a feature-guided approximation process. For correlations among filters, another metric called information independence is designed. Since the aforementioned metrics are evaluated in a simple but effective way, we can identify and prune the least important filters with less computation cost. We conduct comprehensive experiments on benchmark datasets employing various widely used CNN architectures to evaluate the performance of our method. For instance, on ILSVRC-2012, our method outperforms state-of-the-art methods by reducing floating-point operations (FLOPs) by 77.4% and parameters by 69.3% for ResNet-50 with only a minor accuracy decrease of 2.64%.
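
To make the idea of an entropy-based importance score concrete, the sketch below ranks filters by the Shannon entropy of a histogram of their (flattened) feature-map activations. This is a generic proxy, not the information capacity metric defined in the paper; the function names, the histogram binning, and the ascending ranking rule are assumptions made for this example.

```python
import math


def histogram_entropy(values, num_bins=8):
    """Shannon entropy (bits) of an equal-width histogram of the given values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # A constant response carries no information.
    counts = [0] * num_bins
    for v in values:
        b = min(int((v - lo) / (hi - lo) * num_bins), num_bins - 1)
        counts[b] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def rank_filters_by_entropy(feature_maps):
    """Return filter indices sorted from lowest to highest entropy score."""
    scores = [histogram_entropy(fm) for fm in feature_maps]
    return sorted(range(len(scores)), key=lambda i: scores[i])
```

Under this proxy, filters at the front of the ranking would be the first pruning candidates; a complete method would also account for redundancy between filters, which the abstract addresses with its separate information independence metric.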

PaperID: 372,

Authors: Yujun Cheng, Zhewei Zhang, Xuejing Li, Shengjin Wang

Affiliations: Department of Electronic Information Engineering, Tsinghua University, Beijing, China

Title: Contrastive Unsupervised Representation Learning With Optimize-Selected Training Samples

Abstract:
Contrastive unsupervised representation learning (CURL) is a technique that seeks to learn feature sets from unlabeled data. It has found widespread and successful application in unsupervised feature learning, with the design of positive and negative pairs serving as the type of data samples. While CURL has seen empirical successes in recent years, there is still room for improvement in terms of the pair data generation process. This includes tasks such as combining and re-filtering samples, or implementing transformations among positive/negative pairs. We refer to this as the sample selection process. In this article, we introduce an optimized pair-data sample selection method for CURL. This method efficiently ensures that the two types of sampled data (similar pair and dissimilar pair) do not belong to the same class. We provide a theoretical analysis to demonstrate why our proposed method enhances learning performance by analyzing its error probability. Furthermore, we extend our proof into PAC-Bayes generalization to illustrate how our method tightens the bounds provided in previous literature. Our numerical experiments on text/image datasets show that our method achieves competitive accuracy with good generalization bounds.

PaperID: 373,

Authors: Haoran Li, Zhenwen Ren, Yulan Guo, Jiali You, Xiaojian You

Affiliations: School of Electronics and Communication Engineering, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen, China; Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of National Defence Science and Technology, Southwest University of Science and Technology, Mianyang, China

Title: LSVC: A Lifelong Learning Approach for Stream-View Clustering

Abstract:
Multiview clustering (MVC) can achieve more accurate results by utilizing complementary information from multiple perspectives, compared to traditional single-view methods. However, current multiview techniques require all views to be available upfront, making them inadequate for dealing with prevalent data sources that arrive as streams, such as stem cell analysis and multicamera surveillance. To address this problem, in this article, we propose a method called lifelong stream-view clustering (LSVC), which comprises an embedding anchor knowledge library and three key components, enabling the capability to perform asynchronous clustering on stream views. These three components are specifically: 1) the knowledge extraction module that extracts the abstract knowledge of the newly arrived view over time and updates the shared knowledge library; 2) the knowledge transfer module that aligns the newly arrived view with the historical knowledge library, enabling the transfer of structure information to the knowledge library; and 3) the knowledge rule module that constrains the knowledge library to hold a fair number of anchors for each cluster, improving the discrimination of knowledge. The experimental results show that LSVC outperforms traditional single-view clustering (SVC) and MVC methods as it gradually improves with the accumulation of stream views and tends to be stable over time.

PaperID: 374,

Authors: Chen Yang, Zhixi Feng, Shuyuan Yang, Qiukai Pan

Affiliations: School of Artificial Intelligence, Xidian University, Xi’an, China

Title: Open-ICL: Open-Set Modulation Classification via Incremental Contrastive Learning

Abstract:
Open-set modulation classification (OMC) of signals is a challenging task for handling “unknown” modulation types that are not included in the training dataset. This article proposes an incremental contrastive learning method for OMC, called Open-ICL, to accurately identify unknown modulation types of signals. First, a dual-path 1-D network (DONet) with a classification path (CLP) and a contrast path (COP) is designed to learn discriminative signal features cooperatively. In the COP, the deep features of the input signal are compared with the semantic feature centers (SFCs) of known classes calculated from the network, to infer its signal novelty. An unknown signal bank (USB) is defined to store unknown signals, and a novel moving intersection algorithm (MIA) is proposed to dynamically select reliable unknown signals for the USB. The “unknown” instances, together with SFCs, are continuously optimized and updated, facilitating the process of incremental learning. Furthermore, a dynamic adaptive threshold (DAT) strategy is proposed to enable Open-ICL to adaptively learn changing signal distributions. Extensive experiments are performed on two benchmark datasets, and the results demonstrate the effectiveness of Open-ICL for OMC.

PaperID: 375,

Authors: Jing Wang, Zhiqiang Kou, Yuheng Jia, Jianhui Lv, Xin Geng

Title: Label Distribution Learning by Exploiting Fuzzy Label Correlation

Abstract:
Researchers have proposed to exploit label correlation to alleviate the exponential-size output space of label distribution learning (LDL). In particular, some have designed LDL methods to consider local label correlation. These methods roughly partition the training set into clusters and then exploit local label correlation on each one. Each sample belongs to one cluster and therefore has only one local label correlation. However, in real-world scenarios, the training samples may have fuzziness and belong to multiple clusters with blended local label correlations, which challenge these works. To solve this problem, we propose in LDL fuzzy label correlation (FLC)—each sample blends, with fuzzy membership, multiple local label correlations. First, we propose two types of FLCs, i.e., fuzzy membership-induced label correlation (FC) and joint fuzzy clustering and label correlation (FCC). Then, we put forward LDL-FC and LDL-FCC to exploit these two FLCs, respectively. Finally, we conduct extensive experiments to justify that LDL-FC and LDL-FCC statistically outperform state-of-the-art LDL methods.

PaperID: 376,

Authors: Yingchun Wang, Song Guo, Jingcai Guo, Yuanhong Zhang, Weizhan Zhang, Qinghua Zheng, Jie Zhang

Affiliations: School of Computer Science and Technology, Ministry of Education Key Laboratory of Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an, China; Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, SAR, China; Department of Computing, The Hong Kong Polytechnic University, Hong Kong, SAR, China; School of Computer Science and Technology, Shaanxi Province Key Laboratory of Big Data Knowledge Engineering, Xi’an Jiaotong University, Xi’an, China

Title: Data Quality-Aware Mixed-Precision Quantization via Hybrid Reinforcement Learning

Abstract:
Mixed-precision quantization mostly predetermines the model bit-width settings before actual training due to the non-differential bit-width sampling process, obtaining suboptimal performance. Worse still, the conventional static quality-consistent training setting, i.e., all data is assumed to be of the same quality across training and inference, overlooks data quality changes in real-world applications which may lead to poor robustness of the quantized models. In this article, we propose a novel data quality-aware mixed-precision quantization framework, dubbed DQMQ, to dynamically adapt quantization bit-widths to different data qualities. The adaption is based on a bit-width decision policy that can be learned jointly with the quantization training. Concretely, DQMQ is modeled as a hybrid reinforcement learning (RL) task that combines model-based policy optimization with supervised quantization training. By relaxing the discrete bit-width sampling to a continuous probability distribution that is encoded with few learnable parameters, DQMQ is differentiable and can be directly optimized end-to-end with a hybrid optimization target considering both task performance and quantization benefits. Trained on mixed-quality image datasets, DQMQ can implicitly select the most proper bit-width for each layer when facing uneven input qualities. Extensive experiments on various benchmark datasets and networks demonstrate the superiority of DQMQ against existing fixed/mixed-precision quantization methods.

PaperID: 377,

Authors: Zhiling Fu, Zhe Wang, Chengwei Yu, Xinlei Xu, Dongdong Li

Affiliations: Department of Computer Science and Engineering, Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China; Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China

Title: Double Confidence Calibration Focused Distillation for Task-Incremental Learning

Abstract:
Task-incremental learning methods that adopt knowledge distillation face two significant challenges: confidence bias and knowledge loss. These challenges make it difficult to effectively balance the stability and plasticity of the network in the incremental learning process. In this article, we propose double confidence calibration focused distillation (DCCFD) to address these challenges. We introduce intratask and intertask confidence calibration (ECC) modules that can mitigate network overconfidence during incremental learning and reduce the degree of feature representation bias. We also propose a focused distillation (FD) module that can alleviate the problem of knowledge loss during the task increment process, improving model stability without reducing plasticity. Experimental results on the CIFAR-100, TinyImageNet, and CORE-50 datasets demonstrate the effectiveness of our method, with performance that matches or exceeds the state of the art. Furthermore, our method can be used as a plug-and-play module to consistently improve class-incremental learning methods.

PaperID: 378,

Authors: Yuan Gao, Qian Zhao, Laurence T. Yang, Jing Yang, Lei Ren

Affiliations: School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou, China; School of Computer Science and Technology, Hainan University, Haikou, China; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China; School of Automation Science and Electrical Engineering, Beihang University, Beijing, China

Title: Tensor-Representation-Based Multiview Attributed Graph Clustering With Smooth Structure

Abstract:
Over the past few years, multiview attributed graph clustering has achieved promising performance via various data augmentation strategies. However, we observe that the aggregation of node information in multilayer graph autoencoder (GAE) is prone to deviation, especially when edges or node attributes are randomly perturbed. To this end, we innovatively propose a tensor-representation-based multiview attributed graph clustering framework with smooth structure (MV_AGC) to avoid the bias caused by random view construction. Specifically, we first design a novel tensor-product-based high-order graph attention network (GAT) with structural constraints to realize efficient attribute fusion and semantic consistency encoding. By imposing attribute augmentation mechanisms and smooth constraints (SCs) on the proposed high-order graph attention autoencoder simultaneously, MV_AGC effectively eliminates the instability of reconstructed graph structures and learns a more compact node representation during training. In addition, we also theoretically analyze the stronger generality and expressiveness of the proposed tensor-product-based attention mechanism over the classical GAT and establish an intuitive connection between them. Furthermore, to address the performance degradation caused by clustering distribution updating, we further develop a simple yet effective clustering objective function-guided self-optimizing module for the final clustering performance improvement. Experimental results on the six benchmark datasets have demonstrated that our proposed method can achieve state-of-the-art clustering performance.

PaperID: 379,

Authors: Lanfan Jiang, Zilin Huang, Yu Chen, Wenxing Zhu

Affiliations: College of Computer and Data Science, Fuzhou University, Fuzhou, China; School of Computer and Big Data, Minjiang University, Fuzhou, China; Center for Discrete Mathematics and Theoretical Computer Science, Fuzhou University, Fuzhou, China

Title: Iterative-Weighted Thresholding Method for Group-Sparsity-Constrained Optimization With Applications

Abstract:
Taking advantage of the natural grouping structure inside data, group sparse optimization can effectively improve the efficiency and stability of high-dimensional data analysis, and it has wide applications in a variety of fields such as machine learning, signal processing, and bioinformatics. Although there has been a lot of progress, it is still a challenge to construct a group sparse-inducing function with good properties and to identify significant groups. This article aims to address the group-sparsity-constrained minimization problem. We convert the problem to an equivalent weighted $\ell_{p,q}$-norm ($p > 0$, $0 < q \leq 1$) constrained optimization model, instead of its relaxation or approximation problem. Then, by applying the proximal gradient method, a solution method with theoretical convergence analysis is developed. Moreover, based on the properties proved in the Lagrangian dual framework, the homotopy technique is employed to cope with the parameter tuning task and to ensure that the output of the proposed homotopy algorithm is an L-stationary point of the original problem. The proposed weighted framework, with the central idea of identifying important groups, is compatible with a wide range of support set identification strategies, which can better meet the needs of different applications and improve the robustness of the model in practice. Both simulated and real data experiments demonstrate the superiority of the proposed method in terms of group feature selection accuracy and computational efficiency. Extensive experimental results in application areas such as compressed sensing, image recognition, and classifier design show that our method has great potential in a wide range of applications. Our codes will be available at https://github.com/jianglanfan/HIWT-GSC.
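To make the group-wise quantities concrete, the sketch below evaluates a weighted sum of q-th powers of per-group p-norms and counts the nonzero groups of a vector. The exact weighted definition used in the paper is not given in the abstract, so the form `sum_i w_i * ||x_{G_i}||_p^q`, the helper names, and the sample data are all assumptions made for illustration.

```python
def group_p_norm(group, p):
    """Standard l_p norm of one group of coefficients (p > 0)."""
    return sum(abs(v) ** p for v in group) ** (1.0 / p)

def weighted_lpq(x, groups, weights, p=2.0, q=0.5):
    """Assumed form: sum over groups of w_i * ||x_{G_i}||_p ** q."""
    return sum(
        w * group_p_norm([x[j] for j in g], p) ** q
        for g, w in zip(groups, weights)
    )

def nonzero_groups(x, groups):
    """Group sparsity level: how many groups contain a nonzero entry."""
    return sum(1 for g in groups if any(x[j] != 0 for j in g))

# Hypothetical vector split into three index groups.
x = [3.0, 4.0, 0.0, 0.0, 1.0]
groups = [[0, 1], [2, 3], [4]]
weights = [1.0, 1.0, 1.0]

value = weighted_lpq(x, groups, weights)  # sqrt(5) + 0 + 1 with p=2, q=0.5
active = nonzero_groups(x, groups)        # groups [0, 1] and [4] are active
```

With q at most 1 the penalty grows slowly for large groups and sharply near zero, which is why such functions favor switching whole groups off.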

PaperID: 380,

Authors: Jing Lai, Junlin Xiong, Yu Kang

Affiliations: School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China; Department of Automation, University of Science and Technology of China, Hefei, China

Title: Value Iteration for Stochastic LQR With Convergence Guarantees

Abstract:
This brief studies the discounted stochastic linear quadratic regulator (LQR) problem for systems suffering from additive noise of unknown mean. A completely model-free (MF) value iteration (VI) algorithm is developed to learn the optimal control policy using off-line system trajectories. The generated control policies are proven to converge to a small neighborhood of the optimal ones with high probability. In addition, an MF algorithm is proposed to learn a feasible discount factor. The proposed MF algorithms are illustrated through several examples.
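For orientation, the sketch below runs classical value iteration on a scalar discounted LQR problem with known dynamics. It is a model-based textbook recursion, not the model-free algorithm of the brief, and it assumes zero-mean noise so that the value function stays purely quadratic. The function name and all numeric parameters are illustrative assumptions.

```python
def lqr_value_iteration(a, b, q, r, gamma, iters=500):
    """Scalar discounted LQR: x' = a*x + b*u + w, stage cost q*x^2 + r*u^2.

    Iterates the Riccati recursion for the quadratic value weight p,
    starting from p = 0, and returns (p, gain) with policy u = -gain * x.
    """
    p = 0.0
    for _ in range(iters):
        p = q + gamma * a * a * p - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p)
    gain = gamma * a * b * p / (r + gamma * b * b * p)
    return p, gain

# Hypothetical unstable plant (|a| > 1) with discount factor 0.9.
p_star, gain = lqr_value_iteration(a=1.2, b=1.0, q=1.0, r=1.0, gamma=0.9)
closed_loop = 1.2 - 1.0 * gain  # closed-loop pole under u = -gain * x
```

Each pass applies the Bellman update to the current quadratic value estimate, so the loop converges to the fixed point of the discounted Riccati equation.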

PaperID: 381,

Authors: Haoxuanye Ji, Le Wang, Sanping Zhou, Wei Tang, Nanning Zheng, Gang Hua

Affiliations: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, the National Engineering Research Center for Visual Information and Applications, and the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, Shaanxi, China; Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA; Wormpex AI Research, Bellevue, WA, USA

Title: Meta Pairwise Relationship Distillation for Unsupervised Person Re-Identification

Abstract:
Unsupervised person re-identification (Re-ID) is challenging due to the lack of ground-truth labels. Most existing methods rely on pseudo labels estimated via iterative clustering and thus are highly susceptible to performance penalties incurred by an inaccurately estimated number of clusters. Alternatively, we utilize the sample pairs with pairwise pseudo labels to guide the feature learning to avoid the dilemma of determining cluster numbers. In this article, we propose a meta pairwise relationship distillation (MPRD) method that incorporates a graph convolutional network (GCN) to provide high-fidelity pairwise relationships to supervise the model training. A small amount of metadata with high-confidence pairwise relationships and the unlabeled pairs with the provided pseudo pairwise relationships participate in the GCN training. Besides, we introduce a hard sample deduction (HSD) module to mine, in a timely manner, the sample pairs with error-prone pairwise pseudo labels and thereby mitigate optimization misled by noisy labels. Furthermore, since the features of each positive pair represent the same person, we design a positive pair alignment (PPA) module to reduce the redundant information in each feature, which is achieved by minimizing the difference between each positive pair’s feature distributions. Extensive experiments on the Market-1501, DukeMTMC-reID, and MSMT17 datasets show that our method outperforms the state-of-the-art unsupervised methods.

PaperID: 382,

Authors: Xiaoliu Luo, Zhao Duan, Anyong Qin, Zhuotao Tian, Ting Xie, Taiping Zhang, Yuan Yan Tang

Affiliations: College of Science, Chongqing University of Technology, Chongqing, China; College of Artificial Intelligence, Chongqing Technology and Business University, Chongqing, China; School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; College of Computer Science, Chongqing University, Chongqing, China; Faculty of Science and Technology, University of Macau, Macau, China

Title: Layer-Wise Mutual Information Meta-Learning Network for Few-Shot Segmentation

Abstract:
The goal of few-shot segmentation (FSS) is to segment unlabeled images belonging to previously unseen classes using only a limited number of labeled images. The main objective is to transfer label information effectively from support images to query images. In this study, we introduce a novel meta-learning framework called layer-wise mutual information (LayerMI), which enhances the propagation of label information by maximizing the mutual information (MI) between support and query features at each layer. Our approach involves the utilization of a LayerMI Block based on information-theoretic co-clustering. This block performs online co-clustering on the joint probability distribution obtained from each layer, generating a target-specific attention map. The LayerMI Block can be seamlessly integrated into the meta-learning framework and applied to all convolutional neural network (CNN) layers without altering the training objectives. Notably, the LayerMI Block not only maximizes MI between support and query features but also facilitates internal clustering within the image. Extensive experiments demonstrate that LayerMI significantly enhances the performance of baseline and achieves competitive performance compared to state-of-the-art methods on three challenging benchmarks: PASCAL-$5^i$, COCO-$20^i$, and FSS-1000.

PaperID: 383,

Authors: Shuqi Xu, Hao Zhang, Zhuping Wang

Affiliations: Shanghai Institute of Intelligent Science and Technology, Department of Control Science and Engineering, Tongji University, Shanghai, China

Title: Learning to Perform Trajectory Generation From Low-Quality Demonstrations

Abstract:
Human-robot skill transfer is an important means for robots to learn skills and has received more and more attention and research in recent years. Typically, to ensure effective skill transfer, a skill is demonstrated several times by a human, from which a robot learns the features contained in the demonstrations and reproduces the skill in a new environment. However, it is necessary to consider the cases such as errors in human demonstrations and sensor issues, resulting in imperfect demonstrations, unrelated data, information loss, and variations in the lengths and amplitudes of the demonstrations. Therefore, this brief proposes a new trajectory alignment and filtering method for extracting relatively useful information from multiple demonstrations. This method can be used in conjunction with most probabilistic movement learning methods (this brief uses probabilistic movement primitives (ProMPs) as an example) for learning from demonstrations (LfDs), so that the robot can eventually learn and generate trajectories for completing skills from multiple demonstrations of varying quality. The effectiveness of the proposed method is verified by simulation results.

PaperID: 384,

Authors: Licheng Jiao, Mengru Ma, Pei He, Xueli Geng, Xu Liu, Fang Liu, Wenping Ma, Shuyuan Yang, Biao Hou, Xu Tang

Title: Brain-Inspired Learning, Perception, and Cognition: A Comprehensive Review

Abstract:
The progress of brain cognition and learning mechanisms has provided new inspiration for the next generation of artificial intelligence (AI) and provided the biological basis for the establishment of new models and methods. Brain science can effectively improve the intelligence of existing models and systems. Compared with other reviews, this article provides a comprehensive review of brain-inspired deep learning algorithms for learning, perception, and cognition from microscopic, mesoscopic, macroscopic, and super-macroscopic perspectives. First, this article introduces the brain cognition mechanism. Then, it summarizes the existing studies on brain-inspired learning and modeling from the perspectives of neural structure, cognitive module, learning mechanism, and behavioral characteristics. Next, this article introduces the potential learning directions of brain-inspired learning from four aspects: perception, cognition, understanding, and decision-making. Finally, the top-ten open problems that brain-inspired learning, perception, and cognition currently face are summarized, and prospects for the next generation of AI technology are discussed. This work intends to provide a quick overview of the research on brain-inspired AI algorithms and to motivate future research by illuminating the latest developments in brain science.

PaperID: 385,

Authors: Xiong Yang, Ding Wang

Affiliations: School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Faculty of Information Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China

Title: Reinforcement Learning for Robust Dynamic Event-Driven Constrained Control

Abstract:
We consider a robust dynamic event-driven control (EDC) problem of nonlinear systems having both unmatched perturbations and unknown styles of constraints. Specifically, the constraints imposed on the nonlinear systems’ input could be symmetric or asymmetric. Initially, to tackle such constraints, we construct a novel nonquadratic cost function for the constrained auxiliary system. Then, we propose a dynamic event-triggering mechanism that relies on the time-based variable and the system states simultaneously to cut down the computational load. Meanwhile, we show that the robust dynamic EDC of original nonlinear-constrained systems could be acquired by solving the event-driven optimal control problem of the constrained auxiliary system. After that, we develop the corresponding event-driven Hamilton–Jacobi–Bellman equation, and then solve it through a unique critic neural network (CNN) in the reinforcement learning framework. To relax the persistence of excitation condition in tuning CNN’s weights, we incorporate experience replay into the gradient descent method. With the aid of Lyapunov’s approach, we prove that the closed-loop auxiliary system and the weight estimation error are uniformly ultimately bounded stable. Finally, two examples, including a nonlinear plant and the pendulum system, are utilized to validate the theoretical claims.

PaperID: 386,

Authors: Yue Liu, Sihang Zhou, Xihong Yang, Xinwang Liu, Wenxuan Tu, Liang Li, Xin Xu, Fuchun Sun

Affiliations: School of Computer, National University of Defense Technology, Changsha, China; College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China; Department of Computer Science and Technology, Tsinghua University, Beijing, China

Title: Improved Dual Correlation Reduction Network With Affinity Recovery

Abstract:
Deep graph clustering, which aims to reveal the underlying graph structure and divide the nodes into different clusters without human annotations, is a fundamental yet challenging task. However, we observe that the existing methods suffer from the representation collapse problem and tend to encode samples with different classes into the same latent embedding. Consequently, the discriminative capability of nodes is limited, resulting in suboptimal clustering performance. To address this problem, we propose a novel deep graph clustering algorithm termed improved dual correlation reduction network (IDCRN) through improving the discriminative capability of samples. Specifically, by approximating the cross-view feature correlation matrix to an identity matrix, we reduce the redundancy between different dimensions of features, thus improving the discriminative capability of the latent space explicitly. Meanwhile, the cross-view sample correlation matrix is forced to approximate the designed clustering-refined adjacency matrix to guide the learned latent representation to recover the affinity matrix even across views, thus enhancing the discriminative capability of features implicitly. Moreover, we avoid the collapsed representation caused by the oversmoothing issue in graph convolutional networks (GCNs) through an introduced propagation regularization term, enabling IDCRN to capture the long-range information with the shallow network structure. Extensive experimental results on six benchmarks have demonstrated the effectiveness and efficiency of IDCRN compared with the existing state-of-the-art deep graph clustering algorithms. The code of IDCRN is released at IDCRN. Besides, we share a collection of deep graph clustering, including papers, codes, and datasets at ADGC.
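The idea of pushing a cross-view feature correlation matrix toward the identity can be sketched as follows. The cosine-based correlation, the mean squared penalty, the function names, and the toy embeddings are assumptions chosen for illustration; the abstract does not specify IDCRN's exact loss, so this should be read as a generic redundancy-reduction sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def feature_correlation_loss(z1, z2):
    """Mean squared gap between the cross-view feature correlation matrix and I.

    z1 and z2 are embeddings of the same samples under two views
    (rows = samples, columns = feature dimensions).
    """
    cols1 = list(zip(*z1))  # one tuple per feature dimension of view 1
    cols2 = list(zip(*z2))  # one tuple per feature dimension of view 2
    d = len(cols1)
    total = 0.0
    for i in range(d):
        for j in range(d):
            target = 1.0 if i == j else 0.0
            total += (cosine(cols1[i], cols2[j]) - target) ** 2
    return total / (d * d)

# Hypothetical two-sample, two-dimensional embeddings.
decorrelated = [[1.0, 0.0], [0.0, 1.0]]  # dimensions carry distinct information
redundant = [[1.0, 1.0], [2.0, 2.0]]     # both dimensions are copies of each other

loss_good = feature_correlation_loss(decorrelated, decorrelated)
loss_bad = feature_correlation_loss(redundant, redundant)
```

The off-diagonal terms penalize feature dimensions that duplicate one another across views, while the diagonal terms reward agreement of the same dimension between the two views.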

PaperID: 387,

Authors: Jiang Lu, Changming Xiao, Changshui Zhang

Affiliations: China Marine Development and Research Center (CMDRC), Beijing, China; Institute for Artificial Intelligence, Tsinghua (THUAI), the Beijing National Research Center for Information Science and Technologies (BNRist), and the Department of Automation, Tsinghua University, Beijing, China

Title: Meta-Modulation: A General Learning Framework for Cross-Task Adaptation

Abstract:
Building learning systems possessing adaptive flexibility to different tasks is critical and challenging. In this article, we propose a novel and general meta-learning framework, called meta-modulation (MeMo), to foster the adaptation capability of a base learner across different tasks where only a few training data are available per task. For one independent task, MeMo proceeds like a “feedback regulation system,” which achieves an adaptive modulation on the so-called definitive embeddings of query data to maximize the corresponding task objective. Specifically, we devise a type of efficient feedback information, definitive embedding feedback (DEF), to mathematize and quantify the unsuitability between the few training data and the base learner as well as the promising adjustment direction to reduce this unsuitability. The DEFs are encoded into high-level representation and temporarily stored as task-specific modulator templates by a modulation encoder. For coming query data, we develop an attention mechanism acting upon these modulator templates and combine both task/data-level modulation to generate the final data-specific meta-modulator. This meta-modulator is then used to modulate the query’s embedding for correct decision-making. Our framework is scalable for various base learner models like multi-layer perceptron (MLP), long short-term memory (LSTM), convolutional neural network (CNN), and transformer, and applicable to different learning problems like language modeling and image recognition. Experimental results on a 2-D point synthetic dataset and various benchmarks in language and vision domains demonstrate the effectiveness and competitiveness of our framework.

PaperID: 388,

Authors: Yunpeng Xiao, Jinsong Yang, Wanjing Zhao, Qian Li, Yucai Pang

Affiliations: School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: Cross-Domain Social Rumor-Propagation Model Based on Transfer Learning

Abstract:
Rumors in different topic domains have different text characteristics but similar emotional tendencies. To resolve the scarce-data problem in some rumor-topic domains, this study proposes a cross-domain rumor-propagation model, which is based on transfer learning. First, given the diversity and complexity of the rumor-propagation landscape, this study introduces a novel method, User-Retweet–Rumor2vec (URR2vec), which leverages the power of representation learning to uncover latent features within rumor topics. It also displays the forwarding relationship between users and rumors, user node information, and rumor-topic information in low-dimensional space. To capture the impact of human emotional cognition during rumor spreading, we also introduce a deep-learning model based on the natural language texts of rumor topics, which analyzes the sentiment in the text and uncovers the emotional correlations among users. Furthermore, a rumor-propagation prediction model based on the text-sentiment analysis-graph convolutional network (TSA-GCN) is proposed and pre-trained on existing rumor-topic data to ensure its prediction accuracy. Finally, considering the data sparsity at a rumor-topic outbreak, the trained propagation model is transferred to the rumor topic for prediction. Meanwhile, the rumor topic in different domains has different edges and conditional distribution, similar emotional characteristics, and network structure among the rumor topics. After fine-tuning the parameter and adding a domain adaptation layer in TSA-GCN, a domain adaptation model based on parameter and graph-structure migration is obtained.

PaperID: 389,

Authors: Gengyu Lyu, Zhen Yang, Xiang Deng, Songhe Feng

Affiliations: Faculty of Information Technology, Beijing University of Technology, Beijing, China; School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China

Title: L-VSM: Label-Driven View-Specific Fusion for Multiview Multilabel Classification

Abstract:
In the task of multiview multilabel (MVML) classification, each instance is represented by several heterogeneous features and associated with multiple semantic labels. Existing MVML methods mainly focus on leveraging a shared subspace to explore multiview consensus information across different views, while it remains an open problem whether such a shared subspace representation is effective for characterizing all relevant labels when formulating a desired MVML model. In this article, we propose a novel label-driven view-specific fusion MVML method named L-VSM, which bypasses the search for a shared subspace representation and instead directly encodes the feature representation of each individual view to contribute to the final multilabel classifier induction. Specifically, we first design a label-driven feature graph construction strategy and construct all instances under various feature representations into the corresponding feature graphs. Then, these view-specific feature graphs are integrated into a unified graph by linking the different feature representations within each instance. Afterward, we adopt a graph attention mechanism to aggregate and update all feature nodes on the unified graph to generate structural representations for each instance, where both intraview correlations and interview alignments are jointly encoded to discover the underlying consensuses and complementarities across different views. Moreover, to explore the widespread label correlations in multilabel learning (MLL), the transformer architecture is introduced to construct a dynamic semantic-aware label graph and accordingly generate structural semantic representations for each specific class. Finally, we derive an instance-label affinity score for each instance by averaging the affinity scores of its different feature representations with the multilabel soft margin loss. Extensive experiments on various MVML applications have verified that our proposed L-VSM achieves superior performance against state-of-the-art methods. The codes are available at https://gengyulyu.github.io/homepage/assets/codes/LVSM.zip.

PaperID: 390,

Authors: Qi Chen, Chao Li, Jia Ning, Stephen Lin, Kun He

Affiliations: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China; Microsoft Research Asia, Beijing, China

Title: GMConv: Modulating Effective Receptive Fields for Convolutional Kernels

Abstract:
In convolutional neural networks (CNNs), the convolutions are conventionally performed using a square kernel with a fixed N × N receptive field (RF). However, what matters most to the network is the effective receptive field (ERF), which indicates the extent to which input pixels contribute to an output pixel. Inspired by the property that ERFs typically exhibit a Gaussian distribution, we propose a Gaussian Mask convolutional kernel (GMConv). Specifically, GMConv utilizes the Gaussian function to generate a concentric symmetry mask that is placed over the kernel to refine the RF. We analyze the RFs of CNN kernels in different CNN layers and evaluate our approach through extensive experiments on image classification and object detection tasks. Over several tasks and standard base models, our approach compares favorably against the standard convolution. For instance, using GMConv for AlexNet and ResNet-50, the top-1 accuracy on ImageNet classification is boosted by 0.98% and 0.85%, respectively.
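
As a rough illustration of the masking idea described above, the following sketch builds a concentric Gaussian mask and multiplies it elementwise with a square kernel. The function names, the fixed `sigma` parameter, and the unnormalized Gaussian form are assumptions made for illustration; the abstract does not specify how GMConv parameterizes or learns its mask.

```python
import math

def gaussian_mask(size, sigma):
    """Return a size x size mask that decays with distance from the kernel center.

    Hypothetical helper: the exact mask used by GMConv is not given in the abstract.
    """
    center = (size - 1) / 2.0
    mask = []
    for i in range(size):
        row = []
        for j in range(size):
            # Squared Euclidean distance from the kernel center.
            d2 = (i - center) ** 2 + (j - center) ** 2
            row.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        mask.append(row)
    return mask

def apply_mask(kernel, mask):
    """Elementwise product of a 2-D kernel and a mask of the same shape."""
    return [[k * m for k, m in zip(krow, mrow)]
            for krow, mrow in zip(kernel, mask)]
```

With a small `sigma`, weights far from the center are suppressed, so the kernel behaves as if it had a smaller, rounder receptive field; a large `sigma` leaves the square kernel nearly unchanged.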

PaperID: 391,

Authors: Sinan Tan, Kuankuan Sima, Dunzheng Wang, Mengmeng Ge, Di Guo, Huaping Liu

Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing, China; School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China; School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China

Title: Self-Supervised 3-D Semantic Representation Learning for Vision-and-Language Navigation

Abstract:
In vision-and-language navigation (VLN) tasks, most current methods primarily utilize RGB images, overlooking the rich 3-D semantic data inherent to environments. To rectify this, we introduce a novel VLN framework that integrates 3-D semantic information into the navigation process. Our approach features a self-supervised training scheme that incorporates voxel-level 3-D semantic reconstruction to create a detailed 3-D semantic representation. A key component of this framework is a pretext task focused on region queries, which determines the presence of objects in specific 3-D areas. Following this, we devise a long short-term memory (LSTM)-based navigation model that is trained using our 3-D semantic representations. To maximize the utility of these 3-D semantic representations, we implement a cross-modal distillation strategy. This strategy encourages the RGB model’s outputs to emulate those from the 3-D semantic feature network, enabling the concurrent training of both branches to merge RGB and 3-D semantic data effectively. Comprehensive evaluations on both the R2R and R4R datasets reveal that our method significantly enhances performance in VLN tasks.

PaperID: 392,

Authors: Peng Xing, Hao Tang, Jinhui Tang, Zechao Li

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Title: ADPS: Asymmetric Distillation Postsegmentation for Image Anomaly Detection

Abstract:
Knowledge distillation-based anomaly detection (KDAD) methods rely on the teacher–student paradigm to detect and segment anomalous regions by contrasting the unique features extracted by both networks. However, existing KDAD methods suffer from two main limitations: 1) the student network can effortlessly replicate the teacher network’s representations and 2) the features of the teacher network serve solely as a “reference standard” and are not fully leveraged. Toward this end, we depart from the established paradigm and instead propose an innovative approach called asymmetric distillation postsegmentation (ADPS). Our ADPS employs an asymmetric distillation paradigm that takes distinct forms of the same image as the input of the teacher–student networks, driving the student network to learn discriminating representations for anomalous regions. Meanwhile, a customized Weight Mask Block (WMB) is proposed to generate a coarse anomaly localization mask that transfers the distilled knowledge acquired from the asymmetric paradigm to the teacher network. Equipped with WMB, the proposed postsegmentation module (PSM) can effectively detect and segment abnormal regions with fine structures and clear boundaries. Experimental results demonstrate that the proposed ADPS outperforms the state-of-the-art methods in detecting and segmenting anomalies. Surprisingly, ADPS significantly improves the average precision (AP) metric by 9% and 20% on the MVTec anomaly detection (AD) and KolektorSDD2 datasets, respectively.

PaperID: 393,

Authors: Man-Sheng Chen, Xi-Ran Zhu, Jia-Qi Lin, Chang-Dong Wang

Affiliations: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; School of Mathematics (Zhuhai), Sun Yat-sen University, Guangzhou, China

Title: Contrastive Multiview Attribute Graph Clustering With Adaptive Encoders

Abstract:
Multiview attribute graph clustering aims to cluster nodes into disjoint categories by taking advantage of the multiview topological structures and the node attribute values. However, the existing works fail to explicitly discover the inherent relationships in multiview topological graph matrices while considering different properties between the graphs. Besides, they cannot handle well the sparse structure of some graphs in the learning procedure of graph embeddings. Therefore, in this article, we propose a novel contrastive multiview attribute graph clustering (CMAGC) method with adaptive encoders. Within this framework, the adaptive encoders concerning different properties of distinct topological graphs are chosen to integrate multiview attribute graph information by checking whether high-order neighbor information exists. Meanwhile, the number of layers of the graph convolutional network (GCN) encoders is selected according to the prior knowledge related to the characteristics of different topological graphs. In particular, feature-level and cluster-level contrastive learning are conducted on the multiview soft assignment representations, where the union of the first-order neighbors from the corresponding graph pairs is regarded as the positive pairs for data augmentation, so that the sparse neighbor information problem in some graphs can be dealt with well. To the best of our knowledge, this is the first work to explicitly deal with the inherent relationships from the interview and intraview perspectives. Extensive experiments are conducted on several datasets to verify the superiority of the proposed CMAGC method compared with the state-of-the-art methods.

PaperID: 394,

Authors: Chun-Mei Feng, Zhanyuan Yang, Huazhu Fu, Yong Xu, Jian Yang, Ling Shao

Affiliations: Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates; School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology (Shenzhen), Shenzhen, China; PCA Laboratory, Nanjing University of Science and Technology, Nanjing, China

Title: DONet: Dual-Octave Network for Fast MR Image Reconstruction

Abstract:
Magnetic resonance (MR) image acquisition is an inherently prolonged process, whose acceleration has long been the subject of research. This is commonly achieved by obtaining multiple undersampled images, simultaneously, through parallel imaging. In this article, we propose the dual-octave network (DONet), which is capable of learning multiscale spatial-frequency features from both the real and imaginary components of MR data, for parallel fast MR image reconstruction. More specifically, our DONet consists of a series of dual-octave convolutions (Dual-OctConvs), which are connected in a dense manner for better reuse of features. In each Dual-OctConv, the input feature maps and convolutional kernels are first split into two components (i.e., real and imaginary) and then divided into four groups according to their spatial frequencies. Then, our Dual-OctConv conducts intragroup information updating and intergroup information exchange to aggregate the contextual information across different groups. Our framework provides three appealing benefits: 1) it encourages information interaction and fusion between the real and imaginary components at various spatial frequencies to achieve richer representational capacity; 2) the dense connections between the real and imaginary groups in each Dual-OctConv make the propagation of features more efficient by feature reuse; and 3) DONet enlarges the receptive field by learning multiple spatial-frequency features of both the real and imaginary components. Extensive experiments on two popular datasets (i.e., clinical knee and fastMRI), under different undersampling patterns and acceleration factors, demonstrate the superiority of our model in accelerated parallel MR image reconstruction.

PaperID: 395,

Authors: Yutao Hu, Xiaolong Jiang, Xuhui Liu, Xiaoyan Luo, Yao Hu, Xianbin Cao, Baochang Zhang, Jun Zhang

Affiliations: School of Electronics and Information Engineering, Beihang University, Beijing, China; YouKu Cognitive and Intelligent Laboratory, Alibaba Group, Beijing, China; School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; School of Astronautics, Beihang University, Beijing, China; School of Electronics and Information Engineering and Key Laboratory of Advanced Technologies for Near Space Information Systems, Ministry of Industry and Information Technology of China, Beihang University, Beijing, China; Institute of Artificial Intelligence, Beihang University, Beijing, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China

Title: Hierarchical Self-Distilled Feature Learning for Fine-Grained Visual Categorization

Abstract:
Fine-grained visual categorization (FGVC) relies on hierarchical features extracted by deep convolutional neural networks (CNNs) to recognize highly similar objects. Particularly, shallow layer features containing rich spatial details are vital for specifying subtle differences between objects but are usually inadequately optimized due to gradient vanishing during backpropagation. In this article, hierarchical self-distillation (HSD) is introduced to generate well-optimized CNN features for accurate fine-grained categorization. HSD inherits from the widely applied deep supervision and implements multiple intermediate losses for reinforced gradients. Besides that, we observe that the hard (one-hot) labels adopted for intermediate supervision hurt the performance of FGVC by enforcing overstrict supervision. As a solution, HSD seeks self-distillation where soft predictions generated by deeper layers of the network are hierarchically exploited to supervise shallow parts. Moreover, self-information entropy loss (SIELoss) is designed in HSD to adaptively soften intermediate predictions and facilitate better convergence. In addition, the gradient detached fusion (GDF) module is incorporated to produce an ensemble result with multiscale features via effective feature fusion. Extensive experiments on four challenging fine-grained datasets show that, with a negligible parameter increase, the proposed HSD framework and the GDF module both bring significant performance gains over different backbones and achieve state-of-the-art classification performance.
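
The self-distillation step described above, in which soft predictions from a deeper layer supervise a shallower one, can be sketched as a KL divergence between temperature-softened distributions. This is a generic soft-label distillation loss written under that assumption; the function names and the `temperature` parameter are illustrative, and it is not the paper's SIELoss or its exact hierarchical scheme.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional softening temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def self_distillation_loss(shallow_logits, deep_logits, temperature=2.0):
    """Penalize the shallow head for deviating from the deeper head's soft output.

    The deep prediction acts as a fixed target here (assumed: no gradient
    would flow through it in a real training setup).
    """
    teacher = softmax(deep_logits, temperature)
    student = softmax(shallow_logits, temperature)
    return kl_divergence(teacher, student)
```

Compared with a one-hot target, the soft target keeps nonzero mass on visually similar classes, which is the kind of relaxed supervision the abstract argues for.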

PaperID: 396,

Authors: Shijie Hao, Yuan Zhou, Yanrong Guo, Richang Hong, Jun Cheng, Meng Wang

Affiliations: Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China; CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China

Title: Real-Time Semantic Segmentation via Spatial-Detail Guided Context Propagation

Abstract:
Nowadays, vision-based computing tasks play an important role in various real-world applications. However, many vision computing tasks, e.g., semantic segmentation, are usually computationally expensive, posing a challenge to computing systems that are resource-constrained but require fast response speed. Therefore, it is valuable to develop accurate and real-time vision processing models that only require limited computational resources. To this end, we propose the spatial-detail guided context propagation network (SGCPNet) for achieving real-time semantic segmentation. In SGCPNet, we propose the strategy of spatial-detail guided context propagation. It uses the spatial details of shallow layers to guide the propagation of the low-resolution global contexts, in which the lost spatial information can be effectively reconstructed. In this way, the network no longer needs to maintain high-resolution features throughout, which largely improves the model efficiency. On the other hand, due to the effective reconstruction of spatial details, the segmentation accuracy can still be preserved. In the experiments, we validate the effectiveness and efficiency of the proposed SGCPNet model. On the Cityscapes dataset, for example, our SGCPNet achieves 69.5% mIoU segmentation accuracy, while its speed reaches 178.5 FPS on 768 × 1536 images on a GeForce GTX 1080 Ti GPU card. In addition, SGCPNet is very lightweight and only contains 0.61 M parameters. The code will be released at https://github.com/zhouyuan888888/SGCPNet.

PaperID: 397,

Authors: Shafin Rahman, Salman Khan, Nick Barnes

Affiliations: Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh; Department of Computer Vision, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates; School of Computing, The Australian National University, Canberra, ACT, Australia

Title: Polarity Loss: Improving Visual-Semantic Alignment for Zero-Shot Detection

Abstract:
Conventional object detection models require large amounts of training data. In comparison, humans can recognize previously unseen objects by merely knowing their semantic description. To mimic similar behavior, zero-shot object detection (ZSD) aims to recognize and localize “unseen” object instances by using only their semantic information. The model is first trained to learn the relationships between visual and semantic domains for seen objects, later transferring the acquired knowledge to totally unseen objects. This setting gives rise to the need for correct alignment between visual and semantic concepts so that the unseen objects can be identified using only their semantic attributes. In this article, we propose a novel loss function called “polarity loss” that promotes correct visual-semantic alignment for an improved ZSD. On the one hand, it refines the noisy semantic embeddings via metric learning on a “semantic vocabulary” of related concepts to establish a better synergy between visual and semantic domains. On the other hand, it explicitly maximizes the gap between positive and negative predictions to achieve better discrimination between seen, unseen, and background objects. Our approach is inspired by embodiment theories in cognitive science that claim human semantic understanding to be grounded in past experiences (seen objects), related linguistic concepts (word vocabulary), and visual perception (seen/unseen object images). We conduct extensive evaluations on the Microsoft Common Objects in Context (MS-COCO) and Pascal Visual Object Classes (VOC) datasets, showing significant improvements over the state of the art. Our code and evaluation protocols are available at: https://github.com/salman-h-khan/PL-ZSD_Release.

PaperID: 398,

Authors: Lu Zhang, Zhiyong Liu, Xiangyu Zhu, Zhan Song, Xu Yang, Zhen Lei, Hong Qiao

Affiliations: State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Chinese Academy of Sciences, Shenzhen Institutes of Advanced Technology, Guangdong, Shenzhen, China

Title: Weakly Aligned Feature Fusion for Multimodal Object Detection

Abstract:
To achieve accurate and robust object detection in real-world scenarios, various forms of images are incorporated, such as color, thermal, and depth. However, multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned, so that one object has different positions in different modalities. For deep learning methods, this problem makes it difficult to fuse multimodal features and confuses the convolutional neural network (CNN) training. In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem. First, a region feature (RF) alignment module with adjacent similarity constraint is designed to consistently predict the position shift between two modalities and adaptively align the cross-modal RFs. Second, we propose a novel region of interest (RoI) jitter strategy to improve the robustness to unexpected shift patterns. Third, we present a new multimodal feature fusion method that selects the more reliable feature and suppresses the less useful one via feature reweighting. In addition, by locating bounding boxes in both modalities and building their relationships, we provide a novel multimodal labeling named KAIST-Paired. Extensive experiments on 2-D and 3-D object detection, RGB-T, and RGB-D datasets demonstrate the effectiveness and robustness of our method.
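
To make the RoI jitter idea more concrete, the sketch below randomly translates a box by a small fraction of its width and height, which is one plausible way to simulate unexpected position shifts between modalities during training. The function name, the uniform noise model, and the `max_shift` parameter are assumptions for illustration; the abstract does not describe how AR-CNN samples its jitter.

```python
import random

def jitter_roi(box, max_shift=0.05, rng=random):
    """Translate an (x1, y1, x2, y2) box by a random offset.

    Hypothetical sketch: the offset is drawn uniformly from
    [-max_shift, max_shift] times the box width (for x) or height (for y),
    so the box keeps its size and only its position changes.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx = rng.uniform(-max_shift, max_shift) * w
    dy = rng.uniform(-max_shift, max_shift) * h
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```

Passing a seeded `random.Random` instance as `rng` makes the jitter reproducible, which is convenient when comparing training runs.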

PaperID: 399,

Authors: Rui Liu, Pengwei Xing, Zichao Deng, Anran Li, Cuntai Guan, Han Yu

Affiliations: School of Computer Science and Engineering, Nanyang Technological University, Nanyang Ave, Singapore

Title: Federated Graph Neural Networks: Overview, Techniques, and Challenges

Abstract:
Graph neural networks (GNNs) have attracted extensive research attention in recent years due to their capability to process graph data and have been widely used in practical applications. As societies become increasingly concerned with the need for data privacy protection, GNNs face the need to adapt to this new normal. Besides, as clients in federated learning (FL) may have relationships, more powerful tools are required to utilize such implicit information to boost performance. This has led to the rapid development of the emerging research field of federated GNNs (FedGNNs). This promising interdisciplinary field is highly challenging for interested researchers to grasp. The lack of an insightful survey on this topic further exacerbates the entry difficulty. In this article, we bridge this gap by offering a comprehensive survey of this emerging field. We propose a 2-D taxonomy of the FedGNN literature: 1) the main taxonomy provides a clear perspective on the integration of GNNs and FL by analyzing how GNNs enhance FL training as well as how FL assists GNN training and 2) the auxiliary taxonomy provides a view on how FedGNNs deal with heterogeneity across FL clients. Through discussions of key ideas, challenges, and limitations of existing works, we envision future research directions that can help build more robust, explainable, efficient, fair, inductive, and comprehensive FedGNNs.

PaperID: 400,

Authors: Van Tien Pham, Yassine Zniyed, Thanh Phuong Nguyen

Affiliations: Université de Toulon, Aix Marseille Université, CNRS, LIS, UMR , Toulon, France

Title: Enhanced Network Compression Through Tensor Decompositions and Pruning

Abstract:
Network compression techniques that combine tensor decompositions and pruning have shown promise in leveraging the advantages of both strategies. In this work, we propose enhanced Network cOmpRession through TensOr decompositions and pruNing (NORTON), a novel method for network compression. NORTON introduces the concept of filter decomposition, enabling a more detailed decomposition of the network while preserving the weight’s multidimensional properties. Our method incorporates a novel structured pruning approach, effectively integrating the decomposed model. Through extensive experiments on various architectures, benchmark datasets, and representative vision tasks, we demonstrate the usefulness of our method. NORTON achieves superior results compared to state-of-the-art (SOTA) techniques in terms of complexity and accuracy. Our code is also available for research purposes.

PaperID: 401,

Authors: Qian Li, Zhichao Wang, Haiyang Xia, Gang Li, Yanan Cao, Lina Yao, Guandong Xu

Affiliations: School of Engineering, Computing and Mathematical Sciences, Curtin University, Perth, Australia; School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, Australia; Research School of Management, Australian National University, Canberra, Australia; Center for Cyber Security Research and Innovation, Deakin University, Melbourne, VIC, Australia; Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Computer Science and Engineering, University of New South Wales, Sydney, Australia; School of Computer Science and the Data Science Institute, University of Technology Sydney, Sydney, Australia

Title: HOT-GAN: Hilbert Optimal Transport for Generative Adversarial Network

Abstract:
Generative adversarial network (GAN) has achieved remarkable success in generating high-quality synthetic data by learning the underlying distributions of target data. Recent efforts have been devoted to utilizing optimal transport (OT) to tackle the gradient vanishing and instability issues in GAN. They use the Wasserstein distance as a metric to measure the discrepancy between the generator distribution and the real data distribution. However, most optimal transport GANs define loss functions in Euclidean space, which limits their capability in handling high-order statistics that are of much interest in a variety of practical applications. In this article, we propose a computational framework to alleviate this issue from both theoretical and practical perspectives. Particularly, we generalize the optimal transport-based GAN from Euclidean space to the reproducing kernel Hilbert space (RKHS) and propose Hilbert Optimal Transport GAN (HOT-GAN). First, we design HOT-GAN with a Hilbert embedding that allows the discriminator to tackle more informative and high-order statistics in RKHS. Second, we prove that HOT-GAN has a closed-form kernel reformulation in RKHS that can achieve a tractable objective under the GAN framework. Third, HOT-GAN’s objective enjoys the theoretical guarantee of differentiability with respect to generator parameters, which is beneficial to learn powerful generators via adversarial kernel learning. Extensive experiments are conducted, showing that our proposed HOT-GAN consistently outperforms the representative GAN works.

PaperID: 402,

Authors: Pengwei Liang, Junjun Jiang, Xianming Liu, Jiayi Ma

Affiliations: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Electronic Information School, Wuhan University, Wuhan, China

Title: Image Deblurring by Exploring In-Depth Properties of Transformer

Abstract:
Image deblurring continues to achieve impressive performance with the development of generative models. Nonetheless, there still remains a displeasing problem if one wants to improve the perceptual quality and quantitative scores of the recovered image at the same time. In this study, drawing inspiration from the research of transformer properties, we introduce pretrained transformers to address this problem. In particular, we leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by the quantitative metrics. The pretrained transformer can capture the global topological relations (i.e., self-similarity) of an image, and we observe that the captured topological relationships of the sharp image change when blur occurs. By comparing the transformer features between the recovered image and the target one, the pretrained transformer provides high-resolution blur-sensitive semantic information, which is critical in measuring the sharpness of the deblurred image. On the basis of these advantages, we present two types of novel perceptual losses to guide image deblurring. One regards the features as vectors and computes the discrepancy between representations extracted from the recovered image and the target one in Euclidean space. The other type considers the features extracted from an image as a distribution and compares the distribution discrepancy between the recovered image and the target one. We demonstrate the effectiveness of transformer properties in improving the perceptual quality while not sacrificing the quantitative scores (peak signal-to-noise ratio, PSNR) over the most competitive models, such as Uformer, Restormer, and NAFNet, on defocus deblurring and motion deblurring tasks. The code is available at https://github.com/erfect2020/TransformerPerceptualLoss.

PaperID: 403,

Authors: Mingfeng Fan, Yaoxin Wu, Zhiguang Cao, Wen Song, Guillaume Sartoretti, Huan Liu, Guohua Wu

Affiliations: School of Traffic and Transportation Engineering, Central South University, Changsha, China; Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands; School of Computing and Information Systems, Singapore Management University, Victoria Street, Singapore; Institute of Marine Science and Technology, Shandong University, Jinan, China; Department of Mechanical Engineering, National University of Singapore, Cluny Road, Singapore

Title: Conditional Neural Heuristic for Multiobjective Vehicle Routing Problems

Abstract:
Existing neural heuristics for multiobjective vehicle routing problems (MOVRPs) are primarily conditioned on instance context and fail to appropriately exploit preference and problem size, which holds back their performance. To thoroughly unleash the potential, we propose a novel conditional neural heuristic (CNH) that fully leverages the instance context, preference, and size with an encoder–decoder structured policy network. Particularly, in our CNH, we design a dual-attention-based encoder to relate preferences and instance contexts, so as to better capture their joint effect on approximating the exact Pareto front (PF). We also design a size-aware decoder based on the sinusoidal encoding to explicitly incorporate the problem size into the embedding, so that a single trained model could better solve instances of various scales. Besides, we customize the REINFORCE algorithm to train the neural heuristic by leveraging stochastic preferences (SPs), which further enhances the training performance. Extensive experimental results on random and benchmark instances reveal that our CNH could achieve favorable approximation to the whole PF with higher hypervolume (HV) and lower optimality gap (Gap) than those of the existing neural and conventional heuristics. More importantly, a single trained model of our CNH can outperform other neural heuristics that are exclusively trained on each size. In addition, the effectiveness of the key designs is also verified through ablation studies.
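
The size-aware decoder above relies on a sinusoidal encoding of the problem size. The sketch below shows the standard transformer-style sinusoidal encoding applied to a scalar such as the number of nodes; the function name, the `base` constant, and the interleaved sine/cosine layout are assumptions, since the abstract does not give the exact formulation used in CNH.

```python
import math

def sinusoidal_encoding(value, dim, base=10000.0):
    """Encode a scalar (e.g., problem size) as a dim-length vector.

    Assumed layout: sine and cosine are interleaved, with frequencies
    decreasing geometrically as in the usual positional encoding.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encoding = []
    for k in range(dim // 2):
        freq = 1.0 / (base ** (2 * k / dim))
        encoding.append(math.sin(value * freq))
        encoding.append(math.cos(value * freq))
    return encoding
```

Because the encoding is a smooth, bounded function of the size, a model conditioned on it can in principle interpolate between sizes seen during training, which fits the abstract's goal of one model serving instances of various scales.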

PaperID: 404,

Authors: Feiping Nie, Han Liu, Rong Wang, Xuelong Li

Affiliations: School of Artificial Intelligence, Optics and Electronics (iOPEN), School of Computer Science, Northwestern Polytechnical University, Xi’an, P. R. China; Key Laboratory of Intelligent Interaction and Applications (Northwestern Polytechnical University), Ministry of Industry and Information Technology, Xi’an, P. R. China

Title: Parameter-Free Multiview K-Means Clustering With Coordinate Descent Method

Abstract:
Recently, more and more real-world datasets have been composed of heterogeneous but related features from diverse views. Multiview clustering provides a promising attempt at a solution for partitioning such data according to heterogeneous information. However, most existing methods suffer from troublesome hyper-parameter tuning and high computational cost. Besides, there is still room for improvement in clustering performance. To this end, a novel multiview framework, called parameter-free multiview k-means clustering with coordinate descent method (PFMVKM), is presented to address the above problems. Specifically, PFMVKM is completely parameter-free and learns the weights via a self-weighted scheme, which avoids the intractable process of hyper-parameter tuning. Moreover, our model is capable of directly calculating the cluster indicator matrix, with no need to learn the cluster centroid matrix and the indicator matrix simultaneously as previous multiview methods have to do. Furthermore, we propose an efficient optimization algorithm utilizing the idea of coordinate descent, which can not only reduce the computational complexity but also improve the clustering performance. Extensive experiments on various types of real datasets illustrate that the proposed method outperforms existing state-of-the-art competitors and conforms well with the actual situation.

PaperID: 405,

Authors: Rui-Jie Zhu, Malu Zhang, Qihang Zhao, Haoyu Deng, Yule Duan, Liang-Jian Deng

Affiliations: University of California, Santa Cruz, Santa Cruz, CA, USA; University of Electronic Science and Technology of China, Chengdu, China; Kuaishou Technology, Beijing, China

Title: TCJA-SNN: Temporal-Channel Joint Attention for Spiking Neural Networks

Abstract:
Spiking neural networks (SNNs) are attracting widespread interest due to their biological plausibility, energy efficiency, and powerful spatiotemporal information representation ability. Given the critical role of attention mechanisms in enhancing neural network performance, the integration of SNNs and attention mechanisms exhibits tremendous potential to deliver energy-efficient and high-performance computing paradigms. In this article, we present a novel temporal-channel joint attention mechanism for SNNs, referred to as TCJA-SNN. The proposed TCJA-SNN framework can effectively assess the significance of the spike sequence from both spatial and temporal dimensions. More specifically, our essential technical contributions are as follows: 1) we employ the squeeze operation to compress the spike stream into an average matrix, and then leverage two local attention mechanisms based on efficient 1-D convolutions to facilitate comprehensive feature extraction at the temporal and channel levels independently and 2) we introduce the cross-convolutional fusion (CCF) layer as a novel approach to model the interdependencies between the temporal and channel scopes. This layer effectively breaks the independence of these two dimensions and enables the interaction between features. Experimental results demonstrate that the proposed TCJA-SNN outperforms the state-of-the-art (SOTA) on all standard static and neuromorphic datasets, including Fashion-MNIST, CIFAR10, CIFAR100, CIFAR10-DVS, N-Caltech 101, and DVS128 Gesture. Furthermore, we effectively apply the TCJA-SNN framework to image generation tasks by leveraging a variational autoencoder. To the best of our knowledge, this study is the first instance where the SNN-attention mechanism has been employed for high-level classification and low-level generation tasks. Our implementation codes are available at https://github.com/ridgerchu/TCJA.
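
The squeeze step and the 1-D convolutions mentioned above can be sketched as follows, assuming the spike stream is a nested list indexed as `spikes[t][c][h][w]`. The helper names, the zero padding, and the odd-length kernel are illustrative assumptions; the sketch covers only the squeeze and a single 1-D convolution, not the CCF layer, whose details the abstract does not give.

```python
def squeeze(spikes):
    """Average each spatial feature map, giving a T x C matrix of firing rates."""
    return [
        [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
         for fmap in frame]
        for frame in spikes
    ]

def conv1d_same(seq, kernel):
    """Zero-padded 1-D correlation that keeps the input length (odd kernel assumed)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(seq))
    ]

def temporal_features(avg, kernel):
    """Apply the 1-D convolution along the time axis, separately per channel."""
    steps, channels = len(avg), len(avg[0])
    columns = [conv1d_same([avg[t][c] for t in range(steps)], kernel)
               for c in range(channels)]
    # Transpose back to T x C layout.
    return [[columns[c][t] for c in range(channels)] for t in range(steps)]
```

A channel-level counterpart would apply `conv1d_same` to each row of the averaged matrix instead of each column; how the two results are then fused is specific to the paper.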

PaperID: 406,

Authors: Zhiwei Chen, Siwei Wang, Liujuan Cao, Yunhang Shen, Rongrong Ji

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen, China; YouTu Laboratory, Tencent Company Ltd., Shanghai, China

Title: Adaptive Zone Learning for Weakly Supervised Object Localization

Abstract:
Weakly supervised object localization (WSOL) stands as a pivotal endeavor within the realm of computer vision, entailing the location of objects utilizing merely image-level labels. Contemporary approaches in WSOL have leveraged FPMs, yielding commendable outcomes. However, these existing FPM-based techniques are predominantly confined to rudimentary strategies of either augmenting the foreground or diminishing the background presence. We argue for the exploration and exploitation of the intricate interplay between the object’s foreground and its background to achieve efficient object localization. In this manuscript, we introduce an innovative framework, termed adaptive zone learning (AZL), which operates on a coarse-to-fine basis to refine FPMs through a triad of adaptive zone mechanisms. First, an adversarial learning mechanism (ALM) is employed, orchestrating an interplay between the foreground and background regions. This mechanism accentuates coarse-grained object regions in a mutually adversarial manner. Subsequently, an oriented learning mechanism (OLM) is unveiled, which harnesses local insights from both foreground and background in a fine-grained manner. This mechanism is instrumental in delineating object regions with greater granularity, thereby generating better FPMs. Furthermore, we propose a reinforced learning mechanism (RLM) as the compensatory mechanism for adversarial design, by which the undesirable foreground maps are refined again. Extensive experiments on CUB-200-2011 and ILSVRC datasets demonstrate that AZL achieves significant and consistent performance improvements over other state-of-the-art WSOL methods.

PaperID: 407,

Authors: Xiao-Yang Liu, Xiaodong Wang, Bo Yuan, Jiashu Han

Affiliations: Department of Electrical Engineering, Columbia University, New York City, NY, USA; Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA; Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA

Title: Spectral Tensor Layers for Communication-Free Distributed Deep Learning

Abstract:
In this article, we propose a novel spectral tensor layer for communication-free distributed deep learning. The overall framework is as follows: first, we represent the data in tensor form (instead of vector form) and replace the matrix product in conventional neural networks with the tensor product, which in effect imposes certain transform-induced structure on the original weight matrices, e.g., a block-circulant structure; then, we apply a linear transform along a certain dimension to split the original dataset into multiple spectral subdatasets; as a result, the proposed spectral tensor network consists of parallel branches where each branch is a conventional neural network trained on a spectral subdataset with ZERO communication cost. The parallel branches are directly ensembled (i.e., the weighted sum of their outputs) to generate an overall network with substantially stronger generalization capability than that of each branch. Moreover, the proposed method enjoys a byproduct of decentralization gain in terms of memory and computation, compared with traditional networks. It is a natural yet elegant solution for heterogeneous data in federated learning (FL), where data at different nodes have different resolutions. Finally, we evaluate the proposed spectral tensor networks on the MNIST, CIFAR-10, ImageNet-1K, and ImageNet-21K datasets to verify that they simultaneously achieve communication-free distributed learning, distributed storage reduction, parallel computation speedup, and learning with multiresolution data.

PaperID: 408,

Authors: Miaoyu Li, Ying Fu, Tao Zhang, Guanghui Wen

Affiliations: Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Beijing Key Laboratory of Intelligent Information Technology, the MIIT Key Laboratory of Complex-Field Intelligent Exploration, and the School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China; Department of Mathematics, Southeast University, Nanjing, China

Title: Supervise-Assisted Self-Supervised Deep-Learning Method for Hyperspectral Image Restoration

Abstract:
Hyperspectral image (HSI) restoration is a challenging research area, covering a variety of inverse problems. Previous works have shown the great success of deep learning in HSI restoration. However, facing the problem of distribution gaps between training HSIs and target HSI, those data-driven methods falter in delivering satisfactory outcomes for the target HSIs. In addition, the degradation process of HSIs is usually disturbed by noise, which is not well taken into account in existing restoration methods. The existence of noise further exacerbates the dissimilarities within the data, rendering it challenging to attain desirable results without an appropriate learning approach. To tackle these issues, in this article, we propose a supervise-assisted self-supervised deep-learning method to restore noisy degraded HSIs. Initially, we facilitate the restoration network to acquire a generalized prior through supervised learning from extensive training datasets. Then, the self-supervised learning stage is employed and utilizes the specific prior of the target HSI. Particularly, to restore clean HSIs during the self-supervised learning stage from noisy degraded HSIs, we introduce a noise-adaptive loss function that leverages inner statistics of noisy degraded HSIs for restoration. The proposed noise-adaptive loss consists of Stein’s unbiased risk estimator (SURE) and total variation (TV) regularizer and fine-tunes the network in the presence of noise. We demonstrate through experiments on different HSI tasks, including denoising, compressive sensing, super-resolution, and inpainting, that our method outperforms state-of-the-art methods on benchmarks under quantitative metrics and visual quality. The code is available at https://github.com/ying-fu/SSDL-HSI.

PaperID: 409,

Authors: Xiangyin Kong, Yimeng He, Zhihuan Song, Tong Liu, Zhiqiang Ge

Affiliations: State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, China; Department of Computer Science, The University of Sheffield, Sheffield, U.K.; Peng Cheng Laboratory, Shenzhen, China

Title: Deep Probabilistic Principal Component Analysis for Process Monitoring

Abstract:
Probabilistic latent variable models (PLVMs), such as probabilistic principal component analysis (PPCA), are widely employed in process monitoring and fault detection of industrial processes. This article proposes a novel deep PPCA (DePPCA) model, which has the advantages of both probabilistic modeling and deep learning. The construction of DePPCA includes a greedy layer-wise pretraining phase and a unified end-to-end fine-tuning phase. The former establishes a hierarchical deep structure based on cascading multiple layers of the PPCA module to extract high-level features. The latter builds an end-to-end connection between the raw inputs and the final outputs to further improve the representation of the model to high-level features. After constructing the model structure of DePPCA, we first present the detailed training processes of the pretraining and fine-tuning stages, then clarify the theoretical merits of the proposed model from the perspective of variational inference. For process monitoring purposes, we develop two statistics based on the established DePPCA. The monitoring performance of these two statistics can remain superior even if the features extracted by DePPCA are significantly compressed to univariate. This makes the feature extraction process and online monitoring procedure of DePPCA quite fast. In other words, the proposed DePPCA can achieve accurate and efficient process monitoring by only extracting one feature for each sample. Finally, the effectiveness of DePPCA is evaluated on the Tennessee Eastman (TE) process and the multiphase flow (MPF) facility.

PaperID: 410,

Authors: Boyan Li, Luziwei Leng, Shuaijie Shen, Kaixuan Zhang, Jianguo Zhang, Jianxing Liao, Ran Cheng

Affiliations: Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China; Advanced Computing and Storage Laboratory, Huawei Technologies Company Ltd., Shenzhen, China

Title: Efficient Deep Spiking Multilayer Perceptrons With Multiplication-Free Inference

Abstract:
Advancements in adapting deep convolution architectures for spiking neural networks (SNNs) have significantly enhanced image classification performance and reduced computational burdens. However, the inability of multiplication-free inference (MFI) to align with attention and transformer mechanisms, which are critical to superior performance on high-resolution vision tasks, imposes limitations on these gains. To address this, our research explores a new pathway, drawing inspiration from the progress made in multilayer perceptrons (MLPs). We propose an innovative spiking MLP architecture that uses batch normalization (BN) to retain MFI compatibility and introduce a spiking patch encoding (SPE) layer to enhance local feature extraction capabilities. As a result, we establish an efficient multistage spiking MLP network that effectively blends global receptive fields with local feature extraction for comprehensive spike-based computation. Without relying on pretraining or sophisticated SNN training techniques, our network secures a top-one accuracy of 66.39% on the ImageNet-1K dataset, surpassing the directly trained spiking ResNet-34 by 2.67%. Furthermore, we curtail computational costs, model parameters, and simulation steps. An expanded version of our network compares with the performance of the spiking VGG-16 network with a 71.64% top-one accuracy, all while operating with a model capacity 2.1 times smaller. Our findings highlight the potential of our deep SNN architecture in effectively integrating global and local learning abilities. Interestingly, the trained receptive field in our network mirrors the activity patterns of cortical cells.

PaperID: 411,

Authors: Xiaojie Yin, Bing Cao, Qinghua Hu, Qilong Wang

Affiliations: College of Intelligence and Computing, Tianjin University, Tianjin, China

Title: RD-OpenMax: Rethinking OpenMax for Robust Realistic Open-Set Recognition

Abstract:
Open-set recognition (OSR) toward a practical open-world setting has attracted increasing research attention in recent years. However, existing OSR settings are either too idealized or focus on specific scenes such as long-tailed distribution and few-shot samples, which fail to capture the complexity of real-world scenarios. In this article, we propose a realistic OSR (ROSR) setting that covers a diverse range of challenging and real-world scenarios, including fine-grained cases with strong semantic correlation and a large number of species, few-shot samples, long-tailed sample distribution, dynamic inputs (e.g., images, spatio-temporal, and multimodal signals) and cross-domain adaptation. In particular, we rethink the simple and basic OpenMax for the ROSR setting and introduce a novel method, regularized discriminative OpenMax (RD-OpenMax), to handle the challenges in the ROSR setting. RD-OpenMax improves upon the basic OpenMax approach by introducing a covariance attention-based covariance pooling (CACP) module as a global aggregation step before the deep architecture’s classifier. This module explores rich statistical information on features and provides discriminative distance scores for OpenMax. To address the instability of extreme value theory (EVT) estimation due to insufficient training samples under few-shot and long-tailed scenarios, we propose a regularized EVT (REVT) method based on Monte Carlo sampling to recalibrate the distribution of distance scores. As such, our RD-OpenMax performs a REVT model of distance scores generated by discriminative CACP representations to distinguish known classes and recognize unknown ones effectively and robustly. Extensive experiments are conducted on more than ten visual benchmarks across several scenarios, and the empirical comparisons show that the ROSR setting challenges existing state-of-the-art OSR approaches. Moreover, our RD-OpenMax clearly outperforms its counterparts under the ROSR setting while performing favorably against state-of-the-art methods under the traditional OSR setting.

PaperID: 412,

Authors: Lu Zeng, Xuan Chen, Xiaoshuang Shi, Heng Tao Shen

Affiliations: Center for Future Media and the School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu, China

Title: Feature Noise Boosts DNN Generalization Under Label Noise

Abstract:
The presence of label noise in the training data has a profound impact on the generalization of deep neural networks (DNNs). In this study, we introduce and theoretically demonstrate a simple feature noise (FN) method, which directly adds noise to the features of training data and can enhance the generalization of DNNs under label noise. Specifically, we conduct theoretical analyses to reveal that label noise leads to weakened DNN generalization by loosening the generalization bound, and FN results in better DNN generalization by imposing an upper bound on the mutual information between the model weights and the features, which constrains the generalization bound. Furthermore, we conduct a qualitative analysis to discuss the ideal type of FN that obtains good label noise generalization. Finally, extensive experimental results on several popular datasets demonstrate that the FN method can significantly enhance the label noise generalization of state-of-the-art methods. The source codes of the FN method are available on https://github.com/zlzenglu/FN.
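The core operation described above, adding noise directly to the training features while leaving the (possibly noisy) labels untouched, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the choice of zero-mean Gaussian noise, the `sigma` value, and the function name `add_feature_noise` are assumptions made here for the example.

```python
import random


def add_feature_noise(features, sigma, rng):
    """Return a noisy copy of `features` (a list of feature rows).

    Assumption: zero-mean Gaussian noise with standard deviation `sigma`
    is added independently to every feature value. Labels are not touched.
    """
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in features]


# Hypothetical toy batch: two samples with three features each.
rng = random.Random(0)
clean = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
noisy = add_feature_noise(clean, sigma=0.05, rng=rng)
print(noisy)
```

In a training loop, fresh noise would typically be drawn for every mini-batch, so the model never sees exactly the same features twice.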

PaperID: 413,

Authors: Yidan Ma, Xinjie Shen, Danyang Wu, Jianfu Cao, Feiping Nie

Affiliations: State Key Laboratory for Manufacturing Systems Engineering and the School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China; School of Future Technology, South China University of Technology, Guangzhou, Guangdong, China; College of Information Engineering and Shaanxi Engineering Research Center for Intelligent Perception and Analysis of Agricultural Information, Northwest A&F University, Xianyang, Shaanxi, China; School of Computer Science and the School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, China

Title: Cross-View Approximation on Grassmann Manifold for Multiview Clustering

Abstract:
In existing multiview clustering research, the comprehensive learning from multiview graph and feature spaces simultaneously remains insufficient when achieving a consistent clustering structure. In addition, a postprocessing step is often required. In light of these considerations, a cross-view approximation on Grassmann manifold (CAGM) model is proposed to address inconsistencies within multiview adjacency matrices, feature matrices, and cross-view combinations from the two sources. The model uses a ratio-formed objective function, enabling parameter-free bidirectional fusion. Furthermore, the CAGM model incorporates a paired encoding mechanism to generate low-dimensional and orthogonal cross-view embeddings. Through the approximation of two measurable subspaces on the Grassmann manifold, the direct acquisition of the indicator matrix is realized. Furthermore, an effective optimization algorithm corresponding to the CAGM model is derived. Comprehensive experiments on four real-world datasets are conducted to substantiate the effectiveness of our proposed method.

PaperID: 414,

Authors: Kun Xie, Can Liu, Xin Wang, Xiaocan Li, Gaogang Xie, Jigang Wen, Kenli Li

Affiliations: College of Computer Science and Electronics Engineering, Hunan University, Changsha, China; Department of Electrical and Computer Engineering, The State University of New York at Stony Brook, Stony Brook, NY, USA; Institute of Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China

Title: Neural Network Compression Based on Tensor Ring Decomposition

Abstract:
Deep neural networks (DNNs) have made great breakthroughs and seen applications in many domains. However, the incomparable accuracy of DNNs is achieved with the cost of considerable memory consumption and high computational complexity, which restricts their deployment on conventional desktops and portable devices. To address this issue, low-rank factorization, which decomposes the neural network parameters into smaller sized matrices or tensors, has emerged as a promising technique for network compression. In this article, we propose leveraging the emerging tensor ring (TR) factorization to compress the neural network. We investigate the impact of both parameter tensor reshaping and TR decomposition (TRD) on the total number of compressed parameters. To achieve the maximal parameter compression, we propose an algorithm based on prime factorization that simultaneously identifies the optimal tensor reshaping and TRD. In addition, we discover that different execution orders of the core tensors result in varying computational complexities. To identify the optimal execution order, we construct a novel tree structure. Based on this structure, we propose a top-to-bottom splitting algorithm to schedule the execution of core tensors, thereby minimizing computational complexity. We have performed extensive experiments using three kinds of neural networks with three different datasets. The experimental results demonstrate that, compared with the three state-of-the-art algorithms for low-rank factorization, our algorithm can achieve better performance with much lower memory consumption and lower computational complexity.
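To see why tensor reshaping affects the number of compressed parameters, recall that a tensor ring decomposition of a tensor with mode sizes n_1, ..., n_d and ranks r_1, ..., r_d (with r_{d+1} = r_1) stores one core of size r_k × n_k × r_{k+1} per mode. The sketch below only evaluates this standard parameter count for two reshapings of the same weight matrix; it does not reproduce the authors' prime-factorization search. The 64 × 64 layer size and the uniform rank of 4 are hypothetical values chosen for illustration.

```python
from math import prod


def tr_param_count(shape, ranks):
    """Number of parameters in a tensor ring format.

    Core k has size ranks[k] x shape[k] x ranks[k + 1], where the rank
    index wraps around so that the last core connects back to the first.
    """
    if len(shape) != len(ranks):
        raise ValueError("shape and ranks must have the same length")
    d = len(shape)
    return sum(ranks[k] * shape[k] * ranks[(k + 1) % d] for k in range(d))


# Hypothetical 64 x 64 weight matrix (4096 dense parameters).
dense = prod((64, 64))
as_matrix = tr_param_count((64, 64), (4, 4))              # two large cores
as_4d = tr_param_count((8, 8, 8, 8), (4, 4, 4, 4))        # four small cores
print(dense, as_matrix, as_4d)
```

With these illustrative numbers, reshaping to four modes of size 8 needs fewer stored parameters than keeping two modes of size 64 at the same rank, which is the kind of trade-off a reshaping search would explore.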

PaperID: 415,

Authors: Qiuyu Song, Xingxing Jiang, Jie Liu, Juanjuan Shi, Zhongkui Zhu

Affiliations: School of Rail Transportation, Soochow University, Suzhou, China; School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, China

Title: Contrast-Assisted Domain-Specificity-Removal Network for Semi-Supervised Generalization Fault Diagnosis

Abstract:
Unknown domain shift caused by the unavailability of the target domain during the training phase degrades the performance of intelligent fault diagnosis models in practical applications. Domain generalization (DG)-based methods have recently emerged to alleviate the influence of domain shift and improve the generalization ability of models toward invisible working conditions. However, most existing studies are conducted on multiple fully labeled source domains. Meanwhile, domain-specific information related to the variations of working conditions is often neglected during model training. Therefore, in order to realize reliable generalization fault diagnosis based on partially labeled source domains, this article proposes a contrast-assisted domain-specificity-removal network (CDSRN) to extract transferable features from a domain-specificity-removal perspective. Concretely, a domain-specific feature removal branch is designed to disentangle domain-invariant features and domain-specific features, thus excavating generalized information only in the domain-invariance dimension. Simultaneously, a proxy-contrastive representation enhancement module is embedded to facilitate the fault class-discriminative and domain-discriminative feature learning, thereby assisting the model in further improvement of generalization capability. Experimental studies confirm the effectiveness and competitiveness of the proposed CDSRN in semi-supervised generalization fault diagnosis.

PaperID: 416,

Authors: Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Salih Atici, Ahmet Enis Çetin

Affiliations: Department of Electrical and Computer Engineering, University of Illinois Chicago, Chicago, IL, USA

Title: Multichannel Orthogonal Transform-Based Perceptron Layers for Efficient ResNets

Abstract:
In this article, we propose a set of transform-based neural network layers as an alternative to the 3 × 3 Conv2D layers in convolutional neural networks (CNNs). The proposed layers can be implemented based on orthogonal transforms, such as the discrete cosine transform (DCT), Hadamard transform (HT), and biorthogonal block wavelet transform (BWT). Furthermore, by taking advantage of the convolution theorems, convolutional filtering operations are performed in the transform domain using elementwise multiplications. Trainable soft-thresholding layers, which remove noise in the transform domain, bring nonlinearity to the transform domain layers. Compared with the Conv2D layer, which is spatial-agnostic and channel-specific, the proposed layers are location-specific and channel-specific. Moreover, these proposed layers reduce the number of parameters and multiplications significantly while improving the accuracy results of regular ResNets on the ImageNet-1K classification task. Furthermore, they can be inserted with a batch normalization (BN) layer before the global average pooling layer in the conventional ResNets as an additional layer to improve classification accuracy.
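The nonlinearity mentioned above can be illustrated with the standard soft-thresholding operator, sign(x) · max(|x| − T, 0), applied to each transform-domain coefficient. The sketch below is a minimal illustration under two assumptions: the threshold is a fixed number here (in the abstract it is trainable), and the input list stands in for the coefficients produced by a DCT, HT, or BWT.

```python
def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`.

    Coefficients whose magnitude is at most `threshold` become 0.0;
    larger ones keep their sign and lose `threshold` in magnitude.
    """
    out = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if c > 0 else -magnitude)
    return out


# Hypothetical transform-domain coefficients; small ones act as "noise".
print(soft_threshold([3.0, -0.5, 0.2, -2.0], threshold=1.0))
# prints [2.0, 0.0, 0.0, -1.0]
```

Because small coefficients are zeroed while large ones are only shifted, the operator is nonlinear and acts as a simple denoiser in the transform domain.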

PaperID: 417,

Authors: Yingqian Wang, Zhengyu Liang, Longguang Wang, Jungang Yang, Wei An, Yulan Guo

Affiliations: College of Electronic Science and Technology, National University of Defense Technology, Changsha, China; College of Electronic Science, Aviation University of Air Force, Changchun, China

Title: Real-World Light Field Image Super-Resolution Via Degradation Modulation

Abstract:
Recent years have witnessed the great advances of deep neural networks (DNNs) in light field (LF) image super-resolution (SR). However, existing DNN-based LF image SR methods are developed on a single fixed degradation (e.g., bicubic downsampling), and thus cannot be applied to super-resolve real LF images with diverse degradation. In this article, we propose a simple yet effective method for real-world LF image SR. In our method, a practical LF degradation model is developed to formulate the degradation process of real LF images. Then, a convolutional neural network is designed to incorporate the degradation prior into the SR process. By training on LF images using our formulated degradation, our network can learn to modulate different degradation while incorporating both spatial and angular information in LF images. Extensive experiments on both synthetically degraded and real-world LF images demonstrate the effectiveness of our method. Compared with existing state-of-the-art single and LF image SR methods, our method achieves superior SR performance under a wide range of degradation, and generalizes better to real LF images. Codes and models are available at https://yingqianwang.github.io/LF-DMnet/.

PaperID: 418,

Authors: Feiteng Mu, Wenjie Li

Affiliations: Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

Title: Enhancing Narrative Commonsense Reasoning With Multilevel Causal Knowledge

Abstract:
A narrative is an account of the unfolding of events, along with explanations of how and why these processes and events came to be. To understand narratives, causality has been proven to be especially useful. Causality manifests itself primarily at both the event and sentence levels, offering essential insights into understanding narratives. However, previous works utilize either sentence-level or event-level causalities, but not both. In this article, we devise a two-stage approach to fully exploit both levels of causal relationships. In the first stage, by devising posttraining tasks, we inject sentence-level causalities into pretrained language models (PLMs). The causal-enhanced PLMs, which carry sentence-level causalities, can be transferred to downstream tasks. In the second stage, we utilize event causalities to further refine narrative commonsense reasoning. However, the event sparsity problem makes it difficult to learn event representations and capture useful causal semantics. To alleviate this problem, we break down events into multiple word components, enabling the retrieval of word–word relations between these components. Because word–word relations capture the interplay between event components, they help mitigate the event sparsity problem. Based on the event-level causalities and the word-level relations, we construct the hierarchical knowledge graph (KG) as knowledge ground. A KG-based reasoning process is then employed for narrative commonsense reasoning. Experimental results affirm the effectiveness of our framework.

PaperID: 419,

Authors: Luntong Li, Yuanheng Zhu

Affiliations: State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Boosting On-Policy Actor-Critic With Shallow Updates in Critic

Abstract:
Deep reinforcement learning (DRL) benefits from the representation power of deep neural networks (NNs) to approximate the value function and policy in the learning process. Batch reinforcement learning (BRL) benefits from stable training and data efficiency with fixed representation and enjoys solid theoretical analysis. This work proposes least-squares deep policy gradient (LSDPG), a hybrid approach that combines least-squares reinforcement learning (RL) with online DRL to achieve the best of both worlds. LSDPG leverages a shared network to share useful features between policy (actor) and value function (critic). LSDPG learns policy, value function, and representation separately. First, LSDPG views deep NNs of the critic as a linear combination of representation weighted by the weights of the last layer and performs policy evaluation with regularized least-squares temporal difference (LSTD) methods. Second, arbitrary policy gradient algorithms can be applied to improve the policy. Third, an auxiliary task is used to periodically distill the features from the critic into the representation. Unlike most DRL methods, where the critic algorithms are often used in a nonstationary situation, i.e., the policy to be evaluated is changing, the critic in LSDPG is working on a stationary case in each iteration of the critic update. We prove that, under some conditions, the critic converges to the regularized TD fixpoint of the current policy, and the actor converges to the local optimal policy. The experimental results on the challenging Procgen benchmark illustrate the improvement of sample efficiency of LSDPG over proximal policy optimization and phasic policy gradient (PPG).
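The policy-evaluation step can be illustrated with a textbook regularized LSTD solve: given features φ(s), the last-layer weights w satisfy (Σ φ(s)(φ(s) − γφ(s'))ᵀ + λI) w = Σ φ(s) r. The sketch below is a generic version of that computation, not the LSDPG code. The ridge-style λI term, the one-hot features, and the two-state toy chain are assumptions made here; in the setting described above, φ would presumably come from the critic network's learned representation.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def regularized_lstd(transitions, gamma, reg):
    """Return weights w such that V(s) is approximated by phi(s) . w.

    `transitions` is a list of (phi, reward, phi_next) tuples.
    `reg` is the ridge coefficient added to the diagonal (an assumption).
    """
    d = len(transitions[0][0])
    a = [[reg if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for phi, reward, phi_next in transitions:
        for i in range(d):
            b[i] += phi[i] * reward
            for j in range(d):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear_system(a, b)


# Toy chain: state 0 -> state 1 (reward 0), state 1 -> state 1 (reward 1).
data = [([1.0, 0.0], 0.0, [0.0, 1.0]), ([0.0, 1.0], 1.0, [0.0, 1.0])]
print(regularized_lstd(data, gamma=0.5, reg=0.0))
```

With γ = 0.5 and no regularization, the exact values of this chain are V(0) = 1 and V(1) = 2, so the printed weights should match them up to floating-point error; a positive `reg` shrinks the estimates toward zero.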

PaperID: 420,

Authors: Gaurav Kumar Nayak, Ruchit Rawal, Inder Khatri, Anirban Chakraborty

Affiliations: Centre for Research in Computer Vision, University of Central Florida, Orlando, FL, USA; Max Planck Institute for Software Systems, Saarbrücken, Germany; Tandon School of Engineering, New York University, New York, NY, USA; Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, India

Title: Robust Few-Shot Learning Without Using Any Adversarial Samples

Abstract:
The high cost of acquiring and annotating samples has made the “few-shot” learning problem of prime importance. Existing works mainly focus on improving performance on clean data and overlook robustness concerns on the data perturbed with adversarial noise. Recently, a few efforts have been made to combine the few-shot problem with the robustness objective using sophisticated meta-learning techniques. These methods rely on the generation of adversarial samples in every episode of training, which further adds to the computational burden. To avoid such time-consuming and complicated procedures, we propose a simple but effective alternative that does not require any adversarial samples. Inspired by the cognitive decision-making process in humans, we enforce high-level feature matching between the base class data and their corresponding low-frequency samples in the pretraining stage via self distillation. The model is then fine-tuned on the samples of novel classes where we additionally improve the discriminability of low-frequency query set features via cosine similarity. On a one-shot setting of the CIFAR-FS dataset, our method yields a massive improvement of 60.55% and 62.05% in adversarial accuracy on the projected gradient descent (PGD) and state-of-the-art auto attack, respectively, with a minor drop in clean accuracy compared to the baseline. Moreover, our method only takes 1.69× of the standard training time while being approximately 5× faster than the state-of-the-art adversarial meta-learning methods. The code is available at https://github.com/vcl-iisc/robust-few-shot-learning.

PaperID: 421,

Authors: Zijun Cui, Tian Gao, Kartik Talamadupula, Qiang Ji

Affiliations: Rensselaer Polytechnic Institute, Troy, NY, USA; IBM Research, Yorktown Heights, NY, USA

Title: Knowledge-Augmented Deep Learning and its Applications: A Survey

Abstract:
Deep learning models, though having achieved great success in many different fields over the past years, are usually data-hungry, fail to perform well on unseen samples, and lack interpretability. Different kinds of prior knowledge often exist in the target domain, and their use can alleviate the deficiencies of deep learning. To better mimic the behavior of human brains, different advanced methods have been proposed to identify domain knowledge and integrate it into deep models for data-efficient, generalizable, and interpretable deep learning, which we refer to as knowledge-augmented deep learning (KADL). In this survey, we define the concept of KADL and introduce its three major tasks, i.e., knowledge identification, knowledge representation, and knowledge integration. Different from existing surveys that are focused on a specific type of knowledge, we provide a broad and complete taxonomy of domain knowledge and its representations. Based on our taxonomy, we provide a systematic review of existing techniques, different from existing works that survey integration approaches agnostic to the taxonomy of knowledge. This survey subsumes existing works and offers a bird’s-eye view of research in the general area of KADL. The thorough and critical reviews of numerous papers help not only understand current progress but also identify future directions for the research on KADL.

PaperID: 422,

Authors: Yifan Hu, Lei Deng, Yujie Wu, Man Yao, Guoqi Li

Affiliations: Department of Precision Instrument, Center for Brain Inspired Computing Research, Tsinghua University, Beijing, China; Institute of Theoretical Computer Science, Graz University of Technology, Graz, Austria; Peng Cheng Laboratory, Shenzhen, China

Title: Advancing Spiking Neural Networks Toward Deep Residual Learning

Abstract:
Despite the rapid progress of neuromorphic computing, inadequate capacity and insufficient representation power of spiking neural networks (SNNs) severely restrict their application scope in practice. Residual learning and shortcuts have been evidenced as an important approach for training deep neural networks, but previous work has rarely assessed their applicability to the specifics of SNNs. In this article, we first identify that this negligence leads to impeded information flow and the accompanying degradation problem in a spiking version of vanilla ResNet. To address this issue, we propose a novel SNN-oriented residual architecture termed MS-ResNet, which establishes membrane-based shortcut pathways, and further prove that the gradient norm equality can be achieved in MS-ResNet by introducing block dynamical isometry theory, which ensures the network can be well-behaved in a depth-insensitive way. Thus, we are able to significantly extend the depth of directly trained SNNs, e.g., up to 482 layers on CIFAR-10 and 104 layers on ImageNet, without observing any degradation problem. To validate the effectiveness of MS-ResNet, experiments on both frame-based and neuromorphic datasets are conducted. MS-ResNet104 achieves a superior result of 76.02% accuracy on ImageNet, which is the highest to the best of our knowledge in the domain of directly trained SNNs. Great energy efficiency is also observed, with an average of only one spike per neuron needed to classify an input sample. We believe our powerful and scalable models will provide strong support for further exploration of SNNs.

PaperID: 423,

Authors: Yixuan Zhou, Xing Xu, Jingkuan Song, Fumin Shen, Heng Tao Shen

Affiliations: Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China

Title: MSFlow: Multiscale Flow-Based Framework for Unsupervised Anomaly Detection

Abstract:
Unsupervised anomaly detection (UAD) attracts a lot of research interest and drives widespread applications, where only anomaly-free samples are available for training. Some UAD applications intend to locate the anomalous regions further even without any anomaly information. Although the absence of anomalous samples and annotations deteriorates the UAD performance, an inconspicuous, yet powerful statistical model, the normalizing flows, is appropriate for anomaly detection (AD) and localization in an unsupervised fashion. The flow-based probabilistic models, only trained on anomaly-free data, can efficiently distinguish unpredictable anomalies by assigning them much lower likelihoods than normal data. Nevertheless, the size variation of unpredictable anomalies introduces another inconvenience to the flow-based methods for high-precision AD and localization. To generalize the anomaly size variation, we propose a novel multiscale flow-based framework (MSFlow) composed of asymmetrical parallel flows followed by a fusion flow to exchange multiscale perceptions. Moreover, different multiscale aggregation strategies are adopted for image-wise AD and pixel-wise anomaly localization according to the discrepancy between them. The proposed MSFlow is evaluated on three AD datasets, significantly outperforming existing methods. Notably, on the challenging MVTec AD benchmark, our MSFlow achieves a new state-of-the-art (SOTA) with a detection AUROC score of up to 99.7%, localization AUROC score of 98.8% and PRO score of 97.1%.

PaperID: 424,

Authors: Yan Luo, Yongkang Wong, Mohan S. Kankanhalli, Qi Zhao

Affiliations: Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA; School of Computing, National University of Singapore, Computing Drive, Singapore

Title: Learning to Predict Gradients for Semi-Supervised Continual Learning

Abstract:
A key challenge for machine intelligence is to learn new visual concepts without forgetting the previously acquired knowledge. Continual learning (CL) is aimed toward addressing this challenge. However, there still exists a gap between CL and human learning. In particular, humans are able to continually learn from the samples associated with known or unknown labels in their daily lives, whereas existing CL and semi-supervised CL (SSCL) methods assume that the training samples are associated with known labels. Specifically, we are interested in two questions: 1) how to utilize unrelated unlabeled data for the SSCL task and 2) how unlabeled data affect learning and catastrophic forgetting in the CL task. To explore these issues, we formulate a new SSCL method, which can be generically applied to existing CL models. Furthermore, we propose a novel gradient learner to learn from labeled data to predict gradients on unlabeled data. In this way, the unlabeled data can fit into the supervised CL framework. We extensively evaluate the proposed method on mainstream CL methods, adversarial CL (ACL), and semi-supervised learning (SSL) tasks. The proposed method achieves state-of-the-art performance on classification accuracy and backward transfer (BWT) in the CL setting while achieving the desired performance on classification accuracy in the SSL setting. This implies that the unlabeled images can enhance the generalizability of CL models on the predictive ability of unseen data and significantly alleviate catastrophic forgetting. The code is available at https://github.com/luoyan407/grad_prediction.git.
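
For readers unfamiliar with the backward transfer (BWT) metric mentioned in this abstract, the following is a minimal sketch of the definition commonly used in the continual learning literature: the average change in accuracy on earlier tasks after the final task has been learned. The function name `backward_transfer` and the accuracy matrix are illustrative assumptions; the abstract does not state the exact evaluation protocol used in the paper.

```python
def backward_transfer(acc):
    """Commonly used BWT definition (an assumption, not taken from the paper).

    acc[i][j] is the test accuracy on task j measured right after
    training on task i (0-indexed). Negative values indicate forgetting.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("BWT needs at least two tasks")
    last = num_tasks - 1
    # Compare final accuracy on each earlier task with the accuracy
    # obtained immediately after that task was learned.
    deltas = [acc[last][j] - acc[j][j] for j in range(last)]
    return sum(deltas) / last


# Hypothetical accuracy matrix for three sequential tasks.
acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.75, 0.80, 0.88],
]
bwt = backward_transfer(acc)
print(f"BWT = {bwt:.2f}")
```

A BWT close to zero means little forgetting; a positive value means that learning later tasks improved performance on earlier ones.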

PaperID: 425,

Authors: Yu Luo, Tianying Ji, Fuchun Sun, Huaping Liu, Jianwei Zhang, Mingxuan Jing, Wenbing Huang

Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing, China; Department of Informatics, University of Hamburg, Hamburg, Germany; Science and Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China

Title: Goal-Conditioned Hierarchical Reinforcement Learning With High-Level Model Approximation

Abstract:
Hierarchical reinforcement learning (HRL) exhibits remarkable potential in addressing large-scale and long-horizon complex tasks. However, a fundamental challenge, which arises from the inherently entangled nature of hierarchical policies, has not been understood well, consequently compromising the training stability and exploration efficiency of HRL. In this article, we propose a novel HRL algorithm, high-level model approximation (HLMA), presenting both theoretical foundations and practical implementations. In HLMA, a Planner constructs an innovative high-level dynamic model to predict the k-step transition of the Controller in a subtask. This allows for the estimation of the evolving performance of the Controller. At low level, we leverage the initial state of each subtask, transforming absolute states into relative deviations by a designed operator as Controller input. This approach facilitates the reuse of subtask domain knowledge, enhancing data efficiency. With this designed structure, we establish the local convergence of each component within HLMA and subsequently derive regret bounds to ensure global convergence. Abundant experiments conducted on complex locomotion and navigation tasks demonstrate that HLMA surpasses other state-of-the-art single-level RL and HRL algorithms in terms of sample efficiency and asymptotic performance. In addition, thorough ablation studies validate the effectiveness of each component of HLMA.

PaperID: 426,

Authors: Jinfu Fan, Linqing Huang, Chaoyu Gong, Yang You, Min Gan, Zhongjie Wang

Affiliations: College of Computer Science and Technology, Qingdao University, Qingdao, China; Department of Computer Science, National University of Singapore, Cluny Road, Singapore; Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University, Shanghai, China

Title: KMT-PLL: K-Means Cross-Attention Transformer for Partial Label Learning

Abstract:
Partial label learning (PLL) studies the problem of learning instance classification from a set of candidate labels, of which only one is correct. While recent works have demonstrated that the Vision Transformer (ViT) has achieved good results when training from clean data, its applications to PLL remain limited and challenging. To address this issue, we rethink the relationship between instances and object queries to propose K-means cross-attention transformer for PLL (KMT-PLL), which can continuously learn cluster centers and be used for downstream disambiguation tasks. More specifically, K-means cross-attention as a clustering process can effectively learn the cluster centers to represent label classes. The purpose of this operation is to make the similarity between instances and labels measurable, which can effectively detect noise labels. Furthermore, we propose a new corrected cross entropy formulation, which can assign weights to candidate labels according to the instance-to-label relevance to guide the training of the instance classifier. As the training goes on, the ground-truth label is progressively identified, and the refined labels and cluster centers in turn help to improve the classifier. Simulation results demonstrate the advantage of the KMT-PLL and its suitability for PLL.
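
The idea of weighting candidate labels by instance-to-label relevance can be illustrated with a generic weighted cross entropy. The sketch below is not the paper's corrected cross entropy formulation, which the abstract does not specify; it assumes, purely for illustration, that relevance scores are turned into weights with a softmax restricted to the candidate set. All function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_weights(relevance, candidates):
    """Spread unit weight over candidate labels only (assumed scheme).

    relevance:  one instance-to-label relevance score per class.
    candidates: set of class indices in the candidate label set.
    """
    ordered = sorted(candidates)
    soft = softmax([relevance[k] for k in ordered])
    weights = [0.0] * len(relevance)
    for k, w in zip(ordered, soft):
        weights[k] = w
    return weights


def weighted_cross_entropy(logits, weights):
    """Cross entropy against a soft target given by the label weights."""
    probs = softmax(logits)
    return -sum(w * math.log(p) for w, p in zip(weights, probs) if w > 0.0)


# Hypothetical 4-class example with candidate labels {0, 2}.
relevance = [2.0, 5.0, 0.5, 1.0]   # class 1 scores high but is not a candidate
weights = candidate_weights(relevance, {0, 2})
loss = weighted_cross_entropy([1.2, 0.3, 0.1, -0.5], weights)
print(weights, loss)
```

Labels outside the candidate set always receive zero weight, so a high relevance score for a non-candidate class cannot influence the loss.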

PaperID: 427,

Authors: Giorgio Morales, John W. Sheppard

Affiliations: Gianforte School of Computing, Montana State University, Bozeman, MT, USA

Title: Dual Accuracy-Quality-Driven Neural Network for Prediction Interval Generation

Abstract:
Accurate uncertainty quantification is necessary to enhance the reliability of deep learning (DL) models in real-world applications. In the case of regression tasks, prediction intervals (PIs) should be provided along with the deterministic predictions of DL models. Such PIs are useful or “high-quality (HQ)” as long as they are sufficiently narrow and capture most of the probability density. In this article, we present a method to learn PIs for regression-based neural networks (NNs) automatically in addition to the conventional target predictions. In particular, we train two companion NNs: one that uses one output, the target estimate, and another that uses two outputs, the upper and lower bounds of the corresponding PI. Our main contribution is the design of a novel loss function for the PI-generation network that takes into account the output of the target-estimation network and has two optimization objectives: minimizing the mean PI width and ensuring the PI integrity using constraints that maximize the PI probability coverage implicitly. Furthermore, we introduce a self-adaptive coefficient that balances both objectives within the loss function, which alleviates the task of fine-tuning. Experiments using a synthetic dataset, eight benchmark datasets, and a real-world crop yield prediction dataset showed that our method was able to maintain a nominal probability coverage and produce significantly narrower PIs without detriment to its target estimation accuracy when compared to those PIs generated by three state-of-the-art neural-network-based methods. In other words, our method was shown to produce higher quality PIs.
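
The two quantities that the abstract trades off, probability coverage and mean PI width, can be computed from predicted bounds as in the sketch below. This shows only the standard evaluation quantities, not the paper's loss function or its self-adaptive coefficient; the function names and the sample data are illustrative assumptions.

```python
def pi_coverage(y_true, lower, upper):
    """Fraction of targets that fall inside their prediction interval."""
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return covered / len(y_true)


def mean_pi_width(lower, upper):
    """Average width of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Hypothetical targets and predicted interval bounds.
y_true = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 1.5, 3.2, 3.0]
upper = [1.5, 2.5, 4.0, 5.0]

coverage = pi_coverage(y_true, lower, upper)
width = mean_pi_width(lower, upper)
print(f"coverage = {coverage:.2f}, mean width = {width:.2f}")
```

Under these definitions, a higher-quality set of intervals reaches the nominal coverage (for example 0.95) with a smaller mean width.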

PaperID: 428,

Authors: Jing Xu, Xinglin Pan, Jingquan Wang, Wenjie Pei, Qing Liao, Zenglin Xu

Affiliations: School of Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China; The Hong Kong University of Science and Technology, Guangzhou, Guangdong, China

Title: CORE: CORrelation-Guided Feature Enhancement for Few-Shot Image Classification

Abstract:
Few-shot classification aims to adapt classifiers trained on base classes to novel classes with a few shots. However, the limited amount of training data is often inadequate to represent the intraclass variations in novel classes. This can result in biased estimation of the feature distribution, which in turn results in inaccurate decision boundaries, especially when the support data are outliers. To address this issue, we propose a feature enhancement method called CORrelation-guided feature Enhancement (CORE) that generates improved features for novel classes using weak supervision from the base classes. The proposed CORE method utilizes an autoencoder (AE) architecture but incorporates classification information into its latent space. This design allows the CORE to generate more discriminative features while discarding irrelevant content information. After being trained on base classes, CORE’s generative ability can be transferred to novel classes that are similar to those in the base classes. By using these generative features, we can reduce the estimation bias of the class distribution, which makes few-shot learning (FSL) less sensitive to the selection of support data. Our method is generic and flexible and can be used with any feature extractor and classifier. It can be easily integrated into existing FSL approaches. Experiments with different backbones and classifiers show that our proposed method consistently outperforms existing methods on various widely used benchmarks.

PaperID: 429,

Authors: Shawon Dey, Hao Xu

Affiliations: Department of Electrical and Biomedical Engineering, University of Nevada, Reno, Reno, NV, USA

Title: Distributed Adaptive Flocking Control for Large-Scale Multiagent Systems

Abstract:
This article presents a novel distributed flocking control method for large-scale multiagent systems (LS-MASs) operating in uncertain environments. When dealing with a massive number of flocking agents in uncertain environments, existing flocking methods encounter the problem of communication complexity and the “curse of dimensionality” caused by the exponential growth of agent interactions while solving PDE-based optimal flocking control for large-scale systems. The mean field game (MFG) method addresses this issue by transforming interactions among all agents into the interaction of each individual agent with average effects represented by a probability density function (pdf) of other agents. However, relying solely on a pdf term to consider other agents’ states can result in inefficient flocking performance due to the absence of a proficient coordination mechanism encompassing all agents involved in flocking. To overcome these difficulties and achieve the desired flocking performance for LS-MASs, the agents are decomposed into a finite number of subgroups. Each subgroup comprises a leader and followers, and a hybrid game theory is developed to manage both inter- and intragroup interactions. The method incorporates a cooperative game that links leaders from different groups to formulate distributed flocking control, a Stackelberg game that teams up leaders and followers within the same group to extend collective flocking behavior, and an MFG for followers to address the challenges of LS-MASs. Furthermore, to achieve distributed adaptive flocking using the hybrid game structure, we propose a hierarchical actor–critic-mass-based reinforcement learning technique. This approach incorporates a multiactor–critic method for leaders and an actor–critic-mass algorithm for followers, enabling adaptive flocking control in a distributed manner for large-scale agents. Finally, numerical simulation including comparison study and Lyapunov analysis demonstrates the effectiveness of the developed method.

PaperID: 430,

Authors: Ziyuan Yang, Wenjun Xia, Zexin Lu, Yingyu Chen, Xiaoxiao Li, Yi Zhang

Affiliations: College of Computer Science and the Key Laboratory of Data Protection and Intelligent Management, Ministry of Education, Sichuan University, Chengdu, China; School of Cyber Science and Engineering and the Key Laboratory of Data Protection and Intelligent Management, Ministry of Education, Sichuan University, Chengdu, China; College of Computer Science, Sichuan University, Chengdu, China; Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC, Canada

Title: Hypernetwork-Based Physics-Driven Personalized Federated Learning for CT Imaging

Abstract:
In clinical practice, computed tomography (CT) is an important noninvasive inspection technology to provide patients’ anatomical information. However, its potential radiation risk is an unavoidable problem that raises people’s concerns. Recently, deep learning (DL)-based methods have achieved promising results in CT reconstruction, but these methods usually require the centralized collection of large amounts of data for training from specific scanning protocols, which leads to serious domain shift and privacy concerns. To relieve these problems, in this article, we propose a hypernetwork-based physics-driven personalized federated learning method (HyperFed) for CT imaging. The basic assumption of the proposed HyperFed is that the optimization problem for each domain can be divided into two subproblems: local data adaption and global CT imaging problems, which are implemented by an institution-specific physics-driven hypernetwork and a global-sharing imaging network, respectively. Learning stable and effective invariant features from different data distributions is the main purpose of the global-sharing imaging network. Inspired by the physical process of CT imaging, we carefully design a physics-driven hypernetwork for each domain to obtain hyperparameters from the specific physical scanning protocol to condition the global-sharing imaging network, so that we can achieve personalized local CT reconstruction. Experiments show that HyperFed achieves competitive performance in comparison with several other state-of-the-art methods. We believe it is a promising direction to improve CT imaging quality and meet the personalized needs of different institutions or scanners without data sharing. Related codes have been released at https://github.com/Zi-YuanYang/HyperFed.

PaperID: 431,

Authors: Honglei Zhang, Xin Zhou, Zhiqi Shen, Yidong Li

Affiliations: Key Laboratory of Big Data and Artificial Intelligence in Transportation, Ministry of Education, and the School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China; School of Computer Science and Engineering, Nanyang Technological University, Jurong West, Singapore

Title: PrivFR: Privacy-Enhanced Federated Recommendation With Shared Hash Embedding

Abstract:
Federated recommender systems (FRSs), with their improved privacy-preserving advantages to jointly train recommendation models from numerous devices while keeping user data distributed, have been widely explored in modern recommender systems (RSs). However, conventional FRSs require transmitting the entire model between the server and clients, which brings a huge carbon footprint for cost-conscious cross-device learning tasks. While several efforts have been dedicated to improving the efficiency of FRSs, it’s suboptimal to treat the whole model as the objective of compact design. Besides, current research fails to handle the out-of-vocabulary (OOV) issue in real-world FRSs, where the items only occasionally appear in the testing phase but were not observed during the training process, which is another practical challenge and has not been well studied yet. To this end, we propose a privacy-enhanced federated recommendation framework with shared hash embedding, PrivFR, in cross-device settings, which is an efficient representation mechanism specialized for the embedding parameters without compromising the model capability. Specifically, it represents items in a resource-efficient way by delicately utilizing shared hash embedding and multiple hash functions. As such, it just maintains a small shared pool of hash embedding in local clients, rather than fitting all embedding vectors for each item, which can exactly achieve the dual advantages of conserving resources and handling the OOV issue. What’s more, we prove that this mechanism can protect the data privacy of local clients from a theoretical perspective. Extensive experiments show that our method not only effectively reduces storage and communication overheads, but also outperforms state-of-the-art FRSs.
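
The general mechanism of a shared hash embedding with multiple hash functions can be sketched as follows. This is a generic illustration rather than the PrivFR implementation: the class name, the use of seeded SHA-256 as the hash family, the toy initialisation, and the choice to sum the selected rows are all assumptions made for the example.

```python
import hashlib


class SharedHashEmbedding:
    """Toy shared embedding pool addressed by several hash functions."""

    def __init__(self, pool_size, dim, num_hashes=2):
        self.pool_size = pool_size
        self.num_hashes = num_hashes
        # Deterministic toy values; a real model would learn these rows.
        self.pool = [
            [((row * dim + col) % 7 - 3) / 10.0 for col in range(dim)]
            for row in range(pool_size)
        ]

    def slots(self, item_id):
        """Map an item ID to one pool row per hash function."""
        rows = []
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item_id}".encode("utf-8")).digest()
            rows.append(int.from_bytes(digest[:8], "big") % self.pool_size)
        return rows

    def lookup(self, item_id):
        """Combine the selected rows by summation (an assumed combiner)."""
        selected = [self.pool[r] for r in self.slots(item_id)]
        return [sum(values) for values in zip(*selected)]


# The pool has 16 rows regardless of how many distinct items exist.
embedding = SharedHashEmbedding(pool_size=16, dim=4, num_hashes=2)
seen = embedding.lookup("item_42")
unseen = embedding.lookup("item_never_seen_in_training")
print(seen, unseen)
```

Because every ID hashes to some rows of the fixed pool, an item that never appeared during training still receives an embedding, and memory use depends on the pool size rather than on the number of items.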

PaperID: 432,

Authors: Xiongtao Zhang, Ji Wang, Weidong Bao, Yaohong Zhang, Xiaomin Zhu, Hao Peng, Xiang Zhao

Affiliations: Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, China; Strategic Assessments and Consultation Institute, Academy of Military Science, Beijing, China; State Key Laboratory of Software Development Environment, School of Cyber Science and Technology, Beihang University, Beijing, China

Title: Improving Generalization and Personalization in Model-Heterogeneous Federated Learning

Abstract:
Conventional federated learning (FL) assumes the homogeneity of models, necessitating clients to expose their model parameters to enhance the performance of the server model. However, this assumption cannot reflect real-world scenarios. Sharing models and parameters raises security concerns for users, and solely focusing on the server-side model neglects clients’ personalization requirements, potentially impeding expected performance improvements of users. On the other hand, prioritizing personalization may compromise the generalization of the server model, thereby hindering extensive knowledge migration. To address these challenges, we put forth an important problem: How can FL ensure both generalization and personalization when clients’ models are heterogeneous? In this work, we introduce FedTED, which leverages a twin-branch structure and data-free knowledge distillation (DFKD) to address the challenges posed by model heterogeneity and diverse objectives in FL. The employed techniques in FedTED yield significant improvements in both personalization and generalization, while effectively coordinating the updating process of clients’ heterogeneous models and successfully reconstructing a satisfactory global model. Our empirical evaluation demonstrates that FedTED outperforms many representative algorithms, particularly in scenarios where clients’ models are heterogeneous, achieving a remarkable 19.37% enhancement in generalization performance and up to 9.76% improvement in personalization performance.

PaperID: 433,

Authors: Yunan Li, Tianyu Qi, Zhuoqi Ma, Dou Quan, Qiguang Miao

Affiliations: School of Computer Science and Technology, the Xi’an Key Laboratory of Big Data and Intelligent Vision, the Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, and the Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China; School of Computer Science and Technology, Xidian University, Xi’an, China; School of Artificial Intelligence, Xidian University, Xi’an, China

Title: Seeking a Hierarchical Prototype for Multimodal Gesture Recognition

Abstract:
Gesture recognition has drawn considerable attention from many researchers owing to its wide range of applications. Although significant progress has been made in this field, previous works always focus on how to distinguish between different gesture classes, ignoring the influence of inner-class divergence caused by gesture-irrelevant factors. Meanwhile, for multimodal gesture recognition, feature or score fusion in the final stage is a general choice to combine the information of different modalities. Consequently, the gesture-relevant features in different modalities may be redundant, whereas the complementarity of modalities is not exploited sufficiently. To handle these problems, we propose a hierarchical gesture prototype framework to highlight gesture-relevant features such as poses and motions in this article. This framework consists of a sample-level prototype and a modal-level prototype. The sample-level gesture prototype is established with the structure of a memory bank, which avoids the distraction of gesture-irrelevant factors in each sample, such as the illumination, background, and the performers’ appearances. Then the modal-level prototype is obtained via a generative adversarial network (GAN)-based subnetwork, in which the modal-invariant features are extracted and pulled together. Meanwhile, the modal-specific attribute features are used to synthesize the feature of other modalities, and the circulation of modality information helps to leverage their complementarity. Extensive experiments on three widely used gesture datasets demonstrate that our method is effective to highlight gesture-relevant features and can outperform the state-of-the-art methods.

PaperID: 434,

Authors: Wentao Fan, Chao Zhang, Huaxiong Li, Xiuyi Jia, Guoyin Wang

Affiliations: Department of Control Science and Intelligence Engineering, Nanjing University, Nanjing, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: Three-Stage Semisupervised Cross-Modal Hashing With Pairwise Relations Exploitation

Abstract:
Hashing methods have sparked a great revolution in cross-modal retrieval due to the low cost of storage and computation. Benefiting from the sufficient semantic information of labeled data, supervised hashing methods have shown better performance compared with unsupervised ones. Nevertheless, it is expensive and labor intensive to annotate the training samples, which restricts the feasibility of supervised methods in real applications. To deal with this limitation, a novel semisupervised hashing method, i.e., three-stage semisupervised hashing (TS3H) is proposed in this article, where both labeled and unlabeled data are seamlessly handled. Different from other semisupervised approaches that learn the pseudolabels, hash codes, and hash functions simultaneously, the new approach is decomposed into three stages as the name implies, in which all of the stages are conducted individually to make the optimization cost-effective and precise. Specifically, the classifiers of different modalities are learned via the provided supervised information to predict the labels of unlabeled data at first. Then, hash code learning is achieved with a simple but efficient scheme by unifying the provided and the newly predicted labels. To capture the discriminative information and preserve the semantic similarities, we leverage pairwise relations to supervise both classifier learning and hash code learning. Finally, the modality-specific hash functions are obtained by transforming the training samples to the generated hash codes. The new approach is compared with the state-of-the-art shallow and deep cross-modal hashing (DCMH) methods on several widely used benchmark databases, and the experiment results verify its efficiency and superiority.

PaperID: 435,

Authors: Liang Li, Tongyu Lu, Yaoqi Sun, Yuhan Gao, Chenggang Yan, Zhenghui Hu, Qingming Huang

Affiliations: Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China; School of Automation, Hangzhou Dianzi University, Hangzhou, China; Lishui Institute, Hangzhou Dianzi University, Hangzhou, Zhejiang, China; Hangzhou Innovation Institute, Beihang University, Beijing, Zhejiang, China; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China

Title: Progressive Decision Boundary Shifting for Unsupervised Domain Adaptation

Abstract:
Unsupervised domain adaptation (UDA) is attracting more attention from researchers for boosting the task-specific generalization on target domain. It focuses on addressing the domain shift between the labeled source domain and the unlabeled target domain. Recent biclassifier-based UDA models perform category-level alignment to reduce domain shift, and meanwhile, self-training is used for improving the discriminability of target instances. However, the error accumulation problem of instances with high semantic uncertainty may cause discriminability degradation and category-level misalignment. To solve this issue, we design the progressive decision boundary shifting algorithm, where stable category information of target instances is explored for learning a discriminability structure on target domain. Specifically, we first model the semantic uncertainty of instances by progressively shifting decision boundaries of category. Then, we introduce the uncertainty decoupling in a contrastive manner, where the discriminative information is learned from the source domain for instance with low semantic uncertainty. Furthermore, we minimize the predictive entropy of instances with high semantic uncertainty to reduce their prediction confidence. Extensive experiments on three popular datasets show that our model outperforms the current state-of-the-art (SOTA) UDA methods.

PaperID: 436,

Authors: Siyuan Li, Zicheng Liu, Zelin Zang, Di Wu, Zhiyuan Chen, Stan Z. Li

Affiliations: Zhejiang University, Hangzhou, China; AI Division, School of Engineering, Westlake University, Hangzhou, China

Title: GenURL: A General Framework for Unsupervised Representation Learning

Abstract:
Unsupervised representation learning (URL) that learns compact embeddings of high-dimensional data without supervision has achieved remarkable progress recently. However, URL methods for different requirements are developed independently, which limits the generalization of the algorithms and becomes especially prohibitive as the number of tasks grows. For example, dimension reduction (DR) methods, t-SNE and UMAP, optimize pairwise data relationships by preserving the global geometric structure, while self-supervised learning, SimCLR and BYOL, focuses on mining the local statistics of instances under specific augmentations. To address this dilemma, we summarize and propose a unified similarity-based URL framework, GenURL, which can adapt to various URL tasks smoothly. In this article, we regard URL tasks as different implicit constraints on the data geometric structure that help to seek optimal low-dimensional representations that boil down to data structural modeling (DSM) and low-dimensional transformation (LDT). Specifically, DSM provides a structure-based submodule to describe the global structures, and LDT learns compact low-dimensional embeddings with given pretext tasks. Moreover, an objective function, general Kullback–Leibler (GKL) divergence, is proposed to connect DSM and LDT naturally. Comprehensive experiments demonstrate that GenURL achieves consistent state-of-the-art performance in self-supervised visual learning, unsupervised knowledge distillation (KD), graph embeddings (GEs), and DR.

PaperID: 437,

Authors: Jiaqi Gao, Zhizhong Huang, Yiming Lei, Hongming Shan, James Z. Wang, Fei-Yue Wang, Junping Zhang

Affiliations: Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China; Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China; College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA, USA; State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Deep Rank-Consistent Pyramid Model for Enhanced Crowd Counting

Abstract:
Most conventional crowd counting methods utilize a fully-supervised learning framework to establish a mapping between scene images and crowd density maps. They usually rely on a large quantity of costly and time-intensive pixel-level annotations for training supervision. One way to mitigate the intensive labeling effort and improve counting accuracy is to leverage large amounts of unlabeled images. This is attributed to the inherent self-structural information and rank consistency within a single image, offering additional qualitative relation supervision during training. Contrary to earlier methods that utilized the rank relations at the original image level, we explore such rank-consistency relation within the latent feature spaces. This approach enables the incorporation of numerous pyramid partial orders, strengthening the model representation capability. A notable advantage is that it can also increase the utilization ratio of unlabeled samples. Specifically, we propose a Deep Rank-consistEnt pyrAmid Model (DREAM), which makes full use of rank consistency across coarse-to-fine pyramid features in latent spaces for enhanced crowd counting with massive unlabeled images. In addition, we have collected a new unlabeled crowd counting dataset, FUDAN-UCC, comprising 4000 images for training purposes. Extensive experiments on four benchmark datasets, namely UCF-QNRF, ShanghaiTech PartA and PartB, and UCF-CC-50, show the effectiveness of our method compared with previous semi-supervised methods. The codes are available at https://github.com/bridgeqiqi/DREAM.
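
Rank consistency rests on a simple observation: a region nested inside a larger region cannot contain more people than the larger one, so unlabeled images still constrain the predicted counts. The sketch below expresses this as a hinge penalty over scalar counts of nested regions. It is an illustration of the general idea only; DREAM applies rank consistency to pyramid features in latent spaces, and the function name and the hinge form are assumptions.

```python
def rank_consistency_loss(nested_counts, margin=0.0):
    """Hinge penalty for violating count ordering across nested regions.

    nested_counts is ordered from the largest region to the smallest
    region nested inside it. The loss is zero when every inner count
    is at most the count of the region that contains it.
    """
    loss = 0.0
    for outer, inner in zip(nested_counts, nested_counts[1:]):
        loss += max(0.0, inner - outer + margin)
    return loss


# Hypothetical predicted counts for a full image, a half crop, and a quarter crop.
consistent = rank_consistency_loss([10.0, 6.0, 2.0])
violating = rank_consistency_loss([10.0, 12.0, 2.0])
print(consistent, violating)
```

No ground-truth counts appear in the loss, which is why such a term can be evaluated on unlabeled images.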

PaperID: 438,

Authors: Yecheng Guo, Liang Bai, Xian Yang, Jiye Liang

Affiliations: Institute of Intelligent Information Processing, Shanxi University, Taiyuan, China; Alliance Manchester Business School, The University of Manchester, Manchester, U.K.

Title: Improving Image Contrastive Clustering Through Self-Learning Pairwise Constraints

Abstract:
In this article, a new unsupervised contrastive clustering (CC) model is introduced, namely, image CC with self-learning pairwise constraints (ICC-SPC). This model is designed to integrate pairwise constraints into the CC process, enhancing the latent representation learning and improving clustering results for image data. The incorporation of pairwise constraints helps reduce the impact of false negatives and false positives in contrastive learning, while maintaining robust cluster discrimination. However, obtaining prior pairwise constraints from unlabeled data directly is quite challenging in unsupervised scenarios. To address this issue, ICC-SPC designs a pairwise constraints learning module. This module autonomously learns pairwise constraints among data samples by leveraging consensus information between latent representation and pseudo-labels, which are generated by the clustering algorithm. Consequently, there is no requirement for labeled images, offering a practical resolution to the challenge posed by the lack of sufficient supervised information in unsupervised clustering tasks. ICC-SPC’s effectiveness is validated through evaluations on multiple benchmark datasets. This contribution is significant, as we present a novel framework for unsupervised clustering by integrating contrastive learning with self-learning pairwise constraints.

PaperID: 439,

Authors: Guangsheng Yu, Xu Wang, Caijun Sun, Qin Wang, Ping Yu, Wei Ni, Ren Ping Liu

Affiliations: Data, CSIRO, Sydney, NSW, Australia; Global Big Data Technologies Centre, University of Technology Sydney, Ultimo, NSW, Australia; Zhejiang Lab, Hangzhou, China; State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China

Title: IronForge: An Open, Secure, Fair, Decentralized Federated Learning

Abstract:
Federated learning (FL) offers an effective learning architecture to protect data privacy in a distributed manner. However, the inevitable network asynchrony, overdependence on a central coordinator, and lack of an open and fair incentive mechanism collectively hinder FL’s further development. We propose IronForge, a new generation of FL framework, that features a directed acyclic graph (DAG)-based structure, where nodes represent uploaded models, and referencing relationships between models form the DAG that guides the aggregation process. This design eliminates the need for central coordinators to achieve fully decentralized operations. IronForge runs in a public and open network and launches a fair incentive mechanism by enabling state consistency in the DAG. Hence, the system fits in networks where training resources are unevenly distributed. In addition, dedicated defense strategies against prevalent FL attacks on incentive fairness and data privacy are presented to ensure the security of IronForge. Experimental results based on a newly developed test bed FLSim highlight the superiority of IronForge to the existing prevalent FL frameworks under various specifications in performance, fairness, and security. To the best of our knowledge, IronForge is the first secure and fully decentralized FL (DFL) framework that can be applied in open networks with realistic network and training settings.

PaperID: 440,

Authors: Zhiming Liu, Jinhai Li, Xiao Zhang, Xi-Zhao Wang

Affiliations: Data Science Research Center and the Faculty of Science, Kunming University of Science and Technology, Kunming, China; Department of Applied Mathematics, Xi’an University of Technology, Xi’an, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China

Title: Incremental Incomplete Concept-Cognitive Learning Model: A Stochastic Strategy

Abstract:
Concept-cognitive learning is an emerging area of cognitive computing, which refers to continuously learning new knowledge by imitating the human cognition process. However, the existing research on concept-cognitive learning is still at the level of complete cognition as well as cognitive operators, which is far from the real cognition process. Meanwhile, the current classification algorithms based on concept-cognitive learning models (CCLMs) are not mature enough yet since their cognitive results highly depend on the cognition order of attributes. To address the above problems, this article presents a novel concept-cognitive learning method, namely, stochastic incremental incomplete concept-cognitive learning method (SI2CCLM), whose cognition process adopts a stochastic strategy that is independent of the order of attributes. Moreover, a new classification algorithm based on SI2CCLM is developed, and the analysis of the parameters and convergence of the algorithm is made. Finally, we show the cognitive effectiveness of SI2CCLM by comparing it with other concept-cognitive learning methods. In addition, the average accuracy of our model on 24 datasets is 82.02%, which is higher than the compared 20 classification algorithms, and the elapsed time of our model also has advantages.

PaperID: 441,

Authors: Jie Chen, Shengxiang Yang, Conor Fahy, Zhu Wang, Yinan Guo, Yingke Chen

Affiliations: College of Computer Science, Sichuan University, Chengdu, China; School of Computer Science and Informatics, De Montfort University, Leicester, U.K.; Law School, Sichuan University, Chengdu, China; School of Mechanical Electronic and Information Engineering, China University of Mining and Technology (Beijing), Beijing, China; Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, U.K.

Title: Online Sparse Representation Clustering for Evolving Data Streams

Abstract:
Data stream clustering can be performed to discover the patterns underlying continuously arriving sequences of data. A number of data stream clustering algorithms for finding clusters in arbitrary shapes and handling outliers, such as density-based clustering algorithms, have been proposed. However, these algorithms are often limited in their ability to construct and merge microclusters by measuring the Euclidean distances between high-dimensional data objects, e.g., transferring valuable knowledge from historical landmark windows to the current landmark window, and exploiting evolving subspace structures adaptively. We propose an online sparse representation clustering (OSRC) method to learn an affinity matrix for evaluating the relationships among high-dimensional data objects in evolving data streams. We first introduce a low-dimensional projection (LDP) into sparse representation to adaptively reduce the potential negative influence associated with the noise and redundancy contained in high-dimensional data. Then, we take advantage of the l_{2,1}-norm optimization technique to choose the appropriate number of representative data objects and form a specific dictionary for sparse representation. The specific dictionary is integrated into sparse representation to adaptively exploit the evolving subspace structures of the high-dimensional data objects. Moreover, the data object representatives from the current landmark window can transfer valuable knowledge to the next landmark window. The experimental results based on a synthetic dataset and six benchmark datasets validate the effectiveness of the proposed method compared to that of state-of-the-art methods for data stream clustering.
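
For readers unfamiliar with the norm used above to pick representatives: under the common row-wise convention, the l_{2,1}-norm of a matrix is the sum of the Euclidean norms of its rows. Penalizing it tends to drive entire rows to zero, so the rows that stay nonzero can be read as the selected data objects. The sketch below only computes the norm and lists the surviving rows. The helper names and the tolerance are hypothetical, and some papers use the column-wise convention instead.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows (matrix is a list of rows)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


def nonzero_rows(matrix, tol=1e-8):
    """Indices of rows whose Euclidean norm exceeds tol (assumed tolerance)."""
    return [i for i, row in enumerate(matrix)
            if math.sqrt(sum(x * x for x in row)) > tol]


coefficients = [[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]]
print(l21_norm(coefficients))      # 5 + 0 + 13
print(nonzero_rows(coefficients))  # rows 0 and 2 act as representatives
```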

PaperID: 442,

Authors: Huimin Zhang, Ronghu Chi, Biao Huang

Affiliations: School of Automation and Electronic Engineering, Qingdao University of Science and Technology, Qingdao, China; Department of Chemical and Materials Engineering, University of Alberta, Edmonton, AB, Canada

Title: Data-Driven Internal Model Learning Control for Nonlinear Systems

Abstract:
A novel data-driven internal model learning control (DIMLC) strategy is developed for a nonlinear nonaffine system subject to unknown nonrepetitive uncertainties. At first, an iterative dynamic linearization (IDL) approach is employed for reformulating the nonlinear plant to an iterative linear data model (iLDM). Then, the nominal form of the IDL-based iLDM is used as an internal model of the nonlinear plant whose parameters are estimated by an iterative adaptive updating mechanism using only input–output (I/O) data. The equivalent feedback-principle-based internal model inversion is further applied to the subsequent controller design and analysis. The proposed DIMLC contains two parts. One is a nominal controller designed by the inversion of the internal model which achieves a perfect tracking of the target output; the other is a compensatory controller which offsets the uncertainties. The novel DIMLC is data-driven and does not require an explicit model. It can deal with model-plant mismatch and disturbances, enhancing the robustness against uncertainties. The theoretical results are verified by simulation study.

PaperID: 443,

Authors: Siying Zhu, Jiawei Zheng, Qianli Ma

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: MR-Transformer: Multiresolution Transformer for Multivariate Time Series Prediction

Abstract:
Multivariate time series (MTS) prediction has been studied broadly and is widely applied in real-world applications. Recently, transformer-based methods have shown potential in this task owing to their strong sequence modeling ability. Despite this progress, these methods pay little attention to extracting short-term information in the context, while short-term patterns play an essential role in reflecting local temporal dynamics. Moreover, we argue that there are both consistent and specific characteristics among multiple variables, which should be fully considered for MTS modeling. To this end, we propose a multiresolution transformer (MR-Transformer) for MTS prediction, modeling MTS from both the temporal and the variable resolution. Specifically, for the temporal resolution, we design a long short-term transformer. We first split the sequence into nonoverlapping segments in an adaptive way and then extract short-term patterns within segments, while long-term patterns are captured by the inherent attention mechanism. The two are aggregated to capture the temporal dependencies. For the variable resolution, besides the variable-consistent features learned by the long short-term transformer, we also design a temporal convolution module to capture the specific features of each variable individually. MR-Transformer enhances the MTS modeling ability by combining multiresolution features between both time steps and variables. Extensive experiments conducted on real-world time series datasets show that MR-Transformer significantly outperforms the state-of-the-art MTS prediction models. The visualization analysis also demonstrates the effectiveness of the proposed model.

PaperID: 444,

Authors: Chengyuan Mai, Yaomin Chang, Chuan Chen, Zibin Zheng

Affiliations: School of Computer Science and Engineering and the National Engineering Research Center of Digital Life, Sun Yat-sen University, Guangzhou, China; School of Software Engineering, Sun Yat-sen University, Zhuhai, China

Title: Enhanced Scalable Graph Neural Network via Knowledge Distillation

Abstract:
Graph neural networks (GNNs) have achieved state-of-the-art performance in various graph representation learning scenarios. However, when applied to graph data in the real world, GNNs have encountered scalability issues. Existing GNNs often have high computational load in both training and inference stages, making them incapable of meeting the performance needs of large-scale scenarios with a large number of nodes. Although several scalable GNNs have been developed, they either improve scalability only to a limited extent or do so at the expense of effectiveness. Inspired by knowledge distillation's (KD's) achievement in preserving performance while balancing scalability in computer vision and natural language processing, we propose an enhanced scalable GNN via KD (KD-SGNN) to improve the scalability and effectiveness of GNNs. On the one hand, KD-SGNN adopts the idea of decoupled GNNs, which decouples feature transformation and feature propagation in GNNs and leverages preprocessing techniques to improve the scalability of GNNs. On the other hand, KD-SGNN proposes two KD mechanisms (i.e., soft-target (ST) distillation and shallow imitation (SI) distillation) to improve the expressiveness. The scalability and effectiveness of KD-SGNN are evaluated on multiple real datasets. Besides, the effectiveness of the proposed KD mechanisms is also verified through comprehensive analyses.
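
As background for the soft-target (ST) distillation mentioned above, the sketch below shows the widely used temperature-scaled formulation: both teacher and student logits are softened with a temperature, and the student is penalized by the KL divergence from the teacher distribution, scaled by the squared temperature. This is the generic textbook form and an assumption here; the abstract does not state which exact loss KD-SGNN uses, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, times T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher = [1.0, 2.0, 3.0]
print(soft_target_loss([1.0, 2.0, 3.0], teacher))  # student matches teacher
print(soft_target_loss([3.0, 2.0, 1.0], teacher))  # student disagrees
```

A higher temperature flattens both distributions, so the student also learns from the relative ordering of the classes the teacher considers unlikely.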

PaperID: 445,

Authors: Morgan B. Talbot, Rushikesh Zawar, Rohil Badkundri, Mengmi Zhang, Gabriel Kreiman

Affiliations: Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, USA; Boston Children’s Hospital (BCH), Boston, MA, USA

Title: Tuned Compositional Feature Replays for Efficient Stream Learning

Abstract:
Our brains extract durable, generalizable knowledge from transient experiences of the world. Artificial neural networks come nowhere close to this ability. When tasked with learning to classify objects by training on nonrepeating video frames in temporal order (online stream learning), models that learn well from shuffled datasets catastrophically forget old knowledge upon learning new stimuli. We propose a new continual learning algorithm, compositional replay using memory blocks (CRUMB), which mitigates forgetting by replaying feature maps reconstructed by combining generic parts. CRUMB concatenates trainable and reusable “memory block” vectors to compositionally reconstruct feature map tensors in convolutional neural networks (CNNs). Storing the indices of memory blocks used to reconstruct new stimuli enables memories of the stimuli to be replayed during later tasks. This reconstruction mechanism also primes the neural network to minimize catastrophic forgetting by biasing it toward attending to information about object shapes more than information about image textures and stabilizes the network during stream learning by providing a shared feature-level basis for all training examples. These properties allow CRUMB to outperform an otherwise identical algorithm that stores and replays raw images while occupying only 3.6% as much memory. We stress-tested CRUMB alongside 13 competing methods on seven challenging datasets. To address the limited number of existing online stream learning datasets, we introduce two new benchmarks by adapting existing datasets for stream learning. With only 3.7%–4.1% as much memory and 15%–43% as much runtime, CRUMB mitigates catastrophic forgetting more effectively than the state-of-the-art. Our code is available at https://github.com/MorganBDT/crumb.git
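
The storage saving described above comes from keeping only block indices instead of raw images or full feature maps. The following toy sketch assumes a nearest-neighbor assignment by squared Euclidean distance and a flat feature vector whose length is a multiple of the block length. Both are simplifying assumptions, the function names are hypothetical, and the actual CRUMB implementation operates on CNN feature map tensors with trainable blocks.

```python
def encode(feature, blocks):
    """Replace each chunk of the feature vector by the index of its nearest block."""
    block_len = len(blocks[0])
    indices = []
    for start in range(0, len(feature), block_len):
        chunk = feature[start:start + block_len]
        dists = [sum((c - b) ** 2 for c, b in zip(chunk, block))
                 for block in blocks]
        indices.append(dists.index(min(dists)))
    return indices


def decode(indices, blocks):
    """Rebuild an approximate feature vector by concatenating the stored blocks."""
    rebuilt = []
    for i in indices:
        rebuilt.extend(blocks[i])
    return rebuilt


memory_blocks = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
feature = [0.9, 1.1, 0.1, 0.8, 0.0, 0.1]
stored = encode(feature, memory_blocks)   # three small integers
print(stored)
print(decode(stored, memory_blocks))      # replayable approximation
```

Because every stored example is rebuilt from the same small set of blocks, all replayed features share one basis, which matches the stabilizing effect the abstract attributes to the reconstruction mechanism.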

PaperID: 446,

Authors: Nisha Huang, Yuxin Zhang, Fan Tang, Chongyang Ma, Haibin Huang, Weiming Dong, Changsheng Xu

Affiliations: State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Kuaishou Technology, Beijing, China

Title: DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization

Abstract:
Despite the impressive results of arbitrary image-guided style transfer methods, text-driven image stylization has recently been proposed for transferring a natural image into a stylized one according to textual descriptions of the target style provided by the user. Unlike the previous image-to-image transfer approaches, the text-guided stylization process provides users with a more precise and intuitive way to express the desired style. However, the huge discrepancy between cross-modal inputs/outputs makes it challenging to conduct text-driven image stylization in a typical feed-forward CNN pipeline. In this article, we present DiffStyler, a dual diffusion processing architecture to control the balance between the content and style of the diffused results. The cross-modal style information can be easily integrated as guidance during the diffusion process step-by-step. Furthermore, we propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image. We validate the proposed DiffStyler against the baseline methods through extensive qualitative and quantitative experiments. The code is available at https://github.com/haha-lisa/Diffstyler.

PaperID: 447,

Authors: Longjie Zhang, Yong Chen

Affiliations: School of Automation Engineering and the Institute of Electric Vehicle Driving System and Safety Technology, University of Electronic Science and Technology of China, Chengdu, China

Title: Finite-Time Adaptive Dynamic Programming for Affine-Form Nonlinear Systems

Abstract:
Inspired by the fusion of state optimization and finite-time convergence, the finite-time optimal control (FTOC) for the affine-form nonlinear systems is investigated in this article. To achieve optimal stability with finite response time, a novel finite-time adaptive dynamic programming (FTADP) is presented for the affine-form nonlinear systems. By mapping the value function into finite-time stability space with the transformation function, the Bellman equation with finite-time stability space is first obtained. Then, by solving the Hamilton–Jacobi–Bellman (HJB) equation, the new FTOC strategy is presented with the theoretical finite-time stability description. Furthermore, to solve the above optimal controller with nonlinearity characteristic, the novel adaptive dynamic programming (ADP) based on the finite-time critic-actor offline neural network (NN) approximation algorithm is implemented, and the corresponding finite-time convergence characteristic is illustrated theoretically. Eventually, the application analysis on the circuit systems shows that the proposed FTADP has superiority compared with general optimal control.

PaperID: 448,

Authors: Jing Wang, Jianhui Lv, Xin Geng

Affiliations: School of Computer Science and Engineering, Southeast University, Nanjing, China; Pengcheng Laboratory, Shenzhen, China

Title: Label Distribution Learning by Partitioning Label Distribution Manifold

Abstract:
Researchers have suggested leveraging label correlation to deal with the exponentially sized output space of label distribution learning (LDL). Among them, some have proposed to exploit local label correlation. They first partition the training set into different groups and then exploit local label correlation on each one. However, these works usually apply clustering algorithms, such as K-means, to split the training set and obtain the clustering results independent of label correlation. The structures (e.g., low rank and manifold) learned on such clusters may not efficiently capture label correlation. To solve this problem, we put forward a novel LDL method called LDL by partitioning label distribution manifold (LDL-PLDM). First, it jointly bipartitions the training set and learns the label distribution manifold to model label correlation. Second, it recurses until the reconstruction error of learning the label distribution manifold cannot be reduced. LDL-PLDM achieves label-correlation-related partition results, on which the learned label distribution manifold can better capture label correlation. We conduct extensive experiments to justify that LDL-PLDM statistically outperforms state-of-the-art LDL methods.

PaperID: 449,

Authors: Hao Wang, Jiahu Qin, Zhen Kan

Affiliations: Department of Automation, University of Science and Technology of China, Hefei, Anhui, China

Title: Shielded Planning Guided Data-Efficient and Safe Reinforcement Learning

Abstract:
Safe reinforcement learning (RL) has shown great potential for building safe general-purpose robotic systems. While many existing works have focused on post-training policy safety, it remains an open problem to ensure safety during training as well as to improve exploration efficiency. Motivated to address these challenges, this work develops shielded planning guided policy optimization (SPPO), a new model-based safe RL method that augments policy optimization algorithms with path planning and shielding mechanism. In particular, SPPO is equipped with shielded planning for guided exploration and efficient data collection via model predictive path integral (MPPI), along with an advantage-based shielding rule to keep the above processes safe. Based on the collected safe data, a task-oriented parameter optimization (TOPO) method is used for policy improvement, as well as the observation-independent latent dynamics enhancement. In addition, SPPO provides explicit theoretical guarantees, i.e., clear theoretical bounds for training safety, deployment safety, and the learned policy performance. Experiments demonstrate that SPPO outperforms baselines in terms of policy performance, learning efficiency, and safety performance during training.
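
For context on the planner named above, the core of a model predictive path integral (MPPI) update is a softmax-style weighting of sampled control sequences by their rollout cost, followed by a weighted average. The sketch below shows only that generic step. The shielding rule, the learned dynamics, and the cost function of SPPO are not reproduced, and the function names and `temperature` parameter are illustrative assumptions.

```python
import math


def mppi_weights(costs, temperature=1.0):
    """Weights proportional to exp(-cost / temperature), normalized to sum to 1."""
    best = min(costs)  # subtract the minimum for numerical stability
    unnorm = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(unnorm)
    return [w / total for w in unnorm]


def mppi_update(candidates, costs, temperature=1.0):
    """Cost-weighted average of candidate control sequences (lists of floats)."""
    weights = mppi_weights(costs, temperature)
    horizon = len(candidates[0])
    return [sum(w * seq[t] for w, seq in zip(weights, candidates))
            for t in range(horizon)]


# Two sampled control sequences with equal cost are averaged evenly.
print(mppi_update([[0.0, 2.0], [2.0, 0.0]], [1.0, 1.0]))
# With a low temperature the cheapest sequence dominates.
print(mppi_update([[0.0], [1.0]], [0.0, 100.0], temperature=0.1))
```

In a shielded variant, one plausible design (an assumption, not a description of SPPO) is to discard or re-weight candidates flagged as unsafe before this averaging step.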

PaperID: 450,

Authors: Zhi-Lin Zhao, Longbing Cao, Chang-Dong Wang

Affiliations: Data Science Laboratory, School of Computing and DataX Research Centre, Macquarie University, Sydney, NSW, Australia; School of Computer Science and Engineering, the Guangdong Province Key Laboratory of Computational Science, and the Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Sun Yat-sen University, Guangzhou, China

Title: Gray Learning From Non-IID Data With Out-of-Distribution Samples

Abstract:
The integrity of training data, even when annotated by experts, is far from guaranteed, especially for non-independent and identically distributed (non-IID) datasets comprising both in- and out-of-distribution samples. In an ideal scenario, the majority of samples would be in-distribution, while samples that deviate semantically would be identified as out-of-distribution and excluded during the annotation process. However, experts may erroneously classify these out-of-distribution samples as in-distribution, assigning them labels that are inherently unreliable. This mixture of unreliable labels and varied data types makes the task of learning robust neural networks notably challenging. We observe that both in- and out-of-distribution samples can almost invariably be ruled out from belonging to certain classes, aside from those corresponding to unreliable ground-truth labels. This opens the possibility of utilizing reliable complementary labels that indicate the classes to which a sample does not belong. Guided by this insight, we introduce a novel approach, termed gray learning (GL), which leverages both ground-truth and complementary labels. Crucially, GL adaptively adjusts the loss weights for these two label types based on prediction confidence levels. By grounding our approach in statistical learning theory, we derive bounds for the generalization error, demonstrating that GL achieves tight constraints even in non-IID settings. Extensive experimental evaluations reveal that our method significantly outperforms alternative approaches grounded in robust statistics.
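
The combination of ground-truth and complementary labels described above can be sketched as a weighted sum of two terms: a standard cross-entropy on the given label, and a complementary-label term that pushes down the probability of classes the sample is known not to belong to. The weighting rule used below, the model's own confidence in the given label, is purely an assumption for illustration. The abstract states only that the weights adapt to prediction confidence, and the function name is hypothetical.

```python
import math


def confidence_weighted_loss(probs, given_label, complementary_labels, eps=1e-12):
    """Blend a ground-truth term and a complementary-label term.

    probs: predicted class probabilities for one sample.
    given_label: possibly unreliable annotated class index.
    complementary_labels: class indices the sample is assumed not to belong to.
    """
    confidence = probs[given_label]  # assumed weighting rule
    positive = -math.log(probs[given_label] + eps)
    negative = -sum(math.log(1.0 - probs[k] + eps)
                    for k in complementary_labels) / len(complementary_labels)
    # Trust the given label more when the model itself is confident in it.
    return confidence * positive + (1.0 - confidence) * negative


print(confidence_weighted_loss([0.7, 0.2, 0.1], given_label=0,
                               complementary_labels=[2]))
```

When the model assigns low probability to the annotated class, as may happen for an out-of-distribution sample with an unreliable label, the loss leans on the complementary term instead.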

PaperID: 451,

Authors: Lu Jin, Zechao Li, Yonghua Pan, Jinhui Tang

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Title: Relational Consistency Induced Self-Supervised Hashing for Image Retrieval

Abstract:
This article proposes a new hashing framework named relational consistency induced self-supervised hashing (RCSH) for large-scale image retrieval. To capture the potential semantic structure of data, RCSH explores the relational consistency between data samples in different spaces, which learns reliable data relationships in the latent feature space and then preserves the learned relationships in the Hamming space. The data relationships are uncovered by learning a set of prototypes that group similar data samples in the latent feature space. By uncovering the semantic structure of the data, meaningful data-to-prototype and data-to-data relationships are jointly constructed. The data-to-prototype relationships are captured by constraining the prototype assignments generated from different augmented views of an image to be the same. Meanwhile, these data-to-prototype relationships are preserved to learn informative compact hash codes by matching them with these reliable prototypes. To accomplish this, a novel dual prototype contrastive loss is proposed to maximize the agreement of prototype assignments in the latent feature space and Hamming space. The data-to-data relationships are captured by enforcing the distribution of pairwise similarities in the latent feature space and Hamming space to be consistent, which makes the learned hash codes preserve meaningful similarity relationships. Extensive experimental results on four widely used image retrieval datasets demonstrate that the proposed method significantly outperforms the state-of-the-art methods. Besides, the proposed method achieves promising performance in out-of-domain retrieval tasks, which shows its good generalization ability. The source code and models are available at https://github.com/IMAG-LuJin/RCSH.
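
Preserving relationships in the Hamming space, as described above, relies on a standard identity: for two codes with entries in {-1, +1} and length L, the Hamming distance equals (L - inner product) / 2, so similarity in Hamming space can be expressed through inner products. The sketch below demonstrates the identity with a sign-based binarization. The function names are hypothetical and nothing here is taken from the RCSH code.

```python
def binarize(features):
    """Map real-valued features to a {-1, +1} code by sign (0 maps to +1)."""
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(code_a, code_b):
    """Hamming distance of two {-1, +1} codes via their inner product."""
    inner = sum(a * b for a, b in zip(code_a, code_b))
    return (len(code_a) - inner) // 2


query = binarize([0.3, -1.2, 0.0, -0.1])
item = binarize([0.8, 0.4, -0.6, -2.0])
print(query, item)
print(hamming_distance(query, item))  # number of positions that differ
```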

PaperID: 452,

Authors: Yifan Chen, Yang Zhao, Xuelong Li

Affiliations: School of Artificial Intelligence, OPtics and ElectroNics (iOPEN) and the Key Laboratory of Intelligent Interaction and Applications, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi’an, China

Title: Adaptive Gait Feature Learning Using Mixed Gait Sequence

Abstract:
Gait recognition has become a mainstream technology for identification, as it can recognize the identity of subjects from a distance without any cooperation. However, when subjects wear coats (CL) or backpacks (BG), their gait silhouette is occluded, which loses some gait information and makes identification much more difficult. Another important challenge in gait recognition is that the gait silhouette of the same subject varies greatly across camera angles, which can cause the same subject to be misidentified as different individuals under different camera angles. In this article, we try to overcome these problems from three aspects: data augmentation, feature extraction, and feature refinement. Correspondingly, we propose gait sequence mixing (GSM), multigranularity feature extraction (MFE), and feature distance alignment (FDA). GSM is a data augmentation method that uses the gait sequences under normal walking (NM) to assist in learning the gait sequences in BG or CL, thus reducing the influence of lost gait information in abnormal gait sequences (BG or CL). MFE explores and fuses different granularity features of gait sequences from different scales, and it can learn as much useful information as possible from incomplete gait silhouettes. FDA refines the extracted gait features with the help of the distribution of gait features in the real world and makes them more discriminative, thus reducing the influence of various camera angles. Extensive experiments demonstrate that our method has better results than some state-of-the-art methods on CASIA-B and mini-OUMVLP. We also embed the GSM module and FDA module into some state-of-the-art methods, and the recognition accuracy of these methods is greatly improved.

PaperID: 453,

Authors: Birgit Hillebrecht, Benjamin Unger

Affiliations: Stuttgart Center for Simulation Science, University of Stuttgart, Stuttgart, Germany

Title: Rigorous a Posteriori Error Bounds for PDE-Defined PINNs

Abstract:
Prediction error quantification in machine learning has been left out of most methodological investigations of neural networks (NNs), for both purely data-driven and physics-informed approaches. Beyond statistical investigations and generic results on the approximation capabilities of NNs, we present a rigorous upper bound on the prediction error of physics-informed NNs (PINNs). This bound can be calculated without the knowledge of the true solution and only with a priori available information about the characteristics of the underlying dynamical system governed by a partial differential equation (PDE). We apply this a posteriori error bound exemplarily to four problems: the transport equation, the heat equation, the Navier–Stokes equation (NSE), and the Klein–Gordon equation.

PaperID: 454,

Authors: Yunzhi Ling, Feiping Nie, Weizhong Yu, Xuelong Li

Affiliations: School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, Shaanxi, China; School of Artificial Intelligence, Optics and Electronics (iOPEN), and the Key Laboratory of Intelligent Interaction and Applications (Ministry of Industry and Information Technology), Northwestern Polytechnical University, Xi’an, Shaanxi, China

Title: Discriminative and Robust Autoencoders for Unsupervised Feature Selection

Abstract:
Many recent research works on unsupervised feature selection (UFS) have focused on how to exploit autoencoders (AEs) to seek informative features. However, existing methods typically employ the squared error to estimate the data reconstruction, which amplifies the negative effect of outliers and can lead to performance degradation. Moreover, traditional AEs aim to extract latent features that capture intrinsic information of the data for accurate data recovery. Without incorporating explicit cluster structure-detecting objectives into the training criterion, AEs fail to capture the latent cluster structure of the data which is essential for identifying discriminative features. Thus, the selected features lack strong discriminative power. To address these issues, we propose to jointly perform robust feature selection and k-means clustering in a unified framework. Concretely, we exploit an AE with an l_{2,1}-norm as a basic model to seek informative features. To improve robustness against outliers, we introduce an adaptive weight vector for the data reconstruction terms of AE, which assigns smaller weights to the data with larger errors to automatically reduce the influence of the outliers, and larger weights to the data with smaller errors to strengthen the influence of clean data. To enhance the discriminative power of the selected features, we incorporate k-means clustering into the representation learning of the AE. This allows the AE to continually explore cluster structure information, which can be used to discover more discriminative features. Then, we also present an efficient approach to solve the objective of the corresponding problem. Extensive experiments on various benchmark datasets are provided, which clearly demonstrate that the proposed method outperforms state-of-the-art methods.

PaperID: 455,

Authors: Wei Liu, Rongxin Cui, Yinglin Li, Shouxu Zhang

Affiliations: School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an, China

Title: Hybrid-Input Convolutional Neural Network-Based Underwater Image Quality Assessment

Abstract:
Since precisely sensing the underwater environment is a challenging prerequisite for safe and reliable underwater operation, interest in underwater image processing is growing at a rapid pace. In engineering applications, a remotely operated vehicle (ROV) must handle many redundant underwater images in real time, which puts the equipment or operators under great pressure. To relieve this pressure by transmitting images selectively according to the degradation degree, we propose an end-to-end hybrid-input convolutional neural network (HI-CNN) to predict the degradation of underwater images. First, we propose a feature extraction module to extract the features of original underwater images and saliency maps concurrently, which is composed of two branches with the same structure and shared parameters. Second, we design an end-to-end model to predict the quality scores of original images, which consists of a feature extraction module and a prediction module. Finally, we establish a real-world dataset so that the proposed model can be reproduced in practical underwater environments. Through several experiments, we demonstrate that the proposed model outperforms existing models in predicting underwater image quality.

PaperID: 456,

Authors: Shuhan Li, Xiaomeng Li, Xiaowei Xu, Kwang-Ting Cheng

Affiliations: Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, SAR, China; Department of Cardiovascular Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, China

Title: Dynamic Subcluster-Aware Network for Few-Shot Skin Disease Classification

Abstract:
This article addresses the problem of few-shot skin disease classification by introducing a novel approach called the subcluster-aware network (SCAN) that enhances accuracy in diagnosing rare skin diseases. The key insight motivating the design of SCAN is the observation that skin disease images within a class often exhibit multiple subclusters, characterized by distinct variations in appearance. To improve the performance of few-shot learning (FSL), we focus on learning a high-quality feature encoder that captures the unique subclustered representations within each disease class, enabling better characterization of feature distributions. Specifically, SCAN follows a dual-branch framework, where the first branch learns classwise features to distinguish different skin diseases, and the second branch aims to learn features, which can effectively partition each class into several groups so as to preserve the subclustered structure within each class. To achieve the objective of the second branch, we present a cluster loss to learn image similarities via unsupervised clustering. To ensure that the samples in each subcluster are from the same class, we further design a purity loss to refine the unsupervised clustering results. We evaluate the proposed approach on two public datasets for few-shot skin disease classification. The experimental results validate that our framework outperforms the state-of-the-art methods by around 2%–5% in terms of sensitivity, specificity, accuracy, and F1-score on the SD-198 and Derm7pt datasets.

PaperID: 457,

Authors: Pengzhen Ren, Kaidong Zhang, Hetao Zheng, Zixuan Li, Yuhang Wen, Fengda Zhu, Shikui Ma, Xiaodan Liang

Affiliations: PengCheng Laboratory, Shenzhen, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an, China; School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, China; ByteDance Ltd., Beijing, China; Hunan Artificial Intelligence and Robotics Institute Company Ltd., Changsha, China

Title: Surfer: A World Model-Based Framework for Vision-Language Robot Manipulation

Abstract:
Making a model accurately understand and follow natural language instructions and perform actions consistent with world knowledge is a key challenge in robot manipulation. This mainly includes reasoning about fuzzy human instructions and following physical knowledge. Therefore, the embodied intelligence agent must have the ability to model world knowledge from training data. However, most existing vision and language robot manipulation methods mainly operate in less realistic simulators and language settings and lack explicit modeling of world knowledge. To bridge this gap, we introduce a novel and simple robot manipulation framework, called Surfer. It is based on the world model, treats robot manipulation as a state transfer of the visual scene, and decouples it into two parts: action and scene. Then, the generalization ability of the model on new instructions and new scenes is enhanced by explicit modeling of the action and scene prediction in multimodal information. In addition, we built a robot manipulation simulation platform that supports physics execution based on the MuJoCo physics engine. It can automatically generate demonstration training data and test data, effectively reducing labor costs. To conduct a comprehensive and systematic evaluation of the visual-language understanding and physical execution of the manipulation model, we also created a robotic manipulation benchmark with different difficulty levels, called SeaWave. It contains four visual-language manipulation tasks of different difficulty levels and can provide a standardized testing platform for embedded AI agents in multimodal environments. Overall, we hope Surfer can freely surf in the robot’s SeaWave benchmark. Extensive experiments show that Surfer consistently outperforms all baselines in all manipulation tasks. On average, Surfer achieved a success rate of 54.74% on the defined four levels of manipulation tasks, exceeding the best baseline performance of 51.07%. The simulator, code, and benchmarks are released at https://pzhren.github.io/Surfer.

PaperID: 458,

Authors: Jinghao Wang, Zhang Li, Cong Sun, Yulan Guo, Zi Wang, Qifeng Yu

Affiliations: College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, China; Xi’an Satellite Control Center, State Key Laboratory of Astronautic Dynamics, Xi’an, China; College of Electronic Science and Technology, National University of Defense Technology, Changsha, China

Title: Satellite Pose Set Estimation by Uncertainty-Guided Conformal Keypoint Detection

Abstract:
Satellite pose estimation constitutes a critical technology in aerospace tasks. The tradeoff between accuracy and efficiency becomes paramount for successful mission execution, due to the limited computational resources of on-board systems. Existing methods predominantly provide single-point estimations, which fall short of fulfilling the uncertainty quantification requirements demanded by safety-critical space operations. To address these problems, we first propose uncertainty-guided conformal keypoint detection to predict a keypoint inductive conformal prediction (IndCP) set and then design an uncertainty propagation strategy to obtain a pose uncertainty set. Specifically, we build our method upon a transformer-based keypoint predictor, which directly outputs uncertainty-guided keypoints. We first propose a nonconformal function to generate a keypoint IndCP set that covers the ground-truth keypoint with a certain probability. We then apply Monte Carlo sampling within the keypoint IndCP set and estimate the poses by solving the perspective-n-point (PnP) problem. The top-n poses with the smallest conformal reprojection error are used to construct a convex hull, which is defined as the pose uncertainty set. Furthermore, we take the mean of the top-n poses as the average pose. Experiments on the Spacecraft PosE Estimation challenge Dataset (SPEED) and LineMOD Occlusion (LMO) dataset show that not only does the average pose demonstrate higher accuracy, but the pose uncertainty sets also cover the true pose with a certain probability.
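The coverage guarantee mentioned in the abstract above rests on the standard inductive (split) conformal calibration step. The following is a minimal sketch of that generic step only, not the authors' nonconformal function or pose propagation: it assumes the nonconformity score is the Euclidean pixel error between predicted and true keypoints, and the function names and calibration values are hypothetical.

```python
import math

def conformal_threshold(calibration_scores, alpha):
    """Split-conformal threshold for miscoverage level alpha.

    Uses the finite-sample-corrected rank ceil((n + 1) * (1 - alpha)).
    """
    n = len(calibration_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this alpha.
        return math.inf
    return sorted(calibration_scores)[k - 1]

def in_prediction_set(candidate, predicted, threshold):
    """True if a 2-D candidate keypoint lies within the conformal radius
    around the predicted keypoint (assumed score: Euclidean distance)."""
    return math.dist(candidate, predicted) <= threshold

# Hypothetical calibration errors in pixels, for illustration only.
scores = [0.8, 1.1, 1.5, 2.0, 2.4, 3.0, 3.3, 4.1, 5.0]
radius = conformal_threshold(scores, alpha=0.5)
print(radius, in_prediction_set((10.0, 10.0), (11.0, 11.0), radius))
```

Under exchangeability of calibration and test samples, a disc of this radius around each predicted keypoint contains the true keypoint with probability at least 1 - alpha; sampling inside such discs is one plausible reading of the Monte Carlo step described above.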

PaperID: 459,

Authors: Yuntao Shou, Xiangyong Cao, Deyu Meng

Affiliations: School of Computer Science and Technology, Ministry of Education Key Laboratory for Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an, Shaanxi, China; School of Mathematics and Statistics and the Ministry of Education Key Laboratory of Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an, Shaanxi, China

Title: SpeGCL: Self-Supervised Graph Spectrum Contrastive Learning Without Positive Samples

Abstract:
Graph contrastive learning (GCL) has emerged as a powerful method for dealing with noise and fluctuations in graph-structured data, and can be applied to social networks and knowledge graphs. Although various graph augmentation strategies have emerged in the field of GCL, traditional graph convolutional networks (GCNs) mainly tend to preserve smooth features and have difficulty capturing fine-grained changes between different views. To address the above issue, we first construct a Fourier graph neural network (FourierGNN) from the perspective of graph spectrum learning, which captures different frequency components by stacking multiple Fourier graph operations (FGO) layers in Fourier space. Then, we find that the difference between the high-frequency information of two augmented graphs should be larger than the difference between the low-frequency information. Next, we theoretically prove that focusing only on pushing negative pairs farther away can more effectively achieve performance advantages. By leveraging these discoveries, we propose a novel self-supervised graph spectrum contrastive learning framework, i.e., SpeGCL, and design an effective contrastive strategy to optimize this goal. We also provide a theoretical justification for the efficacy of using only negative samples in SpeGCL. Extensive experiments have been conducted on unsupervised, transfer, and semi-supervised learning tasks to show that SpeGCL outperforms existing state-of-the-art (SOTA) GCL methods.

PaperID: 460,

Authors: Yexin Liu, Weiming Zhang, Athanasios V. Vasilakos, Lin Wang

Affiliations: Division of Emerging Interdisciplinary Areas (EMIA), Hong Kong University of Science and Technology, Hong Kong, China; Artificial Intelligence Thrust, Hong Kong University of Science and Technology, Guangzhou, China; Center for AI Research (CAIR), University of Agder (UiA), Grimstad, Norway; School of Electrical and Electronic Engineering, Nanyang Technological University, Jurong West, Singapore

Title: Unsupervised Visible-Infrared ReID via Pseudo-Label Correction and Modality-Level Alignment

Abstract:
Unsupervised visible–infrared person reidentification (UVI-ReID) has recently gained great attention due to its potential for enhancing human detection in diverse environments without labeling. Previous methods utilize intramodality clustering and cross-modality feature matching to achieve UVI-ReID. However, there exist two challenges: 1) noisy pseudo-labels might be generated in the clustering process and 2) the cross-modality feature alignment via matching the marginal distribution of visible and infrared modalities may misalign the different identities from the two modalities. In this article, we first conduct a theoretical analysis where an interpretable generalization upper bound is introduced. Based on the analysis, we then propose a novel unsupervised cross-modality person reidentification framework (PRAISE). Specifically, to address the first challenge, we propose a pseudo-label correction (PLC) strategy that utilizes a beta mixture model (BMM) to predict the probability of misclustering based on the network’s memory effect and rectifies the correspondence by adding a perceptual term to contrastive learning. Next, we introduce a modality-level alignment (MLA) strategy that generates paired visible–infrared latent features and reduces the modality gap by aligning the labeling function of visible and infrared features to learn identity-discriminative and modality-invariant features. Experimental results on two benchmark datasets demonstrate that our method achieves state-of-the-art (SOTA) performance compared with the unsupervised visible-ReID methods.

PaperID: 461,

Authors: Zhijie Zhong, Zhiwen Yu, Xing Xi, Yue Xu, Wenming Cao, Yiyuan Yang, Kaixiang Yang, Jane You

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China; Department of Information and Computing Science, Chongqing Jiaotong University, Chongqing, China; Alibaba Group, Oxford, U.K.; Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, China

Title: SimAD: A Simple Dissimilarity-Based Approach for Time-Series Anomaly Detection

Abstract:
Despite the prevalence of reconstruction-based deep learning methods, time-series anomaly detection (TSAD) remains a tremendous challenge. Existing approaches often struggle with limited temporal contexts, insufficient representation of normal patterns, and flawed evaluation metrics, all of which hinder their effectiveness in detecting anomalous behavior. To address these issues, we introduce a simple dissimilarity-based approach for time-series anomaly detection (SimAD). Specifically, SimAD first incorporates a patching-based feature extractor capable of processing extended temporal windows and employs the EmbedPatch encoder to fully integrate normal behavioral patterns. Second, we design an innovative ContrastFusion module in SimAD, which strengthens the robustness of anomaly detection by highlighting the distributional differences between normal and abnormal data. Third, we introduce two robust enhanced evaluation metrics, unbiased affiliation (UAff) and normalized affiliation (NAff), designed to overcome the limitations of existing metrics by providing better distinctiveness and semantic clarity. The reliability of these two metrics has been demonstrated by both theoretical and experimental analyses. Experiments conducted on seven diverse time-series datasets clearly demonstrate SimAD’s superior performance compared with state-of-the-art (SOTA) methods, achieving relative improvements of 19.85% on F1, 4.44% on Aff-F1, 77.79% on NAff-F1, and 9.69% on AUC on six multivariate datasets. Code and pretrained models are available at https://github.com/EmorZz1G/SimAD

PaperID: 462,

Authors: Ye Liu, Hongshan Pu, Junjun Pan, Michael K. Ng, Hongmin Cai

Affiliations: School of Future Technology, South China University of Technology, Guangzhou, China; Department of Mathematics, Hong Kong Baptist University, Hong Kong, China

Title: Anchor-Based Multiview Subspace Clustering With Anchor-wise and Class-wise Alignments

Abstract:
Multiview subspace clustering has shown promising performance in multimedia and data mining applications. However, its use on large-scale datasets is limited due to its quadratic or even cubic computational complexity. The anchor graph strategy, which selects a few important samples (anchors) to represent the whole data for different views, has been introduced to address this challenge. These methods rely on a heuristic assumption that the correspondence and class structures between the sets of anchors across different views are the same. This assumption ignores the difference in the ordering of anchors with respect to their associated classes and the number of anchors belonging to the same class from different views. As a result, this can lead to unsatisfactory clustering results due to incorrect anchorwise and classwise alignments. To tackle this issue, this article proposes an anchor-based multiview subspace clustering with anchorwise and classwise alignments (AMCA2) method. Specifically, the proposed method simultaneously aligns and fuses multiple anchor graphs anchorwise and classwise via learning permutation matrices and utilizing the Hadamard product. To further enhance the clustering performance of AMCA2, we propose a novel anchor selection method called kernel anchor selection (KAS) to select more representative anchors. Extensive experiments on ten benchmark datasets are conducted to show the superiority and effectiveness of AMCA2 over the state-of-the-art methods.

PaperID: 463,

Authors: Yuqi Xiao, Muideen Adegoke, Chi-Sing Leung, Kwok Wa Leung

Affiliations: Department of Electrical Engineering, State Key Laboratory of Terahertz and Millimeter Waves, City University of Hong Kong, Hong Kong, SAR, China; Department of Electrical Engineering, City University of Hong Kong, Hong Kong, SAR, China

Title: Robust Fault-Aware Extreme Learning Machine Based on Maximum Correntropy

Abstract:
Extreme learning machine (ELM) is an effective and efficient neural model for universal approximation. However, its practical performance can degrade due to weight noise, node faults, and outliers. This brief introduces a robust ELM algorithm designed to address these issues and enhance network robustness. We first analyze the square error of the classic ELM, considering both weight noise and node faults. By integrating an outlier-resistant method, the maximum correntropy criterion (MCC), we derive a new objective function to bolster network resilience. This leads to the development of the robust fault-aware ELM (RFAELM) algorithm. The convergence property of RFAELM is rigorously proven. For validation, the proposed algorithm is evaluated in various noise and fault levels using eight different benchmark datasets. The simulation results, encompassing all imperfect conditions and datasets, verify the robustness and generalization of this new algorithm. Also, the new algorithm is compared with other robust ELM algorithms using different statistical measurements. The superior performance of RFAELM substantiates its significant improvement over existing algorithms.
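The outlier resistance attributed to the maximum correntropy criterion (MCC) in the abstract above can be illustrated with the usual Gaussian-kernel form of empirical correntropy. This is a minimal sketch of that generic criterion only, not the RFAELM objective; the kernel width `sigma` and the residual values are assumptions chosen for illustration.

```python
import math

def correntropy(errors, sigma):
    """Empirical correntropy: mean Gaussian kernel of the residuals.

    Each term lies in (0, 1], so a single huge residual can lower the
    average by at most 1/len(errors).
    """
    return sum(math.exp(-e * e / (2 * sigma * sigma)) for e in errors) / len(errors)

def mean_squared_error(errors):
    """Classic squared-error criterion, shown for contrast."""
    return sum(e * e for e in errors) / len(errors)

# Hypothetical residuals: four small errors, then the same set plus one outlier.
clean = [0.1, -0.2, 0.05, 0.0]
with_outlier = clean + [50.0]

for name, errs in (("clean", clean), ("with outlier", with_outlier)):
    print(name, mean_squared_error(errs), correntropy(errs, sigma=1.0))
```

The squared error is dominated by the single outlier, while the correntropy changes by a bounded amount, which is why maximizing it downweights outlying samples.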

PaperID: 464,

Authors: Zhixiu Lu, Hailong Li, Nehal A. Parikh, Jonathan R. Dillman, Lili He

Affiliations: Department of Radiology, Imaging Research Center, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA; Neurodevelopmental Disorders Prevention Center, Perinatal Institute, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA; Department of Radiology, University of Cincinnati College of Medicine, Cincinnati, OH, USA

Title: RadCLIP: Enhancing Radiologic Image Analysis Through Contrastive Language-Image Pretraining

Abstract:
The integration of artificial intelligence (AI) with radiology signifies a transformative era in medicine. Vision foundation models have been adopted to enhance radiologic imaging analysis. However, the inherent complexities of 2D and 3D radiologic data present unique challenges that existing models, which are typically pretrained on general nonmedical images, do not adequately address. To bridge this gap and harness the diagnostic precision required in radiologic imaging, we introduce radiologic contrastive language–image pretraining (RadCLIP): a cross-modal vision-language foundational model that utilizes a vision-language pretraining (VLP) framework to improve radiologic image analysis. Building on the contrastive language–image pretraining (CLIP) approach, RadCLIP incorporates a slice pooling mechanism designed for volumetric image analysis and is pretrained using a large, diverse dataset of radiologic image-text pairs. This pretraining effectively aligns radiologic images with their corresponding text annotations, resulting in a robust vision backbone for radiologic imaging. Extensive experiments demonstrate RadCLIP’s superior performance in both unimodal radiologic image classification and cross-modal image-text matching, underscoring its significant promise for enhancing diagnostic accuracy and efficiency in clinical settings. Our key contributions include curating a large dataset featuring diverse radiologic 2D/3D image-text pairs, pretraining RadCLIP as a vision-language foundation model on this dataset, developing a slice pooling adapter with an attention mechanism for integrating 2D images, and conducting comprehensive evaluations of RadCLIP on various radiologic downstream tasks.

PaperID: 465,

Authors: Tianyang Zhong, Yi Pan, Yutong Zhang, Yaonai Wei, Li Yang, Zhengliang Liu, Xiaozheng Wei, Wenjun Li, Junjie Yao, Chong Ma, Xi Jiang, Dinggang Shen, Junwei Han, Tuo Zhang

Affiliations: School of Automation, Northwestern Polytechnical University, Xi’an, China; Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; Institute of Medical Research, Northwestern Polytechnical University, Xi’an, China; School of Computing, University of Georgia, Athens, GA, USA; School of Biomedical Engineering, ShanghaiTech University, Shanghai, China

Title: ChatABL: Abductive Learning via Natural Language Interaction With ChatGPT

Abstract:
Large language models (LLMs) such as ChatGPT have recently demonstrated significant potential in mathematical abilities, providing a valuable reasoning paradigm consistent with human natural language. However, LLMs currently have difficulty in bridging perception, language understanding, and reasoning (PLR) capabilities due to incompatibility of the underlying information flow among them, so that their reasoning ability is not fully elicited and complicated reasoning tasks are hard to accomplish autonomously. To resolve the above problem, a novel method called ChatABL is proposed by integrating LLMs into an abductive learning (ABL) framework, capable of unifying the three abilities effectively in a more user-friendly and understandable manner. Initially, the proposed method uses LLMs to correct the incomplete logical facts for optimizing the perception module, by summarizing and reorganizing domain knowledge represented in natural language format. Then, the perception module also provides necessary logical reasoning materials for feeding LLMs. Finally, these parts are integrated into a dynamic closed-loop system by introducing the feedback form and automatic learning strategies to mutually promote their performance. As a testbed, the variable-length handwritten equation decipherment (HED), an abstract expression of the Mayan calendar decoding, is used to demonstrate that ChatABL has reasoning ability beyond most existing state-of-the-art methods, which has been well-supported by comparative studies. To the best of the authors’ knowledge, the proposed ChatABL is the first attempt to explore a possible and novel avenue for approaching human-level cognitive ability via natural language interaction by means of ChatGPT.

PaperID: 466,

Authors: Yifan Zhang, Yang Yu, Hao Li, Anqi Wu, Xin Chen, Jinfang Liu, Ling-Li Zeng, Dewen Hu

Affiliations: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China; Department of Neurosurgery, Xiangya Hospital, National Clinical Medical Research Center for Geriatric Diseases, Central South University, Changsha, China

Title: DMAE-EEG: A Pretraining Framework for EEG Spatiotemporal Representation Learning

Abstract:
Electroencephalography (EEG) plays a crucial role in neuroscience research and clinical practice, but it remains limited by nonuniform data, noise, and difficulty in labeling. To address these challenges, we develop a pretraining framework named DMAE-EEG, a denoising masked autoencoder for mining generalizable spatiotemporal representation from massive unlabeled EEG. First, we propose a novel brain region topological heterogeneity (BRTH) division method to partition the nonuniform data into fixed patches based on neuroscientific priors. Second, we design a denoised pseudo-label generator (DPLG), which utilizes a denoising reconstruction pretext task to enable the learning of generalizable representations from massive unlabeled EEG, suppressing the influence of noise and artifacts. Furthermore, we utilize an asymmetric autoencoder with self-attention as the backbone in the proposed DMAE-EEG, which captures long-range spatiotemporal dependencies and interactions from unlabeled EEG data across 14 public datasets. The proposed DMAE-EEG is validated on both generative (signal quality enhancement) and discriminative tasks (motion intention recognition). In the quality enhancement, DMAE-EEG outperforms existing statistical methods with normalized mean squared error (nMSE) reduction of 27.78%–50.00% under corruption levels of 25%, 50%, and 75%, respectively. In motion intention recognition, DMAE-EEG achieves a relative improvement of 2.71%–6.14% in intrasession classification balanced accuracy across 2–6 class motor execution and imagery tasks, outperforming state-of-the-art methods. Overall, the results suggest that the pretraining framework DMAE-EEG can capture generalizable spatiotemporal representations from massive unlabeled EEG and enhance the knowledge transferability across sessions, subjects, and tasks in various downstream scenarios, advancing EEG-aided diagnosis and brain–computer communication and control, and other clinical practice.

PaperID: 467,

Authors: Shenfu Zhang, Qiang Liu, Zhenhua Zhang, Rui Zhao, Liang Chen, Feng Shao, Xiangchao Meng

Affiliations: Faculty of Information Science and Engineering, Ningbo University, Ningbo, China; School of Computer and Artificial Intelligence, Huanghuai University, Zhumadian, China

Title: Spatial-Spectral Heterogeneity-Aware Network for Hyperspectral and LiDAR Joint Classification

Abstract:
The integration of hyperspectral (HS) imagery and light detection and ranging (LiDAR) data for land cover classification has emerged as a prominent research focus. Despite the satisfactory classification accuracies achieved by existing methodologies, several unaddressed issues remain and warrant consideration. First, current approaches overlook the pronounced spectral and spatial heterogeneities in remote sensing (RS) images designated for multiclassification tasks, limiting the performance of classification models. Moreover, most existing studies amalgamate elevation features with other characteristics through simple addition and interaction operations, and they do not delve deeply into exploiting elevation height information, leading to an imbalance in the representation of elevation height. In light of the aforementioned issues, this article introduces a spatial–spectral heterogeneity-aware network (S2HANet) for the joint classification of HS and LiDAR data. Specifically, a shared spectral correction module (SSCM) is designed in the spectral branch to preliminarily alleviate the problem of large intraclass variance, followed by the use of a contrastive learning framework to enhance the intraclass compactness and interclass separability of spectral features. A multichannel signed distance discrimination module (MCSDDM) is developed to learn the distance relationships between intra- and interclass pixels and boundaries, and to use prior boundary information to improve spatial boundary information. In addition, an elevation boost module (EBM) and an elevation injection module (EIM) are meticulously designed to phase in elevation height information, further enhancing the utilization of elevation data and better facilitating the fusion of the two modalities. The proposed S2HANet has demonstrated exceptional classification performance across three open benchmark datasets.

PaperID: 468,

Authors: Xinqiao Zhao, Mingjie Sun, Eng Gee Lim, Yao Zhao, Jimin Xiao

Affiliations: School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, China; School of Computer Science and Technology, Soochow University, Suzhou, China; Institute of Information Science, Beijing Jiaotong University, Beijing, China

Title: SFBM: Shared Feature Bias Mitigating for Long-Tailed Image Recognition

Abstract:
Long-tailed distributions exist in real-world scenarios and compromise the performance of recognition models. In this article, we point out that a neural network classifier has a shared feature bias, which tends to regard the shared features among different classes as head-class discriminative features, leading to misclassifications on tail-class samples under long-tailed scenarios. To solve this issue, we propose a shared feature bias mitigating (SFBM) framework. Specifically, we create two parallel classifiers trained concurrently with the baseline classifier, using our special training loss. The parallel classifier weight sums are then used for estimating the shared feature components in baseline classifier weights. Finally, we rectify the baseline classifier by removing the estimated shared feature components from it while supplementing the parallel classifier weights class by class to the rectified classifier weights, mitigating shared feature bias. Our proposed SFBM demonstrates broad compatibility with nearly all recognition methods while maintaining high computational efficiency, as it introduces no additional computation during inference. Extensive experiments on CIFAR10/100-LT, ImageNet-LT, and iNaturalist 2018 demonstrate that simply incorporating SFBM during the training phase consistently boosts the performance of various state-of-the-art methods by significant margins. The complete source code will be made publicly available at https://github.com/bzbz-bot/SFBM

PaperID: 469,

Authors: Jing Liang, Genyue Liu, Ying Bi, Mingyuan Yu, Mengnan Liu, Yaochu Jin

Affiliations: School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China; State Key Laboratory of Intelligent Agricultural Power Equipment, Luoyang, China; State Key Laboratory of Power System of Tractor, Luoyang, China; School of Engineering, Westlake University, Hangzhou, China

Title: Evolutionary Neural Architecture Search for Remote Sensing Image Classification

Abstract:
Remote sensing scene classification is a vital task in remote sensing image analysis with significant application potential. In recent years, convolutional neural network (CNN)-based methods have shown remarkable promise in classifying remote sensing scene images. However, these methods often require extensive trial and error and rely heavily on expert knowledge. To address these challenges, this article proposes a novel neural architecture search (NAS) approach that automatically designs CNNs for remote sensing scene classification. Specifically, an evolutionary algorithm (EA) is employed to search for well-structured basic modules, which are then combined to construct a new architecture. To further enhance the search process, a new population generation strategy is introduced to promote diversity and mitigate premature convergence. Additionally, a random forest-based selection mechanism is utilized to identify high-quality individuals based on estimated fitness values, effectively reducing computational complexity. The proposed approach is evaluated on three benchmark remote sensing scene datasets and compared with several widely used CNNs. The experimental results demonstrate that the proposed approach can discover CNN architectures that not only surpass state-of-the-art performance but also achieve this with fewer parameters and lower search cost.

PaperID: 470,

Authors: Haiteng Wang, Lei Ren, Yikang Li, Yuqing Wang

Affiliations: School of Automation Science and Electrical Engineering, Beihang University, Beijing, China

Title: MetaIndux-TS: Frequency-Aware AIGC Foundation Model for Industrial Time Series

Abstract:
Implementing advanced AI techniques in industrial manufacturing requires large volumes of annotated sensor data. Unfortunately, collecting such data is often impractical due to extreme environments and the manual burden of expert annotation. Recent advancements in artificial intelligence generated content (AIGC) have inspired the exploration of industrial time-series generation to mitigate data shortages. However, existing AIGC models encounter difficulties in generating industrial time series due to their complex temporal dynamics, multichannel intercolumn correlations, and diverse frequency characteristics. To address these challenges, we propose MetaIndux-TS, a frequency-informed AIGC foundation model based on diffusion model frameworks. This model is designed to generate industrial time-series data under a variety of working conditions, across different types of equipment, and with variable lengths. Specifically, MetaIndux-TS integrates dual-frequency cross-attention networks, transforming time series into the frequency domain to model multivariate dependencies and capture intricate temporal details. In addition, the contrastive synthesis layer is constructed to generate high-fidelity time series by comparing periodic and long-term trends with initial noisy sequences. Comprehensive experiments show that MetaIndux-TS outperforms state-of-the-art models (SSSD, Dit, and TabDDPM), achieving a 57.5% improvement in fidelity and 20.4% in predictive score. MetaIndux-TS exhibits zero-shot generation capabilities for samples under unseen conditions, offering the potential to address data collection challenges in extreme environments. Codes are available at: https://github.com/Dolphin-wang/MetaIndux

PaperID: 471,

Authors: Dimitrios Katsikas, Nikolaos Passalis, Anastasios Tefas

Affiliations: Department of Informatics, Faculty of Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece; Department of Chemical Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece

Title: Inducing Neural Collapse via Anticlasses and One-Cold Cross-Entropy Loss

Abstract:
While softmax cross-entropy (CE) loss is the standard objective for supervised classification, it primarily focuses on the ground-truth classes, ignoring the relationships between the nontarget, complementary classes. This leaves valuable information unexploited during optimization. In this work, we propose a novel loss function, one-cold CE (OCCE) loss, which addresses this limitation by structuring the activations of these complementary classes. Specifically, for each class, we define an anticlass, which consists of everything that is not part of the target class—this includes all complementary classes as well as out-of-distribution (OOD) samples, noise, or in general any instance that does not belong to the true class. By setting a uniform one-cold encoded distribution over the complementary classes as a target for each anticlass, we encourage the model to equally distribute activations across all nontarget classes. This approach promotes a symmetric geometric structure of classes in the final feature space, increases the degree of neural collapse (NC) during training, addresses the independence deficit problem of neural networks, and improves generalization. Our extensive evaluation shows that incorporating OCCE loss in the optimization objective consistently enhances performance across multiple settings, including classification, open-set recognition, and OOD detection.
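The "uniform one-cold encoded distribution" described in the abstract above can be made concrete with a small sketch. The abstract does not say how anticlass activations are produced or how the term is weighted against standard CE, so the separate vector of anticlass logits below, along with all function names, is a hypothetical stand-in rather than the authors' formulation.

```python
import math

def one_cold_target(num_classes, true_class):
    """Zero at the true class, uniform 1/(C-1) over the complementary classes."""
    if num_classes < 2:
        raise ValueError("one-cold encoding needs at least two classes")
    p = 1.0 / (num_classes - 1)
    return [0.0 if c == true_class else p for c in range(num_classes)]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs, eps=1e-12):
    """CE between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs) if t > 0)

# Assumption: the model emits a separate vector of anticlass logits.
anticlass_logits = [0.3, -4.0, 0.1, 0.2]  # hypothetical values, true class is 1
target = one_cold_target(num_classes=4, true_class=1)
loss = cross_entropy(target, softmax(anticlass_logits))
print(target, loss)
```

With this target, the loss is smallest when the probability mass is spread evenly over the nontarget classes and none falls on the true class, which matches the abstract's stated aim of equally distributing activations across nontarget classes.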

PaperID: 472,

Authors: Sheng Liu, Shaobo Zhang, Fei Gao, Yuan Feng

Affiliations: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang, China; College of Science, Zhejiang University of Technology, Hangzhou, Zhejiang, China

Title: Highly Condensed All-MLP Architecture for Long-Term Human Motion Prediction

Abstract:
In artificial intelligence (AI) scenarios where computational resources are constrained, such as in autonomous driving systems, it is challenging to construct a lightweight model that can accurately predict human motion over an extended duration. To tackle this challenge, we introduce a highly condensed all-multilayer perceptron (HCMLP) architecture that is engineered for supreme lightweight efficiency. This design facilitates extended-range motion predictions while maintaining uncompromised performance. First, the spatiotemporal dynamic perception (STDP) block enhances operational efficiency while maintaining a simple structure. In STDP, the distinct but parallel spatial multilayer perceptron (SMLP) and temporal multilayer perceptron (TMLP) simultaneously capture the spatial correlations between pose joints and the temporal dynamics of each joint. The subsequent dynamic aggregation (DA), coupled with the channel multilayer perceptron (CMLP), dynamically consolidates and refines spatial and temporal features, leading to improved predictive accuracy. Second, the multiterm union prediction (MTUP) block directly delivers precise predictions for periods ranging from 0 to 4000 ms, eliminating the need for repetitive short-term (ST) prediction iterations. Our experimental results on the Human3.6M, AMASS, 3DPW, and CMU-Mocap datasets demonstrate that HCMLP outperforms existing state-of-the-art (SOTA) methods in ST prediction, long-term (LT) prediction, and especially in extended and extra extended LT (ELT) predictions, all while utilizing the fewest parameters.

PaperID: 473,

Authors: Jing Zhao, Qimin Huang, Shanhu Wang, Shiliang Sun

Affiliations: School of Computer Science and Technology, East China Normal University, Shanghai, China; Department of Automation, Shanghai Jiao Tong University, Shanghai, China

Title: VB-Adapter: Variational Bayesian Adapter for Cross-Domain Speech Representation Learning

Abstract:
By leveraging the abundant speech data available for pretraining, current models excel in generalization across diverse tasks. Nevertheless, real-world challenges emerge when addressing unfamiliar speech scenarios far from the pretrained speech, owing to the domain shift between pretraining (source) and fine-tuning (target) data. To overcome this barrier, we propose a variational Bayesian adapter (VB-Adapter) for cross-domain speech representation learning during fine-tuning. First, we establish a latent variable model to construct a desired posterior distribution after incorporating domain-specific knowledge to bridge the gap between the source and target domains. Then, an adaptive objective is presented to maximize the mutual information of the latent variables with and without domain-specific knowledge to facilitate model adaptation. Finally, we introduce contrastive learning on samples to optimize the lower bound of the above adaptive objective. Our experiments apply the VB-Adapter on transformers for dysarthric speech recognition (DSR) and the integration of Whisper-encoder and Llama for Mandarin speech recognition (MSR). The results reveal the effectiveness of VB-Adapter in modeling the uncertainties arising from domain shift and enhancing the robustness of speech representations.

PaperID: 474,

Authors: Oscar Pina, Verónica Vilaplana

Affiliations: Signal Theory and Communications (TSC), Universitat Politècnica de Catalunya-BarcelonaTech (UPC), Barcelona, Spain

Title: Layer-Wise Training of Graph Neural Networks With Self-Supervised Learning

Abstract:
Training graph neural networks (GNNs) on large graphs is challenging due to both the high memory and computational costs of end-to-end training and the scarcity of detailed node-level annotations. To address these challenges, we propose layer-wise regularized graph infomax (LRGI), a self-supervised learning algorithm inspired by predictive coding, a biologically motivated principle in which each layer is trained locally to predict its future inputs. LRGI trains GNNs layer by layer, decoupling their memory and time complexity from the network depth, thereby enabling scalable training on large graphs. In LRGI, each layer learns to predict the features propagated from its neighbors, allowing independent training of each layer. This approach, combined with regularization that promotes diverse representations, also helps mitigate oversmoothing in deep GNNs. Experiments on large inductive graph benchmarks demonstrate that LRGI achieves competitive performance compared to state-of-the-art end-to-end methods, while substantially improving efficiency.

PaperID: 475,

Authors: Xiao Liu, Xiaofeng Wang, Shouyi Wang, Haosong Gou, Zhengyong Wang, Chao Ren

Affiliations: College of Electronics and Information Engineering, Sichuan University, Chengdu, China; China Mobile Group Sichuan Company Ltd., Chengdu, China

Title: GBPG-Net: Global Background Prior-Guided Rain and Snow Image Restoration

Abstract:
The aim of image restoration in the presence of rain and snow effects is to eliminate these disturbances while retaining the underlying background structure. Most existing methods tend to directly learn the mapping from corrupted images to clean ones, often resulting in residual rain or snow artifacts and compromised background structures. In this work, both theoretical analysis and experimental findings confirm the robustness of the hue channel in HSV color space to rain and snow disturbances, even when extracted from corrupted images. Motivated by this insight, we propose to leverage the global clean background cues inherent in the hue channel to guide the network in preserving the image background structure and removing interference. To this end, we introduce the global background prior-guided network (GBPG-Net) for restoring rain and snow-affected images, which employs a triangular formation to facilitate continuous interaction and updating of the global background prior (GBP) with the image feature within the GBPG-unit, resulting in improved interference removal and background structure preservation. Specifically, the GBPG-Net incorporates the global clean background prior injector (GCBPI) to inject the GBP into the network. Subsequently, the prior-guided local detail excavation (PGLDE) module, built on GCBPI, further refines interference removal and structure preservation to process local details intricately. Finally, the prior-guided local-global aggregation (PGLGA) module aggregates global background features with local detailed features, enabling the network to better understand the overall content and subtle interference for more accurate reconstruction. Quantitative and qualitative evaluations on synthetic and real datasets demonstrate the effectiveness of the proposed GBPG-Net in deraining and desnowing tasks, highlighting its advantages over existing methods. The code and supplementary documentation are available at https://github.com/liux520/GBPG-Net
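
The claimed robustness of the hue channel can be illustrated with a deliberately simplified model. The sketch below assumes that a rain or snow streak acts as a partial blend of the pixel toward white, which is an assumption made here for illustration and not the paper's theoretical analysis. The helper names are hypothetical; `colorsys.rgb_to_hsv` is from the Python standard library.

```python
import colorsys

def add_white_veil(rgb, alpha):
    """Blend an RGB pixel (channels in [0, 1]) toward white by a factor `alpha`."""
    return tuple((1.0 - alpha) * c + alpha for c in rgb)

def hsv(rgb):
    """Return the (hue, saturation, value) triple of an RGB pixel."""
    return colorsys.rgb_to_hsv(*rgb)

# Hypothetical background pixel and a heavily veiled copy of it.
background = (0.20, 0.45, 0.30)
corrupted = add_white_veil(background, 0.6)

h_clean, s_clean, v_clean = hsv(background)
h_rain, s_rain, v_rain = hsv(corrupted)
```

Because hue depends only on the ratios of channel differences, an achromatic blend of this kind rescales saturation and value but leaves hue unchanged, up to floating-point error. The invariance breaks down when the veil is fully opaque (`alpha = 1`), where the pixel becomes pure white and its hue is undefined.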

PaperID: 476,

Authors: Qingfeng Chen, Shiyuan Li, Yixin Liu, Shirui Pan, Geoffrey I. Webb, Shichao Zhang

Affiliations: School of Computer, Electronics and Information, Guangxi University, Nanning, China; School of Information and Communication Technology and Institute for Integrated and Intelligent Systems (IIIS), Griffith University, Nathan, QLD, Australia; Department of Data Science and AI, Faculty of IT, Monash University, Melbourne, VIC, Australia; Guangxi Key Laboratory of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, China

Title: Uncertainty-Aware Graph Neural Networks: A Multihop Evidence Fusion Approach

Abstract:
Graph neural networks (GNNs) excel in graph representation learning by integrating graph structure and node features. Existing GNNs, unfortunately, fail to account for the uncertainty of class probabilities that vary with the depth of the model, leading to unreliable and risky predictions in real-world scenarios. To bridge the gap, in this article, we propose a novel evidence-fusing graph neural network (EFGNN) to achieve trustworthy prediction, enhance node classification accuracy, and make explicit the risk of wrong predictions. In particular, we integrate the evidence theory with multihop propagation-based GNN architecture to quantify the prediction uncertainty of each node with the consideration of multiple receptive fields. Moreover, a parameter-free cumulative belief fusion (CBF) mechanism is developed to leverage the changes in prediction uncertainty and fuse the evidence to improve the trustworthiness of the final prediction. To effectively optimize the EFGNN model, we carefully design a joint learning objective composed of evidence cross-entropy, dissonance coefficient, and false confident penalty. The experimental results on various datasets and theoretical analyses demonstrate the effectiveness of the proposed model in terms of accuracy and trustworthiness, as well as its robustness to potential attacks.
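
As a rough illustration of how evidence theory yields a per-node uncertainty, and how evidence from several hops might be fused, the sketch below uses the standard subjective-logic mapping from Dirichlet evidence to an opinion together with the textbook cumulative fusion operator. Whether the paper's CBF mechanism matches this operator exactly is an assumption; the function names and the evidence values are hypothetical.

```python
def opinion_from_evidence(evidence):
    """Map non-negative class evidence to (belief masses, uncertainty).

    Uses the Dirichlet strength S = sum(e_k + 1), with belief b_k = e_k / S
    and uncertainty u = K / S, so that sum(b) + u = 1.
    """
    k = len(evidence)
    strength = sum(evidence) + k
    belief = [e / strength for e in evidence]
    return belief, k / strength

def cumulative_fuse(op_a, op_b):
    """Textbook cumulative fusion of two subjective-logic opinions."""
    (b_a, u_a), (b_b, u_b) = op_a, op_b
    denom = u_a + u_b - u_a * u_b  # positive whenever either uncertainty is > 0
    belief = [(x * u_b + y * u_a) / denom for x, y in zip(b_a, b_b)]
    return belief, (u_a * u_b) / denom

# Hypothetical evidence for one node gathered from two receptive fields (hops).
hop1 = opinion_from_evidence([4.0, 1.0, 0.0])
hop2 = opinion_from_evidence([2.0, 0.0, 1.0])
fused_belief, fused_uncertainty = cumulative_fuse(hop1, hop2)
```

With this operator, fusing two opinions is equivalent to summing their evidence, so the fused uncertainty is lower than that of either hop alone. The operator itself has no trainable weights, which is consistent with the "parameter-free" description in the abstract.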

PaperID: 477,

Authors: Wenjia Meng, Qian Zheng, Long Yang, Yilong Yin, Gang Pan

Affiliations: School of Software, Shandong University, Jinan, China; State Key Laboratory of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; Computing Technology, Chinese Academy of Sciences, Beijing, China

Title: Off-OAB: Off-Policy Policy Gradient Method With Optimal Action-Dependent Baseline

Abstract:
Policy-based methods have achieved remarkable success in solving challenging reinforcement learning (RL) problems. Among these methods, off-policy policy gradient (OPPG) methods are particularly important because they can benefit from off-policy data. However, these methods suffer from the high variance of the OPPG estimator, which results in poor sample efficiency during training. In this article, we propose an off-policy policy gradient method with the optimal action-dependent baseline (Off-OAB) to mitigate this variance issue. Specifically, this baseline maintains the OPPG estimator’s unbiasedness while theoretically minimizing its variance. To enhance practical computational efficiency, we design an approximated version of this optimal baseline. Utilizing this approximation, our method (Off-OAB) aims to decrease the OPPG estimator’s variance during policy optimization. We evaluate the proposed Off-OAB method on six representative tasks from OpenAI Gym and MuJoCo, where it demonstrably surpasses the state-of-the-art methods on the majority of these tasks.

PaperID: 478,

Authors: Yang Lu, Xiaolin Huang, Yizhou Chen, Mengke Li, Yan Yan, Chen Gong, Hanzi Wang

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China

Title: MOOD: Leveraging Out-of-Distribution Data to Enhance Imbalanced Semi-Supervised Learning

Abstract:
Imbalanced semi-supervised learning (SSL) has emerged as a critical research area due to the prevalence of class-imbalanced and partially labeled data in real-world scenarios. As the requirement for data volume increases, naturally collected datasets inevitably contain out-of-distribution (OOD) samples. However, the performance of existing imbalanced SSL methods deteriorates markedly in the presence of OOD data. In this article, we propose an imbalanced SSL method called mixup-OOD (MOOD) to address this issue. The core idea is to “turn waste into treasure,” exploring the potential of leveraging seemingly detrimental OOD data to expand the feature space, particularly for tail classes. Specifically, we first filter OOD data from unlabeled data, and then fuse it with labeled data to boost feature diversity for the tail classes. To avoid feature overlapping with OOD data, we develop a push-and-pull (PaP) loss to attract in-distribution (ID) instances toward respective class centroids while repelling OOD samples from them. Extensive experiments show that MOOD achieves superior performance compared with other state-of-the-art methods and exhibits robustness across data with different imbalanced ratios and OOD proportions. The source code is available at: https://github.com/xlhuang132/MOODv2

PaperID: 479,

Authors: Yongle Huang, Zedong Liu, Shijie Sun, Ningning Cui, Jianxin Li

Affiliations: School of Information Engineering, Chang’an University, Xi’an, Shaanxi, China; School of Data Science and Artificial Intelligence, Chang’an University, Xi’an, Shaanxi, China; School of Business and Law, Edith Cowan University, Joondalup, WA, Australia

Title: SFAN: Selective Filter and Alignment Network for Cross-Modal Retrieval

Abstract:
Bridging the gap between visual and textual modalities effectively has consistently been a key challenge in cross-modal retrieval. Fine-grained matching approaches improve performance by precisely aligning salient region features in the visual modality with word embeddings in the textual modality. However, how to effectively and efficiently filter out irrelevant features (e.g., irrelevant background regions and nonmeaningful prepositions) in each modality remains a significant challenge. Furthermore, capturing key cross-modal relationships while minimizing misalignment interference is crucial for effective cross-modal retrieval. In this work, we propose a novel approach called the selective filter and alignment network (SFAN) to tackle these challenges. First, we propose modality-specific selective filter modules (SFMs) to selectively and implicitly filter out redundant information within each modality. We then propose the state-space models (SSMs)-based selective alignment module (SAM) to selectively capture key correspondences and reduce the disturbance of irrelevant associations. Finally, we utilize a fusion operation to combine these embeddings from both SFM and SAM to derive the final embeddings for similarity computation. Extensive experiments on the Flickr30k, MS-COCO, and MSR-VTT datasets reveal that our proposed SFAN can effectively learn robust patterns, outperforming the state-of-the-art (SOTA) cross-modal retrieval methods by a wide margin.

PaperID: 480,

Authors: Kendong Liu, Mingtao Feng, Wei Zhao, Jingtao Sun, Weisheng Dong, Yaonan Wang, Ajmal Mian

Affiliations: Xidian University, Xi’an, China; Hunan University, Changsha, China; University of Western Australia, Crawley, Perth, WA, Australia

Title: Pixel-Level Noise Mining for Weakly Supervised Salient Object Detection

Abstract:
Training a deep model for visual saliency detection requires the collection and labor-intensive annotation of overwhelmingly large data. We propose to learn saliency detection in a weakly supervised manner from a single noisy label, which is easy to obtain from unsupervised handcrafted feature-based methods. However, deep networks tend to overfit such noise, leading to a dramatic drop in accuracy. Given our goal, we address a natural question: can we identify outliers during network prediction and rectify the label noise? To this end, we propose a pixel-level noise mining framework for robust salient object detection (SOD) that exploits the network's own knowledge, without the need for external models. Specifically, during the early training stage, we progressively identify the outliers from a novel perspective during saliency detection, before the network overfits to the noisy labels, and generate a selection matrix in each iteration. Next, we adaptively rectify the label noise under the guidance of the selection matrix for better supervision in the later training stage. Extensive experiments on multiple benchmark datasets demonstrate the superiority of our method, showing its ability to learn saliency detection comparable to state-of-the-art fully supervised methods. Furthermore, our approach outperforms existing weakly supervised methods utilizing a single noisy label and surpasses half of the existing weakly supervised methods employing multiple noisy labels. Our approach, when trained with multiple noisy labels, outperforms all other methods employing multiple noisy labels across four major datasets. We also evaluate the generalization ability of our method on the multiclass semantic segmentation (SS) task. Our code is available at https://github.com/kendongdong/NoiseMining

PaperID: 481,

Authors: Feng Yan, Xiaoheng Jiang, Yang Lu, Jiale Cao, Mingliang Xu

Affiliations: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China; School of Electrical and Information Engineering, Tianjin University, Tianjin, China

Title: DefectSAM: Hierarchically Adapting SAM for Pixel-Wise Surface Defect Detection

Abstract:
The segment anything model (SAM) has recently demonstrated powerful segmentation ability for natural scene images (NSIs). However, SAM exhibits limited performance in defect detection owing to the weak appearance of defects and cluttered backgrounds in industrial images. In this article, we propose a hierarchically adapting SAM for pixel-wise surface defect detection, named DefectSAM, which effectively modulates and decodes multilevel features of the encoder to capture defect information. Specifically, we introduce a learnable feature adaptation component between the image encoder and the decoder to modulate each level of features via the dual-feature adaptation unit. The dual-feature adaptation unit mainly includes the correlation-gated feature adaptation (CGFA) module and the mask-guided feature adaptation (MGFA) module. The CGFA exploits cross-correlation spatial gating maps to adaptively incorporate a convolutional feature pyramid and Transformer features during feature adaptation, which is beneficial for capturing defect details. Moreover, the MGFA utilizes the mask prediction of high-level features as semantic guidance to select top-confidence foreground and background tokens for feature adaptation, focusing more on defect details and suppressing background noise. Extensive experiments on three defect detection datasets (i.e., MVTec AD, CrackSeg9k, ZJU-Leaper, and Magnetic tile) demonstrate that the proposed method achieves state-of-the-art performance with few learnable parameters, which greatly improves the generalization of SAM in defect detection.

PaperID: 482,

Authors: Kun Dai, Zhiqiang Jiang, Fuyuan Qiu, Dedong Liu, Tao Xie, Ke Wang, Ruifeng Li, Lijun Zhao

Affiliations: State Key Laboratory of Robotics and System, Harbin Institute of Technology (HIT), Harbin, China; State Key Laboratory of Robotics and System, HIT, Harbin, China

Title: HVLF: A Holistic Visual Localization Framework Across Diverse Scenes

Abstract:
Recently, integrating the multitask learning (MTL) paradigm into scene coordinate regression (SCoRe) techniques has achieved significant success in visual localization tasks. However, the feature extraction ability of existing frameworks is inherently constrained by the rigid weight activation strategy, which prevents each layer from concurrently capturing scene-universal features across diverse scenes and scene-particular attributes unique to each individual scene. In addition, the straightforward network architecture further exacerbates the issue of insufficient feature representation. To address these limitations, we introduce HVLF, a holistic framework that ensures flexible identification of both scene-universal and scene-particular attributes while integrating various attention mechanisms to enhance feature representation effectively. Technically, for the first issue, HVLF proposes a soft weight activation strategy (SWAS) equipped with polyhedral convolution to concurrently optimize scene-shared and scene-specific weights within each layer, which facilitates sufficient discernment of both scene-universal features and scene-particular attributes, thereby boosting the network’s capability for comprehensive scene perception. For the second issue, HVLF introduces a mixed attention perception module (MAPM) that incorporates channelwise, spatialwise, and elementwise attention mechanisms to perform multilevel feature fusion, hence extracting discriminative features to regress precise scene coordinates. Extensive experiments on indoor and outdoor datasets prove that HVLF realizes impressive localization performance. In addition, experiments conducted on 3-D object detection and feature matching tasks prove that the two proposed techniques are universal and can be seamlessly inserted into other methods.

PaperID: 483,

Authors: Yang Wang, Ya-Hui Jia, Wei-Neng Chen, Yi Mei

Affiliations: School of Future Technology, South China University of Technology, Guangzhou, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand

Title: Distance-Aware Attention Reshaping for Enhancing Generalization of Neural Solvers

Abstract:
Neural solvers (NSs) based on the attention mechanism have demonstrated remarkable effectiveness in solving routing problems like traveling salesman problems (TSPs) and vehicle routing problems (VRPs). However, in the generalization process, we find a phenomenon of the dispersion of attention scores in existing NSs, which leads to poor performance. To improve the generalization ability of NSs, this article proposes a distance-aware attention reshaping (DAR) method. Specifically, without increasing any parameter of the neural network (NN), we utilize the distance information between nodes to adjust attention scores. This enables an NS trained on small-scale instances with a certain distribution to make rational choices when solving large-scale problems with different distributions. Its effectiveness is verified both theoretically and empirically. Extensive experiments on the TSP, asymmetric TSP (ATSP), capacitated VRP (CVRP), VRP with time windows (VRPTW), capacitated arc routing problem (CARP), and knapsack problem (KP) demonstrate the advantages of our method. Our code is available at https://github.com/ftwangyang/DAR
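
The idea of adjusting attention scores with inter-node distances, without adding trainable parameters, can be sketched as follows. The linear penalty, the fixed constant `alpha`, and all function names are assumptions made for illustration; the abstract does not state the exact form of the reshaping used in DAR.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reshape_attention(scores, distances, alpha=5.0):
    """Subtract a fixed (non-learned) distance penalty from each raw score."""
    return [s - alpha * d for s, d in zip(scores, distances)]

# Hypothetical dispersed attention scores from the current node to 4 candidates,
# and the distances from the current node to each candidate.
raw_scores = [0.10, 0.12, 0.11, 0.10]
distances = [0.90, 0.80, 0.05, 0.70]

probs_before = softmax(raw_scores)
probs_after = softmax(reshape_attention(raw_scores, distances))
```

In this toy case the raw scores are almost uniform, so the selection is close to arbitrary. After the distance penalty is applied, most of the probability mass moves to the nearest candidate (index 2).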

PaperID: 484,

Authors: Qifeng Zhao, Xuchu Wang

Affiliations: College of Optoelectronic Engineering, Chongqing University, Chongqing, China

Title: Unsupervised Multimanifold Cross-Guided Diffusion Deformable Registration for Cardiac MRI

Abstract:
Diffusion networks demonstrate remarkable robustness in extracting complex structural features across various domains of medical image processing. In the task of cardiac image registration, diffusion networks excel at reconstructing intricate structural details, thereby enabling effective representation of cardiac anatomical motion. In this article, we propose an unsupervised diffusion registration framework named MCG-Reg for 3-D cardiac magnetic resonance (MR) image registration, employing a multimanifold cross-fusion strategy. MCG-Reg comprises two components: the multimanifold cross-fusion (MCF) module and the weighted fusion codec (WFC) module. The MCF module decouples the cardiac image, leverages multifrequency and multiscale features for cross-attention (CA) calculation, and fuses the result with the edge image to give the model adaptive focus gathering and edge perception capabilities, thereby enhancing the effective aggregation of local and global features. The WFC module further processes cardiac features by utilizing offset attention to capture large displacement information, while employing feature energy maps for residual connections to enhance the model’s attention perception ability, thus facilitating better topology maintenance and boundary constraint realization. The registration accuracy and model generalization of the proposed MCG-Reg are validated on the publicly available ACDC, M&Ms, and CAP datasets. The experimental results verify that it achieves state-of-the-art performance in comparison to related methods, highlighting the significant potential of the proposed framework in cardiac image analysis applications.

PaperID: 485,

Authors: Zutao Jiang, Guian Fang, Jianhua Han, Guansong Lu, Hang Xu, Shengcai Liao, Xiaojun Chang, Xiaodan Liang

Affiliations: Peng Cheng Laboratory, Shenzhen, China; Department of Electrical and Computer Engineering, National University of Singapore, Queenstown, Singapore; Huawei Noah's Ark Lab, Shanghai, China; College of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates; School of Information Science and Technology, University of Science and Technology of China, Hefei, China; Department of Electrical and Computer Engineering, School of Intelligent Engineering, Sun Yat-sen University, Shenzhen, China

Title: RealignDiff: Boosting Text-to-Image Diffusion Model With Coarse-to-Fine Semantic Realignment

Abstract:
Recent advances in text-to-image diffusion models have achieved remarkable success in generating high-quality, realistic images from textual descriptions. However, these approaches have faced challenges in precisely aligning the generated visual content with the textual concepts described in the prompts. In this article, we propose a two-stage coarse-to-fine semantic realignment method, named RealignDiff, aimed at improving the alignment between text and images in text-to-image diffusion models. In the coarse semantic realignment phase, a novel caption reward, leveraging the BLIP-2 model, is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt. Subsequently, the fine semantic realignment stage uses a local dense caption generation module and a reweighting attention modulation module to refine the previously generated images from a local semantic view. Experimental results on the MS-COCO and ViLG-300 datasets demonstrate that the proposed two-stage coarse-to-fine semantic realignment method outperforms other baseline realignment techniques by a substantial margin in both visual quality and semantic similarity with the input prompt.

PaperID: 486,

Authors: Mingyuan Luo, Xin Yang, Zhongnuo Yan, Yan Cao, Yuanji Zhang, Xindi Hu, Jin Wang, Haoxuan Ding, Wei Han, Litao Sun, Dong Ni

Affiliations: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, and the Medical UltraSound Image Computing (MUSIC) Laboratory, Shenzhen University, Shenzhen, Guangdong, China; Shenzhen RayShape Medical Technology Inc., Shenzhen, Guangdong, China; Department of Ultrasound Medicine, Cancer Center, Zhejiang Provincial People’s Hospital, Affiliated People’s Hospital, Hangzhou Medical College, Hangzhou, Zhejiang, China; Department of Health Management Center, Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, China

Title: MoNetV2: Enhanced Motion Network for Freehand 3-D Ultrasound Reconstruction

Abstract:
Three-dimensional ultrasound (US) aims to provide sonographers with the spatial relationships of anatomical structures, playing a crucial role in clinical diagnosis. Recently, deep-learning-based freehand 3-D US has made significant advancements. It reconstructs volumes by estimating transformations between images without external tracking. However, image-only reconstruction poses difficulties in reducing cumulative drift and further improving reconstruction accuracy, particularly in scenarios involving complex motion trajectories. In this context, we propose an enhanced motion network (MoNetV2) to enhance the accuracy and generalizability of reconstruction under diverse scanning velocities and tactics. First, we propose a sensor-based temporal and multibranch structure (TMS) that fuses image and motion information from a velocity perspective to improve image-only reconstruction accuracy. Second, we devise an online multilevel consistency constraint (MCC) that exploits the inherent consistency of scans to handle various scanning velocities and tactics. This constraint exploits scan-level velocity consistency (SVC), path-level appearance consistency (PAC), and patch-level motion consistency (PMC) to supervise interframe transformation estimation. Third, we distill an online multimodal self-supervised strategy (MSS) that leverages the correlation between network estimation and motion information to further reduce cumulative errors. Extensive experiments clearly demonstrate that MoNetV2 surpasses existing methods in both reconstruction quality and generalizability performance across three large datasets.

PaperID: 487,

Authors: Wenzhuo Liu, Fei Zhu, Cheng-Lin Liu

Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Hong Kong, China

Title: Branch-Tuning: Balancing Stability and Plasticity for Continual Self-Supervised Learning

Abstract:
Self-supervised learning (SSL) has emerged as an effective paradigm for deriving general representations from vast amounts of unlabeled data. However, as real-world applications continually integrate new content, the high computational and resource demands of SSL necessitate continual learning (CL) rather than complete retraining. This poses a challenge in balancing stability and plasticity when adapting to new information. In this article, we employ centered kernel alignment (CKA) to quantitatively analyze model stability and plasticity, revealing the critical roles of batch normalization (BN) layers for stability and convolutional layers for plasticity. Motivated by this, we propose branch-tuning (BT), an efficient and straightforward method that achieves a balance between stability and plasticity in continual SSL. BT consists of branch expansion and compression and can be easily applied to various SSL methods without modifying the original methods or retaining old data or models. We validate our method through experiments on various benchmark datasets, demonstrating its effectiveness and practical value in real-world scenarios. We hope our work offers new insights for future continual SSL research. The code will be made publicly available.
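
For readers unfamiliar with CKA, a minimal sketch of the linear variant is given below. The choice of the linear kernel is an assumption, since the abstract does not say which kernel is used, and the function names and toy activations are hypothetical. The formula implemented is CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F ||Y^T Y||_F), computed on column-centered activation matrices whose rows are samples.

```python
import math

def center_columns(mat):
    """Subtract the per-column mean from a row-major matrix (rows = samples)."""
    n = len(mat)
    means = [sum(col) / n for col in zip(*mat)]
    return [[v - m for v, m in zip(row, means)] for row in mat]

def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major matrices a and b."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total

def linear_cka(x, y):
    """Linear CKA similarity between two activation matrices with equal row counts."""
    x, y = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_gram_sq_norm(x, x) * cross_gram_sq_norm(y, y))
    return cross_gram_sq_norm(x, y) / denom

# Hypothetical activations of one layer on four inputs, before and after an update.
old_feats = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
scaled_feats = [[3.0 * v for v in row] for row in old_feats]
rotated_feats = [[-b, a] for a, b in old_feats]   # 90-degree rotation of features
changed_feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 1.0]]
```

A score near 1 between a layer's activations before and after training on new data would indicate that the layer's representation was preserved (stability), while a lower score would indicate that it changed (plasticity). Linear CKA is invariant to isotropic scaling and to orthogonal transformations of the features, so the scaled and rotated copies above score 1 against the original.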

PaperID: 488,

Authors: Matteo Gambella, Jary Pomponi, Simone Scardapane, Manuel Roveri

Affiliations: Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, Milan, Italy; Dipartimento di Ingegneria dell'Informazione, Elettronica e Telecomunicazioni (DIET), Sapienza Università di Roma, Rome, Italy

Title: NACHOS: Neural Architecture Search for Hardware-Constrained Early-Exit Neural Networks

Abstract:
Early-exit neural networks (EENNs) endow a standard deep neural network (DNN) with early-exit classifiers (EECs) to provide predictions at intermediate points of the processing when enough confidence in classification is achieved. This leads to many benefits in terms of effectiveness and efficiency. Currently, the design of EENNs is carried out manually by experts, a complex and time-consuming task that requires accounting for many aspects, including the correct placement, the thresholding, and the computational overhead of the EECs. For this reason, researchers are exploring the use of neural architecture search (NAS) to automate the design of EENNs. So far, few comprehensive NAS solutions for EENNs have been proposed in the literature, and a fully automated, joint design strategy taking into consideration both the backbone and the EECs remains an open problem. To this end, this work presents neural architecture search for hardware-constrained early exit neural networks (NACHOS), the first NAS framework for the design of optimal EENNs satisfying constraints on the accuracy and the number of multiply and accumulate (MAC) operations performed by the EENNs at inference time. In particular, this provides the joint design of backbone and EECs to select a set of admissible (i.e., respecting the constraints) Pareto optimal solutions in terms of the best trade-off between the accuracy and the number of MACs. The results show that the models designed by NACHOS are competitive with the state-of-the-art EENNs. Additionally, this work investigates the effectiveness of two novel regularization terms designed for the optimization of the auxiliary classifiers of the EENN.
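
The selection of admissible Pareto-optimal solutions mentioned above can be sketched generically. This is not the NACHOS search procedure, only the final filtering step as one might implement it; the candidate names, accuracy values, MAC counts, and constraint thresholds are all made up for illustration.

```python
def admissible(candidates, min_accuracy, max_macs):
    """Keep only candidates that respect both constraints."""
    return [c for c in candidates
            if c["accuracy"] >= min_accuracy and c["macs"] <= max_macs]

def dominates(a, b):
    """True if `a` is at least as good as `b` on both objectives and better on one."""
    no_worse = a["accuracy"] >= b["accuracy"] and a["macs"] <= b["macs"]
    better = a["accuracy"] > b["accuracy"] or a["macs"] < b["macs"]
    return no_worse and better

def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Made-up candidate EENNs: accuracy in [0, 1], MACs in millions.
candidates = [
    {"name": "A", "accuracy": 0.90, "macs": 50},
    {"name": "B", "accuracy": 0.88, "macs": 30},
    {"name": "C", "accuracy": 0.85, "macs": 40},   # dominated by B
    {"name": "D", "accuracy": 0.95, "macs": 120},  # violates the MAC budget
    {"name": "E", "accuracy": 0.70, "macs": 10},   # violates the accuracy floor
]
front = pareto_front(admissible(candidates, min_accuracy=0.80, max_macs=100))
front_names = sorted(c["name"] for c in front)
```

Applying the constraints before the dominance check means that an inadmissible candidate such as D cannot eliminate an admissible one from the front.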

PaperID: 489,

Authors: Zhihao Yuan, Xu Yan, Zhuo Li, Xuhao Li, Yao Guo, Shuguang Cui, Zhen Li

Affiliations: Shenzhen Future Network of Intelligence Institute and Guangdong Provincial Key Laboratory of Future Networks of Intelligence, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China; School of Science and Engineering, Chinese University of Hong Kong (Shenzhen), Shenzhen, China

Title: Toward Fine-Grained 3-D Visual Grounding Through Referring Textual Phrases

Abstract:
Recent progress in 3-D scene understanding has explored 3-D visual grounding (3DVG) to localize a target object through a language description. However, existing methods only consider the dependency between the entire sentence and the target object, ignoring fine-grained relationships between contexts and nontarget objects. In this article, we extend 3DVG to a more fine-grained task, called 3D phrase-aware grounding (3DPAG). The 3DPAG task aims to localize the target objects in a 3-D scene by explicitly identifying all phrase-related objects and then conducting the reasoning according to contextual phrases. To tackle this problem, we manually labeled about 227 K phrase-level annotations using a self-developed platform, from 88 K sentences of widely used 3DVG datasets, i.e., Natural Reference in 3-D (Nr3D), Spatial Reference in 3-D (Sr3D), and ScanRefer. By leveraging our datasets, we can extend previous 3DVG methods to the fine-grained phrase-aware scenario. This is achieved through the proposed novel phrase-object alignment (POA) optimization and phrase-specific pretraining (PSP), which boost conventional 3DVG performance as well. Extensive results confirm significant improvements, i.e., the previous state-of-the-art method achieves 3.9%, 3.5%, and 4.6% overall accuracy gains on Nr3D, Sr3D, and ScanRefer, respectively. Our datasets and platform are released at https://github.com/CurryYuan/PhraseRefer

PaperID: 490,

Authors: Shuchang Lyu, Qi Zhao, Hong Zhang, Guangliang Cheng, Chenguang Yang

Affiliations: Department of Electronics and Information Engineering, Beihang University, Beijing, China; Department of Astronautics, Beihang University, Beijing, China; Department of Computer Science, University of Liverpool, Liverpool, U.K.

Title: Online Self-Training Driven Attention-Guided Self-Mimicking Network for Semantic Segmentation

Abstract:
In the realm of semantic segmentation tasks, knowledge distillation (KD) has emerged as a prominent strategy, leveraging the transfer of mature knowledge from large teacher networks to enhance the performance of smaller student networks. However, existing methods often rely heavily on high-quality yet cumbersome teacher networks, leading to a complex training process. To address this challenge, we introduce a novel approach termed self-training driven attention-guided self-mimicking online ensemble network. Our proposed method begins by employing intermediate channel-joint attention maps to guide image augmentation. Both the original and augmented images are then input into the networks. Leveraging intermediate feature maps and predictions generated from the two images, we employ KD to uncover invariant features. To further harness representation potential through learning from credible predictions, we introduce a self-training mechanism. This mechanism utilizes an exponential moving average (EMA)-teacher network to generate feature maps and predicted posterior probabilities. The knowledge of the EMA-teacher is subsequently transferred to the student network through distillation. Extensive experiments and visualization analyses conducted on multiple benchmark datasets, including Cityscapes, Pascal VOC, CamVid, and ADE20k, validate the effectiveness of self-training driven attention-guided self-mimicking network (ST-ASMNet). The interpretability of our method is further validated through visualization and analysis. Our code will be publicly available.
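As a rough illustration of the EMA-teacher idea mentioned in this abstract, the sketch below shows the usual update in which teacher parameters track an exponential moving average of student parameters. The abstract does not spell out the update, so this form, the dictionary-of-floats parameter layout, the function name, and the decay value are assumptions, not the authors' code.

```python
def ema_update(teacher, student, decay=0.99):
    """Update teacher parameters in place as an EMA of student parameters.

    teacher, student: dicts mapping a parameter name to a float
    (a toy stand-in for real weight tensors; hypothetical layout).
    New teacher value = decay * old teacher + (1 - decay) * student.
    """
    for name, s_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * s_value
    return teacher
```

With a decay close to 1, the teacher changes slowly and averages out step-to-step noise in the student. This is commonly given as the reason such a teacher yields more stable targets for distillation.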

PaperID: 491,

Authors: Hao Ding, Chengxing Jia, Zongzhang Zhang, Cong Guan, Feng Chen, Lei Yuan, Yang Yu

Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Title: Learning to Coordinate With Different Teammates via Team Probing

Abstract:
Coordinating with different teammates is essential in cooperative multiagent systems (MASs). However, most multiagent reinforcement learning (MARL) methods assume fixed team compositions, which leads to agents overfitting their training partners and failing to cooperate well with different teams during the deployment phase. A common way to mitigate the problem is to anticipate teammate behaviors and adapt policies accordingly during cooperation. However, these methods use the same policy for both collecting information for modeling teammates and maximizing cooperation performance. We argue that these two goals may conflict and reduce the effectiveness of both. In this work, we propose coordinating with different teammates via team probing (CDP), a novel approach that rapidly adapts to different teams by disentangling probing and adaptation phases. Specifically, we first generate a diverse population of teams as training partners with a novel value-based diversity objective. Then, we train a probing module to probe and reveal the coordination pattern of each team with policy-dynamics reconstruction and get a representation space of the population. Finally, we train a generalist meta-policy consisting of several expert policies with module selection based on the clustering of the learned representation space. We empirically show that CDP surpasses existing policy adaptation methods in various complex multiagent scenarios with both seen and unseen teammates.

PaperID: 492,

Authors: Fei Ye, Adrian G. Bors

Affiliations: School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China; Department of Computer Science, University of York, York, U.K.

Title: Training a Dynamic Growing Mixture Model for Lifelong Learning

Abstract:
Lifelong learning (LLL) defines a training paradigm that aims to continuously acquire and capture new concepts from a sequence of tasks without forgetting. Recently, dynamic expansion models (DEMs) have been proposed to address catastrophic forgetting under the LLL paradigm. However, the efficiency of DEMs lacks a thorough explanation based on theoretical analysis. In this article, we develop a new theoretical framework that interprets the forgetting process of the DEM as increasing the statistical discrepancy distance between the distribution of the probabilistic representation of the new data and the previously learned knowledge. The theoretical analysis shows that adding new components to a mixture model represents a trade-off between model complexity and its performance. Inspired by the theoretical analysis, we introduce a new DEM, called the growing mixture model (GMM), where generative data components are added according to the novelty of the incoming task information compared to what is already known. A new component selection mechanism considering the model’s already acquired knowledge is employed for updating new DEM’s components, promoting efficient future task learning. We also train a compact student model with samples drawn through the generative mechanisms of the GMM, aiming to accumulate cross-domain representations over time. By employing the student model, we can significantly reduce the number of parameters and make quick inferences during the testing phase.

PaperID: 493,

Authors: Yonghao Song, Yijun Wang, Huiguang He, Xiaorong Gao

Affiliations: School of Biomedical Engineering, Tsinghua University, Beijing, China; Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China; State Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Recognizing Natural Images From EEG With Language-Guided Contrastive Learning

Abstract:
Electroencephalography (EEG), known for its convenient noninvasive acquisition but moderate signal-to-noise ratio, has recently gained much attention due to the potential to decode image information. However, previous works have not delivered sufficient evidence of this task, primarily limited by performance and biological plausibility. In this work, we first introduce a self-supervised framework to demonstrate the feasibility of recognizing images from EEG signals. Contrastive learning is leveraged to align the representations of EEG responses with image stimuli. Then, language descriptions of the stimuli generated by large language models (LLMs) help guide learning core semantic information. With the framework, we attain significantly above-chance results on the THINGS-EEG2 dataset, achieving a top-1 accuracy of 19.7% and a top-5 accuracy of 51.5% in challenging 200-way zero-shot tasks. Furthermore, we conduct thorough experiments to resolve the human visual responses with EEG from temporal, spatial, spectral, and semantic perspectives. These results provide evidence of feasibility and plausibility regarding EEG-based image recognition, substantiated by comparative studies with the THINGS-Magnetoencephalography (MEG) dataset. The findings offer valuable insights for neural decoding and real-world applications of brain-computer interfaces (BCIs), such as health care and robot control. The code is available at https://github.com/eeyhsong/NICE-LLM.

PaperID: 494,

Authors: Alberto Carlevaro, Teodoro Alamo, Fabrizio Dabbene, Maurizio Mongelli

Affiliations: CNR-Istituto di Elettronica e di Ingegneria dell’Informazione e delle Telecomunicazioni, Rome, Italy; Department of Electrical, Electronics and Telecommunications Engineering and Naval Architecture (DITEN), University of Genoa, Genoa, Italy; Departamento de Ingeniería de Sistemas y Automática, Universidad de Sevilla, Escuela Superior de Ingenieros, Seville, Spain

Title: Probabilistic Safety Regions via Finite Families of Adjustable Classifiers

Abstract:
Supervised classification recognizes patterns in the data to separate classes of behaviors. Canonical solutions contain misclassification errors that are intrinsic to the numerical approximating nature of machine learning (ML). The data analyst may minimize the classification error on a class at the expense of increasing the error of the other classes. The error control of such a design phase is often done in a heuristic manner. In this context, it is key to develop theoretical foundations capable of providing probabilistic certifications to the obtained classifiers. In this perspective, we introduce the concept of probabilistic safety region to describe a subset of the input space in which the number of misclassified instances is probabilistically controlled. The notion of adjustable classifiers, a special class of classifiers that share the property of being controllable by a scalar parameter, is then exploited to link the tuning of ML with error control. Several tests and examples corroborate the approach. They are provided on synthetic data, in order to highlight all the steps involved, as well as on notable benchmark datasets and a smart mobility application.

PaperID: 495,

Authors: Lei Zhao, Wing W. Y. Ng, Jianjun Zhang, Xiguang Wu

Affiliations: School of Future Technology, South China University of Technology, Guangzhou, China; Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China; Hangzhou Institute of Technology, Xidian University, Hangzhou, China

Title: An Innovative Multisource Teacher Collaborative Framework for Self-Knowledge Distillation

Abstract:
Self-knowledge distillation (SKD) exhibits greater computational efficiency than traditional knowledge distillation (KD) because it learns from its own predictions rather than from a pretrained teacher. Existing SKD methods diversify knowledge through auxiliary branches, data augmentation, historical models, and label smoothing. However, previous methods primarily extract knowledge from a single-source teacher, overlooking the diversity and complementarity of various types of teacher knowledge in model learning, thereby limiting performance improvements. In response to this challenge, we propose a pioneering paradigm termed multisource teacher collaboration for self-knowledge distillation (MSTCS-KD), which integrates knowledge from diverse types of teachers to complementarily enhance the model’s learning capability. We start by adding lightweight auxiliary branches with different structures in the shallow layers to build the student network, while also incorporating a teacher-guided attention mechanism to support adaptive learning. Then, we perform collaborative distillation by combining “heterogeneous knowledge” from the primary network’s deepest layers with “homogeneous knowledge” from the student’s outputs on augmented samples. This complementary distillation approach improves the model’s feature learning, generalization, and trainability. Extensive experiments demonstrate that our method outperforms other state-of-the-art SKD methods across various network architectures and datasets.

PaperID: 496,

Authors: Zhiwen Xiao, Huanlai Xing, Rong Qu, Hui Li, Xinzhou Cheng, Lexi Xu, Li Feng, Qian Wan

Affiliations: School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China; School of Computer Science, University of Nottingham Ningbo, Ningbo, China; School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an, China; Research Institute, China United Network Communications Corporation, Beijing, China

Title: Heterogeneous Mutual Knowledge Distillation for Wearable Human Activity Recognition

Abstract:
Recently, numerous deep learning algorithms have addressed wearable human activity recognition (HAR), but they often struggle with efficient knowledge transfer to lightweight models for mobile devices. Knowledge distillation (KD) is a popular technique for model compression, transferring knowledge from a complex teacher to a compact student. Most existing KD algorithms consider homogeneous architectures, hindering performance in heterogeneous setups. This is an under-explored area in wearable HAR. To bridge this gap, we propose a heterogeneous mutual KD (HMKD) framework for wearable HAR. HMKD establishes mutual learning within the intermediate and output layers of both teacher and student models. To accommodate substantial structural differences between teacher and student, we employ a weighted ensemble feature approach to merge the features from their intermediate layers, enhancing knowledge exchange within them. Experimental results on the HAPT, WISDM, and UCI_HAR datasets show HMKD outperforms ten state-of-the-art KD algorithms in terms of classification accuracy. Notably, with ResNetLSTMaN as the teacher and MLP as the student, HMKD increases MLP’s F1 score by 9.19% on the HAPT dataset.

PaperID: 497,

Authors: Yuanchen Bei, Sheng Zhou, Jinke Shi, Yao Ma, Haishuai Wang, Jiajun Bu

Affiliations: College of Computer Science and Technology, Zhejiang University, Hangzhou, China; Zhejiang Key Laboratory of Accessible Perception and Intelligent Systems, Zhejiang University, Hangzhou, China; Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA

Title: Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection

Abstract:
Unsupervised graph anomaly detection aims at identifying rare patterns that deviate from the majority in a graph without the aid of labels, which is important for a variety of real-world applications. Recent advances have utilized graph neural networks (GNNs) to learn effective node representations by aggregating information from neighborhoods. This is motivated by the hypothesis that nodes in the graph tend to exhibit consistent behaviors with their neighborhoods. However, such consistency can be disrupted by graph anomalies in multiple ways. Most existing methods directly employ GNNs to learn representations, disregarding the negative impact of graph anomalies on GNNs, resulting in suboptimal node representations and anomaly detection performance. While a few recent approaches have redesigned GNNs for graph anomaly detection under semi-supervised label guidance, how to address the adverse effects of graph anomalies on GNNs in unsupervised scenarios and learn effective representations for anomaly detection remains underexplored. To bridge this gap, in this article, we propose a simple, yet effective framework for guarding GNNs for unsupervised graph anomaly detection (G3AD). Specifically, G3AD first introduces two auxiliary networks along with correlation constraints to guard the GNNs against inconsistent information encoding. Furthermore, G3AD introduces an adaptive caching (AC) module to guard the GNNs from directly reconstructing the observed graph data that contains anomalies. Extensive experiments demonstrate that our G3AD can outperform 20 state-of-the-art methods on both synthetic and real-world graph anomaly datasets, with flexible generalization ability in different GNN backbones.

PaperID: 498,

Authors: Xiyuan Jin, Jing Wang, Xiaoyu Ou, Lei Liu, Youfang Lin

Affiliations: School of Computer Science and Technology, Beijing Key Laboratory of Traffic Data Analysis and Embodied Intelligence, Beijing Jiaotong University, Beijing, China

Title: Time-Series Contrastive Learning Against False Negatives and Class Imbalance

Abstract:
Self-supervised contrastive learning (SCL) has driven significant advancements in time-series representation learning. While recent studies built upon the information noise contrastive estimation (InfoNCE) loss framework focus on constructing appropriate positives and negatives, we theoretically analyze and identify two overlooked issues inherent in this approach: false negatives and class imbalance. To address these challenges, we propose a simple yet effective modification based on the SimCLR framework, integrating a multi-instance discrimination task to mitigate false negatives. Additionally, we introduce a graph-based interactive projection head and semantic consistency regularization, which enhances minority-class representations with minimal annotation cost. Extensive experiments on six real-world time-series datasets demonstrate that our approach consistently outperforms state-of-the-art methods, achieving up to 3.96% higher accuracy and 10.73% improvement in F1-score, particularly benefiting imbalanced data scenarios.
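To make the false-negative issue concrete, the following sketch computes the standard InfoNCE loss for a single anchor. It illustrates the baseline loss only, not the modification proposed in the paper. The cosine similarity measure, the temperature of 0.1, and the function names are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one anchor.

    loss = -log( exp(s_pos / t) / (exp(s_pos / t) + sum_k exp(s_neg_k / t)) )
    computed with a log-sum-exp shift for numerical stability.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]
```

If a sample treated as a negative is in fact as similar to the anchor as the positive, its term in the denominator matches the positive's. With one such negative and no others, the loss cannot drop below log 2, however well the positive is aligned. Such a sample is a false negative.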

PaperID: 499,

Authors: Ke Wang, Binghong Liu, Pandi Liu, Yungao Shi, Ping Guo, Yafei Li, Mingliang Xu

Affiliations: School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou, China; School of System Science, Beijing Normal University, Beijing, China

Title: Bi-PIL: Bidirectional Gradient-Free Learning Scheme for Multilayer Neural Networks

Abstract:
Training deep neural networks typically relies on gradient descent learning schemes, which are usually time-consuming, and the design of complex network architectures is often intractable. In this article, we explore the building of multilayer neural networks based on an efficient gradient-free learning scheme offering a potential solution to the architectural design. The proposed learning scheme encompasses both forward and backward training (BT) processes. In the forward process, the pseudoinverse learning (PIL) algorithm is employed to train a multilayer neural network, in which the network is dynamically constructed leveraging a layer-by-layer greedy strategy, enabling the automatic determination of the architecture across different hierarchies in a data-driven manner. The network architecture and connection weights determined in the forward training (FT) process are shared with the backward process, which also conducts gradient-free learning to update the connection weights. After the bidirectional learning, a neural network comprising two twin subnetworks is obtained, and the fused features of subnetworks are used as inputs for downstream tasks. Comprehensive experiments and detailed analyses demonstrate the effectiveness and superiority of the proposed learning scheme.

PaperID: 500,

Authors: Zhengyi Zhong, Weidong Bao, Ji Wang, Jianguo Chen, Lingjuan Lyu, Wei Yang Bryan Lim

Affiliations: Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, China; School of Software Engineering, Sun Yat-sen University, Guangzhou, China; Sony AI, Shinagawa, Japan; College of Computing and Data Science, Nanyang Technological University, Jurong West, Singapore

Title: SacFL: Self-Adaptive Federated Continual Learning for Resource-Constrained End Devices

Abstract:
The proliferation of end devices has led to a distributed computing paradigm, wherein on-device machine learning models continuously process diverse data generated by these devices. The dynamic nature of this data, characterized by continuous changes or data drift, poses significant challenges for on-device models. To address this issue, continual learning (CL) is proposed, enabling machine learning models to incrementally update their knowledge and mitigate catastrophic forgetting. However, the traditional centralized approach to CL is unsuitable for end devices due to privacy and data volume concerns. In this context, federated CL (FCL) emerges as a promising solution, preserving user data locally while enhancing models through collaborative updates. Aiming at the challenges of limited storage resources for CL, poor autonomy in task shift detection, and difficulty in coping with new adversarial tasks in the FCL scenario, we propose a novel FCL framework named self-adaptive federated CL (SacFL). SacFL employs an encoder–decoder architecture to separate task-robust and task-sensitive components, significantly reducing storage demands by retaining lightweight task-sensitive components for resource-constrained end devices. Moreover, SacFL leverages contrastive learning to introduce an autonomous data shift detection mechanism, enabling it to discern whether a new task has emerged and whether it is a benign task. This capability ultimately allows the device to autonomously trigger CL or attack defense strategy without additional information, which is more practical for end devices. Comprehensive experiments conducted on multiple text and image datasets, such as Cifar100 and THUCNews, have validated the effectiveness of SacFL in both class-incremental and domain-incremental scenarios. Furthermore, a demo system has been developed to verify its practicality.

PaperID: 501,

Authors: Donghua Wang, Wen Yao, Tingsong Jiang, Xiaohu Zheng, Junqi Wu

Affiliations: College of Computer Science and Technology, Zhejiang University, Hangzhou, China; Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China; MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

Title: Improving the Transferability of Adversarial Examples by Feature Augmentation

Abstract:
Adversarial transferability is a significant property of adversarial examples, which renders the adversarial example capable of attacking unknown models. However, the models with different architectures on the same task would concentrate on different information, which weakens adversarial transferability. To enhance the adversarial transferability, input transformation-based attacks perform random transformation over input to find a better result that can resist such transformations, but these methods ignore the model discrepancy; ensemble attacks fuse multiple models to shrink the search space to ensure that the found adversarial examples work on these models, but ensemble attacks are resource-intensive. In this article, we propose a simple but effective feature augmentation attack (FAUG) method to improve adversarial transferability. We dynamically add random noise to intermediate features of the target model during the generation of adversarial examples, thereby avoiding overfitting the target model. Specifically, we first explore the noise tolerance of the model and disclose the discrepancy under different layers and noise strengths. Then, based on that analysis, we devise a dynamic random noise generation method, which determines noise strength according to the produced features in the mini-batch. Finally, we exploit the gradient-based attack algorithm on feature-augmented models, resulting in better adversarial transferability without introducing extra computation costs. Extensive experiments conducted on the ImageNet dataset across CNN and Transformer models corroborate the efficacy of our method, e.g., we achieve improvement of +30.67% and +5.57% on input transformation-based attacks and combination methods, respectively.

PaperID: 502,

Authors: Yakun Li, Shuhua Gao, Yiming Gao, Jian-Liang Wu, Jun-e Feng, Cheng Xiang

Affiliations: School of Mathematics, Shandong University, Jinan, China; School of Control Science and Engineering, Shandong University, Jinan, China; School of Computer Science and Engineering, Nanyang Technological University, Jurong West, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Queenstown, Singapore

Title: Robust Controllability of Boolean Control Networks via Dynamic Programming

Abstract:
This article presents a novel dynamic programming approach to determine the robust controllability of Boolean control networks (BCNs) subject to stochastic disturbances. By applying Bellman’s optimality principle, we derive the recurrence relation for computing the optimal time matrix, a crucial concept characterizing robust reachability between two arbitrary states. We develop a finite-termination dynamic programming algorithm to calculate the optimal time matrix exactly and efficiently, with a rigorously certified iteration count. Sufficient and necessary conditions for robust controllability are then established based on the optimal time matrix. Furthermore, for any pair of reachable states, we construct time-optimal state feedback control laws to steer the system from the initial state to the target state, regardless of disturbances. Finally, extensive numerical experiments with biological networks validate the effectiveness of the proposed approach, showing significant improvements in computational efficiency. Additionally, we introduce a Q-learning-based algorithm and compare its performance, highlighting the advantages of our dynamic programming approach in terms of both efficiency and solution quality.

PaperID: 503,

Authors: Shiron Thalagala, Pak-Kin Wong, Xiaozheng Wang, Tianang Sun

Affiliations: Department of Electromechanical Engineering, University of Macau, Macau, China

Title: Broad Critic Deep Actor Reinforcement Learning for Continuous Control

Abstract:
In the domain of continuous control, deep reinforcement learning (DRL) demonstrates promising results. However, the dependence of DRL on deep neural networks (DNNs) results in the demand for extensive data and increased computational cost. To address this issue, a novel hybrid actor-critic reinforcement learning (RL) framework is introduced. The proposed framework integrates the broad learning system (BLS) with DNN, aiming to merge the strengths of both distinct architectural paradigms. Specifically, the critic network employs BLS for rapid value estimation via ridge regression, while the actor network retains the DNN structure to optimize policy gradients. This hybrid design is generalizable and can enhance existing actor-critic algorithms. To demonstrate its versatility, the proposed framework is integrated into three widely used actor-critic algorithms—deep deterministic policy gradient (DDPG), soft actor-critic (SAC), and twin delayed DDPG (TD3), resulting in BLS-augmented variants. The experimental results reveal that all BLS-enhanced versions surpass their original counterparts in terms of training efficiency and accuracy. These improvements highlight the suitability of the proposed framework for real-time control scenarios, where computational efficiency and rapid adaptation are critical.

PaperID: 504,

Authors: Shudong Huang, Wentao Feng, Chenwei Tang, Zhenan He, Caiyang Yu, Jiancheng Lv

Affiliations: College of Computer Science, Sichuan University, Chengdu, China

Title: Partial Differential Equations Meet Deep Neural Networks: A Survey

Abstract:
Many problems in science and engineering can be mathematically modeled using partial differential equations (PDEs), which are essential for fields like computational fluid dynamics (CFD), molecular dynamics, and dynamical systems. Although traditional numerical methods like the finite difference/element method are widely used, their computational inefficiency, due to the large number of iterations required, has long been a challenge. Recently, deep learning (DL) has emerged as a promising alternative for solving PDEs, offering new paradigms beyond conventional methods. Despite the growing interest in techniques like physics-informed neural networks (PINNs), a systematic review of the diverse neural network (NN) approaches for PDEs is still missing. This survey fills that gap by categorizing and reviewing the current progress of deep NNs (DNNs) for PDEs. Unlike previous reviews focused on specific methods like PINNs, we offer a broader taxonomy and analyze applications across scientific, engineering, and medical fields. We also provide a historical overview, key challenges, and future trends, aiming to serve both researchers and practitioners with insights into how DNNs can be effectively applied to solve PDEs.

PaperID: 505,

Authors: Xinzhu Liu, Di Guo, Huaping Liu

Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing, China; School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China

Title: Semantics-Aware Hierarchical Decision Framework for Embodied Visual Room Rearrangement

Abstract:
In embodied visual room rearrangement, the agent needs to recover the scene state to the goal state through interacting with the environment based on the egocentric visual observations after the locations and states of some objects are changed. It has important application potential in the field of robotics. This task is challenging in visual perception, scene understanding, and action execution. Existing methods do not take full advantage of the semantic information and spatial relationship of objects in the scene perception and understanding process. To tackle the challenges and shortcomings of the current methods, we build a hierarchical decision framework based on the pretrained semantic scene representation and transformer-based scene memory to solve this task. The results in the unseen scenes demonstrate the effectiveness of the proposed model compared with other methods.

PaperID: 506,

Authors: Ziheng Jiao, Hongyuan Zhang, Xuelong Li

Affiliations: School of Computer Science and the School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, Shaanxi, China; Institute of Artificial Intelligence (TeleAI), China Telecom, Shanghai, P. R. China

Title: Deep Graph Multi-View Representation Learning With Self-Augmented View Fusion

Abstract:
Some researchers have attempted to extend graph neural networks (GNNs) to multi-view representation learning in order to learn the latent structure information among the data. Generally, they concatenate the features of each view and employ a single GNN to extract representations from the concatenated feature. As a result, the within-view information may not be learned, and the pivotal view is not strengthened during the concatenation. Although some GNN models introduce the Siamese structure to extract the within-view information, the learned representation may not be informative since the Siamese GNNs share the same parameters. To overcome these issues, we propose a novel deep graph auto-encoder for multi-view representation learning. In this model, a self-augmented view-weight technique is theoretically devised for cross-view fusion, which can highlight the pivotal views and maintain the remaining views. Then, GNNs of different views can learn the informative representation without sharing parameters. Furthermore, by fitting the fusion distribution with a neural layer, the model unifies these two individual procedures and extracts the fusion representation end-to-end. Extensive experiments on clustering and recognition tasks demonstrate superior performance compared with numerous recently proposed methods.

PaperID: 507,

Authors: Yizhang Wang, Wei Pang, Di Wang, Witold Pedrycz

Affiliations: College of Information Engineering, Yangzhou University, Yangzhou, China; School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, U.K.; Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly, Nanyang Technological University, Jurong West, Singapore; Department of Measurement and Control Systems, Silesian University of Technology (SUT), Gliwice, Poland

Title: One-Shot Secure Federated K-Means Clustering Based on Density Cores

Abstract:
Federated clustering (FC) performs well in independent and identically distributed (IID) scenarios, but it does not perform well in non-IID scenarios. In addition, existing methods lack proof of strict privacy protection. To address the above issues, we propose a new secure federated k-means clustering framework to achieve better clustering results under privacy requirements. Specifically, for the clients, we use cluster centers (representative points) generated by k-means to represent the corresponding clusters. These representative points can effectively preserve the structure of the local data and they are encrypted by differential privacy. For the server, we propose two methods to reprocess the uploaded encrypted representative points to obtain better final cluster centers: one uses k-means, while the other considers the improved density peaks (density cores) as final centers and then sends them back to the clients. Finally, each client assigns its local data to the nearest centers. Experimental results show that the proposed methods perform better than several centralized (nonfederated) classical clustering algorithms [k-means, density-based spatial clustering of applications with noise (DBSCAN), and density peak clustering (DPC)] and state-of-the-art (SOTA) centralized clustering algorithms in most cases. In particular, the proposed algorithms perform better than the SOTA FC frameworks k-FED (ICML2021) and MUFC (ICLR2023).
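The one-shot client/server flow described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmeans`, `client_upload`, `server_aggregate`, `assign`), the use of Laplace noise as the privacy perturbation, and the `noise_scale` parameter are all assumptions made for illustration, and only the k-means server variant is shown (the density-core variant is omitted).

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points (copied as floats).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)


def client_upload(X_local, k_local, noise_scale, seed):
    """Client side: summarize local data by k-means centers, then perturb them.

    Laplace noise is an assumed stand-in for the paper's privacy mechanism.
    """
    rng = np.random.default_rng(seed)
    centers, _ = kmeans(X_local, k_local, seed=seed)
    return centers + rng.laplace(0.0, noise_scale, size=centers.shape)


def server_aggregate(uploads, k_global, seed=0):
    """Server side (k-means variant): cluster the pooled representative points."""
    pooled = np.vstack(uploads)
    centers, _ = kmeans(pooled, k_global, seed=seed)
    return centers


def assign(X_local, centers):
    """Client side: assign each local point to its nearest final center."""
    d = ((X_local[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)
```

Each client communicates with the server once, uploading only perturbed representative points rather than raw data, which is what makes the scheme one-shot in this sketch.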

PaperID: 508,

Authors: Yuan-Hung Kuan, Vignesh Narayanan, Jr-Shin Li

Affiliations: Department of Electrical and Systems Engineering, Washington University in St. Louis, Saint Louis, MO, USA; AI Institute, University of South Carolina, Columbia, SC, USA; Division of Computational & Data Sciences, Washington University in St. Louis, Saint Louis, MO, USA

Title: Iterative Reservoir Computing Networks for Reconstructing Irregular Time Series

Abstract:
Time series data with missing entries are ubiquitous in a broad spectrum of practical and clinical applications, from climatology and cell biology to personalized medicine. This undesired structure arising either due to undesired artifacts (e.g., noise) or by design (e.g., asynchronous or aperiodic sampling in distributed sensors) results in irregularity in the temporal dimension and forms a bottleneck in data mining. Although extensive data science approaches have been proposed to address learning problems involving irregular data, the emphasis was largely placed on filling in the missing entries via interpolation and binning, or the methods were tailored to specific data analytic tasks. In this article, we develop a reservoir computing (RC)-based iterative learning method for recovering missing data in irregular time series generated by dynamical systems and networks. In particular, we formulate this learning task as a fixed-point iterative learning problem and develop a training procedure using an RC network (RCN). We find that when the irregular time series has “sufficient” samples to train an RCN within a tolerant training error then the missing samples in the time series can be recovered systematically. We also derive sufficient conditions with respect to the choices of the reservoir parameters that guarantee the convergence of the iterative procedure. We present several numerical experiments to demonstrate the efficacy of the developed iterative RCN approach. Specifically, we illustrate the capability of our approach to recover missing data in irregular time series generated by chaotic Rössler and Kuramoto-Sivashinsky (KS) systems. Finally, we also report the results of incorporating our approach in an irregular medical data classification task.
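The idea of treating missing-data recovery as a fixed-point iteration around a reservoir network can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' procedure: the echo-state-style reservoir, the linear-interpolation initial guess, the ridge-regression readout, and every name and default value (`make_reservoir`, `run_reservoir`, `iterative_impute`, `n_res`, `ridge`, `tol`) are hypothetical choices for a scalar time series.

```python
import numpy as np


def make_reservoir(n_res, seed=0, spectral_radius=0.9):
    """Random recurrent weights rescaled to a chosen spectral radius."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    w_in = rng.uniform(-0.5, 0.5, size=n_res)
    return W, w_in


def run_reservoir(x, W, w_in):
    """Drive the reservoir with a scalar series; states[t] summarizes x[0..t]."""
    states = np.zeros((len(x), len(w_in)))
    r = np.zeros(len(w_in))
    for t, xt in enumerate(x):
        r = np.tanh(W @ r + w_in * xt)
        states[t] = r
    return states


def iterative_impute(x, observed, n_res=50, n_iter=20, ridge=1e-6, tol=1e-8):
    """Fill entries where observed is False by repeating train-then-predict.

    Values of x at unobserved positions are ignored (they may be NaN).
    """
    idx = np.arange(len(x))
    # Initial guess for the missing entries: linear interpolation.
    x_hat = np.interp(idx, idx[observed], x[observed])
    W, w_in = make_reservoir(n_res)
    miss_idx = np.flatnonzero(~observed[1:]) + 1  # missing positions with t >= 1
    for _ in range(n_iter):
        S = run_reservoir(x_hat, W, w_in)
        # Fit a ridge readout predicting x[t+1] from states[t],
        # using only targets that were actually observed.
        tgt = observed[1:]
        A, y = S[:-1][tgt], x_hat[1:][tgt]
        w_out = np.linalg.solve(A.T @ A + ridge * np.eye(n_res), A.T @ y)
        pred = S[:-1] @ w_out  # pred[t] estimates x[t+1]
        x_new = x_hat.copy()
        x_new[miss_idx] = pred[miss_idx - 1]  # update missing entries only
        delta = np.max(np.abs(x_new - x_hat))
        x_hat = x_new
        if delta < tol:  # the estimate has stopped changing: a fixed point
            break
    return x_hat
```

Observed samples are never modified; only the missing entries are revised from one pass to the next. The sketch does not check the convergence conditions that the abstract says are derived for the reservoir parameters.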

PaperID: 509,

Authors: Yachao Zhang, Yuxiang Lan, Yuan Xie, Cuihua Li, Yanyun Qu

Affiliations: School of Informatics, Xiamen University, Xiamen, China; School of Computer Science and Technology, East China Normal University, Shanghai, China

Title: Cross-Cloud Consistency for Weakly Supervised Point Cloud Semantic Segmentation

Abstract:
Weakly supervised point cloud semantic segmentation is an increasingly active topic, because fully supervised learning requires well-labeled point clouds and entails high costs. The existing weakly supervised methods either need meticulously designed data augmentation for self-supervised learning or ignore the negative effects of learning on pseudolabel noise. In this article, by designing cross-cloud structures of different granularity, we propose a cross-cloud consistency method for weakly supervised point cloud semantic segmentation, which forms an expectation-maximization (EM) framework. Benefiting from the cross-cloud constraints, our method allows effective learning that alternates between refining pseudolabels and updating network parameters. Specifically, in the E-step, we propose a pseudolabel selecting (PLS) strategy based on cross-subcloud consistency, explicitly improving the credibility of the selected pseudolabels. In the M-step, a cross-scene contrastive regularization enforces cross-scene prototypes with the same label in different scenes to be more similar, while keeping prototypes with different labels separated by a clear margin, reducing the fitting of noise. Finally, we give some insight into the optimization of our method from the EM theoretical perspective. The proposed method is evaluated on three challenging datasets, where experimental results demonstrate that our method significantly outperforms state-of-the-art weakly supervised competitors. Our code is available online: https://github.com/Yachao-Zhang/Cross-Cloud-Consistency.
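One simple way to picture pseudolabel selection by consistency is to keep a point's pseudolabel only when two predictions for that point agree. The sketch below makes that reading concrete under explicit assumptions: the agreement rule, the confidence threshold `min_conf`, and the function name `select_pseudolabels` are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def select_pseudolabels(probs_a, probs_b, min_conf=0.8):
    """Keep pseudolabels only where two subcloud predictions agree.

    probs_a, probs_b: (n_points, n_classes) class probabilities predicted for
    the same points when they appear in two different subclouds (assumed setup).
    Returns (labels, keep), with labels set to -1 for rejected points.
    """
    labels_a = probs_a.argmax(axis=1)
    labels_b = probs_b.argmax(axis=1)
    # Use the weaker of the two confidences as the point's confidence.
    conf = np.minimum(probs_a.max(axis=1), probs_b.max(axis=1))
    keep = (labels_a == labels_b) & (conf >= min_conf)
    return np.where(keep, labels_a, -1), keep
```

Rejected points (label -1) would simply be left out of the supervised loss in the following parameter update.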

PaperID: 510,

Authors: Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, Wei Zhao

Affiliations: Centre for Cyber Security and Privacy, School of Computer Science, University of Technology Sydney, Ultimo, NSW, Australia; Faculty of Data Science, City University of Macau, Macau, China; School of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China

Title: Toward Efficient Target-Level Machine Unlearning Based on Essential Graph

Abstract:
Machine unlearning is an emerging technology that has come to attract widespread attention. A number of factors, including regulations and laws, privacy, and usability concerns, have resulted in this need to allow a trained model to forget some of its training data. Existing studies of machine unlearning mainly focus on unlearning requests that forget a cluster of instances or all instances from one class. While these approaches are effective in removing instances, they do not scale to scenarios where partial targets within an instance need to be forgotten. For example, one would like to only unlearn a person from all instances that simultaneously contain the person and other targets. Directly migrating instance-level unlearning to target-level unlearning will reduce the performance of the model after the unlearning process, or fail to erase information completely. To address these concerns, we have proposed a more effective and efficient unlearning scheme that focuses on removing partial targets from the model, which we name “target unlearning.” Specifically, we first construct an essential graph data structure to describe the relationships between all important parameters that are selected based on the model explanation method. After that, we simultaneously filter parameters that are also important for the remaining targets and use the pruning-based unlearning method, which is a simple but effective solution to remove information about the target that needs to be forgotten. Experiments with different training models on various datasets demonstrate the effectiveness of the proposed approach.

PaperID: 511,

Authors: Liyuan Liu, Yaohui Chen, Weifu Li, Yingjie Wang, Bin Gu, Feng Zheng, Hong Chen

Affiliations: College of Informatics, Huazhong Agricultural University, Wuhan, China; College of Engineering, Huazhong Agricultural University, Wuhan, China; College of Control Science and Engineering, China University of Petroleum (East China), Qingdao, China; School of Artificial Intelligence, Jilin University, Changchun, China; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China

Title: Generalization Bounds of Deep Neural Networks With τ-Mixing Samples

Abstract:
Deep neural networks (DNNs) have shown an astonishing ability to unlock the complicated relationships among the inputs and their responses. Along with empirical successes, some approximation analysis of DNNs has also been provided to understand their generalization performance. However, the existing analysis depends heavily on the independent and identically distributed (i.i.d.) assumption on the observations, which may be too idealized and is often violated in real-world applications. To relax the i.i.d. assumption, this article develops a covering number-based concentration estimation to establish generalization bounds of DNNs with τ-mixing samples, where the dependency between samples is much more general and includes the α-mixing process as a special case. By assigning a specific parameter value to the τ-mixing process, our results are consistent with the existing convergence analysis under the i.i.d. case. Experiments on simulated data validate the theoretical findings.

PaperID: 512,

Authors: Ruiyuan Kang, Panos Liatsis

Affiliations: Directed Energy Research Center, Technology Innovation Institute, Abu Dhabi, United Arab Emirates; Department of Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates

Title: Physics-Driven Anomaly Detection and Correction for Spectroscopic Parameter Estimation

Abstract:
Machine learning (ML) techniques are popular in many parameter estimation tasks; however, they face challenges in the real-world deployment due to the lack of robustness to errors. ML estimators are not able to ascertain performance in the presence of noise, variations in the data distribution, and anomalies in the test samples. This work proposes a novel framework, surrogate-based physical error correction (SPEC), which addresses the unmet need for measurement reliability estimation and self-correction under process data uncertainty, by bringing together physics- and network-based optimization. The workings of SPEC are demonstrated using the paradigm of gas parameter estimation in the laser absorption spectroscopy (LAS). It operates in two modes, estimation and correction. During estimation, SPEC provides an initial state estimate, with estimation reliability being assessed by the physics-driven anomaly detection (PAD) module, which uses a hybrid error, combining a nondifferentiable reconstruction error, calculated through an ensemble network, and a differentiable feasibility error. When an estimate is flagged as unreliable, the correction mode is enabled. This network-based optimization algorithm delivers efficient and robust state correction by using a greedy ensemble search. SPEC’s performance is evaluated in a variety of experiments including outside-of-distribution and noisy data. Moreover, it offers reconfigurability through PAD configuration modification, eliminating the need for ML estimator retraining.

PaperID: 513,

Authors: Huiwei Lin, Shanshan Feng, Baoquan Zhang, Xutao Li, Yunming Ye

Affiliations: Department of Computer Science, Harbin Institute of Technology, Shenzhen, China; Centre for Frontier AI Research, Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore

Title: HPCR: Holistic Proxy-Based Contrastive Replay for Online Continual Learning

Abstract:
Online continual learning (OCL), aimed at developing a neural network that continuously learns new data from a single pass over an online data stream, generally suffers from catastrophic forgetting (CF). Existing replay-based methods alleviate forgetting by replaying partial old data in a proxy-based or contrastive-based replay manner, each with its own shortcomings. Our previous work proposes a novel replay-based method called proxy-based contrastive replay (PCR), which handles the shortcomings by achieving complementary advantages of both replay manners. In this work, we further conduct gradient and limitation analysis of PCR. The analysis results show that PCR still can be further improved in feature extraction, generalization, and anti-forgetting capabilities of the model. Hence, we developed a more advanced method named holistic PCR (HPCR). HPCR consists of three components, each tackling one of the limitations of PCR. The contrastive component conditionally incorporates anchor-to-sample pairs to PCR, improving the feature extraction ability. The second is a temperature component that decouples the temperature coefficient into two parts based on their gradient impacts and sets different values for them to enhance the generalization ability. The third is a distillation component that constrains the learning process with additional loss terms to improve the anti-forgetting ability. Experiments on four datasets consistently demonstrate the superiority of HPCR over various state-of-the-art methods.

PaperID: 514,

Authors: Mingjin Zhang, Ke Yue, Jie Guo, Qiming Zhang, Jing Zhang, Xinbo Gao

Affiliations: State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, Shaanxi, China; School of Computer Science, The University of Sydney, Sydney, NSW, Australia; School of Computer Science, Wuhan University, Wuhan, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: Computational Fluid Dynamic Network for Infrared Small Target Detection

Abstract:
Infrared small target detection (IRSTD) aims to identify and locate small targets amidst background noise. It is highly valuable in various practical application domains, such as maritime rescue and early warning systems deployed in challenging conditions such as harsh weather, low illumination, and long imaging distances. Different from existing works that either adopt well-designed backbone networks or devise specific modules to improve them from different aspects, in this article, we formulate the learning process of IRSTD from a novel perspective, i.e., the mechanism of pixel movement. Considering that the movement of pixels passing through the layers of the network for IRSTD can be analogized to the flow of particles in a fluid dynamic system, we propose a computational fluid dynamic network (CFD-Net) derived from computational fluid dynamics. Technically, we leverage the superiority of the unilateral difference equation with third-order accuracy and devise a unilateral differential residual structure as the backbone of CFD-Net. This design ensures that the pixel stream only flows in the forward direction. In addition, a switch-controlled multidirectional treatment tank (SMTT) is introduced to CFD-Net to dynamically guide the pixel stream to the appropriate path for different targets with varying shapes and orientations, facilitating learning robust target representation and improving detection performance. The proposed CFD-Net is evaluated on the IRSTD-1k and SIRST datasets and is found to outperform existing state-of-the-art (SOTA) methods.

PaperID: 515,

Authors: Zhenyu Chen, Lu Zhang, Ping Hu, Huchuan Lu, You He

Affiliations: School of Information and Communication Engineering, Dalian University of Technology, Dalian, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; Shenzhen International Graduate School, Tsinghua University, Shenzhen, China

Title: MaskTrack: Auto-Labeling and Stable Tracking for Video Object Segmentation

Abstract:
Video object segmentation (VOS) has witnessed notable progress due to the establishment of video training datasets and the introduction of diverse, innovative network architectures. However, video mask annotation is a highly intricate and labor-intensive task, as meticulous frame-by-frame comparisons are needed to ascertain the positions and identities of targets in the subsequent frames. Current VOS benchmarks often annotate only a few instances in each video to save costs, which, however, hinders the model’s understanding of the complete context of the video scenes. To simplify video annotation and achieve efficient dense labeling, we introduce a zero-shot auto-labeling strategy based on the segment anything model (SAM), enabling it to densely annotate video instances without access to any manual annotations. Moreover, although existing VOS methods demonstrate improving performance, segmenting long-term and complex video scenes remains challenging due to the difficulties in stably discriminating and tracking instance identities. To this end, we further introduce a new framework, MaskTrack, which excels in long-term VOS and also exhibits significant performance advantages in distinguishing instances in complex videos with densely packed similar objects. We conduct extensive experiments to demonstrate the effectiveness of the proposed method and show that without introducing image datasets for pretraining, it achieves excellent performance on both short-term (86.2% in YouTube-VOS val) and long-term (68.2% in LVOS val) VOS benchmarks. Our method also surprisingly demonstrates strong generalization ability and performs well in visual object tracking (VOT) (65.6% in VOTS2023) and referring VOS (RVOS) (65.2% in Ref YouTube VOS) challenges.

PaperID: 516,

Authors: Xiaosheng He, Fan Yang, Fayao Liu, Guosheng Lin

Affiliations: S-Lab, Nanyang Technological University (NTU), Jurong West, Singapore; Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore; School of Computer Science and Engineering, NTU, Jurong West, Singapore

Title: Few-Shot Image Generation via Style Adaptation and Content Preservation

Abstract:
Training a generative model with limited data (e.g., 10 images) is a very challenging task. Many works propose to fine-tune a pretrained GAN model. However, this can easily result in overfitting. In other words, they manage to adapt the style but fail to preserve the content, where style denotes the specific properties that define a domain while content denotes the domain-irrelevant information that represents diversity. Recent works try to maintain a predefined correspondence to preserve the content; however, the diversity is still insufficient, and the correspondence may affect style adaptation. In this work, we propose a paired image reconstruction approach for content preservation. We introduce an image translation module to GAN transferring, where the module teaches the generator to separate style and content, and the generator provides training data to the translation module in return. Qualitative and quantitative experiments show that our method consistently surpasses the state-of-the-art methods in a few-shot setting.

PaperID: 517,

Authors: Ya Zhang, Xiaoling Luo, Qian Sun, Yuchen Wang, Hong Qu, Zhang Yi

Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Computer Science and Engineering, Sichuan University of Science and Engineering, Yibin, China; College of Computer Science, Sichuan University, Chengdu, China

Title: Retina-Inspired Lightweight Spiking Convolutional Neural Network for Single-Image Dehazing

Abstract:
Suspended particles in a hazy medium absorb and scatter light, severely degrading imaging quality. Numerous single-image dehazing methods have been proposed to reconstruct clear images from hazy ones. However, most of them focus on increasing depth and width to improve dehazing performance, which incurs high computation and energy costs. To address this issue, we propose a lightweight spiking convolutional neural network (CNN) referred to as retina-inspired spiking CNN (RI-SCNN) for the reconstruction of hazy images. Unlike conventional dehazing techniques, our proposed network first simulates the hierarchical structure and cellular function of the retina and devises five network modules to efficiently encode and extract image features through ON and OFF roads. Furthermore, a linear reconstruction mechanism is introduced to integrate the outputs from different roads, adaptively preserving regions with optimal details and constructing a comprehensive visual representation. Finally, using the transformed atmospheric scattering formula, our network generates the dehazed image. Incorporating the microscale spiking mechanism of the brain, the entire network leverages discrete binary spike trains for information encoding and transmission, and is directly trained by spiking surrogate gradient learning on integrate-and-fire (IF) neurons. Experimental results demonstrate the superiority of the proposed RI-SCNN in terms of quantitative dehazing performance, qualitative visual effect, energy efficiency, and run speed. Given its lightweight architecture with ultralow computation and energy costs, the network is well suited for deployment in visual sensor hardware to improve overall performance.

PaperID: 518,

Authors: Youngjoon Yu, Yeonju Kim, Yong Man Ro

Affiliations: Image and Video Systems Laboratory, School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea

Title: Advancing Causal Intervention in Image Captioning With Causal Prompt

Abstract:
This article introduces a novel approach, called causal prompting network (CPNet), to enhance the causal intervention in the context of image captioning. By leveraging visual prompt engineering in the feature space, this method aims to achieve superior performance in causal intervention tasks. Since CPNet is highly flexible and adaptable, it can be incorporated into any existing causal intervention-based image captioning framework. Specifically, two types of visual prompts, the causal region of interest (RoI) prompt (CRP) and the causal matching prompt (CMP), are employed to refine the feature representations effectively. CRP is applied to the RoI feature of the object feature to enhance RoI features with deconfounded causal features. Meanwhile, CMP is used to strengthen the contextual representation of confounders linked to image captioning tasks. To evaluate the proposed CPNet’s effectiveness, an extensive range of experiments is conducted on the popular Microsoft Common Objects in Context (MS-COCO) and Flickr30k datasets, and the results are validated using the Karpathy split. Experimental results demonstrate that the proposed CPNet surpasses the performance of other state-of-the-art (SOTA) image captioning methods.

PaperID: 519,

Authors: Christopher Adnel, Islem Rekik

Affiliations: Imperial-X, and the Department of Computing, BASIRA Lab, Imperial College London, London, U.K.

Title: FALCON: Feature-Label Constrained Graph Net Collapse for Memory-Efficient GNNs

Abstract:
Graph neural networks (GNNs) have ushered in a new era of machine learning with interconnected datasets. While traditional neural networks can only be trained on independent samples, GNNs allow for the inclusion of intersample interactions in the training process. This gain, however, incurs additional memory cost, rendering most GNNs unscalable for real-world applications involving vast and complicated networks with tens of millions of nodes (e.g., social circles, web graphs, and brain graphs). This means that storing the graph in the main memory can be difficult, let alone training the GNN model with significantly less GPU memory. While much of the recent literature has focused on either mini-batching GNN methods or quantization, graph reduction methods remain scarce. Furthermore, present graph reduction approaches have several drawbacks. First, most graph reduction focuses only on the inference stage (e.g., condensation, pruning, and distillation) and requires full graph GNN training, which does not reduce training memory footprint. Second, many methods focus solely on the graph’s structural aspect, ignoring the initial population feature-label distribution, resulting in a skewed postreduction label distribution. Here, we propose a feature-label constrained graph net collapse (FALCON) to address these limitations. Our three core contributions lie in: 1) designing FALCON, a topology-aware graph reduction technique that preserves feature-label distribution by introducing a K-means clustering with a novel dimension-normalized Euclidean distance; 2) implementing FALCON with other state-of-the-art (SOTA) memory reduction methods (i.e., mini-batched GNN and quantization) for further memory reduction; and 3) extensive benchmarking and ablation studies against SOTA methods to evaluate FALCON memory reduction. Our comprehensive results show that FALCON can significantly collapse various public datasets (e.g., PPI and Flickr to as low as 34% of the total nodes) while keeping equal prediction quality across GNN models. Our FALCON code is available at https://github.com/basiralab/FALCON.

PaperID: 520,

Authors: Meng Zhang, Xing He, Tingwen Huang

Affiliations: Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing, China; Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China

Title: Matrix Neurodynamic Approaches for Rank Minimization: Finite/Fixed-Time Convergence Technique

Abstract:
This article presents two innovative matrix neurodynamic approaches (MNAs) designed to tackle the rank minimization problem. First, by introducing the matrix norm-normalized sign function, two variants of MNA are developed: finite-time converging MNA (FINt-MNA) and fixed-time converging MNA (FIXt-MNA). Then, the proposed approaches are shown to guarantee the existence and uniqueness of solutions, and based on Lyapunov analysis, it is demonstrated that FINt-MNA and FIXt-MNA converge to the optimal solution in finite time and fixed time, respectively. In addition, upper bounds on the settling time are determined using finite-time and fixed-time lemmas, with subsequent analysis examining the influence of tunable parameters on these bounds for the two approaches through the control variable method. Finally, numerical examples and an image completion experiment confirm the effectiveness and superiority of the proposed approaches compared with the existing MNA and two classical approaches.

PaperID: 521,

Authors: Xiaoli Tang, Han Yu

Affiliations: College of Computing and Data Science, Nanyang Technological University (NTU), Jurong West, Singapore

Title: A Cost-Aware Utility-Maximizing Bidding Strategy for Auction-Based Federated Learning

Abstract:
Auction-based federated learning (AFL) has emerged as an efficient and fair approach to incentivize data owners (DOs) to contribute to federated model training, garnering extensive interest. However, the important problem of helping data consumers (DCs) bid for DOs in competitive AFL settings remains open. Existing work simply assumes that the actual cost paid by a winning DC (i.e., the bid cost) is equal to the bid price offered by that DC itself. However, this assumption is inconsistent with the widely adopted generalized second-price (GSP) auction mechanism used in AFL, including in these existing works. Under a GSP auction, the winning DC does not pay its own proposed bid price. Instead, the bid cost for the winner is determined by the second-highest bid price among all participating DCs. To address this limitation, we propose a first-of-its-kind federated cost-aware bidding strategy (FedCA-Bidder) to help DCs maximize their utility under GSP auction-based federated learning (FL). It enables DCs to efficiently bid for DOs in competitive AFL markets, maximizing their utility and improving the resulting FL model accuracy. We first formulate the optimal bidding function under the GSP auction setting, and then demonstrate that it depends on utility estimation and market price modeling, which are interrelated. Based on this analysis, FedCA-Bidder jointly optimizes the two in a novel end-to-end framework, and then executes the proposed return on investment (ROI)-based method to determine the optimal bid price for each piece of the data resource. Through extensive experiments on six commonly adopted benchmark datasets, we show that FedCA-Bidder outperforms eight state-of-the-art methods, beating the best baseline by 4.39%, 4.56%, 1.33%, and 5.43% on average in terms of the total amount of data obtained, number of data samples per unit cost, total utility, and FL model accuracy, respectively.
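The gap between the bid price and the bid cost under a second-price rule can be shown with a small worked example. The sketch assumes a single data resource per auction and breaks ties by input order; the function names (`second_price_outcome`, `utility`) and the bidder identifiers are hypothetical, and this is not the FedCA-Bidder strategy itself.

```python
def second_price_outcome(bids):
    """Return (winner, cost) for a single-item second-price auction.

    bids maps each data consumer (DC) to its bid price. The highest bidder
    wins but pays the second-highest bid, not its own.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    cost = ranked[1][1]
    return winner, cost


def utility(value, bids, me):
    """Utility of DC `me`: value minus the actual cost if it wins, else zero."""
    winner, cost = second_price_outcome(bids)
    return value - cost if winner == me else 0.0


bids = {"dc_a": 5.0, "dc_b": 3.0, "dc_c": 4.0}
print(second_price_outcome(bids))  # ('dc_a', 4.0)
print(utility(6.0, bids, "dc_a"))  # 2.0
```

Here `dc_a` bids 5.0 but pays 4.0, so a strategy that treats the bid price as the cost would underestimate the winner's utility (1.0 instead of 2.0).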

PaperID: 522,

Authors: Pavel Sinha, Ioannis N. Psaromiligkos, Zeljko Zilic

Affiliations: Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada

Title: A Structurally Regularized CNN Architecture via Adaptive Subband Decomposition

Abstract:
We propose a generalized convolutional neural network (CNN) architecture that first decomposes the input signal into subbands by an adaptive filter bank structure, and then uses convolutional layers to extract features from each subband independently. Fully connected (FC) layers finally combine the extracted features to perform classification. The proposed architecture restrains each of the subband CNNs from learning using the entire input signal spectrum, resulting in structural regularization. Our proposed CNN architecture is fully compatible with the end-to-end learning mechanism of typical CNN architectures and learns the subband decomposition from the input dataset. We show that the proposed CNN architecture has attractive properties, such as robustness to input and weight-and-bias quantization noise, compared to regular full-band CNN architectures. Importantly, the proposed architecture significantly reduces computational costs, while maintaining state-of-the-art classification accuracy. Experiments on image classification tasks using the MNIST, CIFAR-10/100, Caltech-101, and ImageNet-2012 datasets show that the proposed architecture allows accuracy surpassing state-of-the-art results. On the ImageNet-2012 dataset, we achieved top-5 and top-1 validation set accuracy of 86.91% and 69.73%, respectively. Notably, the proposed architecture offers over 90% reduction in computation cost in the inference path and approximately 75% reduction in back-propagation (per iteration) with just a single-layer subband decomposition. With a two-layer subband decomposition, the computational gains are even more significant with comparable accuracy results to the single-layer decomposition.

PaperID: 523,

Authors: Songlin Fan, Wei Gao, Ge Li

Affiliations: School of Electronic and Computer Engineering, Peking University, Shenzhen, China

Title: Point-MPP: Point Cloud Self-Supervised Learning From Masked Position Prediction

Abstract:
Masked autoencoding has gained momentum for improving fine-tuning performance in many downstream tasks. However, it tends to focus on low-level reconstruction details, lacking high-level semantics and resulting in weak transfer capability. This article presents a novel jigsaw puzzle solver inspired by the idea that predicting the positions of disordered point cloud patches provides more semantic information, similar to how children learn by solving jigsaw puzzles. Our method adopts the mask-then-predict paradigm, erasing the positions of selected point patches rather than their contents. We first partition input point clouds into irregular patches and randomly erase the positions of some patches. Then, a Transformer-based model is used to learn high-level semantic features and regress the positions of the masked patches. This approach forces the model to focus on learning transfer-robust semantics while paying less attention to low-level details. To tie the predictions within the encoding space, we further introduce a consistency constraint on their latent representations to encourage the encoded features to contain more semantic cues. We demonstrate that a standard Transformer backbone with our pretraining scheme can capture discriminative point cloud semantic information. Furthermore, extensive experiments indicate that our method outperforms the previous best competitor across six popular downstream vision tasks, achieving new state-of-the-art performance. Codes will be available at https://git.openi.org.cn/OpenPointCloud/Point-MPP.

PaperID: 524,

Authors: Yue Zhao, Maoguo Gong, Mingyang Zhang, A. K. Qin, Fenlong Jiang, Jianzhao Li

Affiliations: School of Electronic Engineering, Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China; Department of Computing Technologies, Swinburne University of Technology, Hawthorn, Australia; School of Computer Science and Technology, Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China; Guangzhou Institute of Technology, Xidian University, Guangzhou, China

Title: SPCNet: Deep Self-Paced Curriculum Network Incorporated With Inductive Bias

Abstract:
The vulnerability to poor local optima and the memorization of noisy data limit the generalizability and reliability of massively parameterized convolutional neural networks (CNNs) on complex real-world data. Self-paced curriculum learning (SPCL), which models the easy-to-hard learning progression of human beings, is considered a promising remedy. Although numerous SPCL solutions have been explored, SPCL still confronts two main challenges when applied to deep networks. First, existing weighting schemes, which rely on various hand-designed regularizers and are independent of the learning objective, depend heavily on prior knowledge. Second, the alternative optimization strategy (AOS) entails a tedious iterative training procedure, so there is still no efficient framework that integrates the SPCL paradigm well with networks. This article delivers a novel insight that the attention mechanism allows for adaptive enhancement of the contribution of diverse instance information to gradient propagation. Accordingly, we propose a general-purpose deep SPCL paradigm that incorporates the preferences of an implicit regularizer for different samples into the network structure with inductive bias, which in turn is formalized as the self-paced curriculum network (SPCNet). Our proposal allows simultaneous online difficulty estimation, adaptive sample selection, and model updating in an end-to-end manner, which significantly facilitates the application of SPCL to deep networks. Experiments on image classification and scene classification tasks demonstrate that our approach surpasses the state-of-the-art schemes and obtains superior performance.
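
For context, the kind of regularizer-based weighting scheme that the abstract contrasts itself with can be sketched in a few lines. The example below shows the classic hard self-paced weighting rule, not SPCNet's attention-based mechanism; the function names are hypothetical.

```python
def self_paced_weights(losses, threshold):
    """Closed-form sample weights under the classic hard self-paced regularizer.

    Minimizing sum_i v_i * loss_i - threshold * sum_i v_i over v_i in [0, 1]
    gives v_i = 1 when loss_i < threshold and v_i = 0 otherwise.
    """
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def curriculum(losses, thresholds):
    """Weights for a growing sequence of thresholds (easy samples first)."""
    return [self_paced_weights(losses, t) for t in thresholds]
```

In this classic scheme the threshold schedule is set by hand and the weights are updated in alternation with the model, which is the prior-knowledge dependence and iterative cost that the abstract points to.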

PaperID: 525,

Authors: Zhi Jin, Chenxi Wang, Xing Luo

Affiliations: School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong, China; Department of Frontier Research, Peng Cheng Laboratory, Shenzhen, Guangdong, China

Title: Colorization-Inspired Customized Low-Light Image Enhancement by a Decoupled Network

Abstract:
Recently, numerous inspirational approaches have been proposed to enhance the visual quality of images captured under poor lighting conditions. Simultaneously, in order to accommodate diverse user esthetics, researchers have explored customized operations within the enhancement process. However, most existing studies ignore the significance of the chrominance component, which often leads to unsatisfactory results in terms of color. To address this issue, we decompose the low-light image enhancement (LLIE) task into brightening and colorization subtasks and develop a decoupled network called CCNet for colorization-inspired customized enhancement. Specifically, the brightening subtask aims to restore images with normal contrast, less noise, and sharper details, while the colorization subtask utilizes the chrominance information from low-light images as color guidance to predict rich chrominance in enhanced images. Then, in the inference stage, users can adjust the color style or the saturation of the color guidance to obtain customized results. Extensive experiments demonstrate that our proposed method achieves superior performance in both general and customized LLIE tasks, particularly in terms of improving chrominance components. Code is available at: https://github.com/FVL2020/CCNet.

PaperID: 526,

Authors: Yuqiang Li, Wenli Du, Xinjie Wang, Minglei Yang, Yunmeng Zhao

Affiliations: Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China; Key Laboratory of Smart Manufacturing in Energy Chemical Process and the Engineering Research Center of Process System Engineering, Ministry of Education, East China University of Science and Technology, Shanghai, China; School of Information Science and Technology, Hangzhou Normal University, Hangzhou, China

Title: Adaptive Robust Stochastic Configuration Networks for Near-Infrared Multivariate Analysis

Abstract:
Near-infrared (NIR) technology has gained wide acceptance in practical processes and is now the measurement of choice in many sectors. However, with increasing spectral dimensionality, it is challenging to establish a prediction model with satisfactory stability and generalization. Stochastic configuration networks (SCNs) based on a supervisory learning mechanism have demonstrated significant advantages in developing nonlinear learners. However, existing incremental learning strategies make it difficult to achieve fast convergence while obtaining a suitable-scale network in high-dimensional spectra modeling. In addition, the linear or regularization weight estimation methods are vulnerable to outliers and noise in NIR analysis. To accelerate model construction and improve model performance in high-dimensional spectra analysis, the adaptive robust SCN (AR-SCN) algorithm is proposed in this work, which can perform adaptive incremental learning according to the prediction residual and robustly estimate the output weights by the global-local shrinkage strategy. Comparison results on three benchmark NIR datasets and a real-world gasoline blending process verify the effectiveness of the proposed method. Compared with the state-of-the-art SCNs, the AR-SCN method can simultaneously improve the construction efficiency and robustness of SCNs.

PaperID: 527,

Authors: Baoshun Shi, Dan Li

Affiliations: School of Information Science and Engineering and Hebei Key Laboratory of Information Transmission and Signal Processing, Yanshan University, Qinhuangdao, Hebei, China; School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China

Title: Provably Bounded Dynamic Sparsifying Transform Network for Compressive Imaging

Abstract:
Compressive imaging (CI) aims to recover the underlying image from under-sampled observations. Recently, deep unfolded CI (DUCI) algorithms, which unfold iterative algorithms into deep neural networks (DNNs), have achieved remarkable results. Theoretically, unfolding a convergent iterative algorithm could ensure a stable DUCI algorithm, i.e., one whose performance improves as the number of stages increases. However, ensuring convergence often involves imposing constraints, such as bounded spectral norm or tight property, on the filter weights or sparsifying transform. Unfortunately, these constraints may compromise algorithm performance. To address this challenge, we present a provably bounded dynamic sparsifying transform network (BSTNet), which can be explicitly proven to be a bounded network without imposing constraints on the analysis sparsifying transform. Leveraging this advantage, the analysis sparsifying transform can be adaptively generated via a trainable DNN. Specifically, we elaborate a dynamic sparsifying transform generator capable of extracting multiple feature information from input instances, facilitating the creation of a faithful content-adaptive sparsifying transform. We explicitly demonstrate that the proposed BSTNet is a bounded network, and further embed it as the prior network into a DUCI framework to evaluate its performance on two CI tasks, i.e., spectral snapshot CI (SCI) and compressed sensing magnetic resonance imaging (CSMRI). Experimental results showcase that our DUCI algorithms can achieve competitive recovery quality compared to benchmark algorithms. Theoretically, we explicitly prove that the proposed BSTNet is bounded, and we provide a comprehensive theoretical convergence analysis of the proposed iterative algorithms.

PaperID: 528,

Authors: Xinxin Wang, Yongshan Zhang, Yicong Zhou

Affiliations: Department of Computer and Information Science, University of Macau, Macau, China; School of Computer Science, China University of Geosciences, Wuhan, China

Title: Pseudo-Supervision Affinity Propagation for Efficient and Scalable Multiview Clustering

Abstract:
Anchor graph-based multiview clustering (AGMVC) demonstrates high efficiency and satisfactory performance. However, it still suffers from limitations such as single-structure similarity measurement, high time expenditure for large-scale anchor graph partitioning, and limited generalization ability. To alleviate the instability problem of single-structure information, this article proposes an anchor graph construction method that learns local and global (LG) structures simultaneously. To eliminate the need for graph partitioning and address the out-of-sample problem, we develop a landmark learning method to produce structural anchors, and further propose a pseudo-supervision affinity propagation (PSAP) framework. This framework jointly optimizes graph construction and landmark learning to disentangle the in-cluster distribution between samples and anchors while accelerating convergence. In addition, our framework introduces a clustering inference partition (CIP) strategy to directly output clustering results without the need for time-consuming postprocessing. Extensive experiments validate the efficiency and effectiveness of our framework. Our code is publicly available at https://github.com/W-Xinxin/PSAP.

PaperID: 529,

Authors: Zecheng Tang, Xiaolong Wu, Honggui Han

Affiliations: School of Information Science and Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Engineering Research Center of Digital Community, Ministry of Education, Beijing Artificial Intelligence Institute and Beijing Laboratory for Smart Environmental Protection, Beijing University of Technology, Beijing, China

Title: Robust Reconstructed Neural Network Based on Spectral Elastic Activation

Abstract:
The presence of sparse samples poses a formidable challenge for neural networks (NNs) in shaping representative patterns due to the limited coverage of concentrative activation range in NNs. To address this issue, a robust reconstructed NN, based on spectral elastic activation (SEA-RRNN), is developed in this article. Primarily, a spectral elastic activation (SEA) is designed to broaden the original activation of NNs, which embeds a spectral increment scaled by the estimated outlier degree on the activation boundary. It enables SEA-RRNN to stretch the boundary of SEA to cover the features of sparse samples. Then, an adaptive robust gradient descent (ARGD) algorithm is introduced to update the parameters of SEA. By tuning the loss between error and correntropy with the estimated outlier degree, the ARGD algorithm establishes two loss functions with complementary distance sensitivity for different parameters of SEA, which alleviates the construction conflict of robust center and precise boundary in SEA-RRNN. Furthermore, the theoretical analysis of SEA-RRNN is provided to validate its convergence and robustness. Finally, the experimental results demonstrate that SEA-RRNN exhibits superior robustness compared to other NN models.

PaperID: 530,

Authors: Takashi Matsubara, Takehiro Aoshima, Ai Ishikawa, Takaharu Yaguchi

Affiliations: Faculty of Information Science and Technology, Hokkaido University, Hokkaido, Japan; Graduate School of Engineering Science, Osaka University, Osaka, Japan; Central Research Institute of Electric Power Industry, Tokyo, Japan; Graduate School of Science, Kobe University, Kobe, Japan

Title: Deep Energy-Based Discrete-Time Physical Model for Reproducing Energetic Behavior

Abstract:
Modeling and simulating physical phenomena, especially those governed by partial differential equations (PDEs), pose significant challenges in computational physics and scientific machine learning. While neural network approaches have made strides in learning continuous-time dynamics, they have struggled with discrete-time scenarios and often fail to adhere to fundamental laws of physics, such as the conservation of energy and mass. This study addresses this gap by introducing a novel deep energy-based discrete-time model. In the real world, energy-based modeling theories like Hamiltonian mechanics and the Landau theory are pivotal, as they support various laws of physics. By integrating differential geometric structures into neural networks as coefficient matrices, our model successfully simulates the conservation and dissipation laws of energy and mass. Furthermore, we propose an automatic discrete differentiation algorithm, which enables neural networks to utilize the discrete gradient method, ensuring adherence to these laws in discrete-time settings. This capability also facilitates the identification of such laws directly from data by learning matrices that represent geometric structures. These advantages are verified using simulation results of physical phenomena, namely the 1- and 2-D Korteweg–de Vries (KdV) equation and the Cahn–Hilliard equation.
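
The role of the discrete gradient method can be seen on a toy system. The sketch below is a hand-derived illustration, not the paper's neural model or its automatic discrete differentiation algorithm: for the harmonic oscillator with energy H(q, p) = (q^2 + p^2) / 2, the midpoint value is a valid discrete gradient, and the resulting implicit update can be solved in closed form. Function names are hypothetical.

```python
def energy(q, p):
    """Energy of the unit harmonic oscillator, H(q, p) = (q^2 + p^2) / 2."""
    return 0.5 * (q * q + p * p)


def discrete_gradient_step(q, p, h):
    """One step of the discrete gradient scheme for dq/dt = p, dp/dt = -q.

    The scheme is (q' - q) / h = (p + p') / 2 and (p' - p) / h = -(q + q') / 2.
    Solving this 2x2 linear system for (q', p') gives the formulas below,
    which keep q^2 + p^2 unchanged in exact arithmetic.
    """
    a = 0.5 * h
    denom = 1.0 + a * a
    q_next = ((1.0 - a * a) * q + 2.0 * a * p) / denom
    p_next = ((1.0 - a * a) * p - 2.0 * a * q) / denom
    return q_next, p_next


def explicit_euler_step(q, p, h):
    """Explicit Euler for comparison; it scales the energy by (1 + h^2) per step."""
    return q + h * p, p - h * q


def simulate(step, q, p, h, num_steps):
    """Apply a one-step integrator repeatedly and return the final state."""
    for _ in range(num_steps):
        q, p = step(q, p, h)
    return q, p
```

Running both integrators from the same initial state shows the contrast: the discrete gradient scheme holds the energy fixed up to rounding, while the explicit Euler energy grows without bound.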

PaperID: 531,

Authors: Jin-Rong Zhang, Wujun Wen, Shen-lan Liu, Gao Huang, Yun-Heng Li, Qifeng Li, Lin Feng

Affiliations: School of Control Science and Engineering, Dalian University of Technology, Dalian, China; School of Computer Science and Technology, Dalian University of Technology, Dalian, China; School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, China; Department of Automation, Tsinghua University, Beijing, China

Title: End-to-End Streaming Video Temporal Action Segmentation With Reinforcement Learning

Abstract:
The streaming temporal action segmentation (STAS) task, a supplementary task of temporal action segmentation (TAS), has not received adequate attention in the field of video understanding. Existing TAS methods are constrained to offline scenarios due to their heavy reliance on multimodal features and complete contextual information. The STAS task requires the model to classify each frame of the entire untrimmed video sequence clip by clip in time, thereby extending the applicability of TAS methods to online scenarios. However, directly applying existing TAS methods to STAS tasks results in significantly poor segmentation outcomes. In this article, we thoroughly analyze the fundamental differences between STAS tasks and TAS tasks, attributing the severe performance degradation when transferring models to model bias and optimization dilemmas. We introduce an end-to-end streaming video TAS model with reinforcement learning (SVTAS-RL). The end-to-end modeling method mitigates the modeling bias introduced by the change in task nature and enhances the feasibility of online solutions. Reinforcement learning (RL) is utilized to alleviate the optimization dilemma. Through extensive experiments, the SVTAS-RL model significantly outperforms existing STAS models and achieves competitive performance to the state-of-the-art (SOTA) TAS model on multiple datasets under the same evaluation criteria, demonstrating notable advantages on the ultralong video dataset EGTEA. Our code is publicly available at https://github.com/Thinksky5124/SVTAS.

PaperID: 532,

Authors: Min Li, Yunlong Zhao, Qizhen Wang, Hanlin Gao, Gang Wang

Affiliations: School of Mechanical Engineering, Southwest Jiaotong University, Chengdu, China; School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China; National Key Laboratory of Wireless Communications, University of Electronic Science and Technology of China, Chengdu, China

Title: RBF NN-Enabled Adaptive Filter for Any Type of Noise

Abstract:
This brief proposes a radial basis function (RBF) neural network (NN)-enabled adaptive filter (AF) algorithm, which consists of two stages. The first stage is a data-driven (DD) preprocessing part, in which the RBF NN fits the probability density function (pdf) of the noise. The second stage is a model-driven filtering part, in which the RBF NN works as the cost function of the adaptive filter, and an adaptive gradient ascent algorithm is obtained by maximizing the RBF NN. Since the RBF NN can fit any pdf of the noise, the proposed algorithm can work well in Gaussian, sub-Gaussian or light-tailed (uniform), and super-Gaussian or heavy-tailed (multipeak, pulse, and skewness) noises. Theoretical analysis shows the mean-value stability and mean square performance. Simulations verify the effectiveness of the proposed algorithm.
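
The second (filtering) stage can be sketched as stochastic gradient ascent on an RBF-shaped cost of the error. This is a minimal sketch under stated assumptions: the RBF centers, weights, and width are taken as already fitted by the first stage (which is not implemented here), Gaussian kernels are assumed, and the names `rbf_cost_derivative` and `adaptive_filter` are hypothetical.

```python
import math
import random


def rbf_cost_derivative(e, centers, weights, width):
    """Derivative of J(e) = sum_k weights[k] * exp(-(e - centers[k])^2 / (2 width^2))."""
    total = 0.0
    for c, w in zip(centers, weights):
        diff = e - c
        kernel = math.exp(-diff * diff / (2.0 * width * width))
        total += w * kernel * (-diff / (width * width))
    return total


def adaptive_filter(inputs, desired, num_taps, step, centers, weights, width):
    """Adapt FIR taps by stochastic gradient ascent on the RBF cost J(e).

    With e = d - taps . x, the chain rule gives dJ/dtaps = J'(e) * (-x),
    so the ascent update is taps <- taps - step * J'(e) * x.
    """
    taps = [0.0] * num_taps
    for x, d in zip(inputs, desired):
        y = sum(t * xi for t, xi in zip(taps, x))
        e = d - y
        g = rbf_cost_derivative(e, centers, weights, width)
        taps = [t - step * g * xi for t, xi in zip(taps, x)]
    return taps
```

With a single kernel centered at zero, the update reduces to a correntropy-style rule that behaves like least mean squares for small errors and down-weights large ones.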

PaperID: 533,

Authors: Qian Cui, Gang Feng, Xuesong Xu

Affiliations: School of Automation, Guangdong University of Technology, Guangzhou, China; Department of Biomedical Engineering, City University of Hong Kong, Hong Kong, China; Department of Frontier Interdisciplinary, Hunan University of Technology and Business, Changsha, China

Title: Q-Learning-Based Robust Control for Nonlinear Systems With Mismatched Perturbations

Abstract:
This brief presents a novel optimal control (OC) approach based on $\mathcal{Q}$-learning to address robust control challenges for uncertain nonlinear systems subject to mismatched perturbations. Unlike conventional methodologies that solve the robust control problem directly, our approach reformulates the problem by minimizing a value function that integrates perturbation information. The $\mathcal{Q}$-function is subsequently constructed by coupling the optimal value function with the Hamiltonian function. To estimate the parameters of the $\mathcal{Q}$-function, an integral reinforcement learning (IRL) technique is employed to develop a critic neural network (NN). Leveraging this parameterized $\mathcal{Q}$-function, we derive a model-free OC solution that generalizes the model-based formulation. Furthermore, using Lyapunov’s direct method, the resulting closed-loop system is guaranteed to be uniformly ultimately bounded. A case study is presented to showcase the effectiveness and applicability of the proposed approach.

PaperID: 534,

Authors: Changtian Ying, Qi Li, Chen Wang, Donghua Yu

Affiliations: Department of Computer Science and Engineering, Shaoxing University, Shaoxing, Zhejiang, China; School of Computer Science, Chongqing University, Chongqing, China

Title: DNRHP: Temporal Network Representation Learning via Hawkes Point Process

Abstract:
Graph neural networks (GNNs) have significantly advanced our ability to mine structured data, playing a central role in areas such as social networks and recommendation systems. However, while most GNN-based methods focus on learning node representations in static graphs, they often ignore the dynamic nature of real-world networks, limiting their applicability. Furthermore, existing dynamic representation learning methods using Hawkes point processes, while capable of modeling event sequences, are inherently transductive and tailored to specific scenarios with dual timescales and mixed event types, thus not fully generalizable. To bridge this gap, we introduce DNRHP, a novel framework for learning temporal network representations. Specifically, DNRHP integrates historical edge (HE) information with the network’s evolutionary properties, using the Hawkes point process to model edge formation. It captures not only the influence of past events on the likelihood of future connections but also the impact of the structural evolution of the network. The novelty of our model lies in its comprehensive consideration of the dynamics of network evolution and historical connectivity, allowing for a more accurate representation of nodes and their interactions over time. Extensive experiments on diverse real-world networks demonstrate the effectiveness of DNRHP, outperforming state-of-the-art baselines in terms of accuracy and efficiency for tasks such as node classification and link prediction.
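
The way a Hawkes point process lets past events raise the likelihood of future ones can be sketched with the standard conditional intensity. The exponential decay kernel and the parameter names below are assumptions for illustration; the abstract does not state which kernel DNRHP uses or how it conditions on node embeddings.

```python
import math


def hawkes_intensity(t, history, base_rate, alpha, beta):
    """Conditional intensity of a univariate Hawkes process at time t.

    lambda(t) = base_rate + sum over past events t_i < t of
                alpha * exp(-beta * (t - t_i)).
    Each past event adds an excitation of size alpha that decays at rate beta.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in history if t_i < t
    )
    return base_rate + excitation
```

In a temporal-network setting, `history` would hold the timestamps of earlier edges relevant to a node pair, so a recent burst of interactions yields a higher predicted rate of new edges.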

PaperID: 535,

Authors: Tharindu Fernando, Darshana Priyasad, Sridha Sridharan, Clinton Fookes

Affiliations: Signal Processing, Artificial Intelligence and Vision Technologies, Queensland University of Technology, Brisbane, QLD, Australia

Title: Decoupled and Explainable Associative Memory for Effective Knowledge Propagation

Abstract:
Long-term memory often plays a pivotal role in human cognition through the analysis of contextual information. Machine learning researchers have attempted to emulate this process through the development of memory-augmented neural networks (MANNs) to leverage indirectly related but resourceful historical observations during learning and inference. The area of MANN, however, is still in its infancy and significant research effort is required to enable machines to achieve performance close to the human cognition process. This article presents an innovative MANN framework for the advanced incorporation of historical knowledge into a predictive framework. Within the key-value memory structure, we propose to decouple the key representations from the learned value memory embeddings to offer improved associations between the inputs and latent memory embeddings. We argue that the keys should be static, sparse, and unique representations of a particular observation to offer robust input to memory associations, while the value embeddings could be trainable, dense latent vectors such that they can better capture historical knowledge. Moreover, we introduce a novel memory update procedure that preserves the explainability of the historical knowledge extraction process, which would enable the human end-users to interpret the deep machine learning model decisions, fostering their trust. With extensive experiments conducted on three different datasets using audio, text, and image modalities, we demonstrate that our proposed innovations collectively allow this framework to outperform the current state-of-the-art methods by significant margins, irrespective of the modalities or the downstream tasks. The code is available at https://github.com/tha725/DE-KVMN/tree/main.

PaperID: 536,

Authors: Yanguang Sun, Hanyu Xuan, Jian Yang, Lei Luo

Affiliations: PCA Laboratory, Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; School of Big Data and Statistics, Anhui University, Hefei, China

Title: GLCONet: Learning Multisource Perception Representation for Camouflaged Object Detection

Abstract:
Recently, biological perception has been a powerful tool for handling the camouflaged object detection (COD) task. However, most existing methods are heavily dependent on the local spatial information of diverse scales from convolutional operations to optimize initial features. A commonly neglected point in these methods is the long-range dependencies between feature pixels from different scale spaces that can help the model build a global structure of the object, inducing a more precise image representation. In this article, we propose a novel global-local collaborative optimization network called GLCONet. Technically, we first design a collaborative optimization strategy (COS) from the perspective of multisource perception to simultaneously model the local details and global long-range relationships, which can provide features with abundant discriminative information to boost the accuracy in detecting camouflaged objects. Furthermore, we introduce an adjacent reverse decoder (ARD) that contains cross-layer aggregation and reverse optimization to integrate complementary information from different levels for generating high-quality representations. Extensive experiments demonstrate that the proposed GLCONet method with different backbones can effectively activate potentially significant pixels in an image, outperforming 20 state-of-the-art (SOTA) methods on three public COD datasets. The source code is available at: https://github.com/CSYSI/GLCONet.

PaperID: 537,

Authors: Lei Ding, Can Chen, Maojiao Ye, Qing-Long Han

Affiliations: Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing, China; School of Automation, Nanjing University of Science and Technology, Nanjing, China; School of Science, Computing and Engineering Technologies, Swinburne University of Technology, Melbourne, VIC, Australia

Title: Fully Distributed Nash Equilibrium Seeking: A Double-Layer Adaptive Approach

Abstract:
This article is concerned with fully distributed Nash equilibrium seeking in networked games under both undirected and directed communication graphs. New fully distributed Nash equilibrium seeking strategies incorporating gradient-based optimization algorithms, consensus algorithms, and double-layer adaptive control laws are presented. In particular, the double-layer adaptive control laws are introduced to ensure that the control gains do not become excessively large and do not depend on any global information. This is achieved by adding a damping term to the adaptive parameter design such that the continuous increase in control gains is avoided. Theoretical analyses are conducted to prove that players’ actions converge to the Nash equilibrium under the proposed strategies. Moreover, it is shown that the developed strategies can be extended to accommodate players with heterogeneous linear dynamics. Finally, numerical examples are provided to illustrate the effectiveness of the proposed methods.

PaperID: 538,

Authors: Chao Li, Shaokang Dong, Shangdong Yang, Yujing Hu, Tianyu Ding, Wenbin Li, Yang Gao

Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China; NetEase Fuxi AI Lab, NetEase Inc., Hangzhou, China; Applied Sciences Group, Microsoft Corporation, Redmond, WA, USA

Title: Multi-Task Multi-Agent Reinforcement Learning With Interaction and Task Representations

Abstract:
Multi-task multi-agent reinforcement learning (MT-MARL) is capable of leveraging useful knowledge across multiple related tasks to improve performance on any single task. While recent studies have tentatively achieved this by learning independent policies on a shared representation space, we pinpoint that further advancements can be realized by explicitly characterizing agent interactions within these multi-agent tasks and identifying task relations for selective reuse. To this end, this article proposes Representing Interactions and Tasks (RIT), a novel MT-MARL algorithm that characterizes both intra-task agent interactions and inter-task relations. Specifically, for characterizing agent interactions, RIT presents the interactive value decomposition to explicitly incorporate the dependency among agents into policy learning. Theoretical analysis demonstrates that the learned utility value of each agent approximates its Shapley value, thus representing agent interactions. Moreover, we learn task representations based on per-agent local trajectories, which assess task similarities and accordingly identify task relations. As a result, RIT facilitates the effective transfer of interaction knowledge across similar multi-agent tasks. Structurally, RIT develops a universal policy structure for scalable multi-task policy learning. We evaluate RIT against multiple state-of-the-art baselines in various cooperative tasks, and its significant performance under both multi-task and zero-shot settings demonstrates its effectiveness.
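
Since the abstract states that each agent's learned utility approximates its Shapley value, a small worked example of the Shapley value itself may help. The sketch below computes exact Shapley values by averaging marginal contributions over all orderings of a toy cooperative game; it is a textbook definition, not RIT's value decomposition, and the game and names are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions.

    `value` maps a frozenset of players to the payoff of that coalition.
    The cost grows factorially, so this is only practical for a few players.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_game(coalition):
    """Payoff 1 only when agents 'a' and 'b' cooperate; 'c' adds nothing."""
    return 1.0 if {"a", "b"} <= coalition else 0.0
```

For `toy_game`, agents `a` and `b` each receive 0.5 and agent `c` receives 0, which reflects that the payoff depends on the interaction between `a` and `b`.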

PaperID: 539,

Authors: Kunlun Wu, Bo Peng, Donghai Zhai

Affiliations: School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan, China

Title: Boundary-Aware Axial Attention Network for High-Quality Pavement Crack Detection

Abstract:
Pavement crack detection is a practical and challenging task that has the ability to significantly reduce the burden of manual building and road maintenance in intelligent transportation systems. Existing methods mainly focus on addressing common crack diseases and generalize poorly to other crack detection conditions due to diverse environmental factors (e.g., illumination), topology complexity, and intensity inhomogeneity. Moreover, the samples suffer from severe foreground-background imbalance and the model is prone to overfitting on trained anomalies, resulting in unsatisfactory performance. To tackle the aforementioned challenges and achieve high-quality pavement crack detection, we propose an innovative approach termed boundary-aware axial attention network (BAAN), which is composed of multiple position-guided axial attention (PAA) modules in a hierarchical encoder-decoder architecture. Specifically, it learns efficient contextual information via decomposed multidimensional position-guided attention to capture more precise spatial structures, and the proposed boundary regularization module (BRM) mines more discriminative foreground-background relationships to regularize the ambiguous details between diverse spatial regions. Moreover, we propose a novel boundary refinement loss (BRL) to alleviate the challenges associated with regional losses (e.g., pixel-wise cross-entropy loss) in the context of heavily imbalanced crack detection problems. The proposed BAAN is evaluated on four crack datasets and experimental results indicate that the BAAN consistently outperforms the state-of-the-art methods with fewer computational requirements.

PaperID: 540,

Authors: Xi Yang, Wenjiao Dong, De Cheng, Nannan Wang, Xinbo Gao

Affiliations: State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University, Xi’an, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: TIENet: A Tri-Interaction Enhancement Network for Multimodal Person Reidentification

Abstract:
Multimodal person reidentification (ReID), which aims to learn modality-complementary information by utilizing multimodal images simultaneously for person retrieval, is crucial for achieving all-time and all-weather monitoring. Existing methods try to address this issue through modality fusion to absorb complementary information. However, most of these methods are limited to the spatial domain only and usually overlook the intra-/intermodal interactions during feature fusion, resulting in insufficient learning of modality-specific and complementary information. To address these issues, we propose a tri-interaction enhancement network (TIENet), which contains three modules: spatial-frequency interaction (SFI), intermodal mask interaction (IMMI), and intramodal feature fusion (IMFF). Specifically, the SFI boosts the modality-specific representation by integrating the amplitude-guided attention mechanism into the phase space, combined with spatial-domain convolution to achieve fine-grained information learning. Meanwhile, the IMMI enhances the richness of the feature descriptors by embedding the intermodal relationships to preserve complementary information. Finally, the IMFF module considers the structure of the human body and integrates intramodal contextual information. Extensive experimental results demonstrate the effectiveness of our method, achieving superior performances on RGBNT201 and MARKET1501_RGBNT datasets.

PaperID: 541,

Authors: Hongsong Tang, Yingzhuo Liu, Letian Ni, Liuyu Xiang, Yaodong Yang, Ke Bi, Zhaofeng He

Affiliations: School of Science, Beijing University of Posts and Telecommunications, Beijing, China; School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Institute for Artificial Intelligence, Peking University, Beijing, China

Title: Distributed Policy Space Response Oracles in Two-Player Zero-Sum Games

Abstract:
Policy space response oracle (PSRO) is a population-based algorithm that can be used to solve two-player zero-sum games. In the PSRO solution framework, optimizing policy diversity is crucial for addressing nontransitive game problems, helping the agent population avoid exploitation by unfamiliar opponents. In addition, while deep reinforcement learning is highly effective in solving complex game environments, its integration with PSRO remains fragmented and lacking in effective coordination. In this study, we propose distributed PSRO to efficiently solve complex game scenarios. To enhance diversity while managing optimization costs, we introduce TOP-K truncation, which prioritizes high-quality opponents and limits the size of the policy pool during sampling. This approach not only reduces interference from less effective strategies but also ensures computational efficiency by seamlessly integrating with our distributed training framework. We also design the distributed training framework to incorporate diversity estimation directly into the sampling process, achieving diversity optimization without incurring additional computational overhead. Furthermore, we introduce the opponent first (OF) method, which enhances decision-making by leveraging opponent information during interaction sampling. We perform experimental validation using a nontransitive mixture model and AlphaStar888 to confirm the effectiveness of the TOP-K truncation approach. Finally, we demonstrate the feasibility and efficiency of the distributed training framework and the OF approach in a Google Research Football 11 versus 11 scenario.
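
Illustrative sketch (not taken from the paper): the abstract describes TOP-K truncation as prioritizing high-quality opponents and limiting the size of the policy pool during sampling. A minimal reading of that idea is shown below. The function names, the use of a positive scalar quality score per policy, and the score-proportional sampling rule are all assumptions made for illustration; the authors' actual criterion and sampling distribution may differ.

```python
import random


def top_k_truncate(pool, k):
    """Keep the k highest-scoring policies.

    pool maps a policy name to a positive quality score
    (hypothetical; e.g. an estimated win rate).
    """
    ranked = sorted(pool.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


def sample_opponent(pool, k, rng):
    """Sample one opponent from the truncated pool, proportionally to its score."""
    kept = top_k_truncate(pool, k)
    names = list(kept)
    weights = [kept[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    pool = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
    rng = random.Random(0)
    print(top_k_truncate(pool, 2))
    print(sample_opponent(pool, 2, rng))
```

Truncating before sampling means low-scoring policies are never drawn, which is one way to bound the per-iteration sampling cost by k rather than by the full pool size.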

PaperID: 542,

Authors: Nana Yu, Jie Wang, Zihao Zhang, Yahong Han, Weiping Ding

Affiliations: College of Intelligence and Computing, Tianjin University, Tianjin, China; School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China

Title: Semantic Prompt Enhancement for Semi-Supervised Low-Light Salient Object Detection

Abstract:
Most existing salient object detection (SOD) models are designed based on data collected in well-lit scenes, which is entirely inadequate for low-light conditions. Although recent models are designed for low-light conditions, they still have limitations. First, they simply integrate features without considering the impact of low-light scenes and fail to enhance the contextual information around salient objects. Second, in extremely dark scenes, it is difficult for the human eye to distinguish between the foreground and background, posing significant challenges for data labeling. To address these issues, we design a brightness Retinex enhancer (BRE) tailored for low-light SOD tasks and, for the first time, explore performing low-light SOD within a semi-supervised framework. By using sparse labeled semantic prompts to augment a large amount of unlabeled data, we mitigate the annotation burden while avoiding ineffective labeling in low-light conditions. More specifically, we first use Retinex decomposition to filter out the influence of illumination, while the semantic features extracted by a large model serve as semantic prompts to assist in enhancement. In addition, we introduce a context-guided encoder (CGE) to improve the model’s understanding of salient objects. Finally, both labeled and unlabeled data undergo joint consistency training between the shared decoder (SD) and the perturbation decoder. The semi-supervised model enhances low-light SOD performance while also alleviating the burden of data annotation. Extensive experiments demonstrate that, compared with state-of-the-art fully supervised SOD models, the proposed semi-supervised model achieves highly competitive results across multiple test datasets.

PaperID: 543,

Authors: Bei Pan, Kaoru Hirota, Yaping Dai, Zhiyang Jia, Shuai Shao, Jinhua She

Affiliations: Collective Intelligence and Collaboration Laboratory, China North Artificial Intelligence and Innovation Research Institute, Beijing, China; National Key Laboratory of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing, China; School of Automation, Beijing Information Science and Technology University, Beijing, China; School of Engineering, Tokyo University of Technology, Tokyo, Japan

Title: Learning Sequential Variation Information for Dynamic Facial Expression Recognition

Abstract:
A multiscale sequence information fusion (MSSIF) method is presented for dynamic facial expression recognition (DFER) in video sequences. It exploits multiscale information by integrating features from individual frames, subsequences, and entire sequences through a transformer-based architecture. This hierarchical feature fusion process includes deep feature extraction at the frame level to capture intricate visual details, intrasubsequence fusion using self-attention mechanisms for analyzing adjacent frames, and intersubsequence fusion to synthesize long-term emotional dynamics across time scales. The efficacy of MSSIF is demonstrated through extensive evaluation on three video datasets: eNTERFACE’05, BAUM-1s, and AFEW, where it achieves overall recognition accuracies of 60.1%, 60.7%, and 58.8%, respectively. These results substantiate MSSIF’s superior performance in accurately recognizing facial expressions by managing short and long-term dependencies within video sequences, making it a potent tool for real-world applications requiring nuanced dynamic facial expression detection.

PaperID: 544,

Authors: Yujie Wang, Weiwei Xu, Lei Zhu

Affiliations: School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China; College of Engineering, Nanjing Agricultural University, Nanjing, China

Title: Efficient Linear Discriminant Analysis Based on Randomized Low-Rank Approaches

Abstract:
Linear discriminant analysis (LDA) faces challenges in practical applications due to the small sample size (SSS) problem and high computational costs. Various solutions have been proposed to address the SSS problem in both ratio trace LDA and trace ratio LDA (TRLDA). However, the iterative processing of large matrices often makes the computation process cumbersome. To address this issue, for TRLDA, we propose a novel random method that extracts orthogonal bases from matrices, allowing computations with small-sized matrices. This significantly reduces computational time without compromising accuracy. For ratio trace LDA, we introduce a fast generalized singular value decomposition (GSVD) algorithm, which demonstrates superior speed compared to MATLAB’s built-in GSVD algorithm in experiments. By integrating this new GSVD algorithm into ratio trace LDA, we propose FGSVD-LDA, which exhibits low computational complexity and good classification performance. The experimental results show that both methods effectively achieve dimensionality reduction and deliver satisfactory classification accuracy.

PaperID: 545,

Authors: Honggui Han, Chenxuan Sun, Xiaolong Wu, Hongyan Yang, Dezheng Zhao

Affiliations: Faculty of Information Technology and Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China; Intelligence Technology of CEC Company Ltd., Beijing, China

Title: Self-Organizing Stacked Type-2 Fuzzy Neural Network With Rule Generalization

Abstract:
Type-2 fuzzy neural networks (T2FNNs) are particularly effective in dealing with nonlinear systems. However, they inevitably suffer from multicollinearity problems caused by significant overlaps of the footprint of uncertainty (FOU), which lead to generalization biases. To address this challenge, a self-organizing stacked T2FNN with rule generalization (RG-SOST2FNN) is developed to boost overall performance. First, a stacked technique with cosine smart priority is designed for T2FNN fusion. This technique employs multivariable cosine similarity to obtain sparse inputs and selectively stacks multiple T2FNNs with non-collinear inputs to reduce collinearity dependence. Second, a dynamic stacked framework with a rule cluster generation mechanism is developed to achieve individual and batch rule adjustment. Then, a stacked structure with diversity is obtained to alleviate collinearity among rules by eliminating the singularity of the parameter matrix of FOUs. Third, a stacked risk mitigation algorithm is proposed to shape the fuzzy rule clusters (FRCs). Then, the parameters of FRCs are optimized using sparse gradient learning, which avoids updating collinear features and thereby reduces the variance of parameter estimation. Finally, simulation tests show that RG-SOST2FNN can achieve state-of-the-art performance even under high multicollinearity in complex systems.

PaperID: 546,

Authors: Ruixin Chen, Jianping Fan, Mei-Qin Wu, Rui Cheng, Jiawen Song

Affiliations: School of Economics and Management, Shanxi University, Taiyuan, China

Title: G-Diff: A Graph-Based Decoding Network for Diffusion Recommender Model

Abstract:
Recommendation systems are an effective approach to alleviating the information overload caused by the popularization of the Internet. Existing recommendation methods often use advanced deep learning algorithms to predict user preferences. The diffusion model is a deep generative model that has received much attention in recent years and has been successfully applied in recommendation systems. However, previous research has mainly used multilayer perceptrons (MLPs) in the reverse process of the diffusion model, which fails to fully utilize the collective signals of various items in the recommendation system. This article improves the diffusion recommendation model by introducing a carefully designed graph-based decoding network (GDN) in the reverse process. GDN improves recommendation performance by introducing relationships between items via the item-item graph. In addition, skip connections and normalization layers are implemented to maintain low-order neighbor information. Experiments comparing the proposed model with several state-of-the-art recommendation methods on three real-world datasets demonstrate the improvement of the proposed method over the diffusion recommendation model. Specifically, the proposed method outperforms the diffusion recommendation model with autoencoder (AE) by 21.67% on average. The contribution of each component of the proposed model is also illustrated by ablation experiments. The implementation code of the proposed model is available at https://github.com/crx1729/G-Diff.

PaperID: 547,

Authors: Hao Zhang, Chenglin Li, Wenrui Dai, Junni Zou, Hongkai Xiong

Affiliations: School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China

Title: Federated Learning Based on Model Discrepancy and Variance Reduction

Abstract:
In federated learning (FL), data heterogeneity and asynchronous client participation have been observed to induce high-variance discrepancy among local client models, leading to slow and unstable global convergence at the server. In this article, motivated by the usefulness of stale client updates, we first propose a general framework, named FedVR, to address this issue. In FedVR, we design an aggregate of both fresh and stale local model updates without additional communication overhead, which is computed at the server as a control variate to reduce the client variance incurred by data heterogeneity and client sampling. To further reduce the model discrepancy between local clients, we then propose FedMDVR, which broadcasts the designed control variate to all the active clients to help correct their local update directions toward the global optimum, i.e., a stationary point of the global objective function. In the global update at the server, the client variance is also decreased, a property inherited from the variance reduction nature of FedVR. We theoretically prove the convergence of FedVR and FedMDVR in general nonconvex settings. Through extensive experimental evaluations on several benchmark datasets, we also demonstrate that the proposed FedVR and FedMDVR not only accelerate convergence by reducing the number of communication rounds required to achieve a certain target accuracy but, more importantly, can converge to a higher accuracy than the baseline algorithms.

PaperID: 548,

Authors: Sucheng Ren, Xiaomeng Li

Affiliations: Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, SAR, China

Title: HResFormer: Hybrid Residual Transformer for Volumetric Medical Image Segmentation

Abstract:
Vision Transformer shows great superiority in medical image segmentation due to the ability to learn long-range dependency. For medical image segmentation from 3-D data, such as computed tomography (CT), existing methods can be broadly classified into 2-D-based and 3-D-based methods. One key limitation in 2-D-based methods is that the intraslice information is ignored, while the limitation in 3-D-based methods is the high computation cost and memory consumption, resulting in a limited feature representation for inner slice information. During the clinical examination, radiologists primarily use the axial plane and then routinely review both axial and coronal planes to form a 3-D understanding of anatomy. Motivated by this fact, our key insight is to design a hybrid model that can first learn fine-grained inner slice information and then generate a 3-D understanding of anatomy by incorporating 3-D information. We present a novel Hybrid Residual TransFormer (HResFormer) for 3-D medical image segmentation. Building upon standard 2-D and 3-D Transformer backbones, HResFormer involves two novel key designs: 1) a Hybrid Local-Global fusion Module (HLGM) to effectively and adaptively fuse inner slice information from 2-D Transformers and intraslice information from 3-D volumes for 3-D Transformers with local fine-grained and global long-range representation and 2) residual learning of the hybrid model, which can effectively leverage the inner slice and intraslice information for better 3-D understanding of anatomy. Experiments show that our HResFormer outperforms prior art on widely used medical image segmentation benchmarks. This article sheds light on an important but neglected way to design Transformers for 3-D medical image segmentation.

PaperID: 549,

Authors: Shashi Raj Pandey, Pierre Pinson, Petar Popovski

Affiliations: Department of Electronic Systems, Connectivity Section, Aalborg University, Aalborg, Denmark; Dyson School of Design Engineering, Imperial College London, London, U.K.

Title: Privacy-Aware Data Acquisition Under Data Similarity in Regression Markets

Abstract:
Data markets facilitate decentralized data exchange for applications such as prediction, learning, or inference. The design of these markets is challenged by varying privacy preferences and data similarity among data owners. Related works have often overlooked how data similarity impacts pricing and data value through statistical information leakage. We demonstrate that data similarity and privacy preferences are integral to market design and propose a query-response protocol using local differential privacy (LDP) for a two-party data acquisition mechanism. In our regression data market model, we analyze strategic interactions between privacy-aware owners and the learner as a Stackelberg game over the asked price and privacy factor. Finally, we numerically evaluate how data similarity affects market participation and traded data value.

PaperID: 550,

Authors: Yihong Leng, Jiaojiao Li, Rui Song, Yunsong Li, Qian Du

Affiliations: State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an, China; Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS, USA

Title: Uncertainty-Guided Discriminative Priors Mining for Flexible Unsupervised Spectral Reconstruction

Abstract:
Existing supervised spectral reconstruction (SR) methods adopt paired RGB images and hyperspectral images (HSIs) to drive the overall paradigms. Nonetheless, in practice, pairing imposes stricter device requirements, such as specific well-calibrated dual cameras, or more complex and exact registration processes among images with different time phases, widths, and spatial resolutions. To tackle the above challenges, we propose a flexible uncertainty-aware unsupervised SR paradigm, which dynamically establishes strong constraints from RGB images to drive unsupervised learning. As a specific plug-and-play tail in our paradigm, the uncertainty-aware saliency alignment module (USAM) calculates pixel- and spectralwise information entropy for uncertainty estimation, which attempts to represent the corresponding reflectivity or radiance to the light among different objects in various scenes, forcing the paradigm to adaptively explore the scene-agnostic prominent features. Furthermore, a progressively parallel network is constructed under our unsupervised paradigm to excavate discriminative structural and semantic priors of RGB images to assist in recovering dependable HSIs: 1) a learnable rank-guided structural representation (LRSR) flow is leveraged to characterize the latent structural priors via excavating nonzero elements in the full-rank matrix and further preserve evident boundaries in HSIs; and 2) a coarse-to-fine bandwise semantic perception (CBSP) flow is used to propagate perceptual bandwise affinity for aggregating and strengthening intrinsic interband dependencies, and further extract delicate semantic priors, which can recover plentiful contiguous spectral information in HSIs. Comprehensive quantitative and qualitative experimental results on three visual and two remote sensing benchmarks show the superiority and robustness of our method. We also ran nine existing SR methods under our unsupervised paradigm to recover HSIs without any manual intervention, which demonstrates the generality of our paradigm to some extent.

PaperID: 551,

Authors: Zhiqiang Pan, Honghui Chen, Wanyu Chen, Fei Cai, Xinwang Liu

Affiliations: Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, China; College of Electronic Countermeasures, National University of Defense Technology, Hefei, China; School of Computer, National University of Defense Technology, Changsha, China

Title: Time-Aware Graph Learning for Link Prediction on Temporal Networks

Abstract:
Link prediction on temporal networks aims to predict future edges by modeling the dynamic evolution involved in the graph data. Previous methods relying on node/edge attributes or on distance in the graph structure are not practical due to the deficiency of the attributes and the limitation of explicit distance estimation, respectively. Moreover, the existing graph representation learning methods mostly rely on graph neural networks (GNNs), which cannot adequately take the dynamic correlations between nodes into consideration, leading to inferior node embeddings. Thus, we propose a time-aware graph (TAG) learning method for link prediction on temporal networks. We first conduct a theoretical causal analysis proving that the correlations between nodes are required to be unchanged for temporal graph representation learning using GNNs. Then, we model the recent dynamic node correlations by designing an edge-dropping (ED) module and adopting a recent neighbor sampling (RNS) strategy so as to approximate the above condition. In addition, we preserve the long-term stable node correlations by introducing additional self-supervision using contrastive learning. Comprehensive experiments conducted on four public temporal network datasets, i.e., MathOverflow, StackOverflow, AskUbuntu, and SuperUser, demonstrate that TAG can achieve state-of-the-art performance in terms of average precision (AP) and area under the ROC curve (AUC). Moreover, TAG ensures high computational efficiency by making the temporal graph lightweight, which makes it practical in real-world applications.
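
Illustrative sketch (not taken from the paper): the recent neighbor sampling (RNS) strategy named in the abstract can be read, in its simplest form, as keeping only the most recent interactions of a node before the query time. The edge representation as (u, v, timestamp) tuples, the function name, and the deterministic "take the k latest" rule are assumptions for illustration; the authors' sampler may be randomized or weighted differently.

```python
def recent_neighbors(edges, node, t, k):
    """Return up to k neighbors of `node` from edges strictly before time t,
    ordered from most recent to least recent.

    edges is a list of (u, v, timestamp) tuples for an undirected temporal graph.
    """
    past = [
        (ts, v if u == node else u)
        for u, v, ts in edges
        if ts < t and node in (u, v)
    ]
    # Sort by timestamp only, newest first; the sort is stable for ties.
    past.sort(key=lambda pair: pair[0], reverse=True)
    return [neighbor for _, neighbor in past[:k]]


if __name__ == "__main__":
    edges = [("a", "b", 1), ("a", "c", 5), ("d", "a", 3), ("a", "e", 9)]
    print(recent_neighbors(edges, "a", t=8, k=2))
```

Restricting aggregation to recent neighbors is one way to approximate the condition that node correlations stay unchanged over the window a GNN sees.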

PaperID: 552,

Authors: Pengfei Sun, Jibin Wu, Malu Zhang, Paul Devos, Dick Botteldooren

Affiliations: Department of Information Technology, WAVES Research Group, Ghent University, Ghent, Belgium; Department of Data Science and Artificial Intelligence, Department of Computing, The Hong Kong Polytechnic University, Hong Kong, SAR, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

Title: Delayed Memory Unit: Modeling Temporal Dependency Through Delay Gate

Abstract:
Recurrent neural networks (RNNs) are widely recognized for their proficiency in modeling temporal dependencies, making them highly prevalent in sequential data processing applications. Nevertheless, vanilla RNNs are confronted with the well-known issue of gradient vanishing and exploding, posing a significant challenge for learning and establishing long-range dependencies. Additionally, gated RNNs tend to be over-parameterized, resulting in poor computational efficiency and network generalization. To address these challenges, this article proposes a novel delayed memory unit (DMU). The DMU incorporates a delay line structure along with delay gates into vanilla RNN, thereby enhancing temporal interaction and facilitating temporal credit assignment. Specifically, the DMU is designed to directly distribute the input information to the optimal time instant in the future, rather than aggregating and redistributing it over time through intricate network dynamics. Our proposed DMU demonstrates superior temporal modeling capabilities across a broad range of sequential modeling tasks, utilizing considerably fewer parameters than other state-of-the-art gated RNN models in applications such as speech recognition, radar gesture recognition, ECG waveform segmentation, and permuted sequential (PS) image classification.

PaperID: 553,

Authors: Ming Zhang, Zhe Chen, Vasile Palade, Tao Lu, Liya Wang, Junchi Zhang, Yanduo Zhang

Affiliations: Hubei Provincial Key Laboratory of Intelligent Robot, the School of Computer Science and Engineering, and the School of Artificial Intelligence, Wuhan Institute of Technology, Wuhan, China; Institute of Information Technology, Anhui Vocational College of Defense Technology, Lu’An, Anhui, China; Centre for Computational Science and Mathematical Modelling, Coventry University, Coventry, U.K.; Zhejiang Industry and Trade Vocational College, Wenzhou, China; School of Artificial Intelligence, Jianghan University, Wuhan, China; Computer School, Hubei University of Arts and Science, Xiangyang, China

Title: Double-Graph Representation With Relational Enhancement for Emotion-Cause Pair Extraction

Abstract:
The emotion-cause pair extraction (ECPE) task is to simultaneously extract emotions and causes as pairs (EC-pairs) from documents, which is important for natural language processing. Previous research tackled this task via a two-step approach, which first predicts separately the emotion and cause clauses, and then pairs them up by using a binary classifier. However, such a two-step approach may suffer from the possible propagation of errors, and it neglects the interaction between emotions and causes. In this article, an end-to-end double-graph method with relational enhancement (DGRE) is proposed to stimulate two relationship modes among clauses, i.e., semantic dependence and logical dependence. First, two united graph encoders are established to embed the semantic dependence into the representation of clauses and pairs. The first encoder is built on graph attention networks (GATs) for clause-level representation, the result of which is used by a relational graph convolutional network (RGCN) for the refinement of pair-level representation. Aiming to enhance the fitting ability of logical dependence, the emotion-type classification task is introduced into the multitask learning framework of GATs, which can effectively distinguish the logical relations between clauses according to their emotion types. Moreover, seven types of dependence relations have been designed for the node connections in RGCN, which emphasize the contextual interaction and clustering among neighboring nodes. Experiments on a benchmark Chinese corpus demonstrate that the proposed DGRE approach could effectively establish the communication mechanism between clauses and pairs from multiple perspectives, and comparisons with state-of-the-art (SOTA) models well validate its effectiveness.

PaperID: 554,

Authors: Lihong Pei, Yang Cao, Yu Kang, Zhenyi Xu, Qianming Liu

Affiliations: Department of Automation, University of Science and Technology of China, Hefei, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China

Title: Spatiotemporal Imputation of Traffic Emissions With Self-Supervised Diffusion Model

Abstract:
The comprehensive regulatory oversight of traffic emissions frequently encounters the missing not-at-random (MNAR) pattern, characterized by the long-term block missing in adjacent road segments, arising from insufficient monitoring points and nonuniform spatiotemporal distribution. The spatiotemporal block missing simultaneously disrupts the spatiotemporal correlation, introducing significant biases in spatiotemporal modeling for incomplete data. The emerging diffusion model recovers the information of the missing regions in a self-supervised manner and focuses on the generation process of the missing regions to address biases. However, the dynamics and spatiotemporal heterogeneity of traffic emissions limit its applicability in unknown spatiotemporal missing. To address this issue, this article proposes a novel progressive Diffusion Model-based framework for SpatioTemporal Imputation of traffic emissions (STI-dm). Specifically, a self-supervised masked training strategy is first devised to construct the nonlocal similarity prior of traffic emission data, explicitly introducing the MNAR missing mechanism for the diffusion process. Furthermore, an enhanced approach of noise injection and supervised denoising is adopted to rectify misconceptions of nonlocal alignment, decreasing modeling biases associated with incomplete data in the generation process. The imputation and prior modeling processes are progressively performed until obtaining stable results, and each of the preceding modeling processes benefits from the gradual improvement results in the other. Experimental evidence indicates that STI-dm surpasses the current state-of-the-art algorithms in scenarios with intricate spatiotemporal patterns and varying rates of missing data.

PaperID: 555,

Authors: Basit Alawode, Sajid Javed

Affiliations: Department of Computer Science, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates

Title: Learning Spatial-Temporal Regularized Tensor Sparse RPCA for Background Subtraction

Abstract:
Background subtraction in videos is a core challenge in computer vision, aiming to accurately identify moving objects. Robust principal component analysis (RPCA) has emerged as a promising unsupervised (US) paradigm for this task, showing strong performance on various benchmark datasets. Building on RPCA, tensor RPCA (TRPCA) variants have further enhanced background subtraction performance. However, current TRPCA methods often treat moving object pixels independently, lacking spatial-temporal structured-sparsity constraints. This limitation leads to performance degradation in scenarios with dynamic backgrounds, camouflage, and camera jitter. In this work, we introduce a novel spatial-temporal regularized tensor sparse RPCA algorithm to address these issues. By incorporating normalized graph-Laplacian matrices into the sparse component, we enforce spatial-temporal regularization. We construct two graphs—one across spatial locations and another across temporal slices—to guide regularization. By maximizing our objective function, we ensure that the tensor sparse component aligns with the spatiotemporal eigenvectors of the graph-Laplacian matrices, preserving disconnected moving object pixels. We formulate a new objective function and employ batch and online-based optimization methods to jointly optimize background-foreground separation and spatial-temporal regularization. Experimental evaluation on six publicly available datasets demonstrates the superior performance of our algorithm compared to existing methods.

PaperID: 556,

Authors: Lauren Nicole Delong, Ramon Fernández Mir, Jacques D. Fleuriot

Affiliations: Artificial Intelligence and its Applications Institute, School of Informatics, University of Edinburgh, Edinburgh, U.K.

Title: Neurosymbolic AI for Reasoning Over Knowledge Graphs: A Survey

Abstract:
Neurosymbolic artificial intelligence (AI) is an increasingly active area of research that combines symbolic reasoning methods with deep learning to leverage their complementary benefits. As knowledge graphs (KGs) are becoming a popular way to represent heterogeneous and multirelational data, methods for reasoning on graph structures have attempted to follow this neurosymbolic paradigm. Traditionally, such approaches have utilized either rule-based inference or generated representative numerical embeddings from which patterns could be extracted. However, several recent studies have attempted to bridge this dichotomy to generate models that facilitate interpretability, maintain competitive performance, and integrate expert knowledge. Therefore, we survey methods that perform neurosymbolic reasoning tasks on KGs and propose a novel taxonomy by which we can classify them. Specifically, we propose three major categories: 1) logically informed embedding approaches; 2) embedding approaches with logical constraints; and 3) rule-learning approaches. Alongside the taxonomy, we provide a tabular overview of the approaches and links to their source code, if available, for more direct comparison. Finally, we discuss the unique characteristics and limitations of these methods and then propose several prospective directions toward which this field of research could evolve.

PaperID: 557,

Authors: Fanghui Zhang, Haiyue Zhu, Yigang Cen, Shichao Kan, Linna Zhang, Prahlad Vadakkepat, Tong Heng Lee

Affiliations: School of Artificial Intelligence, Henan University, Zhengzhou, Henan, China; Adaptive Robotics and Mechatronics Group, Singapore Institute of Manufacturing Technology (SIMTech), Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore; Institute of Information Science, Beijing Jiaotong University, Beijing, China; School of Computer Science and Engineering, Central South University, Changsha, Hunan, China; School of Mechanical Engineering, Guizhou University, Guiyang, China; Department of Electrical and Computer Engineering, National University of Singapore, Cluny Road, Singapore

Title: Low-Shot Unsupervised Visual Anomaly Detection via Sparse Feature Representation

Abstract:
Visual anomaly detection is an essential component in modern industrial manufacturing. Existing studies using notions of pairwise similarity distance between a test feature and nominal features have achieved great breakthroughs. However, the absolute similarity distance lacks certain generalizations, making it challenging to extend the comparison beyond the available samples. This limitation could potentially hamper anomaly detection performance in scenarios with limited samples. This article presents a novel sparse feature representation anomaly detection (SFRAD) framework, which formulates the anomaly detection as a sparse feature representation problem; and notably proposes an anomaly score by orthogonal matching pursuit (ASOMP) as a novel detection metric. Specifically, SFRAD calculates the Gaussian kernel distance between the test feature and its sparse representation in the nominal feature space for anomaly detection. Here, the orthogonal matching pursuit (OMP) algorithm is adopted to achieve the sparse feature representation. Moreover, to construct a low-redundancy memory bank storing the basis features for sparse representation, a novel basis feature sampling (BFS) algorithm is proposed by considering both the maximum coverage and the optimum feature representation simultaneously. As a result, SFRAD incorporates both the advantages of absolute similarity and linear representation; and this enhances the generalization in low-shot scenarios. Extensive experiments on the MVTec anomaly detection (MVTec AD), Kolektor surface-defect dataset (KolektorSDD), Kolektor surface-defect dataset 2 (KolektorSDD2), MVTec logical constraints anomaly detection (MVTec LOCO AD), Visual anomaly (VISA), Modified national institute of standards and technology (MNIST), and CIFAR-10 datasets demonstrate that our proposed SFRAD outperforms the previous methods and achieves state-of-the-art unsupervised anomaly detection performance. 
Notably, significantly improved results have also been achieved on low-shot anomaly detection. Code is available at https://github.com/fanghuisky/SFRAD.
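
The scoring idea described in this abstract (sparse-code a test feature over nominal basis features with OMP, then measure how far the feature is from its reconstruction) can be sketched as follows. This is an illustrative sketch only, not the authors' SFRAD code: the function names `omp` and `anomaly_score`, the sparsity level `k`, the kernel width `sigma`, and the choice of "one minus a Gaussian kernel" as the distance are all assumptions made for the example.

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit (textbook form).

    D: (d, n) array whose columns are nominal basis features (assumed layout).
    x: (d,) test feature.  k: maximum number of atoms to select.
    Returns an (n,) coefficient vector with at most k nonzeros.
    """
    residual = x.astype(float).copy()
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -1.0  # never reselect an atom
        support.append(int(np.argmax(corr)))
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef

def anomaly_score(D, x, k=3, sigma=1.0):
    """Assumed score: 1 - Gaussian kernel between x and its sparse reconstruction."""
    recon = D @ omp(D, x, k)
    dist2 = float(np.sum((x - recon) ** 2))
    return 1.0 - float(np.exp(-dist2 / (2.0 * sigma ** 2)))

# Toy memory bank: two basis features in a 3-D feature space.
D = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
normal_score = anomaly_score(D, np.array([1.0, 2.0, 0.0]), k=2)   # lies in the span
anomalous_score = anomaly_score(D, np.array([0.0, 0.0, 1.0]), k=2)  # orthogonal to it
```

A feature that the memory bank can reconstruct scores near zero, while one outside the span of the basis features scores higher; how the basis features are sampled (the BFS step) is not covered by this sketch.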

PaperID: 558,

Authors: Ruikai Yang, Fan He, Mingzhen He, Jie Yang, Xiaolin Huang

Affiliations: Institute of Image Processing and Pattern Recognition, MOE Key Laboratory of System Control and Information Processing, Shanghai Jiao Tong University, Shanghai, China; Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium

Title: Decentralized Kernel Ridge Regression Based on Data-Dependent Random Feature

Abstract:
Random feature (RF) has been widely used for node consistency in decentralized kernel ridge regression (KRR). Currently, the consistency is guaranteed by imposing constraints on coefficients of features, necessitating that the RFs on different nodes are identical. However, in many applications, data on different nodes vary significantly in number or distribution, which calls for adaptive and data-dependent methods that generate different RFs. To tackle this essential difficulty, we propose a new decentralized KRR algorithm that pursues consensus on decision functions, which allows great flexibility and adapts well to the data on each node. Convergence is rigorously established, and the effectiveness is numerically verified: by capturing the characteristics of the data on each node, while maintaining the same communication costs as other methods, we achieve an average regression accuracy improvement of 25.5% across six real-world datasets.

PaperID: 559,

Authors: Shuaibing Zhu, Hong Sang, Kai Zhang, Fanchao Kong, Jinhu Lü

Affiliations: MOE-LCSM, School of Mathematics and Statistics, Hunan Normal University, Changsha, China; College of Marine Electrical Engineering, Dalian Maritime University, Dalian, China; Center for Control Theory and Guidance Technology, Harbin Institute of Technology, Harbin, China; School of Mathematics and Statistics, Anhui Normal University, Wuhu, Anhui, China; School of Automation Science and Electrical Engineering, State Key Laboratory of Software Development Environment, Beihang University, Beijing, China

Title: Synchronization of Intermittently Coupled Neural Networks With Coupling Delay

Abstract:
In recent years, the synchronization of coupled neural networks (CNNs) has been extensively studied. However, existing results heavily rely on assuming continuous couplings, overlooking the prevalence of intermittent couplings in reality. In this article, we address for the first time the synchronization challenge posed by intermittently coupled neural networks (ICNNs) with coupling delay. To overcome the difficulties arising from intermittent couplings, we put forward a general piecewise delay differential inequality to characterize the dynamics during both coupled intervals and decoupled intervals. Based on the proposed inequality, we establish delay-independent synchronization criteria (DISCs) for ICNNs, enabling them to tackle general coupling delay. Notably, unlike previous studies, the achievement of synchronization in our approach does not rely on external control. Furthermore, for ICNNs that synchronize only under small delays, we formulate linear matrix inequality (LMI)-based delay-dependent synchronization criteria (DDSCs) that are computationally efficient and do not require delay differentiability. Finally, we provide illustrative examples to demonstrate our theoretical results.

PaperID: 560,

Authors: Geping Yang, Shusen Yang, Yiyang Yang, Xiang Chen, Can Chen, Zhiguo Gong, Zhifeng Hao

Affiliations: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China; College of Guangdong-Taiwan Industrial Science and Technology, Dongguan University of Technology, Dongguan, China; School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China; Department of Accounting and Information Management, University of Macau, Macau, China; Department of Computer and Information Science, State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China; College of Engineering, Shantou University, Shantou, China

Title: SPGMVC: Multiview Clustering via Partitioning the Signed Prototype Graph

Abstract:
Multiview clustering (MVC) has been widely studied in machine learning and data mining for its capability of improving clustering performance by fusing the information from multiview data. In the past decade, a large number of MVC methods have made impressive progress, but most of them suffer from computational burdens, especially in large-scale tasks. Binary MVC (BMVC) is proposed to address this issue by representing the large-scale high-dimensional dataset as a group of consensus and low-dimensional binary codes. However, current BMVC-based approaches generate the clustering by executing binary k-means on the obtained binary codes, which fail to capture the embedded geometric information, leading to poor clustering performance. In addition, parameter selection is another “mission impossible” in unsupervised learning tasks including MVC. To tackle these challenges, a framework of multiview clustering via partitioning the signed prototype graph (SPGMVC) is proposed in this work. The SPGMVC framework offers several contributions. First, SPGMVC is designed as a unified framework for MVC. It combines effective technologies, such as consensus binary coding, code compression (CC), signed prototype graph (SPG) partitioning, and prototype-based cluster assignment. Second, SPGMVC partitions the signed graph (SG) based on the relationships between positive and negative edges. By capturing the underlying structure of the data, this partitioning strategy improves clustering accuracy (ACC). CC techniques are applied to reduce the graph’s scale, enabling further partitioning and enhancing computational efficiency. Third, SPGMVC employs an alternate minimizing strategy to efficiently handle the optimization problem. This strategy has nearly linear time and space complexity with respect to the data volume, making it suitable for large-scale tasks. Fourth, SPGMVC proposes an automatic parameter selection strategy, eliminating the need for extensive parameter exploration. 
Comprehensive experiments illustrate the superiority of our model. The implementation of SPGMVC is available at: https://github.com/gepingyang/PSGMVC.

PaperID: 561,

Authors: Jie Su, Peng Sun, Yuting Jiang, Zhenyu Wen, Fangda Guo, Yiming Wu, Zhen Hong, Haoran Duan, Yawen Huang, Rajiv Ranjan, Yefeng Zheng

Affiliations: Institute of Cyberspace Security and the College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang, China; School of Engineering, Westlake University, Hangzhou, China; CAS Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Department of Computer Science, Durham University, Durham, U.K.; Tencent Jarvis Lab, Shenzhen, China; School of Computing Science, Newcastle University, Newcastle upon Tyne, U.K.

Title: A Semantic-Consistent Few-Shot Modulation Recognition Framework for IoT Applications

Abstract:
The rapid growth of the Internet of Things (IoT) has led to the widespread adoption of IoT networks in numerous digital applications. To counter physical threats in these systems, automatic modulation classification (AMC) has emerged as an effective approach for identifying the modulation format of signals in noisy environments. However, identifying those threats can be particularly challenging due to the scarcity of labeled data, which is a common issue in various IoT applications, such as anomaly detection for unmanned aerial vehicles (UAVs) and intrusion detection in IoT networks. Few-shot learning (FSL) offers a promising solution by enabling models to grasp the concepts of new classes using only a limited number of labeled samples. However, prevalent FSL techniques are primarily tailored for tasks in the computer vision domain and are not suitable for the wireless signal domain. Instead of designing a new FSL model, this work suggests a novel approach that enhances wireless signals so that they can be processed more efficiently by the existing state-of-the-art (SOTA) FSL models. We present the semantic-consistent signal pretransformation (ScSP), a parameterized transformation architecture that ensures signals with identical semantics exhibit similar representations. ScSP is designed to integrate seamlessly with various SOTA FSL models for signal modulation recognition and supports commonly used deep learning backbones. Our evaluation indicates that ScSP boosts the performance of numerous SOTA FSL models, while preserving flexibility.

PaperID: 562,

Authors: Pengfei Zhu, Jialu Li, Yu Wang, Bin Xiao, Jinglin Zhang, Wanyu Lin, Qinghua Hu

Affiliations: College of Intelligence and Computing, Tianjin University, Tianjin, China; Department of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Control Science and Engineering, Shandong University, Shandong, China; Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China

Title: Boosting Pseudo-Labeling With Curriculum Self-Reflection for Attributed Graph Clustering

Abstract:
Attributed graph clustering is an unsupervised learning task that aims to partition various nodes of a graph into distinct groups. Existing approaches focus on devising diverse pretext tasks to obtain suitable supervised information for representation learning, among which the predictive methods show great potential. However, these methods 1) generate auxiliary task bias toward the clustering target and 2) introduce label noise due to static thresholds. To address these issues, we propose a new self-supervised learning method, namely, pseudo-labeling with curriculum self-reflection (PLCSR), that learns reliable pseudo-labels by mining its information to achieve progressive processing of nodes in a self-reflection manner. First, a self-auxiliary encoder is constructed using the exponential moving average (EMA) of the original encoder’s parameters to replace the auxiliary tasks, which provides an additional perspective for finding highly confident pseudo-labels. Second, a curriculum selection strategy using dynamic thresholds is designed to take full advantage of graph nodes more accurately. Besides simple nodes with high confidence at the initial stage, nodes that yield consistent predictions from both encoders are then assigned pseudo-labels to avoid the under-learning problem. For the remaining difficult nodes that are highly uncertain, we abstain from making judgments to minimize their adverse impact on the model. Extensive experiments have shown that PLCSR significantly outperforms the state-of-the-art predictive method CDRS, achieving an improvement of more than 6% in terms of clustering accuracy. The code is available at: https://github.com/Jillian555/PLCSR.
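
The two mechanisms named in this abstract, an EMA copy of the encoder and a selection rule that labels confident nodes, then nodes on which both encoders agree, and abstains on the rest, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PLCSR implementation: the function names, the flat parameter lists, the `decay` value, and the single fixed `threshold` (the paper uses dynamic thresholds whose schedule is not given here) are all hypothetical.

```python
def ema_update(aux_params, enc_params, decay=0.99):
    """Move the auxiliary encoder's parameters toward the main encoder's.

    Both arguments are flat lists of floats (an assumed simplification).
    """
    return [decay * a + (1.0 - decay) * e for a, e in zip(aux_params, enc_params)]

def select_pseudo_labels(probs_main, probs_aux, threshold):
    """Return {node_index: label} for nodes treated as reliable.

    probs_main / probs_aux: per-node class-probability lists from the main
    and the EMA encoder.  threshold: confidence cut-off (assumed fixed here).
    """
    selected = {}
    for i, (p, q) in enumerate(zip(probs_main, probs_aux)):
        label_p = max(range(len(p)), key=p.__getitem__)
        label_q = max(range(len(q)), key=q.__getitem__)
        if p[label_p] >= threshold:
            selected[i] = label_p      # simple, highly confident node
        elif label_p == label_q:
            selected[i] = label_p      # both encoders agree
        # otherwise: abstain on this uncertain node
    return selected

aux = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
labels = select_pseudo_labels(
    probs_main=[[0.9, 0.1], [0.6, 0.4], [0.55, 0.45]],
    probs_aux=[[0.2, 0.8], [0.7, 0.3], [0.4, 0.6]],
    threshold=0.8,
)
```

In the toy call, node 0 is labeled because it is confident, node 1 because both encoders predict the same class, and node 2 is left unlabeled because the encoders disagree.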

PaperID: 563,

Authors: Hang Shao, Lei Luo, Jianjun Qian, Mengkai Yan, Shangbing Gao, Jian Yang

Affiliations: PCA Laboratory, the Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Laboratory of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, China

Title: Video-Based Multiphysiological Disentanglement and Remote Robust Estimation for Respiration

Abstract:
Remote noncontact respiratory rate estimation from facial visual information has great research significance, providing valuable priors for health monitoring, clinical diagnosis, and anti-fraud. However, existing studies suffer from disturbances in epidermal specular reflections induced by head movements and facial expressions. Furthermore, diffuse reflections of light in the skin-colored subcutaneous tissue caused by multiple time-varying physiological signals independent of breathing are entangled with the intention of the respiratory process, leading to confusion in current research. To address these issues, this article proposes a novel network for natural light video-based remote respiration estimation. Specifically, our model consists of a two-stage architecture that progressively implements vital measurements. The first stage adopts an encoder-decoder structure to recharacterize the facial motion frame differences of the input video based on the gradient binary state of the respiratory signal during inspiration and expiration. Then, the obtained generative mapping, which is disentangled from various time-varying interferences and is only linearly related to the respiratory state, is combined with the facial appearance in the second stage. To further improve the robustness of our algorithm, we design a targeted long-term temporal attention module and embed it between the two stages to enhance the network’s ability to model the breathing cycle, which spans a very large number of frames, and to mine hidden timing change clues. We train and validate the proposed network on a series of publicly available respiration estimation datasets, and the experimental results demonstrate its competitiveness against the state-of-the-art breathing and physiological prediction frameworks.

PaperID: 564,

Authors: Shunxin Guo, Hongsong Wang, Shuxia Lin, Zhiqiang Kou, Xin Geng

Affiliations: School of Computer Science and Engineering, Southeast University, Nanjing, China

Title: Addressing Skewed Heterogeneity via Federated Prototype Rectification With Personalization

Abstract:
Federated learning (FL) is an efficient framework designed to facilitate collaborative model training across multiple distributed devices while preserving user data privacy. A significant challenge of FL is data-level heterogeneity, i.e., skewed or long-tailed distribution of private data. Although various methods have been proposed to address this challenge, most of them assume that the underlying global data are uniformly distributed across all clients. This article investigates data-level heterogeneity in FL with a brief review and redefines a more practical and challenging setting called skewed heterogeneous FL (SHFL). Accordingly, we propose a novel federated prototype rectification with personalization (FedPRP) method, which consists of two parts: federated personalization and federated prototype rectification. The former aims to construct balanced decision boundaries between dominant and minority classes based on private data, while the latter exploits both interclass discrimination and intraclass consistency to rectify empirical prototypes. Experiments on three popular benchmarks show that the proposed approach outperforms current state-of-the-art methods and achieves balanced performance in both personalization and generalization.

PaperID: 565,

Authors: Haoran Wang, Zeshen Tang, Yaoru Sun, Fang Wang, Siyu Zhang, Yeming Chen

Affiliations: Department of Computer Science and Technology, College of Electronics and Information Engineering, Tongji University, Shanghai, China; Department of Computer Science, Brunel University London, Uxbridge, U.K.

Title: Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout

Abstract:
Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened interlevel communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet, most existing goal-conditioned HRL algorithms have primarily focused on the subgoal discovery, neglecting interlevel cooperation. Here, we propose a novel goal-conditioned HRL framework named Guided Cooperation via Model-Based Rollout (GCMR; code is available at https://github.com/HaoranWang-TJ/GCMR_ACLG_official), aiming to bridge interlayer information synchronization and cooperation by exploiting forward dynamics. First, the GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by the unseen subgoals and states, lower level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Third, we propose a one-step rollout-based planning, using higher level critics to guide the lower level policy. Specifically, we estimate the value of future states of the lower level policy using the higher level critic function, thereby transmitting global task information downward to avoid local pitfalls. These three critical components in GCMR are expected to facilitate interlevel cooperation significantly. Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of hierarchical reinforcement learning guided by landmarks (HIGL), namely, adjacency constraint and landmark-guided planning (ACLG), yields more stable and robust policy improvement compared with various baselines and significantly outperforms previous state-of-the-art (SOTA) algorithms.

PaperID: 566,

Authors: Yiheng Lu, Maoguo Gong, Kaiyuan Feng, Jialu Liu, Ziyu Guan, Hao Li

Affiliations: Key Laboratory of Collaborative Intelligence Systems of Ministry of Education, Xidian University, Xi’an, China

Title: SAAF: Self-Adaptive Attention Factor-Based Taylor-Pruning on Convolutional Neural Networks

Abstract:
Pruning techniques for convolutional neural networks (CNNs) have drawn attention as a way to reduce the consumption of computation resources. In particular, the Taylor-based method simplifies the evaluation of importance for each filter as the product of the gradient and weight value of the output features, which outperforms other methods in reductions of parameters and floating point operations (FLOPs). However, the Taylor-based method sacrifices too much accuracy when the overall pruning rate is relatively large compared with other pruning algorithms. In this article, we propose a self-adaptive attention factor (SAAF) to improve the performance of the slimmed model when conventional Taylor-based pruning is utilized under higher pruning rates. Specifically, SAAF can be calculated by leveraging the remaining ratio of filters at the early pruning stage of the Taylor-based method, and then, some pruned filters can be recovered according to SAAF to improve the accuracy of the slimmed model. This means that SAAF can protect filters from being overslimmed, eliminating the degradation of Taylor-based pruning when the pruning rate is large, while still compressing models substantially across various datasets. We test the efficiency of SAAF on VGG-16 and ResNet-50 with CIFAR-10, Tiny-ImageNet, ImageNet-1000, and remote sensing images. Our method clearly outperforms the traditional Taylor-based method in accuracy, with only small sacrifices in the reduction of parameters and FLOPs, which is better than other pruning methods.
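
The baseline that this abstract builds on, scoring each filter by the product of its output feature values and the corresponding loss gradients and then pruning the lowest-scoring filters, can be sketched as below. This is only a sketch of the conventional first-order Taylor criterion under simplifying assumptions (feature maps and gradients given as flat Python lists, a hypothetical `rate` parameter); it does not implement SAAF or the filter-recovery step, whose formula is not stated in the abstract.

```python
def taylor_importance(activations, gradients):
    """First-order Taylor score per filter: |mean(activation * gradient)|.

    activations[f] and gradients[f] are flat lists holding filter f's output
    feature map and the loss gradient with respect to it (assumed layout).
    """
    scores = []
    for act, grad in zip(activations, gradients):
        mean_prod = sum(a * g for a, g in zip(act, grad)) / len(act)
        scores.append(abs(mean_prod))
    return scores

def prune_mask(scores, rate):
    """Keep-mask that drops the int(len(scores) * rate) lowest-scoring filters."""
    n_prune = int(len(scores) * rate)
    order = sorted(range(len(scores)), key=scores.__getitem__)
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(scores))]

acts = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]]
grads = [[0.1, 0.1], [0.0, 0.2], [1.0, -1.0]]
scores = taylor_importance(acts, grads)
mask = prune_mask(scores, rate=0.34)  # prunes one of the three filters
```

In the toy example the second filter has the smallest score and is the one removed; a recovery mechanism such as SAAF would act on top of a mask like this one.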

PaperID: 567,

Authors: Qiuping Jiang, Jinguang Cheng, Zongwei Wu, Runmin Cong, Radu Timofte

Affiliations: Faculty of Information Science and Engineering, Ningbo University, Ningbo, China; Computer Vision Laboratory, IFI and CAIDAS, University of Würzburg, Würzburg, Germany; School of Control Science and Engineering, Shandong University, Jinan, China

Title: High-Precision Dichotomous Image Segmentation With Frequency and Scale Awareness

Abstract:
Dichotomous image segmentation (DIS) with rich fine-grained details within a single image is a challenging task. Despite the plausible results achieved by deep learning-based methods, most of them fail to segment generic objects when the boundary is cluttered with the background. In fact, the gradual decrease in feature map resolution during the encoding stage and the misleading texture clue may be the main issues. To handle these issues, we devise a novel frequency- and scale-aware deep neural network (FSANet) for high-precision DIS. The core of our proposed FSANet is twofold. First, a multimodality fusion (MF) module that integrates the information in spatial and frequency domains is adopted to enhance the representation capability of image features. Second, a collaborative scale fusion module (CSFM), which deviates from the traditional serial structures, is introduced to maintain high resolution during the entire feature encoding stage. On the decoder side, we introduce hierarchical context fusion (HCF) and selective feature fusion (SFF) modules to infer the segmentation results from the output features of the CSFM. We conduct extensive experiments on several benchmark datasets and compare our proposed method with existing state-of-the-art (SOTA) methods. The experimental results demonstrate that our FSANet achieves superior performance both qualitatively and quantitatively. The code will be made available at https://github.com/chasecjg/FSANet.

PaperID: 568,

Authors: Songtao Li, Shiqian Wu, Chang Tang, Junchi Zhang, Zushuai Wei

Affiliations: State Key Laboratory of Precision Blasting, School of Artificial Intelligence, Jianghan University, Wuhan, China; Institute of Robotics and Intelligent Systems, School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, China; School of Computer Science, China University of Geosciences, Wuhan, China

Title: Robust Nonnegative Matrix Factorization With Self-Initiated Multigraph Contrastive Fusion

Abstract:
Graph regularized nonnegative matrix factorization (GNMF) has been widely used in data representation due to its excellent dimensionality reduction. When it comes to clustering polluted data, GNMF inevitably learns inaccurate representations, leading to models that are unusually sensitive to outliers in the data. For example, in a face dataset where a sample is obscured by items such as a mask or glasses, there is a high probability that the graph regularization term incorrectly describes the association relationship for that sample, resulting in an incorrect elicitation in the matrix factorization process. In this article, a novel self-initiated unsupervised subspace learning method named robust nonnegative matrix factorization with self-initiated multigraph contrastive fusion (RNMF-SMGF) is proposed. RNMF-SMGF is capable of creating samples with different angles and learning different graph structures based on these different angles in a self-initiated manner without changing the original data. In the process of subspace learning guided by graph regularization, these different graph structures are fused into a more accurate graph structure, along with entropy regularization and L_{2,1/2}-norm constraints to facilitate the robust learning of the proposed model and the formation of different clusters in the low-dimensional space. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed model in robust clustering. The source code is available at: https://github.com/LstinWh/RNMF-SMGF/.

PaperID: 569,

Authors: Young Jae Lee, Jaehoon Kim, Youngjoon Park, Mingu Kwak, Seoung Bum Kim

Affiliations: Department of Industrial and Management Engineering, Korea University, Seoul, Republic of Korea; LG AI Research, Seoul, Republic of Korea; School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA

Title: Masked and Inverse Dynamics Modeling for Data-Efficient Reinforcement Learning

Abstract:
In pixel-based deep reinforcement learning (DRL), learning representations of states that change because of an agent’s action or interaction with the environment poses a critical challenge in improving data efficiency. Recent data-efficient DRL studies have integrated DRL with self-supervised learning (SSL) and data augmentation to learn state representations from given interactions. However, some methods have difficulties in explicitly capturing evolving state representations or in selecting data augmentations for appropriate reward signals. Our goal is to explicitly learn the inherent dynamics that change with an agent’s intervention and interaction with the environment. We propose masked and inverse dynamics modeling (MIND), which uses masking augmentation and fewer hyperparameters to learn agent-controllable representations in changing states. Our method comprises a self-supervised multitask learning scheme that leverages a transformer architecture, which captures the spatiotemporal information underlying the highly correlated consecutive frames. MIND uses two tasks to perform self-supervised multitask learning: masked modeling and inverse dynamics modeling. Masked modeling learns the static visual representation required for control in the state, and inverse dynamics modeling learns the rapidly evolving state representation with agent intervention. By integrating inverse dynamics modeling as a complementary component to masked modeling, our method effectively learns evolving state representations. We evaluate our method by using discrete and continuous control environments with limited interactions. MIND outperforms previous methods across benchmarks and significantly improves data efficiency. The code is available at https://github.com/dudwojae/MIND.

PaperID: 570,

Authors: Chenyuan Feng, Daquan Feng, Guanxin Huang, Zuozhu Liu, Zhenzhong Wang, Xiang-Gen Xia

Affiliations: Guangdong Provincial Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Digital Creative Technology, Shenzhen University, Shenzhen, China; Research and Development Department, BYD Company Ltd., Shenzhen, China; Zhejiang University-University of Illinois at Urbana Champaign Institute, Zhejiang University, Haining, Zhejiang, China; Technical Management Center, China Media Group, Beijing, China; Department of Electrical and Computer Engineering, University of Delaware, Newark, DE, USA

Title: Robust Privacy-Preserving Recommendation Systems Driven by Multimodal Federated Learning

Abstract:
The recommendation system (RS) is an important information filtering tool in today’s digital era. With the growing concern about privacy, deploying RSs in a federated learning (FL) manner emerges as a promising solution, which can train a high-quality model on the premise that the server does not directly access sensitive user data. Nevertheless, some malicious clients can deduce user data by analyzing the uploaded model parameters. Even worse, some Byzantine clients can also send contaminated data to the server, causing blockage or failure of model convergence. In addition, most existing research on federated recommendation algorithms focuses only on unimodal learning, ignoring the assistance of multiple modality data to promote recommendation accuracy. Therefore, this article designs an FL-based privacy-preserving multimodal RS framework. To distinguish various modality data, an attention mechanism is introduced, wherein different weight ratios are assigned to various modal features. To further strengthen the privacy, local differential privacy (LDP) and personalized FL strategies are designed to identify malicious clients and bolster the resilience against Byzantine attacks. Finally, two multimodal datasets are established to verify the effectiveness of the proposed algorithm. The superiority of our proposed techniques is confirmed by the simulation results.

PaperID: 571,

Authors: Xiaolin Zhu, Dongli Wang, Jianxun Li, Rui Su, Qin Wan, Yan Zhou

Affiliations: School of Mathematics and Computational Science, Xiangtan University, Xiangtan, China; School of Automation and Electronics Information, Xiangtan University, Xiangtan, China; School of Automation, Shanghai Jiao Tong University, Shanghai, China; Shanghai AI Laboratory, Shanghai, China; School of Electrical and Information Engineering, Hunan Institute of Engineering, Xiangtan, China

Title: Dynamical Attention Hypergraph Convolutional Network for Group Activity Recognition

Abstract:
Recently, group activity recognition (GAR) has drawn growing interest in the video analysis and computer vision communities. The current models of GAR tasks are often impractical in that they suppose that all interactions between actors are pairwise, which only models and leverages part of the information in real entire interactions. Motivated by this, we design a distinct dynamical attention hypergraph convolutional network framework, referred to as DAHGCN, for precise GAR, modeling the entire interactions and capturing the high-order relationships among involved actors in a real-life scenario. Specifically, to learn complementary feature representations for fine-grained GAR, a multilevel feature descriptor (MLFD) module is proposed. Furthermore, for learning higher order interaction relationships, we construct a DAHGCN to accommodate complex group interactions, which can dynamically change the topology of the hypergraph and learn these key representations by virtue of the “similarity-based shared nearest-neighbor (SSNN) clustering” and “attention mechanisms” on the hypergraph. Finally, a multiscale temporal convolution (MSTC) module is utilized to explore various long-range temporal dynamic correlations across different frames. In addition, comprehensive experiments on three commonly used GAR datasets clearly demonstrate that, when compared with the state-of-the-art methods, our proposed method achieves the best performance.

PaperID: 572,

Authors: Yi Zhang, Fengyu Tian, Chuan Ma, Miaomiao Li, Hengfu Yang, Zhe Liu, En Zhu, Xinwang Liu

Affiliations: School of Computer, National University of Defense Technology, Changsha, China; School of Computer Science, Chongqing University, Chongqing, China; College of Electronic Information and Electrical Engineering, Changsha University, Changsha, Hunan, China; School of Computer Science, Hunan First Normal University, Changsha, China; Zhejiang Lab, Hangzhou, China

Title: Regularized Instance Weighting Multiview Clustering via Late Fusion Alignment

Abstract:
Multiview clustering has become a prominent research topic in data analysis, with wide-ranging applications across various fields. However, the existing late fusion multiview clustering (LFMVC) methods still exhibit some limitations, including variable importance and contributions and a heightened sensitivity to noise and outliers during the alignment process. To tackle these challenges, we propose a novel regularized instance weighting multiview clustering via late fusion alignment (R-IWLF-MVC), which considers the instance importance from various views, enabling information integration to be more effective. Specifically, we assign each sample an importance attribute to enable the learning process to focus more on the key sample nodes and avoid being influenced by noise or outliers, while laying the groundwork for the fusion of different views. In addition, we continue to employ late fusion alignment to integrate base clustering from various views and introduce a new regularization term with prior knowledge to ensure that the learning process does not deviate too much from the expected results. After that, we design a three-step alternating optimization strategy with proven convergence for the resultant problem. Our proposed approach has been extensively evaluated on multiple real-world datasets, demonstrating its superiority to state-of-the-art methods.

PaperID: 573,

Authors: Jiao Liu, Xinghua Li, Ximeng Liu, Haiyan Zhang, Yinbin Miao, Robert H. Deng

Affiliations: State Key Laboratory of Integrated Services Networks and the School of Cyber Engineering, Xidian University, Xi’an, China; College of Computer and Data Science, Fuzhou University, Fuzhou, China; School of Information Systems, Singapore Management University, Victoria St, Singapore

Title: DefendFL: A Privacy-Preserving Federated Learning Scheme Against Poisoning Attacks

Abstract:
Federated learning (FL) has become a popular mode of learning, allowing model training without the need to share data. Unfortunately, it remains vulnerable to privacy leakage and poisoning attacks, which compromise user data security and degrade model quality. Therefore, numerous privacy-preserving frameworks have been proposed, among which the mask-based framework has certain advantages in terms of efficiency and functionality. However, it is more susceptible to poisoning attacks from malicious users, and current works lack practical means to detect such attacks within this framework. To overcome this challenge, we present DefendFL, an efficient, privacy-preserving, and poisoning-detectable mask-based FL scheme. We first leverage a collinearity mask to protect users’ gradient privacy. Then, cosine similarity is computed on the masked gradients to identify poisonous gradients. Meanwhile, a verification mechanism is designed to check the mask, ensuring the mask’s validity in aggregation and preventing poisoning attacks that intentionally change the mask. Finally, we resist poisoning attacks by removing malicious gradients or lowering their weights in aggregation. Security analysis and experimental evaluation show that DefendFL can effectively detect and mitigate poisoning attacks while outperforming existing privacy-preserving detection works in efficiency.
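The cosine-similarity screening step mentioned in this abstract can be illustrated with a minimal sketch. It is not DefendFL itself: masking and mask verification are omitted, and the reference direction (the coordinate-wise median of the submitted gradients), the threshold, and the function names are assumptions chosen purely for illustration.

```python
import math
from statistics import median


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_and_aggregate(grads, threshold=0.0):
    """Keep gradients whose cosine similarity to a reference exceeds `threshold`.

    Assumption: the reference is the coordinate-wise median of all gradients.
    Returns the average of the kept gradients and the indices that were kept.
    """
    reference = [median(column) for column in zip(*grads)]
    kept_idx = [i for i, g in enumerate(grads) if cosine(g, reference) > threshold]
    if not kept_idx:
        raise ValueError("no gradient passed the similarity check")
    kept = [grads[i] for i in kept_idx]
    aggregate = [sum(column) / len(kept) for column in zip(*kept)]
    return aggregate, kept_idx


# Three roughly aligned gradients and one pointing the opposite way.
example = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [-5.0, -10.0]]
aggregate, kept_idx = filter_and_aggregate(example)
```

In this toy run the fourth gradient is excluded because it points away from the reference. Lowering a gradient's weight instead of removing it, which the abstract also mentions, would replace the hard threshold with a similarity-dependent weight.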

PaperID: 574,

Authors: Shengjia Wang, Zhiguo Wang, Xi-Le Zhao, Xiaojing Shen

Affiliations: College of Mathematics, Sichuan University, Chengdu, Sichuan, China; School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, China

Title: An Unbalanced Optimal Transport-Based Approach for Robust Dictionary Learning

Abstract:
Dictionary learning (DL) is a pivotal task in machine learning and signal processing, involving extracting representative features from a given dataset. However, conventional DL models are known to be highly sensitive to outliers. To circumvent this issue, we introduce a new and robust DL model based on unbalanced optimal transport (UOT). Compared to DL models based on conventional robust distances and the Wasserstein distance, our model not only captures and leverages the structural information within the data but also demonstrates strong resilience to outliers. By exploiting the structure of the proposed robust DL model, we develop a novel hybrid block coordinate descent (BCD) algorithm. The proposed algorithm maintains computational tractability by exploiting special block structures of the subproblems. In addition, we establish the convergence of our algorithm without the Lipschitz smoothness condition. Through extensive experimentation, we validate our theoretical results and demonstrate the effectiveness of the proposed method on synthetic data, MNIST data, the Olivetti faces dataset, and hyperspectral image (HSI) datasets.

PaperID: 575,

Authors: Sanket R. Jantre, Shrijita Bhattacharya, Tapabrata Maiti

Affiliations: Michigan State University, East Lansing, MI, USA

Title: Spike-and-Slab Shrinkage Priors for Structurally Sparse Bayesian Neural Networks

Abstract:
Network complexity and computational efficiency have become increasingly significant aspects of deep learning. Sparse deep learning addresses these challenges by recovering a sparse representation of the underlying target function by reducing heavily overparameterized deep neural networks. Specifically, deep neural architectures compressed via structured sparsity (e.g., node sparsity) provide low-latency inference, higher data throughput, and reduced energy consumption. In this article, we explore two well-established shrinkage techniques, Lasso and Horseshoe, for model compression in Bayesian neural networks (BNNs). To this end, we propose structurally sparse BNNs, which systematically prune excessive nodes with the following: 1) spike-and-slab group Lasso (SS-GL) and 2) SS group Horseshoe (SS-GHS) priors, and develop computationally tractable variational inference, including continuous relaxation of Bernoulli variables. We establish the contraction rates of the variational posterior of our proposed models as a function of the network topology, layerwise node cardinalities, and bounds on the network weights. We empirically demonstrate the competitive performance of our models compared with the baseline models in prediction accuracy, model compression, and inference latency.

PaperID: 576,

Authors: Zhengdao Shao, Liansheng Zhuang, Yihong Huang, Houqiang Li, Shafei Wang

Affiliations: School of Information Science and Technology, University of Science and Technology of China (USTC), Hefei, Anhui, China; Pengcheng Laboratory, Shenzhen, China

Title: Purified Policy Space Response Oracles for Symmetric Zero-Sum Games

Abstract:
Policy space response oracles (PSRO) is a promising tool to find an approximate Nash equilibrium (NE) in a two-player zero-sum game. It solves the equilibrium by iteratively expanding a small-scale meta-game formed by a restricted strategy population consisting of historical approximate best responses of the meta-games. However, since these best responses have a strong correlation with each other, existing PSRO and its variants often exhibit slow diversity growth of the strategy population, and thus suffer from poor exploration efficiency and a slow convergence rate. To address this problem, this article proposes Purified PSRO, which deliberately maintains a pure strategy population formed by pure strategy bases of approximate best responses. A novel module, namely non-best response suppression (NBRS), is introduced to calculate a pure strategy base with better orthogonality to expand the strategy population at each epoch. In this way, Purified PSRO can quickly increase the diversity of the strategy population, thus greatly enhancing the efficiency of exploration. Theoretically, we prove the convergence of Purified PSRO. Moreover, we introduce an early stop module to reduce computation cost, and give an upper bound on the exploitability when the algorithm stops early. Extensive experiments on random games of skill (RGoS) and real-world meta-games show that Purified PSRO can consistently outperform existing SOTA methods, sometimes by a large margin.

PaperID: 577,

Authors: Xiyou Fu, Xi Zhou, Yawen Fu, Pan Liu, Sen Jia

Affiliations: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China

Title: Progressive Semantic Enhancement Network for Hyperspectral and LiDAR Classification

Abstract:
The joint classification of hyperspectral image (HSI) and light detection and ranging (LiDAR) data is gaining attention for its improved classification accuracy. However, effectively integrating the rich spectral information of HSI and the elevation features of LiDAR has remained a challenge in multimodal fusion. This article proposes a novel approach called progressive semantic enhancement network (PSENet) for hyperspectral and LiDAR classification based on a progressive joint spatial-spectral attention mechanism. PSENet mainly comprises two modules: the spatial grouping constraint (SAGC) module and the spectral weighting constraint (SEWC) module. The SAGC module extracts multiscale features in the spatial domain, while the SEWC module focuses on enhancing semantic features in the spectral dimension. By applying the spatial and spectral constraint modules step by step to progressively enhance feature extraction, PSENet integrates rich information for a more refined classification of ground objects. Experimental results demonstrate that PSENet outperforms several state-of-the-art methods on three datasets. The SAGC and SEWC modules proposed in PSENet enable the effective integration of the spatial, spectral, and elevation information from HSI and LiDAR, providing a promising way to perform classification more accurately. The source codes of this work will be publicly available at http://szu-hsilab.com/.

PaperID: 578,

Authors: Ruohuan Fang, Guansong Pang, Wenjun Miao, Xiao Bai, Jin Zheng, Xin Ning

Affiliations: School of Computer Science and Engineering, Jiangxi Research Institute, Beihang University, Beijing, China; School of Computing and Information Systems, Singapore Management University, Bras Basah, Singapore; Laboratory of Artificial Neural Networks and High Speed Circuits, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China

Title: Unsupervised Recognition of Unknown Objects for Open-World Object Detection

Abstract:
Open-world object detection (OWOD) extends the object detection problem to a realistic and dynamic scenario, where a detection model is required to be capable of detecting both known and unknown objects and incrementally learning newly introduced knowledge. Current OWOD models detect the unknowns that exhibit similar features to the known objects, but they suffer from a severe label bias problem, i.e., they tend to detect all regions (including unknown object regions) that are dissimilar to the known objects as part of the background. To eliminate the label bias, this article proposes a novel module, namely the reconstruction error-based Weibull (REW) model, that learns an unsupervised discriminative model for recognizing true unknown objects based on prior knowledge of object occurrence frequency via Weibull modeling. The resulting model can be further refined by another module of our method, called REW-enhanced object localization network (ROLNet), which iteratively extends pseudo-unknown objects to the unlabeled regions. Experimental results show that our method 1) significantly outperforms the prior SOTA in detecting unknown objects while maintaining competitive performance in detecting known object classes on the MS COCO dataset and 2) achieves better generalization ability on the LVIS and Objects365 datasets. Code is available at https://github.com/frh23333/mepu-owod

PaperID: 579,

Authors: Wenmian Yang, Lizhi Cheng, Mohamed Ragab, Min Wu, Sinno Jialin Pan, Zhenghua Chen

Affiliations: Institute of Artificial Intelligence and Future Networks, Beijing Normal University (BNU), Zhuhai, China; China Telecom Cloud Technology Company Ltd., Shanghai, China; Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore; Department of Computer Science and Engineering, The Chinese University of Hong Kong (CUHK), Shatin, NT, Hong Kong

Title: A Virtual-Label-Based Hierarchical Domain Adaptation Method for Time-Series Classification

Abstract:
Unsupervised domain adaptation (UDA) is becoming a prominent solution for the domain-shift problem in many time-series classification tasks. With sequence properties, time-series data contain both local and sequential features, and the domain shift exists in both features. However, conventional UDA methods usually cannot distinguish those two features but mix them into one variable for direct alignment, which harms the performance. To address this problem, we propose a novel virtual-label-based hierarchical domain adaptation (VLH-DA) approach for time-series classification. Specifically, we first slice the original time-series data and introduce virtual labels to represent the type of each slice (called local patterns). With the help of virtual labels, we decompose the end-to-end (i.e., signal to time-series label) time-series task into two parts, i.e., signal sequence to local pattern sequence and local pattern sequence to time-series label. By decomposing the complex time-series UDA task into two simpler subtasks, the local features and sequential features can be aligned separately, making it easier to mitigate distribution discrepancies. Experiments on four public time-series datasets demonstrate that our VLH-DA outperforms all state-of-the-art (SOTA) methods.

PaperID: 580,

Authors: Lei Zhu, Xinliang Zhang, Hangzhou He, Qian Chen, Sha Li, Shuang Zeng, Yibao Zhang, Qiushi Ren, Yanye Lu

Affiliations: Department of Biomedical Engineering, Institute of Medical Technology, the Health Science Center, National Biomedical Imaging Center, Peking University, Beijing, China; Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital and Institute, Beijing, China

Title: Branches Mutual Promotion for End-to-End Weakly Supervised Semantic Segmentation

Abstract:
End-to-end weakly supervised semantic segmentation (E2E-WSSS) aims at optimizing a segmentation model in a single-stage training process based on only image annotations. Existing methods adopt an online-trained classification branch to provide pseudo annotations for supervising the segmentation branch. However, this strategy makes the classification branch dominate the whole concurrent training process, hindering these two branches from assisting each other. In our work, we treat these two branches equally by viewing them as diverse ways to generate the segmentation map, and add interactions on both their supervision and operation to achieve mutual promotion. For this purpose, a bidirectional supervision mechanism is elaborated to force the consistency between the outputs of these two branches. Thus, the segmentation branch can also give feedback to the classification branch to enhance the quality of localization seeds. Moreover, our method also designs interaction operations between these two branches to exchange their knowledge to assist each other. Experiments indicate our work outperforms existing end-to-end weakly supervised segmentation methods. Codes are available at https://github.com/zh460045050/BMP-WSSS.

PaperID: 581,

Authors: Mengke Lian, Zhenyuan Guo, Xiaoxuan Wang, Shiping Wen, Tingwen Huang

Affiliations: School of Mathematics, Hunan University, Changsha, China; Research Institute of HNU in Chongqing, Chongqing, China; School of Automation, Nanjing University of Information Science and Technology, Nanjing, China; Australian AI Institute, Faculty of Engineering Information Technology, University of Technology Sydney, Ultimo, Australia; Science Program, Texas A&M University at Qatar, Doha, Qatar

Title: Distributed Algorithms for Linear Equations Over General Directed Networks

Abstract:
This article deals with linear equations of the form Ax = b. By reformulating the original problem as an unconstrained optimization problem, we first provide a gradient-based distributed continuous-time algorithm over weight-balanced directed graphs, in which each agent only knows partial rows of the augmented matrix (A b). The algorithm is also applicable to time-varying networks. By estimating, in finite time, a right eigenvector corresponding to the zero eigenvalue of the out-Laplacian matrix, we further propose a distributed algorithm over weight-unbalanced communication networks. It is proved that each solution of the designed algorithms converges exponentially to an equilibrium point. Moreover, the convergence rate is given explicitly. For linear equations with no solution, these algorithms are used to obtain a least-squares solution in an approximate sense. These theoretical results are illustrated by four numerical examples.
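The reformulation of Ax = b as an unconstrained optimization problem, with each agent holding only some rows of (A b), can be sketched as follows. This is a simplified, centralized illustration under stated assumptions, not the article's algorithm: the local gradients are simply summed (no directed communication graph, no consensus terms), continuous time is replaced by fixed-step forward Euler updates, and the step size, iteration count, and function names are hypothetical.

```python
def local_gradient(rows, rhs, x):
    """Gradient of 0.5 * sum_k (a_k . x - b_k)^2 over the rows one agent holds."""
    grad = [0.0] * len(x)
    for a, b in zip(rows, rhs):
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        for j, aj in enumerate(a):
            grad[j] += residual * aj
    return grad


def solve(agents, dim, step=0.1, iters=500):
    """Gradient descent on the summed local least-squares costs.

    `agents` is a list of (rows, rhs) pairs; each pair is one agent's slice
    of the augmented matrix. The step size must be small enough for the
    problem at hand (an assumption of this sketch).
    """
    x = [0.0] * dim
    for _ in range(iters):
        total = [0.0] * dim
        for rows, rhs in agents:
            g = local_gradient(rows, rhs, x)
            total = [t + gi for t, gi in zip(total, g)]
        x = [xi - step * ti for xi, ti in zip(x, total)]
    return x


# Two agents, each knowing one row of the system
#   2*x0 + 1*x1 = 3
#   1*x0 + 3*x1 = 5
agents = [([[2.0, 1.0]], [3.0]), ([[1.0, 3.0]], [5.0])]
x = solve(agents, dim=2)  # approaches the exact solution (0.8, 1.4)
```

Because the cost is a sum of squared residuals, the same iteration applied to an inconsistent system moves toward a least-squares solution, which matches the role the abstract assigns to the algorithms when no exact solution exists.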

PaperID: 582,

Authors: Kang Liu, Zhenhua Huang, Chang-Dong Wang, Beibei Gao, Yunwen Chen

Affiliations: School of Computer Science, South China Normal University, Guangzhou, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; School of Philosophy and Social Development, South China Normal University, Guangzhou, China; Research and Development Department, DataGrand Inc., Shanghai, China

Title: Fine-Grained Learning Behavior-Oriented Knowledge Distillation for Graph Neural Networks

Abstract:
Knowledge distillation (KD), as an effective compression technology, is used to reduce the resource consumption of graph neural networks (GNNs) and facilitate their deployment on resource-constrained devices. Numerous studies exist on GNN distillation; however, the impacts of knowledge complexity and differences in learning behavior between teachers and students on distillation efficiency remain underexplored. We propose a KD method for fine-grained learning behavior (FLB), comprising two main components: feature knowledge decoupling (FKD) and teacher learning behavior guidance (TLBG). Specifically, FKD decouples the intermediate-layer features of the student network into two types: teacher-related features (TRFs) and downstream features (DFs), enhancing knowledge comprehension and learning efficiency by guiding the student to simultaneously focus on these features. TLBG maps the teacher model’s learning behaviors to provide reliable guidance for correcting deviations in student learning. Extensive experiments across eight datasets and 12 baseline frameworks demonstrate that FLB significantly enhances the performance and robustness of student GNNs within the original framework.

PaperID: 583,

Authors: Simiao Lai, Chang Liu, Dong Wang, Huchuan Lu

Affiliations: School of Information and Communication Engineering, Dalian University of Technology, Dalian, China

Title: Refocus the Attention for Parameter-Efficient Thermal Infrared Object Tracking

Abstract:
Introducing deep trackers to thermal infrared (TIR) tracking is hampered by the scarcity of large training datasets. To alleviate this predicament, a common approach is full fine-tuning (FFT) based on pretrained RGB parameters. Nevertheless, due to its inefficient training pattern and the risk of representation collapse, some parameter-efficient fine-tuning (PEFT) alternatives have been promoted recently. However, the existing PEFT algorithms typically work in a bottom-up way, where their attention relies solely on the input and lacks the capability of task-guided top-down attention, which provides a task-relevant representation, as in the human visual perception system. In this article, we introduce ReFocus, a new PEFT method that adapts the pretrained RGB foundation tracking model to the downstream TIR tracking task through the guidance of high-level task-specific signals in a top-down attention manner. By freezing the entire foundation model and only training query-guided feature selection and top-down blocks, ReFocus achieves state-of-the-art (SOTA) TIR tracking performance while maintaining training efficiency. Extensive experiments on five TIR tracking benchmarks demonstrate that ReFocus significantly improves the performance of the foundation tracker. In addition, further ablation studies show the effectiveness and flexible adaptability of the proposed method to lighter foundation models and different tracking frameworks. Compared to FFT and other bottom-up PEFT paradigms, such as head probe, low-rank adaptation (LoRA), and adapter, our method achieves comparable or superior performance with fewer trainable parameters and shows an advantage in learning stability.

PaperID: 584,

Authors: Zhengrong Xiang, Pingchuan Li, Wencheng Zou, Choon Ki Ahn

Affiliations: School of Automation, Nanjing University of Science and Technology, Nanjing, China; School of Electrical Engineering, Korea University, Seoul, South Korea

Title: Data-Based Optimal Switching and Control With Admissibility Guaranteed Q-Learning

Abstract:
This article addresses the data-based optimal switching and control codesign for discrete-time nonlinear switched systems via a two-stage approximate dynamic programming (ADP) algorithm. Through offline policy improvement and policy evaluation, the proposed algorithm iteratively determines the optimal hybrid control policy using system input/output data. Moreover, a rigorous proof of convergence is given for the two-stage ADP algorithm. Admissibility, an essential property of the hybrid control policy, must be ensured for practical application. To this end, the properties of the hybrid control policies are analyzed and an admissibility criterion is obtained. To realize the proposed Q-learning algorithm, an actor-critic neural network (NN) structure that employs multiple NNs to approximate the Q-functions and control policies for different subsystems is adopted. By applying the proposed admissibility criterion, the obtained hybrid control policy is guaranteed to be admissible. Finally, two numerical simulations verify the effectiveness of the proposed algorithm.

PaperID: 585,

Authors: Xun Shen, Tinghui Ouyang, Kazumune Hashimoto, Yuhu Wu

Affiliations: Graduate School of Engineering, Osaka University, Suita, Osaka, Japan; Center for Computational Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan; School of Control Science and Engineering, Dalian University of Technology, Dalian, China

Title: Sample-Based Continuous Approximate Method for Constructing Interval Neural Network

Abstract:
In safety-critical engineering applications, such as robust prediction against adversarial noise, it is necessary to quantify neural networks’ uncertainty. Interval neural networks (INNs) are effective models for uncertainty quantification, giving an interval of predictions instead of a single value for a corresponding input. This article formulates the problem of training an INN as a chance-constrained optimization problem. The optimal solution of the formulated chance-constrained optimization naturally forms an INN that gives the tightest interval of predictions with a required confidence level. Since the chance-constrained optimization problem is intractable, a sample-based continuous approximate method is used to obtain approximate solutions to it. We prove the uniform convergence of the approximation, showing that it yields an optimal INN consistent with that of the original problem. Additionally, we investigate the reliability of the approximation with finite samples, giving a probability bound for constraint violation. Through a numerical example and an application case study of anomaly detection in wind power data, we evaluate the effectiveness of the proposed INN against existing approaches, including Bayesian neural networks, highlighting its capability to significantly improve the performance of applying INNs for regression and unsupervised anomaly detection.

PaperID: 586,

Authors: Yamin Sepehri, Pedram Pad, Ahmet Caner Yüzügüler, Pascal Frossard, L. Andrea Dunbar

Affiliations: Centre Suisse d’Electronique et de Microtechnique (CSEM), Neuchâtel, Switzerland; Signal Processing Laboratory (LTS), École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

Title: Hierarchical Training of Deep Neural Networks Using Early Exiting

Abstract:
Deep neural networks (DNNs) provide state-of-the-art accuracy for vision tasks, but they require significant resources for training. Thus, they are trained on cloud servers far from the edge devices that acquire the data. This issue increases communication cost, runtime, and privacy concerns. In this study, a novel hierarchical training method for DNNs is proposed that uses early exits in an architecture divided between edge and cloud workers to reduce the communication cost, training runtime, and privacy concerns. The method proposes a brand-new use case for early exits to separate the backward pass of neural networks between the edge and the cloud during the training phase. We address the issues of most available methods that, due to the sequential nature of the training phase, cannot train the levels of the hierarchy simultaneously or do so at the cost of compromising privacy. In contrast, our method can use both edge and cloud workers simultaneously, does not share the raw input data with the cloud, and does not require communication during the backward pass. Several simulations and on-device experiments for different neural network architectures demonstrate the effectiveness of this method. It is shown that the proposed method reduces the training runtime for VGG-16 and ResNet-18 architectures by 29% and 61% in CIFAR-10 classification and by 25% and 81% in Tiny ImageNet classification, respectively, when the communication with the cloud is done over a low bit rate channel. This gain in runtime is achieved while the accuracy drop is negligible. This method is advantageous for online learning of high-accuracy DNNs on sensor-holding low-resource devices such as mobile phones or robots as a part of an edge–cloud system, making them more flexible in facing new tasks and classes of data.

PaperID: 587,

Authors: Pengfei Zhu, Jialu Li, Zhe Dong, Qinghua Hu, Xiao Wang, Qilong Wang

Affiliations: College of Intelligence and Computing, Tianjin University, Tianjin, China; Baidu Company, Beijing, China; School of Software, Beihang University, Beijing, China

Title: CCP-GNN: Competitive Covariance Pooling for Improving Graph Neural Networks

Abstract:
Graph neural networks (GNNs) have advanced graph classification tasks, where the global pooling that generates graph representations by summarizing node features plays a critical role in the final performance. Most of the existing GNNs are built with global average pooling (GAP) or its variants, which, however, do not fully consider node specificity and neglect the rich statistics inherent in node features, limiting the classification performance of GNNs. Therefore, this article proposes a novel competitive covariance pooling (CCP) based on an observation of graph structures, i.e., graphs can generally be identified by a (small) key subset of nodes. To this end, our CCP generates node-level second-order representations to explore rich statistics inherent in node features, which are fed to a competitive-based attention module for effectively discovering key nodes through learning node weights. Subsequently, our CCP aggregates node-level second-order representations in conjunction with node weights by summation to produce a covariance representation for each graph, while an iterative matrix normalization is introduced to account for the geometry of covariances. Note that our CCP can be flexibly integrated with various GNNs (namely CCP-GNN) to improve the performance of graph classification with little computational cost. The experimental results on seven graph-level benchmarks show that our CCP-GNN is superior to or competitive with state-of-the-art methods. Our code is available at https://github.com/Jillian555/CCP-GNN.
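The aggregation step described in this abstract, summing node-level second-order representations weighted by node weights, can be sketched in a few lines. This is only an illustration under assumptions: the second-order representation of a node is taken to be the outer product of its feature vector with itself, the node weights are passed in rather than learned by the attention module, and the iterative matrix normalization is omitted. The function name is hypothetical.

```python
def weighted_second_order_pooling(node_features, node_weights):
    """Return sum_i w_i * x_i x_i^T as a d x d nested list.

    Assumption: each node's second-order representation is the outer
    product of its feature vector with itself.
    """
    d = len(node_features[0])
    pooled = [[0.0] * d for _ in range(d)]
    for x, w in zip(node_features, node_weights):
        for r in range(d):
            for c in range(d):
                pooled[r][c] += w * x[r] * x[c]
    return pooled


# Two nodes with 2-dimensional features; the first node is weighted higher.
features = [[1.0, 0.0], [0.0, 2.0]]
weights = [0.5, 0.25]
graph_repr = weighted_second_order_pooling(features, weights)
```

With uniform weights this is an unweighted sum of outer products, so the node weights are what let a few key nodes dominate the graph representation. The result is always a symmetric d x d matrix, regardless of the number of nodes.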

PaperID: 588,

Authors: Kailing Guo, Zhenquan Lin, Canyang Chen, Xiaofen Xing, Fang Liu, Xiangmin Xu

Affiliations: School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China; School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou, China; Pazhou Laboratory, Guangzhou, China

Title: Compact Model Training by Low-Rank Projection With Energy Transfer

Abstract:
Low-rankness plays an important role in traditional machine learning but is not so popular in deep learning. Most previous low-rank network compression methods compress networks by approximating pretrained models and retraining. However, the optimal solution in the Euclidean space may be quite different from the one under a low-rank constraint, so a well-pretrained model is not a good initialization for the model with low-rank constraints. Thus, the performance of a low-rank compressed network degrades significantly. Compared with other network compression methods such as pruning, low-rank methods have attracted less attention in recent years. In this article, we devise a new training method, low-rank projection with energy transfer (LRPET), that trains low-rank compressed networks from scratch and achieves competitive performance. We propose to alternately perform stochastic gradient descent training and projection of each weight matrix onto the corresponding low-rank manifold. Compared to retraining on the compact model, this enables full utilization of model capacity since the solution space is relaxed back to the Euclidean space after projection. The matrix energy (the sum of squares of singular values) reduction caused by projection is compensated by energy transfer. We uniformly transfer the energy of the pruned singular values to the remaining ones. We theoretically show that energy transfer eases the trend of gradient vanishing caused by projection. In modern networks, a batch normalization (BN) layer can be merged into the previous convolution layer for inference, thereby influencing the optimal low-rank approximation (LRA) of the previous layer. We propose BN rectification to cut off its effect on the optimal LRA, which further improves the performance. Comprehensive experiments on CIFAR-10 and ImageNet show that our method is superior to other low-rank compression methods and also outperforms recent state-of-the-art pruning methods. For object detection and semantic segmentation, our method still achieves good compression results. In addition, we combine LRPET with quantization and hashing methods and achieve even better compression than the original single method. We further apply it to Transformer-based models to demonstrate its transferability. Our code is available at https://github.com/BZQLin/LRPET.
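The idea of compensating the energy lost by low-rank projection can be illustrated on the singular values alone. The sketch below is one possible reading, not the authors' implementation: "uniform" transfer is interpreted as multiplying every kept singular value by the same factor so that the sum of squares is unchanged, and the singular values are assumed to come from an SVD computed elsewhere. The function name is hypothetical.

```python
import math


def project_with_energy_transfer(singular_values, rank):
    """Keep the `rank` largest singular values and rescale them so that the
    matrix energy (sum of squared singular values) matches the original.

    Assumption: energy is transferred via one common scaling factor.
    """
    ordered = sorted(singular_values, reverse=True)
    kept = ordered[:rank]
    total_energy = sum(v * v for v in ordered)
    kept_energy = sum(v * v for v in kept)
    if kept_energy == 0.0:
        return list(kept)  # nothing to rescale
    scale = math.sqrt(total_energy / kept_energy)
    return [v * scale for v in kept]


# Singular values of a hypothetical weight matrix, projected to rank 2.
original = [3.0, 2.0, 1.0, 1.0]
compensated = project_with_energy_transfer(original, rank=2)
```

Recombining the compensated values with the corresponding leading singular vectors would give the projected weight matrix. Plain truncation would instead leave the kept values at 3.0 and 2.0 and reduce the energy from 15 to 13.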

PaperID: 589,

Authors: Quan-Yong Fan, Meiying Cai, Bin Xu

Affiliations: School of Automation, Northwestern Polytechnical University, Xi’an, China

Title: An Improved Prioritized DDPG Based on Fractional-Order Learning Scheme

Abstract:
Although the deep deterministic policy gradient (DDPG) algorithm has received widespread attention for its powerful functionality and applicability to large-scale continuous control, it suffers from problems such as low sample utilization efficiency and insufficient exploration. Therefore, an improved DDPG is presented in this article to overcome these challenges. First, an optimizer based on the fractional gradient is introduced into the algorithm's networks, which is conducive to increasing the speed and accuracy of training convergence. On this basis, high-value experience replay based on weight-changed priority is proposed to improve sample utilization efficiency, and an optimized exploration strategy for the boundary action space is adopted to explore the environment more thoroughly. Finally, the proposed method is tested through experiments on the Gym and PyBullet platforms. According to the results, our method speeds up the learning process and obtains higher average rewards than other algorithms.

PaperID: 590,

Authors: Bosen Lian, Wenqian Xue, Frank L. Lewis, Ali Davoudi

Affiliations: Electrical and Computer Engineering Department, Auburn University, Auburn, AL, USA; Electrical and Computer Engineering Department, University of Florida, Gainesville, FL, USA; The University of Texas at Arlington Research Institute, Fort Worth, TX, USA

Title: Inverse Value Iteration and Q-Learning: Algorithms, Stability, and Robustness

Abstract:
This article proposes a data-driven model-free inverse Q-learning algorithm for continuous-time linear quadratic regulators (LQRs). Using an agent’s trajectories of states and optimal control inputs, the algorithm reconstructs its cost function that captures the same trajectories. This article first poses a model-based inverse value iteration scheme using the agent’s system dynamics. Then, an online model-free inverse Q-learning algorithm is developed to recover the agent’s cost function only using the demonstrated trajectories. It is more efficient than the existing inverse reinforcement learning (RL) algorithms as it avoids the repetitive RL in inner loops. The proposed algorithms do not need initial stabilizing control policies and solve for unbiased solutions. The proposed algorithm’s asymptotic stability, convergence, and robustness are guaranteed. Theoretical analysis and simulation examples show the effectiveness and advantages of the proposed algorithms.

PaperID: 591,

Authors: Xiaojing Ge, Rui Yan, Xiangbo Shu, Keke Chen, Wei Tian, Guo-Sen Xie

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; Department of Computer Science and Technology, Nanjing University, Nanjing, China; School of Computer Science, South Central Minzu University, Wuhan, China

Title: Coarse-Fine Nested Network for Weakly Supervised Group Activity Recognition

Abstract:
Weakly supervised group activity recognition (WSGAR) aims at identifying the overall behavior of multiple persons without any fine-grained supervision information (including individual position and action label). Traditional methods usually adopt a person-to-whole way: detect persons via off-the-shelf detectors, obtain person-level features, and integrate them into group-level features for training the classifier. However, these methods are inflexible because they rely heavily on the quality of the detectors. To get rid of the detector, recent works learn several prototype tokens from noisy grid features with learnable weights directly, which treat all the local visual information equally and bring in redundant and ambiguous information to some extent. To this end, we propose a novel coarse-fine nested network (CFNN) to coarsely localize the key visual patches of activity and further finely learn the local features, as well as the global features. Specifically, we design a nested interactor (NI) to progressively model the spatiotemporal interactions of the learnable global token. According to the cue of spatial interaction in NI, we localize several key visual patches via a new coarse-grained spatial localizer (CSL). Then, we finally encode these localized visual patches with the help of global spatiotemporal dependency via a new fine-grained spatiotemporal selector (FSS). Extensive experiments on Volleyball and NBA datasets demonstrate the effectiveness of the proposed CFNN compared with the existing competitive methods. Code is available at: https://github.com/gexiaojingshelby/CFNN

PaperID: 592,

Authors: Xing Zhao, Haoran Liang, Ronghua Liang

Affiliations: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China

Title: Position Fusing and Refining for Clear Salient Object Detection

Abstract:
Multilevel feature fusion plays a pivotal role in salient object detection (SOD). High-level features present rich semantic information but lack object position information, whereas low-level features contain object position information but are mixed with noise such as backgrounds. Appropriately addressing the gap between low- and high-level features is important in SOD. In this article, we first propose a global position embedding attention (GPEA) module to minimize the discrepancy between multilevel features. We extract the position information by utilizing the semantic information in high-level features to resist the noise in low-level features. An object refine attention (ORA) module is then introduced to further refine the features used to predict saliency maps, without any additional supervision, and to heighten discriminative regions near the salient object, such as boundaries. Moreover, we find that the saliency maps generated by the previous methods contain some blurry regions, and we design a pixel value (PV) loss to help the model generate saliency maps with improved clarity. Experimental results on five commonly used SOD datasets demonstrate that the proposed method is effective and outperforms the state-of-the-art approaches on multiple metrics.

PaperID: 593,

Authors: Wangli Hao, He Guan, Zhaoxiang Zhang

Affiliations: National Laboratory of Pattern Recognition (NLPR), Center for Research on Intelligent Perception and Computing (CRIPAC), Institute of Automation, Chinese Academy of Sciences (CASIA), University of Chinese Academy of Sciences (UCAS), Beijing, China; Center for Research on Intelligent Perception and Computing, Institute of Automation, University of Chinese Academy of Sciences, Beijing, China

Title: VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation

Abstract:
Considering both audio and visual modalities is helpful for understanding a video. In the face of harsh environmental interference or signal packet loss, automatically compensating for audio and vision is a challenging task. We propose a dynamic cross-modal visual-audio mutual generation model (VAMG), which includes audio to visual conversion, visual to audio conversion, audio self-generation, and visual self-generation. VAMG jointly optimizes modal reconstruction and adversarial constraints, effectively solving the problems of structural alignment and signal compensation in incomplete videos. We conducted an instrument-oriented and pose-oriented cross-modal audio-visual mutual generation experiment on the sub-University of Rochester Musical Performance dataset to verify the effectiveness of the model.

PaperID: 594,

Authors: Liyan Zhang, Guodong Du, Fan Liu, Huawei Tu, Xiangbo Shu

Affiliations: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; College of Computer and Information, Hohai University, Nanjing, China; Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC, Australia; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Title: Global-Local Multiple Granularity Learning for Cross-Modality Visible-Infrared Person Reidentification

Abstract:
Cross-modality visible–infrared person reidentification (VI-ReID), which aims to retrieve pedestrian images captured by both visible and infrared cameras, is a challenging but essential task for smart surveillance systems. The huge barrier between visible and infrared images has led to the large cross-modality discrepancy and intraclass variations. Most existing VI-ReID methods tend to learn discriminative modality-sharable features based on either global or part-based representations, lacking effective optimization objectives. In this article, we propose a novel global–local multichannel (GLMC) network for VI-ReID, which can learn multigranularity representations based on both global and local features. The coarse- and fine-grained information can complement each other to form a more discriminative feature descriptor. Besides, we also propose a novel center loss function that aims to simultaneously improve the intraclass cross-modality similarity and enlarge the interclass discrepancy to explicitly handle the cross-modality discrepancy issue and avoid the model fluctuating problem. Experimental results on two public datasets have demonstrated the superiority of the proposed method compared with state-of-the-art approaches in terms of effectiveness.

PaperID: 595,

Authors: Seongmin Lee, Suwoong Heo, Sanghoon Lee

Affiliations: Department of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea

Title: DMESH: A Structure-Preserving Diffusion Model for 3-D Mesh Denoising

Abstract:
Denoising diffusion models have shown a powerful capacity for generating high-quality image samples by progressively removing noise. Inspired by this, we present a diffusion-based mesh denoiser that progressively removes noise from a mesh. In general, the iterative algorithm of diffusion models attempts to manipulate the overall structure and fine details of target meshes simultaneously. For this reason, it is difficult to apply the diffusion process to a mesh denoising task that removes artifacts while maintaining a structure. To address this, we formulate a structure-preserving diffusion process. Instead of diffusing the mesh vertices to follow a zero-centered isotropic Gaussian distribution, we diffuse each vertex into a specific noise distribution, in which the entire structure can be preserved. In addition, we propose a topology-agnostic mesh diffusion model by projecting the vertex into multiple 2-D viewpoints to efficiently learn the diffusion using a deep network. This enables the proposed method to learn the diffusion of arbitrary meshes that have an irregular topology. Finally, the denoised mesh can be obtained via refinement based on 2-D projections obtained from reverse diffusion. Through extensive experiments, we demonstrate that our method outperforms the state-of-the-art mesh denoising methods in both quantitative and qualitative evaluations.

PaperID: 596,

Authors: Ruicong Zhi, Yicheng Meng, Junyi Hou, Jun Wan

Affiliations: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China; Department of Computer Science, School of Computing, National University of Singapore, Cluny Road, Singapore; State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA), and the School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), Beijing, China

Title: Dual Balanced Class-Incremental Learning With im-Softmax and Angular Rectification

Abstract:
Owing to their superior performance, exemplar-based methods with knowledge distillation (KD) are widely applied in class incremental learning (CIL). However, they suffer from two drawbacks: 1) data imbalance between the old/learned and new classes causes the bias of the new classifier toward the head/new classes and 2) deep neural networks (DNNs) suffer from distribution drift when learning sequential tasks, which results in narrowed feature space and deficient representation of old tasks. For the first problem, we analyze the insufficiency of softmax loss when dealing with the problem of data imbalance in theory and then propose the imbalance softmax (im-softmax) loss to relieve the imbalanced data learning, where we re-scale the output logits to underfit the head/new classes. For the second problem, we calibrate the feature space by incremental-adaptive angular margin (IAAM) loss. The new classes form a complete distribution in feature space, yet the old ones are squeezed. To recover the old feature space, we first compute the included angle of normalized features and normalized anchor prototypes, and use the angle distribution to represent the class distribution; then we replenish the old distribution with the deviation from the new. Each anchor prototype is predefined as a learnable vector for a designated class. The proposed im-softmax reduces the bias in the linear classification layer. IAAM rectifies the representation learning, reduces the intra-class distance, and enlarges the inter-class margin. Finally, we seamlessly combine the im-softmax and IAAM in an end-to-end training framework, called the dual balanced class incremental learning (DBL), for further improvements. Experiments demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on several benchmarks, such as CIFAR10, CIFAR100, Tiny-ImageNet, and ImageNet-100.
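
The included angle between a normalized feature and a normalized anchor prototype, as mentioned in the abstract above, can be illustrated with a minimal sketch. The function name `included_angle` and the plain-list vector representation are assumptions made for illustration; this is not the authors' IAAM loss, only the angle computation it builds on.

```python
import math


def included_angle(feature, prototype):
    """Angle in radians between a feature vector and an anchor prototype.

    Both vectors are L2-normalized first, so the dot product equals the
    cosine of the included angle. (Hypothetical helper, illustration only.)
    """
    def normalize(v):
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v]

    f = normalize(feature)
    p = normalize(prototype)
    cosine = sum(a * b for a, b in zip(f, p))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)
```

Collecting these angles per class gives an angle distribution of the kind the abstract uses to represent the class distribution.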

PaperID: 597,

Authors: Sina Ghiassian, Banafsheh Rafiee, Richard S. Sutton

Affiliations: Department of Computing Science, University of Alberta, Edmonton, AB, Canada; Reinforcement Learning and Artificial Intelligence (RLAI) Laboratory, University of Alberta, Edmonton, AB, Canada

Title: Off-Policy Prediction Learning: An Empirical Study of Online Algorithms

Abstract:
Off-policy prediction, that is, learning the value function for one policy from data generated while following another policy, is one of the most challenging problems in reinforcement learning. This article makes two main contributions: 1) it empirically studies 11 off-policy prediction learning algorithms with emphasis on their sensitivity to parameters, learning speed, and asymptotic error and 2) based on the empirical results, it proposes two step-size adaptation methods called Step-size Ratchet and Soft Step-size Ratchet that help the algorithm with the lowest error from the experimental study learn faster. Many off-policy prediction learning algorithms have been proposed in the past decade, but it remains unclear which algorithms learn faster than others. In this article, we empirically compare 11 off-policy prediction learning algorithms with linear function approximation on three small tasks: the Collision task, the Rooms task, and the High Variance Rooms task. The Collision task is a small off-policy problem analogous to that of an autonomous car trying to predict whether it will collide with an obstacle. The Rooms and High Variance Rooms tasks are designed such that learning fast in them is challenging. In the Rooms task, the product of importance sampling ratios can be as large as $2^{14}$. To control the high variance caused by the product of the importance sampling ratios, step size should be set small, which, in turn, slows down learning. The High Variance Rooms task is more extreme in that the product of the ratios can become as large as $2^{14} \times 25$. The algorithms considered are Off-policy TD($\lambda$), five Gradient-TD algorithms, two Emphatic-TD algorithms, Vtrace, and variants of Tree Backup and ABQ that are applicable to the prediction setting. We found that the algorithms’ performance is highly affected by the variance induced by the importance sampling ratios. Tree Backup($\lambda$), Vtrace($\lambda$), and ABTD($\zeta$) are not affected by the high variance as much as other algorithms, but they restrict the effective bootstrapping parameter in a way that is too limiting for tasks where high variance is not present. We observed that Emphatic TD($\lambda$) tends to have lower asymptotic error than other algorithms but might learn more slowly in some cases. Based on the empirical results, we propose two step-size adaptation algorithms, which we collectively refer to as the Ratchet algorithms, with the same underlying idea: keep the step-size parameter as large as possible and ratchet it down only when necessary to avoid overshoot. We show that the Ratchet algorithms are effective by comparing them with other popular step-size adaptation algorithms, such as the Adam optimizer.
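
To make the role of the importance sampling ratio concrete, the following is a minimal sketch of Off-policy TD($\lambda$) with linear function approximation in its commonly stated form. The function name, the tuple layout of `transitions`, and the single-episode assumption are illustrative choices, not details taken from the article; the Ratchet step-size methods are not shown because the abstract does not specify them.

```python
def off_policy_td_lambda(transitions, num_features, alpha, gamma, lam):
    """Off-policy TD(lambda) with linear function approximation (sketch).

    transitions: iterable of (x, reward, x_next, rho) for one episode, where
    x and x_next are feature lists and rho = pi(a|s) / b(a|s) is the
    importance sampling ratio of the action taken. (Assumed layout.)
    """
    w = [0.0] * num_features  # weight vector of the value estimate
    z = [0.0] * num_features  # eligibility trace
    for x, reward, x_next, rho in transitions:
        v = sum(wi * xi for wi, xi in zip(w, x))
        v_next = sum(wi * xi for wi, xi in zip(w, x_next))
        delta = reward + gamma * v_next - v  # TD error
        # The trace is scaled by rho at every step, so it carries a product
        # of ratios along the trajectory; this is the source of high variance.
        z = [rho * (gamma * lam * zi + xi) for zi, xi in zip(z, x)]
        w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w
```

Because the trace multiplies the ratios step by step, a run of 14 consecutive ratios of 2 with $\gamma\lambda = 1$ already yields a factor of $2^{14} = 16384$ on the update, which is why the step size `alpha` has to be kept small on such tasks.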

PaperID: 598,

Authors: Boyue Wang, Yujian Ma, Xiaoyan Li, Junbin Gao, Yongli Hu, Baocai Yin

Affiliations: Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, Beijing Artificial Intelligence Institute, Faculty of Information Technology, Beijing University of Technology, Beijing, China; Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Sydney, NSW, Australia

Title: Bridging the Cross-Modality Semantic Gap in Visual Question Answering

Abstract:
The objective of visual question answering (VQA) is to adequately comprehend a question and identify relevant contents in an image that can provide an answer. Existing approaches in VQA often combine visual and question features directly to create a unified cross-modality representation for answer inference. However, this kind of approach fails to bridge the semantic gap between visual and text modalities, resulting in a lack of alignment in cross-modality semantics and the inability to match key visual content accurately. In this article, we propose a model called the caption bridge-based cross-modality alignment and contrastive learning model (CBAC) to address the issue. The CBAC model aims to reduce the semantic gap between different modalities. It consists of a caption-based cross-modality alignment module and a visual-caption (V-C) contrastive learning module. By utilizing an auxiliary caption that shares the same modality as the question and has closer semantic associations with the visual, we are able to effectively reduce the semantic gap by separately matching the caption with both the question and the visual to generate pre-alignment features for each, which are then used in the subsequent fusion process. We also leverage the fact that V-C pairs exhibit stronger semantic connections compared to question-visual (Q-V) pairs to employ a contrastive learning mechanism on visual and caption pairs to further enhance the semantic alignment capabilities of single-modality encoders. Extensive experiments conducted on three benchmark datasets demonstrate that the proposed model outperforms previous state-of-the-art VQA models. Additionally, ablation experiments confirm the effectiveness of each module in our model. Furthermore, we conduct a qualitative analysis by visualizing the attention matrices to assess the reasoning reliability of the proposed model.

PaperID: 599,

Authors: Kewei Wu, Wenjie Luo, Zhao Xie, Dan Guo, Zhao Zhang, Richang Hong

Affiliations: School of Computer and Information, Hefei University of Technology, Hefei, China

Title: Ensemble Prototype Network For Weakly Supervised Temporal Action Localization

Abstract:
Weakly supervised temporal action localization (TAL) aims to localize the action instances in untrimmed videos using only video-level action labels. Without snippet-level labels, it is hard to assign all snippets accurate action/background categories. The main difficulties are the large variations brought by the unconstrained background snippets and multiple subactions in action snippets. The existing prototype model focuses on describing snippets by covering them with clusters (defined as prototypes). In this work, we argue that the clustered prototype covering snippets with simple variations still suffers from the misclassification of the snippets with large variations. We propose an ensemble prototype network (EPNet), which ensembles prototypes learned with consensus-aware clustering. The network stacks a consensus prototype learning (CPL) module and an ensemble snippet weight learning (ESWL) module as one stage and extends one stage to multiple stages in an ensemble learning way. The CPL module learns the consensus matrix by estimating the similarity of clustering labels between two successive clustering generations. The consensus matrix optimizes the clustering to learn consensus prototypes, which can predict the snippets with consensus labels. The ESWL module estimates the weights of the misclassified snippets using the snippet-level loss. The weights update the posterior probabilities of the snippets in the clustering to learn prototypes in the next stage. We use multiple stages to learn multiple prototypes, which can cover the snippets with large variations for accurate snippet classification. Extensive experiments show that our method achieves state-of-the-art performance among weakly supervised TAL methods on the THUMOS’14, ActivityNet v1.2, and ActivityNet v1.3 benchmark datasets.

PaperID: 600,

Authors: Zhengming Chen, Jie Qiao, Feng Xie, Ruichu Cai, Zhifeng Hao, Keli Zhang

Affiliations: School of Computer, Guangdong University of Technology, Guangzhou, China; School of Mathematics and Statistics, Beijing Technology and Business University, Beijing, China; School of Computer Science, Guangdong University of Technology, Guangzhou, China; College of Science, Shantou University, Shantou, Guangdong, China; Huawei Noah’s Ark Lab, Huawei, Shenzhen, China

Title: Testing Conditional Independence Between Latent Variables by Independence Residuals

Abstract:
Conditional independence (CI) testing is an important problem, especially in causal discovery. Most testing methods assume that all variables are fully observable and then test the CI among the observed data. Such an assumption is often untenable in applications dealing with, e.g., psychological analysis of mental health status and medical diagnosis (researchers need to consider the existence of latent variables in these scenarios), and typically adopted latent CI test schemes mainly suffer from robustness or efficiency issues. Accordingly, this article investigates the problem of testing CI between latent variables. To this end, we offer an auxiliary regression-based CI (AReCI) test by taking the measured variable as the surrogate variable of the latent variables to conduct the regression over the latent variables under the linear causal models, in which each latent variable has certain measured variables. Specifically, given a pair of latent variables $L_X$ and $L_Y$, and a corresponding latent variable set $\mathcal{L}_O$, $L_X \perp\!\!\!\perp L_Y \mid \mathcal{L}_O$ holds if and only if $A_{\{L_X\}} - \omega_1^\intercal A'_{\{\mathcal{L}_O\}}$ and $A_{\{L_Y\}} - \omega_2^\intercal A''_{\{\mathcal{L}_O\}}$ are statistically independent, where $A'$ and $A''$ are two disjoint subsets of the measured variables for the corresponding latent variables, $A'_{\{\mathcal{L}_O\}} \cap A''_{\{\mathcal{L}_O\}} = \emptyset$, $\omega_1$ is a parameter vector characterized from the cross covariance between $A_{\{L_X\}}$ and $A'_{\{\mathcal{L}_O\}}$, and $\omega_2$ is a parameter vector characterized from the cross covariance between $A_{\{L_Y\}}$ and $A''_{\{\mathcal{L}_O\}}$. We theoretically show that the AReCI test is capable of addressing both Gaussian and non-Gaussian data. In addition, we find that the well-known partial correlation test can be seen as a special case of the AReCI test. Finally, we devise a causal discovery method by using the AReCI test as the CI test. The experimental results on synthetic and real-world data illustrate the effectiveness of our method.
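
The partial correlation test mentioned above as a special case can be sketched for fully observed variables: regress each variable on the conditioning variable and correlate the residuals. This is the classical observed-variable construction, not the AReCI test itself; the function names and the restriction to a single conditioning variable are assumptions made for brevity.

```python
import math


def residuals(y, z):
    """Residuals of the least-squares regression of y on z (with intercept)."""
    n = len(y)
    mean_y = sum(y) / n
    mean_z = sum(z) / n
    cov_zy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    var_z = sum((zi - mean_z) ** 2 for zi in z)
    slope = cov_zy / var_z
    return [(yi - mean_y) - slope * (zi - mean_z) for yi, zi in zip(y, z)]


def partial_correlation(x, y, z):
    """Correlation between x and y after regressing out z from both."""
    rx = residuals(x, z)
    ry = residuals(y, z)
    # Residuals have zero mean, so their correlation is a normalized dot product.
    num = sum(a * b for a, b in zip(rx, ry))
    den = math.sqrt(sum(a * a for a in rx) * sum(b * b for b in ry))
    return num / den
```

A partial correlation close to zero is evidence for conditional independence only under linear-Gaussian assumptions; the abstract's claim is that testing independence of the residuals extends the idea to non-Gaussian data and to latent variables.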

PaperID: 601,

Authors: Cong Xu, Wei Zhang, Jun Wang, Min Yang

Affiliations: School of Mathematics and Information Sciences, Yantai University, Yantai, China; School of Computer Science and Technology, East China Normal University, Shanghai, China

Title: Understanding Adversarial Robustness From Feature Maps of Convolutional Layers

Abstract:
The adversarial robustness of a neural network mainly relies on two factors: model capacity and antiperturbation ability. In this article, we study the antiperturbation ability of the network from the feature maps of convolutional layers. Our theoretical analysis discovers that larger convolutional feature maps before average pooling can contribute to better resistance to perturbations, but the conclusion is not true for max pooling. It brings new inspiration to the design of robust neural networks and urges us to apply these findings to improve existing architectures. The proposed modifications are very simple and only require upsampling the inputs or slightly modifying the stride configurations of downsampling operators. We verify our approaches on several benchmark neural network architectures, including AlexNet, VGG, ResNet18, and PreActResNet18. Nontrivial improvements in terms of both natural accuracy and adversarial robustness can be achieved under various attack and defense mechanisms. The code is available at https://github.com/MTandHJ/rcm.
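
Both modifications named above, upsampling the input and reducing the stride of a downsampling operator, enlarge the feature map that reaches the pooling layer. The sketch below uses the standard output-size formula for a convolution without dilation; the function name and the example layer sizes are assumptions for illustration and are not taken from the article or its code.

```python
def conv_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of a convolution along one dimension (no dilation)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Hypothetical 3x3 downsampling convolution with padding 1 on a 32-pixel input.
baseline = conv_output_size(32, kernel_size=3, stride=2, padding=1)
# Option 1: keep the layer, upsample the input by a factor of 2.
upsampled = conv_output_size(64, kernel_size=3, stride=2, padding=1)
# Option 2: keep the input, reduce the stride from 2 to 1.
smaller_stride = conv_output_size(32, kernel_size=3, stride=1, padding=1)
```

In this example the baseline map is 16 wide, while either modification yields a map of width 32.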

PaperID: 602,

Authors: Qi Yao, Tengda Wei, Ping Lin, Linshan Wang

Affiliations: College of Science, China University of Petroleum (East China), Qingdao, China; School of Mathematics and Statistics, Shandong Normal University, Jinan, China; Department of Mathematics, University of Dundee, Dundee, U.K.; School of Mathematical Sciences, Ocean University of China, Qingdao, China

Title: Finite-Time Boundedness of Impulsive Delayed Reaction-Diffusion Stochastic Neural Networks

Abstract:
Considering the impulsive delayed reaction–diffusion stochastic neural networks (IDRDSNNs) with hybrid impulses, the finite-time boundedness (FTB) and finite-time contractive boundedness (FTCB) are investigated in this article. First, a novel delay integral inequality is presented. By integrating this inequality with the comparison principle, some sufficient conditions that ensure the FTB and FTCB of IDRDSNNs are obtained. This study demonstrates that the FTB of neural networks with hybrid impulses can be maintained, even in the presence of impulsive perturbations. And for a system that is not FTB due to impulsive perturbations, achieving FTB is possible through the implementation of appropriate impulsive control and optimization of the average impulsive intervals. In addition, to validate the practicality of our results, three illustrative examples are provided. In the end, these theoretical findings are successfully applied to image encryption.

PaperID: 603,

Authors: Qiujie Lv, Guanxing Chen, Ziduo Yang, Weihe Zhong, Calvin Yu-Chian Chen

Affiliations: Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong, China; Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan, Guangdong, China; AI for Science (AIS)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen, Guangdong, China

Title: Meta-MolNet: A Cross-Domain Benchmark for Few Examples Drug Discovery

Abstract:
Predicting the pharmacological activity, toxicity, and pharmacokinetic properties of molecules is a central task in drug discovery. Existing machine learning methods are transferred from one resource-rich molecular property to another data-scarce property in the same scaffold dataset. However, existing models may produce fragile and highly uncertain predictions for new scaffold molecules. Moreover, these models were tested on different benchmarks, which seriously affected the quality of their evaluation results. In this article, we introduce Meta-MolNet, a collection of benchmark data and algorithms, which is a standard benchmark platform for measuring model generalization and uncertainty quantification capabilities. Meta-MolNet manages a wide range of molecular datasets with a high ratio of molecules/scaffolds, which often leads to more difficult data shift and generalization problems. Furthermore, we propose a graph attention network based on cross-domain meta-learning, Meta-GAT, which uses bilevel optimization to learn meta-knowledge from the scaffold family molecular dataset in the source domain. Meta-GAT benefits from meta-knowledge that reduces the requirement of sample complexity to enable reliable predictions of new scaffold molecules in the target domain through internal iteration of a few examples. We evaluate existing methods as baselines for the community, and the Meta-MolNet benchmark demonstrates the effectiveness of measuring the proposed algorithm in domain generalization and uncertainty quantification. Extensive experiments demonstrate that the Meta-GAT model has state-of-the-art domain generalization performance and robustly estimates uncertainty under few-example constraints. By publishing AI-ready data, evaluation frameworks, and baseline results, we hope to see the Meta-MolNet suite become a comprehensive resource for the AI-assisted drug discovery community. Meta-MolNet is freely accessible at https://github.com/lol88/Meta-MolNet.

PaperID: 604,

Authors: Jinye Qu, Zeyu Gao, Tielin Zhang, Yanfeng Lu, Huajin Tang, Hong Qiao

Affiliations: State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Science (CASIA), Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), Beijing, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China

Title: Spiking Neural Network for Ultralow-Latency and High-Accurate Object Detection

Abstract:
Spiking Neural Networks (SNNs) have attracted significant attention for their energy-efficient and brain-inspired event-driven properties. Recent advancements, notably Spiking-YOLO, have enabled SNNs to undertake advanced object detection tasks. Nevertheless, these methods often suffer from increased latency and diminished detection accuracy, rendering them less suitable for latency-sensitive mobile platforms. Additionally, the conversion of artificial neural networks (ANNs) to SNNs frequently compromises the integrity of the ANNs’ structure, resulting in poor feature representation and heightened conversion errors. To address the issues of high latency and low detection accuracy, we introduce two solutions: timestep compression and spike-time-dependent integrated (STDI) coding. Timestep compression effectively reduces the number of timesteps required in the ANN-to-SNN conversion by condensing information. The STDI coding employs a time-varying threshold to augment information capacity. Furthermore, we have developed an SNN-based spatial pyramid pooling (SPP) structure, optimized to preserve the network’s structural efficacy during conversion. Utilizing these approaches, we present the ultralow latency and highly accurate object detection model, SUHD. SUHD exhibits exceptional performance on challenging datasets like PASCAL VOC and MS COCO, achieving a remarkable reduction of approximately 750 times in timesteps and a 30% enhancement in mean average precision (mAP) compared to Spiking-YOLO on MS COCO. To the best of our knowledge, SUHD is currently the deepest spike-based object detection model, achieving ultralow timesteps for lossless conversion.

PaperID: 605,

Authors: Yuqing Chen, Zhencai Shen, Daoliang Li, Ping Zhong, Yingyi Chen

Affiliations: College of Science, China Agricultural University, Beijing, China; College of Science and the National Innovation Center for Digital Fishery, China Agricultural University, Beijing, China; National Innovation Center for Digital Fishery, the Key Laboratory of Smart Farming Technologies for Aquatic Animal and Livestock, Ministry of Agriculture and Rural Affairs, Beijing Engineering and Technology Research Center for Internet of Things in Agriculture, and the College of Information and Electrical Engineering, China Agricultural University, Beijing, China

Title: Heterogeneous Domain Adaptation With Generalized Similarity and Dissimilarity Regularization

Abstract:
Heterogeneous domain adaptation (HDA) aims to address the transfer learning problems where the source domain and target domain are represented by heterogeneous features. The existing HDA methods based on matrix factorization have been proven to learn transferable features effectively. However, these methods only preserve the original neighbor structure of samples in each domain and do not use the label information to explore the similarity and separability between samples. This would not eliminate the cross-domain bias of samples and may mix cross-domain samples of different classes in the common subspace, misleading the discriminative feature learning of target samples. To tackle the aforementioned problems, we propose a novel matrix factorization-based HDA method called HDA with generalized similarity and dissimilarity regularization (HGSDR). Specifically, we propose a similarity regularizer by establishing the cross-domain Laplacian graph with label information to explore the similarity between cross-domain samples from the identical class. And we propose a dissimilarity regularizer based on the inner product strategy to expand the separability of cross-domain labeled samples from different classes. For unlabeled target samples, we keep their neighbor relationship to preserve the similarity and separability between them in the original space. Hence, the generalized similarity and dissimilarity regularization is built by integrating the above regularizers to facilitate cross-domain samples to form discriminative class distributions. HGSDR can more efficiently match the distributions of the two domains both from the global and sample viewpoints, thereby learning discriminative features for target samples. Extensive experiments on the benchmark datasets demonstrate the superiority of the proposed method against several state-of-the-art methods.

PaperID: 606,

Authors: Iyyakutti Iyappan Ganapathi, Fayaz Ali Dharejo, Sajid Javed, Syed Sadaf Ali, Naoufel Werghi

Affiliations: Department of Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates

Title: Unsupervised Dual Transformer Learning for 3-D Textured Surface Segmentation

Abstract:
Analysis of the 3-D texture is indispensable for various tasks, such as retrieval, segmentation, classification, and inspection of sculptures, knit fabrics, and biological tissues. A 3-D texture represents a locally repeated surface variation (SV) that is independent of the overall shape of the surface and can be determined using the local neighborhood and its characteristics. Existing methods mostly employ computer vision techniques that analyze a 3-D mesh globally, derive features, and then utilize them for classification or retrieval tasks. While several traditional and learning-based methods have been proposed in the literature, only a few have addressed 3-D texture analysis, and none have considered unsupervised schemes so far. This article proposes an original framework for the unsupervised segmentation of 3-D texture on the mesh manifold. The problem is approached as a binary surface segmentation task, where the mesh surface is partitioned into textured and nontextured regions without prior annotation. The proposed method comprises a mutual transformer-based system consisting of a label generator (LG) and a label cleaner (LC). Both models take geometric image representations of the surface mesh facets and label them as texture or nontexture using an iterative mutual learning scheme. Extensive experiments on three publicly available datasets with diverse texture patterns demonstrate that the proposed framework outperforms standard and state-of-the-art unsupervised techniques and performs reasonably well compared to supervised methods.

PaperID: 607,

Authors: Qian Guo, Yi Guo, Jin Zhao

Affiliations: Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China; Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China

Title: HRCL: Hierarchical Relation Contrastive Learning for Low-Resource Relation Extraction

Abstract:
Low-resource relation extraction (LRE) aims to extract the relationships between given entities from natural language sentences in low-resource application scenarios, which has been an incredibly challenging task due to the limited annotated corpora. Existing studies either leverage self-training schemes to expand the scale of labeled data, in which the error accumulation caused by the selection bias of pseudo-labels provokes a gradual drift problem in subsequent relation prediction, or utilize instance-wise contrastive learning, which fails to distinguish sentence pairs with similar semantics. To alleviate these defects, this article introduces a novel contrastive learning framework called hierarchical relation contrastive learning (HRCL) for LRE. HRCL leverages task-related instruction descriptions and schema constraints as prompts to generate high-level relation representations. To enhance the efficacy of contrastive learning, we further employ hierarchical affinity propagation clustering (HiPC) to derive hierarchical signals from relational feature space with a hierarchy cross-attention (HCA) mechanism and effectively optimize pair-level relation features through relation-wise contrastive learning. Exhaustive experiments have been conducted on five public relation extraction (RE) datasets in low-resource settings. The results demonstrate the effectiveness and robustness of HRCL, which outperforms the current state-of-the-art (SOTA) model by 6.56% on average in terms of B3F1. Our source code is publicly available at https://github.com/Phevos75/HRCLRE.

PaperID: 608,

Authors: Jingchen Li, Haobin Shi, Huarui Wu, Chunjiang Zhao, Kao-Shing Hwang

Affiliations: Information Technology Research Center, Beijing Academy of Agriculture and Forestry Science, Beijing, China; School of Computer Science, Northwestern Polytechnical University, Xian, Shaanxi, China; Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan

Title: Eliminating Primacy Bias in Online Reinforcement Learning by Self-Distillation

Abstract:
Excessive invalid explorations at the beginning of training expose the deep reinforcement learning process to the risk of overfitting, which results in spurious decisions that obstruct agents in the following states and explorations. This phenomenon is termed primacy bias in online reinforcement learning. This work systematically investigates the primacy bias in online reinforcement learning, discussing its causes and analyzing its characteristics. Besides, to learn a policy that generalizes to the following states and explorations, we develop an online reinforcement learning framework, termed self-distillation reinforcement learning (SDRL), based on knowledge distillation, allowing the agent to transfer the learned knowledge into a randomly initialized policy at regular intervals; the new policy network then replaces the original one in the following training. The core idea of this work is that distilling knowledge from the trained policy to another policy can filter biases out, generating a more generalized policy in the learning process. Moreover, to avoid the overfitting of the new policy due to excessive distillations, we add an additional loss in the knowledge distillation process, using L2 regularization to improve the generalization, and introduce a self-imitation mechanism to accelerate the learning on the current experiences. The results of several experiments in DMC and Atari 100k suggest that the proposal can eliminate primacy bias for reinforcement learning methods, and that the policy after knowledge distillation helps agents achieve higher scores more quickly.
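
To make the distillation step more concrete, here is a minimal sketch of a loss that pulls a freshly initialized student policy toward the trained teacher policy while penalizing large student weights. It assumes discrete actions, a KL divergence between action distributions, and a hypothetical coefficient `l2_coef`; the abstract only states that knowledge distillation is combined with L2 regularization, so the exact form used by SDRL may differ, and the self-imitation term is omitted here.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def distillation_loss(teacher_logits, student_logits, student_params, l2_coef=1e-4):
    """KL(teacher || student) averaged over states, plus an L2 weight penalty.

    Illustrative only: the exact loss used by SDRL is not given in the abstract.
    """
    log_p_t = log_softmax(np.asarray(teacher_logits, dtype=float))
    log_p_s = log_softmax(np.asarray(student_logits, dtype=float))
    # KL divergence per state, then mean over the batch of states.
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1).mean()
    # Sum of squared entries over all student parameter arrays.
    l2 = sum(float((np.asarray(p, dtype=float) ** 2).sum()) for p in student_params)
    return float(kl + l2_coef * l2)
```

In a training loop, one would minimize this loss for the randomly initialized network at each distillation interval and then swap it in for the original policy.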

PaperID: 609,

Authors: Qi Xu, Sibo Liu, Xuming Ran, Yaxin Li, Jiangrong Shen, Huajin Tang, Jian K. Liu, Gang Pan, Qiang Zhang

Affiliations: School of Computer Science and Technology, Dalian University of Technology, Dalian, China; Shanghai AI Laboratory, Shanghai, China; College of Computer Science and Technology, Zhejiang University, Xihu, Zhejiang, China; School of Computing, University of Leeds, Leeds, U.K.

Title: Robust Sensory Information Reconstruction and Classification With Augmented Spikes

Abstract:
Sensory information recognition is primarily processed through the ventral and dorsal visual pathways in the primate brain visual system, which exhibits layered feature representations bearing a strong resemblance to convolutional neural networks (CNNs), encompassing reconstruction and classification. However, existing studies often treat these pathways as distinct entities, focusing individually on pattern reconstruction or classification tasks, overlooking a key feature of biological neurons, the fundamental units for neural computation of visual sensory information. Addressing these limitations, we introduce a unified framework for sensory information recognition with augmented spikes. By integrating pattern reconstruction and classification within a single framework, our approach not only accurately reconstructs multimodal sensory information but also provides precise classification through definitive labeling. Experimental evaluations conducted on various datasets including video scenes, static images, dynamic auditory scenes, and functional magnetic resonance imaging (fMRI) brain activities demonstrate that our framework delivers state-of-the-art pattern reconstruction quality and classification accuracy. The proposed framework enhances the biological realism of multimodal pattern recognition models, offering insights into how the primate brain visual system effectively accomplishes the reconstruction and classification tasks through the integration of ventral and dorsal pathways.

PaperID: 610,

Authors: Zikun Zhou, Kaige Mao, Wenjie Pei, Hongpeng Wang, Yaowei Wang, Zhenyu He

Affiliations: Peng Cheng Laboratory, Shenzhen, China

Title: Reliability-Guided Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation

Abstract:
This article aims to solve the video object segmentation (VOS) task in a scribble-supervised manner, in which VOS models are not only initialized with sparse target scribbles for inference but also trained by sparse scribble annotations. Thus, the annotation burdens for both initialization and training can be substantially lightened. The difficulties of scribble-supervised VOS lie in two aspects: 1) it demands a strong reasoning ability to carefully segment the target given only a sparse initial target scribble and 2) it necessitates learning dense prediction from sparse scribble annotations during training, requiring powerful learning capability. In this work, we propose a reliability-guided hierarchical memory network (RHMNet) for this task, which segments the target in a stepwise expanding strategy w.r.t. the memory reliability level. To be specific, RHMNet maintains a reliability-guided memory bank. It first uses the high-reliability memory to locate the region with high reliability belonging to the target, i.e., highly similar to the initial target scribble. Then, it expands the located high-reliability region to the entire target conditioned on the region itself and all existing memories. In addition, we propose a scribble-supervised learning mechanism to facilitate the model learning for dense prediction. It exploits the pixel-level relations within a single frame and the instance-level variations across multiple frames to take full advantage of the scribble annotations in sequence training samples. The favorable performance on four popular benchmarks demonstrates that our method is promising. Our project is available at: https://github.com/mkg1204/RHMNet-for-SSVOS.

PaperID: 611,

Authors: Hong Yin, Jiang Zhong, Rongzhen Li, Jiaxing Shang, Chen Wang, Xue Li

Affiliations: College of Computer Science, Chongqing University, Chongqing, China; School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane, QLD, Australia

Title: High-Order Neighbors Aware Representation Learning for Knowledge Graph Completion

Abstract:
As a building block of knowledge acquisition, knowledge graph completion (KGC) aims at inferring missing facts in knowledge graphs (KGs) automatically. Previous studies mainly focus on graph convolutional network (GCN)-based KG embedding (KGE) to determine the representations of entities and relations, accordingly predicting missing triplets. However, most existing KGE methods suffer from limitations in predicting tail entities that are far away or even unreachable in KGs. This limitation can be attributed to the related high-order information being largely ignored. In this work, we focus on learning the information from the related high-order neighbors in KGs to improve the performance of prediction. Specifically, we first introduce a set of new nodes called pedal nodes to augment the KGs for facilitating message passing between related high-order entities, effectively injecting the information of high-order neighbors into entity representation. Additionally, we propose strength-guided graph neural networks to aggregate neighboring entity representations. To address the issue of transmitting irrelevant high-order information to entities through pedal nodes, which can potentially hurt entity representation, we further propose to dynamically integrate the aggregated representation of each node with its corresponding self-representation. Extensive experiments have been conducted on three benchmark datasets and the results demonstrate the superiority of our method compared to strong baseline models.

PaperID: 612,

Authors: Weixin An, Yuanyuan Liu, Fanhua Shang, Hongying Liu, Licheng Jiao

Affiliations: Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an, China; College of Intelligence and Computing, Tianjin University, Tianjin, China; Medical College, Tianjin University, Tianjin, China

Title: DEs-Inspired Accelerated Unfolded Linearized ADMM Networks for Inverse Problems

Abstract:
Many research works have shown that the traditional alternating direction multiplier methods (ADMMs) can be better understood by continuous-time differential equations (DEs). On the other hand, many unfolded algorithms directly inherit the traditional iterations to build deep networks. Although they achieve superior practical performance and a faster convergence rate than traditional counterparts, there is a lack of clear insight into unfolded network structures. Thus, we attempt to explore the unfolded linearized ADMM (LADMM) from the perspective of DEs, and design more efficient unfolded networks. First, we propose an unfolded Euler LADMM scheme and, inspired by the trapezoid discretization, design a new, more accurate Trapezoid LADMM scheme. For the convenience of implementation, we provide its explicit version via a prediction–correction strategy. Then, to expand the representation space of unfolded networks, we design an accelerated variant of our Euler LADMM scheme, which can be interpreted as second-order DEs with stronger representation capabilities. To fully explore this representation space, we design an accelerated Trapezoid LADMM scheme. To the best of our knowledge, this is the first work to explore a comprehensive connection with theoretical guarantees between unfolded ADMMs and first- (second-) order DEs. Finally, we instantiate our schemes as (A-)ELADMM and (A-)TLADMM with the proximal operators, and (A-)ELADMM-Net and (A-)TLADMM-Net with convolutional neural networks (CNNs). Extensive inverse problem experiments show that our Trapezoid LADMM schemes perform better than well-known methods.
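
For readers unfamiliar with the discretizations named above, the following block recalls the textbook Euler and trapezoid rules for a generic ODE, together with the explicit prediction-correction form of the trapezoid rule (Heun's method). The symbols f, h, and x_k are generic placeholders assumed for this illustration; the specific differential equations associated with LADMM are not stated in the abstract and are not reproduced here.

```latex
% Generic ODE \dot{x}(t) = f(x(t)) with step size h > 0 (placeholder symbols).
\begin{align}
  \text{Euler:} \quad
    & x_{k+1} = x_k + h\, f(x_k), \\
  \text{Trapezoid (implicit):} \quad
    & x_{k+1} = x_k + \tfrac{h}{2}\bigl(f(x_k) + f(x_{k+1})\bigr), \\
  \text{Prediction:} \quad
    & \tilde{x}_{k+1} = x_k + h\, f(x_k), \\
  \text{Correction:} \quad
    & x_{k+1} = x_k + \tfrac{h}{2}\bigl(f(x_k) + f(\tilde{x}_{k+1})\bigr).
\end{align}
```

The Euler rule has local truncation error of order h^2, while the trapezoid rule and its prediction-correction form have order h^3, which is the usual sense in which a trapezoid scheme is "more accurate".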

PaperID: 613,

Authors: Qi Yu, Xijun Liang, Mengzhen Li, Ling Jian

Affiliations: College of Science, China University of Petroleum, Qingdao, Shandong, China; School of Economics and Management, China University of Petroleum, Qingdao, Shandong, China

Title: NGDE: A Niching-Based Gradient-Directed Evolution Algorithm for Nonconvex Optimization

Abstract:
Nonconvex optimization issues are prevalent in machine learning and data science. While gradient-based optimization algorithms can rapidly converge and are dimension-independent, they may, unfortunately, fall into local optimal solutions or saddle points. In contrast, evolutionary algorithms (EAs) gradually adapt the population of solutions to explore global optimal solutions. However, this approach requires substantial computational resources to perform numerous fitness function evaluations, which poses challenges for high-dimensional optimization in particular. This study introduces a novel nonconvex optimization algorithm, the niching-based gradient-directed evolution (NGDE) algorithm, designed specifically for high-dimensional nonconvex optimization. The NGDE algorithm generates potential solutions and divides them into multiple niches to explore distinct areas within the feasible region. Subsequently, each individual creates candidate offspring using the gradient-directed mutation operator we designed. The convergence properties of the NGDE algorithm are investigated in two scenarios: accessing the full gradient and approximating the gradient with mini-batch samples. The experimental studies demonstrate the superior performance of the NGDE algorithm in minimizing multimodal optimization functions. Additionally, when applied to train the neural networks of LeNet-5, NGDE shows significantly improved classification accuracy, especially in smaller training sizes.

PaperID: 614,

Authors: Yunhao Li, Zhen Xiao, Lin Yang, Dan Meng, Xin Zhou, Heng Fan, Libo Zhang

Affiliations: School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China; State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; Turing Quantum Company, Beijing, China; OPPO Research Institute, Beijing, China; Department of Computer Science and Engineering, University of North Texas, Denton, TX, USA

Title: AttMOT: Improving Multiple-Object Tracking by Introducing Auxiliary Pedestrian Attributes

Abstract:
Multiobject tracking (MOT) is a fundamental problem in computer vision with numerous applications, such as intelligent surveillance and automated driving. Despite the significant progress made in MOT, pedestrian attributes, such as gender, hairstyle, body shape, and clothing features, which contain rich and high-level information, have been less explored. To address this gap, we propose a simple, effective, and generic method to predict pedestrian attributes to support general reidentification (Re-ID) embedding. We first introduce attribute multi-object tracking (AttMOT), a large, highly enriched synthetic dataset for pedestrian tracking, containing over 80k frames and six million pedestrian identity switches (IDs) with different times, weather conditions, and scenarios. To the best of the authors’ knowledge, AttMOT is the first MOT dataset with semantic attributes. Subsequently, we explore different approaches to fuse Re-ID embedding and pedestrian attributes, including attention mechanisms, which we hope will stimulate the development of attribute-assisted MOT. The proposed attribute-assisted method (AAM) demonstrates its effectiveness and generality on several representative pedestrian MOT benchmarks, including MOT17 and MOT20, through experiments on the AttMOT dataset. When applied to the state-of-the-art trackers, AAM achieves consistent improvements in multi-object tracking accuracy (MOTA), higher order tracking accuracy (HOTA), association accuracy (AssA), IDs, and IDF1 scores. For instance, on MOT17, the proposed method yields a +1.1 MOTA, +1.7 HOTA, and +1.8 IDF1 improvement when used with FairMOT. To further encourage related research, we release the data and code at https://github.com/HengLan/AttMOT.

PaperID: 615,

Authors: Junhao Dong, Yuan Wang, Xiaohua Xie, Jianhuang Lai, Yew-Soon Ong

Affiliations: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; School of Computer Science and Engineering, Nanyang Technological University, Jurong West, Singapore

Title: Generalizable and Discriminative Representations for Adversarially Robust Few-Shot Learning

Abstract:
Few-shot image classification (FSIC) is beneficial for a variety of real-world scenarios, aiming to construct a recognition system with limited training data. In this article, we extend the original FSIC task by incorporating defense against malicious adversarial examples. This can be an arduous challenge because numerous deep learning-based approaches remain susceptible to adversarial examples, even when trained with ample amounts of data. Previous studies on this problem have predominantly concentrated on the meta-learning framework, which involves sampling numerous few-shot tasks during the training stage. In contrast, we propose a straightforward but effective baseline via learning robust and discriminative representations without tedious meta-task sampling, which can further be generalized to unforeseen adversarial FSIC tasks. Specifically, we introduce an adversarial-aware (AA) mechanism that exploits feature-level distinctions between the legitimate and the adversarial domains to provide supplementary supervision. Moreover, we design a novel adversarial reweighting training strategy to ameliorate the imbalance among adversarial examples. To further enhance the adversarial robustness without compromising discriminative features, we propose the cyclic feature purifier during the postprocessing projection, which can reduce the interference of unforeseen adversarial examples. Furthermore, our method can obtain robust feature embeddings that maintain superior transferability, even when facing cross-domain adversarial examples. Extensive experiments and systematic analyses demonstrate that our method achieves state-of-the-art robustness as well as natural performance among adversarially robust FSIC algorithms on three standard benchmarks by a substantial margin.

PaperID: 616,

Authors: Jian Wang, Shujun Wu, Huaqing Zhang, Bin Yuan, Caili Dai, Nikhil R. Pal

Affiliations: College of Science, China University of Petroleum (East China), Qingdao, China; College of Petroleum Engineering, China University of Petroleum (East China), Qingdao, China; Electronics and Communication Sciences Unit and the Centre for Artificial Intelligence and Machine Learning, Indian Statistical Institute, Kolkata, India

Title: Universal Approximation Abilities of a Modular Differentiable Neural Network

Abstract:
Approximation ability is one of the most important topics in the field of neural networks (NNs). Feedforward NNs, activated by rectified linear units and some of their specific smoothed versions, provide universal approximators to convex as well as continuous functions. However, most of these networks are investigated empirically, or their characteristics are analyzed based on specific operation rules. Moreover, an adequate level of interpretability of the networks is missing as well. In this work, we propose a new class of network architectures, built with reusable neural modules (functional blocks), to supply differentiable and interpretable approximators for convex and continuous target functions. Specifically, first, we introduce a concrete model construction mechanism with particular blocks based on differentiable programming and the composition essence of the max operator, extending the scope of existing activation functions. Moreover, explicit block diagrams are provided for a clear understanding of the external architecture and the internal processing mechanism. Subsequently, the approximation behavior of the proposed network to convex functions and continuous functions is rigorously proved as well, by virtue of mathematical induction. Finally, numerous numerical experiments are conducted on a wide variety of problems, which exhibit the effectiveness and the superiority of the proposed model over some existing ones.
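
The role of the max operator in approximating convex functions can be illustrated with a classical construction: a convex function is the pointwise maximum of its tangent lines, and a maximum over many terms can be composed from pairwise maxima, each expressible with a rectified linear unit as max(a, b) = a + relu(b - a). The sketch below uses the nondifferentiable ReLU form and the target f(x) = x^2 purely for illustration; the function names are hypothetical, and the differentiable blocks proposed in the article are not reproduced here.

```python
def relu(v):
    """Rectified linear unit on a scalar."""
    return v if v > 0.0 else 0.0

def max2(a, b):
    """Pairwise max written with a ReLU: max(a, b) = a + relu(b - a)."""
    return a + relu(b - a)

def max_affine(x, pieces):
    """Evaluate max_i (slope_i * x + intercept_i) by folding the pairwise max."""
    slope, intercept = pieces[0]
    out = slope * x + intercept
    for slope, intercept in pieces[1:]:
        out = max2(out, slope * x + intercept)
    return out

def tangent_pieces(points):
    """Tangent lines of f(x) = x^2 at the given points: y = 2*t*x - t^2."""
    return [(2.0 * t, -t * t) for t in points]
```

With tangents placed at spacing d, the gap between x^2 and the max-affine approximation is at most (d/2)^2 inside the covered interval, so refining the grid of tangent points drives the error to zero.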

PaperID: 617,

Authors: Yuqi Feng, Zeqiong Lv, Hongyang Chen, Shangce Gao, Fengping An, Yanan Sun

Affiliations: College of Computer Science, Sichuan University, Chengdu, China; Research Center for Graph Computing, Zhejiang Laboratory, Hangzhou, China; Faculty of Engineering, University of Toyama, Toyama, Japan; School of Automation and Software Engineering, Shanxi University, Taiyuan, China

Title: LRNAS: Differentiable Searching for Adversarially Robust Lightweight Neural Architecture

Abstract:
Adversarial robustness is critical to deep neural networks (DNNs) in deployment. However, improving adversarial robustness often comes at the cost of a larger network size. Existing approaches to addressing this problem mainly focus on the combination of model compression and adversarial training. However, their performance heavily relies on neural architectures, which are typically designed manually and require extensive expertise. In this article, we propose a lightweight and robust neural architecture search (LRNAS) method to automatically search for adversarially robust lightweight neural architectures. Specifically, we propose a novel search strategy to quantify contributions of the components in the search space, based on which the beneficial components can be determined. In addition, we further propose an architecture selection method based on a greedy strategy, which can keep the model size while deriving sufficient beneficial components. Owing to these designs in LRNAS, the lightness, the natural accuracy, and the adversarial robustness can be collectively guaranteed for the searched architectures. We conduct extensive experiments on various benchmark datasets against state-of-the-art methods. The experimental results demonstrate that the proposed LRNAS method is superior at finding lightweight neural architectures that are both accurate and adversarially robust under popular adversarial attacks. Moreover, ablation studies are also performed, which reveal the validity of the individual components designed in LRNAS and their positive effects on the overall performance.

PaperID: 618,

Authors: Zhiming Zhang, Zhenyu Lei, MengChu Zhou, Hideyuki Hasegawa, Shangce Gao

Affiliations: Faculty of Engineering, University of Toyama, Toyama, Japan; School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou, China

Title: Complex-Valued Convolutional Gated Recurrent Neural Network for Ultrasound Beamforming

Abstract:
Ultrasound detection is a potent tool for the clinical diagnosis of various diseases due to its real-time, convenient, and noninvasive qualities. Yet, existing ultrasound beamforming and related methods face a big challenge to improve both the quality and speed of imaging for the required clinical applications. The most notable characteristic of ultrasound signal data is its spatial and temporal features. Because most signals are complex-valued, directly processing them by using real-valued networks leads to phase distortion and inaccurate output. In this study, for the first time, we propose a complex-valued convolutional gated recurrent (CCGR) neural network to handle ultrasound analytic signals with the aforementioned properties. The complex-valued network operations proposed in this study improve the beamforming accuracy of complex-valued ultrasound signals over traditional real-valued methods. Further, the proposed deep integration of convolution and recurrent neural networks makes a great contribution to extracting rich and informative ultrasound signal features. Our experimental results reveal its outstanding imaging quality over existing state-of-the-art methods. More significantly, its ultrafast processing speed of only 0.07 s per image promises considerable clinical application potential. The code is available at https://github.com/zhangzm0128/CCGR.
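
The point about complex-valued processing can be made concrete with a small sketch: a complex convolution is not two independent real convolutions on the real and imaginary parts, but four real convolutions combined according to the rule (a + ib)(c + id) = (ac - bd) + i(ad + bc). The function name and the 1-D setting are assumptions for illustration only; this is a generic complex convolution, not the CCGR layer itself.

```python
import numpy as np

def complex_conv1d(x, w):
    """Convolve complex x with complex w using only real-valued convolutions."""
    x = np.asarray(x, dtype=complex)
    w = np.asarray(w, dtype=complex)
    # Real part: Re(x)*Re(w) - Im(x)*Im(w); imaginary part: Re(x)*Im(w) + Im(x)*Re(w).
    real = np.convolve(x.real, w.real) - np.convolve(x.imag, w.imag)
    imag = np.convolve(x.real, w.imag) + np.convolve(x.imag, w.real)
    return real + 1j * imag
```

Dropping the cross terms, as happens when the real and imaginary channels are filtered separately by unrelated real kernels, changes the phase of the output, which is one way to see why real-valued processing of analytic signals can distort phase.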

PaperID: 619,

Authors: Yiqun Zhang, Zhenyue Qin, Saeed Anwar, Dongwoo Kim, Yang Liu, Pan Ji, Tom Gedeon

Affiliations: School of Computing, Australian National University (ANU), Canberra, ACT, Australia; Information and Computer Science Department, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia; Department of Computer Science and Engineering and the Graduate School of Artificial Intelligence, Pohang University of Science and Technology, Pohang, South Korea; Seeing Machines, Fyshwick, ACT, Australia; OPPO US Research Center, Palo Alto, CA, USA

Title: Position-Sensing Graph Neural Networks: Proactively Learning Nodes Relative Positions

Abstract:
Most existing graph neural networks (GNNs) learn node embeddings using the framework of message passing and aggregation. Such GNNs are incapable of learning relative positions between graph nodes within a graph. To empower GNNs with the awareness of node positions, some nodes are set as anchors. Then, using the distances from a node to the anchors, GNNs can infer relative positions between nodes. However, position-aware GNNs (P-GNNs) arbitrarily select anchors, leading to compromising position awareness and feature extraction. To eliminate this compromise, we demonstrate that selecting evenly distributed and asymmetric anchors is essential. On the other hand, we show that choosing anchors that can aggregate embeddings of all the nodes within a graph is NP-complete. Therefore, devising efficient optimal algorithms in a deterministic approach is practically not feasible. To ensure position awareness and bypass NP-completeness, we propose position-sensing GNNs (PSGNNs), learning how to choose anchors in a backpropagatable fashion. Experiments verify the effectiveness of PSGNNs against state-of-the-art GNNs, substantially improving performance on various synthetic and real-world graph datasets while enjoying stable scalability. Specifically, PSGNNs on average boost area under the curve (AUC) more than 14% for pairwise node classification and 18% for link prediction over the existing state-of-the-art position-aware methods. Our source code is publicly available at: https://github.com/ZhenyueQin/PSGNN.
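
The anchor idea mentioned above can be sketched in a few lines: pick a set of anchor nodes, compute each node's shortest-path distance to every anchor, and use a decreasing function of that distance as a positional feature. The transform 1 / (d + 1) used here is a common choice in position-aware GNNs and is an assumption for this example, as are the function names. The anchors are passed in by hand, whereas PSGNNs learn which anchors to choose.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node of an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def anchor_encoding(adj, anchors):
    """Map each node to [1 / (d(node, anchor) + 1) per anchor]; 0.0 if unreachable."""
    per_anchor = [bfs_distances(adj, a) for a in anchors]
    return {
        node: [1.0 / (d[node] + 1) if node in d else 0.0 for d in per_anchor]
        for node in adj
    }
```

Two nodes with identical local neighborhoods generally receive different encodings as long as their distances to at least one anchor differ, which is why the placement of anchors matters.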

PaperID: 620,

Authors: Christos Kyrkou

Affiliations: KIOS Research and Innovation Center of Excellence, University of Cyprus, Nicosia, Cyprus

Title: Toward Efficient Convolutional Neural Networks With Structured Ternary Patterns

Abstract:
High-efficiency deep learning (DL) models are necessary not only to facilitate their use in devices with limited resources but also to improve resources required for training. Convolutional neural networks (ConvNets) typically exert severe demands on local device resources and this conventionally limits their adoption within mobile and embedded platforms. This brief presents work toward utilizing static convolutional filters generated from the space of local binary patterns (LBPs) and Haar features to design efficient ConvNet architectures. These are referred to as Structured Ternary Patterns (STePs) and can be generated during network initialization in a systematic way instead of having learnable weight parameters thus reducing the total weight updates. The ternary values require significantly less storage and with the appropriate low-level implementation, can also lead to inference improvements. The proposed approach is validated using four image classification datasets, demonstrating that common network backbones can be made more efficient and provide competitive results. It is also demonstrated that it is possible to generate completely custom STeP-based networks that provide good trade-offs for on-device applications such as unmanned aerial vehicle (UAV)-based aerial vehicle detection. The experimental results show that the proposed method maintains high detection accuracy while reducing the trainable parameters by 40%–80%. This work motivates further research toward good priors for nonlearnable weights that can make DL architectures more efficient without having to alter the network during or after training.

PaperID: 621,

Authors: Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

Affiliations: Department of Electrical and Computer Engineering, University of California at Santa Barbara, Santa Barbara, CA, USA

Title: Asymptotic Behavior of Adversarial Training in Binary Linear Classification

Abstract:
Adversarial training using empirical risk minimization (ERM) is the state-of-the-art method for defense against adversarial attacks, that is, against small additive adversarial perturbations applied to test data leading to misclassification. Despite being successful in practice, understanding the generalization properties of adversarial training in classification remains widely open. In this article, we take the first step in this direction by precisely characterizing the robustness of adversarial training in binary linear classification. Specifically, we consider the high-dimensional regime where the model dimension grows with the size of the training set at a constant ratio. Our results provide exact asymptotics for both standard and adversarial test errors under general $\ell_q$-norm bounded perturbations ($q \ge 1$) in both discriminative binary models and generative Gaussian-mixture models with correlated features. We use our sharp error formulae to explain how the adversarial and standard errors depend upon the over-parameterization ratio, the data model, and the attack budget. Finally, by comparing with the robust Bayes estimator, our sharp asymptotics allow us to study the fundamental limits of adversarial training.
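
A standard fact helps explain why the linear setting is tractable: for a linear classifier, the worst-case norm-bounded perturbation has a closed form via Hölder's inequality, so the inner maximization in adversarial training reduces to shrinking the margin by the attack budget times a dual norm of the weights. The symbols w, x, y, \varepsilon, and \ell below are notation assumed for this illustration and are not taken from the article.

```latex
% w: weight vector; (x, y): sample with y \in \{-1, +1\}; \varepsilon: attack budget;
% p: dual exponent of q, i.e. 1/p + 1/q = 1 (notation assumed for illustration).
\begin{align}
  \min_{\|\delta\|_q \le \varepsilon} \; y\,\langle w, x + \delta \rangle
    &= y\,\langle w, x \rangle - \varepsilon\,\|w\|_p, \\
  \max_{\|\delta\|_q \le \varepsilon} \; \ell\bigl(y\,\langle w, x + \delta \rangle\bigr)
    &= \ell\bigl(y\,\langle w, x \rangle - \varepsilon\,\|w\|_p\bigr)
    \quad \text{for any nonincreasing loss } \ell .
\end{align}
```

For example, an $\ell_\infty$-bounded attack ($q = \infty$) penalizes the $\ell_1$ norm of the weights, and an $\ell_2$-bounded attack penalizes their $\ell_2$ norm.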

PaperID: 622,

Authors: Yuxin Dong, Tieliang Gong, Hong Chen, Chen Li

Affiliations: School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi, China; College of Science, Huazhong Agriculture University, Wuhan, China

Title: Efficient Approximations for Matrix-Based Rényi's Entropy on Sequential Data

Abstract:
The matrix-based Rényi’s entropy (MBRE) has recently been introduced as a substitute for the original Rényi’s entropy that can be directly obtained from data samples, avoiding the expensive intermediate step of density estimation. Despite its remarkable success in a broad range of information-related tasks, the computational cost of MBRE becomes a bottleneck for large-scale applications. The challenge, when facing sequential data, is further amplified due to the requirement of large-scale eigenvalue decomposition on multiple dense kernel matrices constructed by sliding windows in the region of interest, resulting in $O(mn^3)$ overall time complexity, where $m$ and $n$ denote the number and the size of windows, respectively. To overcome this issue, we adopt the static MBRE estimator together with a variance reduction criterion to develop randomized approximations for the target entropy, leading to high accuracy with substantially lower query complexity by utilizing the historical estimation results. Specifically, assuming that the changes of adjacent sliding windows are bounded by $\beta \ll 1$, which is easily satisfied in domains such as time-series analysis, we lower the complexity by a factor of $\sqrt{\beta}$. Polynomial approximation techniques are further adopted to support arbitrary $\alpha$ orders. In general, our algorithms achieve $O(mn^2\sqrt{\beta}st)$ total computational complexity, where $s, t \ll n$ denote the number of vector queries and the polynomial degrees, respectively. Theoretical upper and lower bounds are established in terms of the convergence rate for both $s$ and $t$, and large-scale experiments on both simulation and real-world data are conducted to validate the effectiveness of our algorithms. The results show that our methods achieve promising speedup with only a trivial loss in performance.
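
For reference, the exact quantity being approximated can be computed directly from the eigenvalues of a trace-normalized kernel matrix, and it is this eigendecomposition that costs cubic time per window. The sketch below is the plain baseline, not the randomized estimator proposed in the article; the normalization A = K / tr(K), the base-2 logarithm, and the function name are assumptions chosen to match the commonly used definition.

```python
import numpy as np

def matrix_renyi_entropy(K, alpha=2.0):
    """S_alpha(A) = log2(sum_i lambda_i(A)^alpha) / (1 - alpha), with A = K / tr(K).

    K is assumed to be a symmetric positive semidefinite kernel (Gram) matrix.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    K = np.asarray(K, dtype=float)
    A = K / np.trace(K)
    # Full symmetric eigendecomposition: the cubic-cost step for an n-by-n window.
    eigvals = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    return float(np.log2(np.sum(eigvals ** alpha)) / (1.0 - alpha))
```

Calling this function once per sliding window reproduces the cubic-per-window cost described above. An identity kernel on n samples gives the maximal value log2(n), and a rank-one kernel gives zero.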

PaperID: 623,

Authors: Xin Luo, Zhibin Li, Wenbin Yue, Shuai Li

Affiliations: College of Computer and Information Science, Southwest University, Chongqing, China; School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; China North Vehicle Research Institute, Beijing, China; Faculty of Information Technology and Electrical Engineering, University of Oulu, Oulu, Finland

Title: A Calibrator Fuzzy Ensemble for Highly-Accurate Robot Arm Calibration

Abstract:
The absolute positioning accuracy of an industrial robot arm is vital for advancing manufacturing-related applications such as automatic assembly, and it can be improved via data-driven approaches to robot arm calibration. Existing data-driven calibrators have demonstrated their efficiency in addressing robot arm calibration. However, most of them are single learning models that are easily affected by an insufficient representation of the solution space and therefore suffer from a loss of calibration accuracy. To address this issue, this study proposes a calibrator fuzzy ensemble (CFE) with twofold ideas: 1) implementing eight data-driven calibrators relying on different sophisticated machine learning algorithms for an industrial robot arm, which guarantees the accuracy of individual base models and 2) developing a fuzzy ensemble of the obtained eight diversified calibrators to obtain high calibration accuracy for an industrial robot arm. Extensive experiments on an ABB IRB120 industrial robot, implemented in MATLAB, demonstrate that, compared with state-of-the-art calibrators, CFE decreases the maximum error by 8.59%. Hence, it has great potential for real applications.

PaperID: 624,

Authors: Haitong Ma, Changliu Liu, Shengbo Eben Li, Sifa Zheng, Wenchao Sun, Jianyu Chen

Affiliations: State Key Laboratory of Automotive Safety and Energy, School of Vehicle and Mobility, and the Center for Intelligent Connected Vehicles and Transportation, Tsinghua University, Beijing, China; Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Institute of Interdisciplinary Information Science, Tsinghua University, Beijing, China

Title: Learn Zero-Constraint-Violation Safe Policy in Model-Free Constrained Reinforcement Learning

Abstract:
We focus on learning a zero-constraint-violation safe policy in model-free reinforcement learning (RL). Existing model-free RL studies mostly use a posterior penalty to penalize dangerous actions, which means they must experience the danger to learn from it. Therefore, they cannot learn a zero-violation safe policy even after convergence. To handle this problem, we leverage safety-oriented energy functions to learn zero-constraint-violation safe policies and propose the safe set actor-critic (SSAC) algorithm. The energy function is designed to increase rapidly for potentially dangerous actions, locating the safe set on the action space. Therefore, we can identify dangerous actions prior to taking them and achieve zero-constraint violation. Our major contributions are twofold. First, we use data-driven methods to learn the energy function, which removes the requirement of known dynamics. Second, we formulate a constrained RL problem to solve for zero-violation policies. We theoretically prove that our Lagrangian-based constrained RL solutions converge to the constrained optimal zero-violation policies. The proposed algorithm is evaluated on complex simulation environments and in a hardware-in-loop (HIL) experiment with a real autonomous vehicle controller. Experimental results suggest that the converged policies in all environments achieve zero-constraint violation and performance comparable to the model-based baseline.

PaperID: 625,

Authors: Rui Wang, Deyu Zhou, Haiping Huang, Yongquan Zhou

Affiliations: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China; School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, China; School of Computer Science and the Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing University of Posts and Telecommunications, Nanjing, China; Institute of Artificial Intelligence, Guangxi University for Nationalities, Nanning, China

Title: MIT: Mutual Information Topic Model for Diverse Topic Extraction

Abstract:
To automatically mine structured semantic topics from text, neural topic modeling has arisen and made some progress. However, most existing work focuses on designing mechanisms to enhance topic coherence at the expense of the diversity of the extracted topics. To address this limitation, we propose the first neural topic modeling approach based purely on mutual information maximization, called the mutual information topic (MIT) model. The proposed MIT significantly improves topic diversity by maximizing the mutual information between the word distribution and the topic distribution. Meanwhile, MIT also utilizes a Dirichlet prior in the latent topic space to ensure the quality of the mined topics. The experimental results on three public benchmark text corpora show that MIT extracts topics with higher coherence values (considering four topic coherence metrics) than competitive approaches and achieves a significant improvement on the topic diversity metric. In addition, our experiments show that the proposed MIT converges faster and more stably than adversarial-neural topic models.

PaperID: 626,

Authors: Jiabin Liu, Huadong Wang, Hanyuan Hang, Shumin Ma, Xin Shen, Yong Shi

Affiliations: School of Information and Electronics, Beijing Institute of Technology, Beijing, China; ModelBest Inc., Beijing, China; Department of Applied Mathematics, University of Twente, Enschede, The Netherlands; Guangdong Provincial Key Laboratory IRADS and the Department of Mathematical Science, BNU-HKBU United International College, Zhuhai, China; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; School of Economics and Management, University of Chinese Academy of Sciences, Beijing, China

Title: Self-Supervised Random Forest on Transformed Distribution for Anomaly Detection

Abstract:
Anomaly detection, the task of differentiating abnormal data points from normal ones, presents a significant challenge in the realm of machine learning. Numerous strategies have been proposed to tackle this task, with classification-based methods, specifically those utilizing a self-supervised approach via random affine transformations (RATs), demonstrating remarkable performance on both image and non-image data. However, these methods encounter a notable bottleneck, the overlap of constructed labeled datasets across categories, which hampers the subsequent classifiers’ ability to detect anomalies. Consequently, the creation of an effective data distribution becomes the pivotal factor for success. In this article, we introduce a model called “self-supervised forest (sForest),” which leverages the random Fourier transform (RFT) and random orthogonal rotations to craft a controlled data distribution. Our model utilizes the RFT to map input data into a new feature space. With this transformed data, we create a self-labeled training dataset using random orthogonal rotations. We theoretically prove that the data distribution formulated by our methodology is more stable compared to one derived from RATs. We then use the self-labeled dataset in a random forest (RF) classifier to distinguish between normal and anomalous data points. Comprehensive experiments conducted on both real and artificial datasets illustrate that sForest outperforms other anomaly detection methods, including distance-based, kernel-based, forest-based, and network-based benchmarks.

PaperID: 627,

Authors: Youfa Liu, Bo Du

Affiliations: School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China

Title: Frequency Domain-Oriented Complex Graph Neural Networks for Graph Classification

Abstract:
Graph neural networks (GNNs) can directly handle graph-structured data. Current GNNs are confined to the spatial domain and learn real low-dimensional embeddings in graph classification tasks. In this article, we explore frequency domain-oriented complex GNNs in which the node’s embedding in each layer is a complex vector. The difficulty lies in the design of graph pooling, and we propose a mirror-connected design that raises two crucial problems: the parameter reduction problem and the complex gradient backpropagation problem. To deal with the former problem, we propose the notion of squared singular value pooling (SSVP) and prove that the representation power of SSVP followed by a fully connected layer with nonnegative weights is exactly equivalent to that of a mirror-connected layer. To resolve the latter problem, we provide an alternative feasible method to solve for the singular values of complex embeddings with a theoretical guarantee. Finally, we propose a mixture of pooling strategies in which first-order statistics information is employed to enrich the last low-dimensional representation. Experiments on benchmarks demonstrate the effectiveness of the complex GNNs with mirror-connected layers.

PaperID: 628,

Authors: Zhenyu Li, Yazhong Luo

Affiliations: College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, China

Title: Deep Reinforcement Learning for Nash Equilibrium of Differential Games

Abstract:
Nash equilibrium is a significant solution concept representing the optimal strategy in a noncooperative multiagent system. This study presents two deep reinforcement learning (DRL) algorithms for solving the Nash equilibrium of differential games. Both algorithms are built upon the distributed distributional deep deterministic policy gradient (D4PG) algorithm, which is a one-sided learning method. We modify it into a two-sided adversarial learning method. The first is D4PG for games (D4P2G), which directly applies an adversarial play framework based on D4PG. A simultaneous policy gradient descent (SPGD) method is employed to optimize the policies of the players with conflicting objectives. The second is the distributional deep deterministic symplectic policy gradient (D4SPG) algorithm, which is our main contribution. More specifically, it introduces a minimax learning framework that combines the critics of the two players and proposes a symplectic policy gradient adjustment method to find a better policy gradient. Simulations show that both algorithms converge to the Nash equilibrium in most cases, but D4SPG can learn the Nash equilibrium more accurately and efficiently, especially in Hamiltonian games. Moreover, it can handle games with complex dynamics, which is challenging for traditional methods.
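The difference between simultaneous gradient descent and a symplectic adjustment can be illustrated on a toy Hamiltonian game. The sketch below is not D4P2G or D4SPG: it assumes the scalar bilinear game f(x, y) = xy (x minimizes, y maximizes, Nash equilibrium at the origin) and one common form of symplectic gradient adjustment, \xi + \lambda A^T \xi, where A is the antisymmetric part of the Jacobian of the simultaneous gradient \xi. The function name `play`, the step size, and \lambda are hypothetical choices for illustration.

```python
import math

def play(steps, lr=0.1, lam=0.0, x=1.0, y=1.0):
    """Iterate both players on f(x, y) = x * y.

    lam = 0 is plain simultaneous gradient descent; lam > 0 adds the
    symplectic correction lam * A^T xi to the simultaneous gradient xi.
    """
    for _ in range(steps):
        # Simultaneous gradient: x descends on f, y descends on -f.
        gx, gy = y, -x
        # The Jacobian of xi is [[0, 1], [-1, 0]], which is already
        # antisymmetric, so A^T xi = (-gy, gx) = (x, y).
        ax, ay = -gy, gx
        x, y = x - lr * (gx + lam * ax), y - lr * (gy + lam * ay)
    return x, y

plain = play(200)              # no adjustment
adjusted = play(200, lam=1.0)  # with symplectic adjustment
plain_dist = math.hypot(*plain)
adjusted_dist = math.hypot(*adjusted)
```

In this toy setting each plain step multiplies the squared distance to the equilibrium by 1 + lr^2, so the iterates spiral outward, while the adjusted step multiplies it by (1 - lr \cdot lam)^2 + lr^2 < 1, so the iterates contract toward the origin.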

PaperID: 629,

Authors: Bing Yan, Peng Shi, Chee Peng Lim, Yuan Sun, Ramesh K. Agarwal

Affiliations: School of Electrical and Mechanical Engineering, The University of Adelaide, Adelaide, SA, Australia; Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC, Australia; School of Mechanical and Electrical Engineering, Soochow University, Suzhou, Jiangsu, China; Department of Mechanical Engineering, Washington University in St. Louis Campus, St. Louis, MO, USA

Title: Security and Safety-Critical Learning-Based Collaborative Control for Multiagent Systems

Abstract:
This article presents a novel learning-based collaborative control framework to ensure communication security and formation safety of nonlinear multiagent systems (MASs) subject to denial-of-service (DoS) attacks, model uncertainties, and barriers in environments. The framework has a distributed and decoupled design at the cyber-layer and the physical layer. A resilient control Lyapunov function-quadratic programming (RCLF-QP)-based observer is first proposed to achieve secure reference state estimation under DoS attacks at the cyber-layer. Based on deep reinforcement learning (RL) and control barrier function (CBF), a safety-critical formation controller is designed at the physical layer to ensure safe collaborations between uncertain agents in dynamic environments. The framework is applied to autonomous vehicles for area scanning formations with barriers in environments. The comparative experimental results demonstrate that the proposed framework can effectively improve the resilience and robustness of the system.

PaperID: 630,

Authors: Di Wu, Zechao Li, Zhikai Yu, Yi He, Xin Luo

Affiliations: College of Computer and Information Science, Southwest University, Chongqing, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China; School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; Department of Computer Science, Old Dominion University, Norfolk, VA, USA

Title: Robust Low-Rank Latent Feature Analysis for Spatiotemporal Signal Recovery

Abstract:
Wireless sensor networks (WSNs) are an emerging and promising area in the intelligent sensing field. Due to various factors such as sudden sensor breakdowns or the deliberate shutdown of some nodes to save energy, there are always massive missing entries in the sensing data collected from WSNs. Low-rank matrix approximation (LRMA) is a typical and effective approach for pattern analysis and missing data recovery in WSNs. However, existing LRMA-based approaches ignore the adverse effects of outliers inevitably mixed with the collected data, which may dramatically degrade their recovery accuracy. To address this issue, this article proposes a latent feature analysis (LFA) based spatiotemporal signal recovery (STSR) model, named LFA-STSR. Its main idea is twofold: 1) incorporating the spatiotemporal correlation into an LFA model as the regularization constraint to improve its recovery accuracy and 2) incorporating the L_1-norm into the loss term of an LFA model to improve its robustness to outliers. As such, LFA-STSR can accurately recover missing data based on partially observed data mixed with outliers in WSNs. To evaluate the proposed LFA-STSR model, extensive experiments have been conducted on four real-world WSN datasets. The results demonstrate that LFA-STSR significantly outperforms six related state-of-the-art models in terms of both recovery accuracy and robustness to outliers.

PaperID: 631,

Authors: Xindi Yang, Hao Zhang, Zhuping Wang, Shun-Feng Su

Affiliations: Department of Control Science and Engineering, Tongji University, Shanghai, China; Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan

Title: Learning Robust Predictive Control: A Spatial-Temporal Game Theoretic Approach

Abstract:
This article investigates the robust predictive control problem for unknown dynamical systems. Since the unavailability of the dynamics restricts the feasibility of model-driven methods, a learning robust predictive control (LRPC) framework is developed from the aspect of time consistency. Under feedback-like control causality, robust predictive control is reconstructed as spatial–temporal games, and we guarantee stability through a time-consistent Nash equilibrium. For clarity, our framework is organized into four parts. First, multistep feedback-like control causality is drawn from time series analysis, and Takens’ theorem provides theoretical support from the steady-state property. Second, the control problem is reconstructed as games, where performance and robustness partition the game into temporal nonzero-sum subgames and spatial zero-sum ones, respectively. Next, multistep reinforcement learning (RL) is designed to solve robust predictive control without a system model. Convergence is proven through bounds analysis of oscillatory value functions, and properties of the receding horizon are derived from time consistency. Finally, a data-driven implementation is given with function approximation, and neural networks are chosen to approximate the value functions and the feedback-like causality. Weights are estimated with least squares errors. Numerical results verify the effectiveness.

PaperID: 632,

Authors: Mingyue Guo, Binghui Chen, Zhaoyi Yan, Yaowei Wang, Qixiang Ye

Affiliations: School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China; DAMO Academy, Alibaba, Beijing, China; Peng Cheng Laboratory, Shenzhen, China

Title: Virtual Classification: Modulating Domain-Specific Knowledge for Multidomain Crowd Counting

Abstract:
Multidomain crowd counting aims to learn a general model for multiple diverse datasets. However, deep networks prefer modeling distributions of the dominant domains instead of all domains, which is known as domain bias. In this study, we propose a simple-yet-effective modulating domain-specific knowledge network (MDKNet) to handle the domain bias issue in multidomain crowd counting. MDKNet is achieved by employing the idea of “modulating,” enabling a deep network to balance and model the different distributions of diverse datasets with little bias. Specifically, we propose an instance-specific batch normalization (IsBN) module, which serves as a base modulator to refine the information flow to be adaptive to domain distributions. To precisely modulate the domain-specific information, the domain-guided virtual classifier (DVC) is then introduced to learn a domain-separable latent space. This space is employed as an input guidance for the IsBN modulator, such that the mixture distributions of multiple datasets can be well treated. Extensive experiments performed on popular benchmarks, including Shanghai-tech A/B, QNRF, and NWPU, validate the superiority of MDKNet in tackling multidomain crowd counting and its effectiveness for multidomain learning. Code is available at https://github.com/csguomy/MDKNet.

PaperID: 633,

Authors: Wenxuan Ma, Xing Yan, Kun Zhang

Affiliations: Institute of Statistics and Big Data, Renmin University of China, Beijing, China

Title: Improving Uncertainty Quantification of Variance Networks by Tree-Structured Learning

Abstract:
To improve the uncertainty quantification of variance networks, we propose a novel tree-structured local neural network model that partitions the feature space into multiple regions based on uncertainty heterogeneity. A tree is built from the given training data, whose leaf nodes represent different regions where region-specific neural networks are trained to predict both the mean and the variance for quantifying uncertainty. The proposed uncertainty-splitting neural regression tree (USNRT) employs novel splitting criteria. At each node, a neural network is first trained on the full data, and a statistical test on the residuals is conducted to find the best split, corresponding to the two subregions with the most significant uncertainty heterogeneity between them. USNRT is computationally friendly because very few leaf nodes are sufficient and pruning is unnecessary. Furthermore, an ensemble version can be easily constructed to estimate the total uncertainty, including both the aleatory and epistemic components. On extensive UCI datasets, USNRT or its ensemble shows superior performance compared to some recent popular methods for quantifying uncertainty with variances. Through comprehensive visualization and analysis, we uncover how USNRT works and show its merits, revealing that uncertainty heterogeneity does exist in many datasets and can be learned by USNRT.
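The two ingredients named above, a network that predicts both a mean and a variance, and a residual-based split search, can be sketched as follows. The Gaussian negative log-likelihood is the loss commonly used to train variance networks. The split score, however, is only a stand-in: the abstract does not specify the statistical test, so this sketch assumes roughly zero-mean residuals and simply picks the threshold with the largest gap in residual log-variance between the two sides. The names `gaussian_nll`, `best_split`, and `min_leaf` are hypothetical.

```python
import math

def gaussian_nll(y, mean, var):
    """Per-sample Gaussian negative log-likelihood of a (mean, variance) prediction."""
    return 0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def best_split(x, residuals, min_leaf=3, eps=1e-12):
    """Return (threshold, score) for a single feature x.

    Stand-in criterion (an assumption, not the paper's test): absolute
    difference of the log mean squared residual on each side of the split.
    """
    pairs = sorted(zip(x, residuals))
    best_threshold, best_score = None, 0.0
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot place a threshold between equal feature values
        left = [r for _, r in pairs[:k]]
        right = [r for _, r in pairs[k:]]
        var_left = sum(r * r for r in left) / len(left) + eps
        var_right = sum(r * r for r in right) / len(right) + eps
        score = abs(math.log(var_left) - math.log(var_right))
        if score > best_score:
            best_threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_score = score
    return best_threshold, best_score

# Hypothetical residuals: small noise for x < 5, large noise for x >= 5.
xs = list(range(10))
res = [0.1, -0.1, 0.1, -0.1, 0.1, 2.0, -2.0, 2.0, -2.0, 2.0]
threshold, score = best_split(xs, res)
```

On this toy data the selected threshold falls between x = 4 and x = 5, the point where the noise level changes.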

PaperID: 634,

Authors: Zeqiang Lai, Ying Fu, Jun Zhang

Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; MIIT Key Laboratory of Complex-Field Intelligent Exploration, Beijing Institute of Technology, Beijing, China

Title: Hyperspectral Image Super Resolution With Real Unaligned RGB Guidance

Abstract:
Fusion-based hyperspectral image (HSI) super-resolution has become increasingly prevalent for its capability to integrate high-frequency spatial information from the paired high-resolution (HR) RGB reference (Ref-RGB) image. However, most of the existing methods either heavily rely on the accurate alignment between low-resolution (LR) HSIs and RGB images or can only deal with simulated unaligned RGB images generated by rigid geometric transformations, which weakens their effectiveness for real scenes. In this article, we explore the fusion-based HSI super-resolution with real Ref-RGB images that have both rigid and nonrigid misalignments. To properly address the limitations of existing methods for unaligned reference images, we propose an HSI fusion network (HSIFN) with heterogeneous feature extractions, multistage feature alignments, and attentive feature fusion. Specifically, our network first transforms the input HSI and RGB images into two sets of multiscale features with an HSI encoder and an RGB encoder, respectively. The features of Ref-RGB images are then processed by a multistage alignment module to explicitly align the features of Ref-RGB with the LR HSI. Finally, the aligned features of Ref-RGB are further adjusted by an adaptive attention module to focus more on discriminative regions before sending them to the fusion decoder to generate the reconstructed HR HSI. Additionally, we collect a real-world HSI fusion dataset, consisting of paired HSI and unaligned Ref-RGB, to support the evaluation of the proposed model for real scenes. Extensive experiments are conducted on both simulated and our real-world datasets, and it shows that our method obtains a clear improvement over existing single-image and fusion-based super-resolution methods on quantitative assessment as well as visual comparison. The code and dataset are publicly available at https://zeqiang-lai.github.io/HSI-RefSR/.

PaperID: 635,

Authors: Heyan Chai, Jinhao Cui, Siyu Tang, Ye Ding, Xinwang Liu, Binxing Fang, Qing Liao

Affiliations: Department of Computer Science and Technology, Harbin Institute of Technology, Guangdong, Shenzhen, China; School of Cyberspace Security, Dongguan University of Technology, Dongguan, China; College of Computer, National University of Defense Technology, Changsha, China

Title: MG-SIN: Multigraph Sparse Interaction Network for Multitask Stance Detection

Abstract:
Stance detection on social media aims to identify if an individual is in support of or against a specific target. Most existing stance detection approaches primarily rely on modeling the contextual semantic information in sentences and neglect to explore the pragmatics dependency information of words, thus degrading performance. Although several single-task learning methods have been proposed to capture richer semantic representation information, they still suffer from semantic sparsity problems caused by short texts on social media. This article proposes a novel multigraph sparse interaction network (MG-SIN) by using multitask learning (MTL) to identify the stances and classify the sentiment polarities of tweets simultaneously. Our basic idea is to explore the pragmatics dependency relationship between tasks at the word level by constructing two types of heterogeneous graphs, including task-specific and task-related graphs (tr-graphs), to boost the learning of task-specific representations. A graph-aware module is proposed to adaptively facilitate information sharing between tasks via a novel sparse interaction mechanism among heterogeneous graphs. Through experiments on two real-world datasets, compared with the state-of-the-art baselines, the extensive results exhibit that MG-SIN achieves competitive improvements of up to 2.1% and 2.42% for the stance detection task, and 5.26% and 3.93% for the sentiment analysis task, respectively.

PaperID: 636,

Authors: Geng Chen, Qingyue Wang, Bo Dong, Ruitao Ma, Nian Liu, Huazhu Fu, Yong Xia

Affiliations: National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, China; Center for Brain Imaging Science and Technology, Zhejiang University, Hangzhou, China; Computer Vision Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates; Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore

Title: EM-Trans: Edge-Aware Multimodal Transformer for RGB-D Salient Object Detection

Abstract:
RGB-D salient object detection (SOD) has gained tremendous attention in recent years. In particular, transformers have been employed and have shown great potential. However, existing transformer models usually overlook the vital edge information, which is a major issue restricting the further improvement of SOD accuracy. To this end, we propose a novel edge-aware RGB-D SOD transformer, called EM-Trans, which explicitly models the edge information in a dual-band decomposition framework. Specifically, we employ two parallel decoder networks to learn the high-frequency edge and low-frequency body features from the low- and high-level features extracted from a two-stream multimodal backbone network, respectively. Next, we propose a cross-attention complementarity exploration module to enrich the edge/body features by exploiting the multimodal complementarity information. The refined features are then fed into our proposed color-hint guided fusion module for enhancing the depth feature and fusing the multimodal features. Finally, the resulting features are fused using our deeply supervised progressive fusion module, which progressively integrates edge and body features for predicting saliency maps. Our model explicitly considers the edge information for accurate RGB-D SOD, overcoming the limitations of existing methods and effectively improving the performance. Extensive experiments on benchmark datasets demonstrate that EM-Trans is an effective RGB-D SOD framework that outperforms the current state-of-the-art models, both quantitatively and qualitatively. A further extension to RGB-T SOD demonstrates the promising potential of our model in various kinds of multimodal SOD tasks.

PaperID: 637,

Authors: Zhirong Luan, Wenrui Li, Meiqin Liu, Badong Chen

Affiliations: School of Electrical Engineering, Xi’an University of Technology, Xi’an, China; National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China

Title: Robust Federated Learning: Maximum Correntropy Aggregation Against Byzantine Attacks

Abstract:
As an emerging decentralized machine learning technique, federated learning organizes collaborative training and preserves the privacy and security of participants. However, untrustworthy devices, typically Byzantine attackers, pose a significant challenge to federated learning since they can upload malicious parameters to corrupt the global model. To defend against such attacks, we propose a novel robust aggregation method, maximum correntropy aggregation (MCA), which applies the maximum correntropy criterion (MCC) to derive a central value from parameters. Different from the previous use of MCC for denoising, we utilize it as a similarity metric to measure the parameter distribution and aggregate a robust center. Correntropy in MCC, which involves all even-order moments of the parameter, contains high-order statistical properties; this allows for a comprehensive capture of parameter characteristics and thus helps to prevent interference from attackers. Meanwhile, correntropy extracts information from the parameters themselves, without requiring the proportion of malicious attackers. We solve the optimization objective through fixed-point iteration and demonstrate the linear convergence of the iteration formula. Theoretical analysis reveals the robust aggregation property of MCA and the error bound between MCA and the global optimal solution, with linear convergence to a neighborhood of the optimal solution. By performing independent and identically distributed (IID) and non-IID experiments on three different datasets, we show that MCA exhibits significant robustness under mainstream attacks, whereas other methods cannot withstand all of them.
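A fixed-point iteration of the kind mentioned above can be sketched for a generic maximum-correntropy center. This is an illustration under stated assumptions, not the paper's exact MCA rule: it assumes a Gaussian kernel with a hand-picked bandwidth `sigma`, initializes at the coordinate-wise median, and repeatedly replaces the center by the kernel-weighted mean of the client updates. The function name `mca` and all parameter defaults are hypothetical.

```python
import math
from statistics import median

def mca(updates, sigma=1.0, iters=50, tol=1e-9):
    """Robust center of client updates via a correntropy fixed-point iteration."""
    dim = len(updates[0])
    # Assumed initialization: coordinate-wise median of the updates.
    center = [median(u[d] for u in updates) for d in range(dim)]
    for _ in range(iters):
        # Gaussian-kernel weight of each update relative to the current center.
        weights = [
            math.exp(-sum((u[d] - center[d]) ** 2 for d in range(dim))
                     / (2 * sigma ** 2))
            for u in updates
        ]
        total = sum(weights)
        if total == 0.0:
            break  # every update is too far away; keep the current center
        new_center = [
            sum(w * u[d] for w, u in zip(weights, updates)) / total
            for d in range(dim)
        ]
        shift = max(abs(a - b) for a, b in zip(new_center, center))
        center = new_center
        if shift < tol:
            break
    return center

# Hypothetical round: eight honest updates near (1, 1), two Byzantine ones.
honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05],
          [1.05, 1.0], [0.95, 0.95], [1.0, 0.9], [0.9, 1.0]]
byzantine = [[100.0, 100.0], [100.0, 100.0]]
robust_center = mca(honest + byzantine)
plain_mean = [sum(u[d] for u in honest + byzantine) / 10 for d in range(2)]
```

Because distant updates receive kernel weights that are effectively zero, the resulting center stays near the honest cluster, whereas the plain mean is pulled far toward the malicious updates. No estimate of the attacker proportion is used.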

PaperID: 638,

Authors: Lei Meng, Zhuang Qi, Lei Wu, Xiaoyu Du, Zhaochuan Li, Lizhen Cui, Xiangxu Meng

Affiliations: School of Software, Shandong University, Jinan, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; Inspur, Jinan, China

Title: Improving Global Generalization and Local Personalization for Federated Learning

Abstract:
Federated learning aims to facilitate collaborative training among multiple clients with data heterogeneity in a privacy-preserving manner, either by generating a generalized model or by developing personalized models. However, existing methods typically struggle to balance both directions, as optimizing one often degrades the other. To address the problem, this article presents a method named personalized federated learning via cross silo prototypical calibration (pFedCSPC) to enhance the consistency of knowledge across clients by calibrating features from heterogeneous spaces, which improves the effectiveness of collaboration between clients. Specifically, pFedCSPC employs an adaptive aggregation method to offer personalized initial models to each client, enabling rapid adaptation to personalized tasks. Subsequently, pFedCSPC learns class representation patterns on clients by clustering, averages the representations within each cluster to form local prototypes, and aggregates them on the server to generate global prototypes. Meanwhile, pFedCSPC leverages global prototypes as knowledge to guide the learning of local representations, which is beneficial for mitigating the data imbalance problem and preventing overfitting. Moreover, pFedCSPC includes a cross-silo prototypical calibration (CSPC) module, which utilizes contrastive learning techniques to map heterogeneous features from different sources into a unified space. This can enhance the generalization ability of the global model. Experiments were conducted on four datasets in terms of performance comparison, ablation study, in-depth analysis, and case study, and the results verified that pFedCSPC achieves improvements in both global generalization and local personalization performance via calibrating cross-source features and strengthening collaboration effectiveness, respectively.
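The prototype pipeline described above (average within each cluster on the client, then aggregate on the server) can be sketched as below. Several details are assumptions for illustration: cluster assignments are taken as given instead of being learned, prototypes from different clients are matched by a shared key such as a class label, and the server uses a sample-count-weighted average. The function names are hypothetical, and the contrastive calibration module is not covered.

```python
from collections import defaultdict

def local_prototypes(features, keys):
    """Per-key mean feature vector on one client, with the sample count."""
    groups = defaultdict(list)
    for feature, key in zip(features, keys):
        groups[key].append(feature)
    return {
        key: ([sum(column) / len(rows) for column in zip(*rows)], len(rows))
        for key, rows in groups.items()
    }

def global_prototypes(client_prototypes):
    """Server-side aggregation: count-weighted mean of matching local prototypes."""
    sums, counts = {}, defaultdict(int)
    for prototypes in client_prototypes:
        for key, (vector, n) in prototypes.items():
            acc = sums.setdefault(key, [0.0] * len(vector))
            for d, value in enumerate(vector):
                acc[d] += n * value
            counts[key] += n
    return {key: [s / counts[key] for s in acc] for key, acc in sums.items()}

# Hypothetical two-client example with 2-D representations.
client_a = local_prototypes([[0.0, 0.0], [2.0, 2.0], [5.0, 1.0]],
                            ["cat", "cat", "dog"])
client_b = local_prototypes([[4.0, 4.0]], ["cat"])
protos = global_prototypes([client_a, client_b])
```

Weighting by sample count makes the global prototype equal to the mean over all contributing samples, so clients with few samples for a key do not dominate it.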

PaperID: 639,

Authors: Aditya Pribadi Kalapaaking, Ibrahim Khalil, Xun Yi, Kwok-Yan Lam, Guang-Bin Huang, Ning Wang

Affiliations: School of Computing Technologies, RMIT University, Melbourne, Australia; College of Computing and Data Science, Nanyang Avenue, Singapore; School of Automation, Southeast University, Nanjing, China; Chongqing College of Mobile Communication, Chongqing, China

Title: Auditable and Verifiable Federated Learning Based on Blockchain-Enabled Decentralization

Abstract:
Auditability and verifiability are critical elements in establishing trustworthiness in federated learning (FL). These principles promote transparency, accountability, and independent validation of FL processes. Incorporating auditability and verifiability is imperative for building trust and ensuring the robustness of FL methodologies. Typical FL architectures rely on a trustworthy central authority to manage the FL process. However, reliance on a central authority could become a single point of failure, making it an attractive target for cyber-attacks and insider frauds. Moreover, the central entity lacks auditability and verifiability, which undermines the privacy and security that FL aims to ensure. This article proposes an auditable and verifiable decentralized FL (DFL) framework. We first develop a smart-contract-based monitoring system for DFL participants. This monitoring system is then deployed to each DFL participant and executed when the local model training is initiated. The monitoring system records necessary information during the local training process for auditing purposes. Afterward, each DFL participant sends the local model and monitoring system to the respective blockchain node. The blockchain nodes representing each DFL participant exchange the local models and use the monitoring system to validate each local model. To ensure an auditable and verifiable decentralized aggregation procedure, we record the aggregation steps taken by each blockchain node in the aggregation contract. Following the aggregation phase, each blockchain node applies a multisignature scheme to the aggregated model, producing a globally verifiable model. Based on the signed global model and the aggregation contract, each blockchain node implements a consensus protocol to store the validated global model in tamper-proof storage. 
To evaluate the performance of our proposed model, we conducted a series of experiments with different machine learning architectures and datasets, including CIFAR-10, F-MNIST, and MedMNIST. The experimental results indicate a slight increase in time consumption compared with the state of the art, which is the tradeoff for ensuring auditability and verifiability. The proposed blockchain-enabled DFL also reduces participant-side communication costs by up to 95%.

PaperID: 640,

Authors: Lili Guo, Shifei Ding, Longbiao Wang, Jianwu Dang

Affiliations: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China; Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, China

Title: DSTCNet: Deep Spectro-Temporal-Channel Attention Network for Speech Emotion Recognition

Abstract:
Speech emotion recognition (SER) plays an important role in human–computer interaction, which can provide better interactivity to enhance user experiences. Existing approaches tend to directly apply deep learning networks to distinguish emotions. Among them, the convolutional neural network (CNN) is the most commonly used method to learn emotional representations from spectrograms. However, CNN does not explicitly model features’ associations in the spectral-, temporal-, and channel-wise axes or their relative relevance, which will limit the representation learning. In this article, we propose a deep spectro-temporal-channel network (DSTCNet) to improve the representational ability for speech emotion. The proposed DSTCNet integrates several spectro-temporal-channel (STC) attention modules into a general CNN. Specifically, we propose the STC module that infers a 3-D attention map along the dimensions of time, frequency, and channel. The STC attention can focus more on the regions of crucial time frames, frequency ranges, and feature channels. Finally, experiments were conducted on the Berlin emotional database (EmoDB) and interactive emotional dyadic motion capture (IEMOCAP) databases. The results reveal that our DSTCNet can outperform the traditional CNN-based and several state-of-the-art methods.

PaperID: 641,

Authors: Shu Yang, Lu Zhang, Shuai Liu, Huchuan Lu, Hao Chen

Affiliations: Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China; Department of Artificial Intelligence, Dalian University of Technology, Dalian, China; School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin, China; School of Information and Communication Engineering, Dalian University of Technology, Dalian, China; Department of Computer Science and Engineering and the Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China

Title: Real-Time Semantic Segmentation via a Densely Aggregated Bilateral Network

Abstract:
With the growing demands of applications on online devices, the speed-accuracy trade-off is critical in semantic segmentation systems. Recently, the bilateral segmentation network has shown promising capacity to balance favorable accuracy and fast speed, and has become the mainstream backbone in real-time semantic segmentation. Segmentation of target objects relies on high-level semantics, whereas accurate localization requires detailed low-level features to model specific local patterns. However, the lightweight backbone of the bilateral architecture limits the extraction of semantic context and spatial details, and the late fusion of the bilateral streams leads to insufficient aggregation of semantic context and spatial details. In this article, we propose a densely aggregated bilateral network (DAB-Net) for real-time semantic segmentation. In the context path, a patchwise context enhancement (PCE) module is proposed to efficiently capture local semantic contextual information along the spatial and channel dimensions, respectively. Meanwhile, a context-guided spatial path (CGSP) is designed to exploit more spatial information by encoding finer details from the raw image and the transition from the context path. Finally, with multiple interactions between bilateral branches, the intertwined outputs from the bilateral streams are combined in a unified decoder for a final interaction to further enhance the feature representation, which generates the final segmentation prediction. Experimental results on three public benchmarks demonstrate that our proposed method achieves higher accuracy with a limited decay in speed, performs favorably against state-of-the-art real-time approaches, and runs at 31.1 frames/s (FPS) at the high resolution of 2048 × 1024. The source code is released at https://github.com/isyangshu/DABNet.

PaperID: 642,

Authors: Luying Zhong, Zhaoliang Chen, Zhihao Wu, Shide Du, Zheyi Chen, Shiping Wang

Affiliations: College of Computer and Data Science and the Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou, China

Title: Learnable Graph Convolutional Network With Semisupervised Graph Information Bottleneck

Abstract:
Graph convolutional network (GCN) has gained widespread attention in semisupervised classification tasks. Recent studies show that GCN-based methods have achieved decent performance in numerous fields. However, most of the existing methods adopt a fixed graph that cannot dynamically capture both local and global relationships. This is because hidden but important relationships may not be directly exhibited in the fixed structure, which degrades the performance of semisupervised classification tasks. Moreover, the missing and noisy data yielded by the fixed graph may result in wrong connections, thereby disturbing the representation learning process. To cope with these issues, this article proposes a learnable GCN-based framework, aiming to obtain the optimal graph structures by jointly integrating graph learning and feature propagation in a unified network. Besides, to capture the optimal graph representations, this article designs dual-GCN-based meta-channels to simultaneously explore local and global relations during the training process. To minimize the interference of the noisy data, a semisupervised graph information bottleneck (SGIB) is introduced to conduct the graph structural learning (GSL) for acquiring the minimal sufficient representations. Concretely, SGIB aims to maximize the mutual information of both the same and different meta-channels by designing the constraints between them, thereby improving the node classification performance in the downstream tasks. Extensive experimental results on real-world datasets demonstrate the robustness of the proposed model, which outperforms state-of-the-art methods with fixed-structure graphs.

PaperID: 643,

Authors: Yanbing Mao, Yuliang Gu, Lui Sha, Huajie Shao, Qixin Wang, Tarek F. Abdelzaher

Affiliations: Engineering Technology Division, Wayne State University, Detroit, MI, USA; Department of Mechanical Engineering, University of Illinois at Urbana–Champaign, Urbana, IL, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA; Department of Computer Science, College of William and Mary, Williamsburg, VA, USA; Department of Computing, Hong Kong Polytechnic University, Hong Kong, SAR, China

Title: Phy-Taylor: Partially Physics-Knowledge-Enhanced Deep Neural Networks via NN Editing

Abstract:
Purely data-driven deep neural networks (DNNs) applied to physical engineering systems can infer relations that violate physics laws, thus leading to unexpected consequences. To address this challenge, we propose a physics-knowledge-enhanced DNN framework called Phy-Taylor, accelerating learning-compliant representations with physics knowledge. The Phy-Taylor framework makes two key contributions; it introduces a new architectural physics-compatible neural network (PhN) and features a novel compliance mechanism, which we call physics-guided neural network (NN) editing. The PhN aims to directly capture nonlinear physical quantities, such as kinetic energy, electrical power, and aerodynamic drag force. To do so, the PhN augments NN layers with two key components: 1) monomials of the Taylor series for capturing physical quantities and 2) a suppressor for mitigating the influence of noise. The NN editing mechanism further modifies network links and activation functions consistently with physics knowledge. As an extension, we also propose a self-correcting Phy-Taylor framework for safety-critical control of autonomous systems, which introduces two additional capabilities: 1) safety relationship learning and 2) automatic output correction when safety violations occur. Through experiments, we show that Phy-Taylor features considerably fewer parameters and a remarkably accelerated training process while offering enhanced model robustness and accuracy.
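As a rough illustration of the "monomials of the Taylor series" component mentioned in the abstract above, the following Python sketch expands an input vector into all monomials up to a chosen degree. It is only a minimal sketch under stated assumptions: the function name `taylor_monomials` and the example inputs are hypothetical, and the suppressor and NN editing steps of Phy-Taylor are not modeled here.

```python
from itertools import combinations_with_replacement
import math


def taylor_monomials(x, order):
    """Return all monomials of the entries of x with degree 1..order.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    feats = []
    for degree in range(1, order + 1):
        # Each index tuple (i1 <= i2 <= ... <= id) yields x[i1] * x[i2] * ... * x[id].
        for idx in combinations_with_replacement(range(len(x)), degree):
            feats.append(math.prod(x[i] for i in idx))
    return feats


# Toy input: two quantities, expanded up to degree 2.
# Order of the output: x0, x1, x0*x0, x0*x1, x1*x1.
features = taylor_monomials([2.0, 3.0], 2)
print(features)
```

With such an expansion, a quantity like kinetic energy (proportional to mass times velocity squared) becomes a single degree-3 monomial of the inputs, so a linear layer on top of the expanded features can represent it directly.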

PaperID: 644,

Authors: Zhiwei Shang, Renxing Li, Chunhua Zheng, Huiyun Li, Yunduan Cui

Affiliations: Systems Hub, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China; School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, China; CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems and the Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China

Title: Relative Entropy Regularized Sample-Efficient Reinforcement Learning With Continuous Actions

Abstract:
In this article, a novel reinforcement learning (RL) approach, continuous dynamic policy programming (CDPP), is proposed to tackle the issues of both learning stability and sample efficiency in the current RL methods with continuous actions. The proposed method naturally extends the relative entropy regularization from the value function-based framework to the actor–critic (AC) framework of deep deterministic policy gradient (DDPG) to stabilize the learning process in continuous action space. It tackles the intractable softmax operation over continuous actions in the critic by Monte Carlo estimation and explores the practical advantages of the Mellowmax operator. A Boltzmann sampling policy is proposed to guide the exploration of actor following the relative entropy regularized critic for superior learning capability, exploration efficiency, and robustness. Evaluated by several benchmark and real-robot-based simulation tasks, the proposed method illustrates the positive impact of the relative entropy regularization including efficient exploration behavior and stable policy update in RL with continuous action space and successfully outperforms the related baseline approaches in both sample efficiency and learning stability.
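For readers unfamiliar with the Mellowmax operator referred to in the abstract above, the Python sketch below computes it over critic values at randomly sampled actions, which is one simple way to form a Monte Carlo estimate over a continuous action range. The toy critic `toy_q`, the uniform sampling scheme, and the parameter values are assumptions made for illustration; this is not the CDPP update rule itself.

```python
import numpy as np


def mellowmax(q_values, omega):
    """Mellowmax: log(mean(exp(omega * q))) / omega, computed stably."""
    q = np.asarray(q_values, dtype=float)
    m = q.max()
    # Subtracting the maximum before exponentiating avoids overflow.
    return m + np.log(np.mean(np.exp(omega * (q - m)))) / omega


def toy_q(actions):
    """Hypothetical critic for a fixed state: peaks at action 0.3."""
    return -(actions - 0.3) ** 2


rng = np.random.default_rng(0)
# Monte Carlo estimate: evaluate the critic at sampled actions in [-1, 1].
sampled_actions = rng.uniform(-1.0, 1.0, size=256)
estimate = mellowmax(toy_q(sampled_actions), omega=10.0)
print(estimate)
```

For positive `omega` the result always lies between the mean and the maximum of the sampled values, approaching the maximum as `omega` grows and the mean as `omega` approaches zero.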

PaperID: 645,

Authors: Mengmeng Wang, Jiazheng Xing, Jianbiao Mei, Yong Liu, Yunliang Jiang

Affiliations: Laboratory of Advanced Perception on Robotics and Intelligent Learning, College of Control Science and Engineering, Zhejiang University, Hangzhou, China; School of Computer Science and Technology, Zhejiang Normal University, Jinhua, China

Title: ActionCLIP: Adapting Language-Image Pretrained Models for Video Action Recognition

Abstract:
The canonical approach to video action recognition has a neural network model perform a classic 1-of-N majority vote task. Such models are trained to predict a fixed set of predefined categories, limiting their transferability to new datasets with unseen concepts. In this article, we provide a new perspective on action recognition by attaching importance to the semantic information of label texts rather than simply mapping them into numbers. Specifically, we model this task as a video-text matching problem within a multimodal learning framework, which strengthens the video representation with more semantic language supervision and enables our model to do zero-shot action recognition without requiring any further labeled data or parameters. Moreover, to handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub “pre-train, adapt and fine-tune.” This paradigm first learns powerful representations from pre-training on a large amount of web image-text or video-text data. Then, it makes the action recognition task act more like pre-training problems via adaptation engineering. Finally, it is fine-tuned end-to-end on target datasets to obtain strong performance. We give an instantiation of the new paradigm, ActionCLIP, which not only has superior and flexible zero-shot/few-shot transfer ability but also reaches top performance on the general action recognition task, achieving 83.8% top-1 accuracy on Kinetics-400 with a ViT-B/16 as the backbone. Code is available at https://github.com/sallymmx/ActionCLIP.git.
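The video-text matching formulation described in the abstract above can be sketched as a nearest-label search in a shared embedding space. The Python example below is a minimal sketch that assumes video and label-text embeddings have already been produced by some encoders; the function name `zero_shot_classify` and the toy vectors are hypothetical and do not come from the ActionCLIP code base.

```python
import numpy as np


def zero_shot_classify(video_emb, text_embs, labels):
    """Pick the label whose text embedding has the highest cosine similarity."""
    v = video_emb / np.linalg.norm(video_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = t @ v  # one cosine similarity per candidate label
    return labels[int(np.argmax(sims))], sims


# Toy embeddings standing in for encoder outputs (assumed, not real features).
labels = ["jumping", "running"]
text_embs = np.array([[0.0, 1.0], [1.0, 0.0]])
video_emb = np.array([1.0, 0.1])

predicted, sims = zero_shot_classify(video_emb, text_embs, labels)
print(predicted, sims)
```

Because the label set enters only through its text embeddings, new categories can be added at inference time by embedding their label texts, which is what makes zero-shot recognition possible in this formulation.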

PaperID: 646,

Authors: Che Liu, Sibo Cheng, Weiping Ding, Rossella Arcucci

Affiliations: Department of Earth Science and Engineering and the Data Science Institute, Department of Computing, Imperial College London, London, U.K.; Department of Computing, Imperial College London, Data Science Institute, London, U.K.; School of Information Science and Technology, Nantong University, Nantong, China; Department of Earth Science and Engineering, Imperial College London, London, U.K.

Title: Spectral Cross-Domain Neural Network With Soft-Adaptive Threshold Spectral Enhancement

Abstract:
Electrocardiography (ECG) signals can be considered as multivariable time series (TS). The state-of-the-art ECG data classification approaches, based on either feature engineering or deep learning techniques, treat the spectral and time domains separately in machine learning systems. No spectral–time domain communication mechanism inside the classifier model can be found in current approaches, leading to difficulties in identifying complex ECG forms. In this article, we propose a novel deep learning model named spectral cross-domain neural network (SCDNN) with a new block called soft-adaptive threshold spectral enhancement (SATSE), to simultaneously reveal the key information embedded in the spectral and time domains inside the neural network. More precisely, the cross-domain information is captured by a general convolutional neural network (CNN) backbone, and different information sources are merged by a self-adaptive mechanism to mine the connection between the time and spectral domains. In SATSE, the knowledge from the time and spectral domains is extracted via the fast Fourier transform (FFT) with soft trainable thresholds in modified sigmoid functions. The proposed SCDNN is tested on several classification tasks implemented on the public ECG databases PTB-XL and CPSC2018. SCDNN outperforms the state-of-the-art approaches at a low computational cost across a variety of metrics in all classification tasks on both databases, by finding appropriate domains from the infinite spectral mapping. The convergence of the trainable thresholds in the spectral domain is also numerically investigated in this article. The robust performance of SCDNN provides a new perspective to exploit knowledge across deep learning models from time and spectral domains. The code repository can be found at https://github.com/DL-WG/SCDNN-TS.
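To make the idea of a soft threshold applied in the spectral domain more concrete, the Python sketch below gates FFT coefficients with a sigmoid of their magnitude and transforms the result back to the time domain. It is a minimal sketch under assumptions: the threshold and sharpness are fixed constants here (in the abstract they are trainable), the function name `soft_threshold_spectral_gate` is hypothetical, and the exact form of the modified sigmoid used in SATSE is not reproduced.

```python
import numpy as np


def soft_threshold_spectral_gate(x, threshold, sharpness=10.0):
    """Attenuate frequency components whose magnitude falls below a soft threshold."""
    spec = np.fft.rfft(x)
    mag = np.abs(spec)
    # Sigmoid gate: close to 1 well above the threshold, close to 0 well below it.
    gate = 1.0 / (1.0 + np.exp(-sharpness * (mag - threshold)))
    return np.fft.irfft(spec * gate, n=len(x))


# Toy signal: a strong component at bin 4 plus a weak one at bin 20.
n = np.arange(64)
clean = np.sin(2 * np.pi * 4 * n / 64)
noisy = clean + 0.05 * np.sin(2 * np.pi * 20 * n / 64)

filtered = soft_threshold_spectral_gate(noisy, threshold=5.0)
print(np.max(np.abs(filtered - clean)))
```

Because the gate is a smooth function of the threshold, gradients can flow through it, which is the property that would allow such a threshold to be learned jointly with the rest of a network.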

PaperID: 647,

Authors: Jiwei Shen, Liang Yuan, Yue Lu, Shujing Lyu

Affiliations: Shanghai Key Laboratory of Multidimensional Information Processing, School of Communication and Electronic Engineering, East China Normal University, Shanghai, China; Beijing Advanced Innovation Center for Soft Matter Science and Engineering, Beijing University of Chemical Technology, Beijing, China

Title: Leveraging Predictions of Task-Related Latents for Interactive Visual Navigation

Abstract:
Interactive visual navigation (IVN) involves tasks where embodied agents learn to interact with the objects in the environment to reach their goals. Current approaches exploit visual features to train a reinforcement learning (RL) navigation control policy network. However, RL-based methods continue to struggle with IVN tasks as they are inefficient in learning a good representation of the unknown environment in partially observable settings. In this work, we introduce predictions of task-related latents (PTRL), a flexible self-supervised RL framework for IVN tasks. PTRL learns the latent structured information about environment dynamics and leverages multistep representations of the sequential observations. Specifically, PTRL trains its representation by explicitly predicting the next pose of the agent conditioned on the actions. Moreover, an attention and memory module is employed to associate the learned representation with each action and exploit spatiotemporal dependencies. Furthermore, a state value boost module is introduced to adapt the model to previously unseen environments by leveraging input perturbations and regularizing the value function. Sample efficiency in the training of RL networks is enhanced by modular training and hierarchical decomposition. Extensive evaluations demonstrate the superiority of the proposed method in increasing accuracy and generalization capacity.

PaperID: 648,

Authors: Junjie Hu, Chenyou Fan, Liguang Zhou, Qing Gao, Honghai Liu, Tin Lun Lam

Affiliations: Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen, China; School of Artificial Intelligence, South China Normal University, Guangzhou, China; School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen, China; State Key Laboratory of Robotics and Systems, Harbin Institute of Technology (Shenzhen), Shenzhen, China

Title: Lifelong-MonoDepth: Lifelong Learning for Multidomain Monocular Metric Depth Estimation

Abstract:
With the rapid advancements in autonomous driving and robot navigation, there is a growing demand for lifelong learning (LL) models capable of estimating metric (absolute) depth. LL approaches potentially offer significant cost savings in terms of model training, data storage, and collection. However, the quality of RGB images and depth maps is sensor-dependent, and depth maps in the real world exhibit domain-specific characteristics, leading to variations in depth ranges. These challenges limit existing methods to LL scenarios with small domain gaps and relative depth map estimation. To facilitate lifelong metric depth learning, we identify three crucial technical challenges that require attention: 1) developing a model capable of addressing the depth scale variation through scale-aware depth learning; 2) devising an effective learning strategy to handle significant domain gaps; and 3) creating an automated solution for domain-aware depth inference in practical applications. Based on the aforementioned considerations, in this article, we present 1) a lightweight multihead framework that effectively tackles the depth scale imbalance; 2) an uncertainty-aware LL solution that adeptly handles significant domain gaps; and 3) an online domain-specific predictor selection method for real-time inference. Through extensive numerical studies, we show that the proposed method can achieve good efficiency, stability, and plasticity, leading the benchmarks by 8%–15%. The code is available at https://github.com/FreeformRobotics/Lifelong-MonoDepth.

PaperID: 649,

Authors: Honggui Han, Jiaqian Wang, Zheng Liu, Hongyan Yang, Junfei Qiao

Affiliations: Faculty of Information Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Engineering Research Center of Digital Community, Ministry of Education, Beijing Artificial Intelligence Institute and Beijing Laboratory For Urban Mass Transit, Beijing University of Technology, Beijing, China

Title: Self-Organizing Robust Fuzzy Neural Network for Nonlinear System Modeling

Abstract:
Fuzzy neural network (FNN) is a structured learning technique that has been successfully adopted in nonlinear system modeling. However, since there exist uncertain external disturbances arising from mismatched model errors, sensor noises, or unknown environments, FNN generally fails to achieve the desired modeling performance. To overcome this problem, a self-organizing robust FNN (SOR-FNN) is developed in this article. First, an information integration mechanism (IIM), consisting of partition information and individual information, is introduced to dynamically adjust the structure of SOR-FNN. The proposed mechanism allows SOR-FNN to adapt to uncertain environments. Second, a dynamic learning algorithm based on the α-divergence loss function (α-DLA) is designed to update the parameters of SOR-FNN. This learning algorithm reduces the sensitivity to disturbances and improves the robustness of SOR-FNN. Third, the convergence of SOR-FNN is established via the Lyapunov theorem, and this theoretical analysis supports the successful application of SOR-FNN. Finally, the proposed SOR-FNN is tested on several benchmark datasets and a practical application to validate its merits. The experimental results indicate that the proposed SOR-FNN can obtain superior performance in terms of model accuracy and robustness.

PaperID: 650,

Authors: Ruizhi Hou, Fang Li, Tieyong Zeng

Affiliations: School of Mathematical Sciences, East China Normal University, Shanghai, China; School of Mathematical Sciences, Key Laboratory of Ministry of Education (MEA), and the Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, China; Department of Mathematics, The Chinese University of Hong Kong (CUHK), Hong Kong, China

Title: Fast and Reliable Score-Based Generative Model for Parallel MRI

Abstract:
The score-based generative model (SGM) can generate high-quality samples and has been successfully adopted for magnetic resonance imaging (MRI) reconstruction. However, recent SGMs may take thousands of steps to generate a high-quality image. Besides, SGMs neglect to exploit the redundancy in k-space. To overcome these two drawbacks, in this article, we propose a fast and reliable SGM (FRSGM). First, we propose deep ensemble denoisers (DEDs) consisting of SGM and the deep denoiser, which are used to solve the proximal problem of the implicit regularization term. Second, we propose a spatially adaptive self-consistency (SASC) term as the regularization term of the k-space data. We use the alternating direction method of multipliers (ADMM) algorithm to solve the minimization model of compressed sensing (CS)-MRI incorporating the image prior term and the SASC term, which is significantly faster than the related works based on SGM. Meanwhile, we can prove that the iterating sequence of the proposed algorithm has a unique fixed point. In addition, the DED and the SASC term can significantly improve the generalization ability of the algorithm. The features mentioned above make our algorithm reliable, including the fixed-point convergence guarantee, the exploitation of the k-space, and the powerful generalization ability.

PaperID: 651,

Authors: Zhiqiang Pan, Fei Cai, Xinwang Liu, Honghui Chen

Affiliations: Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, China; School of Computer, National University of Defense Technology, Changsha, China

Title: Distance-Aware Learning for Inductive Link Prediction on Temporal Networks

Abstract:
Inductive link prediction on temporal networks aims to predict the future links associated with node(s) unseen in the historical timestamps. Existing methods generate the predictions mainly by learning node representation from the node/edge attributes as well as the network dynamics or by measuring the distance between nodes on the temporal network structure. However, the attribute information is unavailable in many realistic applications and the structure-aware methods highly rely on nodes’ common neighbors, which are difficult to accurately detect, especially in sparse temporal networks. Thus, we propose a distance-aware learning (DEAL) approach for inductive link prediction on temporal networks. Specifically, we first design an adaptive sampling method to extract temporal adaptive walks for nodes, increasing the probability of including the common neighbors between nodes. Then, we design a dual-channel distance measuring component, which simultaneously measures the distance between nodes in the embedding space and on the dynamic graph structure for predicting future inductive edges. Extensive experiments are conducted on three public temporal network datasets, i.e., MathOverflow, AskUbuntu, and StackOverflow. The experimental results validate the superiority of DEAL over the state-of-the-art baselines in terms of accuracy, area under the ROC curve (AUC), and average precision (AP), where the improvements are especially obvious in scenarios with only limited data.

PaperID: 652,

Authors: Ranran Wang, Xing Xu, Yin Zhang

Affiliations: School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

Title: Multiscale Information Diffusion Prediction With Minimal Substitution Neural Network

Abstract:
Information diffusion prediction is a complex task due to the dynamics of information substitution present in large social platforms, such as Weibo and Twitter. This task can be divided into two levels: the macroscopic popularity prediction and the microscopic information diffusion prediction (who is next), which share the essence of modeling the dynamic spread of information. While many researchers have focused on the internal influence of individual cascades, they often overlook other influential factors that affect information diffusion, such as competition and cooperation among information, the attractiveness of information to users, and the potential impact of content anticipation on further diffusion. To address this issue, we propose a multiscale information diffusion prediction with minimal substitution (MIDPMS) neural network. This model simultaneously enables macroscale popularity prediction and microscale diffusion prediction. Specifically, information diffusion is modeled as a substitution system among different information. First, the life cycle of content, user preferences, and potential content anticipation are considered in this system. Second, a minimal-substitution-theory-based neural network is proposed, for the first time, to model this substitution system and facilitate joint training of macroscopic and microscopic diffusion prediction. Finally, extensive experiments are conducted on Weibo and Twitter datasets to validate the performance of our proposed model on multiscale tasks. The results confirm that the proposed model performs well on both the macroscopic and microscopic tasks on Weibo and Twitter.

PaperID: 653,

Authors: Weiping Ding, Yu Geng, Jiashuang Huang, Hengrong Ju, Haipeng Wang, Chin-Teng Lin

Affiliations: School of Computer Science and Technology, Nantong University, Nantong, China; Centre for Artificial Intelligence, FEIT, University of Technology Sydney, Ultimo, NSW, Australia

Title: MGRW-Transformer: Multigranularity Random Walk Transformer Model for Interpretable Learning

Abstract:
Deep-learning models have been widely used in image recognition tasks due to their strong feature-learning ability. However, most of the current deep-learning models are “black box” systems that lack a semantic explanation of how they reached their conclusions. This makes it difficult to apply these methods to complex medical image recognition tasks. The vision transformer (ViT) model is the most commonly used deep-learning model with a self-attention mechanism that shows the region of influence as compared to traditional convolutional networks. Thus, ViT offers greater interpretability. However, medical images often contain lesions of variable size in different locations, which makes it difficult for a deep-learning model with a self-attention module to reach correct and explainable conclusions. We propose a multigranularity random walk transformer (MGRW-Transformer) model guided by an attention mechanism to find the regions that influence the recognition task. Our method divides the image into multiple subimage blocks and transfers them to the ViT module for classification. Simultaneously, the attention matrix output from the multiattention layer is fused with the multigranularity random walk module. Within the multigranularity random walk module, the segmented image blocks are used as nodes to construct an undirected graph using the attention node as a starting node and guiding the coarse-grained random walk. We appropriately divide the coarse blocks into finer ones to manage the computational cost and combine the results based on the importance of the discovered features. The result is that the model offers a semantic interpretation of the input image, a visualization of the interpretation, and insight into how the decision was reached. Experimental results show that our method improves classification performance with medical images while presenting an understandable interpretation for use by medical professionals.

PaperID: 654,

Authors: Ningning Bai, Xiaofeng Wang, Ruidong Han, Qin Wang, Zinian Liu

Affiliations: Department of Mathematics, Xi’an University of Technology, Xi’an, China; School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China

Title: PAFormer: Anomaly Detection of Time Series With Parallel-Attention Transformer

Abstract:
Time-series anomaly detection is a critical task with significant impact as it serves a pivotal role in the field of data mining and quality management. Current anomaly detection methods are typically based on reconstruction or forecasting algorithms, as these methods have the capability to learn compressed data representations and model time dependencies. However, most methods rely on learning normal distribution patterns, which can be difficult to achieve in real-world engineering applications. Furthermore, real-world time-series data is highly imbalanced, with a severe lack of representative samples for anomalous data, which can lead to model learning failure. In this article, we propose a novel end-to-end unsupervised framework called the parallel-attention transformer (PAFormer), which discriminates anomalies by modeling both the global characteristics and local patterns of time series. Specifically, we construct parallel-attention (PA), which includes two core modules: the global enhanced representation module (GERM) and the local perception module (LPM). GERM consists of two pattern units and a normalization module, with attention weights that indicate the relationship of each data point to the whole series (global). Due to the rarity of anomalous points, they have strong associations with adjacent data points. LPM is composed of a learnable Laplace kernel function that learns the neighborhood relevancies through the distributional properties of the kernel function (local). We employ the PA to learn the global-local distributional differences for each data point, which enables us to discriminate anomalies. Finally, we propose a two-stage adversarial loss to optimize the model. We conduct experiments on five public benchmark datasets (real-world datasets) and one synthetic dataset. The results show that PAFormer outperforms state-of-the-art baselines.
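The local perception idea described in the abstract above, where a Laplace kernel weights each point's neighbors by temporal distance, can be illustrated with a small Python sketch. This is a minimal sketch under assumptions: the kernel scale is a fixed number here (the abstract describes it as learnable), the row normalization is a choice made for illustration, and the function name `laplace_local_weights` is hypothetical rather than taken from PAFormer.

```python
import numpy as np


def laplace_local_weights(length, scale):
    """Row-normalized Laplace kernel over time-index distances |i - j|."""
    idx = np.arange(length)
    dist = np.abs(idx[:, None] - idx[None, :])
    kernel = np.exp(-dist / scale)
    # Normalize each row so the weights for a given time step sum to 1.
    return kernel / kernel.sum(axis=1, keepdims=True)


weights = laplace_local_weights(length=5, scale=1.0)
print(np.round(weights, 3))
```

Each row concentrates its mass on the point itself and its immediate neighbors, and a smaller `scale` makes the weighting more sharply local.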

PaperID: 655,

Authors: Yi Li, Luming Zhang, Lin Shao

Affiliations: College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, Zhejiang, China; Hithink RoyalFlush Information Network Company Ltd., Hangzhou, China

Title: LR Aerial Photo Categorization by Cross-Resolution Perceptual Knowledge Propagation

Abstract:
There are hundreds of high- and low-altitude earth observation satellites that asynchronously capture massive-scale aerial photographs every day. Generally, high-altitude satellites take low-resolution (LR) aerial pictures, each covering a considerably large area. In contrast, low-altitude satellites capture high-resolution (HR) aerial photos, each depicting a relatively small area. Accurately discovering the semantics of LR aerial photos is an indispensable technique in computer vision. Nevertheless, it is also a challenging task due to: 1) the difficulty of characterizing human hierarchical visual perception and 2) the prohibitive human effort required to label sufficient training data. To handle these problems, a novel cross-resolution perceptual knowledge propagation (CPKP) framework is proposed, focusing on adapting the visual perceptual experiences deeply learned from HR aerial photos to categorize LR ones. Specifically, by mimicking the human vision system, a novel low-rank model is designed to decompose each LR aerial photo into multiple visually/semantically salient foreground regions coupled with the background nonsalient regions. This model can: 1) produce a gaze-shifting path (GSP) simulating human gaze behavior and 2) engineer the deep feature for each GSP. Afterward, a kernel-induced feature selection (FS) algorithm is formulated to obtain a succinct set of deep GSP features discriminative across LR and HR aerial photos. Based on the selected features, the labels from LR and HR aerial photos are collaboratively utilized to train a linear classifier for categorizing LR ones. It is worth emphasizing that such a CPKP mechanism can effectively optimize the linear classifier training, as labels of HR aerial photos are acquired more conveniently in practice. Comprehensive visualization results and a comparative study have validated the superiority of our approach.

PaperID: 656,

Authors: Siqi Li, Jun Chen, Shanqi Liu, Chengrui Zhu, Guanzhong Tian, Yong Liu

Affiliations: Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China; Ningbo Innovation Center, Zhejiang University, Hangzhou, Zhejiang, China

Title: MCMC: Multi-Constrained Model Compression via One-Stage Envelope Reinforcement Learning

Abstract:
Model compression methods are being developed to bridge the gap between the massive scale of neural networks and the limited hardware resources on edge devices. Since most real-world applications deployed on resource-limited hardware platforms typically have multiple hardware constraints simultaneously, most existing model compression approaches that only consider optimizing one single hardware objective are ineffective. In this article, we propose an automated pruning method called multi-constrained model compression (MCMC) that allows for the optimization of multiple hardware targets, such as latency, floating point operations (FLOPs), and memory usage, while minimizing the impact on accuracy. Specifically, we propose an improved multi-objective reinforcement learning (MORL) algorithm, the one-stage envelope deep deterministic policy gradient (DDPG) algorithm, to determine the pruning strategy for neural networks. Our improved one-stage envelope DDPG algorithm reduces exploration time and offers greater flexibility in adjusting target priorities, enhancing its suitability for pruning tasks. For instance, on the visual geometry group (VGG)-16 network, our method achieved an 80% reduction in FLOPs, a 2.31× reduction in memory usage, and a 1.92× acceleration, with an accuracy improvement of 0.09% compared with the baseline. For larger datasets, such as ImageNet, we reduced FLOPs by 50% for MobileNet-V1, resulting in a 4.7× faster speed and 1.48× memory compression, while maintaining the same accuracy. When applied to edge devices, such as JETSON XAVIER NX, our method resulted in a 71% reduction in FLOPs for MobileNet-V1, leading to a 1.63× faster speed, 1.64× memory compression, and an accuracy improvement.

PaperID: 657,

Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu

Affiliations: College of Computer Science, Nankai University, Tianjin, China; Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland; College of Electronic Science, National University of Defense Technology, Changsha, Hunan, China

Title: Boosting Convolutional Neural Networks With Middle Spectrum Grouped Convolution

Abstract:
This article proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad “middle spectrum” area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: groupwise, layerwise, samplewise, and attentionwise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on the ImageNet dataset for image classification, MSGC can reduce the multiply–accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With a 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on the MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.
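
The saving that grouping brings can be illustrated with the standard multiply-accumulate count of a convolution layer. This is a minimal sketch, not MSGC itself: the helper name `conv_macs` and the layer sizes are hypothetical, and the learned group topology described in the abstract is not modeled.

```python
def conv_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """MACs of a 2-D convolution with a square kernel (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # With grouping, each output channel only reads c_in / groups input channels.
    return (c_in // groups) * kernel * kernel * c_out * h_out * w_out

# Hypothetical layer: 64 -> 128 channels, 3x3 kernel, 56x56 output map.
dense = conv_macs(64, 128, 3, 56, 56)
grouped = conv_macs(64, 128, 3, 56, 56, groups=2)
print(dense, grouped, dense / grouped)
```

With a fixed layer shape, the MAC count scales as 1/groups, which is why a per-layer choice of grouping directly trades computation against the information each output channel can see.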

PaperID: 658,

Authors: Sichao Fu, Qinmu Peng, Yang He, Xiaorui Wang, Bin Zou, Duanquan Xu, Xiao-Yuan Jing, Xinge You

Affiliations: School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China; Platform Operation and Marketing Center, JD Retail, Beijing, China; Faculty of Mathematics and Statistics, Hubei Key Laboratory of Applied Mathematics, Hubei University, Wuhan, China; School of Computer Science, Wuhan University, Wuhan, China

Title: Multilevel Contrastive Graph Masked Autoencoders for Unsupervised Graph-Structure Learning

Abstract:
Unsupervised graph-structure learning (GSL), which aims to learn an effective graph structure for arbitrary downstream tasks from the data itself without any label guidance, has recently received increasing attention in various real applications. Although several existing unsupervised GSL methods have achieved superior performance in different graph analytical tasks, how to utilize the popular graph masked autoencoder to acquire sufficient supervision information from the data itself, and thereby improve the learned graph structure, has not yet been effectively explored. To tackle this issue, we present a multilevel contrastive graph masked autoencoder (MCGMAE) for unsupervised GSL. Specifically, we first introduce a graph masked autoencoder with a dual feature masking strategy to reconstruct the same input graph-structured data under two scenarios: the original structure generated by the data itself and the learned graph structure. Then, an inter- and intra-class contrastive loss is introduced to maximize the mutual information at the feature and graph-structure reconstruction levels simultaneously. More importantly, the same inter- and intra-class contrastive loss is also applied to the graph encoder module to further strengthen their agreement at the feature-encoder level. Compared with existing unsupervised GSL methods, the proposed MCGMAE can effectively improve the training robustness of unsupervised GSL via supervision information at different levels from the data itself. Extensive experiments on three graph analytical tasks and eight datasets validate the effectiveness of the proposed MCGMAE.

PaperID: 659,

Authors: Xiaohan Yu, Zicheng Pan, Yang Zhao, Yongsheng Gao

Affiliations: School of Computing, Macquarie University, Macquarie Park, NSW, Australia; Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, Australia; Department of Computer Science and Information Technology, La Trobe University, Bundoora, VIC, Australia

Title: Self-Supervised Lie Algebra Representation Learning via Optimal Canonical Metric

Abstract:
Learning discriminative representation with limited training samples is emerging as an important yet challenging visual categorization task. While prior work has shown that incorporating self-supervised learning can improve performance, we found that the direct use of canonical metric in a Lie group is theoretically incorrect. In this article, we prove that a valid optimization measurement should be a canonical metric on Lie algebra. Based on the theoretical finding, this article introduces a novel self-supervised Lie algebra network (SLA-Net) representation learning framework. Via minimizing canonical metric distance between target and predicted Lie algebra representation within a computationally convenient vector space, SLA-Net avoids computing nontrivial geodesic (locally length-minimizing curve) metric on a manifold (curved space). By simultaneously optimizing a single set of parameters shared by self-supervised learning and supervised classification, the proposed SLA-Net gains improved generalization capability. Comprehensive evaluation results on eight public datasets show the effectiveness of SLA-Net for visual categorization with limited samples.

PaperID: 660,

Authors: Lingyuan Meng, Ke Liang, Bin Xiao, Sihang Zhou, Yue Liu, Meng Liu, Xihong Yang, Xinwang Liu, Jinyan Li

Affiliations: School of Computer, National University of Defense Technology, Changsha, China; School of Computer Science, Chongqing University of Posts and Telecommunications, Chongqing, China; College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China; Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Shenzhen, China

Title: SARF: Aliasing Relation-Assisted Self-Supervised Learning for Few-Shot Relation Reasoning

Abstract:
Few-shot relation reasoning on knowledge graphs (FS-KGR) is an important and practical problem that aims to infer long-tail relations and has drawn increasing attention in recent years. Among the proposed methods, self-supervised learning (SSL) methods, which effectively extract the hidden essential inductive patterns relying only on the support sets, have achieved promising performance. However, the existing SSL methods simply cut the connections between high-frequency and long-tail relations, ignoring the fact that the two kinds of information could be highly related to each other. Specifically, we observe that relations with similar contextual meanings, called aliasing relations (ARs), may have similar attributes. In other words, the ARs of the target long-tail relation could be high-frequency relations, and leveraging such attributes can largely improve the reasoning performance. Based on this observation, we propose a novel Self-supervised learning model that leverages Aliasing Relations to assist FS-KGR, termed SARF. Specifically, we propose a graph neural network (GNN)-based AR-assist module to encode the ARs. In addition, we provide two fusion strategies, i.e., simple summation and learnable fusion, to fuse the generated representations, which contain abundant extra information underlying the ARs, into the self-supervised reasoning backbone for performance enhancement. Extensive experiments on three few-shot benchmarks demonstrate that SARF achieves state-of-the-art (SOTA) performance compared with other methods in most cases.

PaperID: 661,

Authors: Bo Liu, Xuan Li, Junjie Huang, Housheng Su

Affiliations: Key Laboratory for Intelligent Analysis and Security Governance of Ethnic Languages, Ministry of Education, School of Information Engineering, Minzu University of China, Beijing, China; School of Mathematical Sciences, Inner Mongolia University, Hohhot, China; School of Artificial Intelligence and Automation, Key Laboratory of Image Processing and Intelligent Control of Education Ministry of China, Huazhong University of Science and Technology, Wuhan, China

Title: Controllability of N-Duplication Corona Product Networks With Laplacian Dynamics

Abstract:
This article studies the controllability of a new composite network generated by two smaller scale factor networks via the Corona product with Laplacian dynamics. First, the eigenvalues and corresponding eigenvectors of a new composite network, the $\mathcal{N}$-duplication Corona product network, are derived from properties of its factor networks. Second, a necessary and sufficient algebra-based criterion for the controllability of such a network is established based on the Popov-Belevitch-Hautus (PBH) test. Furthermore, the weights on edges between the different factor networks are considered. Finally, several examples are presented to demonstrate the effectiveness of our results applied to the unmanned aerial vehicle (UAV) formation.
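
The PBH test referred to above can be checked numerically on a small graph. The sketch below is generic and rests on assumptions: it uses a 3-node path graph rather than an N-duplication Corona product network, takes the dynamics as x' = -Lx + Bu, and the helper names `laplacian` and `pbh_controllable` are hypothetical.

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A of an undirected (possibly weighted) graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def pbh_controllable(A, B, tol=1e-9):
    """PBH test: (A, B) is controllable iff rank[lam*I - A, B] = n for every eigenvalue lam of A."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        M = np.hstack([lam * np.eye(n) - A, B])
        if np.linalg.matrix_rank(M, tol=tol) < n:
            return False
    return True

# Path graph 1-2-3 with Laplacian dynamics x' = -L x + B u.
A = -laplacian([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
end_leader = np.array([[1.0], [0.0], [0.0]])   # input enters at an end node
mid_leader = np.array([[0.0], [1.0], [0.0]])   # input enters at the middle node
print(pbh_controllable(A, end_leader), pbh_controllable(A, mid_leader))
```

Driving the middle node fails the test because the eigenvector (1, 0, -1) has a zero entry at that node, which is the kind of eigenvector-based condition an algebraic controllability criterion makes explicit.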

PaperID: 662,

Authors: Ruiheng Zhang, Zhe Cao, Shuo Yang, Lingyu Si, Haoyang Sun, Lixin Xu, Fuchun Sun

Affiliations: State Key Laboratory of Electromechanical Dynamic Control, School of Mechatronical Engineering, Beijing Institute of Technology, Beijing, China; Faculty of Electronic and Information Technology, University of Technology Sydney, Sydney, NSW, Australia; Science and Technology on Integrated Information System Laboratory, Institute of Software Chinese Academy of Sciences, Beijing, China; Department of Computer Science and Technology, State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China

Title: Cognition-Driven Structural Prior for Instance-Dependent Label Transition Matrix Estimation

Abstract:
The label transition matrix has emerged as a widely accepted method for mitigating label noise in machine learning. In recent years, numerous studies have centered on leveraging deep neural networks to estimate the label transition matrix for individual instances within the context of instance-dependent noise. However, these methods suffer from low search efficiency due to the large space of feasible solutions. We find that the real culprit behind this drawback is invalid class transitions, that is, the actual transition probability between certain classes is zero but is estimated to have a nonzero value. To mask the invalid class transitions, we introduce a human-cognition-assisted method that uses structural information from human cognition. Specifically, we introduce a structured transition matrix network (STMN) designed with an adversarial learning process to balance instance features and prior information from human cognition. The proposed method offers two advantages: 1) better estimation effectiveness is obtained by sparsifying the transition matrix and 2) better estimation accuracy is obtained with the assistance of human cognition. By exploiting these two advantages, our method parametrically estimates a sparse label transition matrix, effectively converting noisy labels into true labels. The efficiency and superiority of our proposed method are substantiated through comprehensive comparisons with state-of-the-art methods on three synthetic datasets and a real-world dataset. Our code will be available at https://github.com/WheatCao/STMN-Pytorch.

PaperID: 663,

Authors: Keke Huang, Hengxing Zhu, Dehao Wu, Chunhua Yang, Weihua Gui

Affiliations: School of Automation, Central South University, Changsha, China

Title: EaLDL: Element-Aware Lifelong Dictionary Learning for Multimode Process Monitoring

Abstract:
With the rapid development of modern industry and the increasing prominence of artificial intelligence, data-driven process monitoring methods have gained significant popularity in industrial systems. Traditional static monitoring models struggle to represent the new modes that arise in industrial production processes due to changes in production environments and operating conditions. Retraining these models to address the changes often leads to high computational complexity. To address this issue, we propose a multimode process monitoring method based on element-aware lifelong dictionary learning (EaLDL). This method initially treats dictionary elements as fundamental units and measures the global importance of dictionary elements from the perspective of the multimode global learning process. Subsequently, to ensure that the dictionary can represent new modes without losing the representation capability of historical modes during the updating process, we construct a novel surrogate loss to impose constraints on the update of dictionary elements. This constraint enables the continuous updating of the dictionary learning (DL) method to accommodate new modes without compromising the representation of previous modes. Finally, to evaluate the effectiveness of the proposed method, we perform comprehensive experiments on numerical simulations as well as an industrial process. A comparison is made with several advanced process monitoring methods to assess its performance. Experimental results demonstrate that our proposed method achieves a favorable balance between learning new modes and retaining the memory of historical modes. Moreover, the proposed method exhibits insensitivity to initial points, delivering satisfactory results under various initial conditions.

PaperID: 664,

Authors: Weiqing Yan, Yuanyang Zhang, Chang Tang, Wujie Zhou, Weisi Lin

Title: Anchor-Sharing and Cluster-Wise Contrastive Network for Multiview Representation Learning

Abstract:
Multiview clustering (MVC) has gained significant attention as it enables the partitioning of samples into their respective categories through unsupervised learning. However, several issues remain: 1) many existing deep clustering methods use the same latent features to achieve conflicting objectives, namely, reconstruction and view consistency. The reconstruction objective aims to preserve view-specific features for each individual view, while the view-consistency objective strives to obtain common features across all views; 2) some deep embedded clustering (DEC) approaches adopt view-wise fusion to obtain a consensus feature representation. However, these approaches overlook the correlation between samples, making it challenging to derive discriminative consensus representations; and 3) many methods use contrastive learning (CL) to align the views’ representations; however, they do not take into account cluster information during the construction of sample pairs, which can lead to the presence of false negative pairs. To address these issues, we propose a novel multiview representation learning network, called anchor-sharing and clusterwise CL (CwCL) network for multiview representation learning. First, we separate view-specific learning and view-common learning into different network branches, which addresses the conflict between reconstruction and consistency. Second, we design an anchor-sharing feature aggregation (ASFA) module, which learns shared anchors from different batches of data samples, establishes the bipartite relationship between anchors and samples, and further leverages it to improve the samples’ representations. This module enhances the discriminative power of the common representation from different samples. Third, we design a CwCL module, which incorporates the learned transition probability into CL, allowing us to focus on minimizing the similarity between representations from negative pairs with a low transition probability. It alleviates the conflict in previous sample-level contrastive alignment. Experimental results demonstrate that our method outperforms state-of-the-art methods.

PaperID: 665,

Authors: Ziheng Jiao, Xuelong Li

Affiliations: School of Computer Science, Northwestern Polytechnical University, Xi’an, China; School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, China

Title: An End-to-End Deep Graph Clustering via Online Mutual Learning

Abstract:
In clustering, deep graph models generally utilize a graph neural network to extract deep embeddings and aggregate them according to the data structure. The optimization procedure can be divided into two separate stages: optimizing the neural network with gradient descent and generating the aggregation with a machine learning-based algorithm. As a result, the clustering results cannot guide the optimization of the graph neural network. Besides, since the aggregation stage involves complicated matrix computations such as decomposition, it brings a high computational burden. To address these issues, a unified deep graph clustering (UDGC) model via online mutual learning is proposed in this brief. Specifically, it maps the data into the deep embedding subspace and extracts the deep graph representation to explore the latent topological knowledge of the nodes. In the deep subspace, the model aggregates the embeddings and generates the clustering assignments via the local preserving loss. More importantly, we train a neural layer to fit the clustering results and design an online mutual learning strategy to optimize the whole model, which can not only output the clustering assignments end-to-end but also reduce the computational complexity. Extensive experiments support the superiority of our model.

PaperID: 666,

Authors: Qiaolin Ye, Jie Yang, Hao Zheng, Liyong Fu

Affiliations: College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu, China; Key Laboratory of Intelligent Information Processing, Nanjing Xiaozhuang University, Nanjing, Jiangsu, China; Institute of Forest Resource Information Techniques, Chinese Academy of Forestry, Beijing, China

Title: Convergence Analysis on Trace Ratio Linear Discriminant Analysis Algorithms

Abstract:
Linear discriminant analysis (LDA) may yield an inexact solution by transforming a trace ratio problem into a corresponding ratio trace problem. Most recently, optimal dimensionality LDA (ODLDA) and trace ratio LDA (TRLDA) have been developed to overcome this problem. As one of their greatest contributions, the two methods design efficient iterative algorithms to derive an optimal solution. However, theoretical evidence for the convergence of these algorithms has not yet been provided, which renders the theory of ODLDA and TRLDA incomplete. In this correspondence, we present rigorous theoretical insight into the convergence of the iterative algorithms. Specifically, we first demonstrate the existence of lower bounds for the objective functions in both ODLDA and TRLDA, and then prove that the objective functions are monotonically decreasing under the iterative frameworks. Based on these findings, we finally establish the convergence of the iterative algorithms.
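
For orientation, the two problem forms and the convergence template described above can be written out in generic notation. This is a sketch under assumptions: $S_b$ and $S_w$ denote the usual between- and within-class scatter matrices, $W$ the projection, and $J$, $W_t$, and $c$ are placeholder symbols. The exact objectives of ODLDA and TRLDA are not given in the abstract and are not reproduced here.

```latex
% Generic LDA notation (assumed): S_b, S_w are between- and within-class scatter matrices.
\begin{align}
\text{trace ratio:}\quad & \max_{W^\top W = I}\;
  \frac{\operatorname{tr}(W^\top S_b W)}{\operatorname{tr}(W^\top S_w W)}, \\
\text{ratio trace:}\quad & \max_{W}\;
  \operatorname{tr}\!\left[(W^\top S_w W)^{-1}\, W^\top S_b W\right].
\end{align}
% Convergence template: iterates W_t with objective J and lower bound c.
\begin{equation}
J(W_{t+1}) \le J(W_t)\ \ \forall t
\quad\text{and}\quad
J(W_t) \ge c\ \ \forall t
\;\Longrightarrow\;
\lim_{t\to\infty} J(W_t)\ \text{exists.}
\end{equation}
```

The implication in the last line is the monotone convergence theorem for real sequences, which is why a lower bound plus monotone decrease suffices for convergence of the objective values.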

PaperID: 667,

Authors: Adolfo Perrusquía, Weisi Guo

Affiliations: School of Aerospace, Transport and Manufacturing, Cranfield University, Bedford, U.K.

Title: Drone's Objective Inference Using Policy Error Inverse Reinforcement Learning

Abstract:
Drones are set to penetrate society across transport and smart living sectors. While many are amateur drones that pose no malicious intentions, some may carry deadly capability. It is crucial to infer the drone’s objective to prevent risk and guarantee safety. In this article, a policy error inverse reinforcement learning (PEIRL) algorithm is proposed to uncover the hidden objective of drones from online data trajectories obtained from cooperative sensors. A set of error-based polynomial features are used to approximate both the value and policy functions. This set of features is consistent with current onboard storage memories in flight controllers. The real objective function is inferred using an objective constraint and an integral inverse reinforcement learning (IRL) batch least-squares (LS) rule. The convergence of the proposed method is assessed using Lyapunov recursions. Simulation studies using a quadcopter model are provided to demonstrate the benefits of the proposed approach.

PaperID: 668,

Authors: Min Wang, Min Xie, Yanwen Wang, Maoyin Chen

Affiliations: School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China; Department of Advanced Design and Systems Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong; Department of Automation, China University of Petroleum (Beijing), Beijing, China

Title: A Deep Quality Monitoring Network for Quality-Related Incipient Faults

Abstract:
Although quality-related process monitoring has achieved great progress, few works consider the detection of quality-related incipient faults. Partial least squares (PLS) and its variants only focus on faults with larger magnitudes. In this article, a deep quality monitoring network (DQMNet) for quality-related incipient fault detection is developed. DQMNet includes the feature input layer, feature extraction layers, and the output layer. In the feature input layer, collected variables are divided according to quality variables, and features are then extracted separately through base detectors. For the feature extraction layers, singular values (SVs) of sliding-window patches and principal component analysis (PCA) are adopted to mine the hidden information layer by layer. For the output layer, statistics are constructed from the quality-related and quality-unrelated feature matrices through Bayesian inference. The superiority of DQMNet is demonstrated by a numerical simulation and benchmark data from the Tennessee Eastman process (TEP).
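
The "singular values of sliding-window patches" step can be pictured as follows. This is a rough sketch under assumptions, not the DQMNet pipeline: the function name `sliding_window_singular_values`, the window length, and the random data are hypothetical, and the PCA and Bayesian inference stages are omitted.

```python
import numpy as np

def sliding_window_singular_values(X, window, step=1):
    """Singular values of each window of consecutive rows of X, one feature row per window."""
    X = np.asarray(X, dtype=float)
    feats = []
    for start in range(0, X.shape[0] - window + 1, step):
        patch = X[start:start + window]                      # (window, n_vars) patch
        feats.append(np.linalg.svd(patch, compute_uv=False)) # singular values only
    return np.array(feats)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # hypothetical process data: 100 samples, 4 variables
feats = sliding_window_singular_values(X, window=10)
print(feats.shape)
```

Because singular values summarize the energy of a whole window, a small but persistent change can shift them even when individual samples still look normal.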

PaperID: 669,

Authors: Jingtao Hu, Bin Xiao, Hu Jin, Jingcan Duan, Siwei Wang, Zhao Lv, Siqi Wang, Xinwang Liu, En Zhu

Affiliations: School of Computer Science, National University of Defense Technology, Changsha, China; Department of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; Intelligent Game and Decision Laboratory, Beijing, China

Title: SAMCL: Subgraph-Aligned Multiview Contrastive Learning for Graph Anomaly Detection

Abstract:
Graph anomaly detection (GAD) has gained increasing attention in various attribute graph applications, e.g., social communication and financial fraud transaction networks. Recently, graph contrastive learning (GCL)-based methods have been widely adopted as the mainstream for GAD with remarkable success. However, existing GCL strategies in GAD mainly focus on node–node and node–subgraph contrast and fail to explore subgraph–subgraph level comparison. Furthermore, the different sizes or component node indices of the sampled subgraph pairs may cause the “nonaligned” issue, making it difficult to accurately measure the similarity of subgraph pairs. In this article, we propose a novel subgraph-aligned multiview contrastive approach for graph anomaly detection, named SAMCL, which fills the gap in subgraph–subgraph contrast for GAD tasks. Specifically, we first generate multiview augmented subgraphs by capturing different neighbors of target nodes, forming contrasting subgraph pairs. Then, to enable contrast between nonaligned subgraph pairs, we propose a subgraph-aligned strategy that estimates similarities with the Earth mover’s distance (EMD), considering both the node embedding distributions and topology awareness. With the newly established similarity measure for subgraphs, we conduct an inter-view subgraph-aligned contrastive learning module to better detect changes for nodes with different local subgraphs. Moreover, we conduct intraview node–subgraph contrastive learning to supplement richer information on abnormalities. We also employ a node reconstruction task on the masked subgraph to measure the local change of the target node. Finally, the anomaly score for each node is jointly calculated by these three modules. Extensive experiments conducted on benchmark datasets verify the effectiveness of our approach compared to existing state-of-the-art (SOTA) methods with significant performance gains (up to 6.36% improvement on ACM). Our code is available at https://github.com/hujingtao/SAMCL.

PaperID: 670,

Authors: Yanmeng Wang, Qingjiang Shi, Tsung-Hui Chang

Affiliations: School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China; Shenzhen Research Institute of Big Data, Shenzhen, China

Title: Why Batch Normalization Damage Federated Learning on Non-IID Data?

Abstract:
As a promising distributed learning paradigm, federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients. To train a large-scale DNN model, batch normalization (BN) has been regarded as a simple and effective means to accelerate the training and improve the generalization capability. However, recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data. While several FL algorithms have been proposed to address this issue, their performance still falls significantly short of the centralized scheme. Furthermore, none of them have provided a theoretical explanation of how BN damages FL convergence. In this article, we present the first convergence analysis to show that under non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes the gradient deviation between the local and global models, which, as a result, slows down and biases the FL convergence. In view of this, we develop a new FL algorithm that is tailored to BN, called FedTAN, which is capable of achieving robust FL performance under a variety of data distributions via iterative layer-wise parameter aggregation. Comprehensive experimental results demonstrate the superiority of the proposed FedTAN over existing baselines for training BN-based DNN models.
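
The mismatch between local and global BN statistics is easy to reproduce on toy data. The sketch below is an illustration under assumptions, not FedTAN: the two clients, their distributions, and the helper `batch_norm` are hypothetical, and the final aggregation step is only the generic moment identity for pooled mean and variance.

```python
import numpy as np

def batch_norm(x, mean, var, eps=1e-5):
    """Normalize x with the given statistics (no learnable scale or shift)."""
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Two hypothetical clients whose features follow different distributions (non-i.i.d.).
clients = {
    "A": rng.normal(loc=-2.0, scale=1.0, size=1000),
    "B": rng.normal(loc=3.0, scale=0.5, size=1000),
}
pooled = np.concatenate(list(clients.values()))
global_mean, global_var = pooled.mean(), pooled.var()

gaps = {}
for name, x in clients.items():
    local_out = batch_norm(x, x.mean(), x.var())          # what the client computes alone
    global_out = batch_norm(x, global_mean, global_var)   # what centralized training would use
    gaps[name] = float(np.abs(local_out - global_out).mean())
    print(f"client {name}: local mean {x.mean():+.2f}, "
          f"global mean {global_mean:+.2f}, output gap {gaps[name]:.2f}")

# The pooled statistics can be recovered exactly from per-client first and second moments.
sizes = np.array([len(x) for x in clients.values()])
means = np.array([x.mean() for x in clients.values()])
second = np.array([(x ** 2).mean() for x in clients.values()])
agg_mean = (sizes * means).sum() / sizes.sum()
agg_var = (sizes * second).sum() / sizes.sum() - agg_mean ** 2
```

The same inputs are normalized very differently under local and global statistics, so activations and gradients computed locally deviate from the centralized ones. The moment identity at the end shows that exchanging statistics, in addition to weights, can remove this particular mismatch in the forward pass.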

PaperID: 671,

Authors: Jian Li, Yong Liu, Weiping Wang

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China

Title: Optimal Convergence for Agnostic Kernel Learning With Random Features

Abstract:
Owing to their solid theoretical guarantees and flexible learning framework, random features (RFs) methods have drawn increasing attention in the field of nonparametric statistical learning. However, existing studies on RFs assume that the target function lies exactly in the associated kernel space, which may not hold true in practical applications. In this article, we investigate the effectiveness of RFs in an agnostic setting in which the target regression function may lie outside the kernel space and prove that they can still achieve capacity-dependent statistical optimality. To achieve this, we provide a finer grained estimate for the capacity of the hypothesis space, and conduct a refined analysis of error terms after a concise error decomposition. Our results show that RF with uniform sampling can guarantee optimality in half of the agnostic situations, while RF with data-dependent sampling can achieve optimal rates in the entire agnostic setting. This finding suggests that using data-dependent sampling not only reduces the number of RFs but also improves their applicability in agnostic settings. Finally, we compare the performance of RFs with different sampling strategies on several real-world datasets. The experimental results support our theoretical findings.
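
As background for the RF setting, the sketch below approximates a Gaussian kernel with random Fourier features drawn from the kernel's spectral distribution, i.e., plain data-independent sampling. It is a minimal illustration under assumptions: the name `rff_map`, the kernel width `gamma`, and the sample sizes are hypothetical, and the data-dependent sampling scheme analyzed in the article is not implemented.

```python
import numpy as np

def rff_map(X, W, b):
    """Random Fourier feature map z(x) = sqrt(2/D) * cos(W^T x + b)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
d, D, gamma = 3, 4000, 0.5                    # input dim, number of features, kernel width
# For k(x, y) = exp(-gamma * ||x - y||^2), frequencies follow N(0, 2 * gamma * I).
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

X = rng.normal(size=(5, d))
Z = rff_map(X, W, b)
K_approx = Z @ Z.T                            # inner products of random features
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-gamma * sq_dist)
max_err = float(np.abs(K_approx - K_exact).max())
print(f"max abs kernel error with D={D}: {max_err:.3f}")
```

The approximation error shrinks roughly as 1/sqrt(D), so the number of features needed for a given accuracy is the quantity that better sampling strategies aim to reduce.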

PaperID: 672,

Authors: Chixuan Wei, Zhihai Wang, Jidong Yuan, Xiaokang Wang, Haiyang Liu, Qiyang Zhao

Affiliations: School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China; School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing, China

Title: SemiHAR: Improving Semisupervised Human Activity Recognition via Multitask Learning

Abstract:
Semisupervised human activity recognition (SemiHAR) has attracted attention in recent years from various domains, such as digital health and ambient intelligence. Currently, it still faces two challenges. For one thing, discriminative features may exist among multiple sequences rather than a single sequence since activities are combinations of motions involving several body parts. For another thing, labeled data and unlabeled data suffer from distribution discrepancies due to the different behavior patterns or biological conditions of users. To address these challenges, we propose a novel SemiHAR method based on multitask learning. First, a dimension-based Markov transition field (DMTF) technique is designed to generate 2-D activity data for capturing the interactions among different dimensions. Second, we jointly consider the user recognition (UR) task and the activity recognition (AR) task to reduce the underlying discrepancy. In addition, a task relation learner (TRL) is introduced to dynamically learn task relations, which enables the primary AR task to exploit preferred knowledge from other secondary tasks. We theoretically analyze the proposed SemiHAR and provide a novel generalization result. Extensive experiments conducted on four real-world datasets demonstrate that SemiHAR outperforms other state-of-the-art methods.
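As background for the DMTF step, the standard single-series Markov transition field can be sketched as below. This is only the generic construction (rank-based binning, a first-order transition matrix, then a lookup for every pair of time steps); the dimension-based variant proposed in the article, which captures interactions across sensor dimensions, is not reproduced here. The helper names and the toy series are assumptions.

```python
def quantile_bins(series, n_bins):
    """Assign each point to one of n_bins roughly equal-sized bins by rank."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    bins = [0] * len(series)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(series)
    return bins

def transition_matrix(bins, n_bins):
    """Row-normalized first-order transition probabilities between bins."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs

def markov_transition_field(series, n_bins):
    """M[i][j] = probability of moving from the bin of x_i to the bin of x_j."""
    bins = quantile_bins(series, n_bins)
    w = transition_matrix(bins, n_bins)
    return [[w[bi][bj] for bj in bins] for bi in bins]

# Toy 1-D signal that alternates between low and high values.
series = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
mtf = markov_transition_field(series, n_bins=2)
```

The result is a square, image-like matrix with one row and column per time step, which is what makes the representation usable as 2-D input to a convolutional model.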

PaperID: 673,

Authors: Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo

Affiliations: Department of Computer Science, University of Rochester, Rochester, NY, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA; BCG in Greater China, Beijing, China; State Key Laboratory of IOTSC, Faculty of Science and Technology, University of Macau, Macau, SAR, China

Title: Improving Pretrained Language Model Fine-Tuning With Noise Stability Regularization

Abstract:
The advent of large-scale pretrained language models (PLMs) has contributed greatly to the progress in natural language processing (NLP). Despite its recent success and wide adoption, fine-tuning a PLM often suffers from overfitting, which leads to poor generalizability due to the extremely high complexity of the model and the limited training samples from downstream tasks. To address this problem, we propose a novel and effective fine-tuning framework, named layerwise noise stability regularization (LNSR). Specifically, our method perturbs the input of neural networks with the standard Gaussian or in-manifold noise in the representation space and regularizes each layer’s output of the language model. We provide theoretical and experimental analyses to prove the effectiveness of our method. The empirical results show that our proposed method outperforms several state-of-the-art algorithms, such as $L^2$ norm and start point (L2-SP), Mixout, FreeLB, and smoothness inducing adversarial regularization and Bregman proximal point optimization (SMART). In addition to evaluating the proposed method on relatively simple text classification tasks, similar to the prior works, we further evaluate the effectiveness of our method on more challenging question-answering (QA) tasks. These tasks present a higher level of difficulty, and they provide a larger amount of training examples for tuning a well-generalized model. Furthermore, the empirical results indicate that our proposed method can improve the domain generalization ability of language models.
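The regularization idea can be sketched on a toy network. The code below is an illustration under stated assumptions, not the LNSR implementation: it uses a small hypothetical tanh network instead of a language model, adds Gaussian noise to the input only, and weights every layer equally when summing the squared differences between clean and perturbed layer outputs.

```python
import math
import random

def forward(x, layers):
    """Run a small tanh MLP and return the output of every layer."""
    outputs = []
    h = x
    for weights in layers:
        h = [math.tanh(sum(w * v for w, v in zip(row, h))) for row in weights]
        outputs.append(h)
    return outputs

def noise_stability_penalty(x, layers, sigma, rng):
    """Sum over layers of the squared distance between clean and noisy outputs."""
    noisy_x = [v + rng.gauss(0.0, sigma) for v in x]
    clean = forward(x, layers)
    noisy = forward(noisy_x, layers)
    return sum(
        sum((a - b) ** 2 for a, b in zip(c, n))
        for c, n in zip(clean, noisy)
    )

rng = random.Random(0)
# Hypothetical 3 -> 4 -> 2 network with random weights.
layers = [
    [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)],
    [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)],
]
x = [0.5, -0.2, 0.1]
penalty = noise_stability_penalty(x, layers, sigma=0.1, rng=rng)
# During fine-tuning, an objective of the form task_loss + lam * penalty
# would be minimized; lam is an assumed hyperparameter name.
```

A small penalty means the layer outputs change little under input noise, which is the stability property the regularizer encourages.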

PaperID: 674,

Authors: Huasheng Wang, Yueran Ma, Hongchen Tan, Xiaochang Liu, Ying Chen, Hantao Liu

Affiliations: School of Computer Science and Informatics, Cardiff University, CF AT, Cardiff, U.K.; School of Future Technology, Dalian University of Technology, Dalian, China; School of Mathematics, Sun Yat-sen University, Guangzhou, China; Alibaba Group, Hangzhou, China

Title: A Bioinspired Deep Learning Framework for Saliency-Based Image Quality Assessment

Abstract:
Advancements in deep learning have led to significant progress in no-reference (NR) image quality assessment (NR-IQA) for evaluating the perceived quality of digital images without relying on a reference. However, existing NR-IQA models remain suboptimal in handling complex and diverse natural images. Visual saliency constitutes a critical element for enhancing the reliability of NR-IQA, but the optimal use of saliency in deep learning-based NR-IQA has not heretofore been significantly explored. In this article, we present a novel method for integrating saliency in NR-IQA, which is motivated by the saliency-based visual search mechanism that different parts of the visual input are visited by the focus of attention (FOA) in the order of decreasing saliency. By dividing saliency into the high and low levels of FOA, we build a bioinspired deep neural network–BioSIQNet–based on a multitask learning (MTL) framework. The network architecture consists of two saliency-specific tasks and one primary image quality assessment (IQA) task. The low and high saliency (HS) are separately encoded and integrated into the early and deeper layers of the IQA network, respectively, analogous to the hierarchical processing in the visual cortex of the brain that allocates low attentional resources to process the simple patterns and high resources to learn intricate representations. We demonstrate that leveraging the synergy between visual attention and image quality perception and joint learning of these interconnected visual tasks can enhance the overall learning capabilities of the primary IQA model. Experiments validate the effectiveness of our proposed BioSIQNet for NR-IQA.

PaperID: 675,

Authors: Kaitong Zheng, Ya-Hui Jia, Kejiang Ye, Wei-Neng Chen

Affiliations: School of Future Technology, South China University of Technology, Guangzhou, China; Chinese Academy of Sciences, Shenzhen Institutes of Advanced Technology, Shenzhen, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: Strategic Evolutionary Reinforcement Learning With Operator Selection and Experience Filter

Abstract:
The shared replay buffer is the core of synergy in evolutionary reinforcement learning (ERL). Existing methods overlook the objective conflict between population evolution in the evolutionary algorithm and ERL, leading to poor quality of the replay buffer. In this article, we propose a strategic ERL algorithm with operator selection and experience filter (SERL-OS-EF) to address the objective conflict issue and improve the synergy from three aspects: 1) an operator selection strategy is proposed to enhance the performance of all individuals, thereby fundamentally improving the quality of experiences generated by the population; 2) an experience filter is introduced to filter the experiences obtained from the population, maintaining the long-term high quality of the buffer; and 3) a dynamic mixed sampling strategy is introduced to improve the efficiency of RL agent learning from the buffer. Experiments in four MuJoCo locomotion environments and three Ant-Maze environments with deceptive rewards demonstrate the superiority of the proposed method. In addition, the practical significance of the proposed method is verified on a low-carbon multienergy microgrid (MEMG) energy management task.

PaperID: 676,

Authors: Jifeng Hu, Li Shen, Sili Huang, Zhejian Yang, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

Affiliations: School of Artificial Intelligence, Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, Jilin University, Changchun, China; Shenzhen Campus, Sun Yat-sen University, Shenzhen, China; Department of Information Engineering and Hainan International College, Minzu University of China, Beijing, China; Lehigh University, Bethlehem, PA, USA; Nanyang Technological University, Jurong West, Singapore

Title: Continual Diffuser (CoD): Mastering Continual Offline RL With Experience Rehearsal

Abstract:
Artificial neural networks, especially recent diffusion-based models, have shown remarkable superiority in gaming, control, and QA systems, where the training tasks’ datasets are usually static. However, in real-world applications, such as robotic control of reinforcement learning (RL), the tasks are changing, and new tasks arise in a sequential order. This situation poses the new challenge of plasticity–stability tradeoff for training an agent who can adapt to task changes and retain acquired knowledge. In view of this, we propose a rehearsal-based continual diffusion model, called continual diffuser (CoD), to endow the diffuser with the capabilities of quick adaptation (plasticity) and lasting retention (stability). Specifically, we first construct an offline benchmark that contains 90 tasks from multiple domains. Then, we train the CoD on each task with sequential modeling and conditional generation for making decisions. Next, we preserve a small portion of previous datasets as the rehearsal buffer and replay it to retain the acquired knowledge. Extensive experiments on a series of tasks show that CoD can achieve a promising plasticity–stability tradeoff and outperform existing diffusion-based methods and other representative baselines on most tasks. The source code is available at https://github.com/JF-Hu/Continual_Diffuser.
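The rehearsal step can be sketched in isolation. The class below is a hypothetical illustration, not the released CoD code: it keeps a small random fraction of each finished task's dataset and mixes replayed samples into batches drawn from the current task. The diffusion model, sequential modeling, and conditional generation are omitted, and the keep fraction and replay ratio are assumed values.

```python
import random

class RehearsalBuffer:
    """Keeps a small random fraction of each finished task's dataset."""

    def __init__(self, keep_fraction, seed=0):
        self.keep_fraction = keep_fraction
        self.samples = []
        self.rng = random.Random(seed)

    def add_task(self, dataset):
        """Store a random subset of a finished task's data (at least one item)."""
        k = max(1, int(len(dataset) * self.keep_fraction))
        self.samples.extend(self.rng.sample(dataset, k))

    def mixed_batch(self, current_data, batch_size, replay_ratio):
        """Draw a batch mixing current-task data with replayed old data."""
        n_replay = min(int(batch_size * replay_ratio), len(self.samples))
        replay = self.rng.sample(self.samples, n_replay)
        fresh = self.rng.sample(current_data, batch_size - n_replay)
        return fresh + replay

buffer = RehearsalBuffer(keep_fraction=0.1)
task_1 = [("task1", i) for i in range(50)]   # placeholder transitions
task_2 = [("task2", i) for i in range(50)]
buffer.add_task(task_1)                      # retain 5 of the 50 items
batch = buffer.mixed_batch(task_2, batch_size=8, replay_ratio=0.25)
```

Training on such mixed batches exposes the model to old tasks while it adapts to the new one, which is the mechanism by which rehearsal trades a small amount of storage for retention.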

PaperID: 677,

Authors: Xiaoshu Chen, Sihang Zhou, Ke Liang, Xinwang Liu

Affiliations: College of Computer Science and Technology, National University of Defense Technology, Changsha, China; College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China

Title: Distilling Reasoning Ability From Large Language Models With Adaptive Thinking

Abstract:
Chain-of-thought distillation (CoT-distillation) aims to endow small language models (SLMs) with reasoning ability to improve their performance toward specific tasks by allowing them to imitate the reasoning procedure of large language models (LLMs) beyond simply predicting the answers. Most existing CoT-distillation methods adopt a pre-thinking mechanism, allowing the SLM to generate a rationale before answering. In this way, pre-thinking enables the SLM to analyze questions but makes answer correctness sensitive to minor errors in the rationale. Therefore, we propose a robust post-thinking mechanism to generate answers before the rationale. Thanks to this answer-first setting: 1) the answer can escape from the rationale-sensitive problem; 2) the rationale serves as an error amplifier, making the SLM focus on learning hard samples; and 3) the inference efficiency can also benefit. Although post-thinking brings many advantages, it may lose the ability to analyze complex questions compared to pre-thinking. Therefore, a plug-and-play adaptive-thinking mechanism is proposed to integrate the merits of pre-thinking and post-thinking, in which a perception module based on soft prompt tuning is introduced to prompt the SLM to answer or think first according to the complexity of questions. Extensive experiments are conducted across 12 datasets and 2 language models to demonstrate the effectiveness of the proposed mechanism.

PaperID: 678,

Authors: Zikai Zhang, Ping Liu, Jiahao Xu, Rui Hu

Affiliations: Department of Computer Science and Engineering, University of Nevada, Reno, NV, USA

Title: Fed-HeLLo: Efficient Federated Foundation Model Fine-Tuning With Heterogeneous LoRA Allocation

Abstract:
Federated learning (FL) has recently been used to collaboratively fine-tune foundation models (FMs) across multiple clients. Notably, federated low-rank adaptation (LoRA)-based fine-tuning methods, which allow clients to fine-tune FMs with a small portion of trainable parameters locally, have recently gained attention. However, most existing methods do not account for the heterogeneous resources of clients or lack an effective local training strategy to maximize global fine-tuning performance under limited resources. In this work, we propose Fed-HeLLo, a novel federated LoRA-based fine-tuning framework with heterogeneous LoRA allocation that enables clients to collaboratively fine-tune an FM with different local trainable LoRA layers. To ensure its effectiveness, we develop several heterogeneous LoRA allocation (HLA) strategies that adaptively allocate local trainable LoRA layers based on clients’ resource capabilities and the layer importance. Specifically, based on the dynamic layer importance, we design a Fisher information matrix score-based HLA (FIM-HLA) that leverages dynamic gradient norm information. To better stabilize the training process, we consider the intrinsic importance of LoRA layers and design a geometrically defined HLA (GD-HLA) strategy. It shapes the collective distribution of trainable LoRA layers into specific geometric patterns, such as triangle, inverted triangle, bottleneck, and uniform. Moreover, we extend GD-HLA into a randomized version, named randomized GD-HLA (RGD-HLA), for enhanced model accuracy with randomness. By codesigning the proposed HLA strategies, we incorporate both the dynamic and intrinsic layer importance into the design of our HLA strategy. To thoroughly evaluate our approach, we simulate various complex federated LoRA-based fine-tuning settings using five datasets and three levels of data distributions ranging from independent identically distributed (i.i.d.) to extreme non-i.i.d. The experimental results demonstrate the effectiveness and efficiency of Fed-HeLLo with the proposed HLA strategies. The code is available at https://github.com/TNI-playground/Fed_HeLLo

PaperID: 679,

Authors: Zhiwei Wang, Qi Lang, Xiaodong Liu, Hao Zhang

Affiliations: School of Control Science and Engineering, Dalian University of Technology, Dalian, China; School of Computer Science and Information Technology, Northeast Normal University, Changchun, China

Title: AFSPrompt: An Axiomatic Fuzzy Set Prompt Pipeline for Knowledge-Based VQA

Abstract:
Despite the impressive few-shot performance of in-context learning (ICL) in knowledge-based visual question answering (VQA), existing research often prioritizes addressing the image information gap in VQA, while placing less emphasis on organizing appropriate demonstrations (e.g., in-context examples) to support this task. Recent studies, however, have shown that ICL performance is sensitive to the organization of demonstrations. To address this, we introduce axiomatic fuzzy set (AFS) theory into knowledge-based VQA, leveraging its unsupervised and interpretable nature to effectively organize demonstrations by describing each candidate with semantic concepts, thereby enhancing both the understanding and trustworthiness of the decision-making process. In this article, we propose AFSPrompt, a training-free example selection and ranking framework based on AFS theory for knowledge-based VQA tasks. After filtering irrelevant examples using multimodal embeddings, we apply AFS logic to integrate comparison information from candidates with multidimensional features. Furthermore, to reduce reliance on large-scale language model APIs such as OpenAI and facilitate model deployment, we employ a smaller 7B LLM as the knowledge engine to answer questions based on the optimized prompt. Through extensive evaluations on two datasets, we demonstrate the effectiveness of AFSPrompt within a lightweight pipeline for knowledge-based VQA tasks. Our code is publicly available at https://github.com/afs001/AFSPrompt

PaperID: 680,

Authors: Yong-Sheng Ma, Wei-Wei Che, Shixu Xu, Chao Deng, Zheng-Guang Wu

Affiliations: National Key Laboratory of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing, China; State Key Laboratory of Synthetical Automation for Process Industries and the College of Information Science and Engineering, Northeastern University, Shenyang, China; Department of Automation, Shandong Key Laboratory of Industrial Control Technology, Qingdao University, Qingdao, China; Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing, China; State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China

Title: Distributed Model-Free Adaptive Learning Control of Discrete-Time Nonlinear Multiagent Systems

Abstract:
This article investigates the distributed control problem for nonlinear multiagent systems (MASs) with unknown system models. A novel distributed model-free adaptive learning algorithm is developed to learn a controller from the online system data. Notably, a significant advancement over conventional methods is that the proposed algorithm requires only local interaction data from neighboring agents, eliminating dependencies on both a priori system structural knowledge and global topology information. Comprehensive simulations validate the theoretical results and demonstrate the superior efficacy of the devised algorithm.

PaperID: 681,

Authors: Zhuang Li, Jing Tao, Xintao Liu, Dahua Shou

Affiliations: Future Intelligent Wear Centre, School of Fashion and Textiles, The Hong Kong Polytechnic University, Hong Kong, China; Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong, China

Title: LCwmcaR: Learning Cross-Window Cross-Modality Correlation-Aware Representation for Human Activity Recognition

Abstract:
Deep learning (DL)-based human activity recognition (HAR) has attracted considerable attention owing to its vast potential across various applications. Currently, HAR still faces two challenges. For one thing, existing methods neglect the spatial distribution information embedded in HAR signals and lack the ability to comprehensively model the spatial–temporal (ST) dependencies within HAR data, restricting them from effectively decoding human activity. For another thing, previous models generate feature representations for a sliding window of the sequence solely based on this window itself, without cross-window interaction learning, posing challenges to classifiers, such as perceptual aliasing or feature inconsistency issues. To address these challenges, we propose a novel cross-window and cross-modality correlation-aware framework, namely LCwmcaR, which is a dual-branch network that simultaneously models temporal- and spatial-level information using Mamba and a convolutional neural network (CNN), respectively. Additionally, a learnable temporal 2-dimensionalization (LT2D) strategy is designed to encode low-level temporal patterns into high-level learnable image-like 2-D space representations that integrate both local and global ST dependencies. Moreover, a cross-window correlation-aware feature representation generation (CrwcaFRGen) module, which correlates multiple window representations within a batch at the attention level, is introduced to produce more robust features for the classifier. Experimental results on four public datasets demonstrate that LCwmcaR outperforms state-of-the-art (SOTA) methods by a large margin.

PaperID: 682,

Authors: Kui Jiang, Junjun Jiang, Zheng Wang, Zihan Geng, Xianming Liu

Affiliations: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Institute of Data and Information, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China

Title: DAWN+: Wavelet-Based Image Deraining Meets Direction-Aware Attention and Mutual Representation

Abstract:
The single-image deraining aims to restore clean scenes from rainy inputs by eliminating precipitation artifacts. Current methods often neglect the directional nature of rain streaks—a critical oversight that causes heterogeneous degradation, particularly in texture regions aligned with rain orientations. To address this issue and advance image deraining, we propose a novel direction-aware attention wavelet network (DAWN) for rain streaks removal. DAWN has several key distinctions and innovative features compared with existing wavelet transform-based methods: 1) introducing vector decomposition to parameterize rain distribution through vertical (V) and horizontal (H) component decomposition, enabling explicit direction-aware representation; 2) devising a novel direction-aware attention module (DAM) to learn projection/transformation parameters via coordinate attention mechanisms for precise rain removal and texture preservation; and 3) exploring practical composite constraints to jointly optimize structural coherence, detail fidelity, and chrominance accuracy. Building upon the conference version (DAWN), we devise DAWN+ with enhanced capabilities: 1) decoupling diagonal coefficient learning to eliminate frequency aliasing by characterizing diagonal components with dedicated projection parameters; 2) dividing vector decomposition and parameter fitting into multiple stages to reduce error accumulation; and 3) applying cross-frequency mutual representation to boost training and performance. Experiments across six tasks (deraining, raindrop/rainhaze removal, dehazing, and low-light/underwater enhancement) demonstrate the portability and reusability of these strategies. Meanwhile, DAWN+ delivers significant performance gains over DAWN, achieving an average peak signal to noise ratio (PSNR) increase of 1.17 dB with an acceptable complexity increase. 
In addition, DAWN+ achieves performance competitive with the state-of-the-art DRSformer (gaining 0.15 dB in PSNR) while saving 94.4% of the model parameters and 95% of the inference time.

PaperID: 683,

Authors: Jingzheng Li, Hailong Sun, Jiyi Li, Pengpeng Chen, Shikui Wei

Affiliations: School of Artificial Intelligence, Jilin University, Jilin, China; State Key Laboratory of Complex and Critical Software Environment (CCSE), Beihang University, Beijing, China; Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan; Aviation System Engineering Institute of China, Beijing, China; Institute of Information Science, Beijing Jiaotong University, Beijing, China

Title: Toward Open-World Domain Adaptation via Iteratively Contrastive Learning and Clustering

Abstract:
Open-set domain adaptation (DA) aims to address both covariate shift and category shift between a labeled source domain and an unlabeled target domain. Nevertheless, existing open-set DA methods always ignore the demand for discovering novel classes that are not present in the source domain and simply reject them as “unknown” sets without further exploration, which motivates us to understand the unknown sets more specifically. In this article, we present a more challenging open-world DA problem that recognizes seen classes while discovering novel classes in the target domain. To address this problem, we propose a novel framework that converts this problem into a clustering task via contrastive learning to learn pairwise relationships among the instances. More specifically, our method consists of two iterative steps. The semi-supervised clustering step clusters the unlabeled target data and separates it into seen and novel classes. In the contrastive learning step, based on the cluster assignments, we design tailored contrastive losses that learn pairwise relationships to reduce domain discrepancy and discover novel classes. Our method can be optimized as an example of expectation maximization (EM). We establish several baselines by extending related work. Our method achieves superior performance on five public datasets, benchmarking this challenging setting for future research.

PaperID: 684,

Authors: Muhammad A. A. Abdelgawad, Ray C. C. Cheung, Hong Yan

Affiliations: Department of Electrical Engineering and Centre for Intelligent Multidimensional Data Analysis, City University of Hong Kong, Kowloon Tong, Hong Kong; Department of Electrical Engineering, Faculty of Engineering, Minia University, Minia, Egypt

Title: IncTSVD: Incremental Tensor Singular Value Decomposition of Multidimensional Streaming Data

Abstract:
In this article, we develop an online method called IncTSVD to incrementally compute the tensor singular value decomposition (TSVD) of a given sequence of third-order tensors based on the tensor-tensor concept. This can be considered an extension of incremental SVD based on updating matrices to tensors. IncTSVD is suitable for streamed tensor data and where memory resources are limited. Most existing methods to compute TSVD focus on approximating it using randomized or sketching techniques in a batch setting to decrease the storage and computational costs required. The IncTSVD extends the computation of TSVD to streaming by maintaining the basis tensors of previously arrived data and incrementally updating the approximation using the tensor of incoming data. The computational cost and approximation error of the proposed method were analyzed theoretically and through extensive numerical experiments, which included using synthetic and real-world datasets under streaming scenarios. The IncTSVD method was superior to existing deterministic and randomized tensor decompositions (TDs) based on the t-product for computational and storage costs, and had comparable accuracy to the standard TSVD method.

PaperID: 685,

Authors: Dean L. Slack, G. Thomas Hudson, Thomas Winterbottom, Noura Al Moubayed

Affiliations: Department of Computer Science, Durham University, Durham, U.K.

Title: Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers

Abstract:
Inspired by the performance and scalability of autoregressive large language models (LLMs), transformer-based models have seen recent success in the visual domain. This study investigates a transformer adaptation for video prediction with a simple end-to-end approach, comparing various spatiotemporal self-attention layouts. Focusing on causal modeling of physical simulations over time, a common shortcoming of existing video-generative approaches, we attempt to isolate spatiotemporal reasoning via physical object tracking metrics and unsupervised training on physical simulation datasets. We introduce a simple yet effective pure transformer model for autoregressive video prediction, utilizing continuous pixel-space representations. Without the need for complex training strategies or latent feature-learning components, our approach significantly extends the time horizon for physically accurate predictions by up to 50% when compared with existing latent-space approaches, while maintaining comparable performance on common video quality metrics. In addition, we conduct interpretability experiments to identify network regions that encode information useful to perform accurate estimations of PDE simulation parameters via probing models, and find that this generalizes to the estimation of out-of-distribution simulation parameters. This work serves as a platform for further attention-based spatiotemporal modeling of videos via a simple, parameter efficient, and interpretable approach.

PaperID: 686,

Authors: Qianyu Long, Qiyuan Wang, Christos Anagnostopoulos, Daning Bi

Affiliations: School of Computing Science, University of Glasgow, Glasgow, U.K.; College of Finance and Statistics, Hunan University, Changsha, China

Title: Decentralized Personalized Federated Learning Based on a Conditional "Sparse-to-Sparser" Scheme

Abstract:
Decentralized federated learning (DFL) has gained popularity due to its robustness and elimination of centralized coordination requirements. In this paradigm, clients actively participate in training by exchanging models with neighboring nodes in their network. However, DFL introduces significant overhead in both training and communication costs. While existing methods focus primarily on reducing communication costs, they often overlook training efficiency and the challenges of data heterogeneity. We address these limitations by introducing DA-DPFL, a novel sparse-to-sparser training scheme that initializes with a subset of model parameters which progressively decrease during training through dynamic aggregation. This approach substantially reduces energy consumption while preserving adequate information during critical learning periods. Our experimental results demonstrate that DA-DPFL significantly outperforms DFL baselines in test accuracy while achieving up to 5x reduction in energy costs. We provide theoretical convergence analysis that validates the applicability of our approach in decentralized and personalized learning contexts. The code is available at: https://github.com/EricLoong/da-dpfl

PaperID: 687,

Authors: Xiaokun Luan, Xiyue Zhang, Jingyi Wang, Meng Sun

Affiliations: School of Mathematical Sciences, Peking University, Beijing, China; School of Computer Science, University of Bristol, Bristol, U.K.; School of Control Science and Engineering, Zhejiang University, Hangzhou, China

Title: Protecting Deep Learning Model Copyrights With Adversarial Example-Free Reuse Detection

Abstract:
Model reuse techniques can reduce the resource requirements for training high-performance deep neural networks (DNNs) by leveraging existing models. However, unauthorized reuse and replication of DNNs can lead to copyright infringement and economic loss to the model owner. This underscores the need to analyze the reuse relation between DNNs and develop copyright protection techniques to safeguard intellectual property rights. Existing DNN copyright protection approaches suffer from several inherent limitations hindering their effectiveness in practical scenarios. For instance, existing white-box fingerprinting approaches cannot address the common heterogeneous reuse case where the model architecture is changed, and DNN fingerprinting approaches heavily rely on generating adversarial examples with good transferability, which is known to be challenging in the black-box setting. To bridge the gap, we propose a neuron functionality (NF) analysis-based reuse detector (NFARD), which only requires normal test samples to detect reuse relations by measuring the models’ differences on a newly proposed model characterization, i.e., NF. A set of NF-based distance metrics is designed to make NFARD applicable to both white-box and black-box settings. Moreover, we devise a linear transformation method to handle heterogeneous reuse cases by constructing the optimal projection matrix for dimension consistency, significantly extending the application scope of NFARD. To the best of our knowledge, this is the first adversarial example-free method that exploits NF for DNN copyright protection. As a side contribution, we constructed a reuse detection benchmark named Reuse Zoo that covers various practical reuse techniques and popular datasets. Extensive evaluations on this comprehensive benchmark show that NFARD achieves F1 scores of 0.984 and 1.0 for detecting reuse relationships in black-box and white-box settings, respectively, while generating test suites 2 to 99 times faster than previous methods.

PaperID: 688,

Authors: Chengpeng Hu, Ziming Wang, Bo Yuan, Jialin Liu, Chengqi Zhang, Xin Yao

Affiliations: Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China; School of Data Science, Lingnan University, Hong Kong, SAR, China; Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong, SAR, China

Title: Robust Dynamic Material Handling via Adaptive Constrained Evolutionary Reinforcement Learning

Abstract:
Dynamic material handling (DMH) involves the assignment of dynamically arriving material transporting tasks to suitable vehicles in real time for minimizing makespan and tardiness. In real-world scenarios, historical task records are usually available, which enables the training of a decision policy on multiple instances consisting of historical records. Recently, reinforcement learning (RL) has been applied to solve DMH. Due to the occurrence of dynamic events such as new tasks, adaptability is highly required. Solving DMH is challenging since constraints, including task delay, should be satisfied. Feedback is received only when all tasks are served, which leads to sparse rewards. Besides, making the best use of limited computational resources and historical records for training a robust policy is crucial. The time allocated to different problem instances would highly impact the learning process. To tackle those challenges, this article proposes a novel adaptive constrained evolutionary RL (ACERL) approach, which maintains a population of actors for diverse exploration. ACERL accesses each actor for tackling sparse rewards and constraint violation to restrict the behavior of the policy. Moreover, ACERL adaptively selects the most beneficial training instances for improving the policy. Extensive experiments on eight training and eight unseen test instances demonstrate the outstanding performance of ACERL compared with several state-of-the-art algorithms. Policies trained by ACERL can schedule the vehicles while fully satisfying the constraints. Additional experiments on 40 unseen noised instances show the robust performance of ACERL. Cross validation further presents the overall effectiveness of ACERL. Besides, a rigorous ablation study highlights the coordination and benefits of each ingredient of ACERL.

PaperID: 689,

Authors: Hanyi Xu, Linyan Dai, Yinyan Zhang

Affiliations: College of Cyber Security, Jinan University, Guangzhou, China

Title: Frobenius Norm-Based Robust Dynamic Neural Network for Time-Dependent Matrix Inversion

Abstract:
Time-dependent matrix inversion (TDMI) is widely used in scientific fields. Considering the low computing costs and simplified structure, this brief puts forward a Frobenius norm-based dynamic neural network (FNBDNN) model to address a TDMI problem for the first time, which achieves convergence within finite time and ensures strong robustness without using integral operations and element-wise nonlinear activation functions. Moreover, precise theoretical analyses are provided to establish the finite-time convergence of the FNBDNN model in dealing with the TDMI problem. Simulation experiments are further conducted to verify the validity and superiority of the FNBDNN model. Finally, an application of the devised FNBDNN model to the precise motion control of a two-axis manipulator is introduced.

PaperID: 690,

Authors: Khoi Nguyen Tiet Nguyen, Wenyu Zhang, Kangkang Lu, Yuhuan Wu, Xingjian Zheng, Hui Li Tan, Liangli Zhen

Affiliations: Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore; Institute for Infocomm Research, A*STAR, Fusionopolis, Singapore; Institute of High Performance Computing, A*STAR, Fusionopolis, Singapore

Title: A Survey and Evaluation of Adversarial Attacks in Object Detection

Abstract:
Deep learning models achieve remarkable accuracy in computer vision tasks yet remain vulnerable to adversarial examples—carefully crafted perturbations to input images that can deceive these models into making confident but incorrect predictions. This vulnerability poses significant risks in high-stakes applications such as autonomous vehicles, security surveillance, and safety-critical inspection systems. While the existing literature extensively covers adversarial attacks in image classification, comprehensive analyses of such attacks on object detection systems remain limited. This article presents a novel taxonomic framework for categorizing adversarial attacks specific to object detection architectures, synthesizes existing robustness metrics, and provides a comprehensive empirical evaluation of state-of-the-art attack methodologies on popular object detection models, including both traditional detectors and modern detectors with vision-language pretraining. Through rigorous analysis of open-source attack implementations and their effectiveness across diverse detection architectures, we derive key insights into attack characteristics. Furthermore, we delineate critical research gaps and emerging challenges to guide future investigations in securing object detection systems against adversarial threats. Our findings establish a foundation for developing more robust detection models while highlighting the urgent need for standardized evaluation protocols in this rapidly evolving domain.

PaperID: 691,

Authors: Wanlin Tan, Rui Luo, Zhinan Peng, Qiang Ling

Affiliations: Department of Automation, University of Science and Technology of China, Hefei, China; School of Electronic Information and Electrical Engineering, Chengdu University, Chengdu, China; School of Automation Engineering, Center for Robotics, University of Electronic Science and Technology of China, Chengdu, China

Title: Online Adaptive Optimal Control Algorithm Based on Weighted Policy Iteration

Abstract:
In this article, we propose a novel online learning algorithm based on weighted policy iteration (WPI) for addressing optimal control problems of nonlinear systems. WPI is proposed to deal with the influence of the neural network (NN) approximation error on the admissibility of the improved control policy. It is shown that the new iterative method can converge to the optimal solution uniformly. Utilizing NN approximation and experience replay techniques, a WPI-based online learning algorithm is proposed. The new online algorithm differs from previously known ones in that there can be fewer neurons in the hidden layer, giving rise to significant computational improvement. The assumption that the number of neurons needs to be sufficiently large can be dropped. Besides, instead of a standard persistently excited (PE) condition, only a relaxed PE condition is needed, which is also easier to check. Finally, numerical experiments are conducted to verify the effectiveness of our method.

PaperID: 692,

Authors: Zian Zhang, Yongqiang Zhang, Yancheng Bai, Man Zhang, Rui Tian, Yin Zhang, Mingli Ding, Wangmeng Zuo

Affiliations: School of Instrument Science and Engineering, Harbin Institute of Technology (HIT), Harbin, China; College of Computer Science (College of Software-College of Artificial Intelligence), Inner Mongolia University, Hohhot, China; Chinese Academy of Sciences, Institute of Software, Beijing, China; School of Computer Science and Technology, Harbin Institute of Technology (HIT), Harbin, China

Title: Revising Representation and Target Deviations for Accurate Human Pose Estimation

Abstract:
Owing to the normalized instance scales and robust supervision, heatmap-based human pose estimation (HPE) methods with the top-down paradigm have achieved dominant performance. However, there are two inherent deviations in the basic framework, i.e., representation and target deviations, resulting in performance bottlenecks. The representation deviation is caused by transforming various scales of instances into a unified input size, which results in performance degradation because data with different scale-related characteristics can hardly be handled via unified parameters. The target deviation is caused by exploiting a prior distribution (e.g., Gaussian) to model the prediction error, which hinders sufficient network training. In this article, we propose a novel framework called DRPose to revise the abovementioned deviations. Specifically, to address the representation deviation, a scale-aware domain bridging (SDB) block is proposed to transfer feature maps from multiple scale-dependent domains into a unified intermediate domain with dynamic parameters. To address the target deviation, a differentiable coordinate decoder (DCD) is presented to adaptively adjust the target distribution of heatmaps in an end-to-end manner. Extensive experiments show that the proposed method significantly improves the performance of most existing models with negligible additional cost. Beyond this, our method achieves 77.1% AP on the COCO test-dev set, outperforming prior works with similar model complexity.

PaperID: 693,

Authors: Tu Anh Ngo, Chuan Song Heng, Nandish Chattopadhyay, Anupam Chattopadhyay

Affiliations: CCDS, Nanyang Technological University Singapore, Jurong West, Singapore

Title: Persistence of Backdoor-Based Watermarks for Neural Networks: A Comprehensive Evaluation

Abstract:
Deep neural networks (DNNs) have gained considerable traction in recent years due to the unparalleled results they have achieved. However, training such sophisticated models is resource-intensive, leading many to consider DNNs to be the intellectual property (IP) of model owners. In this era of cloud computing, high-performance DNNs are often deployed all over the Internet so that people can access them publicly. As such, DNN watermarking schemes, especially backdoor-based watermarks, have been actively developed in recent years to preserve proprietary rights. Nonetheless, there remains much uncertainty about the robustness of existing backdoor watermark schemes, toward both adversarial attacks and unintended means such as fine-tuning neural network models. One reason for this is that no complete guarantee of robustness can be assured in the context of backdoor-based watermarks. In this article, we extensively evaluate the persistence of recent backdoor-based watermarks within neural networks in the scenario of fine-tuning, and we propose a novel data-driven idea to restore the watermark after fine-tuning without exposing the trigger set. Our empirical results show that by solely introducing training data after fine-tuning, the watermark can be restored if model parameters do not shift dramatically during fine-tuning. Depending on the types of trigger samples used, trigger accuracy can be reinstated to up to 100%. This study further explores how the restoration process works using loss landscape visualization, as well as the idea of introducing training data in the fine-tuning stage to alleviate watermark vanishing.

PaperID: 694,

Authors: Yuxiang Yang, Xinyi Zeng, Pinxian Zeng, Chen Zu, Binyu Yan, Jiliu Zhou, Yan Wang

Affiliations: School of Computer Science, Sichuan University, Chengdu, China; JD.com, Chengdu, China

Title: Adaptive Hardness-Driven Augmentation and Alignment Strategies for Multisource Domain Adaptations

Abstract:
Multisource domain adaptation (MDA) aims to transfer knowledge from multiple labeled source domains to an unlabeled target domain. Nevertheless, traditional methods primarily focus on achieving interdomain alignment through sample-level constraints, such as maximum mean discrepancy (MMD), neglecting three pivotal aspects: 1) the potential of data augmentation; 2) the significance of intradomain alignment; and 3) the design of cluster-level constraints. In this article, we introduce a novel hardness-driven strategy for MDA tasks, named A³MDA, which collectively considers these three aspects through adaptive hardness quantification and utilization in both data augmentation and domain alignment. To achieve this, A³MDA progressively proposes three adaptive hardness measurements (AHMs), i.e., basic, smooth, and comparative AHMs, each incorporating distinct mechanisms for diverse scenarios. Specifically, basic AHM aims to gauge the instantaneous hardness for each source/target sample. Then, hardness values measured by smooth AHM will adaptively adjust the intensity level of strong data augmentation to maintain compatibility with the model’s generalization capacity. In contrast, comparative AHM is designed to facilitate cluster-level constraints. By leveraging hardness values as sample-specific weights, the traditional MMD is enhanced into a weighted-clustered variant, strengthening the robustness and precision of interdomain alignment. As for the often-neglected intradomain alignment, we adaptively construct a pseudo-contrastive matrix (PCM) by selecting harder samples based on the hardness rankings, enhancing the quality of pseudo-labels, and shaping a well-clustered target feature space. Experiments on multiple MDA benchmarks show that A³MDA outperforms other methods.
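
The idea of using per-sample hardness values as weights inside an MMD term can be illustrated with a minimal sketch. This is not the authors' weighted-clustered variant: it assumes an RBF kernel, the biased MMD estimator, and caller-supplied weights, and the names `rbf_kernel`, `weighted_mmd2`, and `gamma` are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Pairwise squared Euclidean distances between rows of x and rows of y.
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def weighted_mmd2(xs, xt, ws, wt, gamma=1.0):
    """Squared MMD between weighted source and target samples (biased estimator).

    ws and wt are nonnegative per-sample weights (assumed here to come from
    some hardness measure); each set is normalized to sum to one.
    """
    ws = np.asarray(ws, dtype=float)
    wt = np.asarray(wt, dtype=float)
    ws = ws / ws.sum()
    wt = wt / wt.sum()
    k_ss = ws @ rbf_kernel(xs, xs, gamma) @ ws
    k_tt = wt @ rbf_kernel(xt, xt, gamma) @ wt
    k_st = ws @ rbf_kernel(xs, xt, gamma) @ wt
    return float(k_ss + k_tt - 2.0 * k_st)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(0.0, 1.0, size=(50, 4))
    target = rng.normal(0.5, 1.0, size=(60, 4))
    # Uniform weights recover the standard biased MMD estimate.
    print(weighted_mmd2(source, target, np.ones(50), np.ones(60)))
```

With uniform weights the function reduces to the ordinary biased MMD estimate, so the weights only change how much each sample contributes to the two mean embeddings.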

PaperID: 695,

Authors: Weiming Wu, Zhirui Li, Chen Sun, Cong Wang, Guanrong Chen

Affiliations: School of Control Science and Engineering, Shandong University, Jinan, China; Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China

Title: Rapid Dynamical Pattern Classification via Deterministic Learning From Sampling Sequences

Abstract:
This article is concerned with the rapid classification issue for dynamical patterns consisting of sampling sequences in a relatively large-scale dynamical dataset constructed by benchmark Rössler systems. Specifically, based on a recently developed deterministic learning mechanism, a rapid dynamical pattern classification method is developed, which contains a modeling stage and a classification stage. In the modeling stage, a deterministic learning scheme is employed to accurately learn/model the inherent dynamics of the training dynamical patterns and store the acquired knowledge in a set of constant radial basis function (RBF) networks. In the classification stage, based on the trained RBF networks, a set of dynamical estimators is developed for real-time dynamic comparison. The generated recognition errors are then used to effectively represent the dynamic differences in real time. The class label associated with the minimum recognition error is then assigned to the test pattern, also in real time. To demonstrate the effectiveness of the proposed method, a relatively large-scale dynamical pattern dataset containing various dynamical behaviors is constructed by utilizing a deterministic chaos prospector (DCP) technique. The simulation results show that the new method achieves competitive classification performance compared to the state-of-the-art time-series classification method for the dynamical system classification task. In addition to performance advantages, the new method can perform real-time time-series classification with the first 10% of data achieving over 95% of accuracy based on the full-length data. Besides, the superiority of our method is demonstrated on various datasets in the UCR time-series classification (TSC) archive.

PaperID: 696,

Authors: Qi Zang, Shuang Wang, Dong Zhao, Wanqing Li, Zining Wang, Dou Quan, Fei Liang, Licheng Jiao

Affiliations: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an, Shaanxi, China; School of Computing and Information Technology, University of Wollongong, Wollongong, NSW, Australia

Title: Boosting Generalization of Semantic Segmentation With Unseen Style Seeking-Based Meta-Learning

Abstract:
This article considers the worst-case and most challenging scenario in domain generalization (DG), where a model aims to generalize well on unseen domains while only one single domain is available for training. Existing randomization-based methods achieve this goal by enriching the style of the training data. However, they fail to guarantee the diversity of newly generated data required for generalization and thus lead to insufficient expansion of the training distribution. Thus, we propose a novel single DG (SDG) framework, unseen style seeking-based meta-learning (USSML). In USSML, multiple plausible domains with various styles are first constructed from a single source domain and the combination is performed across generated domains to emulate unseen images, extending the distribution boundaries of the source domain. The domain combination is performed at two levels, i.e., global and instance, to meet the generalization challenge in semantic segmentation. Then, the generated diverse domains are further exploited to force the model to optimize in an unbiased manner across all domains by relearning regions lacking domain-invariant representation capability, driving the model toward domain invariance. A point worth mentioning is that the proposed method is easily integrated into existing segmentation methods with little computational cost to improve their generalization. Extensive experiments are conducted on five popular segmentation datasets and the results have verified the effectiveness of USSML in improving the model’s generalization and the superiority of USSML over existing works.

PaperID: 697,

Authors: Yixiao Xu, Mohan Li, Binxing Fang, Yuan Liu, Zhihong Tian

Affiliations: School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China; Cyberspace Institute of Advanced Technology, Guangdong Key Laboratory of Industrial Control System Security and Huangpu Research School, Guangzhou University, Guangzhou, China

Title: Neural Honeypoint: An Active Defense Framework Against Model Inversion Attacks

Abstract:
Learning-based systems have been shown to be vulnerable to model inversion attacks (MIAs), where attackers steal private information of training data by querying the target model using synthetic samples. To alleviate the urgent threat introduced by MIAs, existing defenses increase the attack overhead by limiting the information available. Although these methods successfully reduce the attack success rate (ASR) for a one-time inversion attempt, they usually compromise the usability of the protected model. More importantly, existing MIA defense methods fail to capture attack attempts, which can lead to persistent threats to data privacy. To bridge this gap, we propose Neural Honeypoint, an active defense framework against MIAs. The key insight is that MIA attackers will make a series of forward steps in the feature space while benign users will not. Motivated by this observation, defenders can deploy active defense devices (honeypoints) on critical paths to capture attack behaviors. Specifically, Neural Honeypoint first models the attackers’ capabilities from the frequency domain and designs specialized honeypoints for protected classes in the training dataset. Subsequently, it deploys these honeypoints into the protected model via backdoor-like model fine-tuning. Then, defenders can distinguish model inversion examples by comparing the similarity of input features with deployed honeypoints. Experiments show that Neural Honeypoint reduces the ASRs of advanced MIAs to 0% to 2%. Furthermore, it can effectively capture inversion queries, which helps defenders to detect and block attacks in time.

PaperID: 698,

Authors: Ying Sun, Hengshu Zhu, Hui Xiong

Affiliations: Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing, China

Title: Toward Faithful Neural Network Intrinsic Interpretation With Shapley Additive Self-Attribution

Abstract:
Self-interpreting neural networks have attracted significant attention from the research community. Along this line, extensive works inherently share the intuitive principle of linear contribution aggregation from diversified perspectives, while often: 1) lacking a solid theoretical foundation ensuring genuine interpretability and 2) compromising model expressiveness. In response, we propose a generic additive self-attribution (ASA) framework to encapsulate the characteristics of various works in this field and underscore the absence of the Shapley value attribution. To fill in this gap, we propose a novel Shapley additive self-attributing neural network (SASANet). SASANet models meaningful outputs for arbitrary-numbered observable features, naturally leading to an unapproximated value function for the Shapley value. Designing an intermediate sequential schema based on marginal contributions (MCs) and an internal distillation procedure, we theoretically prove that the intermediate self-attribution value converges to the output’s Shapley values. Finally, we conduct extensive experiments on multiple public datasets. The experimental results clearly demonstrate that SASANet, being highly interpretable, outperforms existing self-attributing models in performance and is comparable with commonly adopted closed-box models. In addition, compared with adopting post hoc interpretation methods, SASANet’s self-attribution provides a more accurate and efficient interpretation for its own predictions. To the best of the authors’ knowledge, this is the first self-interpreting neural network structure that achieves modelwise Shapley attribution. Our code is available at: https://anonymous.4open.science/r/SASANet-B343
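
For readers unfamiliar with the quantity being attributed, the following sketch computes exact Shapley values from marginal contributions by enumerating all feature subsets. It illustrates the textbook definition only, not SASANet itself; the function name `shapley_values` and the toy value function are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value):
    """Exact Shapley values for n features.

    `value` maps a frozenset of feature indices to a real number (the model
    output when only those features are observed). Cost grows as 2**n, so
    this is only practical for small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

if __name__ == "__main__":
    # Toy value function: output is 1 only when features 0 and 1 are both present.
    toy = lambda s: 1.0 if {0, 1} <= s else 0.0
    print(shapley_values(3, toy))
```

In the toy game, features 0 and 1 share the credit equally and feature 2 receives none, and the attributions sum to the difference between the full-coalition value and the empty-coalition value.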

PaperID: 699,

Authors: Shuai Zhou, Dayong Ye, Tianqing Zhu, Wanlei Zhou

Affiliations: Faculty of Data Science, City University of Macau, Macau, China; Centre of Cyber Security and Privacy and the School of Computer Science, University of Technology Sydney, Ultimo, NSW, Australia

Title: Defending Against Neural Network Model Inversion Attacks via Data Poisoning

Abstract:
Model inversion attacks pose a significant privacy threat to machine learning models by reconstructing sensitive data from their outputs. While various defenses have been proposed to counteract these attacks, they often come at the cost of the classifier’s utility, thus creating a challenging trade-off between privacy protection and model utility. Moreover, most existing defenses require retraining the classifier for enhanced robustness, which is impractical for large-scale, well-established models. This article introduces a novel defense mechanism to better balance privacy and utility, particularly against adversaries who employ a machine learning model (i.e., inversion model) to reconstruct private data. Drawing inspiration from data poisoning attacks, which can compromise the performance of machine learning models, we propose a strategy that leverages data poisoning to contaminate the training data of inversion models, thereby preventing model inversion attacks. Two defense methods are presented. The first, termed label-preserving poisoning attacks for all output vectors (LPA), involves subtle perturbations to all output vectors while preserving their labels. Our findings demonstrate that these minor perturbations, introduced through a data poisoning approach, significantly increase the difficulty of data reconstruction without compromising the utility of the classifier. Subsequently, we introduce a second method, label-flipping poisoning for partial output vectors (LFP), which selectively perturbs a small subset of output vectors and alters their labels during the process. Empirical results indicate that LPA is notably effective, outperforming the current state-of-the-art defenses. Our data poisoning-based defense provides a new retraining-free defense paradigm that preserves the victim classifier’s utility.
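
The notion of perturbing an output vector while preserving its label can be sketched as follows. This is an illustrative stand-in that uses random Gaussian noise, not the optimized poisoning perturbation of the abstract above; `label_preserving_perturb`, `scale`, and `max_tries` are hypothetical names.

```python
import numpy as np

def label_preserving_perturb(p, scale=0.05, rng=None, max_tries=20):
    """Return a perturbed probability vector with the same argmax as p.

    Noise is added, the result is clipped and renormalized, and the noise
    scale is halved whenever the predicted label would change. If no valid
    perturbation is found, an unmodified copy of p is returned.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    label = int(np.argmax(p))
    for _ in range(max_tries):
        q = np.clip(p + rng.normal(0.0, scale, size=p.shape), 1e-12, None)
        q = q / q.sum()
        if int(np.argmax(q)) == label:
            return q
        scale *= 0.5
    return p.copy()

if __name__ == "__main__":
    confidences = np.array([0.7, 0.2, 0.1])
    print(label_preserving_perturb(confidences, rng=np.random.default_rng(0)))
```

Because the returned vector always has the same argmax as the input, the classifier's predicted label, and hence its accuracy, is unchanged; only the confidence scores an adversary could collect are altered.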

PaperID: 700,

Authors: Zihong Sun, Hong Wang, Qi Xie, Yefeng Zheng, Deyu Meng

Affiliations: School of Mathematics and Statistics and the Ministry of Education Key Laboratory of Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an, Shaanxi, China; School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China; Medical Artificial Intelligence Laboratory, West Lake University, Hangzhou, China

Title: RSF-Conv: Rotation-and-Scale Equivariant Fourier Parameterized Convolution for Retinal Vessel Segmentation

Abstract:
Retinal vessel segmentation is of great clinical significance for the diagnosis of many eye-related diseases, but it is still a formidable challenge due to the intricate vascular morphology. With the skillful characterization of the translation symmetry existing in retinal vessels, convolutional neural networks (CNNs) have achieved great success in retinal vessel segmentation. However, the rotation-and-scale symmetry, as a more widespread image prior in retinal vessels, fails to be characterized by CNNs. Therefore, we propose a rotation-and-scale equivariant Fourier parameterized convolution (RSF-Conv) specifically for retinal vessel segmentation and provide the corresponding equivariance analysis. As a general module, RSF-Conv can be integrated into existing networks in a plug-and-play manner while significantly reducing the number of parameters. For instance, we replace the traditional convolution filters in U-Net, Iter-Net, DE-DCGCN-EE, and FR-UNet with RSF-Convs and conduct comprehensive experiments. RSF-Conv-enhanced methods not only have slight advantages under in-domain evaluation but also, more importantly, outperform all comparison methods by a significant margin under out-of-domain evaluation. This indicates that the remarkable generalization of RSF-Conv holds greater practical clinical significance for the prevalent cross-device and cross-hospital challenges in clinical practice. To comprehensively demonstrate the effectiveness of RSF-Conv, we also apply RSF-Conv + U-Net and RSF-Conv + Iter-Net to retinal artery/vein classification and achieve promising performance as well, indicating its clinical application potential. The code is available at https://github.com/szhc0gk/RSF-Conv

PaperID: 701,

Authors: Chunjing Xiao, Jiahui Lu, Xovee Xu, Fan Zhou, Tianshu Xie, Wei Lu, Lifeng Xu

Affiliations: School of Computer and Information Engineering, Henan University, Kaifeng, China; School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People’s Hospital, Quzhou, China

Title: Reconciling Attribute and Structural Anomalies for Improved Graph Anomaly Detection

Abstract:
Graph anomaly detection is critical in domains such as healthcare and economics, where identifying deviations can prevent substantial losses. Existing unsupervised approaches strive to learn a single model capable of detecting both attribute and structural anomalies. However, they confront the tug-of-war problem between two distinct types of anomalies, resulting in suboptimal performance. This work presents TripleAD, a mutual distillation-based triple-channel graph anomaly detection framework. It includes three estimation modules to identify the attribute, structural, and mixed anomalies while mitigating the interference between different types of anomalies. In the first channel, we design a multiscale attribute estimation module to capture extensive node interactions and ameliorate the over-smoothing issue. To better identify structural anomalies, we introduce a link-enhanced structure estimation module in the second channel that facilitates information flow to topologically isolated nodes. The third channel is powered by an attribute-mixed curvature, a new indicator that encapsulates both attribute and structural information for discriminating mixed anomalies. Moreover, a mutual distillation strategy is introduced to encourage communication and collaboration between the three channels. Extensive experiments demonstrate the effectiveness of the proposed TripleAD model against strong baselines.

PaperID: 702,

Authors: Meng Xu, Xinhong Chen, Zihao Wen, Weiwei Fu, Jianping Wang

Affiliations: Department of Computer Science, City University of Hong Kong, Hong Kong, SAR, China

Title: A Two-Stage Selective Experience Replay for Double-Actor Deep Reinforcement Learning

Abstract:
Deep reinforcement learning (DRL) has been widely applied to various applications, but improving the exploration and the accuracy of Q-value estimation remain key challenges. Recently, the double-actor architecture has emerged as a promising DRL framework that can enhance both exploration and Q-value estimation. Existing double-actor DRL methods sample from the replay buffer to update the two actors; however, the samples used to update each actor are generated by its previous versions and the other actor, resulting in a different data distribution compared with the current actor being updated, which can negatively impact the actor’s update and lead to suboptimal policies. To this end, this work proposes a generic solution that can be seamlessly integrated into existing double-actor DRL methods to mitigate the adverse effects of data distribution differences on actor updates, thereby learning better policies. Specifically, we decompose the updates of double-actor DRL methods into two stages, each of which uses the same sampling approach to train an actor-critic pair. This sampling approach classifies the samples in the replay buffer into distinct categories using a clustering technique, such as K-means, and subsequently employs the Jensen-Shannon (JS) divergence to evaluate the distributional differences between each sample category and the actor currently being updated. Samples are then prioritized from the categories with smaller distribution differences to the current actor to update it. In this way, we can effectively mitigate the distribution difference between the samples and the current actor being updated. Experiments demonstrate that our method enhances the performance of five state-of-the-art (SOTA) double-actor DRL methods and outperforms eight SOTA single-actor DRL methods across eight tasks.
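
The ranking step described above, scoring each sample category by its JS divergence to the current actor and preferring the closest categories, can be sketched as follows. The sketch assumes the clustering has already been done and that each category and the actor are summarized as discrete histograms over the same bins; `js_divergence` and `rank_clusters` are hypothetical helper names, not the authors' implementation.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m))
    kl_qm = np.sum(q * np.log(q / m))
    return float(0.5 * kl_pm + 0.5 * kl_qm)

def rank_clusters(cluster_hists, actor_hist):
    """Return cluster indices ordered from smallest to largest JS divergence."""
    divergences = [js_divergence(h, actor_hist) for h in cluster_hists]
    order = [int(i) for i in np.argsort(divergences)]
    return order, divergences

if __name__ == "__main__":
    # Hypothetical action histograms for three replay-buffer clusters.
    clusters = [[0.8, 0.1, 0.1], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]]
    actor = [0.25, 0.45, 0.30]
    order, divs = rank_clusters(clusters, actor)
    print(order, divs)
```

Sampling would then draw preferentially from the clusters at the front of `order`; how strongly to prioritize them is a design choice the abstract does not specify.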

PaperID: 703,

Authors: Jing Zhang, Yumo Kang, Wenxuan Liu, Zhe Wang

Affiliations: Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China

Title: Scale-Wise Semantic Alignment Enhanced Multigrained Adaptive Fusion for Virtual Try-On

Abstract:
Image-based virtual try-on aims to fit garments onto a target person accurately and naturally while preserving the textural details of the garment. Inspired by the dynamic perception process of the human visual system, which transitions from global perception to local details, we propose a novel multigrained adaptive fusion network for virtual try-on, named MA-VITON. MA-VITON precisely aligns clothing semantic features with human body parts across different scales, reduces unrealistic textures caused by garment distortion, and employs coarse-to-fine clothing features to progressively guide the generation of try-on results. To achieve this, we introduce a scale-wise semantic alignment (SSA) module that extracts local features of clothing and the target person at various scales using flexible query strategies. It learns semantic correspondences between garments and the human body in the latent space through parallel bidirectional interactions, ensuring accurate feature alignment. Additionally, we propose a multigrained adaptive fusion (MAF) module, which identifies critical garment regions using a polyscale attention mechanism and allocates more tokens to adaptively preserve intricate textural details. Extensive experiments on multiple widely used public datasets demonstrate that MA-VITON achieves outstanding performance and surpasses state-of-the-art methods. The code is publicly available at https://github.com/Max-Teapot/MA-VITON.

PaperID: 704,

Authors: Yuepeng Chen, Weiping Ding, Jiashuang Huang, Wei Zhang, Tianyi Zhou

Affiliations: School of Information Science and Technology, Nantong University, Nantong, China; School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China

Title: Multigranularity Fuzzy Autoencoder for Discriminative Feature Selection in High-Dimensional Data

Abstract:
Biological datasets, such as gene expression data, often suffer from high dimensionality, containing numerous irrelevant or redundant features that can lead to overfitting and increased computational complexity. Effective feature selection is essential for reducing dimensionality, enhancing model performance, and improving interpretability. While deep neural networks, such as autoencoders, have shown promise in feature selection, their performance often diminishes when confronted with noisy data. To address these challenges, we propose a novel feature selection method that leverages multigranularity fuzzy autoencoders (FAEs). This approach integrates fuzzy theory with autoencoder models to effectively manage noise and outliers in data. The FAE method introduces a feature selection layer that approximates discrete feature selection using continuous probability distributions. To further enhance the discriminative power of the selected features, we incorporate a coarse-grained loss function designed to exploit clustering structures. In addition, intuitionistic fuzzy weights are applied to account for uncertainty by computing membership and nonmembership degrees for each sample, thereby mitigating the impact of noise and outliers. Test results validate the effectiveness of our approach, demonstrating significant improvements over existing feature selection techniques across 20 public datasets and a real-world schizophrenia dataset. These findings highlight the potential of our method to enhance classification accuracy and robustness, particularly in the context of schizophrenia research.
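
The idea of approximating a discrete feature choice with a continuous probability distribution can be illustrated with a Gumbel-softmax (concrete) relaxation. This is a minimal sketch under assumptions: the abstract does not name the relaxation used by the FAE method, and the function names, logits, and temperature below are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot sample over features (non-negative, sums to 1)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


def soft_select(x, weights):
    """Convex combination of input features; approaches a hard pick as temperature -> 0."""
    return sum(w * xi for w, xi in zip(weights, x))


rng = random.Random(0)
logits = [0.0, 8.0, 0.0, 0.0]   # hypothetical learned selection scores
x = [0.3, 2.5, -1.0, 0.7]       # one sample with four candidate features
weights = gumbel_softmax(logits, temperature=0.1, rng=rng)
selected = soft_select(x, weights)
```

Lowering the temperature pushes the weights toward a one-hot vector, so the layer behaves increasingly like a discrete selector while remaining differentiable with respect to the logits.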

PaperID: 705,

Authors: Haoyu Ji, Bowen Chen, Wenze Huang, Weihong Ren, Zhiyong Wang, Honghai Liu

Affiliations: State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Shenzhen, China

Title: Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation

Abstract:
Skeleton-based temporal action segmentation (STAS) aims to densely segment and classify human actions within lengthy untrimmed skeletal motion sequences. Current methods primarily rely on graph convolutional networks (GCNs) for intraframe spatial modeling and temporal convolutional networks (TCNs) for interframe temporal modeling to discern motion patterns. However, these approaches often overlook the distinctive nature of essential action elements across various actions, including engaged core body parts and key subactions. This oversight limits the ability to distinguish different actions within a given sequence. To address these limitations, the snippet-aware Transformer with multiple action elements (ME-ST) is proposed to enhance the discrimination and segmentation among actions. It leverages intrasnippet attention along joints and sequences to identify core joints and key subactions at different scales. Specifically, in the spatial domain, the intrasnippet cross-joint attention (CJA) module divides the sequence into distinct snippets and computes attention to establish intricate joint semantic relationships, emphasizing the identification of core motion joints. In the temporal domain, in the encoder, the intrasnippet cross-frame attention (CFA) module segments the sequence in a blockwise expansion manner and establishes interframe relationships to highlight the most discriminative frames. In the decoder, clip-level representations at various temporal scales are initially generated through an hourglass-like sampling process, followed by the intrasnippet cross-scale attention (CSA) module to integrate the key clip information across different time scales. The performance evaluation on five public datasets demonstrates that ME-ST achieves state-of-the-art (SOTA) performance.

PaperID: 706,

Authors: Lulu Gong, Xudong Chen, ShiNung Ching

Affiliations: Department of Electrical and Systems Engineering, Washington University, St. Louis, MO, USA

Title: Strong Anti-Hebbian Plasticity Alters the Convexity of Network Attractor Landscapes

Abstract:
In this brief, we study recurrent neural networks in the presence of pairwise learning rules. We are specifically interested in how the attractor landscapes of such networks become altered as a function of the strength and nature (Hebbian versus anti-Hebbian) of learning, which may have a bearing on the ability of such rules to mediate large-scale optimization problems. Through formal mathematical analysis, we show that a transition from Hebbian to anti-Hebbian learning brings about a pitchfork bifurcation that destroys convexity in the network attractor landscape. In larger scale settings, this implies that anti-Hebbian plasticity will bring about multiple stable equilibria, and such effects may be outsized at interconnection or “choke” points. Furthermore, attractor landscapes are more sensitive to slower learning rates than faster ones. These results provide insight into the types of objective functions that can be encoded via different pairwise plasticity rules.
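
For readers unfamiliar with the term, the textbook normal form of a pitchfork bifurcation, dx/dt = mu*x - x^3, shows how a single parameter can turn a convex landscape into a double well. This toy is only an illustration: it is not the network model analyzed in the brief, and the parameter mu is a hypothetical stand-in that is not tied here to any specific learning rule.

```python
import math


def equilibria(mu):
    """Equilibria of dx/dt = mu*x - x**3, each paired with a linear-stability flag."""
    points = [0.0]
    if mu > 0:
        r = math.sqrt(mu)
        points = [-r, 0.0, r]
    # Linearly stable when d/dx (mu*x - x**3) = mu - 3*x**2 is negative.
    # At mu == 0 the origin is only marginal, so it is flagged False here.
    return [(x, mu - 3 * x * x < 0) for x in points]


def potential_is_convex(mu):
    """V(x) = -mu*x**2/2 + x**4/4 has V''(x) = 3*x**2 - mu >= 0 for all x iff mu <= 0."""
    return mu <= 0


before = equilibria(-1.0)  # single stable equilibrium, convex potential
after = equilibria(4.0)    # two stable equilibria around an unstable origin
```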

PaperID: 707,

Authors: Lele Ma, Xiangjie Liu, Furong Gao, Kwang Y. Lee

Affiliations: State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources, North China Electric Power University, Beijing, China; Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Sai Kung, Hong Kong; Department of Electrical and Computer Engineering, Baylor University, Waco, TX, USA

Title: Data-Driven Iterative Learning Model Predictive Control With Self-Modified Prior Knowledge

Abstract:
Iterative learning model predictive control (ILMPC) has become an excellent data-driven intelligent control strategy for digitized batch manufacturing, featured by the progressive improvement of tracking performance along trials, and the persistent rejection of stochastic disturbance along time. The point-to-point learning mechanism of existing ILMPC generally relies on identical operating conditions along trials to guarantee the integrity and accuracy of historical data. However, the variations of production requirements usually lead to trial-varying operating references and durations, resulting in incomplete and inaccurate historical information for the iterative learning of subsequent trials. To promote the adaptability and flexibility of ILMPCs with unconformable prior information, a data-driven self-modification scheme is originally embedded into ILMPC in this article to transfer the prior knowledge contained in the historical operating data into the form consistent with the condition of each current trial. The control actions are imitated along trials by an adaptive deep neural network (DNN), which is then utilized to generate reference control signals for iterative learning in each trial. For attenuating the influence of the considerable DNN approximation error in early trials with limited data accumulation, the 2-D optimization of ILMPC is performed under a tube control frame to ensure the time-domain bounded stability. Based on the intrinsic recursive feasibility and the guaranteed time-domain stability, the iteration-domain bounded convergence of the developed ILMPC system is theoretically validated. Simulations on the nonlinear injection molding process verify the superiority of the proposed method in adapting to significant changes in operating reference and duration.

PaperID: 708,

Authors: Huisi Wu, Zebin Zhao

Affiliations: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China

Title: EPSegNet: Lightweight Semantic Recalibration and Assembly for Efficient Polyp Segmentation

Abstract:
Colorectal cancer (CRC) is among the most common malignancies, and the detection and removal of polyps at an early stage is of great importance in preventing it. However, current state-of-the-art high-accuracy methods for polyp segmentation have a large number of parameters and a stringent requirement for computational cost, while lightweight and fast models significantly sacrifice accuracy. Currently, medical semantic segmentation algorithms are mostly based on encoder-decoder architecture. Pixelwise spatial information has been proven to be very important to the quality of features extracted by encoders. However, almost all existing approaches capturing it suffer from high computational complexity. Furthermore, the capacity of the traditional decoder is constrained by its limited receptive fields. To comprehensively address the above problems, we propose a novel efficient polyp segmentation network (EPSegNet) to simultaneously fulfill the requirements of accuracy, size, and speed. First, we propose a lightweight feature extraction and recalibration module (LFERM), which can efficiently extract dense multiscale features. Specifically, in LFERM, we propose a spatial information recalibration (SIR) block for efficiently refining spatial information. Based on LFERMs, we develop an encoder. Moreover, we propose a novel lightweight semantic assembly decoder (LSAD) that assembles both channelwise and pixelwise semantics from a global context view. Finally, we combine the encoder and LSAD to form the proposed EPSegNet. Experiments on Kvasir-SEG, CVC-ClinicDB, and CVC-ColonDB datasets demonstrate that the proposed EPSegNet achieves the best balance between accuracy and size among state-of-the-art models and obtains a fast speed for polyp segmentation. Without any pretraining and postprocessing, our method achieves 79.37% intersection over union (IoU) and 86.74% Dice on the Kvasir-SEG dataset with only 0.34 million parameters and a speed of 128 frames/s (FPS) at the input size of 3 × 384 × 384 on a single NVIDIA GeForce RTX 2080 Ti card. Codes will be released upon publication.
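
The two reported metrics, IoU and Dice, can be computed for a single binary mask as sketched below. The flat 0/1 mask representation and the convention for two empty masks are assumptions made for illustration; the abstract does not state how scores are aggregated over a dataset.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two equal-length sequences of 0/1 mask values."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
iou, dice = iou_and_dice(pred, target)  # 2 shared pixels, union of 4
```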

PaperID: 709,

Authors: Xinyi Zhang, Wei Wei, Shuang Qiu, Xujin Li, Yijun Wang, Huiguang He

Affiliations: Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing, China; State Key Laboratory on Integrated Optoelectronics, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China

Title: Enhancing SSVEP-Based BCI Performance via Consensus Information Transfer Among Subjects

Abstract:
The brain-computer interface (BCI) based on steady-state visual evoked potential (SSVEP) has received considerable attention for its high communication speed. While large datasets provide an important opportunity to enhance decoding accuracies, the key challenge lies in the exploration of existing data to extract valuable information based on the distinctive characteristics of brain responses. In this study, we introduce ConsenNet, a framework designed to enhance SSVEP classification performance by leveraging information from the diverse perspectives of existing subjects. First, this study exploits the diversity of existing subjects to generate new samples, which retain both task-related components and variability. This effectively enhances the network generalization capability on new subjects. Second, the structured knowledge that encapsulates the interrelationships between categories has been constructed and then transferred from the teacher network to the student network, guiding the student network to extract invariant features across subjects. Finally, our model incorporates a small amount of new subject data for model calibration in the final stage. Offline experiments conducted on three public datasets demonstrate the superiority of ConsenNet over 19 methods compared in this study, while online experiments validate its feasibility for real-world applications.

PaperID: 710,

Authors: Bing Song, Yichen Zhou, Hongbo Shi, Yang Tao, Shuai Tan

Affiliations: Key Laboratory of Smart Manufacturing in Energy Chemical Process of the Ministry of Education, East China University of Science and Technology, Shanghai, China

Title: A Soft Sensor for Multirate Quality Variables Based on MC-CNN

Abstract:
In recent years, data-driven soft sensor modeling methods have been widely used in industrial production, chemistry, and biochemical processes. In industrial processes, the sampling rates of quality variables are always lower than those of process variables. Meanwhile, the sampling rates among quality variables are also different. However, few multi-input multi-output (MIMO) sensors take this temporal factor into consideration. To solve this problem, a deep-learning (DL) model based on a multitemporal channels convolutional neural network (MC-CNN) is proposed. The MC-CNN consists of two parts: the shared network used to extract the temporal feature and the parallel prediction network used to predict each quality variable. A modified backpropagation (BP) algorithm prevents the blank values generated at unsampled moments from participating in the BP process during training. The effectiveness of the proposed method is verified by predicting multiple quality variables in two industrial cases.
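
One common way to keep unsampled moments out of training is a masked loss whose gradient is zero wherever no label exists. The sketch below assumes a mean-squared-error loss and uses None to mark blank values; the function name is hypothetical, and the paper's modified BP algorithm may differ in its details.

```python
def masked_mse_and_grad(pred, target):
    """MSE over observed targets only; returns (loss, d loss / d pred).

    Entries of target that are None mark unsampled moments and contribute
    neither to the loss nor to the gradient.
    """
    observed = [(p, t) for p, t in zip(pred, target) if t is not None]
    n = len(observed)
    if n == 0:
        return 0.0, [0.0] * len(pred)
    loss = sum((p - t) ** 2 for p, t in observed) / n
    grad = [0.0 if t is None else 2.0 * (p - t) / n
            for p, t in zip(pred, target)]
    return loss, grad


pred = [1.0, 2.0, 3.0, 4.0]
target = [1.5, None, None, 3.0]  # slowly sampled quality variable
loss, grad = masked_mse_and_grad(pred, target)
```

Because each quality variable can carry its own mask, the same routine works when the variables are sampled at different rates.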

PaperID: 711,

Authors: Rihao Chang, Yongtao Ma, Weizhi Nie, Jie Nie, Yiqun Zhu, An-An Liu

Affiliations: School of Microelectronics, Tianjin University, Tianjin, China; School of Electrical and Information Engineering, Tianjin University, Tianjin, China; School of Information Science and Engineering, Ocean University of China, Qingdao, Shandong, China; State Grid Tianjin Electric Power Company, Tianjin, China

Title: Causal Disentanglement-Based Hidden Markov Model for Cross-Domain Bearing Fault Diagnosis

Abstract:
In the predictive maintenance of modern industries, accurate fault diagnosis under complex conditions is now a major research focus. Recent research has demonstrated the effectiveness of deep learning in advancing bearing fault diagnosis. However, due to the scarcity of industrial failure data, achieving robust generalization in complex working conditions remains a challenge. To address this, we propose the causal disentanglement-based hidden Markov model (CDHM), which is designed to recognize the underlying causality in bearing vibration signals, capturing essential fault patterns for a more accurate and generalizable fault representation. Compared to signal-processing methods, deep learning approaches bypass the complex signal analysis, yet overlook the significance of signal theories in precise fault diagnosis. Nevertheless, the bearing vibration mechanism sheds light on the fact that the vibration induced by a certain type of fault has a consistent pattern across different system conditions, while the fault-irrelevant vibration such as noise and interference varies. Therefore, the CDHM constructs a time-series structural causal model (SCM), offering a new perspective on the interconnections of bearing vibration signals. Based on the SCM, a hidden Markovian variational autoencoder (VAE) is designed to progressively disentangle the vibration signal into two parts: a fault-relevant representation capturing essential bearing fault characteristics, and a fault-irrelevant representation capturing system and environmental interference. While unsupervised causal disentanglement typically presents optimization challenges, the CDHM benefits from cross-domain fault diagnosis tasks by leveraging the cross-domain consistency of the fault-relevant representation and the domain sensitivity of the fault-irrelevant representation. This design aligns the optimization objectives of causal disentanglement learning and cross-domain transfer learning, enabling mutually reinforcing optimization and ensuring robust generalization across diverse operating conditions. We validate the CDHM through experiments on the Case Western Reserve University (CWRU), Intelligent Maintenance System (IMS), and Paderborn University (PU) datasets, demonstrating its strong potential for industrial applications.

PaperID: 712,

Authors: Dongdong Li, Shengyao Huang, Li Xie, Zhe Wang, Jiazhen Xu

Affiliations: Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China; Cancer Institute, The Affiliated Hospital of Qingdao University, Qingdao, Shandong, China

Title: Neuron Perception Inspired EEG Emotion Recognition With Parallel Contrastive Learning

Abstract:
Considerable interindividual variability exists in electroencephalogram (EEG) signals, resulting in challenges for subject-independent emotion recognition tasks. Current research in cross-subject EEG emotion recognition has been insufficient in uncovering the shared neural underpinnings of affective processing in the human brain. To address this issue, we propose the parallel contrastive multisource domain adaptation (PCMDA) model, inspired by the neural representation mechanism in the ventral visual cortex. Our model employs a neuron-perception-inspired contrastive learning architecture for EEG-based emotion recognition in subject-independent scenarios. A two-stage alignment methodology is employed for the purpose of aligning numerous source domains with the target domain. This approach integrates a parallel contrastive loss (PCL) which simulates the self-supervised learning mechanism inherent in the neural representation of the human brain. Furthermore, a self-attention mechanism is integrated to extract emotion weights for each frequency band. Extensive experiments were conducted on three publicly available EEG emotion datasets, SJTU emotion EEG dataset (SEED), database for emotion analysis using physiological signals (DEAP), and finer-grained affective computing EEG dataset (FACED), to evaluate our proposed method. The results demonstrate that the PCMDA effectively utilizes the unique EEG features and frequency band information of each subject, leading to improved generalization across different subjects in comparison to other methods.

PaperID: 713,

Authors: Hongbo Yin, Daxin Tian, Chunmian Lin, Xuting Duan, Jianshan Zhou, Dezong Zhao, Dongpu Cao

Affiliations: State Key Laboratory of Intelligent Transportation System, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, School of Transportation Science and Engineering, Beihang University, Beijing, China; James Watt School of Engineering, University of Glasgow, Glasgow, U.K.; School of Vehicle and Mobility, Tsinghua University, Beijing, China

Title: CUDA-X: Unsupervised Domain-Adaptive Vehicle-to-Everything Collaboration via Knowledge Transfer and Alignment

Abstract:
Recently emerged vehicle-to-everything (V2X) perception has revealed great potential to overcome the limitation of single-vehicle intelligence through vigorous interaction among on-road agents. However, prior endeavors are developed on parameter-specific simulations or configuration-dynamic real-world settings, overlooking the transferability across various scenarios. In this article, we propose an unsupervised domain-adaptive vehicle-to-everything collaboration framework, dubbed CUDA-X, which is built on top of a de facto collective model with key-point information exchange and instance adaptation. Specifically, collaborative knowledge transfer (CKT) is responsible for domain-agnostic feature reconstruction from a nearby car or infrastructure by a spatial–channel pooling operation in an elementwise manner. To promote the candidate alignment, a brand-new bin-based location correction (BLC) provides auxiliary supervision for cross-dataset box refinement via residual coordinate encoding (RCE), and category-aware pooling alignment (CPA) is further designed to pull category-specific instances closer between source and target samples. We benchmark CUDA-X against its counterparts on four prevalent cooperative perception datasets, i.e., OPV2V, V2X-Sim, V2V4Real, and DAIR-V2X: it establishes new state-of-the-art vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) performance in both simulation and reality. We expect that this attempt will provide in-depth insight into domain generalization in the context of multiagent perception. The code will be made publicly available.

PaperID: 714,

Authors: Renjie Zhang, Di Lin, Xin Wang, Ruonan Liu, Bin Sheng, George Baciu, C. L. Philip Chen, Ping Li

Affiliations: Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; College of Intelligence and Computing, Tianjin University, Tianjin, China; Department of Automation, Shanghai Jiao Tong University, Shanghai, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Department of Computing, School of Design, Research Institute for Sports Science and Technology, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

Title: Temporal-Interim Pose Synthesis and Distillation for Dynamic Human Pose Estimation

Abstract:
In the task of dynamic human pose estimation (dynamic HPE), the temporal relationships between human body parts should be captured comprehensively to understand the dynamic human motions, where the correlated motion information eventually helps to recognize body parts. The popular methods are successful in terms of utilizing long-term motion information captured by low-speed cameras. Yet they neglect the underlying intermediate motions between captured frames, which comprise the temporal-interim poses lost in the video. In this article, we introduce a novel framework, temporal-interim pose synthesis and distillation, to produce and leverage the intermediate motion information for dynamic motion establishment. The pose synthesis yields the visual feature maps of the intermediate poses, which appear between the existing video frames. It allows the synthesized and current poses to form richer motion patterns. Next, the pose distillation divides the body parts into several groups, where it learns the specific part-wise relationship within each group. It reduces the complexity of learning useful part-wise relationships from rich motion patterns and extracts more detailed motion information for fine-grained part groups. We extensively evaluate our method on challenging datasets for dynamic pose estimation, achieving state-of-the-art results.

PaperID: 715,

Authors: Zheshun Wu, Zenglin Xu, Dun Zeng, Qifan Wang, Jie Liu

Affiliations: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen, Shenzhen, China; Fudan University, Shanghai, China; Department of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; Facebook AI, Menlo Park, CA, USA; Institute for Artificial Intelligence and the National Key Laboratory of Smart Farming Technology and Systems, Harbin Institute of Technology (Shenzhen), Shenzhen, China

Title: Advocating for the Silent: Enhancing Federated Generalization for Nonparticipating Clients

Abstract:
Federated learning (FL) has surged in prominence due to its capability of collaborative model training without direct data sharing. However, the vast disparity in local data distributions among clients, often termed the nonindependent and identically distributed (non-IID) challenge, poses a significant hurdle to FL’s generalization efficacy. The scenario becomes even more complex when not all clients participate in the training process, a common occurrence due to unstable network connections or limited computational capacities. This can greatly complicate the assessment of the trained models’ generalization abilities. While a plethora of recent studies has centered on the generalization gap pertaining to unseen data from participating clients with diverse distributions, the distinction between the training distributions of participating clients and the testing distributions of nonparticipating ones has been largely overlooked. In response, our paper presents an information-theoretic generalization framework for FL. Specifically, it quantifies generalization errors by evaluating the information entropy of local distributions and discerning discrepancies across these distributions. Inspired by our deduced generalization bounds, we introduce a weighted aggregation approach and two client selection strategies. These innovations are designed to strengthen FL’s ability to generalize and thus ensure that trained models perform better on nonparticipating clients by incorporating a more diverse range of client data distributions. Our extensive empirical evaluations confirm the effectiveness of the proposed methods and agree with our theoretical analysis.

PaperID: 716,

Authors: Tian-Yu Ma, Heng-Chao Li, Yu-Bang Zheng, Qian Du, Antonio Plaza

Affiliations: School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China; Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS, USA; Department of Technology of Computers and Communications, Hyperspectral Computing Laboratory, Escuela Politécnica, University of Extremadura, Cáceres, Spain

Title: Fully Tensorized Lightweight ConvLSTM Neural Networks for Hyperspectral Image Classification

Abstract:
Convolutional long short-term memory (ConvLSTM) possesses a remarkable capability of encoding spatial information and capturing long-range dependencies in sequential data. As a result, ConvLSTM has garnered success in hyperspectral image (HSI) classification. Nonetheless, the design of the special gate structures and convolution operations contributes to a high model complexity, making it challenging to deploy in resource-constrained environments. In this article, we propose a fully tensorized ConvLSTM model for HSI spatial-spectral classification under the premise of low complexity. First, we devise a novel and efficient tensor-sequenced convolution in the tensor train (TT) format, called ETTConv. ETTConv can reduce the number of parameters and computations in the standard convolutional layer by tensorizing the convolution kernels and mapping them to a series of smaller ones. Building upon this innovation, we present a novel ETTConvLSTM unit, formed by jointly compressing all weight tensors within the recurrent units. Using it as the fundamental unit, we construct a lightweight efficient tensor train ConvLSTM 2-D neural network (ETTCL2DNN) model, characterized by reduced complexity without compromised classification performance. Furthermore, to better preserve the joint spatial-spectral structure of HSI data, we extend the ETTConv layer and the ETTConvLSTM unit to their 3-D versions, resulting in a new lightweight efficient tensor train ConvLSTM 3-D neural network (ETTCL3DNN) model. Extensive quantitative experimental results on three widely used HSI datasets demonstrate the superiority of the proposed methods, exhibiting enhanced classification performance with reduced model complexity.
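
The parameter savings of the tensor train format can be checked with simple arithmetic: a tensor with mode sizes n_1, ..., n_d and TT ranks r_0 = 1, r_1, ..., r_d = 1 is stored as d cores holding r_{k-1} * n_k * r_k entries each. The sketch below applies this generic count to a hypothetical 3 x 3 kernel with 64 input and 64 output channels; the mode grouping and ranks are illustrative assumptions and do not reproduce the ETTConv factorization.

```python
from math import prod


def tt_param_count(modes, ranks):
    """Number of entries in the TT cores for given mode sizes and TT ranks."""
    if len(ranks) != len(modes) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("need len(modes) + 1 ranks with boundary ranks equal to 1")
    return sum(ranks[k] * modes[k] * ranks[k + 1] for k in range(len(modes)))


modes = [9, 64, 64]   # hypothetical grouping: (3*3 spatial, C_in, C_out)
ranks = [1, 8, 8, 1]  # hypothetical TT ranks
full_params = prod(modes)                  # dense kernel: 36864 entries
tt_params = tt_param_count(modes, ranks)   # 72 + 4096 + 512 = 4680 entries
compression = full_params / tt_params
```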

PaperID: 717,

Authors: Yugen Yi, Ningyi Zhang, Zehui Zhang, Yijian Fu, Lei Chen, Jianzhong Wang

Affiliations: School of Software, Jiangxi Normal University, Nanchang, Jiangxi, China; BoulderAI Technologies Company Ltd., Hangzhou, Zhejiang, China; College of Information Science and Technology, Northeast Normal University, Changchun, Jilin, China

Title: EMLFCL: An Efficient Multilevel Fusion Contrastive Learning for Multiview Clustering

Abstract:
Multiview clustering (MVC) with contrastive learning (CL) has attracted considerable interest. Nevertheless, current methods have specific drawbacks since the coherence between views in them is limited either at the feature representation level or the cluster representation level. Besides, certain methods demonstrate subpar performance and limited robustness when handling noisy data. This article introduces an efficient multilevel fusion CL framework for MVC called EMLFCL. The EMLFCL model seamlessly incorporates a shared multi-layer perceptron (MLP) network (MNet) and a fusion network (FNet) to capture and merge common representation information, which effectively eliminates the impact of view-specific private information during the clustering process. Specifically, we establish an efficient multilevel CL strategy at both the feature representation level and the clustering representation level. Rather than rely on pairwise comparisons between views, our proposed CL strategy makes comparisons between different views and the anchor view. Since the anchor view contains abundant shared information, this strategy effectively mitigates the influence of view-specific and noisy view information on model performance. The proposed method outperforms numerous advanced approaches, as evidenced by extensive experiments conducted on eleven challenging multiview datasets. Particularly, it achieves 66.4%, 74.7%, 82.3%, and 86.4% clustering accuracies on the four Caltech datasets with different views, respectively.

PaperID: 718,

Authors: Wei Duan, Jie Lu, Junyu Xuan

Affiliations: Australian Artificial Intelligence Institute (AAII), Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, Australia

Title: Inferring Latent Temporal Sparse Coordination Graph for Multiagent Reinforcement Learning

Abstract:
Effective agent coordination is crucial in cooperative multiagent reinforcement learning (MARL). While agent cooperation can be represented by graph structures, prevailing graph learning methods in MARL are limited. They rely solely on one-step observations, neglecting crucial historical experiences, leading to deficient graphs that foster redundant or detrimental information exchanges. In addition, high computational demands for action-pair calculations in dense graphs impede scalability. To address these challenges, we propose inferring a latent temporal sparse coordination graph (LTS-CG) for MARL. The LTS-CG leverages agents’ historical observations to calculate an agent-pair probability matrix, from which a sparse graph is sampled and used for knowledge exchange between agents, thereby simultaneously capturing agent dependencies and relationship uncertainty. The computational complexity of this procedure is only related to the number of agents. This graph learning process is further augmented by two innovative characteristics: Predict-Future, which enables agents to foresee upcoming observations, and Infer-Present, which ensures a thorough grasp of the environmental context from limited data. These features allow LTS-CG to construct temporal graphs from historical and real-time information, promoting knowledge exchange during policy learning and effective collaboration. Graph learning and agent training occur simultaneously in an end-to-end manner. Our results on the StarCraft II benchmark underscore LTS-CG’s superior performance.
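
Sampling a sparse graph from an agent-pair probability matrix can be sketched as independent Bernoulli draws per pair. This is a simplified illustration under assumptions: the matrix values are made up, the graph is taken to be undirected without self-loops, and the actual LTS-CG sampling procedure (for example, how gradients flow through it) is not described in the abstract.

```python
import random


def sample_sparse_graph(prob, rng):
    """Draw a symmetric 0/1 adjacency matrix; edge (i, j) appears with prob[i][j]."""
    n = len(prob)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, no self-loops
            if rng.random() < prob[i][j]:
                adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical pair probabilities for three agents.
prob = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]
adj = sample_sparse_graph(prob, random.Random(0))
```

The loop visits each unordered pair once, so the cost of sampling depends only on the number of agents, which is consistent with the complexity statement in the abstract.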

PaperID: 719,

Authors: Luciano Prono, Philippe Bich, Chiara Boretti, Mauro Mangia, Fabio Pareschi, Riccardo Rovatti, Gianluca Setti

Affiliations: Department of Electronic and Telecommunication, Politecnico di Torino, Turin, Italy; Department of Electrical, Electronic, and Information Engineering, University of Bologna, Bologna, Italy; King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia

Title: A Multiply-And-Max/Min Neuron Paradigm for Aggressively Prunable Deep Neural Networks

Abstract:
The growing interest in the Internet of Things (IoT) and mobile artificial intelligence applications is driving research on deep neural networks (DNNs) that can operate at the edge on low-resource, low-energy devices. To achieve this goal, several pruning techniques have been proposed in the literature. They aim to reduce the number of interconnections (and consequently the size and the corresponding computing and storage requirements) of DNNs that traditionally rely on classic multiply-and-accumulate (MAC) neurons. In this work, we propose a novel neuron structure based on a multiply-and-max/min (MAM) map-reduce paradigm, and we show that by exploiting this new paradigm it is possible to build naturally and aggressively prunable DNN layers, with a negligible loss in performance. This novel structure allows a greater interconnection sparsity when compared to classic MAC-based DNN layers. Moreover, most of the already existing state-of-the-art pruning techniques can be used with MAM layers with little to no changes. To test the pruning performance of MAM, we employ different models (AlexNet, VGG-16, and the more recent ViT-B/16) and different computer vision datasets (CIFAR-10, CIFAR-100, and ImageNet-1K). Multiple pruning approaches are applied, ranging from single-shot methods to training-dependent and iterative techniques. As a notable example, we test MAM on the ViT-B/16 model fine-tuned on the ImageNet-1K task and apply one-shot gradient-based pruning. We remove interconnections until the model experiences a 6% decrease in accuracy. While the selected MAC-based layers need at least 38.2% remaining interconnections, MAM-based layers achieve the same accuracy with only 0.1%.
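
The contrast between MAC and MAM neurons can be sketched for a single neuron as follows. The reduction used here, the sum of the largest and the smallest input-weight product plus a bias, is an assumption about what "multiply-and-max/min" means; the abstract does not spell out the formula, and the inputs and weights are made up.

```python
def mac_neuron(x, w, bias=0.0):
    """Classic multiply-and-accumulate: sum of all products plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


def mam_neuron(x, w, bias=0.0):
    """Assumed multiply-and-max/min: max product + min product + bias."""
    products = [wi * xi for wi, xi in zip(w, x)]
    return max(products) + min(products) + bias


x = [1.0, -2.0, 0.5]
w = [0.5, 1.0, 3.0]          # products: 0.5, -2.0, 1.5
w_pruned = [0.0, 1.0, 3.0]   # first interconnection removed

mac_out = mac_neuron(x, w)                # 0.5 - 2.0 + 1.5
mam_out = mam_neuron(x, w)                # 1.5 + (-2.0)
mac_pruned_out = mac_neuron(x, w_pruned)  # changes: every product matters
mam_pruned_out = mam_neuron(x, w_pruned)  # unchanged: only two products matter
```

In this toy case, removing a weight whose product is neither the maximum nor the minimum leaves the MAM output untouched, which gives some intuition for why such layers might tolerate heavy pruning.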

PaperID: 720,

Authors: Xianzhe Xu, Gary G. Yen, Chaoqiang Zhao, Qiyu Sun, Wenqi Ren, Lu Sheng, Yang Tang

Affiliations: Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China; School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK, USA; School of Software, Beihang University, Beijing, China

Title: Boundary-Based Active Domain Adaptation for Semantic Segmentation Under Adverse Conditions

Abstract:
Existing domain adaptation semantic segmentation (DASS) methods under adverse conditions often depend on pseudo-labels for network training. However, these pseudo-labels are frequently plagued by noise and bias toward high-confidence predictions, thereby impeding the enhancement of segmentation performance. This article tackles the above challenge by proposing a novel boundary-based active domain adaptation (ADA) framework, which efficiently selects both informative low-confidence samples and high-confidence but misclassified samples to be labeled while maximizing the segmentation performance under a limited annotation budget. For the evaluation of sample confidence and informativeness, we first propose the ranking weighted feature space impurity (RWFSI) metric to quantify the category distribution among a sample’s nearest neighbors within the feature space and consider the samples with higher RWFSI values as low-confidence samples around the decision boundary, which can also alleviate the category imbalance of active labels. Subsequently, we apply Gaussian mixture models (GMMs) to model the distribution across source and target domains. Using the spatial arrangement of each GMM component, we define the intraclass domain shift score (ICDSS), which identifies samples with high ICDSS values as those more likely to be high-confidence but misclassified, aiding in refining sample selection. Extensive experiments demonstrate that our method is superior to the existing state-of-the-art domain adaptation and active learning (AL) methods and comparable with full supervision. The code will be released at https://github.com/1061018609/BADA.
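
The idea of scoring a sample by the category mix among its nearest neighbors can be illustrated with a small sketch. This is not the RWFSI definition from the paper: it assumes, purely for illustration, that neighbors are weighted by 1/rank and that impurity is measured as the entropy of the weighted label distribution. The function name and parameters are hypothetical.

```python
import math
from collections import defaultdict


def rank_weighted_impurity(query, features, labels, k=5):
    """Entropy of the rank-weighted label distribution among the k nearest neighbors.

    Assumptions (illustrative only): Euclidean distance, weight 1/rank,
    Shannon entropy as the impurity measure.
    """
    # Sort neighbors by distance to the query point and keep the k closest.
    neighbors = sorted(
        (math.dist(query, feat), label) for feat, label in zip(features, labels)
    )[:k]

    # Closer neighbors (lower rank) contribute more label mass.
    mass = defaultdict(float)
    for rank, (_, label) in enumerate(neighbors, start=1):
        mass[label] += 1.0 / rank

    total = sum(mass.values())
    return -sum((m / total) * math.log(m / total) for m in mass.values())
```

A sample deep inside one class scores zero, while a sample whose neighbors carry mixed labels scores higher and would be treated as lying near the decision boundary.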

PaperID: 721,

Authors: Richard Cornelius Suwandi, Zhidi Lin, Feng Yin, Zhiguo Wang, Sergios Theodoridis

Affiliations: School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China; Department of Mathematics, Sichuan University, Chengdu, Sichuan, China; Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athina, Greece

Title: Sparsity-Aware Distributed Learning for Gaussian Processes With Linear Multiple Kernel

Abstract:
Gaussian processes (GPs) stand as crucial tools in machine learning and signal processing, with their effectiveness hinging on kernel design and hyperparameter optimization. This article presents a novel GP linear multiple kernel (LMK) and a generic sparsity-aware distributed learning framework to optimize the hyperparameters. The newly proposed grid spectral mixture product (GSMP) kernel is tailored for multidimensional data, effectively reducing the number of hyperparameters while maintaining good approximation capability. We further demonstrate that the associated hyperparameter optimization of this kernel yields sparse solutions. To exploit the inherent sparsity of the solutions, we introduce the sparse linear multiple kernel learning (SLIM-KL) framework. The framework incorporates a quantized alternating direction method of multipliers (ADMM) scheme for collaborative learning among multiple agents, where the local optimization problem is solved using a distributed successive convex approximation (DSCA) algorithm. SLIM-KL effectively manages large-scale hyperparameter optimization for the proposed kernel, simultaneously ensuring data privacy and minimizing communication costs. The theoretical analysis establishes convergence guarantees for the learning framework, while experiments on diverse datasets demonstrate the superior prediction performance and efficiency of our proposed methods.
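
As a rough illustration of a linear multiple kernel, the sketch below builds a kernel as a nonnegative weighted sum of fixed subkernels, where each subkernel is a product over input dimensions of the standard one-dimensional spectral mixture component. The product form, the grid layout, and all names are assumptions made for this example; the exact GSMP construction is defined in the paper.

```python
import math


def sm_component(tau, mean, std):
    """1-D spectral mixture component: Gaussian envelope times a cosine."""
    envelope = math.exp(-2.0 * math.pi ** 2 * std ** 2 * tau ** 2)
    return envelope * math.cos(2.0 * math.pi * mean * tau)


def product_subkernel(x, y, means, stds):
    """Product of 1-D components across input dimensions (assumed form)."""
    value = 1.0
    for xd, yd, mean, std in zip(x, y, means, stds):
        value *= sm_component(xd - yd, mean, std)
    return value


def lmk(x, y, weights, grid):
    """Linear multiple kernel: weighted sum of fixed subkernels.

    `grid` is a list of (means, stds) pairs fixed in advance, so only the
    nonnegative `weights` are learned. Zero weights are skipped, which is
    where a sparse solution saves computation.
    """
    return sum(
        w * product_subkernel(x, y, means, stds)
        for w, (means, stds) in zip(weights, grid)
        if w != 0.0
    )
```

Because the kernel is linear in the weights, a sparse weight vector means most grid points never need to be evaluated.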

PaperID: 722,

Authors: Zhiling Fu, Zhe Wang, Xinlei Xu, Wei Guo, Ziqiu Chi, Hai Yang, Wenli Du

Affiliations: Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, Shanghai, China; Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China

Title: BFCP: Pursue Better Forward Compatibility Pretraining for Few-Shot Class-Incremental Learning

Abstract:
Few-shot class-incremental learning (FSCIL) requires learning new knowledge without forgetting old knowledge. Forward compatibility can reserve space for novel classes while maintaining base class knowledge in incremental learning. Better forward compatibility is crucial for effectively mastering all knowledge, especially when dealing with a few unknown new classes. In this article, we propose better forward compatibility pretraining (BFCP) to further enhance forward compatibility in FSCIL. We adopt a two-stage training for the backbone network in the base session. First, we train the backbone network at the image level to enhance its feature extraction capability, enabling the model to extract valuable information from unknown class images. Second, we fine-tune the backbone network at the feature level with fake prototypes and instances to cluster base classes and reserve space for unknown new classes. For all incremental new sessions, we freeze the backbone network and employ prototype rectification without further training to refine the prototypes of the novel classes. We conduct extensive experiments with different input scales, including federated cross-domain pretraining and cross-domain class-incremental experiments. BFCP efficiently handles both novel and base classes of each incremental session and significantly outperforms state-of-the-art methods, achieving an average accuracy of 63.47% on the CIFAR100 dataset.

PaperID: 723,

Authors: Mohamed Shahawy, Elhadj Benkhelifa, David White

Affiliations: Smarts Systems, AI, and Cybersecurity Research Centre (SSAICS), Staffordshire University, Stoke-on-Trent, U.K.

Title: Exploring the Intersection Between Neural Architecture Search and Continual Learning

Abstract:
Despite the significant advances achieved in deep learning, the design of deep neural networks (DNNs) remains notoriously tedious, depending primarily on intuition, experience, and trial and error. This human-dependent process is often time-consuming and prone to errors. Furthermore, the models are generally bound to their training contexts, with no consideration of their surrounding environments. Continual adaptiveness and automation of neural networks are of paramount importance to several domains where model accessibility is limited after deployment (e.g., IoT devices and self-driving vehicles). Additionally, even accessible models require frequent maintenance postdeployment to overcome issues such as data/concept drift, which can be cumbersome and restrictive. By leveraging and combining approaches from neural architecture search (NAS) and continual learning (CL), more robust and adaptive agents can be developed. This study conducts the first extensive review on the intersection between NAS and CL, formalizing the prospective paradigm and outlining research directions for lifelong autonomous DNNs.

PaperID: 724,

Authors: Yining Shi, Kun Jiang, Jiusi Li, Zelin Qian, Junze Wen, Mengmeng Yang, Ke Wang, Diange Yang

Affiliations: School of Vehicle and Mobility and the State Key Laboratory of Intelligent Green Vehicle and Mobility, Tsinghua University, Beijing, China; Kargobot Inc., Beijing, China

Title: Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review

Abstract:
Grid-centric perception is a crucial field for mobile robot perception and navigation. Nonetheless, grid-centric perception is less prevalent than object-centric perception because autonomous vehicles need to accurately perceive highly dynamic, large-scale traffic scenarios, and the complexity and computational costs of grid-centric perception are high. In recent years, the rapid development of deep learning techniques and hardware has provided fresh insights into the evolution of grid-centric perception. The fundamental difference between grid-centric and object-centric pipelines is that grid-centric perception follows a geometry-first paradigm, which is more robust to open-world driving scenarios with endless long-tailed, semantically unknown obstacles. Recent research demonstrates the great advantages of grid-centric perception, such as comprehensive fine-grained environmental representation, greater robustness to occlusion and irregular-shaped objects, better ground estimation, and safer planning policies. There is also a growing trend in which the capacity of occupancy networks is greatly expanded to 4-D scene perception and prediction, and the latest techniques are highly related to new research topics, such as 4-D occupancy forecasting, generative artificial intelligence (GenAI), and world models in the field of autonomous driving. Given the lack of current surveys for this rapidly expanding field, we present a hierarchically structured review of grid-centric perception for autonomous vehicles. We organize previous and current knowledge of occupancy grid techniques along the main vein from 2-D bird’s-eye view (BEV) grids to 3-D occupancy to 4-D occupancy forecasting. We additionally summarize label-efficient occupancy learning and the role of grid-centric perception in driving systems. Finally, we present a summary of the current research trend and provide future outlooks.

PaperID: 725,

Authors: Zheng Wang, Jiaxi Xie, Rong Wang, Feiping Nie, Xuelong Li

Affiliations: School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, Shaanxi, China

Title: Adaptive Graph Convolutional Network for Unsupervised Generalizable Tabular Representation Learning

Abstract:
A challenging open problem in deep learning is the representation of tabular data. Unlike popular domains such as image and text understanding, where the deep convolutional network is fashionable in many applications, there is still no widely used neural architecture that can effectively explore informative structure from tabular data. In addition, existing autoencoder-based nonlinear representation learning approaches that employ a reconstruction loss are unable to preserve discriminative information. As a step toward bridging these gaps, we propose a novel adaptive graph convolutional network (AdaGCN) for unsupervised generalizable tabular representation learning in this article. To be specific, we hypothesize that the keys to boosting the efficiency and practicality of learned representations lie in three aspects, i.e., adaptivity, unsupervised learning, and generalization. As a result, the adaptive graph learning module is first designed to remove the predefined rules in conventional GCN models, which can explore more local patterns on arbitrary tabular data. Moreover, our AdaGCN directly minimizes the difference between distributions of original tabular data and learned embeddings for training without any label information. Last but not least, the parametric property of AdaGCN allows unseen data to be handled offline, which greatly expands the scope of applications. We present extensive experiments showing that AdaGCN significantly and consistently outperforms several representation learning and clustering methods on several real-world tabular datasets.

PaperID: 726,

Authors: Yuxin Jiang, Yunkang Cao, Weiming Shen

Affiliations: State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan, China

Title: Prototypical Learning Guided Context-Aware Segmentation Network for Few-Shot Anomaly Detection

Abstract:
Few-shot anomaly detection (FSAD) denotes the identification of anomalies within a target category with a limited number of normal samples. Existing FSAD methods largely rely on pretrained feature representations to detect anomalies, but the inherent domain gap between pretrained representations and target FSAD scenarios is often overlooked. This study proposes a prototypical learning-guided context-aware segmentation network (PCSNet) to address the domain gap, thereby improving feature descriptiveness in target scenarios and enhancing FSAD performance. In particular, PCSNet comprises a prototypical feature adaption (PFA) subnetwork and a context-aware segmentation (CAS) subnetwork. PFA extracts prototypical features as guidance to ensure better feature compactness for normal data while ensuring distinct separation from anomalies. A pixel-level disparity classification (PDC) loss is also designed to make subtle anomalies more distinguishable. Then a CAS subnetwork is introduced for pixel-level anomaly localization, where pseudo anomalies are exploited to facilitate the training process. Experimental results on MVTec AD and metal part defect detection (MPDD) demonstrate the superior FSAD performance of PCSNet, with 94.9% and 80.2% image-level area under the receiver operating characteristics (AUROCs) in an eight-shot scenario, respectively. Real-world applications on automotive plastic part inspection further demonstrate that PCSNet can achieve promising results with limited training samples. The code is available at https://github.com/yuxin-jiang/PCSNet.

PaperID: 727,

Authors: Tao Dong, Yadi Song, Huaqing Li, Xin Wang, Tingwen Huang

Affiliations: College of Electronics and Information Engineering, Southwest University, Chongqing, China; Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China

Title: Global Stability of Phase-Change Neural Networks With Mixed Time Delays

Abstract:
Phase-change memory (PCM) is a novel type of nonvolatile memory and is suitable for artificial neural synapses. This article investigates the Lagrange global exponential stability (LGES) of a class of phase-change neural networks (PCNNs) with mixed time delays. First, based on the conductivity characteristics of PCM, a piecewise equation is established to describe the electrical conductivity of PCM. By using the proposed piecewise equation to simulate the neural synapses, a novel PCNN with discrete and distributed time delays is proposed. Then, using comparative theory and fundamental inequalities, the LGES conditions based on the M-matrix are proposed in the sense of Filippov, and the exponential attractive set (EAS) is obtained based on the M-matrix and external input. Moreover, the Lyapunov global exponential stability (GES) conditions of PCNNs without external input are obtained by using the inequality technique and eigenvalue theory, and they also take the form of an M-matrix. Finally, two simulation examples are given to verify the validity of the obtained results.

PaperID: 728,

Authors: Liting Sun, Jingwei Xin, Keyu Li, Jie Li, Nannan Wang, Xinbo Gao

Affiliations: State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University, Xi’an, Shaanxi, China; School of Electronic Engineering, Xidian University, Xi’an, Shaanxi, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: I2NQ: Inter and Intra Nonuniform Quantization for Single Image Super-Resolution

Abstract:
Neural network quantization is an efficient model compression technique that converts weights and activations from floating-point to integer. However, existing model quantization methods are primarily designed for high-level visual tasks. They do not sufficiently consider the unique characteristics of feature distribution in image super-resolution (SR) reconstruction models. On the one hand, the objective of SR is to restore high-frequency and fine-detail information while preserving the overall feature distribution. Therefore, regularization techniques are removed from SR models to maintain the original distribution. However, vanilla quantization methods often employ regularization techniques to normalize the features for stable network training, which destroys the inherent information of the feature distribution. On the other hand, the feature distribution in SR models exhibits a nonuniform bell-shaped form. Common quantization methods adopt a uniform quantization strategy with equal quantization intervals. This fails to effectively capture the nonuniform feature distribution in SR. To address the above issues, we propose a novel method named Inter and Intra Nonuniform Quantization (I2NQ), which takes into account the specific characteristics of the feature distribution in the context of SR reconstruction models. Additionally, we propose a weight adjustment method called flex-scale-weight-adjust (FSWA). It can maintain the diversity of weight information and reduce quantization errors. Extensive experiments demonstrate that our proposed method surpasses other quantization methods in both the evaluation of reconstruction metrics and visual reconstruction performance.
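
The mismatch between equal-width quantization intervals and a bell-shaped distribution can be seen in a small, generic sketch. This is not the I2NQ method: it simply compares uniformly spaced levels with levels placed at sample quantiles, a common nonuniform baseline chosen here as an assumption. All function names are hypothetical.

```python
def uniform_levels(lo, hi, num_bits):
    """2**num_bits equally spaced reconstruction levels on [lo, hi]."""
    count = 2 ** num_bits
    step = (hi - lo) / (count - 1)
    return [lo + i * step for i in range(count)]


def quantile_levels(samples, num_bits):
    """Nonuniform levels at midpoint quantiles of calibration samples.

    Levels crowd together where the data is dense (the center of a
    bell-shaped distribution) and spread out in the tails.
    """
    count = 2 ** num_bits
    ordered = sorted(samples)
    return [ordered[int((i + 0.5) / count * len(ordered))] for i in range(count)]


def quantize(value, levels):
    """Map a value to its nearest reconstruction level."""
    return min(levels, key=lambda level: abs(level - value))


def mean_squared_error(samples, levels):
    """Average squared quantization error over the samples."""
    return sum((s - quantize(s, levels)) ** 2 for s in samples) / len(samples)
```

On Gaussian-like samples at low bit widths, the quantile-based levels typically give a lower reconstruction error than uniform levels spanning the full range, because fewer levels are spent on the sparsely populated tails.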

PaperID: 729,

Authors: Hongbo Gao, Xiao Zheng, Qingchao Liu, Lin Zhou, Chao Huang, Mingmao Hu, Chengbo Wang, Keqiang Li, Danwei Wang, Deyi Li

Affiliations: Department of Automation, School of Information Science and Technology, Institute of Advanced Technology, University of Science and Technology of China, Hefei, China; School of Computer Science and Technology, Anhui University of Technology, Ma’anshan, Anhui, China; Automotive Engineering Research Institute, Jiangsu University, Zhenjiang, Jiangsu, China; Department of Automation, School of Information Science and Technology, University of Science and Technology of China, Hefei, China; Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; School of Automotive Engineering, Hubei University of Automotive Technology, Shiyan, China; School of Vehicle and Mobility, Tsinghua University, Beijing, China; School of Electrical and Electronic Engineering, Nanyang Technological University (NTU), Jurong West, Singapore; Department of Computer Science and Technology, Tsinghua University, Beijing, China

Title: A Spatial-Temporal Predictive Transformer Network for Level-3 Autonomous Vehicle Decision-Making

Abstract:
This study explores the effect of takeover time (TOT) on decision-making for Level-3 autonomous vehicles (L3-AVs). The existing research on L3-AVs lacks an in-depth analysis of the mechanisms affecting TOT, ignores the importance of spatial and temporal variations in features for TOT prediction, and also lacks consideration of TOT in downstream trajectory planning tasks. This study proposed an exponential smoothing transformer (ETSformer) model for TOT prediction, and then the spatial-temporal predictive transformer (ST-Preformer) was employed to forecast the trajectories of surrounding vehicles, assess lane availability, and determine lane-changing probabilities. Ultimately, these evaluations contribute to the decision-making process of L3-AVs. The findings showed that the ETSformer was able to explain more than 83% of the characteristics of the TOT distribution in the TOT prediction task, effectively reducing the absolute percentage error by 0.7%, based on which the decision-making framework was able to make safe and comfortable optimal decisions. Decision-making is closely related to driving conditions and the surrounding traffic state, and TOT has a critical impact on the safety and stability of decision-making. A comprehensive understanding of the impact of TOT on decision-making can help improve the safety of autonomous driving and provide guidance for improving decision-making techniques.

PaperID: 730,

Authors: Junying Wang, Hongyuan Zhang, Hongwei Wang, Yuan Yuan

Affiliations: School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, China; The University of Hong Kong, Hong Kong, SAR, China

Title: Graph Convolutional Network With Self-Augmented Weights for Semi-Supervised Multi-View Learning

Abstract:
Recently, owing to their effectiveness in exploiting inherent connections between data in different views, graph-based deep learning approaches have gained widespread popularity in semi-supervised multi-view tasks. Generally, the existing approaches fuse the information from different views via linear or nonlinear weight strategies, which distinguish the importance of different views by assigning them weights in [0, 1], i.e., some less important views are discarded because they are assigned a weight of 0, and the pivotal views are not enhanced. However, these view-weighting strategies ignore the complementary information from the less important views. To address this issue, a superior-performing graph convolutional network (GCN) with self-augmented weights is proposed. The proposed self-augmented weight strategy is based on exponential series integration, which preserves the less important views and simultaneously strengthens the key views for multi-view fusion. Specifically, the designed weight strategy can adaptively preserve the complementary information from the less important views by assigning nonzero weights and strengthen the pivotal views by assigning higher weights based on exponential series integration. Besides, to further improve the model performance, an orthogonal constraint layer with a forced orthogonal weight is introduced, which is capable of making the representation more discriminative. Extensive experiments demonstrate the superiority of the proposed method.

PaperID: 731,

Authors: Zhengdao Shao, Liansheng Zhuang, Houqiang Li, Shafei Wang

Affiliations: School of Information Science and Technology, University of Science and Technology of China (USTC), Hefei, Anhui, China; Peng Cheng Laboratory, Shenzhen, China

Title: COPSRO: An Offline Empirical Game Theoretic Method With Conservative Critic

Abstract:
This article studies how to learn an approximate Nash equilibrium (NE) from static historical datasets by empirical game-theoretic analysis (EGTA), which provides a simulation-based framework to model complex multiagent interactions. Generally, EGTA requires plentiful interactions with the environment or simulator to estimate a cogent and tractable game model approximating the underlying game. However, these exploratory interactions often suffer from low data utilization efficiency and may not be feasible in risk-sensitive applications. To address these problems, this article investigates a new EGTA paradigm for offline settings and introduces a novel algorithm called conservative offline policy space response oracle (COPSRO) to identify NE from fixed datasets without active data collection. COPSRO begins by extracting a set of strategies from the offline dataset to construct an overcomplete strategy population, achieving an approximation to the policy space of the original game. Then, COPSRO integrates the conservative critic (CC) to tackle the challenge of overestimation inherent in offline learning scenarios. Additionally, it devises the offline NE solver to iteratively compute an approximate NE. Consequently, COPSRO can ascertain equilibrium strategies without real-world interaction, markedly enhancing its utility in risk-averse settings. This article provides both theoretical analysis and empirical evaluation to demonstrate the effectiveness and superiority of COPSRO across various real-world tasks in the offline setting. Our method surpasses existing approaches in terms of convergence and exploitability, especially when the coverage ratio of the dataset is low (20% or 10%).

PaperID: 732,

Authors: Tamás Dózsa, Carl Böck, Jens Meier, Péter Kovács

Affiliations: Institute for Computer Science and Control, Hungarian Research Network, Budapest, Hungary; JKU LIT SAL eSPML Laboratory, Institute of Signal Processing, Johannes Kepler University Linz, Linz, Austria; Clinic of Anesthesiology and Intensive Care Medicine, Johannes Kepler University Linz, Linz, Austria; Department of Numerical Analysis, Eötvös Loránd University, Budapest, Hungary

Title: Weighted Hermite Variable Projection Networks for Classifying Visually Evoked Potentials

Abstract:
The occipital cortex responds to visual stimuli regardless of a patient’s level of consciousness or attention, offering a noninvasive diagnostic tool for both ophthalmologists and neurologists. This response signal manifests as a unique waveform referred to as the visually evoked potential (VEP), which can be extracted from the electroencephalogram (EEG) activity of a human being. We propose a trainable VEP representation to disentangle the underlying explanatory factors of the data. To enhance the learning process with domain knowledge, we present an innovative parameterization of classical Hermite functions that effectively captures VEP pattern variations arising from patient-specific factors, disorders, and measurement setup influences. Then, we introduce a differentiable variable projection (VP) layer to fuse Hermite basis function expansions (BFEs) of VEP signals with machine learning (ML) approaches. We prove the existence of an optimal set of parameters in the least-squares sense, assess the representation power of such layers, and calculate their analytical derivatives, which allows us to utilize backpropagation for training. Finally, we evaluate the effectiveness of the proposed learning framework in VEP-based color classification. To achieve this, we have designed a novel measurement system dedicated to intraoperative clinical use cases, which presents new ways for patient monitoring during neurosurgical procedures.
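
For readers unfamiliar with Hermite basis function expansions, the sketch below evaluates the classical orthonormal Hermite functions with the standard three-term recurrence and projects a sampled signal onto them. The dilation and translation parameters stand in, as an assumption, for the kind of trainable nonlinear parameters a variable projection layer might learn; the paper's weighted parameterization is not reproduced here, and all names are hypothetical.

```python
import math


def hermite_functions(x, count, dilation=1.0, translation=0.0):
    """First `count` orthonormal Hermite functions at x.

    The argument is mapped to t = dilation * (x - translation), and the
    result is scaled by sqrt(dilation) so each function keeps unit L2 norm.
    """
    t = dilation * (x - translation)
    values = []
    for n in range(count):
        if n == 0:
            h = math.pi ** -0.25 * math.exp(-0.5 * t * t)
        elif n == 1:
            h = math.sqrt(2.0) * t * values[0]
        else:
            # Standard recurrence for normalized Hermite functions.
            h = (math.sqrt(2.0 / n) * t * values[n - 1]
                 - math.sqrt((n - 1) / n) * values[n - 2])
        values.append(h)
    return [math.sqrt(dilation) * v for v in values]


def project(signal, grid, step, count, dilation=1.0, translation=0.0):
    """Expansion coefficients via a Riemann-sum inner product on a uniform grid."""
    coeffs = [0.0] * count
    for x, s in zip(grid, signal):
        basis = hermite_functions(x, count, dilation, translation)
        for n in range(count):
            coeffs[n] += s * basis[n] * step
    return coeffs
```

Because the basis is orthonormal, the linear coefficients have this closed form for any fixed dilation and translation, which is what makes it possible to optimize only the nonlinear parameters by gradient descent.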

PaperID: 733,

Authors: Bing Tu, Xianchang Yang, Baoliang He, Yunyun Chen, Jun Li, Antonio Plaza

Affiliations: Institute of Optics and Electronics, Jiangsu Key Laboratory for Optoelectronic Detection of Atmosphere and Ocean, and Jiangsu International Joint Laboratory on Meteorological Photonics and Optoelectronic Detection, Nanjing University of Information Science and Technology, Nanjing, China; School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China; School of Computer Science, Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, China; Department of Technology of Computers and Communications, Escuela Politecnica, Hyperspectral Computing Laboratory, University of Extremadura, Cáceres, Spain

Title: Anomaly Detection in Hyperspectral Images Using Adaptive Graph Frequency Location

Abstract:
Graph theory-based techniques have recently been adopted for anomaly detection in hyperspectral images (HSIs). However, these methods rely excessively on the relational structure within the constructed graphs and tend to downplay the importance of spectral features in the original HSI. To address this issue, we introduce graph frequency analysis to hyperspectral anomaly detection (HAD), which can serve as a natural tool for integrating graph structure and spectral features. We treat anomaly detection as a problem of graph frequency location, achieved by constructing a beta distribution-based graph wavelet space, where the optimal wavelet can be identified adaptively for anomaly detection. Initially, a high-dimensional, undirected, unweighted graph is built using the pixels in the HSI as vertices. By leveraging the observation of energy shifting to higher frequencies caused by anomalies, we can dynamically pinpoint the specific Beta wavelet associated with the anomalies’ high-frequency content to accurately extract anomalies in the context of HSIs. Furthermore, we introduce a novel entropy definition to address the frequency location problem in an adaptive manner. Experimental results from seven real HSIs validate the remarkable detection performance of our newly proposed approach when compared to various state-of-the-art anomaly detection methods.

PaperID: 734,

Authors: Hongtian Chen, Wenxin Sun, Weidong Zhang, Bin Jiang, Steven X. Ding, Biao Huang

Affiliations: Department of Automation, Shanghai Jiao Tong University, Shanghai, China; College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China; Institute for Automatic Control and Complex Systems (AKS), University of Duisburg-Essen, Duisburg, Germany; Department of Chemical and Materials Engineering, University of Alberta, Edmonton, AB, Canada

Title: Explainable Fault Diagnosis Using Invertible Neural Networks-A Left Manifold-Based Solution

Abstract:
The series includes two parts, articulating two novel avenues of research on intelligent fault diagnosis (FD) for nonlinear feedback control systems. In Part I of the series, we design a novel FD paradigm by elaborating an invertible neural network (INN) for feedback control systems. With the aid of a left manifold, the core idea behind the INN-based FD scheme is as follows: 1) formulation of the residual generator used for FD as a projection of system data onto the null space that has the same dimension as system outputs; 2) in a topological space, elaboration of a homeomorphism that delivers an invertible relationship between system outputs and residual signals when the system input is given; and 3) skillful introduction of both the master and slave objective functions to achieve system/parameter identification with the information-lossless property. Compared with the existing FD approaches, the proposed FD scheme has three strengths deserving mention: 1) it specializes in nonlinear feedback control systems; 2) it can effectively avoid the overfitting problem when approximating or learning nonlinear system dynamics; and 3) control theory guides the whole design, ensuring the interpretability of the learning process. Finally, two studies on nonlinear systems demonstrate the feasibility of the invertible left manifold (ILM)-based FD strategy. Part I would contribute to the future development of machine learning (ML)-based system identification and explainable FD approaches, and also benefits the right manifold-based FD designs in Part II.

PaperID: 735,

Authors: Chen Zhang, Yingxu Wang, Xuesong Wang, C. L. Philip Chen, Long Chen, Yuehui Chen, Tao Du, Cheng Yang, Bowen Liu, Jin Zhou

Affiliations: School of Artificial Intelligence, Shandong Women’s University, Jinan, China; Shandong Provincial Key Laboratory of Network-Based Intelligent Computing, University of Jinan, Jinan, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Department of Computer and Information Science, University of Macau, Macau, China

Title: Cross-View Representation Learning-Based Deep Multiview Clustering With Adaptive Graph Constraint

Abstract:
Deep multiview clustering provides an efficient way to analyze data consisting of multiple modalities and features. Recently, autoencoder (AE)-based deep multiview clustering algorithms have attracted intensive attention by virtue of their rewarding capabilities of extracting inherent features. Nevertheless, most existing methods are still confronted by several problems. First, multiview data usually contain abundant cross-view information; thus, running an individual AE for each view in parallel and directly combining the extracted latent representations can hardly construct an informative view-consensus feature space for clustering. Second, the intrinsic local structures of multiview data are complicated; hence, simply embedding a preset graph constraint into multiview clustering models cannot guarantee the expected performance. Third, current methods commonly utilize the Kullback-Leibler (KL) divergence as the clustering loss and accordingly may yield poor clusters that lack discriminative characteristics. To solve these issues, in this article we propose two new AE-based deep multiview clustering algorithms named AE-based deep multiview clustering model incorporating graph embedding (AG-DMC) and deep discriminative multiview clustering algorithm with adaptive graph constraint (ADG-DMC). In AG-DMC, a novel cross-view representation learning model is established by performing decoding processes based on the cascaded view-specific latent representations to learn sound view-consensus features for improved clustering results. In addition, an entropy-regularized adaptive graph constraint is imposed on the obtained soft assignments of data to precisely preserve potential local structures. Furthermore, in the improved model ADG-DMC, the adversarial learning mechanism is adopted as the clustering loss to strengthen the discrimination of different clusters for better performance. In comprehensive experiments carried out on eight real-world datasets, the proposed algorithms achieve superior performance in comparison with other advanced multiview clustering algorithms.

PaperID: 736,

Authors: Zheng Wang, Hongming Ding, Li Pan, Jianhua Li, Zhiguo Gong, Philip S. Yu

Affiliations: Institute of Cyber Science and Technology, Shanghai Jiao Tong University, Shanghai, China; NIO Technology, Shanghai, China; State Key Laboratory of Internet of Things for Smart City and the Department of Computer and Information Science, University of Macau, Macao, Taipa, China; Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA

Title: From Cluster Assumption to Graph Convolution: Graph-Based Semi-Supervised Learning Revisited

Abstract:
Graph-based semi-supervised learning (GSSL) has long been a research focus. Traditional methods are generally shallow learners, based on the cluster assumption. Recently, graph convolutional networks (GCNs) have become the predominant techniques for their promising performance. However, a critical question remains largely unanswered: why do deep GCNs encounter the oversmoothing problem, while traditional shallow GSSL methods do not, despite both progressing through the graph in a similar iterative manner? In this article, we theoretically discuss the relationship between these two types of methods in a unified optimization framework. One of the most intriguing findings is that, unlike traditional ones, typical GCNs may not effectively incorporate both graph structure and label information at each layer. Motivated by this, we propose three simple but powerful graph convolution methods. The first, optimized simple graph convolution (OGC), is a supervised method, which guides the graph convolution process with labels. The others are two “no-learning” unsupervised methods: graph structure preserving graph convolution (GGC) and its multiscale version GGCM, both aiming to preserve the graph structure information during the convolution process. Finally, we conduct extensive experiments to show the effectiveness of our methods.
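
To make the "similar iterative manner" of traditional GSSL concrete, here is a minimal sketch of one classical shallow method, label propagation with the update F <- alpha * S * F + (1 - alpha) * Y. It is not the OGC, GGC, or GGCM method of this article. The function name `label_propagation`, the dense list-of-lists representation, and the default `alpha` are assumptions chosen for illustration. Each step mixes the graph-smoothed scores with the original labels, so graph structure and label information both enter every iteration.

```python
import math


def label_propagation(adj, labels, alpha=0.9, iters=50):
    """Classical label propagation on a small dense graph (illustrative sketch).

    adj    -- symmetric adjacency matrix as a list of lists (0/1 or weights)
    labels -- one-hot rows for labeled nodes, all-zero rows for unlabeled nodes
    """
    n = len(adj)
    k = len(labels[0])
    deg = [sum(row) for row in adj]
    # Symmetrically normalized adjacency S = D^{-1/2} A D^{-1/2}.
    s = [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    f = [row[:] for row in labels]
    for _ in range(iters):
        # Smooth the current scores over the graph, then pull them back
        # toward the known labels with weight (1 - alpha).
        f = [
            [alpha * sum(s[i][j] * f[j][c] for j in range(n))
             + (1 - alpha) * labels[i][c]
             for c in range(k)]
            for i in range(n)
        ]
    return f


# Path graph 0 - 1 - 2 - 3 with node 0 labeled class 0 and node 3 labeled class 1.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
labels = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
scores = label_propagation(adj, labels)
predicted = [row.index(max(row)) for row in scores]
```

Because the label term is re-injected at every step, the scores in this sketch do not collapse to a single value as the number of iterations grows.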

PaperID: 737,

Authors: Zhu He, Mingwei Lin, Xin Luo, Zeshui Xu

Affiliations: College of Computer and Cyber Security and Fujian Provincial Engineering Research Center for Public Service Big Data Mining and Application, Fujian Normal University, Fuzhou, China; College of Computer and Information Science, Southwest University, Chongqing, China; Business School, Sichuan University, Chengdu, China

Title: Structure-Preserved Self-Attention for Fusion Image Information in Multiple Color Spaces

Abstract:
The selection and utilization of different color spaces significantly impact the recognition performance of deep learning models in downstream tasks. Existing studies typically leverage image information from various color spaces through model integration or channel concatenation. However, these methods result in excessive model size and suboptimal utilization of image information. In this study, we propose the structure-preserved self-attention network (SPSANet) model for efficient fusion of image information from different color spaces. This model incorporates a novel structure-preserved self-attention (SPSA) module that employs a single-head pixel-wise attention mechanism, as opposed to the conventional multihead self-attention (MHSA) approach. Specifically, feature maps from all color space grouping paths are utilized for similarity matching, enabling the model to focus on critical pixel locations across different color spaces. This design mitigates the dependence of the SPSANet model on the choice of color space while enhancing the advantages of integrating multiple color spaces. The SPSANet model also employs channel shuffle operations to facilitate limited interaction between information flows from different color space paths. Experimental results demonstrate that the SPSANet model, utilizing eight common color spaces—RGB, Luv, XYZ, Lab, HSV, YCrCb, YUV, and HLS—achieves superior recognition performance with reduced parameters and computational cost.
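
As an illustration of the channel shuffle operation mentioned above, the following is a minimal sketch of the common ShuffleNet-style shuffle, which interleaves channels so that each group afterwards holds channels from every path. The abstract does not specify the exact variant used in SPSANet, so the function name `channel_shuffle`, the flat-list representation, and the example channel labels are assumptions for illustration only.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style, illustrative).

    channels -- flat list of channels ordered group by group
    groups   -- number of groups (here: hypothetical color space paths)
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # Treat the flat list as a (groups, per_group) grid and read it
    # column by column, which is a reshape-transpose-flatten.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Three hypothetical paths with two channels each.
mixed = channel_shuffle(["rgb0", "rgb1", "hsv0", "hsv1", "lab0", "lab1"], 3)
# mixed == ["rgb0", "hsv0", "lab0", "rgb1", "hsv1", "lab1"]
```

After the shuffle, a grouped operation over consecutive channels sees one channel from each path, which is one way to obtain limited cross-path interaction without a full dense mixing layer.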

PaperID: 738,

Authors: Jianhao Ding, Jiyuan Zhang, Tiejun Huang, Jian K. Liu, Zhaofei Yu

Affiliations: School of Computer Science and the State Key Laboratory of Multimedia Information Processing, Peking University, Beijing, China; School of Computer Science, University of Birmingham, Birmingham, U.K.; Institute for Artificial Intelligence and the School of Computer Science, Peking University, Beijing, China

Title: Assisting Training of Deep Spiking Neural Networks With Parameter Initialization

Abstract:
Spiking neural networks (SNNs) exhibit significant advantages in terms of information encoding, computational capabilities, and power usage. We regard initializing weight distribution as a key problem for effective SNN training. When backpropagation (BP) through time is used in the initial training phase, it has a significant impact on gradient generation. We first derive an asymptotic formula for the response curve of spiking neurons, which approximates the real neuron response distribution. To avoid gradient vanishing, we then provide an initialization technique based on the slant asymptote. Finally, validations on classification tasks on the MNIST and CIFAR10 datasets demonstrate that our strategy can significantly speed up training and improve the model accuracy compared with other initialization methods. Further testing on various neuron configurations and training hyperparameters demonstrates comparable versatility and superiority to other methods. Based on the analyses, some recommendations for SNN training are made.

PaperID: 739,

Authors: Yongyi Chen, Dan Zhang, Ruqiang Yan, Fanghong Guo, Qi Xuan

Affiliations: Department of Automation, Zhejiang University of Technology, Hangzhou, China; School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, China; Binjiang Institute of Artificial Intelligence, Zhejiang University of Technology, Hangzhou, China

Title: Class-Consistent Matching Attention Wavelet Networks for Partial Transfer Intelligent Diagnosis

Abstract:
In the case of label space alignment, the existing domain adaptation (DA)-based fault diagnosis approaches have achieved high accuracy. In real industrial scenarios, however, the label space of the target domain is usually a subset of the label space of the source domain, a setting called partial DA (PDA). The main challenge of PDA lies in how to separate common samples from private samples. In existing works, different weights are usually assigned to different samples based on the prediction score of the classifier, but the negative transfer caused by the data distribution alignment of private and common samples is ignored. To address this problem, class-consistency matching is proposed in this article, which uses a label consensus score to identify classes in target clusters and thereby discover common and private samples. In addition, parameter-free cosine attention wavelet blocks (PCAWBs) are designed to learn the complementary spatial-domain and frequency-domain features to enrich the domain-invariant features extracted by the shared encoder. Experiments on a real motor system demonstrate that the proposed method significantly outperforms state-of-the-art PDA fault diagnosis approaches.

PaperID: 740,

Authors: Hua Li, Tianyang Wang, Feibin Zhang, Fulei Chu

Affiliations: State Key Laboratory of Public Big Data, Guizhou University, Guiyang, China; Department of Mechanical Engineering, Tsinghua University, Beijing, China

Title: AutoVMDPgram: An Effective Method for Fault Diagnosis of Rolling Bearing

Abstract:
In previous studies, the VMDPgram was proposed by combining variational mode decomposition (VMD) with wavelet packet transform (WPT). Although the VMDPgram demonstrates excellent performance in bearing fault diagnosis, some issues remain to be studied further. In light of this, this work studies the unresolved issues of the VMDPgram in depth. First, in view of the obvious second-order cyclostationarity of the vibration signals of rotating machinery such as bearings, especially in the presence of localized faults, the unbiased autocorrelation (AC) function is introduced. Here, the kurtosis value of the unbiased AC of the squared envelope of each sub-intrinsic modal function (sub-IMF) within the constrained range is calculated, generating the new method named AutoVMDPgram. Second, the modified adaptive resonance bandwidth (MARB) is introduced to constrain the decomposition depth of the AutoVMDPgram. Third, a cumulative evaluation index based on the unbiased AC kurtosis of the squared envelope of the sub-IMF is proposed as a measure to locate the optimal sub-IMF without determining whether the resonant frequency range is divided into different sub-IMFs. AutoVMDPgram is tested on simulated and experimental data and compared with Autogram, spectral kurtosis (SK), and VMD to evaluate its performance in rolling bearing diagnostics.

PaperID: 741,

Authors: Hongquan Zhang, Zhizhong Zhang, Xin Tan, Yanyun Qu, Yuan Xie

Affiliations: School of Computer Science and Technology, East China Normal University, Shanghai, China; School of Informatics, Xiamen University, Xiamen, Fujian, China

Title: Bias to Balance: New-Knowledge-Preferred Few-Shot Class-Incremental Learning via Transition Calibration

Abstract:
Humans can quickly learn new concepts from limited experience without forgetting learned knowledge. In machine learning, this ability is referred to as few-shot class-incremental learning (FSCIL). Although some methods try to solve this problem by putting similar effort into preventing forgetting and promoting learning, we find that existing techniques do not give enough importance to the new categories, whose training samples are rather rare. In this article, we propose a new biased-to-unbiased rectification method, which introduces a trainable transition matrix to mitigate the prediction discrepancy between the old classes and the new classes. This transition matrix is designed to be diagonally dominant, normalized, and differentiable, with a new-knowledge-preferred prior, in order to resolve the strong bias between heavy old knowledge and limited new knowledge. Hence, we can achieve a balanced solution between learning new concepts and preventing catastrophic forgetting by giving new classes more chances. Extensive experiments on miniImagenet, CIFAR100, and CUB200 demonstrate that our method outperforms the latest state-of-the-art methods by 1.1%, 1.44%, and 2.08%, respectively.

PaperID: 742,

Authors: Hong Zhao, Jie Shi, Yang Zhang

Affiliations: School of Computer Science and the Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Zhangzhou, Fujian, China

Title: PHFS: Progressive Hierarchical Feature Selection Based on Adaptive Sample Weighting

Abstract:
Hierarchical feature selection is considered an effective technique to reduce the dimensionality of data with complex hierarchical label structures. Incorrect labels are a common and challenging issue in complex hierarchical data. However, the existing hierarchical methods often struggle to dynamically adapt to label noise and lack the flexibility to adjust sample weights. Therefore, their effectiveness in managing complex data with many classes and mitigating label noise is significantly limited. To address these issues, in this article, an adaptive sample weighting-based progressive hierarchical feature selection (PHFS) method is proposed, which dynamically adjusts the sample weights to focus on high-quality data. PHFS integrates progressive sample selection and hierarchical feature selection into a unified framework, thus enhancing its effectiveness in reducing the impact of label noise and achieving optimal performance. The progressive selection process is divided into initial and subsequent stages, focusing on correct and incorrect samples. In the initial stage, PHFS selects valuable and correct samples based on the adaptive weights calculated through hierarchical classification feedback, maximizing the guiding effect of the correctly labeled examples. In the subsequent stages, PHFS uses matrix factorization to preserve the structure of the correctly labeled samples, preventing the forgetting of the early selected samples and minimizing the negative impact of the mislabeled samples. Extensive experiments on eight real-world datasets demonstrate the superiority of PHFS over 13 state-of-the-art methods, highlighting its effectiveness in reducing the impact of label noise and achieving optimal performance.

PaperID: 743,

Authors: Min Wang, Mingyu Wang, Chenguang Yang

Affiliations: School of Automation Science and Engineering, Key Laboratory of Autonomous Systems and Networked Control, Ministry of Education, South China University of Technology, Guangzhou, China

Title: Persistent Excitation of Improved RBF Neural Networks: Neuron Dynamic-Growing Strategy

Abstract:
This brief proposes a novel neuron dynamic-growing (NDG) strategy for radial basis function neural networks (RBF NNs). Only one neuron is selected in advance, based on the initial states of the system, and other neurons are dynamically generated based on a designed threshold for the distance between the current NN input and the closest neuron. Compared with the RBF NN using the neuron fixed evenly spaced (NFES) strategy, the improved RBF NN has two major advantages: it greatly reduces the number of neurons, especially for high-dimensional NN inputs, and it provides a theoretical criterion for the choice of NN structure parameters, including the neuron center and the compact set size. To guarantee the dynamic learning ability of the improved RBF NN, the persistent excitation (PE) condition is strictly verified by carefully constructing the threshold and the center of newly added neurons. Simulation and experimental results illustrate that the improved RBF NN, integrated into the existing dynamic learning control, effectively enhances the transient control performance, reduces the computational burden, and saves data storage space.
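
The growing rule described above can be sketched as follows: start from a single center and add a new one whenever the current input lies farther than a threshold from its closest existing center. This is a minimal sketch under stated assumptions, not the authors' algorithm. In particular, the function name `grow_centers`, the fixed scalar `threshold`, and the choice to place each new center exactly at the triggering input are hypothetical. The brief constructs the threshold and the new centers specifically so that persistent excitation can be verified, and the abstract does not give those details.

```python
import math


def grow_centers(inputs, threshold):
    """Grow RBF centers along a stream of NN inputs (illustrative sketch).

    inputs    -- sequence of input points (tuples of floats) in time order
    threshold -- assumed fixed distance above which a new neuron is added
    """
    # One neuron is chosen in advance from the initial state.
    centers = [tuple(inputs[0])]
    for x in inputs[1:]:
        nearest = min(math.dist(x, c) for c in centers)
        if nearest > threshold:
            # Assumption: the new center is placed at the current input.
            centers.append(tuple(x))
    return centers


trajectory = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.02), (2.0, 2.0)]
centers = grow_centers(trajectory, threshold=0.5)
# centers == [(0.0, 0.0), (1.0, 0.0), (2.0, 2.0)]
```

In this sketch, centers appear only along the visited trajectory, which is why the count can stay far below that of an evenly spaced grid over the whole compact set when the input dimension is high.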

PaperID: 744,

Authors: Chao Guo, Masao Yanagisawa, Youhua Shi

Affiliations: Department of Electronic and Physical Systems, Faculty of Fundamental Science and Engineering, Waseda University, Tokyo, Japan

Title: DSE-Based Hardware Trojan Attack for Neural Network Accelerators on FPGAs

Abstract:
Over the past few years, the emergence and development of design space exploration (DSE) have shortened the deployment cycle of deep neural networks (DNNs). With open-source DSE tools, the optimal configuration can be computed automatically and the corresponding accelerator intellectual properties (IPs) can be generated from the pretrained neural network models and hardware constraints. However, to date, the security of DSE has received little attention. Therefore, we explore this issue from an adversarial perspective and propose an automated hardware Trojan (HT) generation framework embedded within DSE. The framework uses an evolutionary algorithm (EA) to analyze user-input data and automatically generate the attack code before placing it in the final output accelerator IPs. The proposed HT is sufficiently stealthy and suitable for both single- and multi-FPGA (field-programmable gate array) designs. It can also implement controlled accuracy degradation attacks and specified category attacks. We conducted experiments on LeNet, VGG-16, and YOLO. For the LeNet model trained on the CIFAR-10 dataset, attacking only one kernel resulted in 97.3% of images being classified in the category specified by the adversary and reduced accuracy by 59.58%. Moreover, for the VGG-16 model trained on the ImageNet dataset, attacking eight kernels can cause up to 96.53% of the images to be classified into the category specified by the adversary and causes the model’s accuracy to decrease to 2.5%. Finally, for the YOLO model trained on the PASCAL VOC dataset, attacking eight kernels can cause the model to identify the target as the specified category and cause slight perturbations to the bounding boxes. Compared to the uncompromised designs, the look-up table (LUT) overhead of the proposed HT design does not exceed 0.6%.

PaperID: 745,

Authors: Kaizhong Zheng, Shujian Yu, Baojuan Li, Robert Jenssen, Badong Chen

Affiliations: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China; Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, HV, The Netherlands; School of Biomedical Engineering, Fourth Military Medical University, Xi’an, China; Machine Learning Group, UiT-The Arctic University of Norway, Tromsø, Norway

Title: BrainIB: Interpretable Brain Network-Based Psychiatric Diagnosis With Graph Information Bottleneck

Abstract:
Developing new diagnostic models for psychiatric disorders based on the underlying biological mechanisms rather than subjective symptoms is an emerging consensus. Recently, machine learning (ML)-based classifiers that use functional connectivity (FC) to distinguish patients with psychiatric disorders from healthy controls (HCs) have been developed to identify brain markers. However, existing ML-based diagnostic models are prone to overfitting (due to insufficient training samples) and perform poorly in new test environments. Furthermore, it is difficult to obtain explainable and reliable brain biomarkers elucidating the underlying diagnostic decisions. These issues hinder their possible clinical applications. In this work, we propose BrainIB, a new graph neural network (GNN) framework to analyze functional magnetic resonance images (fMRI) by leveraging the information bottleneck (IB) principle. BrainIB is able to identify the most informative edges in the brain (i.e., a subgraph) and generalizes well to unseen data. We evaluate the performance of BrainIB against three baselines and seven state-of-the-art (SOTA) brain network classification methods on three psychiatric datasets and observe that BrainIB always achieves the highest diagnosis accuracy. It also discovers subgraph biomarkers that are consistent with clinical and neuroimaging findings. The source code and implementation details of BrainIB are freely available at the GitHub repository (https://github.com/SJYuCNEL/brain-and-Information-Bottleneck).

PaperID: 746,

Authors: Haiyang Zhang, Jing Na, Lianglin Xiong, Jinde Cao

Affiliations: Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming, China; Faculty of Mechanical and Electrical Engineering and Yunnan Key Laboratory of Intelligent Control and Application, Kunming University of Science and Technology, Kunming, China; Department of Mechanical and Electrical Engineering, Yunnan Open University, Kunming, China; Jiangsu Provincial Key Laboratory of Networked Collective Intelligence and the School of Mathematics, Southeast University, Nanjing, China

Title: Exponential Asynchronous Stabilization for Delayed Semi-Markovian Neural Networks via DAEIC

Abstract:
The exponential asynchronous stabilization (EAS) issue for a class of neural networks (NNs) with semi-Markov jump (SMJ) parameters and additive time-varying delays (ATDs) is addressed in this article. Here, the SMJ parameters in the controller gain are assumed to be distinct from those in the system structure, which is more consistent with the actual situation. To further relieve the communication load of the network, a new discrete adaptive event-triggered impulsive control (DAEIC) scheme is proposed, where the impulsive moments are the sampling instants satisfying event-triggered constraints, and the triggering threshold can be dynamically adjusted by an adaptive update rule (AUR) related to the current sampling state and the last triggered state. A more flexible looped Lyapunov-Krasovskii functional (LLKF) is constructed to capture the available information about impulsive instants, triggering state, sampling interval, ATDs, and heterogeneous SMJ parameters. By combining the LLKF, the DAEIC scheme, and other inequality analysis approaches, some novel results guaranteeing the EAS of the underlying systems are derived. Finally, three illustrative examples are presented to check the validity of our results.

PaperID: 747,

Authors: Yongzhe Yuan, Yue Wu, Maoguo Gong, Qiguang Miao, A. K. Qin

Affiliations: School of Computer Science and Technology, Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China; Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, School of Electronic Engineering, Xidian University, Xi’an, China; Department of Computer Science and Software Engineering, Swinburne University of Technology, Melbourne, VIC, Australia

Title: One-Nearest Neighborhood Guides Inlier Estimation for Unsupervised Point Cloud Registration

Abstract:
The precision of unsupervised point cloud registration methods is typically limited by the lack of reliable inlier estimation and self-supervised signals, especially in partially overlapping scenarios. In this article, we propose an effective inlier estimation method for unsupervised point cloud registration by capturing geometric structure consistency between the source point cloud and its corresponding reference point cloud copy. Specifically, to obtain a high-quality reference point cloud copy, a one-nearest neighborhood (1-NN) point cloud is generated from the input point cloud, which facilitates matching map construction and allows for integrating the dual neighborhood matching scores of the 1-NN point cloud and the input point cloud to improve matching confidence. Benefiting from the high-quality reference copy, we argue that the neighborhood graph formed by an inlier and its neighborhood should be consistent between the source point cloud and its corresponding reference copy. Based on this observation, we construct transformation-invariant geometric structure representations and capture geometric structure consistency to score the inlier confidence for estimated correspondences between the source point cloud and its reference copy. This strategy simultaneously provides reliable self-supervised signals for model optimization. Finally, we estimate the transformation with the weighted SVD algorithm, using the estimated correspondences and the corresponding inlier confidence. We train the proposed model in an unsupervised manner, and extensive experiments on synthetic and real-world datasets illustrate the effectiveness of the proposed method.

PaperID: 748,

Authors: Hang Ran, Xingyu Gao, Lusi Li, Weijun Li, Songsong Tian, Gang Wang, Hailong Shi, Xin Ning

Affiliations: Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China; Institute of Microelectronics, Chinese Academy of Sciences, Beijing, China; Department of Computer Science, Old Dominion University, Norfolk, VA, USA; School of Computing and Data Engineering, NingboTech University, Ningbo, China

Title: Brain-Inspired Fast- and Slow-Update Prompt Tuning for Few-Shot Class-Incremental Learning

Abstract:
Few-shot class-incremental learning (FSCIL) aims to learn new classes incrementally with a limited number of samples per class. Foundation models combined with prompt tuning showcase robust generalization and zero-shot learning (ZSL) capabilities, endowing them with potential advantages in transfer capabilities for FSCIL. However, existing prompt tuning methods excel in optimizing for stationary datasets, diverging from the inherent sequential nature in the FSCIL paradigm. To address this issue, taking inspiration from the “fast and slow mechanism” of the complementary learning systems (CLSs) in the brain, we present fast- and slow-update prompt tuning FSCIL (FSPT-FSCIL), a brain-inspired prompt tuning method for transferring foundation models to the FSCIL task. We categorize the prompts into two groups: fast-update prompts and slow-update prompts, which are interactively trained through meta-learning. Fast-update prompts aim to learn new knowledge within a limited number of iterations, while slow-update prompts serve as meta-knowledge and aim to strike a balance between rapid learning and avoiding catastrophic forgetting. Through experiments on multiple benchmark tests, we demonstrate the effectiveness and superiority of FSPT-FSCIL. The code is available at https://github.com/qihangran/FSPT-FSCIL.

PaperID: 749,

Authors: Yurui Qian, Yu Wang, Jingjing Zou, Jingyang Lin, Yingwei Pan, Ting Yao, Qibin Sun, Tao Mei

Affiliations: School of Cyber Science and Technology, University of Science and Technology of China, Hefei, China; resides in Saratoga, Saratoga, CA, USA; Herbert Wertheim School of Public Health and Human Longevity Science, University of California at San Diego, La Jolla, CA, USA; Department of Computer Science, University of Rochester, Rochester, NY, USA; HiDream.ai, Beijing, China

Title: Kernel Masked Image Modeling Through the Lens of Theoretical Understanding

Abstract:
Masked image modeling (MIM) has been considered the state-of-the-art (SOTA) self-supervised learning (SSL) technique for visual pretraining. The impressive generalization ability of MIM also paves the way for the remarkable success of large-scale vision foundation models. In this article, we further discuss the validity and advantages of implementing MIM techniques in reproducing kernel Hilbert spaces (RKHSs), and we associate the analysis with a novel MIM method named R-MIM (short for RKHS-MIM). Through the careful construction of an augmentation graph and by using spectral decomposition techniques, we establish a systematic theoretical link between the generalization ability of the proposed R-MIM and the choice of kernel function used during training. Specifically, we conclude that both the local Lipschitz constant of the resultant R-MIM model and the corresponding expected pretraining error can have a strong composite effect on bounding downstream task error, depending on the kernel options. We demonstrate that, under mild mathematical assumptions, the R-MIM method is guaranteed to return a lower bound on downstream tasks in comparison to vanilla MIM techniques, such as masked autoencoder (MAE) and SimMIM. Empirical results corroborate our theoretical hypothesis and analysis, showing the superior generalization of the proposed R-MIM and the theoretical link to kernel choices. The code is available at: https://github.com/yurui-q/R-MIM.

PaperID: 750,

Authors: Xiao Zhang, Min Meng, Zhengping Ji

Affiliations: Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, SAR, China; Department of Control Science and Engineering and Shanghai Research Institute of Intelligent Autonomous Systems, Tongji University, Shanghai, China; Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China

Title: Analysis of Discrete-Time Switched Linear Systems Under Logical Dynamic Switching

Abstract:
The control properties of discrete-time switched linear systems (SLSs) with switching signals generated by logical dynamical systems are studied using the semitensor product (STP) approach. With the algebraic state-space representation (ASSR), the linear modes and the logical generators are aggregated as a system with hybrid states, leading to the criteria of reachability, controllability, observability, and reconstructibility of the SLSs. Algorithms for checking these properties are given. Then, two kinds of realization problems concerning whether the logical dynamical systems can generate the desired switching signals are investigated, and necessary and sufficient conditions for the realizability of the desired switching signals are given with respect to the cases of fixed operating time (FOT) switching and finite reference signal switching.

PaperID: 751,

Authors: Xuesong Liu, Renxin Chu, Baolin Liu

Affiliations: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China; Harvard Medical School, Boston, MA, USA

Title: TFG-Net: A Text Feature-Guided Network for Small Traffic Sign Detection

Abstract:
Detecting small signs in complex real-world environments remains challenging due to limited feature information and interference from other objects. In this article, we propose a novel text feature-guided network (TFG-Net) that improves small sign detection both by enhancing the feature information of small signs and by avoiding the influence of other objects. As the name suggests, TFG-Net incorporates a text detection branch, which extracts additional textual features from the signs and supplies them to the object detection branch. Furthermore, the object detection branch of TFG-Net optimizes the backbone network’s output structure by merging deep features and introducing a high-resolution feature layer. Finally, a fusion method that enhances both overall and local features is proposed to fully integrate detailed and semantic information. Experimental results show that TFG-Net reaches the highest mean average precision (mAP) on three public datasets, with 92.5% on Tsinghua-Tencent 100K (TT100K), 83.7% on CCTSDB2021, and 79.1% on DFG, surpassing current state-of-the-art object detectors.

PaperID: 752,

Authors: Phu Pham, Quang-Thinh Bui, Ngoc Thanh Nguyen, Robert Kozma, Philip S. Yu, Bay Vo

Affiliations: Faculty of Information Technology, HUTECH University, Ho Chi Minh City, Vietnam; Faculty of Education and Basic Sciences, Tien Giang University, My Tho City, Vietnam; Department of Applied Informatics, Wrocław University of Science and Technology, Wrocław, Poland; Department of Mathematics, University of Memphis, Memphis, TN, USA; Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA

Title: Topological Data Analysis in Graph Neural Networks: Surveys and Perspectives

Abstract:
For many years, topological data analysis (TDA) and deep learning (DL) have been considered separate data analysis and representation learning approaches with nothing in common. The root cause lies in the difficulty of building, extracting, and integrating TDA constructs, such as barcodes or persistence diagrams, within deep neural network architectures. Therefore, the strengths of these two approaches have remained isolated and have not yet been combined to form more powerful tools for dealing with multiple complex data analysis tasks. Fortunately, we have witnessed several remarkable attempts to integrate DL-based architectures with topological learning paradigms in recent years. These topology-driven DL techniques have notably improved data-driven analysis and mining, especially on graph datasets. Recently, graph neural networks (GNNs) have emerged as a popular deep neural architecture, demonstrating significant performance in various graph-based analysis and learning problems. Explicitly, within the manifold paradigm, the graph is naturally considered as a topological object (e.g., the topological properties of the given graph can be represented by the edge weights). Therefore, integrating TDA and GNN is considered an excellent combination. Many well-known studies have recently demonstrated the effectiveness of TDA-assisted GNN-based architectures in dealing with complex graph-based data representation analysis and learning problems. Motivated by the successes of recent research, we present a systematic literature survey of this nascent and promising research direction in this article, which includes a general taxonomy, preliminaries, recently proposed state-of-the-art topology-driven GNN models, and perspectives.

PaperID: 753,

Authors: Rongrong Wang, Yuhu Cheng, Xuesong Wang

Affiliations: Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, and the School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China

Title: Visual Reinforcement Learning Control With Instance-Reweighted Alignment and Instance-Dimension Uniformity

Abstract:
Visual reinforcement learning (VRL) has demonstrated remarkable capabilities in learning behaviors directly from intricate high-dimensional visual inputs. Despite these advancements, existing VRL methods still encounter obstacles such as complete collapse and dimensional collapse, resulting in representation degradation and dimensional redundancy. Contrastive learning, while helping to mitigate the complete collapse issue, is prone to the class collision dilemma. To tackle the aforementioned challenges, this article proposes a novel VRL control method with instance-reweighted alignment and instance-dimension uniformity (IAIU). In this VRL control method, the instance-reweighted alignment representation learning is introduced by minimizing the Kullback–Leibler (KL) divergence between the distributions of predicted next state representations and their weighted actual counterparts. By doing so, we aim to align state representations within the same semantic class, thereby effectively alleviating class collision. Meanwhile, an instance-dimension uniformity regularization mechanism is adopted to suppress the collapse phenomenon. This is realized by leveraging the Hilbert–Schmidt independence criterion (HSIC) and standard orthogonal constraint at the instance and dimension levels, respectively, ensuring the extraction of task-relevant state representations. In essence, IAIU’s dual-strategy of alignment and uniformity not only addresses the critical issue of class collision but also guarantees uniformity with respect to both the instances and dimensions. Simulation results from the distracting control suite (DCS) benchmark demonstrate IAIU’s superior performance, with substantial enhancements in both representational ability and policy efficacy. The code is available at https://github.com/anonymousforcode/IAIU

PaperID: 754,

Authors: Wenya Hu, Jia Wu, Quan Qian

Affiliations: School of Computer Engineering and Science, Shanghai University, Shanghai, China; School of Computing, Macquarie University, Sydney, NSW, Australia; School of Computer Engineering and Science, the Center of Materials Informatics and Data Science, Materials Genome Institute, the Key Laboratory of Silicate Cultural Relics Conservation, Ministry of Education, and Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai, China

Title: CiRLExplainer: Causality-Inspired Explainer for Graph Neural Networks via Reinforcement Learning

Abstract:
In this article, we propose a new graph neural network (GNN) explainability model, CiRLExplainer, which elucidates GNN predictions from a causal attribution perspective. Initially, a causal graph is constructed to analyze the causal relationships between the graph structure and GNN predicted values, identifying node attributes as confounding factors between the two. Subsequently, a backdoor adjustment strategy is employed to circumvent these confounders. Additionally, since the edges within the graph structure are not independent, reinforcement learning is incorporated. Through a sequential selection process, each step evaluates the combined effects of an edge and the previous structure to generate an explanatory subgraph. Specifically, a policy network predicts the probability of each candidate edge being selected and adds a new edge through sampling. The causal effect of this action is quantified as a reward, reflecting the interactivity among edges. By maximizing the policy gradient during training, the reward stream of the edge sequence is optimized. The CiRLExplainer is versatile and can be applied to any GNN model. A series of experiments was conducted, including accuracy (ACC) analysis of the explanation results, visualization of the explanatory subgraph, and ablation studies considering node attributes as confounding factors. The experimental results demonstrate that our model not only outperforms current state-of-the-art explanation techniques, but also provides precise semantic explanations from a causal perspective. Additionally, the experiments validate the rationale for considering node attributes as confounding factors, thereby enhancing the explanatory power and ACC of the model. Notably, across different datasets, our explainer achieved improvements over the best baseline models in the ACC-area under the curve (AUC) metrics by 5.89%, 5.69%, and 4.87%, respectively.

PaperID: 755,

Authors: Yiming Tang, Wenbin Wu, Witold Pedrycz, Jianwei Gao, Xianghui Hu, Zhaohong Deng, Rui Chen

Affiliations: School of Computer and Information, Hefei University of Technology, Hefei, China; Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland; School of Computer Science and Engineering, Southeast University, Nanjing, China; School of Artificial Intelligence and Computer, Jiangnan University, Wuxi, Jiangsu, China

Title: Clustering Interval and Triangular Granular Data: Modeling, Execution, and Assessment

Abstract:
In current granular clustering algorithms, numeric representatives are selected by users or by an ordinary strategy, which is simplistic; meanwhile, the weight settings for granular data cannot adequately express their structural characteristics. To address these problems, this study puts forward a new scheme called the granular weighted kernel fuzzy clustering (GWKFC) algorithm. We propose the representative selection and granularity generation (RSGG) algorithm, inspired by the density peak clustering (DPC) algorithm. We build interval and triangular granular data from the numeric representatives obtained by RSGG under the principle of justifiable granularity (PJG), for which we establish some combinations of functions and boundary constraints and prove their properties. Furthermore, we present a novel distance formula via the kernel function for granular data and design new weights to affect the coverage and specificity of granular data. Based upon these factors, we develop the GWKFC algorithm for granular clustering and assess its performance with different granularity. To sum up, a macro framework involving granular modeling, granular clustering, and assessment has been set up. Lastly, the GWKFC algorithm and ten other granular clustering algorithms are compared in experiments on artificial and UCI datasets as well as on large-scale and high-dimensional datasets. It is found that the GWKFC algorithm provides better granular clustering results than the other algorithms. The originality is embodied as follows. First, we improve the previous density radius and present the RSGG algorithm to acquire numeric representatives. Second, we propose a new strategy to determine granular data boundaries and further obtain novel weights inspired by the idea of volume. Third, we employ the kernel function to calculate the distance between granular data, which has a stronger spatial division ability than the previous Euclidean distance.

PaperID: 756,

Authors: Jiahui Wang, Haiyue Zhu, Haoren Guo, Abdullah Al Mamun, Cheng Xiang, Clarence W. de Silva, Tong Heng Lee

Affiliations: College of Design and Engineering, Electrical and Computer Engineering, National University of Singapore, Cluny Road, Singapore; Singapore Institute of Manufacturing Technology (SIMTech), Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore; Department of Mechanical Engineering, The University of British Columbia, Vancouver, BC, Canada

Title: SDSimPoint: Shallow-Deep Similarity Learning for Few-Shot Point Cloud Semantic Segmentation

Abstract:
Three-dimensional point cloud semantic segmentation is a fundamental task in computer vision. As fully supervised approaches generalize poorly with limited data, few-shot point cloud segmentation models have been proposed to enable flexible adaptation. Nevertheless, due to the class-agnostic nature of few-shot pretraining, the pretrained feature extractor struggles to capture class-related intrinsic and abstract information. Therefore, we introduce the new concept of shallow and deep similarities and propose a shallow-deep similarity learning network (SDSimPoint) that aims to learn both shallow (superficial geometry, color, etc.) and deep similarities (intrinsic context and semantics, etc.) between the support and query samples, thereby boosting the performance. Moreover, we design a beyond-episode attention module (BEAM) to enlarge the region of the attention mechanism from a single episode to the entire dataset by utilizing the memory units, which enhances the extraction ability to better capture the shallow and deep similarities. Furthermore, our distance metric function is learnable in the proposed framework, which can better adapt to complex data distributions. Our proposed SDSimPoint consistently demonstrates substantial improvements compared to baseline approaches across various datasets in diverse few-shot point cloud semantic segmentation settings.

PaperID: 757,

Authors: Ruizhou Liu, Zongsheng Cao, Zhe Wu, Yiling Wu, Qianqian Xu, Qingming Huang

Affiliations: School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China; Chinese Academy of Sciences, Institute of Information Engineering, Beijing, China; Pengcheng Laboratory, Shenzhen, China; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

Title: SGKGE: Semantically Guided Knowledge Graph Embeddings via Complementary Latent Representations

Abstract:
Knowledge graph (KG) completion is a challenging yet essential task that has attracted increasing attention in recent years. While entities in KGs typically present complex semantics (a phenomenon known as polysemy), previous works primarily focus on holistic but often inaccurate representations of entities, neglecting the diversity of their semantics. This limitation results in suboptimal representations for entities within KGs. To address this issue, we propose a new method termed semantically guided KG embeddings (SGKGE), which captures the precise semantics of entities in KGs from a semantics-guided perspective. Specifically, SGKGE first guides the learning of holistic semantics of entities through a hyperbolic manifold with learnable shared curvature and a geometric attention-fusion module, facilitating efficient reasoning. Subsequently, SGKGE captures fine-grained semantics through a set of Cartesian product Riemannian manifolds with distinct curvatures, coupled with a semantic interactions module. This approach enables SGKGE to produce more accurate entity semantics and enhance downstream applications. Experimental results demonstrate that our model achieves state-of-the-art performance on six well-established KG completion benchmarks. The release code is available at https://github.com/RuizhouLiu/SGKGE.

PaperID: 758,

Authors: Malu Zhang, Xiaoling Luo, Jibin Wu, Ammar Belatreche, Siqi Cai, Yang Yang, Haizhou Li

Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; Key Laboratory of Higher Education of Sichuan Province for Enterprise Informationalization and Internet of Things, Sichuan University of Science and Engineering, Yibin, China; Department of Data Science and Artificial Intelligence, Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; Department of Computer and Information Sciences, Faculty of Engineering and Environment, Northumbria University, Newcastle upon Tyne, U.K.; Department of Electrical and Computer Engineering, National University of Singapore, Lower Kent Ridge Rd, Singapore; Shenzhen Research Institute of Big Data, School of Data Science, The Chinese University of Hong Kong (CUHK), Shenzhen, China

Title: Toward Building Human-Like Sequential Memory Using Brain-Inspired Spiking Neural Models

Abstract:
The brain is able to acquire and store memories of everyday experiences in real time. It can also selectively forget information to facilitate memory updating. However, our understanding of the underlying mechanisms and coordination of these processes within the brain remains limited. Moreover, no existing artificial intelligence models have yet matched human-level capabilities in terms of memory storage and retrieval. This study introduces a brain-inspired spiking neural model that integrates the learning and forgetting processes of sequential memory. The proposed model closely mimics the distributed and sparse temporal coding observed in the biological neural system. It employs one-shot online learning for memory formation and uses biologically plausible mechanisms of neural oscillation and phase precession to retrieve memorized sequences reliably. In addition, an active forgetting mechanism is integrated into the spiking neural model, enabling memory removal, flexibility, and updating. The proposed memory model not only enhances our understanding of human memory processes but also provides a robust framework for addressing temporal modeling tasks.

PaperID: 759,

Authors: Weichao Lan, Yiu-Ming Cheung, Liang Lan, Juyong Jiang, Zhikai Hu

Affiliations: Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China; Department of Interactive Media, Hong Kong Baptist University, Hong Kong, SAR, China; Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China

Title: Convolution Filter Compression via Sparse Linear Combinations of Quantized Basis

Abstract:
Convolutional neural networks (CNNs) have achieved significant performance on various real-life tasks. However, the large number of parameters in convolutional layers requires huge storage and computation resources, making it challenging to deploy CNNs on memory-constrained embedded devices. In this article, we propose a novel compression method that generates the convolution filters in each layer using a set of learnable low-dimensional quantized filter bases. The proposed method reconstructs the convolution filters by stacking the linear combinations of these filter bases. By using quantized values in weights, the compact filters can be represented using fewer bits so that the network can be highly compressed. Furthermore, we explore the sparsity of coefficients through L_1-ball projection when conducting linear combination to further reduce the storage consumption and prevent overfitting. We also provide a detailed analysis of the compression performance of the proposed method. Evaluations of image classification and object detection tasks using various network structures demonstrate that the proposed method achieves a higher compression ratio with comparable accuracy compared with the existing state-of-the-art filter decomposition and network quantization methods.
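
The L_1-ball projection mentioned in this abstract can be illustrated with the well-known sort-based Euclidean projection. The following is only a minimal sketch of that generic projection, not the authors' implementation; the function name `project_l1_ball` and the `radius` parameter are hypothetical names chosen for illustration.

```python
import math


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of the vector v onto {x : ||x||_1 <= radius}.

    Generic sort-based projection; names and defaults are illustrative
    assumptions, not taken from the paper.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    magnitudes = [abs(x) for x in v]
    if sum(magnitudes) <= radius:
        return list(v)  # already inside the ball, nothing to do

    # Find the soft-threshold theta from the magnitudes sorted in
    # decreasing order: keep the largest k for which the k-th magnitude
    # still exceeds the candidate threshold.
    theta = 0.0
    running_sum = 0.0
    for k, u_k in enumerate(sorted(magnitudes, reverse=True), start=1):
        running_sum += u_k
        candidate = (running_sum - radius) / k
        if u_k - candidate > 0:
            theta = candidate

    # Soft-threshold every entry; small coefficients become exactly zero,
    # which is where the sparsity comes from.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


coefficients = [3.0, -1.0, 0.5]
projected = project_l1_ball(coefficients, radius=2.0)
print(projected)
```

In this toy call the two smaller coefficients are driven to zero while the largest one is shrunk, which is the kind of coefficient sparsity the abstract refers to.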

PaperID: 760,

Authors: Yi Ding, Chengxuan Tong, Shuailei Zhang, Muyun Jiang, Yong Li, Kevin Junliang Lim, Cuntai Guan

Affiliations: College of Computing and Data Science, Nanyang Technological University, Jurong West, Singapore; School of Computer Science and Engineering, Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Southeast University, Nanjing, China; Wilmar International, Singapore, Singapore

Title: EmT: A Novel Transformer for Generalized Cross-Subject EEG Emotion Recognition

Abstract:
Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been a limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a novel transformer model called emotion transformer (EmT). EmT is designed to excel in both generalized cross-subject electroencephalography (EEG) emotion classification and regression tasks. In EmT, EEG signals are transformed into a temporal graph format, creating a sequence of EEG feature graphs using a temporal graph construction (TGC) module. A novel residual multiview pyramid graph convolutional neural network (RMPG) module is then proposed to learn dynamic graph representations for each EEG feature graph within the series, and the learned representations of each graph are fused into one token. Furthermore, we design a temporal contextual transformer (TCT) module with two types of token mixers to learn the temporal contextual information. Finally, the task-specific output (TSO) module generates the desired outputs. Experiments on four publicly available datasets show that EmT achieves higher results than the baseline methods for both EEG emotion classification and regression tasks. The code is available at https://github.com/yi-ding-cs/EmT.

PaperID: 761,

Authors: Jialu Chen, Rui Chen, Gang Kou

Affiliations: School of Business Administration, Faculty of Business Administration, Southwestern University of Finance and Economics, Chengdu, China

Title: Unifying Attribute and Structure Preservation for Enhanced Graph Contrastive Learning

Abstract:
Graph contrastive learning (GCL) has garnered significant interest due to its strong capability to capture both graph structure and node attribute information through self-supervised learning. However, current GCL frameworks primarily construct final contrastive views based on the local structure view, neglecting the valuable complementary information provided by the attribute view. In this article, we aim to effectively incorporate the attribute view into GCL while leveraging multiscale structure views. We identify that directly contrasting the attribute view with the local structure view results in impaired performance, primarily due to the excessively low level of mutual information (MI) between these two contrastive views. To overcome this inherent limitation, we propose a novel GCL framework named attribute and structure-preserving graph contrastive learning (ASP). ASP adopts an innovative contrastive view generation process that aggregates different graph views as the final contrastive view. The framework has two main modules: the attribute-preserving contrastive learning module and the structure-preserving contrastive learning module. These modules capture attribute and long-range global structure information of the input graphs. We further extend ASP to ASP-adaptive, which can flexibly generate contrastive views with adaptive aggregation mechanisms. Extensive experiments on real-world graph benchmarks demonstrate the superiority of ASP and ASP-adaptive over several representative baselines on both node classification and link prediction tasks. The source code is available at: https://github.com/JialuChenChina/ASP-adaptive.

PaperID: 762,

Authors: Cheng Liu, Rui Li, Hangjun Che, Man-Fai Leung, Si Wu, Zhiwen Yu, Hau-San Wong

Affiliations: Department of Computer Science, Shantou University, Shantou, China; Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing, China; School of Computing and Information Science, Faculty of Science and Engineering, Anglia Ruskin University, Cambridge, U.K.; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Department of Computer Science, City University of Hong Kong, Hong Kong, China

Title: Beyond Euclidean Structures: Collaborative Topological Graph Learning for Multiview Clustering

Abstract:
Graph-based multiview clustering (MVC) approaches have demonstrated impressive performance by leveraging the consistency properties of multiview data in an unsupervised manner. However, existing methods for graph learning heavily rely on either Euclidean structures or the manifold topological structures derived from fixed view-specific graphs. Unfortunately, these approaches may not accurately reflect the consensus topological structure in a multiview setting. To address this limitation and enhance the intrinsic graph learning process, an adaptive exploration of a more appropriate consistency topological structure is required. Toward this end, we propose a novel approach called collaborative topological graph learning (CTGL) for MVC. The key idea is to adaptively discover the consistent topological structure to guide intrinsic graph learning. We achieve this by introducing an auxiliary consistency graph that formulates the topological relevance learning function. However, estimating the auxiliary consistency graph is not straightforward, as it is based on the learned view-specific graphs and requires prior availability. To overcome this challenge, we develop a collaborative learning strategy that simultaneously learns both the auxiliary consistency graph and view-specific graphs using tensor learning techniques. This strategy enables the adaptive exploration of the consistency topological structure during graph learning, resulting in more accurate clustering outcomes. Extensive experiments are provided to show the effectiveness of the proposed method. The source code can be found at https://github.com/CLiu272/CTGL.

PaperID: 763,

Authors: Zhen Cheng, Fei Zhu, Xu-Yao Zhang, Cheng-Lin Liu

Affiliations: State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Hong Kong Science Park, Hong Kong

Title: Average of Pruning: Improving Performance and Stability of Out-of-Distribution Detection

Abstract:
Detecting out-of-distribution (OOD) inputs has been a critical issue for neural networks in the open world. However, the unstable behavior of OOD detection along the optimization trajectory during training has not been explored clearly. In this article, we first find that the performance of OOD detection suffers from overfitting and instability during training: 1) the performance could decrease when the training error is near zero and 2) the performance would vary sharply in the final stage of training. Based on our findings, we propose average of pruning (AoP), consisting of model averaging (MA) and pruning, to mitigate the unstable behaviors. Specifically, MA can help achieve a stable performance by smoothing the landscape, and pruning is theoretically and empirically verified to eliminate overfitting by avoiding redundant features. Comprehensive experiments on various datasets and architectures are conducted to verify the effectiveness of our method.

PaperID: 764,

Authors: Xiaofei Zhang, Zhengping Fan, Ying Shen, Yining Li, Yasong An, Xiaojun Tan

Affiliations: School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China

Title: MAEMOT: Pretrained MAE-Based Antiocclusion 3-D Multiobject Tracking for Autonomous Driving

Abstract:
The existing 3-D multiobject tracking (MOT) methods suffer from object occlusion in real-world traffic scenes. However, previous works have faced challenges in providing a reasonable solution to the fundamental question: “How can the interference of the perception data loss caused by occlusion be overcome?” Therefore, this article attempts to provide a reasonable solution by developing a novel pretrained movement-constrained masked autoencoder (M-MAE) for an antiocclusion 3-D MOT called MAEMOT. Specifically, for the pretrained M-MAE, this article adopts an efficient multistage transformer (MST) encoder and a spatiotemporal-based motion decoder to predict and reconstruct occluded point cloud data, following the properties of object motion. Afterward, the well-trained M-MAE model extracts the global features of occluded objects, ensuring that the features of the intraobjects between interframes are as consistent as possible throughout the spatiotemporal sequence. Next, a proposal-based geometric graph aggregation (PG2A) module is utilized to extract and fuse the spatial features of each proposal, producing refined region-of-interest (RoI) components. Finally, this article designs an object association module that combines geometric and corner affinities, which helps to match the predicted occlusion objects more robustly. According to an extensive evaluation, the proposed MAEMOT method can effectively overcome the interference of occlusion and achieve improved 3-D MOT performance under challenging conditions.

PaperID: 765,

Authors: Xueming Yan, Han Huang, Yaochu Jin, Zilong Wang, Zhifeng Hao

Affiliations: School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China; School of Software Engineering, South China University of Technology, Guangzhou, China; School of Engineering, Westlake University, Hangzhou, China; College of Mathematics and Computer Science, Shantou University, Guangdong, China

Title: Neural Architecture Search Based on Bipartite Graphs for Text Classification

Abstract:
Neural architecture search (NAS) is crucial for text representation in natural language processing (NLP); however, much less work on NAS for text classification has been proposed compared with NAS for computer vision. Similar to NAS for vision tasks, most existing works rely on a manually designed search space defined by a directed acyclic graph (DAG), resulting in limited generalization capability and high computational complexity. In text classification, the topological order of the NAS operators is essential for enhancing generalization, which cannot be accurately represented by a DAG. To address this issue, we propose a bipartite graph-based NAS (BGNAS) for text classification, which converts a DAG into a dual graph and then into a bipartite graph. This transformation makes it possible to accurately capture the topological order using multi-bigraph matching. In addition, we formulate NAS as a problem of identifying the lower bound of a submodular function, theoretically ensuring that optimal architectures in a bipartite graph-based search space can be identified using fewer search operators. Reduction of the search space is achieved by eliminating ineffective associated matching rules among search operators with a pruning strategy. As a result, the bipartite graph-based search space becomes more compact and less dependent on complex contextual semantics of text data. Experimental results on public benchmark problems demonstrate that BGNAS achieves better performance than the state-of-the-art NAS algorithms and is computationally more efficient. We also demonstrate that the bipartite graph search space can more effectively capture contextual semantics, thereby enhancing the generalization capability.

PaperID: 766,

Authors: Yunpeng Xiao, Yutong Guo, Haipeng Zhu, Chaolong Jia, Qian Li, Rong Wang, Guoyin Wang

Affiliations: School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: An Unsupervised Federated Domain Adaptation Method Based on Knowledge Distillation

Abstract:
Conventional unsupervised multisource domain adaptation (UMDA) methods are based on the assumption that all source domain data are directly accessible. To address the problem that current UMDA methods cannot directly obtain source domain data in federated learning (FL), a knowledge distillation-based multisource domain adaptation method adapted to FL is proposed. First, considering that knowledge distillation allows learning solely through model access, this article adopts an improved voting mechanism by applying a smoothing technique to the confidence distribution in the source domain models. This reduces the influence of models with extremely high confidence, thereby extracting high-quality consensus knowledge. Second, this article designs a teacher model adaptive weighting strategy. It identifies irrelevant domains and malicious domains according to the similarity between consensus knowledge and the output of the teacher model, which improves the robustness of the model to negative transfer. Finally, this article introduces the idea of contrastive learning. It can control the drift of a single source domain and bridge the deviation between the representation learned by the local model and the global model. Experiments show that the method proposed in this article is superior to the mainstream UMDA methods. Moreover, it is robust to negative transfer, which makes it suitable for many practical FL applications.

PaperID: 767,

Authors: Jie Hou, Xianlin Zeng, Gang Wang, Chen Chen, Jian Sun

Affiliations: National Key Laboratory on Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing, China

Title: Distributed Frank-Wolfe Solver for Stochastic Optimization With Coupled Inequality Constraints

Abstract:
Distributed stochastic optimization (DSO) with local set constraints and coupled inequality constraints over a multiagent network is considered in this article. Usually, such problems are tackled by projected primal-dual methods, which require expensive projection operations when set constraints are complicated. In this context, this article focuses on the Frank-Wolfe (FW) framework, which provides computational simplicity by avoiding expensive projection operations, for solving DSO with local set and coupled inequality constraints. By combining recursive momentum and weighted averaging, this article proposes a distributed stochastic FW primal-dual algorithm (DSFWPD), which is the first stochastic FW solver for DSO problems with coupled constraints. The proposed algorithm achieves zero constraint violation on average with a sublinear decay of the optimality gap over a directed and time-varying network. The efficacy of DSFWPD is demonstrated by several numerical experiments.
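
The projection-free property that this abstract attributes to the Frank-Wolfe (FW) framework can be seen in a toy, single-agent, deterministic setting. The sketch below is not the DSFWPD algorithm: it has no stochastic gradients, no network, and no coupled constraints. It assumes a probability-simplex constraint set and the classical 2/(t+2) step size, and the names `frank_wolfe_simplex` and `target` are hypothetical.

```python
def frank_wolfe_simplex(grad, dim, num_iters=200):
    """Classical Frank-Wolfe over the probability simplex (toy sketch).

    grad: callable returning the gradient (a list) at a point x.
    Each step calls a linear minimization oracle instead of a projection.
    """
    x = [1.0 / dim] * dim  # start at the barycenter, which is feasible
    for t in range(num_iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of
        # <g, s> is the vertex with the smallest gradient coordinate.
        best = min(range(dim), key=lambda j: g[j])
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        # Convex combination of x and the chosen vertex stays feasible.
        x = [(1.0 - gamma) * xj for xj in x]
        x[best] += gamma
    return x


# Toy objective f(x) = 0.5 * ||x - target||^2 with the target inside the simplex.
target = [0.2, 0.5, 0.3]
solution = frank_wolfe_simplex(
    lambda x: [xj - tj for xj, tj in zip(x, target)], dim=3
)
print(solution)
```

Every iterate is a convex combination of simplex vertices, so feasibility is maintained without ever computing a projection; this is the computational simplicity the abstract points to for complicated set constraints.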

PaperID: 768,

Authors: Qian Tao, Rongshen Cai, Zicheng Lin, Yufei Tang

Affiliations: Software Service Engineering and Cloud Computing Laboratory, School of Software, South China University of Technology, Guangzhou, China; International Business Group, Ant Group Inc., Shenzhen, Guangdong, China; Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA

Title: Automatic Design of Deep Graph Neural Networks With Decoupled Mode

Abstract:
Graph neural networks (GNNs), a class of deep learning models designed for performing information interaction on non-Euclidean graph data, have been successfully applied to node classification tasks in various applications such as citation networks, recommender systems, and natural language processing. Graph node classification is an important research field for node-level tasks in graph data mining. Recently, due to the limitations of shallow GNNs, many researchers have focused on designing deep graph learning models. Previous GNN architecture search works only handle shallow networks (e.g., fewer than four layers). Manually designing deep GNNs is challenging and inefficient because of problems such as over-smoothing and information squeezing, which greatly limit their capabilities on large-scale graph data. In this article, we propose a novel neural architecture search (NAS) method for designing deep GNNs automatically and further exploit the application potential on various node classification tasks. Our innovations lie in two aspects: we first redesign the deep GNN search space for architecture search with a decoupled mode based on propagation and transformation processes, and we then formulate and solve the problem as a multiobjective optimization to balance accuracy and computational efficiency. Experiments on benchmark graph datasets show that our method performs very well on various node classification tasks, and experiments on large-scale graph datasets further validate that our proposed method is scalable.

PaperID: 769,

Authors: Guanqiang Zhou, Ping Xu, Yue Wang, Zhi Tian

Affiliations: Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA, USA; Department of Electrical and Computer Engineering, University of Texas Rio Grande Valley, Edinburg, TX, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA

Title: Robust Distributed Learning Against Both Distributional Shifts and Byzantine Attacks

Abstract:
In distributed learning systems, robustness threats may arise from two major sources. On the one hand, due to distributional shifts between training data and test data, the trained model could exhibit poor out-of-sample performance. On the other hand, a portion of working nodes might be subject to Byzantine attacks, which could invalidate the learning result. In this article, we propose a new research direction that jointly considers distributional shifts and Byzantine attacks. We illuminate the major challenges in addressing these two issues simultaneously. Accordingly, we design a new algorithm that equips distributed learning with both distributional robustness and Byzantine robustness. Our algorithm is built on recent advances in distributionally robust optimization (DRO) as well as norm-based screening (NBS), a robust aggregation scheme against Byzantine attacks. We provide convergence proofs for the proposed algorithm in three cases, where the learning model is nonconvex, convex, or strongly convex, shedding light on its convergence behaviors and resilience against Byzantine attacks. In particular, we deduce that any algorithm employing NBS (including ours) cannot converge when the fraction of Byzantine nodes is 1/3 or higher, rather than 1/2, which is the common belief in current literature. The experimental results verify our theoretical findings (on the breakpoint of NBS and others) and also demonstrate the effectiveness of our algorithm against both robustness issues, justifying our choice of NBS over other widely used robust aggregation schemes. To the best of our knowledge, this is the first work to address distributional shifts and Byzantine attacks simultaneously.
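As a rough illustration of the aggregation idea named in this abstract, the sketch below shows one common reading of norm-based screening: the server discards the gradients with the largest norms and averages the rest. The function name `norm_based_screening` and the `screen_frac` parameter are hypothetical, and the abstract does not give the authors' exact rule or its DRO component, so this is a sketch under those assumptions rather than the paper's algorithm.

```python
import numpy as np

def norm_based_screening(grads, screen_frac):
    """Average worker gradients after dropping those with the largest norms.

    grads: array-like of shape (n_workers, dim).
    screen_frac: fraction of workers to discard (hypothetical parameter).
    """
    grads = np.asarray(grads, dtype=float)
    n_workers = grads.shape[0]
    n_drop = int(np.floor(screen_frac * n_workers))
    norms = np.linalg.norm(grads, axis=1)
    # Keep the (n_workers - n_drop) gradients with the smallest norms.
    keep = np.argsort(norms)[: n_workers - n_drop]
    return grads[keep].mean(axis=0)

# Three plausible gradients and one oversized (possibly Byzantine) gradient.
worker_grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]]
aggregate = norm_based_screening(worker_grads, screen_frac=0.25)
```

In this toy call the oversized gradient is screened out before averaging. An attacker that sends small-norm gradients would pass this filter, which is one reason the tolerable fraction of Byzantine nodes needs the kind of analysis the abstract describes.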

PaperID: 770,

Authors: Zhan Li, Zhihan Zhang, Jie Hu, Qunkang Meng, Xingyu Shi, Jun Luo, Hao Wang, Qijun Huang, Sheng Chang

Affiliations: School of Physics and Technology, Wuhan University, Wuhan, Hubei, China; China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, Guangdong, China; School of Physics and Technology and the School of Microelectronics, Wuhan University, Wuhan, Hubei, China

Title: A High-Performance Pixel-Level Fully Pipelined Hardware Accelerator for Neural Networks

Abstract:
The design of convolutional neural network (CNN) hardware accelerators based on a single computing engine (CE) architecture or multi-CE architecture has received widespread attention in recent years. Although this kind of hardware accelerator has advantages in hardware platform deployment flexibility and development cycle, it is still limited in resource utilization and data throughput. When processing large feature maps, the speed can usually only reach 10 frames/s, which does not meet the requirements of application scenarios such as autonomous driving and radar detection. To solve the above problems, this article proposes a fully pipelined, pixel-based hardware accelerator design. With a pixel-by-pixel strategy, the concept of the layer is downplayed, and the generation method of each pixel of the output feature map (Ofmap) can be optimized. To pipeline the entire computing system, we expand each layer of the neural network into hardware, eliminating the buffers between layers and maximizing the effect of complete connectivity across the entire network. This approach has yielded excellent performance. In addition, as the pixel data stream is a fundamental paradigm in image processing, our fully pipelined hardware accelerator is universal for various CNNs (MobileNetV1, MobileNetV2, and FashionNet) in computer vision. As an example, the accelerator for MobileNetV1 achieves a speed of 4205.50 frames/s and a throughput of 4787.15 GOP/s at 211 MHz, with an output latency of 0.60 ms per image. This greatly shortens processing time and opens the door for AI applications in high-speed scenarios.
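The reported MobileNetV1 figures can be cross-checked with simple arithmetic. The sketch below uses only the three numbers quoted in the abstract; the function name `derived_figures` and the "frames in flight" interpretation (latency divided by the interval between output frames) are illustrative assumptions, not quantities stated by the authors.

```python
# Figures quoted in the abstract for the MobileNetV1 accelerator.
FRAMES_PER_S = 4205.50
THROUGHPUT_GOP_PER_S = 4787.15
LATENCY_MS = 0.60

def derived_figures(frames_per_s, gop_per_s, latency_ms):
    """Derive per-frame work and pipeline overlap from the reported numbers."""
    gop_per_frame = gop_per_s / frames_per_s       # operations per image
    frame_interval_ms = 1000.0 / frames_per_s      # time between output frames
    # Assumed interpretation: images overlapping inside the pipeline.
    frames_in_flight = latency_ms / frame_interval_ms
    return gop_per_frame, frame_interval_ms, frames_in_flight

gop_per_frame, frame_interval_ms, frames_in_flight = derived_figures(
    FRAMES_PER_S, THROUGHPUT_GOP_PER_S, LATENCY_MS
)
```

Under these assumptions the numbers imply roughly 1.14 GOP per frame, about 0.24 ms between output frames, and about 2.5 images in the pipeline at once, which is consistent with a latency longer than the frame interval in a fully pipelined design.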

PaperID: 771,

Authors: Zhijun Zhang, Yu He, Weijian Mai, Yamei Luo, Xiaoli Li, Yuanxiong Cheng, Xiaoming Huang, Run Lin

Affiliations: School of Automation Science and Engineering, South China University of Technology, Guangzhou, China; Department of Pulmonary and Critical Care Medicine, Guangdong Second Provincial General Hospital, Guangzhou, China; Otolaryngology-Head and Neck Surgery, Sun Yat-sen Memorial Hospital, Guangzhou, China; Department of Radiology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China

Title: Convolutional Dynamically Convergent Differential Neural Network for Brain Signal Classification

Abstract:
Brain signal classification is the basis for the implementation of brain–computer interfaces (BCIs). However, most existing brain signal classification methods are based on signal processing technology, which requires a significant amount of manual intervention, such as channel selection and dimensionality reduction, and often struggles to achieve satisfactory classification accuracy. To achieve high classification accuracy with as little manual intervention as possible, a convolutional dynamically convergent differential neural network (ConvDCDNN) is proposed for solving the electroencephalography (EEG) signal classification problem. First, a single-layer convolutional neural network is used to replace the preprocessing steps in previous work. Then, focal loss is used to overcome the imbalance in the dataset. After that, a novel automatic dynamic convergence learning (ADCL) algorithm is proposed and proved for training neural networks. Experimental results on the BCI Competition 2003, BCI Competition III A, and BCI Competition III B datasets demonstrate that the proposed ConvDCDNN framework achieves state-of-the-art performance with accuracies of 100%, 99%, and 98%, respectively. In addition, the proposed algorithm exhibits a higher information transfer rate (ITR) compared with current algorithms.
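The abstract says focal loss is used to handle class imbalance. For reference, a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), is given below. The default values of `gamma` and `alpha` are commonly used defaults for focal loss in general and are assumptions here; the abstract does not state the values or the exact variant used in ConvDCDNN.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Mean binary focal loss.

    probs: predicted probability of the positive class, shape (n,).
    labels: 0/1 ground-truth labels, shape (n,).
    gamma, alpha: illustrative defaults, not taken from the abstract.
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-7, 1.0 - 1e-7)
    labels = np.asarray(labels)
    # p_t is the probability assigned to the true class.
    p_t = np.where(labels == 1, probs, 1.0 - probs)
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma down-weights examples that are already well classified.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

easy = focal_loss([0.9], [1])   # confident, correct prediction
hard = focal_loss([0.6], [1])   # less confident prediction
```

With `gamma=0` and `alpha=0.5` this reduces to half the usual binary cross-entropy; larger `gamma` shifts the training signal toward hard, often minority-class, examples.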

PaperID: 772,

Authors: Hejia Gao, Junjie Zhao, Juqi Hu, Changyin Sun

Affiliations: School of Artificial Intelligence, Anhui University, Anhui, China

Title: A Real-Time Grasping Detection Network Architecture for Various Grasping Scenarios

Abstract:
In the field of robot grasping detection, due to uncertain factors such as different shapes, distinct colors, diverse materials, and various poses, robot grasping has become very challenging. This article introduces an integrated robotic system designed to address the challenge of grasping numerous unknown objects within a scene from a set of \alpha-channel images. We propose a lightweight and object-independent pixel-level generative adaptive residual depthwise separable convolutional neural network (GARDSCN) with an inference speed of around 28 ms, which can be applied to real-time grasping detection. It can effectively deal with the grasping detection of unknown objects with different shapes and poses in various scenes and overcome the limitations of current robot grasping technology. The proposed network achieves 98.88% grasp detection accuracy on the Cornell dataset and 95.23% on the Jacquard dataset. To further verify its validity, a grasping experiment is conducted on a physical Kinova Gen2 robot, and the grasp success rate is 96.67% in the single-object scene and 94.10% in the multiobject cluttered scene.

PaperID: 773,

Authors: Yonghuang Wu, Guoqing Wu, Jixian Lin, Yuanyuan Wang, Jinhua Yu

Affiliations: School of Information Science and Technology and the Institute of Functional and Molecular Medical Imaging, Fudan University, Shanghai, China; Central Hospital of Minhang District, Stroke Center Shanghai, Shanghai, China

Title: Role Exchange-Based Self-Training Semi-Supervision Framework for Complex Medical Image Segmentation

Abstract:
Segmentation of complex medical images such as vascular networks and pulmonary tracheal networks requires segmentation of many tiny targets on each tomographic section of the 3-D medical image volume. Although semantic segmentation of medical images based on deep learning has made great progress, fully supervised models require a large number of annotations, making such complex medical image segmentation a difficult problem. In this article, we propose a semi-supervised model for complex medical image segmentation, which introduces a bidirectional self-training paradigm that dynamically exchanges the roles of teacher and student by estimating reliability at the model level. The direction of information and knowledge transfer between the two networks can be controlled, and the probability distribution of the roles of teacher and student in the next stage is jointly determined by the model’s uncertainty and instability in the training process. We also resolve the problem that loosely coupled networks are prone to collapse when training on small-scale annotated data by proposing an asymmetric supervision (AS) strategy and a hierarchical dual student (HDS) structure. In particular, a bidirectional distillation loss combined with the role exchange (RE) strategy and a global-local-aware consistency loss are introduced to obtain stable mutual promotion and achieve matching of global and local features, respectively. We conduct detailed experiments on two public datasets and one private dataset and outperform existing semi-supervised methods by a large margin, while achieving fully supervised performance at a labeling cost of 5%.

PaperID: 774,

Authors: Qiang Wang, Xiaojie Chen, Nanrong He, Attila Szolnoki

Affiliations: School of Science, Civil Aviation Flight University of China, Guanghan, China; School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Economics and Management, Beihang University, Beijing, China; Institute of Technical Physics and Materials Science, Centre for Energy Research, Budapest, Hungary

Title: Evolutionary Dynamics of Population Games With an Aspiration-Based Learning Rule

Abstract:
Agents usually adjust their strategic behaviors based on their own payoff and aspiration in gaming environments. Hence, aspiration-based learning rules play an important role in the evolutionary dynamics in a population of competing agents. However, there exist different options for how to use the aspiration information for specifying the microscopic learning rules. It is also interesting to investigate under what conditions the aspiration-based learning rules can favor the emergence of cooperative behavior in population games. A new aspiration-based learning rule, called “Satisfied-Cooperate, Unsatisfied-Defect,” is proposed here. Under this learning rule, agents prefer to cooperate when they are satisfied with their income; otherwise, they prefer the strategy of defection. We introduce this learning rule to a population of agents playing a generalized two-person game. We obtain the mathematical conditions under which cooperation is more abundant in finite well-mixed, infinite well-mixed, and structured populations, respectively, under weak selection. Interestingly, we find that these conditions are identical, no matter whether the aspiration levels for cooperators and defectors are the same or not. Furthermore, we consider the prisoner’s dilemma game (PDG) as an example and perform numerical calculations and computer simulations. Our numerical and simulation results agree well and both support our theoretical predictions in the three different types of populations. We further find that our aspiration-based learning rule can promote cooperation more effectively than alternative aspiration-based learning rules in the PDG.

PaperID: 775,

Authors: Fan Li, Hong Qu, Liyan Zhang, Mingsheng Fu, Wenyu Chen, Zhang Yi

Affiliations: Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Section of Epidemiology and Population Health, Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; State Grid Sichuan Electric Power Company Marketing Service Center, Chengdu, China; School of Computer Science, Sichuan University, Chengdu, China

Title: Q-ADER: An Effective Q-Learning for Recommendation With Diminishing Action Space

Abstract:
Deep reinforcement learning (RL) has been widely applied to personalized recommender systems (PRSs) as it can capture user preferences progressively. Among RL-based techniques, deep Q-network (DQN) stands out as the most popular choice due to its simple update strategy and superior performance. Typically, many recommendation scenarios are accompanied by the diminishing action space setting, where the available action space gradually decreases to avoid recommending duplicate items. However, existing DQN-based recommender systems inherently grapple with a discrepancy between the fixed full action space inherent in the Q-network and the diminishing available action space during recommendation. This article elucidates how this discrepancy induces an issue termed action diminishing error in the vanilla temporal difference (TD) operator. Due to this discrepancy, standard DQN methods prove impractical for learning accurate value estimates, rendering them ineffective in the context of diminishing action space. To mitigate this issue, we propose the Q-learning-based action diminishing error reduction (Q-ADER) algorithm to modify the value estimate error at each step. In practice, Q-ADER augments the standard TD learning with an error reduction term, which is straightforward to implement on top of existing DQN algorithms. Experiments are conducted on four real-world datasets to verify the effectiveness of our proposed algorithm.
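To make the discrepancy described in this abstract concrete, the sketch below compares a vanilla TD target, which takes the maximum over the full action space, with a target restricted to the items still available. This only illustrates the mismatch; it is not the Q-ADER error reduction term, which the abstract does not specify. The function name `td_target` and its `available` mask are hypothetical.

```python
import numpy as np

def td_target(reward, next_q, gamma, available=None):
    """One-step TD target r + gamma * max_a Q(s', a).

    next_q: Q-values for every item in the full action space.
    available: optional boolean mask of items not yet recommended
               (hypothetical argument used only for this illustration).
    """
    next_q = np.asarray(next_q, dtype=float)
    if available is not None:
        # Exclude items that can no longer be recommended.
        next_q = np.where(available, next_q, -np.inf)
    return reward + gamma * next_q.max()

next_q = [1.0, 5.0, 2.0]
full_target = td_target(1.0, next_q, gamma=0.9)
masked_target = td_target(1.0, next_q, gamma=0.9,
                          available=[True, False, True])
```

Here the highest-valued item has already been recommended, so the full-space target bootstraps from a value the agent can no longer obtain, and the two targets differ. A gap of this kind is the sort of mismatch the abstract refers to as action diminishing error.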

PaperID: 776,

Authors: Eunjin Jeon, Jaehun Choi, Heung-Il Suk

Affiliations: Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea; Digital Convergence Research Laboratory, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea

Title: ADT2R: Adaptive Decision Transformer for Dynamic Treatment Regimes in Sepsis

Abstract:
Dynamic treatment regimes (DTRs), which comprise a series of decisions taken to select adequate treatments, have attracted considerable attention in the clinical domain, especially from sepsis researchers. Existing sepsis DTR learning studies are mainly based on offline reinforcement learning (RL) approaches working on electronic healthcare records data. However, a trained policy may choose a treatment different from a human clinician’s prescription. Furthermore, most of them do not consider: 1) heterogeneity in sepsis; 2) short-term transitions; and 3) the relationship between a patient’s health state and the prescription. We propose a novel framework, an adaptive decision transformer for DTR (ADT2R), which recommends an optimal treatment action for each time step depending on the heterogeneity of the sepsis and a patient’s evolving health states. Specifically, we devise a trajectory-optimization-based module to be trained with supervision for treatments and adaptively aggregate the multihead self-attentions by deliberating on various inherent time-varying patterns among sepsis patients. Furthermore, we estimate the patient’s health state by adopting an actor-critic (AC) algorithm and inform the treatment recommendation by learning about its short-term changes. We validated the effectiveness of the proposed framework on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, an extensive intensive care database, by demonstrating performance comparable to the state-of-the-art methods.

PaperID: 777,

Authors: Cong Feng, Ao Li, Haoyue Xu, Hailu Yang, Xinwang Liu

Affiliations: School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China; School of Computer, National University of Defense Technology, Changsha, China

Title: Deep Incomplete Multiview Clustering via Local and Global Pseudo-Label Propagation

Abstract:
With the rapid progress in multimedia and sensor technologies, multiview clustering (MVC) has become a prominent research area within machine learning and data mining, experiencing significant advancements over recent decades. MVC is distinguished from single-view clustering by its ability to integrate complementary information from multiple distinct data perspectives and enhance clustering performance. However, the efficacy of MVC methods is predicated on the availability of complete views for all samples, an assumption that frequently fails in practical scenarios where data views are often incomplete. To surmount this challenge, various approaches to incomplete MVC (IMVC) have been proposed, with deep neural networks emerging as a favored technique for their representation learning ability. Despite their promise, previous methods commonly adopt sample-level (e.g., features) or affinity-level (e.g., graphs) guidance, neglecting the discriminative label-level guidance (i.e., pseudo-labels). In this work, we propose a novel deep IMVC method termed pseudo-label propagation for deep IMVC (PLP-IMVC), which integrates high-quality pseudo-labels from the complete subset of incomplete data with deep label propagation networks to obtain improved clustering results. In particular, we first design a local model (PLP-L) that leverages pseudo-labels to their fullest extent. Then, we propose a global model (PLP-G) that exploits manifold regularization to mitigate label noise, promote view-level information fusion, and learn discriminative unified representations. Experimental results across eight public benchmark datasets and three evaluation metrics demonstrate our method’s efficacy, showing superior performance compared to 18 advanced baseline methods.

PaperID: 778,

Authors: Dazhong Shen, Chuan Qin, Qi Zhang, Hengshu Zhu, Hui Xiong

Affiliations: Shanghai Artificial Intelligence Laboratory, Shanghai, China; PBC School of Finance, Tsinghua University, Beijing, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; Artificial Intelligence Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China

Title: Handling Over-Smoothing and Over-Squashing in Graph Convolution With Maximization Operation

Abstract:
Recent years have witnessed the great success of graph convolutional networks (GCNs) in various scenarios. However, due to the challenging over-smoothing and over-squashing problems, the ability of GCNs to model information from long-distance nodes has been largely limited. One solution is to aggregate features from different hops of neighborhoods with a linear combination of them followed by a shallow feature transformation. However, we demonstrate that those methods can only achieve a tradeoff between tackling those two problems. To this end, in this article, we design a simple yet effective graph convolution (GC), named maximization-based GC (MGC). Instead of using the linear combination, MGC applies an elementwise maximizing operation for exploiting all possible powers of the normalized adjacency matrix to construct a GC operation. As evidenced by theoretical and empirical analysis, MGC can effectively handle the above two problems. In addition, an efficient approximated model with linear complexity is developed to extend MGC for large-scale graph learning. To demonstrate the effectiveness, scalability, and efficiency of our models, extensive experiments have been conducted on various benchmark datasets. In particular, our models achieve competitive performance with lower complexity, even on large graphs with more than 100M nodes. Our code is available at https://github.com/SmilesDZgk/MGC.
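The elementwise maximization over powers of the normalized adjacency matrix can be sketched as follows. This is a small dense illustration under several assumptions that the abstract does not confirm: symmetric normalization with self-loops, a truncation at a finite `max_power`, and inclusion of the zeroth power (identity). The function names are hypothetical, and the authors' actual operator and its linear-complexity approximation are in their repository.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalized adjacency with self-loops (assumed form)."""
    a = np.asarray(adj, dtype=float) + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def max_power_propagation(adj, max_power):
    """Elementwise maximum over powers 0..max_power of the normalized adjacency."""
    a_hat = normalized_adjacency(adj)
    power = np.eye(a_hat.shape[0])
    result = power.copy()
    for _ in range(max_power):
        power = power @ a_hat
        # Keep, for each node pair, the strongest weight seen at any hop count.
        result = np.maximum(result, power)
    return result

# Path graph 0 - 1 - 2: nodes 0 and 2 are two hops apart.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
one_hop = max_power_propagation(path, max_power=1)
two_hop = max_power_propagation(path, max_power=2)
features = np.eye(3)
propagated = two_hop @ features  # propagation step before any transformation
```

In this toy graph the entry linking nodes 0 and 2 is zero with one hop and becomes positive once the second power is included. Unlike a weighted sum of powers, the maximum does not average that long-range entry against the other hops.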

PaperID: 779,

Authors: Sheng Huang, Lele Fu, Yuecheng Li, Chuan Chen, Zibin Zheng, Hong-Ning Dai

Affiliations: School of Computer Science and Engineering and the School of Systems Science and Engineering, Sun Yat-sen University, Guangzhou, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; School of Software Engineering, Sun Yat-sen University, Zhuhai, China; Department of Computer Science, Hong Kong Baptist University, Hong Kong, China

Title: A Cross-Client Coordinator in Federated Learning Framework for Conquering Heterogeneity

Abstract:
Federated learning, as a privacy-preserving learning paradigm, restricts the access to data of each local client, for protecting the privacy of the parties. However, in the case of heterogeneous data settings, the different data distributions among clients usually lead to the divergence of learning targets, which is an essential challenge for federated learning. In this article, we propose a federated learning framework with a unified coding space, called FedUCS, for learning cross-client uniform coding rules to solve the problem of divergent targets among multiple clients due to heterogeneous data. A cross-client coordinator co-trained by multiple clients is used as a criterion of the coding space to supervise all clients coding to a uniform space, which is the significant contribution of this article. Furthermore, in order to appropriately retain historical information and avoid forgetting previous knowledge, a partial memory mechanism is applied. Moreover, in order to further enhance the distinguishability of the unified encoding space, supervised contrastive learning is used to avoid the intersection of the encoding spaces belonging to different categories. A series of experiments are performed to verify the effectiveness of the proposed method in a federated learning setting with heterogeneous data.

PaperID: 780,

Authors: Zhaoyue Xia, Jun Du, Chunxiao Jiang, H. Vincent Poor, Zhu Han, Yong Ren

Affiliations: Department of Electronic Engineering, Tsinghua University, Beijing, China; Tsinghua Space Center, Tsinghua University, Beijing, China; Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, USA; Department of Electrical and Computer Engineering, University of Houston, Houston, TX, USA

Title: On Inhomogeneous Infinite Products of Stochastic Matrices and Their Applications

Abstract:
With the growth of the magnitude of multiagent networks, distributed optimization holds considerable significance within complex systems. Convergence, a pivotal goal in this domain, is contingent upon the analysis of infinite products of stochastic matrices (IPSMs). In this work, the convergence properties of inhomogeneous IPSMs are investigated. The convergence rate of inhomogeneous IPSMs toward an absolute probability sequence \pi is derived. We also show that the convergence rate is nearly exponential, which coincides with existing results on ergodic chains. The methodology employed relies on delineating the interrelations among Sarymsakov matrices, scrambling matrices, and positive-column matrices. Based on the theoretical results on inhomogeneous IPSMs, we propose a decentralized projected subgradient method for time-varying multiagent systems with graph-related stretches in (sub)gradient descent directions. The convergence of the proposed method is established for convex objective functions and extended to nonconvex objectives that satisfy Polyak-Lojasiewicz (PL) conditions. To corroborate the theoretical findings, we conduct numerical simulations, aligning the outcomes with the established theoretical framework.

PaperID: 781,

Authors: Marcin Pietron, Dominik Zurek, Kamil Faber, Roberto Corizzo

Affiliations: Faculty of Computer Science, AGH University of Krakow, Kraków, Poland; Department of Computer Science, American University, Washington, DC, USA

Title: AD-NEv: A Scalable Multilevel Neuroevolution Framework for Multivariate Anomaly Detection

Abstract:
Anomaly detection tools and methods present a key capability in modern cyberphysical and failure prediction systems. Despite the fast-paced development in deep learning architectures for anomaly detection, model optimization for a given dataset is a cumbersome and time-consuming process. Neuroevolution could be an effective and efficient solution to this problem, as a fully automated search method for learning optimal neural networks, supporting both gradient and nongradient fine-tuning. However, existing methods mostly focus on optimizing model architectures without taking into account feature subspaces and model weights. In this work, we propose anomaly detection neuroevolution (AD-NEv), a scalable multilevel optimized neuroevolution framework for multivariate time-series anomaly detection. The method represents a novel approach to synergistically: 1) optimize feature subspaces for an ensemble model based on the bagging technique; 2) optimize the model architecture of single anomaly detection models; and 3) perform nongradient fine-tuning of network weights. An extensive experimental evaluation on widely adopted multivariate anomaly detection benchmark datasets shows that the models extracted by AD-NEv outperform well-known deep learning architectures for anomaly detection. Moreover, results show that AD-NEv can perform the whole process efficiently, presenting high scalability when multiple graphics processing units (GPUs) are available.

PaperID: 782,

Authors: Yunzhi Zhuge, Hongyu Gu, Lu Zhang, Jinqing Qi, Huchuan Lu

Affiliations: School of Information and Communication Engineering, Dalian University of Technology, Dalian, China

Title: Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation

Abstract:
In this article, we address the challenges in unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. Unlike previous methods that focus solely on integrating appearance with motion or on modeling temporal relations, our method combines both aspects by integrating them within a unified framework. MTNet is devised by effectively merging appearance and motion features during the feature extraction process within encoders, promoting a more complementary representation. To capture the intricate long-range contextual dynamics and information embedded within videos, a temporal transformer module is introduced, facilitating efficacious interframe interactions throughout a video clip. Furthermore, we employ a cascade of decoders across all feature levels to optimally exploit the derived features, aiming to generate increasingly precise segmentation masks. As a result, MTNet provides a strong and compact framework that explores both temporal and cross-modality knowledge to localize and track the primary object robustly, accurately, and efficiently in various challenging scenarios. Extensive experiments across diverse benchmarks show that our method not only attains state-of-the-art performance in UVOS but also delivers competitive results in video salient object detection (VSOD). These findings highlight the method’s versatility and its ability to adapt to a range of segmentation tasks. The source code is available at https://github.com/hy0523/MTNet.

PaperID: 783,

Authors: Guangqi Wen, Xin Gao, Wenhui Tan, Peng Cao, Jinzhu Yang, Weiping Li, Osmar R. Zaïane

Affiliations: College of Computer Science and Engineering, Northeastern University, Shenyang, China; School of Software and Microelectronics, Peking University, Beijing, China; Department of Computing Science, University of Alberta, Edmonton, AB, Canada

Title: Exploring Attention and Self-Supervised Learning Mechanism for Graph Similarity Learning

Abstract:
Graph similarity estimation is a challenging task due to the complex graph structures. Though important and well-studied, three critical aspects are yet to be fully handled in a unified framework: 1) how to learn richer cross-graph interactions from a pairwise node perspective; 2) how to map the similarity matrix into a similarity score by exploiting the inherent structure in the similarity matrix; and 3) how to establish a self-supervised learning mechanism for graph similarity learning. To solve these issues, we explore multiple attention and self-supervised mechanisms for graph similarity learning in this work. More specifically, we propose a unified self-supervised nodewise attention-guided graph similarity learning framework (SNA-GSL) involving: 1) a correlation-guided contrastive learning module for capturing valuable node embeddings and 2) a graph similarity learning module for predicting similarity scores with multiple proposed attention mechanisms. Extensive experimental results on graph-graph regression and graph classification tasks demonstrate that the proposed SNA-GSL performs favorably against state-of-the-art methods. Moreover, the strong results of our model in the graph classification task indicate its generalization capabilities. The code is available at https://github.com/IntelliDAL/Graph/SNA-GSL.

PaperID: 784,

Authors: Haonan Zhang, Longjun Liu, Yi Zhang, Xinyu Lei, Fei Hui, Bihan Wen

Affiliations: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, the National Engineering Research Center for Visual Information and Applications, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China; School of Electronics and Control Engineering, Chang’an University, Xi’an, China; School of Electrical and Electronic Engineering, Nanyang Technological University, Jurong West, Singapore

Title: DenseKD: Dense Knowledge Distillation by Exploiting Region and Sample Importance

Abstract:
Knowledge distillation (KD) can compress deep neural networks (DNNs) by transferring the knowledge of the redundant teacher model to the resource-friendly student model, where cross-layer KD (CKD) conducts KD between each stage of students and the multiple stages of teachers. However, previous CKD schemes select the coarse-grained stagewise features of teachers to teach students, leading to improper channel alignment. Also, most of these methods conduct uniform distillation for all the knowledge, preventing students from focusing more on important knowledge. To address these problems, we propose dense KD (DenseKD) in this article. First, to achieve more accurate feature alignment in CKD, we construct a learnable dense architecture to make each channel of the student flexibly capture more diverse channelwise features from the teacher. Moreover, we introduce region importance to investigate each region’s guiding potential; it distinguishes the influence of different regions by the variation of representations of teacher models. In addition, to make students pay more attention to useful samples in KD, we calculate sample importance from the loss of teacher models. Consistent improvements over state-of-the-art approaches are observed in experiments on multiple vision tasks. For example, in the classification task, DenseKD achieves 72.30% accuracy with ResNet-20 on CIFAR-100, which is higher than the results of previous CKD methods. In addition, in the object detection task, DenseKD gains a 2.84% mean average precision (mAP) improvement for Faster R-CNN with ResNet-18 over vanilla KD.

PaperID: 785,

Authors: Yijing Wang, Xu Tang, Jingjing Ma, Xiangrong Zhang, Fang Liu, Licheng Jiao

Affiliations: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an, China; Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Title: Cross-Modal Remote Sensing Image-Text Retrieval via Context and Uncertainty-Aware Prompt

Abstract:
Cross-modal remote sensing image-text retrieval (CMRSITR) is an active research topic in the remote sensing (RS) community. Benefiting from large pretrained image-text models, many successful CMRSITR methods have been proposed in recent years. Although their performance is attractive, some challenges remain. First, fine-tuning large pretrained models requires a significant amount of computational resources. Second, most large models are pretrained on natural images, which reduces their effectiveness in processing RS images. To tackle these challenges, we propose a new CMRSITR network named context and uncertainty-aware prompt (CUP). First, prompt tuning theory is introduced into CUP to reduce the burden on optimization resources. By training the prompt tokens rather than all parameters, the large model’s knowledge can be transferred to CMRSITR tasks with few trainable parameters. Second, considering the differences between natural-image-based prior clues and RS images, apart from adopting the free-prompt tokens, we develop a prompt generation module (PGM) to produce RS-oriented prompt tokens. These specific prompt tokens are rich in object-level messages of RS images, which helps CUP narrow the gap between large models pretrained on natural images and RS images. Third, we further design an uncertainty estimation module (UEM) to reduce the uncertainties caused by the model and data. In this way, not only can the semantic misalignment and intraclass diversity imbalance problems be mitigated, but the RS clues can also be deeply explored. Experimental results on three public benchmark datasets demonstrate that CUP achieves competitive performance in the CMRSITR task compared with many existing methods. Our source codes are available at: https://github.com/TangXu-Group/Cross-modal-remote-sensing-image-and-text-retrieval-models/tree/main/CUP.

PaperID: 786,

Authors: Kun Wu, Yinuo Zhao, Zhiyuan Xu, Zhengping Che, Chengxiang Yin, Chi Harold Liu, Feifei Feng, Jian Tang

Affiliations: Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, USA; Beijing Institute of Technology, Beijing, China; Beijing Innovation Center of Humanoid Robotics, Beijing, China; Midea Group, Shanghai, China

Title: ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning

Abstract:
Offline reinforcement learning (RL), which operates solely on static datasets without further interactions with the environment, provides an appealing alternative to learning a safe and promising control policy. The prevailing methods typically learn a conservative policy to mitigate the problem of Q-value overestimation, but they are prone to overcorrecting, leading to an overly conservative policy. Moreover, they optimize all samples equally with fixed constraints, lacking the ability to control conservative levels in a fine-grained manner. Consequently, this limitation results in a performance decline. To address the above two challenges in a unified way, we propose a framework, adaptive conservative level in Q-learning (ACL-QL), which limits the Q-values to a mild range and enables adaptive control of the conservative level over each state-action pair, i.e., lifting the Q-values more for good transitions and less for bad transitions. We theoretically analyze the conditions under which the conservative level of the learned Q-function can be limited to a mild range and how to optimize each transition adaptively. Motivated by the theoretical analysis, we propose a novel algorithm, ACL-QL, which uses two learnable adaptive weight functions to control the conservative level over each transition. Subsequently, we design a monotonicity loss and surrogate losses to train the adaptive weight functions, Q-function, and policy network alternately. We evaluate ACL-QL on the commonly used datasets for deep data-driven reinforcement learning (D4RL) benchmark and conduct extensive ablation studies to illustrate its effectiveness and state-of-the-art performance compared with existing offline DRL baselines.

PaperID: 787,

Authors: Ping He, Rong Xiao, Chenwei Tang, Shudong Huang, Jiancheng Lv, Huajin Tang

Affiliations: College of Computer Science, Sichuan University and Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Chengdu, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China

Title: STSF: Spiking Time Sparse Feedback Learning for Spiking Neural Networks

Abstract:
Spiking neural networks (SNNs) are biologically plausible models known for their computational efficiency. A significant advantage of SNNs lies in the binary information transmission through spike trains, eliminating the need for multiplication operations. However, due to the spatio-temporal nature of SNNs, direct application of traditional backpropagation (BP) training still results in significant computational costs. Meanwhile, learning methods based on unsupervised synaptic plasticity provide an alternative for training SNNs but often yield suboptimal results. Thus, efficiently training high-accuracy SNNs remains a challenge. In this article, we propose a highly efficient and biologically plausible spiking time sparse feedback (STSF) learning method. This algorithm modifies synaptic weights by incorporating a neuromodulator for global supervised learning using sparse direct feedback alignment (DFA) and local homeostasis learning with vanilla spike-timing-dependent plasticity (STDP). Such neuromorphic global-local learning focuses on instantaneous synaptic activity, enabling independent and simultaneous optimization of each network layer, thereby improving biological plausibility, enhancing parallelism, and reducing storage overhead. Incorporating sparse fixed random feedback connections for global error modulation, which uses selection operations instead of multiplication operations, further improves computational efficiency. Experimental results demonstrate that the proposed algorithm markedly reduces the computational cost while achieving accuracy comparable to current state-of-the-art algorithms across a wide range of classification tasks. Our implementation codes are available at https://github.com/hppeace/STSF.
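
To make the idea of multiplication-free error feedback concrete, the following is a minimal sketch of sparse direct feedback alignment in general, not the STSF rule itself. It assumes that each fixed random feedback connection carries a sign of +1 or -1, so projecting the output error onto a hidden layer needs only selection and addition or subtraction. The function names, the density parameter, and the sign convention are all assumptions; the neuromodulator and STDP components of the abstract are not modeled.

```python
import random

def make_sparse_feedback(n_hidden, n_out, density=0.2, seed=0):
    """Build a fixed random feedback map (never trained).

    Each hidden unit keeps a short list of (output index, sign) pairs;
    absent pairs stand for zero entries of the sparse feedback matrix.
    """
    rng = random.Random(seed)
    feedback = []
    for _ in range(n_hidden):
        links = [(j, rng.choice((-1, 1)))
                 for j in range(n_out) if rng.random() < density]
        feedback.append(links)
    return feedback

def project_error(feedback, error):
    """Send the output error directly to a hidden layer.

    Only indexing and add/subtract are used, so no multiplications occur.
    """
    signals = []
    for links in feedback:
        total = 0.0
        for j, sign in links:
            total += error[j] if sign > 0 else -error[j]
        signals.append(total)
    return signals
```

Because the feedback map is fixed and independent of the forward weights, each layer could receive its modulation signal without waiting for the layers above it, which is the kind of layerwise independence the abstract refers to.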

PaperID: 788,

Authors: Zhenzhong Wang, Qingyuan Zeng, Wanyu Lin, Min Jiang, Kay Chen Tan

Affiliations: Department of Computing, The Hong Kong Polytechnic University, Hong Kong, SAR, China; Department of Artificial Intelligence, Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan, Ministry of Culture and Tourism, School of Informatics, Xiamen University, Xiamen, Fujian, China; Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong, SAR, China

Title: Multiview Subgraph Neural Networks: Self-Supervised Learning With Scarce Labeled Data

Abstract:
While graph neural networks (GNNs) have become the de facto standard for graph-based node classification, they impose a strong assumption on the availability of sufficient labeled samples. This assumption restricts the classification performance of prevailing GNNs on many real-world applications suffering from low-data regimes. Specifically, features extracted from scarce labeled nodes cannot provide sufficient supervision for the unlabeled samples, leading to severe overfitting. We point out that leveraging subgraphs to capture long-range dependencies can augment the node representation, thus alleviating the low-data problem. To this end, we present a novel self-supervised learning (SSL) framework, called multiview subgraph neural networks (Muse), for handling the long-range dependencies. In particular, we propose an information theory-based identification mechanism to identify two types of subgraphs from the views of input space and latent space, respectively. The former captures the local structure of the graph, while the latter captures the long-range dependencies among nodes. By fusing these two views of subgraphs, the learned representations can preserve the topological properties of the graph at large, including the local structure and long-range dependencies, thus maximizing their expressiveness. Theoretically, we provide the generalization error bound to show the effectiveness of capturing complementary information from multiview subgraphs. Empirically, we show a proof-of-concept of Muse on canonical node classification problems on graph data.

PaperID: 789,

Authors: Sili Huang, Hechang Chen, Haiyin Piao, Zhixiao Sun, Yi Chang, Lichao Sun, Bo Yang

Affiliations: School of Artificial Intelligence, Jilin University, Changchun, China; Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China; Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA; Key Laboratory of Symbolic Computation and Knowledge Engineer of Ministry of Education and the School of Computer Science and Technology, Jilin University, Changchun, China

Title: Boosting Weak-to-Strong Agents in Multiagent Reinforcement Learning via Balanced PPO

Abstract:
Multiagent policy gradients (MAPGs), an essential branch of reinforcement learning (RL), have made great progress in both industry and academia. However, existing models do not pay attention to the inadequate training of individual policies, thus limiting the overall performance. We verify the existence of imbalanced training in multiagent tasks and formally define it as an imbalance between policies (IBP). To address the IBP issue, we propose a dynamic policy balance (DPB) model to balance the learning of each policy by dynamically reweighting the training samples. In addition, to achieve better performance, current methods strengthen the exploration of all policies, which disregards the training differences within the team and reduces learning efficiency. To overcome this drawback, we derive a technique named weighted entropy regularization (WER), a team-level exploration with additional incentives for individuals who exceed the team. DPB and WER are evaluated in homogeneous and heterogeneous tasks, effectively alleviating the imbalanced training problem and improving exploration efficiency. Furthermore, the experimental results show that our models can outperform the state-of-the-art MAPG methods and achieve a performance gain of over 12.1% on average.

PaperID: 790,

Authors: Shihan Ma, Alexander Kenneth Clarke, Kostiantyn Maksymenko, Samuel Deslauriers-Gauthier, Xinjun Sheng, Xiangyang Zhu, Dario Farina

Affiliations: Department of Bioengineering, Imperial College London, London, U.K.; Neurodec, Antibes, France; State Key Laboratory of Mechanical System and Vibration and the Meta Robotics Institute, Shanghai Jiao Tong University, Shanghai, China

Title: Conditional Generative Models for Simulation of EMG During Naturalistic Movements

Abstract:
Numerical models of electromyography (EMG) signals have contributed greatly to our fundamental understanding of human neurophysiology and remain a central pillar of motor neuroscience and the development of human-machine interfaces. However, while modern biophysical simulations based on finite element methods (FEMs) are highly accurate, they are extremely computationally expensive and thus are generally limited to modeling static systems such as isometrically contracting limbs. As a solution to this problem, we propose to use a conditional generative model to mimic the output of an advanced numerical model. To this end, we present BioMime, a conditional generative neural network trained adversarially to generate motor unit (MU) activation potential waveforms under a wide variety of volume conductor parameters. We demonstrate the ability of such a model to predictively interpolate between a much smaller number of the numerical model’s outputs with high accuracy. Consequently, the computational load is dramatically reduced, which allows the rapid simulation of EMG signals during truly dynamic and naturalistic movements.

PaperID: 791,

Authors: Rui Fan, Tingting He, Menghan Chen, Mengyuan Zhang, Xinhui Tu, Ming Dong

Affiliations: Faculty of Artificial Intelligence in Education, Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, and the National Language Resources Monitor and Research Center for Network Media, Central China Normal University, Wuhan, China; School of Computer, Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, and the National Language Resources Monitor and Research Center for Network Media, Central China Normal University, Wuhan, China

Title: Dual Causes Generation Assisted Model for Multimodal Aspect-Based Sentiment Classification

Abstract:
Multimodal aspect-based sentiment classification (MABSC) aims to identify the sentiment polarity toward specific aspects in multimodal data. It has gained significant attention with the increasing use of social media platforms. Existing approaches primarily focus on analyzing the content of posts to predict sentiment. However, they often struggle with limited contextual information inherent in social media posts, hindering accurate sentiment detection. To overcome this issue, we propose a novel multimodal dual cause analysis (MDCA) method to track the underlying causes behind expressed sentiments. MDCA can provide additional reasoning cause (RC) and direct cause (DC) to explain why users express certain emotions, thus helping improve the accuracy of sentiment prediction. To develop a model with MDCA, we construct MABSC datasets with RC and DC by utilizing large language models (LLMs) and visual-language models. Subsequently, we devise a multitask learning framework that leverages the datasets with cause data to train a small generative model, which can generate RC and DC, and predict the sentiment assisted by these causes. Experimental results on MABSC benchmark datasets demonstrate that our MDCA model achieves the state-of-the-art performance, and the small fine-tuned model exhibits superior adaptability to MABSC compared to large models like ChatGPT and BLIP-2.

PaperID: 792,

Authors: Bo Tan, Yang Xiao, Shuai Li, Xingyu Tong, Tingbing Yan, Zhiguo Cao, Joey Tianyi Zhou

Affiliations: National Key Laboratory of Multispectral Information Intelligent Processing Technology, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China; Centre for Frontier AI Research (CFAR) and the Institute of High Performance Computing (IHPC), A*STAR, Fusionopolis, Singapore

Title: Language-Guided 3-D Action Feature Learning Without Ground-Truth Sample Class Label

Abstract:
This work makes the first research effort toward point cloud sequence-based self-supervised 3-D action feature learning (S3AFL) under the cross-modality weak supervision of text. We intend to narrow the huge performance gap between point cloud sequence-based and 3-D skeleton-based methods. The key intuition derives from the observation that skeleton-based methods actually hold high-level knowledge of the human pose, which leads to attention on the body’s joint-aware local parts. Inspired by this, we propose to introduce the weak supervision of the text’s high-level semantics into a point cloud sequence-based paradigm. With an RGB-point cloud pair sequence acquired via an RGB-D camera, a text sequence is first generated from the RGB component using a pretrained image captioning model, as auxiliary weak supervision. Then, S3AFL runs in a cross- and intra-modality contrastive learning (CL) way. To resist the missing and redundant semantics of the text, feature learning is conducted in a multistage way with semantic refinement. Essentially, text is only required for training. To strengthen the feature’s representation power on fine-grained actions, a multirank max-pooling (MR-MP) scheme is also proposed for the point set network to better maintain discriminative clues. Experiments verify that the text’s weak supervision can improve performance by up to 10.8%, 10.4%, and 8.0% on NTU RGB+D 60, 120, and N-UCLA, respectively. The performance gap between point cloud sequence-based and skeleton-based methods has been remarkably narrowed. The idea of transferring the text’s weak supervision to S3AFL can also be applied to skeleton-based methods, showing strong generality. The source code is available at https://github.com/tangent-T/W3AMT.

PaperID: 793,

Authors: Tayyab Manzoor, Hailong Pei, Yuanqing Xia, Zhongqi Sun, Yasir Ali

Affiliations: School of Automation and Electrical Engineering, Zhongyuan University of Technology, Zhengzhou, Henan Province, China; Key Laboratory of Autonomous Systems and Networked Control, Ministry of Education, Unmanned Aerial Vehicle Systems Engineering Technology Research Centre of Guangdong, School of Automation Science and Engineering, South China University of Technology, Guangzhou, China; School of Automation, Beijing Institute of Technology, Beijing, China

Title: Compound Learning-Based Model Predictive Control Approach for Ducted-Fan Aerial Vehicles

Abstract:
Designing an efficient learning-based model predictive control (MPC) framework for ducted-fan unmanned aerial vehicles (DFUAVs) is a difficult task due to several factors involving uncertain dynamics, coupled motion, and unorthodox aerodynamic configuration. Existing control techniques are either developed from largely known physics-informed models or are made for specific goals. In this regard, this article proposes a compound learning-based MPC approach for DFUAVs to construct a suitable framework that exhibits efficient dynamics learning capability with adequate disturbance rejection characteristics. At the start, a nominal model from a largely unknown DFUAV model is achieved offline through sparse identification. Afterward, a reinforcement learning (RL) mechanism is deployed online to learn a policy to facilitate the initial guesses for the control input sequence. Thereafter, an MPC-driven optimization problem is developed, where the obtained nominal (learned) system is updated by the real system, yielding improved computational efficiency for the overall control framework. Under appropriate assumptions, stability and recursive feasibility are compactly ensured. Finally, a comparative study is conducted to illustrate the efficacy of the designed scheme.

PaperID: 794,

Authors: Zhangxi Xiong, Wei Li, Xiaobin Zhao, Baochang Zhang, Ran Tao, Qian Du

Affiliations: School of Information and Electronics, Beijing Institute of Technology, Beijing, China; Zhongguancun Laboratory, Beijing, China; Department of Electrical and Computer Engineering, Mississippi State University, Mississippi State, MS, USA

Title: PRF-Net: A Progressive Remote Sensing Image Registration and Fusion Network

Abstract:
Most of the existing fusion algorithms are not robust to unregistered input images. Even after image registration, nonlinear misregistration may persist in the local areas of the images, leading to poor quality in the fused image. To tackle these challenges, a progressive remote sensing image registration and fusion network, named PRF-Net, is proposed in this article, which is particularly useful when two images are from different platforms. First, a registration network is designed to register the input image patches, which includes a global spatial transform network (GSTN) and a local spatial warp network (LSWN). The GSTN is primarily used for coarse registration, applying rigid transformation to globally align the input images. After coarse registration, the preliminarily registered moving image is input into the LSWN for local fine-tuning to maximize correlation between the input image patches. Subsequently, the finely registered images are degraded and input into the fusion network to generate the fused image. To maintain sufficient spectral and spatial information in the fused image, a multiscale feature extraction (MSFE) block with a highly interpretable spatial details attention (SDA) block is designed, which can enhance the ability of the fusion network to extract and preserve spatial details and spectral information. Three groups of experiments conducted on four types of remote sensing images provide evidence that the proposed PRF-Net exhibits excellent performance at both reduced and full resolutions, showcasing its outstanding registration and fusion quality.

PaperID: 795,

Authors: Junpu Zhang, Liang Li, Pei Zhang, Yue Liu, Siwei Wang, Changbao Zhou, Xinwang Liu, En Zhu

Affiliations: School of Computer, National University of Defense Technology, Changsha, China; Intelligent Game and Decision Laboratory, Beijing, China; College of Computer Science and Technology, Jilin University, Changchun, China

Title: TFMKC: Tuning-Free Multiple Kernel Clustering Coupled With Diverse Partition Fusion

Abstract:
Clustering is a popular research pipeline in unsupervised learning to find potential groupings. As a representative paradigm in multiple kernel clustering (MKC), late fusion-based models learn a consistent partition across multiple base kernels. Despite their promising performance, a common concern is the limited representation capacity caused by the inflexible fusion mechanism. Concretely, the representations are constrained by truncated-k Eigen-decomposition (EVD) without fully exploiting potential information. An intuitive idea to alleviate this concern is to generate a set of augmented partitions and then select the optimal partition by fine-tuning. However, this is limited by: 1) introducing undesired hyperparameters and dataset-related consequences; 2) neglecting rich information across diverse partitions; and 3) expensive parameter-tuning costs. To address these problems, we propose transforming the challenging problem of directly determining the optimal partition (optimal parameter) into a diverse partition fusion (parameter ensemble) problem. We design a novel flexible fusion mechanism called tuning-free multiple kernel clustering coupled with diverse partition fusion (TFMKC), which reweights diverse partitions through optimization and achieves an optimal consensus partition by integrating diverse and complementary information rather than by traditional fine-tuning, distinguishing our work from existing methods. Extensive experiments verify that TFMKC achieves competitive effectiveness and efficiency over comparison baselines. The code can be accessed at https://github.com/ZJP/TFMKC.

PaperID: 796,

Authors: Ning Pang, Xiang Zhao, Weixin Zeng, Zhen Tan, Weidong Xiao

Affiliations: College of Systems Engineering, National University of Defense Technology, Changsha, China

Title: StaRS: Learning a Stable Representation Space for Continual Relation Classification

Abstract:
Relation classification (RC) aims to detect the semantic relation between two annotated entities in a sentence, serving as an essential task in automatic knowledge graph construction. Due to the emergence of new relations, there is a recent trend to train RC models in continual settings. To overcome the catastrophic forgetting problem in continual learning, existing research is devoted to a two-stage training paradigm: fast adaptation to novel relations and memory replay for all historical relations. These memory-replay-based methods explore different techniques to mitigate the forgetting problem of continual RC (CRC) models during the memory replay stage. However, we find that the representation space undergoes distortion due to the arrival of new relations in the fast adaptation phase. To address this issue, we propose using a knowledge distillation strategy and designing a margin loss, aiming to maintain the stability of the RC model during adaptation to new relations. In addition, in the second stage, with a limited number of typical memory instances available, we introduce a self-contrastive learning objective to facilitate learning a balanced decision boundary for RC. Through training in two stages, our objective is to acquire a stable representation space to encode instances for CRC. We experimentally demonstrate the superiority of our model over competing methods in various settings, and the results suggest that our tailored designs can achieve better performance in CRC.

PaperID: 797,

Authors: Licheng Jiao, Mengjiao Wang, Xu Liu, Lingling Li, Fang Liu, Zhixi Feng, Shuyuan Yang, Biao Hou

Title: Multiscale Deep Learning for Detection and Recognition: A Comprehensive Survey

Abstract:
Recently, the multiscale problem in computer vision has gradually attracted attention. This article focuses on multiscale representation for object detection and recognition, comprehensively introduces the development of multiscale deep learning, and constructs an easy-to-understand but powerful knowledge structure. First, we give the definition of scale, explain the multiscale mechanism of human vision, and then introduce the multiscale problem discussed in computer vision. Second, advanced multiscale representation methods are introduced, including pyramid representation, scale-space representation, and multiscale geometric representation. Third, the theory of multiscale deep learning is presented, which mainly discusses the multiscale modeling in convolutional neural networks (CNNs) and Vision Transformers (ViTs). Fourth, we compare the performance of multiple multiscale methods on different tasks, illustrating the effectiveness of different multiscale structural designs. Finally, based on the in-depth understanding of the existing methods, we point out several open issues and future directions for multiscale deep learning.

PaperID: 798,

Authors: Wenjing Li, Zhigang Li, Junfei Qiao

Affiliations: Faculty of Information Technology, Beijing University of Technology, Beijing, China

Title: A Fast Feedforward Small-World Neural Network for Nonlinear System Modeling

Abstract:
It is well documented that cross-layer connections in feedforward small-world neural networks (FSWNNs) enhance the efficiency of gradient transmission, thus improving their generalization ability with fast learning. However, the merits of long-distance cross-layer connections are not fully utilized due to the random rewiring. In this study, aiming to further improve the learning efficiency, a fast FSWNN (FFSWNN) is proposed by taking into account the positive effects of long-distance cross-layer connections, and applied to nonlinear system modeling. First, a novel rewiring rule that gives priority to long-distance cross-layer connections is proposed to increase the gradient transmission efficiency when constructing FFSWNN. Second, an improved ridge regression method is put forward to determine the initial weights with high activation for the sigmoidal neurons in FFSWNN. Finally, to further improve the learning efficiency, an asynchronous learning algorithm is designed to train FFSWNN, with the weights connected to the output layer updated by the ridge regression method and the other weights by the gradient descent method. Several experiments are conducted on four benchmark datasets from the University of California Irvine (UCI) machine learning repository and two datasets from real-life problems to evaluate the performance of FFSWNN on nonlinear system modeling. The results show that FFSWNN has significantly faster convergence speed and higher modeling accuracy than the comparative models, and the positive effects of the novel rewiring rule, the improved weight initialization, and the asynchronous learning algorithm on learning efficiency are demonstrated.
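
As a point of reference for the output-layer update mentioned above, the sketch below computes the standard closed-form ridge regression solution w = (H^T H + lam I)^{-1} H^T y for a single output, where H holds the activations feeding the output neuron. It is the textbook method only; the improved ridge regression and the asynchronous schedule of FFSWNN are not reproduced, and the names `solve_linear`, `ridge_output_weights`, and `lam` are assumptions for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def ridge_output_weights(hidden, targets, lam=1e-3):
    """Closed-form ridge weights for one output neuron.

    hidden:  list of activation vectors (one per sample)
    targets: list of desired outputs (one per sample)
    lam:     ridge penalty (hypothetical default)
    """
    m = len(hidden[0])
    gram = [[sum(row[i] * row[j] for row in hidden) + (lam if i == j else 0.0)
             for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * y for row, y in zip(hidden, targets)) for i in range(m)]
    return solve_linear(gram, rhs)
```

Solving the output weights in one step like this avoids iterating on them, while the remaining weights would still be trained by gradient descent as the abstract states.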

PaperID: 799,

Authors: Yuanyuan Xu, Wenjie Zhang, Xiwei Xu, Binghao Li, Ying Zhang

Affiliations: School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia; Data’s Architecture and Analytics Platforms Team, CSIRO, Sydney, NSW, Australia; School of Minerals and Energy Resources Engineering, University of New South Wales, Sydney, NSW, Australia; School of Computer Science and Information Technology and the School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, China

Title: Scalable and Effective Temporal Graph Representation Learning With Hyperbolic Geometry

Abstract:
Real-life graphs often exhibit intricate dynamics that evolve continuously over time. To effectively represent continuous-time dynamic graphs (CTDGs), various temporal graph neural networks (TGNNs) have been developed to model their dynamics and topological structures in Euclidean space. Despite their notable achievements, the performance of Euclidean-based TGNNs is limited and bounded by the representation capabilities of Euclidean geometry, particularly for complex graphs with hierarchical and power-law structures. This is because Euclidean space does not have enough room (its volume grows polynomially with respect to radius) to learn hierarchical structures that expand exponentially. As a result, this leads to high-distortion embeddings and suboptimal temporal graph representations. To break the limitations and enhance the representation capabilities of TGNNs, in this article, we propose a scalable and effective TGNN with hyperbolic geometries for CTDG representation (called STGN^h). It captures evolving behaviors and stores hierarchical structures simultaneously by integrating a memory-based module and a structure-based module into a unified framework, which can scale to billion-scale graphs. Concretely, a simple hyperbolic update gate (HuG) is designed as the memory-based module to store temporal dynamics efficiently; for the structure-based module, we propose an effective hyperbolic temporal Transformer (HyT) model to capture complex graph structures and generate up-to-date node embeddings. Extensive experimental results on a variety of medium-scale and billion-scale graphs demonstrate the superiority of the proposed STGN^h for CTDG representation, as it significantly outperforms baselines in various downstream tasks.
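
The remark about room in Euclidean versus hyperbolic space can be checked numerically in the two-dimensional case. The sketch below compares the area of a Euclidean disk, pi r^2, with the area of a disk in the hyperbolic plane of curvature -1, 2 pi (cosh r - 1). Restricting to two dimensions and curvature -1 is an assumption made for illustration; the abstract does not state which hyperbolic model or curvature the method uses.

```python
import math

def euclidean_disk_area(r):
    """Area of a disk of radius r in the Euclidean plane (polynomial in r)."""
    return math.pi * r ** 2

def hyperbolic_disk_area(r):
    """Area of a disk of radius r in the hyperbolic plane of curvature -1.

    Grows like pi * e^r for large r, i.e., exponentially in the radius.
    """
    return 2.0 * math.pi * (math.cosh(r) - 1.0)

if __name__ == "__main__":
    for radius in (0.1, 1.0, 5.0, 10.0):
        ratio = hyperbolic_disk_area(radius) / euclidean_disk_area(radius)
        print(f"r={radius:5.1f}  hyperbolic/Euclidean area ratio = {ratio:.3f}")
```

For small radii the two areas nearly coincide, while the ratio grows without bound as the radius increases, which is why tree-like hierarchies whose node counts grow exponentially with depth embed with less distortion in hyperbolic space.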

PaperID: 800,

Authors: Hongmin Liu, Yan Ding, Hui Zeng, Huayan Pu, Jun Luo, Bin Fan

Affiliations: School of Intelligence Science and Technology and the Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing, China; School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China; State Key Laboratory of Mechanical and Transmissions, Chongqing University, Chongqing, China; State Key Laboratory of Mechanical and Transmissions and the College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, China

Title: A Cascaded Multimodule Image Enhancement Framework for Underwater Visual Perception

Abstract:
Underwater images usually exhibit severe color cast, hazy appearance, and/or dark regions because of the complex light absorption and scattering in water. How to increase the quality of these degraded underwater images has emerged as a key issue for various underwater application tasks. Recent efforts have been made to deal with a single type of degradation; however, it is still challenging to deal with the multiple degradations that usually coexist in an underwater image using a general network. The degradations in underwater images can be divided into medium-agnostic ones (haze or low light, which are also encountered in in-air images) and medium-specific ones (color distortion caused by the specific light attenuation property of water). Based on this observation, this article proposes a cascaded multimodule underwater image enhancement (UIE) framework to address the coexisting multiple degradations. In the proposed framework, an in-air image enhancement module and a newly proposed adaptive color channel compensation network (AC3Net) are cascaded, in which the former focuses primarily on solving medium-agnostic degradations and the latter handles the medium-specific degradation. This framework offers good flexibility, since different types of in-air image enhancement networks can be cascaded with AC3Net to achieve various kinds of UIE. The effectiveness of the proposed framework has been extensively validated on various degraded underwater images as well as different underwater visual perception tasks.

PaperID: 801,

Authors: Dan Mikulincer, Daniel Reichman

Affiliations: Massachusetts Institute of Technology, Cambridge, MA, USA; Worcester Polytechnic Institute, Worcester, MA, USA

Title: Size and Depth of Monotone Neural Networks: Interpolation and Approximation

Abstract:
We study monotone neural networks with threshold gates where all the weights (other than the biases) are nonnegative. We focus on the expressive power and efficiency of the representation of such networks. Our first result establishes that every monotone function over [0,1]^d can be approximated within arbitrarily small additive error by a depth-4 monotone network. When d > 3, we improve upon the previous best-known construction, which has a depth of d+1. Our proof goes by solving the monotone interpolation problem for monotone datasets using a depth-4 monotone threshold network. In our second main result, we compare size bounds between monotone and arbitrary neural networks with threshold gates. We find that there are monotone real functions that can be computed efficiently by networks with no restriction on the gates, whereas monotone networks approximating these functions need exponential size in the dimension.
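As an illustration of the definition only (this is not the authors' depth-4 construction), the following Python sketch builds a small threshold network whose non-bias weights are all nonnegative and checks on a grid that its output never decreases when the inputs increase coordinate-wise. The specific weights, biases, and function names are hypothetical.

```python
from itertools import product

def threshold_gate(inputs, weights, bias):
    """Return 1 if the weighted sum plus bias is nonnegative, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias >= 0 else 0

def monotone_network(x):
    """Two-layer threshold network; all weights (not biases) are nonnegative."""
    hidden = [
        threshold_gate(x, [1.0, 0.0], -0.5),  # fires when x[0] >= 0.5
        threshold_gate(x, [0.0, 1.0], -0.5),  # fires when x[1] >= 0.5
        threshold_gate(x, [1.0, 1.0], -1.5),  # fires when x[0] + x[1] >= 1.5
    ]
    # The output gate fires when at least two hidden gates fire.
    return threshold_gate(hidden, [1.0, 1.0, 1.0], -2.0)

def is_monotone_on_grid(f, steps=5):
    """Check f(a) <= f(b) for all grid points a <= b (coordinate-wise) in [0,1]^2."""
    grid = [i / (steps - 1) for i in range(steps)]
    points = list(product(grid, repeat=2))
    return all(
        f(a) <= f(b)
        for a in points
        for b in points
        if all(ai <= bi for ai, bi in zip(a, b))
    )
```

Because each gate is nondecreasing in its inputs when its weights are nonnegative, any composition of such gates is monotone; a single negative weight is enough to break the property.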

PaperID: 802,

Authors: Meng Liu, Ke Liang, Yawei Zhao, Wenxuan Tu, Sihang Zhou, Xinbiao Gan, Xinwang Liu, Kunlun He

Affiliations: School of Computer, National University of Defense Technology, Changsha, China; Medical Big Data Research Center, Chinese PLA General Hospital, Beijing, China; College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China

Title: Self-Supervised Temporal Graph Learning With Temporal and Structural Intensity Alignment

Abstract:
Temporal graph learning aims to generate high-quality representations for graph-based tasks with dynamic information, which has recently garnered increasing attention. In contrast to static graphs, temporal graphs are typically organized as node interaction sequences over continuous time rather than an adjacency matrix. Most temporal graph learning methods model current interactions by incorporating the historical neighborhood. However, such methods only consider first-order temporal information while disregarding crucial high-order structural information, resulting in suboptimal performance. To address this issue, we propose a self-supervised method called S2T for temporal graph learning, which extracts both temporal and structural information to learn more informative node representations. Notably, the initial node representations combine first-order temporal and high-order structural information differently to calculate two conditional intensities. An alignment loss is then introduced to optimize the node representations, narrowing the gap between the two intensities and making them more informative. Concretely, in addition to modeling temporal information using historical neighbor sequences, we further consider structural knowledge at both local and global levels. At the local level, we generate structural intensity by aggregating features from high-order neighbor sequences. At the global level, a global representation is generated based on all nodes to adjust the structural intensity according to the active statuses of different nodes. Extensive experiments demonstrate that the proposed model S2T achieves up to 10.13% performance improvement compared with the state-of-the-art competitors on several datasets.

PaperID: 803,

Authors: Shengjia Chen, Luping Ji, Sicheng Zhu, Mao Ye

Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

Title: MICPL: Motion-Inspired Cross-Pattern Learning for Small-Object Detection in Satellite Videos

Abstract:
For small-object detection, vision patterns can only provide limited support to feature learning. Most prior schemes mainly depend on a single vision pattern to learn object features, seldom considering more latent motion patterns. In the real world, humans often efficiently perceive small objects through multipattern signals. Inspired by this observation, this article attempts to address small-object detection from a new perspective of latent pattern learning. To fulfill this purpose, it regards a real-world moving object as the spatiotemporal sequences of a static object to capture latent motion patterns. In view of this, we propose a motion-inspired cross-pattern learning (MICPL) scheme to capture the motion patterns for moving small-object scenarios. This scheme mainly consists of two crucial parts: motion pattern mining (MPM) and motion-vision adaption. The former is designed to effectively mine the motion pattern from the time-dependent representation space. The latter is devised to correlate motion patterns with vision semantics. Meanwhile, we explore their cross-pattern interactions to guide MICPL to capture motion patterns effectively. Comparison experiments verify that, aided by motion patterns, even a simple detector can often refresh state-of-the-art (SOTA) results on moving small-object detection. Moreover, the experiments on two small-object-related tasks further prove the adaptivity and advantages of our cross-pattern feature learning scheme. Our source codes are available at https://github.com/UESTC-nnLab/MICPL.

PaperID: 804,

Authors: Rodrigo Alves, Antoine Ledent

Affiliations: Department of Applied Mathematics, Czech Technical University in Prague, Prague, Czech Republic; Singapore Management University, Bras Basah, Singapore

Title: Context-Aware REpresentation: Jointly Learning Item Features and Selection From Triplets

Abstract:
In areas of machine learning such as cognitive modeling or recommendation, user feedback is usually context-dependent. For instance, a website might provide a user with a set of recommendations and observe which (if any) of the links were clicked by the user. Similarly, there is growing interest in the so-called “odd-one-out” learning setting, where human participants are provided with a basket of items and asked which is the most dissimilar to the others. In both of those cases, the presence of all the items in the basket can influence the final decision. In this article, we consider a classification task where each input consists of three items (a triplet), and the task is to predict which of the three will be selected. Our aim is not only to return accurate predictions for the selection task, but also to provide interpretable feature representations both for the context and for each individual item. To achieve this, we introduce CARE, a specialized neural network architecture that yields Context-Aware REpresentations of items based on observations of triplets of items alone. We demonstrate that, in addition to achieving state-of-the-art performance at the selection task, our model can produce meaningful representations both for each item and for each context (triplet of items). This is done using only triplet responses: CARE has no access to supervised item-level information. In addition, we prove parameter counting generalization bounds for our model in the i.i.d. setting, demonstrating that the apparent sample sparsity arising from the combinatorially large number of possible triplets is no obstacle to efficient learning.

PaperID: 805,

Authors: Yang Sui, Miao Yin, Yu Gong, Bo Yuan

Affiliations: Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA; Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, USA

Title: Co-Exploring Structured Sparsification and Low-Rank Tensor Decomposition for Compact DNNs

Abstract:
Sparsification and low-rank decomposition are two important techniques to compress deep neural network (DNN) models. To date, these two popular yet distinct approaches are typically used separately, while their efficient integration for better compression performance is little explored, especially for structured sparsification and decomposition. In this article, we perform systematic co-exploration on structured sparsification and decomposition toward compact DNN models. We first investigate and analyze several important design factors for joint structured sparsification and decomposition, including operational sequence, decomposition format, and optimization procedure. Based on the observations from our analysis, we then propose CEPD, a unified DNN compression framework that can co-explore the benefits of structured sparsification and tensor decomposition in an efficient way. Empirical experiments demonstrate the promising performance of our proposed solution. Notably, on the CIFAR-10 dataset, CEPD brings 0.72% and 0.45% accuracy increases over the baseline ResNet-56 and MobileNetV2 models, respectively, while the computational costs are reduced by 43.0% and 44.2%, respectively. On the ImageNet dataset, our approach enables 0.10% and 1.39% accuracy increases over the baseline ResNet-18 and ResNet-50 models with 59.4% and 54.6% fewer parameters, respectively.
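To make the compression arithmetic concrete, here is a minimal Python sketch of how structured pruning and low-rank factorization compound on a single weight matrix. It uses a plain two-factor matrix decomposition and row pruning as simplified stand-ins; it is not the CEPD procedure, and the layer sizes, rank, and function names are hypothetical.

```python
def dense_params(rows, cols):
    """Parameter count of a dense rows x cols weight matrix."""
    return rows * cols

def low_rank_params(rows, cols, rank):
    """Parameter count of W ~ U @ V with U: rows x rank and V: rank x cols."""
    return rank * (rows + cols)

def pruned_low_rank_params(rows, cols, rank, kept_rows):
    """Structured pruning keeps `kept_rows` output rows; the rest is factorized."""
    assert kept_rows <= rows
    return low_rank_params(kept_rows, cols, rank)

def reduction(before, after):
    """Fraction of parameters removed."""
    return 1.0 - after / before

# Hypothetical 512 x 512 layer, rank 64, with 384 of 512 rows kept.
dense = dense_params(512, 512)
factored = low_rank_params(512, 512, 64)
combined = pruned_low_rank_params(512, 512, 64, 384)
```

In this toy setting, factorization alone removes 75% of the parameters, and pruning a quarter of the rows before factorizing raises that to about 78%. The order and format of the two steps are exactly the design factors the abstract says must be explored.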

PaperID: 806,

Authors: Yourun Zhang, Maoguo Gong, Jianzhao Li, Kaiyuan Feng, Mingyang Zhang

Affiliations: Hangzhou Institute of Technology, Xidian University, Hangzhou, China; Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China

Title: Few-Shot Learning With Enhancements to Data Augmentation and Feature Extraction

Abstract:
The few-shot image classification task is to enable a model to identify novel classes by using only a few labeled samples as references. In general, the more knowledge a model has, the more robust it is when facing novel situations. Although directly introducing large amounts of new training data to acquire more knowledge is an attractive solution, it violates the purpose of few-shot learning with respect to reducing dependence on big data. Another viable option is to enable the model to accumulate knowledge more effectively from existing data, i.e., improve the utilization of existing data. In this article, we propose a new data augmentation method called self-mixup (SM) to assemble different augmented instances of the same image, which facilitates the model to more effectively accumulate knowledge from limited training data. In addition to the utilization of data, few-shot learning faces another challenge related to feature extraction. Specifically, existing metric-based few-shot classification methods rely on comparing the extracted features of the novel classes, but the widely adopted downsampling structures in various networks can lead to feature degradation due to the violation of the sampling theorem, and the degraded features are not conducive to robust classification. To alleviate this problem, we propose a calibration-adaptive downsampling (CADS) that calibrates and utilizes the characteristics of different features, which can facilitate robust feature extraction and benefit classification. By improving data utilization and feature extraction, our method shows superior performance on four widely adopted few-shot classification datasets.

PaperID: 807,

Authors: Luca Bortolussi, Ginevra Carbone, Luca Laurenti, Andrea Patane, Guido Sanguinetti, Matthew Wicker

Affiliations: Department of Mathematics, Informatics and Geosciences, University of Trieste, Trieste, Italy; Department of Mathematics and Geosciences, University of Trieste, Trieste, Italy; Delft Center for Systems and Control, TU Delft University, Delft, The Netherlands; School of Computer Science and Statistics, Trinity College, Dublin, Ireland; SISSA, International School for Advanced Studies, Trieste, Italy; Department of Computer Science, University of Oxford, Oxford, U.K.

Title: On the Robustness of Bayesian Neural Networks to Adversarial Attacks

Abstract:
Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, training deep learning models robust to adversarial attacks is still an open problem. In this article, we analyse the geometry of adversarial attacks in the over-parameterized limit for Bayesian neural networks (BNNs). We show that, in the limit, vulnerability to gradient-based attacks arises as a result of degeneracy in the data distribution, i.e., when the data lie on a lower dimensional submanifold of the ambient space. As a direct consequence, we demonstrate that in this limit, BNN posteriors are robust to gradient-based adversarial attacks. Crucially, by relying on the convergence of infinitely-wide BNNs to Gaussian processes (GPs), we prove that, under certain relatively mild assumptions, the expected gradient of the loss with respect to the BNN posterior distribution is vanishing, even when each NN sampled from the BNN posterior does not have vanishing gradients. The experimental results on the MNIST, Fashion MNIST, and a synthetic dataset with BNNs trained with Hamiltonian Monte Carlo and variational inference support this line of arguments, empirically showing that BNNs can display both high accuracy on clean data and robustness to both gradient-based and gradient-free adversarial attacks.

PaperID: 808,

Authors: Jin Li, Qirong Zhang, Wenxi Liu, Antoni B. Chan, Yang-Geng Fu

Affiliations: College of Computer and Data Science, Fuzhou University, Fuzhou, China; Department of Computer Science, City University of Hong Kong, Hong Kong

Title: Another Perspective of Over-Smoothing: Alleviating Semantic Over-Smoothing in Deep GNNs

Abstract:
Graph neural networks (GNNs) are widely used for analyzing graph-structured data and solving graph-related tasks due to their powerful expressiveness. However, existing off-the-shelf GNN-based models usually consist of no more than three layers. Deeper GNNs usually suffer from severe performance degradation due to several issues including the infamous “over-smoothing” issue, which restricts the further development of GNNs. In this article, we investigate the over-smoothing issue in deep GNNs. We discover that over-smoothing not only results in indistinguishable embeddings of graph nodes, but also alters and even corrupts their semantic structures, which we dub semantic over-smoothing. Existing techniques, e.g., graph normalization, aim at handling the former concern, but neglect the importance of preserving the semantic structures in the spatial domain, which hinders the further improvement of model performance. To alleviate this concern, we propose a cluster-keeping sparse aggregation strategy to preserve the semantic structure of embeddings in deep GNNs (especially for spatial GNNs). In particular, our strategy heuristically redistributes the extent of aggregation for all nodes across layers, instead of aggregating them equally, so that deep layers aggregate concise yet meaningful information. Without any bells and whistles, it can be easily implemented as a plug-and-play structure of GNNs via weighted residual connections. Last, we analyze the over-smoothing issue on GNNs with weighted residual structures and conduct experiments to demonstrate performance comparable to the state of the art.

PaperID: 809,

Authors: Honggui Han, Qiyu Zhang, Fangyu Li, Yongping Du

Title: Foreground Capture Feature Pyramid Network-Oriented Object Detection in Complex Backgrounds

Abstract:
Feature pyramids are widely adopted in visual detection models for capturing multiscale features of objects. However, the utilization of feature pyramids in practical object detection tasks is prone to complex background interference, resulting in suboptimal capture of discriminative multiscale foreground semantic features. In this article, a foreground capture feature pyramid network (FCFPN) for multiscale object detection is proposed to address the problem of inadequate feature learning in complex backgrounds. FCFPN consists of a foreground dual attention (FDA) module and a pathway aggregation (PA) structure. Specifically, the FDA mechanism activates top–down foreground channel responses and lateral spatial foreground location features, so that channel and spatial foreground features are adequately captured. Then, the PA module adaptively learns the fusion weights of multiscale features at different levels of the feature pyramid, which enhances the complementarity of semantic information between different levels of the foreground feature maps. Since the fusion weights are learned adaptively based on different pyramid levels, the detection model accordingly retains the gained information of feature sizes and suppresses the conflicting information. The evaluations on public datasets and the self-built complex background dataset demonstrate that the detection average precision (AP) and the feature learning performance of the proposed method are superior to those of other FPNs, which proves the effectiveness of the proposed FCFPN.

PaperID: 810,

Authors: Hanzhuo Tan, Chunpu Xu, Jing Li, Yuqun Zhang, Zeyang Fang, Zeyu Chen, Baohua Lai

Affiliations: Department of Computing, Hong Kong Polytechnic University, Hung Hom, Hong Kong; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China; Baidu Inc., Beijing, China

Title: HICL: Hashtag-Driven In-Context Learning for Social Media Natural Language Understanding

Abstract:
Natural language understanding (NLU) is integral to various social media applications. However, the existing NLU models rely heavily on context for semantic learning, resulting in compromised performance when faced with short and noisy social media content. To address this issue, we leverage in-context learning (ICL), wherein language models learn to make inferences by conditioning on a handful of demonstrations, to enrich the context, and we propose a novel hashtag-driven ICL (HICL) framework. Concretely, we pretrain a model #Encoder, which employs #hashtags (user-annotated topic labels) to drive BERT-based pretraining through contrastive learning. Our objective here is to enable #Encoder to gain the ability to incorporate topic-related semantic information, which allows it to retrieve topic-related posts to enrich contexts and enhance social media NLU with noisy contexts. To further integrate the retrieved context with the source text, we employ a gradient-based method to identify trigger terms useful in fusing information from both sources. For empirical studies, we collected 45 M tweets to set up an in-context NLU benchmark, and the experimental results on seven downstream tasks show that HICL substantially advances the previous state-of-the-art results. Furthermore, we conducted an extensive analysis and found that the following hold: 1) combining source input with a top-retrieved post from #Encoder is more effective than using semantically similar posts and 2) trigger words are of great benefit in merging context from the source and retrieved posts.

PaperID: 811,

Authors: Chanjuan Liu, Jinmiao Cong, Guangyuan Liu, Guifei Jiang, Xirong Xu, Enqiang Zhu

Affiliations: School of Computer Science and Technology, Dalian University of Technology, Dalian, China; College of Software, Nankai University, Tianjin, China; Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China

Title: Boosting Reinforcement Learning via Hierarchical Game Playing With State Relay

Abstract:
Due to its wide application, deep reinforcement learning (DRL) has been extensively studied in the motion planning community in recent years. However, in the current DRL research, regardless of task completion, the state information of the agent will be reset afterward. This leads to a low sample utilization rate and hinders further explorations of the environment. Moreover, in the initial training stage, the agent has a weak learning ability in general, which affects the training efficiency in complex tasks. In this study, a new hierarchical reinforcement learning (HRL) framework dubbed hierarchical learning based on game playing with state relay (HGR) is proposed. In particular, we introduce an auxiliary penalty to regulate task difficulty, and one training mechanism, the state relay mechanism, is designed. The relay mechanism can make full use of the intermediate states of the agent and expand the environment exploration of low-level policy. Our algorithm can improve the sample utilization rate, reduce the sparse reward problem, and thereby enhance the training performance in complex environments. Simulation tests are carried out on two public experiment platforms, i.e., MazeBase and MuJoCo, to verify the effectiveness of the proposed method. The results show that HGR significantly benefits the reinforcement learning (RL) area.

PaperID: 812,

Authors: Seungwoo Jeong, Wonsik Jung, Junghyo Sohn, Heung-Il Suk

Affiliations: Department of Artificial Intelligence, Korea University, Seoul, South Korea; Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea

Title: Deep Geometric Learning With Monotonicity Constraints for Alzheimer's Disease Progression

Abstract:
Alzheimer’s disease (AD) is a devastating neurodegenerative condition that precedes progressive and irreversible dementia; thus, predicting its progression over time is vital for clinical diagnosis and treatment. For this, numerous studies have used structural magnetic resonance imaging (MRI) to model AD progression, focusing on three integral aspects: 1) temporal variability; 2) incomplete observations; and 3) temporal geometric characteristics. However, many pioneering deep learning-based approaches addressing data variability and sparsity have yet to sufficiently consider inherent geometrical properties. These properties are integral to modeling as they correlate with brain region size, thickness, volume, and shape in AD progression. The ordinary differential equation-based geometric modeling method (ODE-RGRU) has recently emerged as a promising strategy for modeling time-series data by intertwining a recurrent neural network (RNN) and an ODE in Riemannian space. Despite its achievements, ODE-RGRU encounters limitations when extrapolating symmetric positive definite matrices from incomplete samples, leading to feature reversals that are particularly problematic in clinical settings. Therefore, this study proposes a novel geometric learning approach that models longitudinal MRI biomarkers and cognitive scores by combining three modules: topological space shift, ODE-RGRU, and trajectory estimation. We have also developed a training algorithm that integrates the manifold mapping with monotonicity constraints to reflect measurement transition irreversibility. We verify our proposed method’s efficacy by predicting clinical labels and cognitive scores over time in regular and irregular settings. Furthermore, we thoroughly analyze our proposed framework through an ablation study.

PaperID: 813,

Authors: Qiang He, Li Zhang, Hui Fang, Xing-wei Wang, Lianbo Ma, Keping Yu, Jie Zhang

Affiliations: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China; Research Institute for Interdisciplinary Sciences and the Key Laboratory of Interdisciplinary Research of Computation and Economics, Shanghai University of Finance and Economics, Shanghai, China; College of Computer Science and Engineering, Northeastern University, Shenyang, China; College of Software Engineering, Northeastern University, Shenyang, China; Graduate School of Science and Engineering, Hosei University, Tokyo, Japan; School of Computer Science and Engineering, Nanyang Technological University, Jurong West, Singapore

Title: Multistage Competitive Opinion Maximization With Q-Learning-Based Method in Social Networks

Abstract:
Competitive opinion maximization (COM) aims to select some individuals (i.e., seed nodes) from a social network who propagate the desired opinions toward a target entity to their neighbors through social relationships while facing its competitors (components), so as to maximize the opinion spread after a specific time. Research on COM is still in its infancy: the only existing work considers the scenario in which the competitors' strategy is known but ignores the scenario in which it is unknown. In addition, previous studies on COM cannot easily address the situation where some users might dynamically change their opinions. To address the COM issue, we investigate the multistage COM and propose a brand-new Q-learning-based opinion maximization framework (QOMF). Our QOMF consists of two components: dynamic opinion propagation and seeding process. We formulate the COM problem by maximizing relative effective opinions. To produce a dynamic opinion series more realistically, we design an opinion propagation model by joining the activation process and a dynamic opinion process. Moreover, we also verify that the opinion propagation model can reach convergence within finite iterations. To acquire the seed nodes, we design a multistage Q-learning seeding scheme by considering known and unknown competitor strategies, respectively. Experimental results on three real datasets demonstrate that the proposed method outperforms the benchmarks in terms of relative effective opinions.
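For readers unfamiliar with the Q-learning building block, the following Python sketch shows the standard tabular Q-learning update. It is the generic textbook rule, not the QOMF seeding scheme; the state and action labels (e.g. treating a candidate seed node as an action) and the hyperparameter values are hypothetical.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q-value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # Value of the best action in the next state (0 if the state is terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Hypothetical usage: states are seeding stages, actions are candidate seeds.
q_table = defaultdict(float)
first = q_update(q_table, "stage0", "seed_a", 1.0, "stage1", ["seed_b", "seed_c"])
q_table[("stage1", "seed_b")] = 2.0
second = q_update(q_table, "stage0", "seed_a", 1.0, "stage1", ["seed_b", "seed_c"])
```

In a multistage setting, the reward would be some measure of opinion gain after each stage; how that reward is defined is specific to the paper and is not modeled here.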

PaperID: 814,

Authors: Pingyu Wang, Fei Su, Zhicheng Zhao, Yanyun Zhao, Nikolaos V. Boulgouris

Affiliations: Beijing Key Laboratory of Network System and Network Culture, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, U.K.

Title: GAReID: Grouped and Attentive High-Order Representation Learning for Person Re-Identification

Abstract:
As person parts are frequently misaligned between detected human boxes, an image representation that can handle this part misalignment is required. In this work, we propose an effective grouped attentive re-identification (GAReID) framework to learn part-aligned and background robust representations for person re-identification (ReID). Specifically, the GAReID framework consists of grouped high-order pooling (GHOP) and attentive high-order pooling (AHOP) layers, which generate high-order image and foreground features, respectively. In addition, a novel grouped Kronecker product (GKP) is proposed to use both channel group and shuffle strategies for high-order feature compression, while promoting the representational capabilities of compressed high-order features. We show that our method derives from an interpretable motivation and elegantly reduces part misalignments without using landmark detection or feature partition. This article theoretically and experimentally demonstrates the superiority of the GAReID framework, achieving state-of-the-art performance on various person ReID datasets.

PaperID: 815,

Authors: Xiaofeng Ding, Chaomin Shen, Tieyong Zeng, Yaxin Peng

Affiliations: Department of Mathematics, School of Science, Shanghai University, Shanghai, China; School of Computer Science, East China Normal University, Shanghai, China; Department of Mathematics, The Chinese University of Hong Kong, Hong Kong, China

Title: SAB Net: A Semantic Attention Boosting Framework for Semantic Segmentation

Abstract:
Semantic segmentation has achieved great progress by effectively fusing features of contextual information. In this article, we propose an end-to-end semantic attention boosting (SAB) framework to adaptively fuse the contextual information iteratively across layers with semantic regularization. Specifically, we first propose a pixelwise semantic attention (SAP) block, with a semantic metric representing the pixelwise category relationship, to aggregate the nonlocal contextual information. In addition, we reduce the computational complexity of the SAP block from O(n^2) to O(n) for images of size n. Second, we present a categorywise semantic attention (SAC) block to adaptively balance the nonlocal contextual dependencies and the local consistency with a categorywise weight, overcoming the contextual information confusion caused by intra-category feature imbalance. Furthermore, we propose the SAB module to refine the segmentation with SAC and SAP blocks. By applying the SAB module iteratively across layers, our model shrinks the semantic gap and enhances the structure reasoning by fully utilizing the coarse segmentation information. Extensive quantitative evaluations demonstrate that our method significantly improves the segmentation results and achieves superior performance on the PASCAL VOC 2012, Cityscapes, PASCAL Context, and ADE20K datasets.

PaperID: 816,

Authors: Kai Sun, Jiangshe Zhang, Shuang Xu, Zixiang Zhao, Chunxia Zhang, Junmin Liu, Junying Hu

Affiliations: School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an, China; School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an, China; School of Mathematics, Northwest University, Xi’an, China

Title: CACNN: Capsule Attention Convolutional Neural Networks for 3D Object Recognition

Abstract:
Recently, view-based approaches, which recognize a 3D object through its projected 2D images, have been extensively studied and have achieved considerable success in 3D object recognition. Nevertheless, most of them use a pooling operation to aggregate viewwise features, which usually leads to a loss of visual information. To tackle this problem, we propose a novel layer called the capsule attention layer (CAL), which uses an attention mechanism to fuse the features expressed by capsules. In detail, instead of the dynamic routing algorithm, we use an attention module to transmit information from the lower level capsules to the higher level capsules, which markedly improves the speed of capsule networks. In particular, the view pooling layer of the multiview convolutional neural network (MVCNN) becomes a special case of our CAL when the trainable weights take certain values. Furthermore, based on CAL, we propose a capsule attention convolutional neural network (CACNN) for 3D object recognition. Extensive experimental results on three benchmark datasets demonstrate the efficiency of our CACNN and show that it outperforms many state-of-the-art methods.

PaperID: 817,

Authors: Yang Yang, Guan'an Wang, Prayag Tiwari, Hari Mohan Pandey, Zhen Lei

Affiliations: National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China; Department of Computer Science, Aalto University, Espoo, Finland; Department of Computer Science, Edge Hill University, Ormskirk, U.K.

Title: Pixel and Feature Transfer Fusion for Unsupervised Cross-Dataset Person Reidentification

Abstract:
Recently, unsupervised cross-dataset person reidentification (Re-ID), which aims to transfer knowledge of a labeled source domain to an unlabeled target domain, has attracted more and more attention. There are two common frameworks: one is pixel alignment, which transfers low-level knowledge, and the other is feature alignment, which transfers high-level knowledge. In this article, we propose a novel recurrent autoencoder (RAE) framework to unify these two kinds of methods and inherit their merits. Specifically, the proposed RAE includes three modules, i.e., a feature-transfer (FT) module, a pixel-transfer (PT) module, and a fusion module. The FT module utilizes an encoder to map source and target images to a shared feature space. In this space, the features are identity-discriminative and the gap between source and target features is reduced. The PT module uses a decoder to reconstruct the original images from their features. Here, we hope that the images reconstructed from target features are in the source style. Thus, the low-level knowledge can be propagated to the target domain. After transferring both high- and low-level knowledge with the two proposed modules above, we design another bilinear pooling layer to fuse both kinds of knowledge. Extensive experiments on Market-1501, DukeMTMC-ReID, and MSMT17 datasets show that our method significantly outperforms both pixel-alignment and feature-alignment Re-ID methods and achieves new state-of-the-art results.
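As a rough illustration of the fusion step, the sketch below shows bilinear pooling in its most common basic form: the flattened outer product of two feature vectors, followed by the signed square root and L2 normalization often used with it. The abstract does not specify the layer's details, so this is an assumed generic variant, and the vector names and sizes are hypothetical.

```python
import math

def bilinear_pool(high, low):
    """Flattened outer product: every pairwise product of the two vectors."""
    return [h * l for h in high for l in low]

def signed_sqrt_l2(vec):
    """Signed square root followed by L2 normalization."""
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in vec]
    norm = math.sqrt(sum(v * v for v in rooted))
    return [v / norm for v in rooted] if norm > 0 else rooted

# Hypothetical high-level (feature-transfer) and low-level (pixel-transfer) features.
high_level = [1.0, 2.0]
low_level = [3.0, 4.0, 5.0]
fused = signed_sqrt_l2(bilinear_pool(high_level, low_level))
```

The fused vector has one entry per pair of input dimensions (here 2 x 3 = 6), which is what lets the layer model interactions between the two kinds of knowledge rather than simply concatenating them.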

PaperID: 818,

Authors: Huafeng Li, Ming Yuan, Jinxing Li, Yu Liu, Guangming Lu, Yong Xu, Zhengtao Yu, David Zhang

Affiliations: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Harbin Institute of Technology at Shenzhen, Shenzhen, China; Department of Biomedical Engineering, Hefei University of Technology, Hefei, China; School of Data Science, The Chinese University of Hong Kong at Shenzhen, Shenzhen, China

Title: Focus Affinity Perception and Super-Resolution Embedding for Multifocus Image Fusion

Abstract:
Despite remarkable achievements in multifocus image fusion, most existing methods generate only a low-resolution image if the given source images are of low resolution. An obvious naive strategy is to conduct image fusion and image super-resolution independently. However, this two-step approach inevitably introduces and enlarges artifacts in the final result if the result of the first step contains artifacts. To address this problem, in this article, we propose a novel method that simultaneously achieves image fusion and super-resolution in one framework, avoiding step-by-step processing. Since a small receptive field can discriminate the focusing characteristics of pixels in detailed regions, while a large receptive field is more robust to pixels in smooth regions, a subnetwork is first proposed to compute the affinity of features under different types of receptive fields, efficiently increasing the discriminability of focused pixels. Simultaneously, in order to prevent distortion, a gradient embedding-based super-resolution subnetwork is also proposed, in which the features from the shallow layer, the deep layer, and the gradient map are jointly taken into account, allowing us to obtain an upsampled image with high resolution. Compared with the existing methods, which implement fusion and super-resolution independently, our proposed method achieves these two tasks in parallel, avoiding artifacts caused by the inferior output of image fusion or super-resolution. Experiments conducted on a real-world dataset substantiate the superiority of our proposed method over state-of-the-art methods.

PaperID: 819,

Authors: Cheng Wen, Jianzhi Long, Baosheng Yu, Dacheng Tao

Affiliations: School of Computer Science, The University of Sydney, Sydney, NSW, Australia

Title: PointWavelet: Learning in Spectral Domain for 3-D Point Cloud Analysis

Abstract:
With recent success of deep learning in 2-D visual recognition, deep-learning-based 3-D point cloud analysis has received increasing attention from the community, especially due to the rapid development of autonomous driving technologies. However, most existing methods directly learn point features in the spatial domain, leaving the local structures in the spectral domain poorly investigated. In this article, we introduce a new method, PointWavelet, to explore local graphs in the spectral domain via a learnable graph wavelet transform. Specifically, we first introduce the graph wavelet transform to form multiscale spectral graph convolution to learn effective local structural representations. To avoid the time-consuming spectral decomposition, we then devise a learnable graph wavelet transform, which significantly accelerates the overall training process. Extensive experiments on four popular point cloud datasets, ModelNet40, ScanObjectNN, ShapeNet-Part, and S3DIS, demonstrate the effectiveness of the proposed method on point cloud classification and segmentation.

PaperID: 820,

Authors: Quan Zhou, Huimin Shi, Weikang Xiang, Bin Kang, Longin Jan Latecki

Affiliations: National Engineering Research Center of Communications and Networking, Nanjing University of Posts and Telecommunications, Nanjing, China; Wuxi Esiontech Company Ltd., Wuxi, China; Department of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, China; Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA

Title: DPNet: Dual-Path Network for Real-Time Object Detection With Lightweight Attention

Abstract:
Recent advances in compressing high-accuracy convolutional neural networks (CNNs) have brought remarkable progress in real-time object detection. To accelerate detection speed, lightweight detectors typically have few convolution layers and use a single-path backbone. A single-path architecture, however, involves continuous pooling and downsampling operations, which often result in coarse and inaccurate feature maps that are disadvantageous for locating objects. On the other hand, due to limited network capacity, recent lightweight networks are often weak in representing large-scale visual data. To address these problems, we present a dual-path network, named DPNet, with a lightweight attention scheme for real-time object detection. The dual-path architecture enables us to extract high-level semantic features and low-level object details in parallel. Although DPNet has a nearly duplicated shape with respect to single-path detectors, the computational costs and model size are not significantly increased. To enhance representation capability, a lightweight self-correlation module (LSCM) is designed to capture global interactions, with only a small computational overhead and few network parameters. In the neck, LSCM is extended into a lightweight cross-correlation module (LCCM), capturing mutual dependencies among neighboring scale features. We have conducted exhaustive experiments on the MS COCO, Pascal VOC 2007, and ImageNet datasets. The experimental results demonstrate that DPNet achieves a state-of-the-art tradeoff between detection accuracy and implementation efficiency. More specifically, DPNet achieves 31.3% AP on MS COCO test-dev, 82.7% mAP on the Pascal VOC 2007 test set, and 41.6% mAP on the ImageNet validation set, together with a model size of nearly 2.5M, 1.04 GFLOPs, and 164 and 196 frames/s (FPS) for 320 × 320 input images of the three datasets.

PaperID: 821,

Authors: Binit Singh, Divij Singh, Rohan Kaushal, Agrya Halder, Pratik Chattopadhyay

Affiliations: Department of Computer Science and Engineering, IIT (Banaras Hindu University) Varanasi, Varanasi, India

Title: GSSTU: Generative Spatial Self-Attention Transformer Unit for Enhanced Video Prediction

Abstract:
Future frame prediction is a challenging task in computer vision with practical applications in areas such as video generation, autonomous driving, and robotics. Traditional recurrent neural networks have limited effectiveness in capturing long-range dependencies between frames, and combining convolutional neural networks (CNNs) with recurrent networks has limitations in modeling complex dependencies. Generative adversarial networks have shown promising results, but they are computationally expensive and suffer from instability during training. In this article, we propose a novel approach for future frame prediction that combines the encoding capabilities of 3-D CNNs with the sequence modeling capabilities of Transformers. We also propose a spatial self-attention mechanism and a novel neighborhood pixel intensity loss to preserve structural information and local intensity, respectively. Our approach outperforms existing methods in terms of structural similarity (SSIM), peak signal-to-noise ratio (PSNR), and learned perceptual image patch similarity (LPIPS) scores on five public datasets. More precisely, our model exhibits an average improvement of 4.64%, 18.5%, and 42% in SSIM, PSNR, and LPIPS, respectively, over the second-best method across all datasets. The results demonstrate the effectiveness of our proposed method in generating high-quality predictions of future frames.
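
For readers unfamiliar with the PSNR metric reported above, the standard definition is 10 log10(MAX^2 / MSE). The following is a minimal sketch, assuming frames are given as flat lists of pixel intensities with a peak value of 255; the function name `psnr` and the toy data are illustrative and not taken from the paper.

```python
import math


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized frames.

    Frames are flat lists of pixel intensities; `max_val` is the assumed
    peak intensity (255 for 8-bit images).
    """
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)


# Toy example: a predicted frame that is off by 5 intensity levels everywhere.
target_frame = [100.0, 120.0, 140.0, 160.0]
predicted_frame = [105.0, 125.0, 145.0, 165.0]
print(psnr(predicted_frame, target_frame))
```

Higher PSNR means the prediction is closer to the ground-truth frame, whereas for LPIPS lower values are better.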

PaperID: 822,

Authors: Bo-Jian Zhang, Guang-Hai Liu, Zuo-Yong Li, Shu-Xiang Song

Affiliations: College of Computer Science and Engineering, Guangxi Normal University, Guilin, China; College of Computer and Control Engineering, Minjiang University, Fuzhou, China; College of Electronic Engineering, Guangxi Normal University, Guilin, China

Title: Locating Target Regions for Image Retrieval in an Unsupervised Manner

Abstract:
Image retrieval performance can be improved by training a convolutional neural network (CNN) model with annotated data to facilitate accurate localization of target regions. However, obtaining sufficiently annotated data is expensive and impractical in real settings. It is challenging to achieve accurate localization of target regions in an unsupervised manner. To address this problem, we propose a new unsupervised image retrieval method named unsupervised target region localization (UTRL) descriptors. It can precisely locate target regions without supervisory information or learning. Our method contains three highlights: 1) we propose a novel zero-label transfer learning method to address the problem of co-localization in target regions. This enhances the potential localization ability of pretrained CNN models through a zero-label data-driven approach; 2) we propose a multiscale attention accumulation method to accurately extract distinguishable target features. It distinguishes the importance of features by using local Gaussian weights; and 3) we propose a simple yet effective method to reduce vector dimensionality, named twice-PCA-whitening (TPW), which reduces the performance degradation caused by feature compression. Notably, TPW is a robust and general method that can be widely applied to image retrieval tasks to improve retrieval performance. This work also facilitates the development of image retrieval based on short vector features. Extensive experiments on six popular benchmark datasets demonstrate that our method achieves about 7% greater mean average precision (mAP) compared to existing state-of-the-art unsupervised methods.

PaperID: 823,

Authors: Yuning Qiu, Guoxu Zhou, Andong Wang, Qibin Zhao, Shengli Xie

Affiliations: School of Automation, Guangdong University of Technology, Guangzhou, China; Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo, Japan

Title: Balanced Unfolding Induced Tensor Nuclear Norms for High-Order Tensor Completion

Abstract:
The recently proposed tensor tubal rank has achieved extraordinary success in real-world tensor data completion. However, existing works usually fix the transform orientation along the third mode and may fail to take multidimensional low-tubal-rank structure into account. To alleviate these bottlenecks, we introduce two unfolding induced tensor nuclear norms (TNNs) for the tensor completion (TC) problem, which naturally extend the tensor tubal rank to high-order data. Specifically, we show how multidimensional low-tubal-rank structure can be captured by utilizing a novel balanced unfolding strategy, upon which two TNNs, namely, overlapped TNN (OTNN) and latent TNN (LTNN), are developed. We also show the immediate relationship between the tubal rank of the unfolding tensor and the existing tensor network (TN) ranks, e.g., CANDECOMP/PARAFAC (CP) rank, Tucker rank, and tensor ring (TR) rank, to demonstrate its efficiency and practicality. Two efficient TC models are then proposed with theoretical guarantees by analyzing a unified nonasymptotic upper bound. To solve the optimization problems, we develop two alternating direction method of multipliers (ADMM)-based algorithms. Experiments on synthetic and real-world tensors, including facial images, light field images, and video sequences, demonstrate the superior performance of the proposed models.

PaperID: 824,

Authors: Conghui Wang, Zhiguang Cao, Yaoxin Wu, Long Teng, Guohua Wu

Affiliations: School of Traffic and Transportation Engineering, Central South University, Changsha, China; School of Computing and Information Systems, Singapore Management University, Stamford Road, Singapore; Department of Information Systems, Faculty of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands; Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong

Title: Deep Reinforcement Learning for Solving Vehicle Routing Problems With Backhauls

Abstract:
The vehicle routing problem with backhauls (VRPB) is a challenging problem commonly studied in computer science and operations research. Featuring linehaul (or delivery) and backhaul (or pickup) customers, the VRPB has broad applications in real-world logistics. In this article, we propose a neural heuristic based on deep reinforcement learning (DRL) to solve the traditional and improved VRPB variants, with an encoder–decoder structured policy network trained to sequentially construct the routes for vehicles. Specifically, we first describe the VRPB based on a graph and cast the solution construction as a Markov decision process (MDP). Then, to identify the relationship among the nodes (i.e., linehaul and backhaul customers, and the depot), we design a two-stage attention-based encoder, including a self-attention and a heterogeneous attention for each stage, which yields more informative representations of the nodes so as to deliver high-quality solutions. The evaluation on the two VRPB variants reveals that our neural heuristic performs favorably against both the conventional and neural heuristic baselines on randomly generated instances and benchmark instances. Moreover, the trained policy network exhibits a desirable capability of generalization to various problem sizes and distributions.

PaperID: 825,

Authors: Lucas B. V. de Amorim, George D. C. Cavalcanti, Rafael M. O. Cruz

Affiliations: Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil; École de Technologie Supérieure, Université du Québec, Montreal, QC, Canada

Title: Meta-Scaler: A Meta-Learning Framework for the Selection of Scaling Techniques

Abstract:
Dataset scaling, a.k.a. normalization, is an essential preprocessing step in a machine learning (ML) pipeline. It aims to adjust the scale of attributes in a way that they all vary within the same range. This transformation is known to improve the performance of classification models. Still, there are several scaling techniques (STs) to choose from, and no ST is guaranteed to be the best for a dataset regardless of the classifier chosen. It is thus a problem- and classifier-dependent decision. Furthermore, there can be a huge difference in performance when selecting the wrong technique; hence, it should not be neglected. That said, the trial-and-error process of finding the most suitable technique for a particular dataset can be unfeasible. As an alternative, we propose the Meta-scaler, which uses meta-learning (MtL) to build meta-models to automatically select the best ST for a given dataset and classification algorithm. The meta-models learn to represent the relationship between meta-features extracted from the datasets and the performance of specific classification algorithms on these datasets when scaled with different techniques. Our experiments using 12 base classifiers, 300 datasets, and five STs demonstrate the feasibility and effectiveness of the approach. When using the ST selected by the Meta-scaler for each dataset, 10 of 12 base models tested achieved statistically significantly better classification performance than any fixed choice of a single ST. The Meta-scaler also outperforms state-of-the-art MtL approaches for ST selection. The source code, data, and results from the experiments in this article are available at a GitHub repository (https://github.com/amorimlb/meta_scaler).
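
To make the notion of a scaling technique concrete, the sketch below implements two widely used ones, min-max scaling and z-score standardization, on a single attribute column. This is only an illustration: the abstract does not name the five STs it evaluates, so these two choices, the function names, and the toy data are assumptions.

```python
import statistics


def min_max_scale(column):
    """Rescale a column linearly so its values lie in [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: no spread to rescale
    return [(x - lo) / (hi - lo) for x in column]


def standard_scale(column):
    """Shift and rescale a column to zero mean and unit (population) variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Toy attribute with an outlier: the two techniques treat it differently.
ages = [20.0, 30.0, 40.0, 90.0]
print(min_max_scale(ages))
print(standard_scale(ages))
```

Because different techniques respond differently to outliers and skew, and classifiers differ in their sensitivity to feature scale, the best choice depends on both the dataset and the classifier, which is the selection problem the Meta-scaler targets.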

PaperID: 826,

Authors: Jian Liu, Wei Sun, Chongpei Liu, Hui Yang, Xing Zhang, Ajmal Mian

Affiliations: College of Electrical and Information Engineering, Hunan University, Changsha, China; College of Electrical and Information Engineering and the State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, Hunan University, Changsha, China; Department of Computer Science, The University of Western Australia, Perth, WA, Australia

Title: MH6D: Multi-Hypothesis Consistency Learning for Category-Level 6-D Object Pose Estimation

Abstract:
Six-degree-of-freedom (6DoF) object pose estimation is a crucial task for virtual reality and accurate robotic manipulation. Category-level 6DoF pose estimation has recently become popular as it improves generalization to a complete category of objects. However, current methods focus on data-driven differential learning, which makes them highly dependent on the quality of the real-world labeled data and limits their ability to generalize to unseen objects. To address this problem, we propose multi-hypothesis (MH) consistency learning (MH6D) for category-level 6-D object pose estimation without using real-world training data. MH6D uses a parallel consistency learning structure, alleviating the uncertainty problem of single-shot feature extraction and promoting self-adaptation of domain to reduce the synthetic-to-real domain gap. Specifically, three randomly sampled pose transformations are first performed in parallel on the input point cloud. An attention-guided category-level 6-D pose estimation network with channel attention (CA) and global feature cross-attention (GFCA) modules is then proposed to estimate the three hypothesized 6-D object poses by extracting and fusing the global and local features effectively. Finally, we propose a novel loss function that considers both the process and the final result information allowing MH6D to perform robust consistency learning. We conduct experiments under two different training data settings (i.e., only synthetic data and synthetic and real-world data) to verify the generalization ability of MH6D. Extensive experiments on benchmark datasets demonstrate that MH6D achieves state-of-the-art (SOTA) performance, outperforming most data-driven methods even without using any real-world data. The code is available at https://github.com/CNJianLiu/MH6D.

PaperID: 827,

Authors: Seema Dhull, Walid Al Misba, Arshid Nisar, Jayasimha Atulasimha, Brajesh Kumar Kaushik

Affiliations: Department of Electronics and Communication Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India; Department of Mechanical and Nuclear Engineering, Virginia Commonwealth University, Richmond, VA, USA; Department of Mechanical and Nuclear Engineering and the Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA, USA

Title: Quantized Magnetic Domain Wall Synapse for Efficient Deep Neural Networks

Abstract:
The quantization of synaptic weights using emerging nonvolatile memory (NVM) devices has emerged as a promising solution for implementing computationally efficient neural networks on resource-constrained hardware. However, the practical implementation of such synaptic weights is hampered by imperfect memory characteristics, specifically the limited number of available quantized states and the large intrinsic device variation and stochasticity involved in writing the synaptic states. This article presents on-chip training and inference of a neural network using a quantized magnetic domain wall (DW)-based synaptic array and CMOS peripheral circuits. A rigorous model of the magnetic DW device considering stochasticity and process variations has been utilized for the synapse. To achieve stable quantized weights, DW pinning has been achieved by means of physical constrictions. Finally, a VGG8 architecture for CIFAR-10 image classification has been simulated by using the extracted synaptic device characteristics. The performance in terms of accuracy, energy, latency, and area consumption has been evaluated while considering the process variations and nonidealities in the DW device as well as the peripheral circuits. The proposed quantized neural network (QNN) architecture achieves efficient on-chip learning with 92.4% and 90.4% training and inference accuracy, respectively. In comparison to a pure CMOS-based design, it demonstrates an overall improvement in area, energy, and latency by 13.8×, 9.6×, and 3.5×, respectively.
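
The idea of restricting a weight to a small number of stable device states can be illustrated with plain uniform quantization. This is a minimal sketch under assumptions: evenly spaced levels, a weight range of [-1, 1], and a hypothetical function name `quantize`. It deliberately ignores the device stochasticity and process variation that the article models.

```python
def quantize(weight, n_states, w_min=-1.0, w_max=1.0):
    """Snap a weight to the nearest of `n_states` evenly spaced levels.

    The level spacing and the [w_min, w_max] range are illustrative
    assumptions; a real device would have its own measured state values.
    """
    clipped = min(max(weight, w_min), w_max)  # keep the weight in range
    step = (w_max - w_min) / (n_states - 1)   # distance between levels
    index = round((clipped - w_min) / step)   # index of the nearest level
    return w_min + index * step


# With 5 states the available levels are -1.0, -0.5, 0.0, 0.5, 1.0.
print(quantize(0.3, 5))
print(quantize(2.0, 5))
```

Fewer states mean coarser weights and larger rounding error, which is one reason the limited number of quantized states is a practical obstacle.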

PaperID: 828,

Authors: Jing Zhang, Xiaoqiang Liu, Zhe Wang

Affiliations: Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China

Title: Latent Attention Network With Position Perception for Visual Question Answering

Abstract:
To explore the complex relative position relationships among multiple objects involving multiple position prepositions in the question, we propose a novel latent attention (LA) network for visual question answering (VQA), in which LA with position perception is extracted by a novel LA generation module (LAGM) and encoded along with absolute and relative position relations by our proposed position-aware module (PAM). The LAGM reconstructs the original attention into LA by capturing the tendency of visual attention to shift according to the position prepositions in the question. The LA accurately captures the complex relative position features of multiple objects and helps the model direct its attention to the correct object or region. The PAM adopts the latent state and relative position relations to enhance the capability of comprehending multiobject correlations. In addition, we propose a novel gated counting module (GCM) to strengthen the sensitivity to quantitative knowledge, effectively improving the performance on counting questions. Extensive experiments demonstrate that our proposed method achieves excellent performance on VQA and outperforms state-of-the-art methods on the widely used datasets VQA v2 and VQA v1.

PaperID: 829,

Authors: Lian Xu, Mohammed Bennamoun, Farid Boussaïd, Wanli Ouyang, Ferdous Sohel, Dan Xu

Affiliations: Department of Computer Science and Software Engineering, The University of Western Australia, Perth, WA, Australia; Department of Electrical, Electronic and Computer Engineering, University of Western Australia, Perth, WA, Australia; Shanghai AI Laboratory, Shanghai, China; School of Information Technology, Murdoch University, Perth, WA, Australia; Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, New Territories, Hong Kong

Title: Auxiliary Tasks Enhanced Dual-Affinity Learning for Weakly Supervised Semantic Segmentation

Abstract:
Most existing weakly supervised semantic segmentation (WSSS) methods rely on class activation mapping (CAM) to extract coarse class-specific localization maps using image-level labels. Prior works have commonly used an off-line heuristic thresholding process that combines the CAM maps with off-the-shelf saliency maps produced by a general pretrained saliency model to produce more accurate pseudo-segmentation labels. We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from these saliency maps and the significant intertask correlation between saliency detection and semantic segmentation. In the proposed AuxSegNet+, saliency detection and multilabel image classification are used as auxiliary tasks to improve the primary task of semantic segmentation with only image-level ground-truth labels. We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps. In particular, we propose a cross-task dual-affinity learning module to learn both pairwise and unary affinities, which are used to enhance the task-specific features and predictions by aggregating both query-dependent and query-independent global context for both saliency detection and semantic segmentation. The learned cross-task pairwise affinity can also be used to refine and propagate CAM maps to provide better pseudo labels for both tasks. Iterative improvement of segmentation performance is enabled by cross-task affinity learning and pseudo-label updating. Extensive experiments demonstrate the effectiveness of the proposed approach with new state-of-the-art WSSS results on the challenging PASCAL VOC and MS COCO benchmarks.

PaperID: 830,

Authors: Aichun Zhu, Zijie Wang, Jingyi Xue, Xili Wan, Jing Jin, Tian Wang, Hichem Snoussi

Affiliations: College of Computer and Information Engineering, Nanjing Tech University, Nanjing, China; Institute of Artificial Intelligence, SKLCCSE, Beihang University, Zhongguancun Laboratory, Beijing, China; Institute Charles Delaunay-LMS FRE CNRS, University of Technology of Troyes, Troyes, France

Title: Improving Text-Based Person Retrieval by Excavating All-Round Information Beyond Color

Abstract:
Text-based person retrieval is the process of searching a massive visual resource library for images of a particular pedestrian, based on a textual query. Existing approaches often suffer from a problem of color (CLR) over-reliance, which can result in a suboptimal person retrieval performance by distracting the model from other important visual cues such as texture and structure information. To handle this problem, we propose a novel framework to Excavate All-round Information Beyond Color for the task of text-based person retrieval, which is therefore termed EAIBC. The EAIBC architecture includes four branches, namely an RGB branch, a grayscale (GRS) branch, a high-frequency (HFQ) branch, and a CLR branch. Furthermore, we introduce a mutual learning (ML) mechanism to facilitate communication and learning among the branches, enabling them to take full advantage of all-round information in an effective and balanced manner. We evaluate the proposed method on three benchmark datasets, including CUHK-PEDES, ICFG-PEDES, and RSTPReid. The experimental results demonstrate that EAIBC significantly outperforms existing methods and achieves state-of-the-art (SOTA) performance in supervised, weakly supervised, and cross-domain settings.

PaperID: 831,

Authors: Ben Fei, Tianyue Luo, Weidong Yang, Liwen Liu, Rui Zhang, Ying He

Affiliations: School of Computer Science, Fudan University, Shanghai, China; College of Computing and Data Science, Nanyang Technological University, Nanyang, Singapore

Title: Curriculumformer: Taming Curriculum Pre-Training for Enhanced 3-D Point Cloud Understanding

Abstract:
Learning universal representations of 3-D point clouds is essential for reducing the need for manual annotation of large-scale and irregular point cloud datasets. The current modus operandi for representation learning is self-supervised learning, which has shown great potential for improving point cloud understanding. Nevertheless, it remains an open problem how to employ auto-encoding for learning universal 3-D representations of irregularly structured point clouds, as previous methods focus on either global shapes or local geometries. To this end, we present a cascaded self-supervised point cloud representation learning framework, dubbed Curriculumformer, aiming to tame curriculum pre-training for enhanced point cloud understanding. Our main idea lies in devising a progressive pre-training strategy, which trains the Transformer in an easy-to-hard manner. Specifically, we first pre-train the Transformer using an upsampling strategy, which allows it to learn global information. Then, we follow up with a completion strategy, which enables the Transformer to gain insight into local geometries. Finally, we propose a Multi-Modal Multi-Modality Contrastive Learning (M4CL) strategy to enhance the ability of representation learning by enriching the Transformer with semantic information. In this way, the pre-trained Transformer can be easily transferred to a wide range of downstream applications. We demonstrate the superior performance of Curriculumformer on various discriminant and generative tasks, outperforming state-of-the-art methods. Moreover, Curriculumformer can also be integrated into other off-the-shelf methods to promote their performance. Our code is available at https://github.com/Fayeben/Curriculumformer.

PaperID: 832,

Authors: Jiaxin Gao, Ziyu Yue, Yaohua Liu, Sihan Xie, Xin Fan, Risheng Liu

Affiliations: School of Software Technology, Dalian University of Technology, Dalian, China; School of Mathematical Sciences, Dalian University of Technology, Dalian, China

Title: A Dual-Stream-Modulated Learning Framework for Illuminating and Super-Resolving Ultra-Dark Images

Abstract:
Enhancement of image resolution for scenes captured under extremely dim conditions represents a practical yet challenging problem that has received little attention. In such low-light scenarios, the limited lighting and minimal signal clarity tend to intensify issues such as diminished detail visibility and altered color accuracy, which are often more severe during the image enhancement process than in scenarios with adequate lighting. Consequently, standard methods for enhancing low-light images or improving their resolution, whether implemented independently or through a combined approach, generally face challenges in effectively restoring luminance, preserving color integrity, and detailing intricate features. To conquer these issues, this article introduces an innovative dual-stream (DS) modulated learning framework designed to tackle the real-world coupled degradation issues in super-resolution (SR) under low-light conditions. Leveraging natural image color characteristics, we introduce a self-regularized luminance constraint to specifically target uneven illumination. We develop illumination-semantic dual modulator (ISDM), a refinement middleware embedded in the decoding stage to bridge illumination and semantic features concurrently, aimed at safeguarding the integrity of lighting and color details at the feature level. Our approach replaces simple upsampling methods with the resolution-sensitive merging upsampler (RSMU) module, which integrates diverse sampling techniques to effectively reduce artifacts and halo effects. Comprehensive experiments on three benchmarks showcase the applicability and generalizability of our approach to diverse and challenging ultra-poorly lit settings, outperforming state-of-the-art methods with a notable improvement. The code and benchmark are publicly available at https://github.com/moriyaya/UltraIS.

PaperID: 833,

Authors: Qiyu Zhong, Gengyu Lyu, Zhen Yang

Affiliations: Faculty of Information Technology, Beijing University of Technology, Beijing, China

Title: Align While Fusion: A Generalized Nonaligned Multiview Multilabel Classification Method

Abstract:
In the task of multiview multilabel (MVML) classification, each object is described by several heterogeneous view features and annotated with multiple relevant labels. Existing MVML methods usually assume that these heterogeneous features are strictly view-aligned, and they directly conduct cross-view information fusion to train a multilabel prediction model. However, in real-world scenarios, such a strict view-alignment requirement can hardly be satisfied due to the recurrent spatiotemporal asynchronism when collecting MVML data, which causes inaccurate multiview fusion results and degrades the classification performance. To address this issue, we propose a generalized nonaligned MVML (GNAM) classification method, which achieves multiview information fusion while aligning cross-view features and accordingly learns a desired multilabel classifier. Specifically, we first introduce a multiorder matching alignment strategy to achieve cross-view feature alignments, where both first-order feature correspondence and second-order structure correspondence are jointly integrated to guarantee the compactness of the view-alignment results. Afterward, a commonality- and individuality-based multiview fusion structure is formulated on the aligned-view features to excavate the consistencies and complementarities across different views, which allows all relevant multiview semantic labels, especially rare labels, to be characterized more comprehensively. Finally, we embed adaptive global label correlations into the multilabel classification model to further enhance its semantic expression integrity and develop an alternative algorithm to optimize the whole model. Extensive experimental results have verified that GNAM is significantly superior to other state-of-the-art methods.

PaperID: 834,

Authors: Xuming An, Dui Wang, Li Shen, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao

Affiliations: School of Information and Electronics, Beijing Institute of Technology, Beijing, China; National Engineering Research Center for Multimedia Software, School of Computer Science, Institute of Artificial Intelligence, Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China; School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen Campus, Shenzhen, China; College of Computing and Data Science, Nanyang Technological University, Nanyang, Singapore

Title: Federated Learning With Only Positive Labels by Exploring Label Correlations

Abstract:
Federated learning (FL) aims to collaboratively learn a model by using the data from multiple users under privacy constraints. In this article, we study the multilabel classification (MLC) problem under the FL setting, where a trivial solution and extremely poor performance may be obtained, especially when only positive data with respect to a single class label is provided for each client. This issue can be addressed by adding a specially designed regularizer on the server side. Although this is sometimes effective, the label correlations are simply ignored, and thus suboptimal performance may be obtained. Besides, it is expensive and unsafe to frequently exchange users' private embeddings between the server and clients, especially when training the model in a contrastive way. To remedy these drawbacks, we propose a novel and generic method termed federated averaging (FedAvg) by exploring label correlations (FedALC). Specifically, FedALC estimates the label correlations in the class embedding learning for different label pairs and utilizes them to improve the model training. To further improve the safety and also reduce the communication overhead, we propose a variant that learns a fixed class embedding for each client, so that the server and clients only need to exchange class embeddings once. Extensive experiments on multiple popular datasets demonstrate that FedALC can significantly outperform the existing counterparts.
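
For readers unfamiliar with the FedAvg baseline named in the abstract above, the server-side aggregation step can be sketched as a sample-size-weighted average of client parameters. This is a minimal illustration of standard FedAvg only, not of the FedALC method itself; the function name `fedavg` and the flat-list parameter layout are assumptions made for this example.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of client parameter vectors (standard FedAvg).

    `client_params` and `client_sizes` are hypothetical inputs: one flat
    parameter vector and one local sample count per client.
    """
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_params[0])
    averaged = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        weight = n / total  # clients with more data contribute more
        for j, value in enumerate(params):
            averaged[j] += weight * value
    return averaged


if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 local samples, respectively.
    print(fedavg([[0.0, 4.0], [4.0, 0.0]], [1, 3]))
```

In this toy call, the second client holds three quarters of the data, so the aggregate lies three quarters of the way toward its parameters.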

PaperID: 835,

Authors: Ying Liufu, Long Jin, Shuai Li

Affiliations: School of Automation, Shandong Key Laboratory of Industrial Control Technology, Qingdao University, Qingdao, China; School of Electrical Engineering, Qingdao University, Qingdao, China

Title: Adaptive Individual Q-Learning: A Multiagent Reinforcement Learning Method for Coordination Optimization

Abstract:
Multiagent reinforcement learning (MARL) has been extensively applied to coordination optimization for its task distribution and scalability. The goal of MARL algorithms for coordination optimization is to learn the optimal joint strategy that maximizes the expected cumulative reward of all agents. Some cooperative MARL algorithms exhibit exciting characteristics in empirical studies. However, the majority of the convergence results are confined to repeated games. Moreover, few MARL algorithms consider adaptation to switched environments, such as the alternation between peak hours and off-peak hours of urban traffic flow or an obstacle suddenly appearing on the planned route of an automated guided vehicle. To this end, we propose a cooperative MARL algorithm known as adaptive individual Q-learning (A-IQL). Each agent updates the Q-function of its own action with period T to adapt to the switched environments. Convergence analysis shows that the optimal joint strategy can be obtained in stochastic games with deterministic state transitions occurring in chronological order. The influence of period T on convergence is studied through a fictitious stochastic game. The efficacy of the A-IQL algorithm is validated in two switched environments: the distributed sensor network (DSN) task and the target transportation task.

PaperID: 836,

Authors: Zhanchao Huang, Wei Li, Xiang-Gen Xia, Hao Wang, Ran Tao

Affiliations: Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, The Academy of Digital China, and the National and Local Joint Engineering Research Center of Satellite Geospatial Information Technology, Fuzhou University, Fuzhou, China; School of Information and Electronics and Beijing Key Laboratory of Fractional Signals and Systems, Beijing Institute of Technology, Beijing, China; Department of Electrical and Computer Engineering, University of Delaware, Newark, DE, USA

Title: Task-Wise Sampling Convolutions for Arbitrary-Oriented Object Detection in Aerial Images

Abstract:
Arbitrary-oriented object detection (AOOD) has been widely applied to locate and classify objects with diverse orientations in remote sensing images. However, the inconsistent features for the localization and classification tasks in AOOD models may lead to ambiguity and low-quality object predictions, which constrains the detection performance. In this article, an AOOD method called task-wise sampling convolutions (TS-Conv) is proposed. TS-Conv adaptively samples task-wise features from respective sensitive regions and maps these features together in alignment to guide a dynamic label assignment for better predictions. Specifically, sampling positions of the localization convolution in TS-Conv are supervised by the oriented bounding box (OBB) prediction associated with spatial coordinates, while sampling positions and convolutional kernel of the classification convolution are designed to be adaptively adjusted according to different orientations for improving the orientation robustness of features. Furthermore, a dynamic task-consistent-aware label assignment (DTLA) strategy is developed to select optimal candidate positions and assign labels dynamically according to ranked task-aware scores obtained from TS-Conv. Extensive experiments on several public datasets covering multiple scenes, multimodal images, and multiple categories of objects demonstrate the effectiveness, scalability, and superior performance of the proposed TS-Conv.

PaperID: 837,

Authors: Zhihan Ning, Zhixing Jiang, David Zhang

Affiliations: School of Science and Engineering, The Chinese University of Hong Kong (CUHK-Shenzhen), Shenzhen, China; School of Data Science, The Chinese University of Hong Kong (CUHK-Shenzhen), Shenzhen, China

Title: To Combat Multiclass Imbalanced Problems by Aggregating Evolutionary Hierarchical Classifiers

Abstract:
Real-world datasets are often imbalanced, posing frequent challenges to canonical machine learning algorithms that assume a balanced class distribution. Moreover, the imbalance problem becomes more complicated when the dataset is multiclass. Although many approaches have been presented for imbalanced learning (IL), research on the multiclass imbalanced problem remains relatively limited. To alleviate these issues, we propose a forest of evolutionary hierarchical classifiers (FEHC) method for multiclass IL (MCIL). FEHC can be seen as a classifier fusion framework with a forest structure, and it aggregates several evolutionary hierarchical multiclassifiers (EHMCs) to reduce generalization error. Specifically, a multichromosome genetic algorithm (MCGA) is designed to simultaneously select (sub)optimal features, classifiers, and hierarchical structures when generating these EHMCs. The MCGA adopts a dynamic weighting module to learn difficult classes and promote the diversity of FEHC. We also present the “stratified underbagging” (SUB) strategy to address class imbalance and the “soft tree traversal” (STT) strategy to make FEHC converge faster and better. We thoroughly evaluate the proposed algorithm using 14 multiclass imbalanced datasets with various properties. Compared with popular and state-of-the-art approaches, FEHC obtains better performance under different evaluation metrics. The code has been made publicly available on GitHub.

PaperID: 838,

Authors: Zhuo Li, Jian Sun, Antonio G. Marques, Gang Wang, Keyou You

Affiliations: National Key Laboratory of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing, China; Department of Signal Theory and Communications, King Juan Carlos University, Madrid, Spain; Department of Automation and BNRist, Tsinghua University, Beijing, China

Title: Pontryagin's Minimum Principle-Guided RL for Minimum-Time Exploration of Spatiotemporal Fields

Abstract:
This article studies the trajectory planning problem of an autonomous vehicle for exploring a spatiotemporal field subject to a constraint on cumulative information. Since the resulting problem depends on the signal strength distribution of the field, which is unknown in practice, we advocate the use of a model-free reinforcement learning (RL) method to find the solution. Given the vehicle’s dynamical model, a critical (and open) question is how to judiciously merge the model-based optimality conditions into the model-free RL framework for improved efficiency and generalization, for which this work provides some positive results. Specifically, we discretize the continuous action space by leveraging analytic optimality conditions for the minimum-time optimization problem via Pontryagin’s minimum principle (PMP). This allows us to develop a novel discrete PMP-based RL trajectory planning algorithm, which learns a planning policy faster than those based on a continuous action space. Simulation results: 1) validate the effectiveness of the PMP-based RL algorithm and 2) demonstrate its advantages, in terms of both learning efficiency and the vehicle’s exploration time, over two baseline methods for continuous control inputs.

PaperID: 839,

Authors: Xinlei Xu, Zhe Wang, Shuangyan Ren, Saisai Niu, Dongdong Li

Affiliations: Ministry of Education and the Department of Computer Science and Engineering, Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, China; Shanghai Aerospace Control Technology Institute, Shanghai, China; Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China

Title: Local-Global Geometric Information and View Complementarity Introduced Multiview Metric Learning

Abstract:
Geometry studies the spatial structure and location information of objects, providing a priori knowledge and intuitive explanation for classification methods. Considering samples from a geometric perspective offers a novel approach to understanding their information. In this article, we propose a method called local–global geometric information and view complementarity introduced multiview metric learning (GIVCMML). Our method effectively exploits the geometric information of multiview samples. The learned metric space retains the geometric relations of samples and makes them more separable. First, we propose the global geometrical constraint in the maximum margin criterion framework. By maximizing the distance between class centers in the metric space, we ensure that samples from different classes are well separated. Second, to maintain the manifold structure of the original space, we build an adjacency matrix that contains the sample label information. This helps explore the local geometric information of sample pairs. Finally, to better mine the complementary information of multiview samples, GIVCMML maximizes the correlation between each view in the metric space. This enables each view to adaptively learn from the others and explore the complementary information between views. We extensively evaluate the effectiveness of our method on real-world datasets. The experimental results demonstrate that GIVCMML achieves competitive performance compared with multiview metric learning (MvML) methods.

PaperID: 840,

Authors: Chuan Tang, Minhui Wang, Kun Sun

Affiliations: School of Computer Science, China University of Geosciences, Wuhan, China; Department of Pharmacy, Lianshui People’s Hospital Affiliated to Kangda College, Nanjing Medical University, Huai’an, China

Title: One-Step Multiview Clustering via Adaptive Graph Learning and Spectral Rotation

Abstract:
In graph-based multiview clustering methods, the final partition is usually obtained by applying a traditional clustering method, such as k-means, to the spectral embedding of the consistent graph. However, this multistep procedure degrades performance because it cannot closely unify graph learning with partition generation. In this article, we propose a one-step multiview clustering method through adaptive graph learning and spectral rotation (AGLSR). For every view, AGLSR adaptively learns affinity graphs to capture the similarity relationships of samples. Then, a spectral embedding is designed to take advantage of the potential feature space shared by different views. In addition, AGLSR utilizes a spectral rotation strategy to obtain the discrete clustering labels directly from the learned spectral embeddings. An effective updating algorithm with proven convergence is derived to solve the optimization problem. Extensive experiments on benchmark datasets clearly demonstrate the effectiveness of the proposed method in terms of six metrics. The code of AGLSR is available at https://github.com/tangchuan2000/AGLSR.

PaperID: 841,

Authors: Lvlong Lai, Jian Chen, Zehong Zhang, Guosheng Lin, Qingyao Wu

Affiliations: School of Software Engineering, South China University of Technology, Guangzhou, China; School of Computer Science and Engineering, Nanyang Technological University, Jurong West, Singapore

Title: CMFAN: Cross-Modal Feature Alignment Network for Few-Shot Single-View 3D Reconstruction

Abstract:
Few-shot single-view 3D reconstruction learns to reconstruct objects of novel categories based on a query image and a few support shapes. However, since the query image and the support shapes are of different modalities, there is an inherent feature misalignment problem that damages the reconstruction. Previous works in the literature do not consider this problem. To this end, we propose the cross-modal feature alignment network (CMFAN) with two novel techniques. One is a strategy for model pretraining, namely, cross-modal contrastive learning (CMCL), where the 2D images and 3D shapes of the same objects compose the positives, and those from different objects form the negatives. With CMCL, the model learns to embed the 2D and 3D modalities of the same object into a tight area in the feature space and push away those from different objects, thus effectively aligning the global cross-modal features. The other is cross-modal feature fusion (CMFF), which further aligns and fuses the local features. Specifically, it first re-represents the local features with the cross-attention operation, making the local features share more information. Then, CMFF generates a descriptor for the support features and attaches it to each local feature vector of the query image with dense concatenation. Moreover, CMFF can be applied to multilevel local features and brings further advantages. We conduct extensive experiments to evaluate the effectiveness of our designs, and CMFAN sets new state-of-the-art performance in all of the 1-/10-/25-shot tasks on the ShapeNet and ModelNet datasets.

PaperID: 842,

Authors: Tianwei Yan, Shan Zhao, Minghao Hu, Mengzhu Wang, Xiang Zhang, Zhigang Luo, Meng Wang

Affiliations: College of Computer, National University of Defense Technology, Changsha, China; College of Computer, Hefei University of Technology, Hefei, China; College of Computer Science, Academy of Military Sciences, Beijing, China

Title: HCL: A Hierarchical Contrastive Learning Framework for Zero-Shot Relation Extraction

Abstract:
Zero-shot relation extraction (ZSRE), which aims at predicting relation classes that lack annotations or have never appeared during training, is becoming increasingly important in current information extraction systems. Previous works focus on projecting sentences with their corresponding relation descriptions to an intermediate semantic space and searching for the nearest semantic representation to predict unseen classes. Though these methods can achieve sound performance, they only obtain inferior semantic information via a trivial distance metric and neglect the interaction among the instance representations. We are thus motivated to tackle these issues and propose a hierarchical contrastive learning (HCL) framework for ZSRE that includes projection-level and instance-level modules. Specifically, the projection-level component replaces the distance score function with a contrastive loss to connect the input sentence with the relation semantic space. The instance-level component integrates external knowledge from sentence entities to establish new contrastive pairs for efficiently learning representations from mutual information. The experimental results on three well-known datasets demonstrate that our model surpasses the existing state of the art (SOTA) by up to 18.97% on the F1 score when there are 15 unseen classes. Moreover, our model achieves more competitive performance as the number of unseen classes increases.
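
The abstract above does not give the exact form of the contrastive loss. As a minimal sketch of the general idea of scoring a sentence against relation descriptions with a contrastive objective, the following uses the common InfoNCE form with cosine similarity. The choice of InfoNCE, the function names `cosine` and `info_nce`, and the temperature value are all assumptions for illustration, not the authors' formulation.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equally long, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def info_nce(anchor: Sequence[float],
             candidates: Sequence[Sequence[float]],
             positive_index: int,
             temperature: float = 0.1) -> float:
    """InfoNCE loss: negative log-softmax of the positive candidate's score.

    `anchor` stands in for a sentence embedding and `candidates` for relation
    description embeddings; both are hypothetical inputs.
    """
    logits = [cosine(anchor, c) / temperature for c in candidates]
    m = max(logits)  # subtract the maximum for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[positive_index]


if __name__ == "__main__":
    sentence = [1.0, 0.0]
    relations = [[1.0, 0.0], [0.0, 1.0]]
    # The loss is small when the matching relation is marked as the positive.
    print(info_nce(sentence, relations, positive_index=0))
    print(info_nce(sentence, relations, positive_index=1))
```

Unlike a plain nearest-neighbor distance score, this loss compares the positive pair against all other candidates at once, which is what pulls matching pairs together and pushes mismatched pairs apart during training.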

PaperID: 843,

Authors: Hong Zhao, Yuling Su, Zhiping Wu, Weiping Ding

Affiliations: School of Computer Science, Minnan Normal University, Zhangzhou, Fujian, China; School of Information Science and Technology, Nantong University, Nantong, China

Title: CSTS: Exploring Class-Specific and Task-Shared Embedding Representation for Few-Shot Learning

Abstract:
Few-shot learning (FSL) is a challenging yet promising technique that aims to discriminate objects based on a few labeled examples. Learning a high-quality feature representation is key with few-shot data, and many existing models attempt to extract general information from the sample or task levels. However, the common sample-level means of feature representation limits the model's generalizability to different tasks, while task-level representation may lose class characteristics due to excessive information aggregation. In this article, we synchronize the class-specific and task-shared information from the class and task levels to obtain a better representation. Structure-based contrastive learning is introduced to obtain class-specific representations by increasing the interclass distance. A hierarchical class structure is constructed by clustering semantically similar classes using the idea of granular computing. When guided by a class structure, it is more difficult to distinguish samples in different classes that have similar characteristics than those with large interclass differences. To this end, structure-guided contrastive learning is introduced to study class-specific information. A hierarchical graph neural network is established to transfer task-shared information from coarse to fine. It hierarchically infers the target sample based on all samples in the task and yields a more general representation for FSL classification. Experiments on four benchmark datasets demonstrate the advantages of our model over several state-of-the-art models.

PaperID: 844,

Authors: Zhizheng Wang, Yuanyuan Sun, Zhihao Yang, Liang Yang, Hongfei Lin

Affiliations: School of Computer Science and Technology, Dalian University of Technology, Dalian, China; National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA

Title: Temporal Network Embedding Enhanced With Long-Range Dynamics and Self-Supervised Learning

Abstract:
Temporal network embedding (TNE) has promoted the research of knowledge discovery and reasoning on networks. It aims to embed vertices of temporal networks into a low-dimensional vector space while preserving network structures and temporal properties. However, most existing methods have limitations in capturing dynamics over long distances, which makes it difficult to explore multihop topological associations among vertices. To tackle this challenge, we propose LongTNE, which learns the long-range dynamics of vertices to endow TNE with the ability to capture high-order proximity (HP) of networks. In LongTNE, we employ graph self-supervised learning (Graph SSL) to optimize the establishment probability of deep links in each network snapshot. We also present an accumulated forward update (AFU) module to fathom global temporal evolution among multiple network snapshots. The empirical results on six temporal networks demonstrate that, in addition to achieving state-of-the-art performance on network mining tasks, LongTNE can be handily extended to existing TNE methods.

PaperID: 845,

Authors: Seung Park, Yong-Goo Shin

Affiliations: Department of Biomedical Engineering, Chungbuk National University Hospital, Chungbuk National University College of Medicine, Cheongju-si, Chungcheongbuk-do, Republic of Korea; Department of Electronics and Information Engineering, Korea University, Sejong-si, Republic of Korea

Title: A Novel Generator With Auxiliary Branch for Improving GAN Performance

Abstract:
The generator in the generative adversarial network (GAN) learns image generation in a coarse-to-fine manner in which earlier layers learn the overall structure of the image and later ones refine the details. To propagate the coarse information well, recent works usually build their generators by stacking multiple residual blocks. Although the residual block can produce a high-quality image and be trained stably, it often impedes the information flow in the network. To alleviate this problem, this brief introduces a novel generator architecture that produces the image by combining features obtained through two different branches: the main and auxiliary branches. The goal of the main branch is to produce the image by passing through the multiple residual blocks, whereas the auxiliary branch conveys the coarse information in the earlier layers to the later ones. To combine the features in the main and auxiliary branches successfully, we also propose a gated feature fusion module (GFFM) that controls the information flow in those branches. To demonstrate the superiority of the proposed method, this brief provides extensive experiments using various standard datasets including CIFAR-10, CIFAR-100, LSUN, CelebA-HQ, AFHQ, and tiny-ImageNet. Furthermore, we conducted various ablation studies to demonstrate the generalization ability of the proposed method. Quantitative evaluations show that the proposed method exhibits impressive GAN performance in terms of Inception score (IS) and Fréchet inception distance (FID). For instance, on the tiny-ImageNet dataset, the proposed method improves the FID from 35.13 to 25.00 and the IS from 20.23 to 25.57.

PaperID: 846,

Authors: Yueyang Men, Liang Li, Ziqing Hu, Yongli Xu

Affiliations: College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, China

Title: Learning Rates of Deep Nets for Geometrically Strongly Mixing Sequence

Abstract:
The great success of deep learning poses an urgent challenge to establish the theoretical basis for its working mechanism. Recently, research on the convergence of deep neural networks (DNNs) has made great progress. However, the existing studies are based on the assumption that the samples are independent, which is too strong to be applied to many real-world scenarios. In this brief, we establish a fast learning rate for empirical risk minimization (ERM) on DNN regression with dependent samples, where the dependence is expressed in terms of a geometrically strongly mixing sequence. To the best of our knowledge, this is the first convergence result for DNN methods based on mixing sequences. This result is a natural generalization of the independent sample case.

PaperID: 847,

Authors: Jieting Wang, Feijiang Li, Jue Li, Chenping Hou, Yuhua Qian, Jiye Liang

Affiliations: Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China; College of Science, National University of Defense Technology, Changsha, China; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan, China

Title: RSS-Bagging: Improving Generalization Through the Fisher Information of Training Data

Abstract:
The bagging method has received much application and attention in recent years due to its good performance and simple framework. It has facilitated the advanced random forest method and accuracy-diversity ensemble theory. Bagging is an ensemble method based on simple random sampling (SRS) with replacement. However, SRS is only the most basic sampling method in statistics, and more advanced sampling methods exist for probability density estimation. In imbalanced ensemble learning, down-sampling, over-sampling, and SMOTE methods have been proposed for generating base training sets. However, these methods aim at changing the underlying distribution of the data rather than simulating it better. The ranked set sampling (RSS) method uses auxiliary information to obtain more effective samples. In this article, we propose a bagging ensemble method based on RSS, which uses the ordering of objects related to the class to obtain more effective training sets. To explain its performance, we give a generalization bound of the ensemble from the perspective of posterior probability estimation and Fisher information. Because an RSS sample has higher Fisher information than an SRS sample, the presented bound theoretically explains the better performance of RSS-Bagging. Experiments on 12 benchmark datasets demonstrate that RSS-Bagging performs statistically better than SRS-Bagging when the base classifiers are multinomial logistic regression (MLR) and support vector machine (SVM).
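
To make the sampling step concrete, the following is a minimal sketch of textbook ranked set sampling: in each cycle, draw `set_size` small random sets, rank each set by an auxiliary key, and keep the unit of rank i from the i-th set. The function name `ranked_set_sample` and its parameters are assumptions for illustration. The class-related ordering that RSS-Bagging actually uses is not specified in the abstract above, so the ranking key here is a placeholder.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ranked_set_sample(population: Sequence[T],
                      set_size: int,
                      cycles: int,
                      rank_key: Callable[[T], float],
                      rng: random.Random) -> List[T]:
    """Draw a ranked set sample of size set_size * cycles.

    `rank_key` is a hypothetical auxiliary score used only for ranking
    units within each small set.
    """
    sample: List[T] = []
    for _ in range(cycles):
        for rank in range(set_size):
            # Draw one small simple random set (without replacement).
            candidates = rng.sample(population, set_size)
            # Rank the set by the auxiliary key and keep only one unit:
            # the smallest from the first set, the second smallest from
            # the second set, and so on.
            candidates.sort(key=rank_key)
            sample.append(candidates[rank])
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    data = list(range(100))
    # 4 cycles of set size 3 yield 12 units spread across the ranks.
    print(ranked_set_sample(data, set_size=3, cycles=4, rank_key=float, rng=rng))
```

A bagging variant along these lines would build each base training set from such a sample instead of from a bootstrap sample; how the ranking is derived from class information is left open here.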

PaperID: 848,

Authors: Chenghao Li, Tonghan Wang, Chengjie Wu, Qianchuan Zhao, Jun Yang, Chongjie Zhang

Affiliations: Tsinghua University, Beijing, China; Harvard University, Boston, MA, USA; Washington University in St. Louis, St. Louis, MO, USA

Title: Celebrating Diversity With Subtask Specialization in Shared Multiagent Reinforcement Learning

Abstract:
Subtask decomposition offers a promising approach for achieving and comprehending complex cooperative behaviors in multiagent systems. Nonetheless, existing methods often depend on intricate high-level strategies, which can hinder interpretability and learning efficiency. To tackle these challenges, we propose a novel approach that specializes subtasks for subgroups by employing diverse observation representation encoders within information bottlenecks. Moreover, to enhance the efficiency of subtask specialization while promoting sophisticated cooperation, we introduce diversity in both optimization and neural network architectures. These advancements enable our method to achieve state-of-the-art performance and offer interpretable subtask factorization across various scenarios in Google Research Football (GRF).

PaperID: 849,

Authors: Keisuke Fujii, Koh Takeuchi, Atsushi Kuribayashi, Naoya Takeishi, Yoshinobu Kawahara, Kazuya Takeda

Affiliations: Graduate School of Informatics, Nagoya University, Nagoya, Japan; Graduate School of Informatics, Kyoto University, Kyoto, Japan; Graduate School of Engineering, University of Tokyo, Sierre, Switzerland; Graduate School of Information Science and Technology, Osaka University, Osaka, Japan

Title: Estimating Counterfactual Treatment Outcomes Over Time in Complex Multiagent Scenarios

Abstract:
Evaluation of intervention in a multiagent system, for example, when humans should intervene in autonomous driving systems and when a player should pass to teammates for a good shot, is challenging in various engineering and scientific fields. Estimating the individual treatment effect (ITE) using counterfactual long-term prediction is practical to evaluate such interventions. However, most of the conventional frameworks did not consider the time-varying complex structure of multiagent relationships and covariate counterfactual prediction. This may lead to erroneous assessments of ITE and difficulty in interpretation. Here, we propose an interpretable, counterfactual recurrent network in multiagent systems to estimate the effect of the intervention. Our model leverages graph variational recurrent neural networks (GVRNNs) and theory-based computation with domain knowledge for the ITE estimation framework based on long-term prediction of multiagent covariates and outcomes, which can confirm the circumstances under which the intervention is effective. On simulated models of an automated vehicle and biological agents with time-varying confounders, we show that our methods achieved lower estimation errors in counterfactual covariates and the most effective treatment timing than the baselines. Furthermore, using real basketball data, our methods performed realistic counterfactual predictions and evaluated the counterfactual passes in shot scenarios.

PaperID: 850,

Authors: Yingkui Zhang, Mingqiang Wei, Lei Zhu, Guibao Shen, Fu Lee Wang, Jing Qin, Qiong Wang

Affiliations: Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shenzhen Institute of Research, Nanjing University of Aeronautics and Astronautics, Shenzhen, China; ROAS Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China; AI Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China; School of Science and Technology, Hong Kong Metropolitan University, Hong Kong, SAR, China; Centre for Smart Health, The Hong Kong Polytechnic University, Hong Kong, SAR, China

Title: Norest-Net: Normal Estimation Neural Network for 3-D Noisy Point Clouds

Abstract:
The widely deployed ways to capture a set of unorganized points, e.g., merged laser scans, fusion of depth images, and structure-from-x, usually yield a noisy 3-D point cloud. Accurate normal estimation for the noisy point cloud makes a crucial contribution to the success of various applications. However, existing normal estimation methods strive to meet the conflicting goals of simultaneously performing normal filtering and preserving surface features, which inevitably leads to inaccurate estimation results. We propose a normal estimation neural network (Norest-Net), which regards normal filtering and feature preservation as two separate tasks, so that each one is specialized rather than traded off. For full noise removal, we present a normal filtering network (NF-Net) branch by learning from the noisy height map descriptor (HMD) of each point to the ground-truth (GT) point normal; for surface feature recovery, we construct a normal refinement network (NR-Net) branch by learning from the bilaterally defiltered point normal descriptor (B-DPND) to the GT point normal. Moreover, NR-Net is detachable and can be incorporated into existing normal estimation methods to boost their performance. Norest-Net shows clear improvements over the state of the art in both feature preservation and noise robustness on synthetic and real-world captured point clouds.

PaperID: 851,

Authors: Lican Kang, Yuhui Liu, Yuan Luo, Jerry Zhijian Yang, Han Yuan, Chang Zhu

Affiliations: Cardiovascular and Metabolic Disorders Program, Duke-NUS Medical School, College Road, Singapore; School of Mathematics and Statistics, Wuhan University, Wuhan, China; School of Mathematics and Statistics and the Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan, China; Centre for Quantitative Medicine, Duke-NUS Medical School, College Road, Singapore; Department of Anesthesiology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

Title: Approximate Policy Iteration With Deep Minimax Average Bellman Error Minimization

Abstract:
In this work, we investigate the utilization of deep approximate policy iteration (DAPI) in estimating the optimal action-value function Q^\ast within the context of reinforcement learning, employing rectified linear unit (ReLU) ResNet as the underlying framework. The iterative process of DAPI incorporates the minimax average Bellman error minimization principle. It employs ReLU ResNet to estimate the fixed point of the Bellman equation, which is aligned with the estimated greedy policy. Through error propagation, we derive nonasymptotic error bounds between Q^\ast and the estimated Q function induced by the output greedy policy in DAPI. To effectively control the Bellman residual error, we address both the statistical and approximation errors associated with the \alpha-mixing dependent data derived from Markov decision processes, using the techniques of empirical process and deep approximation theory, respectively. Furthermore, we present a novel generalization bound for ReLU ResNet in the presence of dependent data, as well as an approximation bound for ReLU ResNet within the Hölder class. Notably, this approximation bound contributes to a significant improvement in the dependence on the ambient dimension, transitioning from an exponential relationship to a polynomial one. The derived nonasymptotic error bounds explicitly depend on factors such as the sample size, the ambient dimension (in polynomial terms), and the width and depth of the neural networks. Consequently, these bounds serve as valuable theoretical guidelines for appropriately setting the hyperparameters, thereby enabling the achievement of the desired convergence rate during the training process of DAPI.
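
As a point of reference for the quantities named in this abstract (greedy policy, fixed point of the Bellman equation, Bellman residual), the following is a minimal tabular sketch of exact policy iteration. It is not the DAPI estimator: there is no ReLU ResNet, no minimax objective, and no sampled data. The transition tensor `p`, reward table `r`, discount `gamma`, and all function names are illustrative assumptions.

```python
import numpy as np


def greedy_policy(q):
    """Greedy action per state for a Q-table of shape (S, A)."""
    return np.argmax(q, axis=1)


def evaluate_policy(p, r, policy, gamma):
    """Solve the Bellman equation exactly for a fixed policy.

    p: transition probabilities, shape (S, A, S); r: rewards, shape (S, A).
    Returns Q^pi with shape (S, A).
    """
    n_states = r.shape[0]
    idx = np.arange(n_states)
    p_pi = p[idx, policy]  # (S, S) transitions under the policy
    r_pi = r[idx, policy]  # (S,) rewards under the policy
    v = np.linalg.solve(np.eye(n_states) - gamma * p_pi, r_pi)
    return r + gamma * p @ v


def bellman_residual(p, r, q, gamma):
    """Max-norm of Q - TQ, with T the Bellman optimality operator."""
    target = r + gamma * p @ q.max(axis=1)
    return float(np.max(np.abs(q - target)))


def policy_iteration(p, r, gamma=0.9, max_iters=50):
    """Alternate exact evaluation and greedy improvement until stable."""
    q = np.zeros_like(r, dtype=float)
    policy = greedy_policy(q)
    for _ in range(max_iters):
        q = evaluate_policy(p, r, policy, gamma)
        new_policy = greedy_policy(q)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return q, policy


# Hypothetical random MDP with 4 states and 2 actions.
rng = np.random.default_rng(0)
p = rng.random((4, 2, 4))
p /= p.sum(axis=2, keepdims=True)
r = rng.random((4, 2))
q_star, pi_star = policy_iteration(p, r)
print(bellman_residual(p, r, q_star, 0.9))
```

In this exact setting the residual at the returned Q-table is zero up to rounding; the abstract's error bounds concern how far an approximate, data-driven version of this loop can drift from that ideal.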

PaperID: 852,

Authors: Hui Zhang, Guiyang Luo, Xiao Wang, Yidong Li, Weiping Ding, Fei-Yue Wang

Affiliations: School of Computer and Information Technology and the Key Laboratory of Big Data and Artificial Intelligence in Transportation, Ministry of Education, Beijing Jiaotong University, Beijing, China; State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China; Engineering Research Center of Autonomous Unmanned System Technology, Ministry of Education, Anhui University, Hefei, China; School of Information Science and Technology, Nantong University, Nantong, China; State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: SASAN: Shape-Adaptive Set Abstraction Network for Point-Voxel 3D Object Detection

Abstract:
Point-voxel 3D object detectors have achieved impressive performance in complex traffic scenes. However, they utilize 3D sparse convolution (spconv) layers with fixed receptive fields, as voxel-based detectors do, and inherit the fixed sphere radius from point-based methods for generating the features of keypoints, which makes them weak in adaptively modeling the various geometrical deformations and sizes of real objects. To tackle this issue, we propose a shape-adaptive set abstraction network (SASAN) for point-voxel 3D object detection. First, the proposal and offset generation module is adopted to learn the coordinates and confidences of 3D proposals and the shape-adaptive offsets of a certain number of offset points for each voxel. Meanwhile, an extra offset supervision task is employed to guide the learning of the shifting values of offset points, encouraging the predicted offsets to better adapt to the various shapes of objects. Then, the shape-adaptive set abstraction module is proposed to extract multiscale keypoint features by grouping the neighboring offset points’ features, as well as features learned from adjacent raw points and the 2-D bird-view map. Finally, the region of interest (RoI)-grid proposal refinement module is used to aggregate the keypoint features for further proposal refinement and confidence prediction. Extensive experiments on the competitive KITTI 3D detection benchmark demonstrate that the proposed SASAN achieves superior performance compared with state-of-the-art methods.

PaperID: 853,

Authors: Yuqing Zhao, Divya Saxena, Jiannong Cao

Affiliations: Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China

Title: AdaptCL: Adaptive Continual Learning for Tackling Heterogeneity in Sequential Datasets

Abstract:
Managing heterogeneous datasets that vary in complexity, size, and similarity in continual learning presents a significant challenge. Task-agnostic continual learning is necessary to address this challenge, as datasets with varying similarity pose difficulties in distinguishing task boundaries. Conventional task-agnostic continual learning practices typically rely on rehearsal or regularization techniques. However, rehearsal methods may struggle with varying dataset sizes and regulating the importance of old and new data due to rigid buffer sizes. Meanwhile, regularization methods apply generic constraints to promote generalization but can hinder performance when dealing with dissimilar datasets lacking shared features, necessitating a more adaptive approach. In this article, we propose a novel adaptive continual learning (AdaptCL) method to tackle heterogeneity in sequential datasets. AdaptCL employs fine-grained data-driven pruning to adapt to variations in data complexity and dataset size. It also utilizes task-agnostic parameter isolation to mitigate the impact of varying degrees of catastrophic forgetting caused by differences in data similarity. Through a two-pronged case study approach, we evaluate AdaptCL on both datasets of MNIST variants and DomainNet, as well as datasets from different domains. The latter include both large-scale, diverse binary-class datasets and few-shot, multiclass datasets. Across all these scenarios, AdaptCL consistently exhibits robust performance, demonstrating its flexibility and general applicability in handling heterogeneous datasets.

PaperID: 854,

Authors: Jiawei Wang, Yuquan Le, Da Cao, Shaofei Lu, Zhe Quan, Meng Wang

Affiliations: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui, China

Title: Graph Reasoning With Supervised Contrastive Learning for Legal Judgment Prediction

Abstract:
Given the fact descriptions of legal cases, the legal judgment prediction (LJP) problem aims to determine three judgment tasks of law articles, charges, and the term of penalty. Most existing studies have considered task dependencies while neglecting the prior dependencies of labels among different tasks. Therefore, how to make better use of the information on the relation dependencies among tasks and labels becomes a crucial issue. To this end, we transform the text classification problem into a node classification framework based on graph reasoning and supervised contrastive learning (SCL) techniques, named GraSCL. Specifically, we first design a graph reasoning network to model the potential dependency structures and facilitate relational learning under various graph topologies. Then, we introduce the SCL method for the LJP task to further leverage the label relation on the graph. To accommodate the node classification settings, we extend the traditional SCL method to novel variants for SCL at the node level, which allows the GraSCL framework to be trained efficiently even with small batches. Furthermore, to recognize the importance of hard negative samples in contrastive learning, we introduce a simple yet effective technique called online hard negative mining (OHNM) to enhance our SCL approach. This technique complements our SCL method and enables us to control the number and complexity of negative samples, leading to further improvements in the model’s performance. Finally, extensive experiments are conducted on two well-known benchmarks, demonstrating the effectiveness and rationality of our proposed SCL approach as compared to the state-of-the-art competitors.
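
To make the combination of supervised contrastive learning and hard negative selection concrete, here is a minimal NumPy sketch of a supervised contrastive loss in which only the `top_k_neg` most similar negatives of each anchor enter the denominator. This is an assumed, simplified variant for illustration, not the GraSCL node-level loss or the paper's OHNM procedure; `supcon_loss`, `tau`, and `top_k_neg` are hypothetical names.

```python
import numpy as np


def supcon_loss(z, labels, tau=0.1, top_k_neg=None):
    """Supervised contrastive loss over embeddings z of shape (N, D).

    Positives of an anchor are the other samples with the same label.
    If top_k_neg (an integer >= 1) is given, only the top_k_neg most
    similar negatives are kept, a simple stand-in for hard negative mining.
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)
    neg_mask = ~same

    losses = []
    for i in range(len(labels)):
        if not pos_mask[i].any():
            continue  # anchors without positives contribute nothing
        pos_sims = sim[i, pos_mask[i]]
        neg_sims = sim[i, neg_mask[i]]
        if top_k_neg is not None and neg_sims.size > top_k_neg:
            neg_sims = np.sort(neg_sims)[-top_k_neg:]  # hardest negatives
        terms = np.concatenate([pos_sims, neg_sims])
        m = terms.max()  # subtract the max for a stable log-sum-exp
        log_denom = m + np.log(np.exp(terms - m).sum())
        losses.append(np.mean(log_denom - pos_sims))
    return float(np.mean(losses)) if losses else 0.0


# Hypothetical toy embeddings: two tight groups in 2-D.
z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_good = supcon_loss(z, [0, 0, 1, 1])
loss_bad = supcon_loss(z, [0, 1, 0, 1])
loss_hard = supcon_loss(z, [0, 0, 1, 1], top_k_neg=1)
print(loss_good, loss_bad, loss_hard)
```

Labels that agree with the geometry of the embeddings give a much smaller loss than labels that cut across it. Restricting the denominator to the hardest negatives can only lower each anchor's term, which is why the abstract speaks of controlling both the number and the difficulty of negatives.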

PaperID: 855,

Authors: Ding Chen, Peixi Peng, Tiejun Huang, Yonghong Tian

Affiliations: Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Network Intelligence Research, Peng Cheng Laboratory, Shenzhen, China; Department of Computer Science and Technology, Peking University, Beijing, China

Title: Fully Spiking Actor Network With Intralayer Connections for Reinforcement Learning

Abstract:
With the help of special neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with less energy consumption. Combining SNNs with deep reinforcement learning (DRL) provides a promising energy-efficient approach to realistic control tasks. In this article, we focus on the task where the agent needs to learn multidimensional deterministic policies to control, which is very common in real scenarios. Recently, the surrogate gradient method has been utilized for training multilayer SNNs, which allows SNNs to achieve comparable performance with the corresponding deep networks in this task. Most existing spike-based reinforcement learning (RL) methods take the firing rate as the output of SNNs, and convert it to represent continuous action space (i.e., the deterministic policy) through a fully connected (FC) layer. However, the decimal characteristic of the firing rate brings floating-point matrix operations to the FC layer, so that the whole SNN cannot be deployed directly on neuromorphic hardware. To develop a fully spiking actor network (SAN) without any floating-point matrix operations, we draw inspiration from the nonspiking interneurons found in insects and employ the membrane voltage of the nonspiking neurons to represent the action. Before the nonspiking neurons, multiple population neurons are introduced to decode different dimensions of actions. Since each population is used to decode a dimension of action, we argue that the neurons in each population should be connected in both the time domain and the space domain. Hence, intralayer connections are used in the output populations to enhance the representation capacity. This mechanism exists extensively in animals and has been demonstrated to be effective. Finally, we propose a fully SAN with intralayer connections (ILC-SAN). Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art methods on continuous control tasks from OpenAI gym. Moreover, we estimate the theoretical energy consumption when deploying ILC-SAN on neuromorphic chips to illustrate its high energy efficiency.
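
The idea of reading an action from the membrane voltage of a nonspiking neuron, with one spiking population per action dimension, can be sketched as below. This is a simplified illustration under stated assumptions: a leaky integrator with a fixed `decay`, no intralayer connections, and made-up names (`decode_action`, `spikes`, `weights`). It is not the ILC-SAN architecture.

```python
import numpy as np


def decode_action(spikes, weights, decay=0.9):
    """Decode one action value per population from binary spike trains.

    spikes:  array of shape (T, D, N), 0/1 spikes of D populations of N neurons
    weights: array of shape (D, N), synaptic weights onto D nonspiking neurons
    Returns the final membrane voltages, shape (D,), used as the action.
    """
    n_steps, n_dims, _ = spikes.shape
    voltage = np.zeros(n_dims)
    for t in range(n_steps):
        fired = spikes[t].astype(bool)
        # Spikes are binary, so the synaptic input is a sum of the weights
        # of the neurons that fired rather than a dense float product.
        synaptic_input = np.where(fired, weights, 0.0).sum(axis=1)
        voltage = decay * voltage + synaptic_input  # leaky integration, no reset
    return voltage


# Hypothetical example: 3 time steps, 2 action dimensions, 4 neurons each.
spikes = np.zeros((3, 2, 4))
spikes[:, 0, 0] = 1  # one neuron of the first population fires every step
weights = np.ones((2, 4))
action = decode_action(spikes, weights, decay=0.5)
print(action)
```

Because the nonspiking neuron never fires or resets, its voltage varies continuously and can represent a real-valued action without first converting spikes into a firing rate.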

PaperID: 856,

Authors: Weigang Cui, Yansong Xiang, Yifan Wang, Tao Yu, Xiao-Feng Liao, Bin Hu, Yang Li

Affiliations: School of Engineering Medicine, Beihang University, Beijing, China; Department of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Beijing Institute of Functional Neurosurgery, Xuanwu Hospital, Capital Medical University, Beijing, China; College of Computer Science, Chongqing University, Chongqing, China; School of Medical Technology, Beijing Institute of Technology, Beijing, China; Department of Automation Science and Electrical Engineering and the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China

Title: Deep Multiview Module Adaption Transfer Network for Subject-Specific EEG Recognition

Abstract:
Transfer learning is one of the popular methods to solve the problem of insufficient data in subject-specific electroencephalogram (EEG) recognition tasks. However, most existing approaches ignore the difference between subjects and transfer the same feature representations from source domain to different target domains, resulting in poor transfer performance. To address this issue, we propose a novel subject-specific EEG recognition method named deep multiview module adaption transfer (DMV-MAT) network. First, we design a universal deep multiview (DMV) network to generate different types of discriminative features from multiple perspectives, which improves the generalization performance by extensive feature sets. Second, module adaption transfer (MAT) is designed to evaluate each module by the feature distributions of source and target samples, which can generate an optimal weight sharing strategy for each target subject and promote the model to learn domain-invariant and domain-specific features simultaneously. We conduct extensive experiments in two EEG recognition tasks, i.e., motor imagery (MI) and seizure prediction, on four datasets. Experimental results demonstrate that the proposed method achieves promising performance compared with the state-of-the-art methods, indicating a feasible solution for subject-specific EEG recognition tasks. Implementation codes are available at https://github.com/YangLibuaa/DMV-MAT.

PaperID: 857,

Authors: Tao Xie, Kun Dai, Zhiqiang Jiang, Ruifeng Li, Shouren Mao, Ke Wang, Lijun Zhao

Affiliations: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China

Title: ViT-MVT: A Unified Vision Transformer Network for Multiple Vision Tasks

Abstract:
In this work, we seek to learn multiple mainstream vision tasks concurrently using a unified network, which is storage-efficient as numerous networks with task-shared parameters can be implanted into a single consolidated network. Our framework, vision transformer (ViT)-MVT, built on a plain and nonhierarchical ViT, incorporates numerous visual tasks into a modest supernet and optimizes them jointly across various dataset domains. For the design of ViT-MVT, we augment the ViT with a multihead self-attention (MHSE) to offer complementary cues in the channel and spatial dimension, as well as a local perception unit (LPU) and locality feed-forward network (locality FFN) for information exchange in the local region, thus endowing ViT-MVT with the ability to effectively optimize multiple tasks. Besides, we construct a search space comprising potential architectures with a broad spectrum of model sizes to offer various optimum candidates for diverse tasks. After that, we design a layer-adaptive sharing technique that automatically determines whether each layer of the transformer block is shared or not for all tasks, enabling ViT-MVT to obtain task-shared parameters that reduce storage and task-specific parameters that learn task-related features, thereby boosting performance. Finally, we introduce a joint-task evolutionary search algorithm to discover an optimal backbone for all tasks under a total model size constraint, which challenges the conventional wisdom that visual tasks are typically supplied with backbone networks developed for image classification. Extensive experiments reveal that ViT-MVT delivers exceptional performance for multiple visual tasks over state-of-the-art methods while requiring considerably lower total storage cost. We further demonstrate that, once trained, ViT-MVT is capable of incremental learning when generalized to new tasks while retaining identical performance on the trained tasks. The code is available at https://github.com/XT-1997/vitmvt.

PaperID: 858,

Authors: Yanjiang Yu, Puyang Zhang, Kaihao Zhang, Wenhan Luo, Changsheng Li

Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; College of Engineering and Computer Science, Australian National University, Canberra, ACT, Australia; School of Cyber Science and Technology, Sun Yat-sen University, Guangzhou, China

Title: Multiprior Learning Via Neural Architecture Search for Blind Face Restoration

Abstract:
Blind face restoration (BFR) aims to recover high-quality (HQ) face images from low-quality (LQ) ones and usually resorts to facial priors for improving restoration performance. However, current methods still suffer from two major difficulties: 1) how to derive a powerful network architecture without extensive hand tuning and 2) how to capture complementary information from multiple facial priors in one network to improve restoration performance. To this end, we propose a face restoration searching network (FRSNet) to adaptively search for a suitable feature extraction architecture within our specified search space, which can directly contribute to the restoration quality. On the basis of FRSNet, we further design our multiple facial prior searching network (MFPSNet) with a multiprior learning scheme. MFPSNet optimally extracts information from diverse facial priors and fuses the information into image features, ensuring that both external guidance and internal features are preserved. In this way, MFPSNet takes full advantage of semantic-level (parsing maps), geometric-level (facial heat maps), reference-level (facial dictionaries), and pixel-level (degraded images) information and, thus, generates faithful and realistic images. Quantitative and qualitative experiments show that MFPSNet performs favorably on both synthetic and real-world datasets against the state-of-the-art (SOTA) BFR methods. The codes are publicly available at https://github.com/YYJ1anG/MFPSNet.

PaperID: 859,

Authors: Haolin Qin, Daquan Zhou, Tingfa Xu, Ziyang Bian, Jianan Li

Affiliations: Beijing Institute of Technology, Beijing, China; ByteDance, San Jose, CA, USA

Title: Factorization Vision Transformer: Modeling Long-Range Dependency With Local Window Cost

Abstract:
Transformers have astounding representational power but typically consume considerable computation, which is quadratic in image resolution. The prevailing Swin transformer reduces computational costs through a local window strategy. However, this strategy inevitably causes two drawbacks: 1) the local window-based self-attention (WSA) hinders global dependency modeling capability and 2) recent studies point out that local windows impair robustness. To overcome these challenges, we pursue a preferable trade-off between computational cost and performance. Accordingly, we propose a novel factorization self-attention (FaSA) mechanism that enjoys both the advantages of local window cost and long-range dependency modeling capability. By factorizing the conventional attention matrix into sparse subattention matrices, FaSA captures long-range dependencies, while aggregating mixed-grained information at a computational cost equivalent to the local WSA. Leveraging FaSA, we present the factorization vision transformer (FaViT) with a hierarchical structure. FaViT achieves high performance and robustness, with linear computational complexity with respect to the input image spatial resolution. Extensive experiments have shown FaViT’s advanced performance in classification and downstream tasks. Furthermore, it also exhibits strong model robustness to corrupted and biased data and hence demonstrates benefits in favor of practical applications. In comparison to the baseline model Swin-T, our FaViT-B2 significantly improves classification accuracy by 1% and robustness by 7%, while reducing model parameters by 14%. Our code will soon be publicly available at https://github.com/q2479036243/FaViT.
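
The gap between quadratic global attention and the "local window cost" mentioned above can be checked with simple counting. The sketch below counts query-key pairs only (ignoring heads, channels, and projections) for a feature map of `height` by `width` tokens; the function names and the 56 by 56 map with 7 by 7 windows are illustrative assumptions, and the count says nothing about how FaSA itself is implemented.

```python
def global_attention_pairs(height, width):
    """Query-key pairs when every token attends to every token."""
    n_tokens = height * width
    return n_tokens * n_tokens


def window_attention_pairs(height, width, window):
    """Query-key pairs when tokens attend only within non-overlapping windows."""
    if height % window or width % window:
        raise ValueError("height and width must be multiples of the window size")
    n_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return n_windows * tokens_per_window ** 2


# Hypothetical 56x56 token map with 7x7 windows, then doubled resolution.
small_global = global_attention_pairs(56, 56)
small_window = window_attention_pairs(56, 56, 7)
large_global = global_attention_pairs(112, 112)
large_window = window_attention_pairs(112, 112, 7)
print(small_global, small_window, large_global, large_window)
```

Doubling the resolution multiplies the number of tokens by 4; the global count grows by a factor of 16, while the windowed count grows by a factor of 4, i.e., linearly in the number of tokens.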

PaperID: 860,

Authors: Ziming Wang, Yuhao Zhang, Shuang Lian, Xiaoxin Cui, Rui Yan, Huajin Tang

Affiliations: College of Computer Science and Technology, Zhejiang University, Hangzhou, China; Research Center for Intelligent Computing Hardware, Zhejiang Lab, Hangzhou, China; School of Integrated Circuits, Peking University, Beijing, China; College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China; College of Computer Science and Technology and the State Key Laboratory of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China

Title: Toward High-Accuracy and Low-Latency Spiking Neural Networks With Two-Stage Optimization

Abstract:
Spiking neural networks (SNNs) operating with asynchronous discrete events show higher energy efficiency with sparse computation. A popular approach for implementing deep SNNs is artificial neural network (ANN)–SNN conversion, combining both efficient training of ANNs and efficient inference of SNNs. However, the accuracy loss is usually nonnegligible, especially under few time steps, which greatly restricts the applications of SNNs on latency-sensitive edge devices. In this article, we first identify that such performance degradation stems from the misrepresentation of the negative or overflow residual membrane potential in SNNs. Inspired by this, we decompose the conversion error into three parts: quantization error, clipping error, and residual membrane potential representation error. With such insights, we propose a two-stage conversion algorithm to minimize those errors, respectively. In addition, we show that each stage achieves significant performance gains in a complementary manner. By evaluating on challenging datasets including CIFAR-10, CIFAR-100, and ImageNet, the proposed method demonstrates the state-of-the-art performance in terms of accuracy, latency, and energy preservation. Furthermore, our method is evaluated using a more challenging object detection task, revealing notable gains in regression performance under ultralow latency, when compared with existing spike-based detection algorithms. Codes will be available at: https://github.com/Windere/snn-cvt-dual-phase.
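
The role of the residual membrane potential can be seen with a single integrate-and-fire neuron. The sketch below assumes a soft-reset neuron with zero initial potential, a constant input `a` per step, threshold `theta`, and `n_steps` time steps; these choices and the name `simulate_if` are assumptions for illustration, not the two-stage algorithm of the paper. Under these assumptions the rate-coded output equals `a` minus the final potential divided by `n_steps`, so the leftover potential is exactly what the spike count fails to represent.

```python
def simulate_if(a, theta=1.0, n_steps=4):
    """Simulate a soft-reset integrate-and-fire neuron with constant input.

    Returns (rate_output, residual), where rate_output = spikes * theta / n_steps
    and residual is the membrane potential left after the last step.
    """
    v = 0.0
    n_spikes = 0
    for _ in range(n_steps):
        v += a
        if v >= theta:
            v -= theta  # soft reset: subtract the threshold, keep the surplus
            n_spikes += 1
    return n_spikes * theta / n_steps, v


for a in (0.625, 1.5, -0.25):
    out, residual = simulate_if(a)
    print(a, out, residual, a - residual / 4)
```

With the inputs above, 0.625 yields an output of 0.5 with residual 0.5 (a quantization gap), 1.5 saturates at 1.0 with an overflow residual of 2.0 (clipping), and -0.25 yields 0.0 with a negative residual of -1.0.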

PaperID: 861,

Authors: Zhibin Dong, Jiaqi Jin, Yuyang Xiao, Bin Xiao, Siwei Wang, Xinwang Liu, En Zhu

Affiliations: School of Computer, National University of Defense Technology, Changsha, China; School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China; Department of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; Intelligent Game and Decision Laboratory, Beijing, China

Title: Subgraph Propagation and Contrastive Calibration for Incomplete Multiview Data Clustering

Abstract:
The success of multiview raw data mining relies on the integrity of attributes. However, each view faces various noises and collection failures, so that attributes are only partially available. To make matters worse, the attributes in multiview raw data take multiple forms, which makes it more difficult to explore the structure of the data, especially in the multiview clustering task. Due to the missing data in some views, the clustering task on incomplete multiview data confronts the following challenges: 1) mining the topology of missing data across views remains an urgent problem to be solved; 2) most approaches do not calibrate the complemented representations with the common information of multiple views; and 3) we discover that the cluster distributions obtained from incomplete views suffer from a cluster distribution unaligned problem (CDUP) in the latent space. To solve the above issues, we propose a deep clustering framework based on subgraph propagation and contrastive calibration (SPCC) for incomplete multiview raw data. First, the global structural graph is reconstructed by propagating the subgraphs generated by the complete data of each view. Then, the missing views are completed and calibrated under the guidance of the global structural graph and contrastive learning between views. In the latent space, we assume that different views have a common cluster representation in the same dimension. However, in the unsupervised condition, the fact that the cluster distributions of different views do not correspond hinders the information completion process from using information from other views. Finally, the complemented cluster distributions for different views are aligned by contrastive learning (CL), thus solving the CDUP in the latent space. Our method achieves advanced performance on six benchmarks, which validates the effectiveness and superiority of our SPCC.

PaperID: 862,

Authors: Kai Li, Xin Yuan, Jingjing Zheng, Wei Ni, Falko Dressler, Abbas Jamalipour

Affiliations: Department of Engineering, University of Cambridge, Cambridge, U.K.; Digital Productivity and Services Flagship, Commonwealth Scientific and Industrial Research Organization (CSIRO), Marsfield, NSW, Australia; Real-Time and Embedded Computing Systems Research Center (CISTER), Porto, Portugal; School of Electrical Engineering and Computer Science, TU Berlin, Berlin, Germany; School of Electrical and Information Engineering, The University of Sydney, Sydney, NSW, Australia

Title: Leverage Variational Graph Representation for Model Poisoning on Federated Learning

Abstract:
This article puts forth a new training data-untethered model poisoning (MP) attack on federated learning (FL). The new MP attack extends an adversarial variational graph autoencoder (VGAE) to create malicious local models based solely on the benign local models overheard without any access to the training data of FL. Such an advancement leads to the VGAE-MP attack that is not only efficacious but also remains elusive to detection. VGAE-MP attack extracts graph structural correlations among the benign local models and the training data features, adversarially regenerates the graph structure, and generates malicious local models using the adversarial graph structure and benign models’ features. Moreover, a new attacking algorithm is presented to train the malicious local models using VGAE and sub-gradient descent, while enabling an optimal selection of the benign local models for training the VGAE. Experiments demonstrate a gradual drop in FL accuracy under the proposed VGAE-MP attack and the ineffectiveness of existing defense mechanisms in detecting the attack, posing a severe threat to FL.

PaperID: 863,

Authors: Xue Wang, Zheng Guan, Wenhua Qian, Jinde Cao, Chengchao Wang, Runzhuo Ma

Affiliations: School of Information Science and Engineering, Yunnan University, Kunming, China; School of Mathematics, Southeast University, Nanjing, China; Department of Electrical Engineering, Faculty of Engineering, Hong Kong Polytechnic University, Hong Kong, China

Title: STFuse: Infrared and Visible Image Fusion via Semisupervised Transfer Learning

Abstract:
Infrared and visible image fusion (IVIF) aims to obtain an image that contains complementary information about the source images. However, it is challenging to define complementary information between source images in the absence of ground truth and without borrowing prior knowledge. Therefore, we propose a semisupervised transfer learning-based method for IVIF, termed STFuse, which aims to transfer knowledge from an informative source domain to a target domain, thus breaking the above limitations. The critical aspect of our method is to borrow supervised knowledge from the multifocus image fusion (MFIF) task and to filter out task-specific attribute knowledge by using a guidance loss L_g, which motivates its cross-task use in IVIF tasks. Using this cross-task knowledge effectively alleviates the limitation that the lack of ground truth imposes on fusion performance, and the complementary expression ability under the constraint of supervised knowledge is more instructive than prior knowledge. Moreover, we design a cross-feature enhancement module (CEM) that utilizes self-attention and mutual-attention features to guide each branch to refine features and then facilitate the integration of cross-modal complementary features. Extensive experiments demonstrate that our method has clear advantages in terms of visual quality and statistical metrics, as well as the docking of high-level vision tasks, compared with other state-of-the-art methods.

PaperID: 864,

Authors: Jiao Shi, Tiancheng Wu, A. K. Qin, Tao Shao, Yu Lei, Gwanggil Jeon

Affiliations: School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China; Department of Computer Science and Software Engineering, Swinburne University of Technology, Hawthorn, VIC, Australia; Department of Embedded Systems Engineering, Incheon National University, Incheon, South Korea

Title: Deep-Growing Neural Network With Manifold Constraints for Hyperspectral Image Classification

Abstract:
In the absence of sufficient labels, deep neural networks (DNNs) are prone to overfitting, resulting in poor performance and difficulty in training. Thus, many semisupervised methods aim to use unlabeled sample information to compensate for the lack of label quantity. However, as the available pseudolabels increase, the fixed structure of traditional models has difficulty in matching them, limiting their effectiveness. Therefore, a deep-growing neural network with manifold constraints (DGNN-MC) is proposed. It can deepen the corresponding network structure with the expansion of a high-quality pseudolabel pool and preserve the local structure between the original and high-dimensional data in semisupervised learning. First, the framework filters the output of the shallow network to obtain pseudolabeled samples with high confidence and adds them to the original training set to form a new pseudolabeled training set. Second, according to the size of the new training set, it increases the depth of the layers to obtain a deeper network and conducts the training. Finally, it obtains new pseudolabeled samples and deepens the layers again until the network growth is completed. The growing model proposed in this article can be applied to other multilayer networks, as their depth can be transformed. Taking HSI classification as an example, a natural semisupervised problem, the experimental results demonstrate the superiority and effectiveness of our method, which can mine more reliable information for better utilization and fully balance the growing amount of labeled data and network learning ability.
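
The first step described above, filtering network outputs to keep only high-confidence pseudolabels, can be sketched as follows. The threshold value, the function names, and in particular the depth rule `depth_for_size` (one extra layer per `samples_per_layer` training samples) are hypothetical choices made for illustration; the abstract does not specify how DGNN-MC maps training-set size to depth.

```python
import numpy as np


def select_pseudolabels(probs, threshold=0.9):
    """Keep samples whose top class probability reaches the threshold.

    probs: array of shape (N, C) with per-class probabilities.
    Returns (indices of kept samples, their pseudolabels).
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    return keep, probs[keep].argmax(axis=1)


def depth_for_size(n_train, base_depth=2, samples_per_layer=100):
    """Hypothetical growth rule: one extra layer per samples_per_layer samples."""
    return base_depth + n_train // samples_per_layer


# Hypothetical predictions for three unlabeled samples over two classes.
probs = np.array([[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]])
keep, pseudo = select_pseudolabels(probs)
n_labeled = 150
print(keep, pseudo, depth_for_size(n_labeled + len(keep)))
```

In a full loop, the kept samples would be added to the training set, the network deepened according to the new set size, and the procedure repeated until growth stops.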

PaperID: 865,

Authors: Wei Wei, Jingjing Wang, Jun Du, Zhengru Fang, Yong Ren, C. L. Philip Chen

Affiliations: Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China; School of Cyber Science and Technology, Beihang University, Beijing, China; Department of Electronic Engineering, Tsinghua University, Beijing, China; Department of Computer Science, City University of Hong Kong, Hong Kong, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: Differential Game-Based Deep Reinforcement Learning in Underwater Target Hunting Task

Abstract:
The underwater target hunting task must meet requirements for real-time trajectory scheduling and distributed coordination, which is challenging in turbulent ocean environments and dynamic adversarial settings. Despite the existing research in the game-based target hunting area, few approaches have considered dynamic environmental factors, such as sea currents, winds, and communication delay. In this article, we focus on a target hunting system consisting of multiple unmanned underwater vehicles (UUVs) and a target with high maneuverability. Besides, differential game theory is leveraged to analyze adversarial behaviors between the hunters and the escapee. However, it is difficult for the UUVs to deploy an adaptive scheme that guarantees consistency and prevents the target from escaping, all without collisions. Therefore, we conceive the Hamiltonian function with Leibniz’s formula to obtain feedback control policies. In addition, we prove that the target hunting system is asymptotically stable in the mean, and that the system can satisfy Nash equilibrium relying on the proposed control policies. Furthermore, we design a modified multiagent reinforcement learning (MARL) method to facilitate the underwater target hunting task under the constraints of energetic flows and acoustic propagation delay. Simulation results show that the proposed scheme is superior to the typical MARL algorithm in terms of reward and success rate.

PaperID: 866,

Authors: Ruojing Li, Wei An, Chao Xiao, Boyang Li, Yingqian Wang, Miao Li, Yulan Guo

Affiliations: College of Electronic Science and Technology, National University of Defense Technology (NUDT), Changsha, China

Title: Direction-Coded Temporal U-Shape Module for Multiframe Infrared Small Target Detection

Abstract:
Infrared small target (IRST) detection aims at separating targets from cluttered backgrounds. Although many deep learning-based single-frame IRST (SIRST) detection methods have achieved promising detection performance, they cannot deal with extremely dim targets while suppressing clutter since the targets are spatially indistinctive. Multiframe IRST (MIRST) detection can handle this problem well by fusing the temporal information of moving targets. However, the extraction of motion information is challenging since general convolution is insensitive to motion direction. In this article, we propose a simple yet effective direction-coded temporal U-shape module (DTUM) for MIRST detection. Specifically, we build a motion-to-data mapping to distinguish the motion of targets and clutter by indexing different directions. Based on the motion-to-data mapping, we further design a direction-coded convolution block (DCCB) to encode the motion direction into features and extract the motion information of targets. Our DTUM can be equipped with most single-frame networks to achieve MIRST detection. Moreover, in view of the lack of MIRST datasets that include dim targets, we build a multiframe infrared small and dim target dataset (namely, NUDT-MIRSDT) and propose several evaluation metrics. The experimental results on the NUDT-MIRSDT dataset demonstrate the effectiveness of our method. Our method achieves state-of-the-art performance in detecting infrared small and dim targets and suppressing false alarms. Our code will be available at https://github.com/TinaLRJ/Multi-frame-infrared-small-target-detection-DTUM.

PaperID: 867,

Authors: Renzhi Lu, Ruichang Bai, Ruidong Li, Lijun Zhu, Mingyang Sun, Feng Xiao, Dong Wang, Huaming Wu, Yuemin Ding

Affiliations: Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China; Shanghai Electric Group Company, Ltd., Central Academe, Shanghai, China; Institute of Science and Engineering, Kanazawa University, Kakuma, Kanazawa, Japan; School of Artificial Intelligence and Automation, State Key Laboratory of Intelligent Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan, China; Department of Control Science and Engineering, State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou, China; State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources and the School of Control and Computer Engineering, North China Electric Power University, Beijing, China; Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education and the School of Control Science and Engineering, Dalian University of Technology, Dalian, China; Center for Applied Mathematics, Tianjin University, Tianjin, China; Tecnun School of Engineering, University of Navarra, San Sebastián, Spain

Title: A Novel Sequence-to-Sequence-Based Deep Learning Model for Multistep Load Forecasting

Abstract:
Load forecasting is critical to the task of energy management in power systems, for example, balancing supply and demand and minimizing energy transaction costs. Many approaches are used for load forecasting, such as support vector regression (SVR), the autoregressive integrated moving average (ARIMA), and neural networks, but most of these methods focus on single-step load forecasting, whereas multistep load forecasting can provide better insights for optimizing energy resource allocation and assisting the decision-making process. In this work, a novel sequence-to-sequence (Seq2Seq)-based deep learning model based on a time series decomposition strategy for multistep load forecasting is proposed. The model consists of a series of basic blocks, each of which includes one encoder and two decoders, and all basic blocks are connected by residuals. Inside each basic block, the encoder is realized by a temporal convolution network (TCN) for its benefit of parallel computing, and the decoder is implemented by a long short-term memory (LSTM) neural network to predict and estimate time series. During the forecasting process, each basic block produces its forecast individually. The final forecast is the aggregation of the predicted results of all basic blocks. Several case studies on multiple real-world datasets are conducted to evaluate the performance of the proposed model. The results demonstrate that the proposed model achieves the best accuracy compared with several benchmark models.

PaperID: 868,

Authors: Yifan Hu, Junjie Fu, Guanghui Wen

Affiliations: School of Mathematics, Southeast University, Nanjing, China

Title: Graph Soft Actor-Critic Reinforcement Learning for Large-Scale Distributed Multirobot Coordination

Abstract:
Learning distributed cooperative policies for large-scale multirobot systems remains a challenging task in the multiagent reinforcement learning (MARL) context. In this work, we model the interactions among the robots as a graph and propose a novel off-policy actor–critic MARL algorithm to train distributed coordination policies on the graph by leveraging the information extraction ability of graph neural networks (GNNs). First, a new type of Gaussian policy parameterized by the GNNs is designed for distributed decision-making in continuous action spaces. Second, a scalable centralized value function network is designed based on a novel GNN-based value function decomposition technique. Then, based on the designed actor and critic networks, a GNN-based MARL algorithm named graph soft actor–critic (G-SAC) is proposed and utilized to train the distributed policies in an effective and centralized fashion. Finally, two custom multirobot coordination environments are built, in which simulations are performed to empirically demonstrate both the sample efficiency and the scalability of G-SAC as well as the strong zero-shot generalization ability of the trained policy in large-scale multirobot coordination problems.

PaperID: 869,

Authors: Jaeyeon Jang, Chang Ouk Kim

Affiliations: Department of Data Science, Catholic University of Korea, Bucheon, Republic of Korea; Department of Industrial Engineering, Yonsei University, Seoul, Republic of Korea

Title: Teacher-Explorer-Student Learning: A Novel Learning Method for Open Set Recognition

Abstract:
When an unknown example, one that was not seen during training, appears, most recognition systems usually produce overgeneralized results and determine that the example belongs to one of the known classes. To address this problem, teacher–explorer–student (T/E/S) learning, which adopts the concept of open set recognition (OSR) to reject unknown samples while minimizing the loss of classification performance on known samples, is proposed in this study. In this novel learning method, the overgeneralization of deep-learning classifiers is significantly reduced by exploring various possibilities for unknowns. The teacher network extracts hints about unknowns by distilling the pretrained knowledge about knowns and delivers this distilled knowledge to the student network. After learning the distilled knowledge, the student network shares its learned information with the explorer network. Next, the explorer network shares its exploration results by generating unknown-like samples and feeding those samples to the student network. As this alternating learning process is repeated, the student network experiences a variety of synthetic unknowns, reducing overgeneralization. The results of extensive experiments show that each component proposed in this article significantly contributes to improving OSR performance. It is found that the proposed T/E/S learning method outperforms current state-of-the-art methods.

PaperID: 870,

Authors: Pablo Morala, Jenny Alexandra Cifuentes, Rosa E. Lillo, Iñaki Ucar

Affiliations: Department of Statistics and the UCM-Santander Big Data Institute, Universidad Carlos III de Madrid, Getafe, Spain; Department of Quantitative Methods, ICADE, Faculty of Economics and Business Administration, and the Institute for Research in Technology (IIT), ICAI School of Engineering, Universidad Pontificia Comillas, Madrid, Spain

Title: NN2Poly: A Polynomial Representation for Deep Feed-Forward Artificial Neural Networks

Abstract:
Interpretability of neural networks (NNs) and their underlying theoretical behavior remain an open field of study even after the great success of their practical applications, particularly with the emergence of deep learning. In this work, NN2Poly is proposed: a theoretical approach to obtain an explicit polynomial model that provides an accurate representation of an already trained fully connected feed-forward artificial NN [a multilayer perceptron (MLP)]. This approach extends a previous idea proposed in the literature, which was limited to single hidden layer networks, to work with arbitrarily deep MLPs in both regression and classification tasks. NN2Poly uses a Taylor expansion on the activation function, at each layer, and then applies several combinatorial properties to calculate the coefficients of the desired polynomials. Discussion is presented on the main computational challenges of this method, and the way to overcome them by imposing certain constraints during the training phase. Finally, simulation experiments as well as applications to real tabular datasets are presented to demonstrate the effectiveness of the proposed method.
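The Taylor-expansion step mentioned in the abstract can be illustrated on a single neuron. The sketch below is not the NN2Poly algorithm itself: it only expands one tanh neuron with a scalar input into a cubic polynomial, and the function names (`neuron_to_poly`, `eval_poly`) and the choice of a third-order Maclaurin series are assumptions made for illustration.

```python
import math

def taylor_tanh_coeffs():
    # Maclaurin coefficients of tanh(z) up to order 3: tanh(z) ~ z - z**3 / 3
    return [0.0, 1.0, 0.0, -1.0 / 3.0]

def neuron_to_poly(w, b, act_coeffs):
    """Coefficients (lowest degree first) of sum_k c_k * (w*x + b)**k as a polynomial in x."""
    degree = len(act_coeffs) - 1
    poly = [0.0] * (degree + 1)
    for k, c_k in enumerate(act_coeffs):
        for j in range(k + 1):
            # Binomial theorem: (w*x + b)**k = sum_j C(k, j) * (w*x)**j * b**(k - j)
            poly[j] += c_k * math.comb(k, j) * (w ** j) * (b ** (k - j))
    return poly

def eval_poly(poly, x):
    # Evaluate a polynomial given its coefficients, lowest degree first
    return sum(a * x ** i for i, a in enumerate(poly))

# Hypothetical neuron y = tanh(0.5 * x + 0.1), compared with its polynomial at x = 0.2
poly = neuron_to_poly(0.5, 0.1, taylor_tanh_coeffs())
approx = eval_poly(poly, 0.2)
exact = math.tanh(0.5 * 0.2 + 0.1)
```

The approximation is only accurate while the pre-activation `w*x + b` stays close to zero, which is one reason constraints during training matter for a polynomial representation of this kind.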

PaperID: 871,

Authors: Runqing Jiang, Yan Yan, Jing-Hao Xue, Si Chen, Nannan Wang, Hanzi Wang

Affiliations: Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University, Xiamen, China; Department of Statistical Science, University College London, London, U.K.; School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China; State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, China

Title: Knowledge Distillation Meets Label Noise Learning: Ambiguity-Guided Mutual Label Refinery

Abstract:
Knowledge distillation (KD), which aims at transferring the knowledge from a complex network (a teacher) to a simpler and smaller network (a student), has received considerable attention in recent years. Typically, most existing KD methods work on well-labeled data. Unfortunately, real-world data often inevitably involve noisy labels, thus leading to performance deterioration of these methods. In this article, we study a little-explored but important issue, i.e., KD with noisy labels. To this end, we propose a novel KD method, called ambiguity-guided mutual label refinery KD (AML-KD), to train the student model in the presence of noisy labels. Specifically, based on the pretrained teacher model, a two-stage label refinery framework is innovatively introduced to refine labels gradually. In the first stage, we perform label propagation (LP) with small-loss selection guided by the teacher model, improving the learning capability of the student model. In the second stage, we perform mutual LP between the teacher and student models in a mutual-benefit way. During the label refinery, an ambiguity-aware weight estimation (AWE) module is developed to address the problem of ambiguous samples, avoiding overfitting these samples. One distinct advantage of AML-KD is that it is capable of learning a high-accuracy and low-cost student model with label noise. The experimental results on synthetic and real-world noisy datasets show the effectiveness of our AML-KD against state-of-the-art KD methods and label noise learning (LNL) methods. Code is available at https://github.com/Runqing-forMost/AML-KD.

PaperID: 872,

Authors: Siyu Wang, Xiaocong Chen, Julian J. McAuley, Sally Cripps, Lina Yao

Affiliations: School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW, Australia; Data, CSIRO, Eveleigh, NSW, Australia; Computer Science Department, University of California at San Diego (UCSD), La Jolla, CA, USA; Human Technology Institute, University of Technology Sydney, Sydney, NSW, Australia

Title: Plug-and-Play Model-Agnostic Counterfactual Policy Synthesis for Deep Reinforcement Learning-Based Recommendation

Abstract:
Recent advances in recommender systems have proved the potential of reinforcement learning (RL) to handle the dynamic evolution processes between users and recommender systems. However, learning to train an optimal RL agent is generally impractical with commonly sparse user feedback data in the context of recommender systems. To circumvent the lack of interaction of current RL-based recommender systems, we propose to learn a general model-agnostic counterfactual synthesis (MACS) policy for counterfactual user interaction data augmentation. The counterfactual synthesis policy aims to synthesize counterfactual states while preserving significant information in the original state relevant to the user’s interests, building upon two different training approaches we designed: learning with expert demonstrations and joint training. As a result, the synthesis of each counterfactual data point is based on the current recommendation agent’s interaction with the environment to adapt to users’ dynamic interests. We integrate the proposed policy with deep deterministic policy gradient (DDPG), soft actor critic (SAC), and twin delayed DDPG (TD3) in an adaptive pipeline with a recommendation agent that can generate counterfactual data to improve the performance of recommendation. The empirical results on both online simulation and offline datasets demonstrate the effectiveness and generalization of our counterfactual synthesis policy and verify that it improves the performance of RL recommendation agents.

PaperID: 873,

Authors: Fujin Wang, Quanquan Zhi, Zhibin Zhao, Zhi Zhai, Yingkai Liu, Huan Xi, Shibin Wang, Xuefeng Chen

Affiliations: National Key Laboratory of Aerospace Power System and Plasma Technology and the School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, China; Beijing Aerospace Control Centre, Beijing, China; Key Laboratory of Thermo-Fluid Science and Engineering of Ministry of Education, School of Energy and Power Engineering, Xi’an Jiaotong University, Xi’an, China

Title: Inherently Interpretable Physics-Informed Neural Network for Battery Modeling and Prognosis

Abstract:
Lithium-ion batteries are widely used in modern society. Accurate modeling and prognosis are fundamental to achieving reliable operation of lithium-ion batteries. Accurately predicting the end-of-discharge (EOD) is critical for operations and decision-making when they are deployed to critical missions. Existing data-driven methods have many model parameters, require a large amount of labeled data, and are not interpretable. Model-based methods need to know many parameters related to battery design, and the models are difficult to solve. To bridge these gaps, this study proposes a physics-informed neural network (PINN), called battery neural network (BattNN), for battery modeling and prognosis. Specifically, we propose to design the structure of BattNN based on the equivalent circuit model (ECM). Therefore, the entire BattNN is completely constrained by physics. Its forward propagation process follows the physical laws, and the model is inherently interpretable. To validate the proposed method, we conduct discharge experiments under random loading profiles and develop our own dataset. Analysis and experiments show that the proposed BattNN only needs approximately 30 samples for training, and the average required training time is 21.5 s. Experimental results on three datasets show that our method can achieve high prediction accuracy with only a few learnable parameters. Compared with other neural networks, the prediction MAEs of our BattNN are reduced by 77.1%, 67.4%, and 75.0% on three datasets, respectively. Our data and code will be available at: https://github.com/wang-fujin/BattNN.

PaperID: 874,

Authors: Zuowei Zhang, Zhunga Liu, Liangbo Ning, Arnaud Martin, Jiexuan Xiong

Affiliations: School of Automation, Northwestern Polytechnical University, Xi’an, China; Institut de Recherche en Informatique et Systèmes Aléatoires, University of Rennes, Lannion, France

Title: Representation of Imprecision in Deep Neural Networks for Image Classification

Abstract:
Quantification and reduction of uncertainty in deep-learning techniques have received much attention, but how to characterize the imprecision caused by such uncertainty has been ignored. In some tasks, we prefer to obtain an imprecise result because we are unwilling or unable to bear the cost of an error. For this purpose, we investigate the representation of imprecision in deep-learning (RIDL) techniques based on the theory of belief functions (TBF). First, the labels of some training images are reconstructed using the learning mechanism of neural networks to characterize the imprecision in the training set. In the process, a label assignment rule is proposed to reassign one or more labels to each training image. If an image is assigned multiple labels, it indicates that the image may be in an overlapping region of different categories from the feature perspective or that the original label is wrong. Second, those images with multiple labels are rechecked. As a result, the imprecision (multiple labels) caused by the original labeling errors will be corrected, while the imprecision caused by insufficient knowledge is retained. Images with multiple labels are called imprecise ones, and they are considered to belong to meta-categories, i.e., the union of some specific categories. Third, the deep network model is retrained based on the reconstructed training set, and the test images are then classified. Finally, some test images that cannot be distinguished among specific categories will be assigned to meta-categories to characterize the imprecision in the results. Experiments based on some remarkable networks have shown that RIDL can improve accuracy (AC) and reasonably represent imprecision both in the training and testing sets.

PaperID: 875,

Authors: Zhengming Li, Jiahui Chen, Peifeng Zhang, Huiwu Huang, Guanbin Li

Affiliations: School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China

Title: DSFedCon: Dynamic Sparse Federated Contrastive Learning for Data-Driven Intelligent Systems

Abstract:
Federated learning (FL) makes it possible for multiple clients to collaboratively train a machine-learning model by communicating models instead of data, reducing privacy risk. Thus, FL is well suited to intelligent systems and applications with data security and privacy requirements. Unfortunately, there are several challenges in FL, such as the low training accuracy for nonindependent and identically distributed (non-IID) data and the high cost of computation and communication. Considering these, we propose a novel FL framework named dynamic sparse federated contrastive learning (DSFedCon). DSFedCon combines FL with dynamic sparse (DSR) training of network pruning and contrastive learning to improve model performance and reduce computation and communication costs. We analyze DSFedCon from the perspective of accuracy, communication, and security, demonstrating that it is communication-efficient and safe. To give a practical evaluation for non-IID data training, we perform experiments and comparisons on the MNIST, CIFAR-10, and CIFAR-100 datasets with different parameters of the Dirichlet distribution. Results indicate that DSFedCon achieves higher accuracy and lower communication cost than other state-of-the-art methods on these datasets. More precisely, we show that DSFedCon has a 4.67-fold speedup of communication rounds on MNIST, a 7.5-fold speedup of communication rounds on CIFAR-10, and an 18.33-fold speedup of communication rounds on CIFAR-100 while achieving the same training accuracy.
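The abstract evaluates non-IID training with different Dirichlet parameters. A common way to build such splits, sketched below under assumptions, is to draw per-class client proportions from a symmetric Dirichlet distribution; this is not taken from the DSFedCon paper, and the names `dirichlet_partition`, `alpha`, and `num_clients` are hypothetical. Smaller `alpha` gives more skewed (more non-IID) client datasets.

```python
import random

def sample_dirichlet(alpha, k, rng):
    # A symmetric Dirichlet sample is a vector of normalized Gamma(alpha, 1) draws
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Extremely unlikely underflow case: fall back to uniform proportions
        return [1.0 / k] * k
    return [d / total for d in draws]

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, class by class, with Dirichlet proportions."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        props = sample_dirichlet(alpha, num_clients, rng)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            # The last client takes the remainder so every index is assigned exactly once
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            end = min(max(end, start), len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

# Toy labels standing in for a 10-class dataset
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5)
```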

PaperID: 876,

Authors: Wenwen Wei, Ping Wei, Zhimin Liao, Jialu Qin, Xiang Cheng, Meiqin Liu, Nanning Zheng

Affiliations: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence and the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China; State Key Laboratory of Advanced Optical Communication Systems and Networks, School of Electronics, Peking University, Beijing, China

Title: Semantic Consistency Reasoning for 3-D Object Detection in Point Clouds

Abstract:
Point cloud-based 3-D object detection is a significant and critical issue in numerous applications. While most existing methods attempt to capitalize on the geometric characteristics of point clouds, they neglect the internal semantic properties of points and the consistency between the semantic and geometric clues. We introduce a semantic consistency (SC) mechanism for 3-D object detection in this article, by reasoning about the semantic relations between 3-D object boxes and their internal points. This mechanism is based on a natural principle: the semantic category of a 3-D bounding box should be consistent with the categories of all points within the box. Driven by the SC mechanism, we propose a novel SC network (SCNet) to detect 3-D objects from point clouds. Specifically, the SCNet is composed of a feature extraction module, a detection decision module, and a semantic segmentation module. In inference, the feature extraction and the detection decision modules are used to detect 3-D objects. In training, the semantic segmentation module is jointly trained with the other two modules to produce more robust and applicable model parameters. The performance is greatly boosted through reasoning about the relations between the output 3-D object boxes and segmented points. The proposed SC mechanism is model-agnostic and can be integrated into other base 3-D object detection models. We test the proposed model on three challenging indoor and outdoor benchmark datasets: ScanNetV2, SUN RGB-D, and KITTI. Furthermore, to validate the universality of the SC mechanism, we implement it in three different 3-D object detectors. The experiments show that the performance is substantially improved, and the extensive ablation studies also demonstrate the effectiveness of the proposed model.
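The principle stated in the abstract, that a box's category should agree with the categories of the points inside it, can be checked with a simple score. The sketch below is only an illustration of that principle, not SCNet's training objective: it assumes axis-aligned boxes and per-point predicted labels, and the function names are hypothetical.

```python
def points_in_box(points, box_min, box_max):
    # Indices of points lying inside an axis-aligned 3-D box (inclusive bounds)
    return [
        i for i, p in enumerate(points)
        if all(lo <= c <= hi for c, lo, hi in zip(p, box_min, box_max))
    ]

def consistency_score(points, point_labels, box_min, box_max, box_label):
    """Fraction of in-box points whose label matches the box label (None if the box is empty)."""
    inside = points_in_box(points, box_min, box_max)
    if not inside:
        return None
    agree = sum(1 for i in inside if point_labels[i] == box_label)
    return agree / len(inside)

# Toy scene: two points fall in the box, one of them agrees with the box category
points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0)]
point_labels = ["chair", "table", "chair"]
score = consistency_score(points, point_labels, (-0.5, -0.5, -0.5), (1.5, 1.5, 1.5), "chair")
```

A score of 1.0 means the box and all of its interior points agree; lower values flag boxes whose predicted category conflicts with the segmentation.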

PaperID: 877,

Authors: Zhaofei Yu, Tong Bu, Yijun Zhang, Shanshan Jia, Tiejun Huang, Jian K. Liu

Affiliations: Institute for Artificial Intelligence and the National Engineering Research Center of Visual Technology, Peking University, Beijing, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; School of Computer Science, University of Birmingham, Birmingham, U.K.

Title: Robust Decoding of Rich Dynamical Visual Scenes With Retinal Spikes

Abstract:
Sensory information transmitted to the brain activates neurons to create a series of coping behaviors. Understanding the mechanisms of neural computation and reverse engineering the brain to build intelligent machines requires establishing a robust relationship between stimuli and neural responses. Neural decoding aims to reconstruct the original stimuli that trigger neural responses. With the recent upsurge of artificial intelligence, neural decoding provides an insightful perspective for designing novel algorithms of brain–machine interface. For humans, vision is the dominant contributor to the interaction between the external environment and the brain. In this study, utilizing the retinal neural spike data collected over multiple trials with visual stimuli of two movies with different levels of scene complexity, we used a neural network decoder to quantify the decoded visual stimuli with six different metrics for image quality assessment, establishing a comprehensive inspection of decoding. With a detailed and systematic study of the effect of single and multiple trials of data, different noise in spikes, and blurred images, our results provide an in-depth investigation of decoding dynamical visual scenes using retinal spikes. These results provide insights into the neural coding of visual scenes and serve as a guideline for designing next-generation decoding algorithms of neuroprosthesis and other devices of brain–machine interface.

PaperID: 878,

Authors: Richeng Jin, Yuding Liu, Yufan Huang, Xiaofan He, Tianfu Wu, Huaiyu Dai

Affiliations: Department of Information and Communication Engineering, Zhejiang University, Hangzhou, China; resides, Santa Clara, CA, USA; School of Electronic Information, Wuhan University, Wuhan, China; Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, USA

Title: Sign-Based Gradient Descent With Heterogeneous Data: Convergence and Byzantine Resilience

Abstract:
Communication overhead has become one of the major bottlenecks in the distributed training of modern deep neural networks. With such consideration, various quantization-based stochastic gradient descent (SGD) solvers have been proposed and widely adopted, among which SignSGD with majority vote shows a promising direction because of its communication efficiency and robustness against Byzantine attackers. However, SignSGD fails to converge in the presence of data heterogeneity, which is commonly observed in the emerging federated learning (FL) paradigm. In this article, a sufficient condition for the convergence of the sign-based gradient descent method is derived, based on which a novel magnitude-driven stochastic-sign-based gradient compressor is proposed to address the non-convergence issue of SignSGD. The convergence of the proposed method is established in the presence of arbitrary data heterogeneity. The Byzantine resilience of sign-based gradient descent methods is quantified, and the error-feedback mechanism is further incorporated to boost the learning performance. Experimental results on the MNIST dataset, the CIFAR-10 dataset, and the Tiny-ImageNet dataset corroborate the effectiveness of the proposed methods.
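For readers unfamiliar with the baseline, plain SignSGD with majority vote can be sketched as follows. This is the standard baseline only, not the magnitude-driven stochastic-sign compressor proposed in the article; the helper names and the toy gradients are assumptions. The example also shows why heterogeneous data is a problem: the majority sign can disagree with the sign of the average gradient.

```python
def sign(x):
    # Returns -1, 0, or +1
    return (x > 0) - (x < 0)

def majority_vote_sign(worker_grads):
    """Each worker sends only gradient signs; the server takes a per-coordinate majority vote."""
    dim = len(worker_grads[0])
    votes = [sum(sign(g[i]) for g in worker_grads) for i in range(dim)]
    return [sign(v) for v in votes]

def signsgd_step(params, worker_grads, lr):
    # Move each coordinate by a fixed step against the voted sign
    direction = majority_vote_sign(worker_grads)
    return [p - lr * d for p, d in zip(params, direction)]

# Heterogeneous toy case: two workers see gradient -1, one worker sees +10
grads = [[-1.0], [-1.0], [10.0]]
voted = majority_vote_sign(grads)             # majority says "negative"
mean_grad = sum(g[0] for g in grads) / 3      # the true average is positive
new_params = signsgd_step([0.0], grads, lr=0.1)
```

Here the vote moves the parameter in the direction that increases the average loss, which is the kind of failure the convergence condition in the article is meant to rule out.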

PaperID: 879,

Authors: Wei Dai, Jicong Fan, Yiming Miao, Kai Hwang

Affiliations: School of Data Science, The Chinese University of Hong Kong, Shenzhen, China

Title: Deep Learning Model Compression With Rank Reduction in Tensor Decomposition

Abstract:
Large neural network models are hard to deploy on lightweight edge devices and demand large network bandwidth. In this article, we propose a novel deep learning (DL) model compression method. Specifically, we present a dual-model training strategy with an iterative and adaptive rank reduction (RR) in tensor decomposition. Our method regularizes the DL models while preserving model accuracy. With adaptive RR, the hyperparameter search space is significantly reduced. We provide a theoretical analysis of the convergence and complexity of the proposed method. Tested on LeNet, VGG, ResNet, EfficientNet, and RevCol over the MNIST, CIFAR-10/100, and ImageNet datasets, our method outperforms the baseline compression methods in both model compression and accuracy preservation. The experimental results validate our theoretical findings. For VGG-16 on the CIFAR-10 dataset, our compressed model shows a 0.88% accuracy gain with a 10.41 times storage reduction and a 6.29 times speedup. For ResNet-50 on the ImageNet dataset, our compressed model results in a 2.36 times storage reduction and a 2.17 times speedup. In federated learning (FL) applications, our scheme reduces the communication overhead by 13.96 times. In summary, our DL compression method can improve the image understanding and pattern recognition processes significantly.

PaperID: 880,

Authors: Min Wu, Weijun Li, Lina Yu, Linjun Sun, Jingyi Liu, Wenqiang Li

Affiliations: AnnLab, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China

Title: Discovering Mathematical Expressions Through DeepSymNet: A Classification-Based Symbolic Regression Framework

Abstract:
Symbolic regression (SR) is the process of finding an unknown mathematical expression given the input and output and has important applications in interpretable machine learning and knowledge discovery. The major difficulty of SR is that finding the expression structure is an NP-hard problem, which makes the entire process time-consuming. In this study, the solution of expression structures was regarded as a classification problem and solved by supervised learning such that SR can be solved quickly by using the solving experience. Techniques for classification tasks, such as equivalent label merging and sample balance, were used to enhance the robustness of the algorithm. We proposed a symbolic network called DeepSymNet to represent symbolic expressions to improve the performance of the algorithm. DeepSymNet has been proven to have a strong representation ability with a shorter label compared to the current popular representation methods, reducing the search space when predicting. Moreover, DeepSymNet conveniently decomposes SR into two smaller subproblems, which makes solving the problem easier. The proposed algorithm was tested on artificially generated expressions and public datasets and compared with other algorithms. The results demonstrate the effectiveness of the proposed algorithm.

PaperID: 881,

Authors: Ruihang Ji, Dongyu Li, Shuzhi Sam Ge, Haizhou Li

Affiliations: Department of Electrical and Computer Engineering, National University of Singapore, Queenstown, Singapore; School of Cyber Science and Technology, Beihang University, Beijing, China

Title: Tunnel Prescribed Control of Nonlinear Systems With Unknown Control Directions

Abstract:
This article solves the entry capture problem (ECP) such that any initial tracking error can be regulated into the prescribed performance constraints within a user-given time. The challenge lies in how to remove the initial condition limitation and to handle the ECP for nonlinear systems under unknown control directions and asymmetric performance constraints. For better tracking performance, we propose a unified tunnel prescribed performance (TPP) providing a strict and tight allowable set. With the aid of a scaling function, error self-tuning functions (ESFs) are then developed to make the control scheme suitable for any initial condition (including an initial constraint violation), where the initial values of the ESFs always satisfy the performance constraints. In lieu of the Nussbaum technique, an orientation function is introduced to deal with unknown control directions, which is also capable of reducing the control peaking problem. Using ESFs, together with TPP and an orientation function, the resulting tunnel prescribed control (TPC) leads to a solution for the underlying ECP, which also exhibits a low complexity level since no command filters or dynamic surface control is required. Finally, simulation results are provided to further demonstrate these theoretical findings.

PaperID: 882,

Authors: Ali Jameel Hashim, M. A. Balafar, Jafar Tanha, Aryaz Baradarani

Affiliations: Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, East Azarbaijan, Iran; Center for Diagnostic Imaging Research, Tessonics Inc., Windsor, ON, Canada

Title: AEVAE: Adaptive Evolutionary Autoencoder for Anomaly Detection in Time Series

Abstract:
Anomaly detection (AD) has witnessed substantial advancements in recent years due to the increasing need for identifying outliers in various engineering applications that undergo environmental adaptations. Consequently, researchers have focused on developing robust AD methods to enhance system performance. The primary challenge faced by AD algorithms lies in effectively detecting unlabeled abnormalities. This study introduces an adaptive evolutionary autoencoder (AEVAE) approach for AD in time-series data. The proposed methodology leverages the integration of unsupervised machine learning techniques with evolutionary intelligence to classify unlabeled data. The unsupervised learning model employed in this approach is the AE network. A systematic programming framework has been devised to transform AEVAE into a practical and applicable model. The primary objective of AEVAE is to detect and predict outliers in time-series data from unlabeled data sources. The effectiveness, speed, and functionality enhancements of the proposed method are demonstrated through its implementation. Furthermore, a comprehensive statistical analysis based on performance metrics is conducted to validate the advantages of AEVAE in terms of unsupervised AD.

PaperID: 883,

Authors: Shaofei Cai, Liang Li, Xinzhe Han, Shan Huang, Qi Tian, Qingming Huang

Affiliations: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, China; Tencent, Beijing, China; Cloud BU, Huawei Technologies, Shenzhen, China

Title: Semantic and Correlation Disentangled Graph Convolutions for Multilabel Image Recognition

Abstract:
Multilabel image recognition (MLR) aims to annotate an image with comprehensive labels and suffers from object occlusion or small object sizes within images. Although the existing works attempt to capture and exploit label correlations to tackle these issues, they predominantly rely on global statistical label correlations as prior knowledge for guiding label prediction, neglecting the unique label correlations present within each image. To overcome this limitation, we propose a semantic and correlation disentangled graph convolution (SCD-GC) method, which builds the image-specific graph and employs graph propagation to reason about the labels effectively. Specifically, we introduce a semantic disentangling module to extract categorywise semantic features as graph nodes and develop a correlation disentangling module to extract image-specific label correlations as graph edges. Performing graph convolutions on this image-specific graph allows for better mining of difficult labels with weak visual representations. Visualization experiments reveal that our approach successfully disentangles the dominant label correlations existing within the input image. Through extensive experimentation, we demonstrate that our method achieves superior results on the challenging Microsoft COCO (MS-COCO), PASCAL visual object classes (PASCAL-VOC), NUS web image dataset (NUS-WIDE), and Visual Genome 500 (VG-500) datasets. Code is available at GitHub: https://github.com/caigitrepo/SCDGC.

PaperID: 884,

Authors: Bin Du, Wei Xie, Yang Li, Qisong Yang, Weidong Zhang, Rudy R. Negenborn, Yusong Pang, Hongtian Chen

Affiliations: Ocean Institute, Northwestern Polytechnical University, Taicang, China; Department of Automation, Shanghai Jiao Tong University, Shanghai, China; College of Mechanical and Vehicle Engineering, Hunan University, Changsha, China; Xi’an Institute of High-Tech, Xi’an, China; Department of Maritime and Transport Technology, Delft University of Technology, Delft, The Netherlands

Title: Safe Adaptive Policy Transfer Reinforcement Learning for Distributed Multiagent Control

Abstract:
Multiagent reinforcement learning (RL) training is usually difficult and time-consuming due to mutual interference among agents. Safety concerns make an already difficult training process even harder. This study proposes a safe adaptive policy transfer RL approach for multiagent cooperative control. Specifically, a pioneer and follower off-policy policy transfer learning (PFOPT) method is presented to help follower agents acquire knowledge and experience from a single well-trained pioneer agent. Notably, the designed approach can transfer both the policy representation and sample experience provided by the pioneer policy in the off-policy learning. More importantly, the proposed method can adaptively adjust the learning weight of prior experience and exploration according to the Wasserstein distance between the policy probability distributions of the pioneer and the follower. Case studies show that the distributed agents trained by the proposed method can complete a collaborative task and acquire the maximum rewards while minimizing the violation of constraints. Moreover, the proposed method can also achieve satisfactory performance in terms of learning speed and success rate.

PaperID: 885,

Authors: Saiyang Na, Yuzhi Guo, Feng Jiang, Hehuan Ma, Jean Gao, Junzhou Huang

Affiliations: Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, USA

Title: Segment Any Cell: A SAM-Based Auto-Prompting Fine-Tuning Framework for Nuclei Segmentation

Abstract:
In the rapidly evolving field of AI research, foundational models like BERT and GPT have significantly advanced language and vision tasks. The advent of pretrain-prompting models, such as ChatGPT and segment anything model (SAM), has further revolutionized image segmentation. However, their applications in specialized areas, particularly in nuclei segmentation within medical imaging, reveal a key challenge: the generation of high-quality, informative prompts is as crucial as applying state-of-the-art (SOTA) fine-tuning techniques on foundation models. To address this, we introduce segment any cell (SAC), an innovative framework that enhances SAM specifically for nuclei segmentation. SAC integrates a low-rank adaptation (LoRA) within the attention layer of the Transformer to improve the fine-tuning process, outperforming existing SOTA methods. It also introduces an innovative auto-prompt generator that produces effective prompts to guide segmentation, a critical factor in handling the complexities of nuclei segmentation in biomedical imaging. Our extensive experiments demonstrate the superiority of SAC in nuclei segmentation tasks, proving its effectiveness as a tool for pathologists and researchers. Our contributions include a novel prompt generation strategy, automated adaptability for diverse segmentation tasks, the innovative application of low-rank attention adaptation in SAM, and a versatile framework for semantic and instance segmentation challenges.
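
The low-rank adaptation mentioned in this abstract can be illustrated with a minimal NumPy sketch. It is not the SAC implementation: the layer sizes, the rank `r`, the scaling `alpha`, and the helper name `lora_linear` are all assumptions chosen for illustration. The sketch only shows the general LoRA idea of keeping a pretrained weight frozen and learning a low-rank update next to it.

```python
import numpy as np


def lora_linear(x, W, A, B, alpha):
    """Apply a frozen weight W plus the low-rank update (alpha / r) * B @ A."""
    r = A.shape[0]                      # rank of the update
    W_eff = W + (alpha / r) * (B @ A)   # effective weight, shape (d_out, d_in)
    return x @ W_eff.T


# Hypothetical sizes; a real attention projection would be much larger.
rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable factor, small random init
B = np.zeros((d_out, r))                # trainable factor, zero init

x = rng.normal(size=(4, d_in))          # batch of 4 token embeddings
y = lora_linear(x, W, A, B, alpha=4.0)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only `A` and `B` (32 values here, against 64 in `W`) would be updated during fine-tuning.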

PaperID: 886,

Authors: Xu Yang, Jiyuan Feng, Yongxin Tong, Lingzhi Wang, Songyue Guo, Binxing Fang, Qing Liao

Affiliations: Department of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China; State Key Laboratory of Software Development Environment and Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Computer Science, Beihang University, Beijing, China

Title: DA-PFL: Dynamic Affinity Aggregation in Personalized Federated Learning Under Class Imbalance

Abstract:
Personalized federated learning (PFL), which learns a personalized model for each client, has become a hot research topic. Existing PFL models prefer to aggregate clients with similar data distributions to improve the performance of the learned models. However, similarity-based PFL methods may exacerbate the class imbalance problem. In this article, we propose a novel dynamic affinity-based PFL (DA-PFL) model to alleviate the class imbalance problem during federated learning. Specifically, we build an affinity metric from a complementary perspective to guide which clients should be aggregated. We then design a dynamic aggregation strategy that adjusts client aggregation based on the affinity metric in each round, thereby reducing the risk of class imbalance. Extensive experiments on four real-world datasets demonstrate that the proposed DA-PFL model can significantly improve the accuracy of each client compared with state-of-the-art methods.

PaperID: 887,

Authors: Mingjin Zhang, Jin Feng, Handi Yang, Jie Guo, Yunsong Li, Xinbo Gao

Affiliations: State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, China; School of Electronic Engineering, Xidian University, Xi’an, China

Title: IRPruneDeXt: Efficient Infrared Small Target Detection via Musical Wavelet-Regularized Channel Pruning

Abstract:
Infrared small target detection (IRSTD) refers to detecting faint targets in infrared (IR) images, which has achieved notable progress with the advent of deep learning. However, the drive for improved detection accuracy has led to larger, intricate models with redundant parameters, causing storage and computation inefficiencies. In this pioneering study, we introduce the concept of utilizing network pruning to enhance the efficiency of IRSTD. Due to the challenge posed by low signal-to-noise ratios (SNRs) and the absence of detailed semantic information in IR images, directly applying existing pruning techniques yields suboptimal performance. To address this, we propose a novel wavelet structure-regularized multidimensional musical scale soft channel pruning (SCP) method, giving rise to the efficient IRPruneDeXt model. Our approach involves representing the weight matrix in the wavelet domain and formulating a wavelet channel pruning (WCP) strategy. We incorporate wavelet regularization to induce structural sparsity without incurring extra memory usage. Additionally, we design a multidimensional musical scale soft channel reconstruction (MMSCR) method that adapts the strategy across temporal and spatial dimensions to preserve key target information and prevent premature pruning. By leveraging interactions between criteria, it balances pruning and reconstruction through a musical scale feedback effect, achieving an optimal sparse structure while maintaining overall sparsity. Through extensive experiments on many widely used benchmarks, our IRPruneDeXt method surpasses established techniques in both model complexity and accuracy. Specifically, when employing U-net as the baseline network, IRPruneDeXt achieves a 65.68% reduction in parameters and a 51.77% decrease in floating-point operations (FLOPs) while improving intersection over union (IoU) from 73.31% to 76.17% and normalized IoU (nIoU) from 70.92% to 75.08%. The code is available at github.com/hd0013/IRPruneDet

PaperID: 888,

Authors: Wei-Yen Hsu, Shih-Hao Huang

Affiliations: College of Artificial Intelligence, National Yang Ming Chiao Tung University, Tainan, Taiwan; Department of Information Management, National Chung Cheng University, Chiayi, Taiwan

Title: Progressive Structure Preservation and Detail Refinement for Remote Sensing Single-Image Super-Resolution

Abstract:
Recent advances in deep-learning-based remote sensing image super-resolution (RSISR) have garnered significant attention. Conventional models typically perform upsampling at the end of the architecture, which reduces computational effort but leads to information loss and limits image quality. Moreover, the structural complexity and texture diversity of remote sensing images pose challenges in detail preservation. While transformer-based approaches improve global feature capture, they often introduce redundancy and overlook local details. To address these issues, we propose a novel progressive structure preservation and detail refinement super-resolution (PSPDR-SR) model, designed to enhance both structural integrity and fine details in RSISR. The model comprises two primary subnetworks: the structure-aware super-resolution (SaSR) subnetwork and the detail recovery and refinement (DR&R) subnetwork. To efficiently leverage multilayer and multiscale feature representations, we introduce coarse-to-fine dynamic information transmission (C2FDIT) and fine-to-coarse dynamic information transmission (F2CDIT) modules, which facilitate the extraction of richer details from low-resolution (LR) remote sensing images. These modules integrate transformers and convolutional long short-term memory (ConvLSTM) blocks to form dynamic information transmission modules (DITMs), enabling effective bidirectional feature transmission both horizontally and vertically. This method ensures comprehensive feature fusion, mitigates redundant information, and preserves essential extracted features within the deep network. 
Experimental results demonstrate that PSPDR-SR outperforms the state-of-the-art approaches on two benchmark datasets in both quantitative and qualitative evaluations, excelling in structure preservation and detail enhancement across various metrics, including SSIM, MS_SSIM, learned perceptual image patch similarity (LPIPS), deep image structure and texture similarity (DISTS), spatial correlation coefficient (SCC), and spectral angle mapper (SAM).

PaperID: 889,

Authors: Xuan Zhang, Wang Zheng, Zhigang Li, Yi Yang, Weijia Liu, Hongxin Cai, Junru Zhu, Jingyu Liu, Bin Hu, Qunxi Dong

Affiliations: Key Laboratory of Brain Health Intelligent Evaluation and Intervention, Ministry of Education, and the School of Medical Technology, Beijing Institute of Technology, Beijing, China

Title: Constraint-Driven Causal Representation Learning for Vigilance Robust Estimation in Brain-Computer Interface

Abstract:
Vigilance estimation is a critical task within the field of brain–computer interfaces, extensively applied in monitoring and optimizing user states during human–machine interaction using electroencephalography (EEG). However, most existing vigilance prediction frameworks are prone to spurious correlations stemming from inherent biases in collected data. These biases involve relevant but vigilance-independent information, so the resulting models may lack robustness when applied to different data distributions, i.e., out-of-distribution (OOD) scenarios. The core idea of this study is to learn constraints that capture causal information from the input based on the assumed underlying data generating process. Leveraging the disentanglement and invariance principles behind the assumptions, we propose a constraint-driven causal representation learning (CCRL) framework to identify and separate spurious latent variables from biased training data for generalized vigilance estimation. The CCRL training process consists of two phases: self-supervised pretraining and constraint-driven causal information disentanglement. In the first phase, based on the masked autoencoder (MAE) architecture, unlabeled training data are used for reconstructing pretext tasks to capture the comprehensive and intrinsic contextual information from EEG data, which provides a powerful input for downstream disentanglement learning. In the second phase, we propose a novel disentanglement strategy to learn spurious-free latent representations causally related to the vigilance state driven by adversarial and invariance constraints. Comprehensive validation experiments conducted on two well-known public datasets demonstrate the effectiveness and superiority of the proposed framework. In general, this work has promising implications for addressing OOD challenges in vigilance estimation.

PaperID: 890,

Authors: Hui He, Qi Zhang, Kun Yi, Xiaojun Xue, Shoujin Wang, Liang Hu, Longbing Cao

Affiliations: School of Medical Technology, Beijing Institute of Technology, Beijing, China; School of Computer Science and Technology, Tongji University, Shanghai, China; North China Institute of Computing Technology, Beijing, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Data Science Institute, University of Technology Sydney, Ultimo, NSW, Australia; School of Computing, Macquarie University, Macquarie Park, NSW, Australia

Title: Robust Multivariate Time Series Forecasting Against Intraseries and Interseries Transitional Shift

Abstract:
The nonstationary nature of real-world multivariate time series (MTS) data presents forecasting models with the formidable challenge of a time-variant distribution of time series, referred to as distribution shift. Existing studies on the distribution shift mostly adhere to adaptive normalization techniques for alleviating temporal mean and covariance shifts or time-variant modeling for capturing temporal shifts. Despite improving model generalization, these normalization-based methods often assume a time-invariant transition between outputs and inputs but disregard specific intraseries/interseries correlations, while time-variant models overlook the intrinsic causes of the distribution shift. This limits the model’s expressiveness and interpretability in tackling the distribution shift for MTS forecasting. To mitigate such a dilemma, we present a unified Probabilistic Graphical Model to Jointly capture intraseries/interseries correlations and model the time-variant transitional distribution and instantiate a neural framework called JointPGM for nonstationary MTS forecasting. Specifically, JointPGM first employs multiple Fourier basis functions to learn dynamic time factors and designs two distinct learners: intraseries and interseries learners. The intraseries learner effectively captures temporal dynamics by utilizing temporal gates, while the interseries learner explicitly models spatial dynamics through multihop propagation, incorporating Gumbel-softmax sampling. These two types of series dynamics are subsequently fused into a latent variable, which is inversely employed to infer time factors, generate a final prediction, and perform the reconstruction. We validate the effectiveness and efficiency of JointPGM through extensive experiments on six highly nonstationary MTS datasets, achieving state-of-the-art (SOTA) forecasting performance.

PaperID: 891,

Authors: Raman Goyal, Mohamed Naveed Gul Mohamed, Ran Wang, Aayushman Sharma, Suman Chakravorty

Affiliations: Department of Aerospace Engineering, Texas A&M University, College Station, TX, USA

Title: Information-State-Based Reinforcement Learning for the Control of Partially Observed Nonlinear Systems

Abstract:
This article develops a model-based reinforcement learning (RL) approach to the closed-loop control of nonlinear dynamical systems with a partial nonlinear observation model. We propose an “information-state”-based approach to rigorously transform the partially observed problem into a fully observed problem where the information state consists of the past several observations and control inputs. We further show the equivalence of the transformed and the initial partially observed optimal control problems and provide the conditions to solve for the deterministic optimal solution. We develop a data-based generalization of the iterative linear quadratic regulator (ILQR) for the RL of partially observed systems using a local linear time-varying model of the information-state dynamics approximated by an autoregressive-moving-average (ARMA) model that is generated using only the input–output data. This approach allows us to design a local perturbation feedback control law that provides an optimum solution to the partially observed feedback design problem locally. The efficacy of the developed method is shown by controlling complex high-dimensional nonlinear dynamical systems in the presence of model and sensing uncertainty.

PaperID: 892,

Authors: John Sum, Chi-Sing Leung, Janet C. C. Chang

Affiliations: Institute of Technology Management, National Chung Hsing University, Taichung, Taiwan; Electrical Engineering Department, City University of Hong Kong, Kowloon Tong, Hong Kong

Title: A Fast Wang kWTA With Application in Sealed-Bid Uniform Price Auction

Abstract:
In this brief, two fast discrete-time Wang kWTA (Fast Wang kWTA) algorithms are presented with an application in sealed-bid uniform price auctions. These algorithms can be implemented in either a centralized or a distributed manner. The structure of the Fast Wang kWTA is essentially the same as that of the original Wang k-winner-take-all (kWTA), except that our state update method is based on the bisection method instead of gradient descent. As a result, the number of iterations needed to obtain the correct output is greatly reduced. Moreover, this number depends only on the guess of the maximum input value. It is independent of the number of inputs, the number of winners, and the learning step size. The number of iterations is far smaller than the number required in the original Wang kWTA. Consequently, the Fast Wang kWTA is particularly suitable for solving the winner (resp. price) determination problem in real time and in a distributed manner for a sealed-bid auction. In addition, the Fast Wang kWTA can ensure bidding price protection even if the communicated data are unencrypted and leaked.
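
The bisection idea behind this abstract can be sketched as a threshold search: keep halving an interval until exactly k inputs lie above its midpoint. The function below is a centralized illustration only, not the authors' discrete-time Wang kWTA dynamics. The name `kwta_bisection`, the assumption that inputs lie in `[0, upper_bound]`, and the tolerance parameters are all hypothetical.

```python
def kwta_bisection(inputs, k, upper_bound, tol=1e-9, max_iter=200):
    """Find a threshold by bisection so that exactly k inputs exceed it.

    Assumes every input lies in [0, upper_bound] and that the k-th and
    (k+1)-th largest inputs are distinct. Returns (winners, threshold,
    iterations), where winners[i] is 1 if input i is among the k largest.
    """
    if not 0 < k < len(inputs):
        raise ValueError("k must satisfy 0 < k < len(inputs)")
    lo, hi = 0.0, float(upper_bound)
    for iteration in range(1, max_iter + 1):
        threshold = (lo + hi) / 2.0
        count = sum(1 for u in inputs if u > threshold)
        if count == k:
            winners = [1 if u > threshold else 0 for u in inputs]
            return winners, threshold, iteration
        if count > k:
            lo = threshold   # too many winners: raise the threshold
        else:
            hi = threshold   # too few winners: lower the threshold
        if hi - lo < tol:
            break
    raise RuntimeError("no separating threshold found (tied inputs?)")


# Five sealed bids, two items for sale.
bids = [0.9, 0.2, 0.75, 0.4, 0.6]
winners, threshold, iterations = kwta_bisection(bids, k=2, upper_bound=1.0)
# winners == [1, 0, 1, 0, 0], reached after 3 bisection steps
```

In this sketch the interval halves at every step, so the iteration count is governed by `upper_bound` and the gap between the k-th and (k+1)-th largest inputs rather than by how many inputs there are, which is in the spirit of the independence property the abstract describes.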

PaperID: 893,

Authors: Mengqing Ye, C. L. Philip Chen, Tong Zhang

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: Hierarchical Dynamic Graph Convolutional Network With Interpretability for EEG-Based Emotion Recognition

Abstract:
Graph convolutional networks (GCNs) have shown great prowess in learning topological relationships among electroencephalogram (EEG) channels for EEG-based emotion recognition. However, most existing GCN-only methods are designed with a single spatial pattern, lacking connectivity enhancement within local functional regions and ignoring the data dependencies of the original EEG data. In this article, a hierarchical dynamic GCN (HD-GCN) is proposed to explore dynamic multilevel spatial information among EEG channels, with discriminative features of EEG signals as auxiliary information. Specifically, representation learning in topological space consists of two branches: one for extracting global dynamic information and one for exploring augmentation information in local functional regions. In each branch, a layerwise adjacency matrix is utilized to enrich the expressive power of GCN. Furthermore, a data-dependent auxiliary information module (AIM) is developed to capture multidimensional fusion features. Extensive experiments on two public datasets, SJTU emotion EEG dataset (SEED) and DREAMER, demonstrate that the proposed method consistently exceeds state-of-the-art methods. An interpretability analysis of the proposed model reveals the active brain regions and important electrode pairs related to emotion.

PaperID: 894,

Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, Dongsheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

Affiliations: School of Artificial Intelligence and Robotics, Hunan University, Changsha, China

Title: Learning Granularity-Aware Affordances From Human-Object Interaction for Tool-Based Functional Dexterous Grasping

Abstract:
To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we propose a granularity-aware affordance feature extraction method for locating functional affordance areas and predicting dexterous coarse gestures. We study the intrinsic mechanisms of human tool use. On the one hand, we use fine-grained affordance features of object-functional finger contact areas to locate functional affordance regions. On the other hand, we use highly activated coarse-grained affordance features in hand–object interaction regions to predict grasp gestures. Additionally, we introduce a model-based postprocessing module that transforms affordance localization and gesture prediction into executable robotic actions. This forms GAAF-Dex, a complete framework that learns granularity-aware affordances from human–object interaction to enable tool-based functional grasping with dexterous hands. Unlike fully supervised methods that require extensive data annotation, we employ a weakly supervised approach to extract relevant cues from exocentric (Exo) images of hand–object interactions to supervise feature extraction in egocentric (Ego) images. To support this approach, we have constructed a small-scale dataset, the functional affordance hand (FAH)-object interaction dataset, which includes nearly 6k images: Exo images of functional hand–object interaction and Ego images of 18 commonly used tools performing six tasks. Extensive experiments on the dataset demonstrate that our method outperforms state-of-the-art methods, and real-world localization and grasping experiments validate the practical applicability of our approach.
The source code and the established dataset are available at https://github.com/yangfan293/GAAF-DEX

PaperID: 895,

Authors: Yue Zhao, Ruoyu Wu, Pengyu Dai, Hong Huang, Yang Liu

Affiliations: School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China; Department of Field Medical Equipment, Daping Hospital, Army Medical University, Chongqing, China; Institute of Science Tokyo, Kanagawa, Japan; Key Laboratory of Optoelectronic Technology and Systems, Ministry of Education, Chongqing University, Chongqing, China; Department of Orthodontics, The Affiliated Stomatological Hospital of Chongqing Medical University, Chongqing, China

Title: SMTLNet: Domain Prior-Inspired Tooth Segmentation Based on Self-Supervised Manifold Transfer Learning

Abstract:
Accurate identification and delineation of teeth in cone-beam computed tomography (CBCT) images are crucial in the advancement of digital dentistry technology. Teeth exhibit high interclass similarity and often have fuzzy boundaries. In addition, it is difficult to obtain teeth samples due to the time-consuming annotation process. However, existing methods typically fail to incorporate this domain-specific prior information under limited labeled samples, which limits the improvement of segmentation performance. Based on the intrinsic characteristics of the tooth CBCT images, a self-supervised manifold transfer learning network (SMTLNet) is proposed to improve segmentation accuracy. Initially, an object-oriented self-supervised pretraining approach is designed to fully explore valuable image representations from unannotated images, and this helps reduce dependence on labeled samples. Furthermore, a manifold optimization strategy is employed to regularize the segmentation model to separate interclass samples while compacting intraclass neighbors. Finally, to address the issue of blurred tooth boundaries, a multiscale boundary constraint module is developed to extract multiscale boundary-aware features, and more discriminative tooth descriptions can be acquired in this way. The proposed SMTLNet method is evaluated on clinical datasets containing diverse challenging cases (e.g., impacted wisdom teeth, crowded dentition), and it achieves state-of-the-art performance with dice similarity coefficients (DSCs) of 91.8%/89.08% and Jaccard similarities (JSs) of 86.71%/82.87% under full (100%) and limited (20%) training data regimes, respectively. The method maintains anatomical precision with Hausdorff distances (HDs) of 1.41 mm (high-resource) and 2.35 mm (low-resource), demonstrating strong clinical applicability in digital dentistry workflows.
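
For readers unfamiliar with the two overlap metrics reported in this abstract, the following sketch computes the Dice similarity coefficient and the Jaccard similarity for one pair of binary masks. It uses the standard definitions only; the function name `dice_and_jaccard`, the flattened-list mask format, and the empty-mask convention are assumptions, and the paper's own evaluation protocol (for example, how scores are averaged over cases) is not specified here.

```python
def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    pred_set = {i for i, v in enumerate(pred) if v}
    target_set = {i for i, v in enumerate(target) if v}
    intersection = len(pred_set & target_set)
    total = len(pred_set) + len(target_set)
    if total == 0:
        return 1.0, 1.0          # convention chosen here: two empty masks agree
    union = total - intersection
    return 2.0 * intersection / total, intersection / union


# Toy example with six voxels.
pred = [1, 1, 0, 1, 0, 0]
target = [1, 0, 0, 1, 1, 0]
dice, jaccard = dice_and_jaccard(pred, target)
# dice == 2/3 and jaccard == 0.5 for this pair
```

For a single mask pair the two scores are linked by J = D / (2 - D), but scores averaged over many cases need not satisfy that identity exactly.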

PaperID: 896,

Authors: Zhihao Wen, Yuan Fang, PengCheng Wei, Fayao Liu, Zhenghua Chen, Min Wu

Affiliations: School of Computing and Information Systems, Singapore Management University, Bras Basah, Singapore; Information Systems Technology and Design Pillar, Singapore University of Technology and Design, Tampines, Singapore; Institute for Infocomm Research, A*STAR, Fusionopolis, Singapore

Title: Temporal and Heterogeneous Graph Neural Network for Remaining Useful Life Prediction

Abstract:
Predicting remaining useful life (RUL) plays a crucial role in the prognostics and health management of industrial systems that involve a variety of interrelated sensors. Given a constant stream of time-series sensory data from such systems, deep learning (DL) models have risen to prominence at identifying complex, nonlinear temporal dependencies in these data. In addition to the temporal dependencies of individual sensors, spatial dependencies emerge as important correlations among these sensors, which can be naturally modeled by a temporal graph that describes time-varying spatial relationships. However, the majority of existing studies have relied on capturing discrete snapshots of this temporal graph, a coarse-grained approach that leads to a loss of temporal information. Moreover, given the variety of heterogeneous sensors, it becomes vital that such inherent heterogeneity is leveraged for RUL prediction in temporal sensor graphs. To capture the nuances of the temporal and spatial relationships and heterogeneous characteristics in an interconnected graph of sensors, we introduce a novel model named temporal and heterogeneous graph neural networks (THGNNs). Specifically, THGNN aggregates historical data from neighboring nodes to accurately capture the temporal dynamics and spatial correlations within the stream of sensor data in a fine-grained manner. Moreover, the model leverages feature-wise linear modulation (FiLM) to address the diversity of sensor types, significantly improving the model’s capacity to learn the heterogeneity in the data sources. Finally, we have validated the effectiveness of our approach through comprehensive experiments. Our empirical findings demonstrate significant advancements on the N-CMAPSS dataset, achieving improvements of up to 19.2% and 31.6% in terms of two different evaluation metrics over state-of-the-art methods.
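
Feature-wise linear modulation (FiLM), which this abstract uses to handle different sensor types, scales and shifts each feature with parameters that depend on a conditioning input. The sketch below is a deliberately simplified illustration and not the THGNN code. In a trained model the scale and shift would be produced by learned functions; here they come from a hypothetical lookup table `FILM_PARAMS` keyed by made-up sensor types.

```python
def film(features, gamma, beta):
    """Feature-wise linear modulation: gamma * x + beta, element by element."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


# Hypothetical (gamma, beta) pairs per sensor type, standing in for the
# output of a learned conditioning network.
FILM_PARAMS = {
    "temperature": ([1.0, 0.5], [0.0, 0.1]),
    "pressure": ([2.0, 1.0], [-1.0, 0.0]),
}


def modulate(sensor_type, features):
    """Modulate a sensor's feature vector according to its type."""
    gamma, beta = FILM_PARAMS[sensor_type]
    return film(features, gamma, beta)


pressure_out = modulate("pressure", [3.0, 4.0])
# pressure_out == [5.0, 4.0]
```

The same feature vector is transformed differently depending on the sensor type, which is how a shared network can treat heterogeneous sensors in a type-specific way.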

PaperID: 897,

Authors: Sayan Saha, Monidipa Das, Sanghamitra Bandyopadhyay

Affiliations: Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India; Indian Institute of Science Education and Research Kolkata, Mohanpur, India

Title: Gen-GraphEx: Generative In-Distribution Graph Explanations for Time-Efficient Model-Level Interpretability of GNNs

Abstract:
Graph neural networks (GNNs) have become the prevailing methodology for addressing graph data-related tasks, permeating critical domains like recommendation systems and drug development. The necessity for trustworthiness and interpretability of GNNs has risen to the forefront, especially given their direct impact on end users’ lives. To address this need, we present Gen-GraphEx, a model-agnostic, model-level explanation method that prioritizes user-centricity by eliminating the need for access to the hidden layers of the GNN model it seeks to explain. Given a particular class label, Gen-GraphEx learns a graph generative model (GGM) that produces explanation graphs that not only contain discriminative patterns that the GNN has learned for that class but also lie in distribution with real graphs that belong to that class according to the GNN. Unlike existing state-of-the-art models, Gen-GraphEx also has the unique ability to interpolate the GGMs of two target classes to generate instances that lie near the decision boundary of the two classes, giving a deeper insight into the model’s decision-making. Its advantages over existing methods in the literature also include nonreliance on another subsequent deep learning module for explanation generation, the ability to generate graphs with various node and edge features, and greater computational efficiency. Extensive validation and thorough comparative analysis of the proposed approach are carried out across an array of real and synthetic datasets, consistently demonstrating its exceptional performance and its competitiveness with state-of-the-art model-level explainers. Our code is available at https://github.com/amisayan/Gen-GraphEx

PaperID: 898,

Authors: Boyu Zhao, Mengmeng Zhang, Wei Li, Yunhao Gao, Junjie Wang

Affiliations: School of Information and Electronics, Beijing Institute of Technology, Beijing, China

Title: Domain Information Mining and State-Guided Adaptation Network for Multispectral Image Segmentation

Abstract:
Segment anything model (SAM), as a prompt-based image segmentation foundation model, demonstrates strong task versatility and domain generalization (DG) capabilities, providing a new direction for solving cross-scene segmentation tasks. However, SAM still has limitations in multispectral cross-domain segmentation tasks, mainly in two respects: 1) insufficient information utilization, namely the neglect of nonvisible spectral information and the shift information contained in source domain (SD) samples and target domain (TD) samples; and 2) lack of cross-domain strategies, which leads to insufficient cross-domain adaptation (DA) ability in downstream tasks. To address these challenges, we combine the respective advantages of the masked autoencoder (MAE) and cross-domain strategies and propose an improved SAM DA network structure called domain information mining and state-guided adaptation network (DSAnet), aiming to enhance SAM’s performance in multispectral cross-domain segmentation tasks from both data and task levels. At the data level, DSAnet incorporates a style masking learning component, which randomly masks image features and replaces them with domain-specific learnable tokens, integrated with the image reconstruction task, to mine the style information and domain invariance of the image itself. At the task level, DSAnet introduces domain state learning and style-guided segmentation: domain state learning, through a state sequence modeling approach, designs specific state representations for SD and TD to capture interdomain differences, thereby reducing task shift. Meanwhile, the learned domain state information can be directly applied to the inference stage. Style prompt segmentation guides the segmentation training process of SD images with TD style prompts, improving SAM’s adaptability in cross-domain multispectral segmentation downstream tasks.
Extensive experiments on three multitemporal multispectral image (MSI) datasets demonstrate the superiority of the proposed method compared to state-of-the-art cross-domain strategies and SAM variant methods.

PaperID: 899,

Authors: Daowen Xiong, Liangliang Hu, Jiahao Jin, Yikang Ding, Congming Tan, Jing Zhang, Yin Tian

Affiliations: School of Life Health Information Science and Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Life Health Information Science and Engineering, the School of Computer Science and Technology, and the Institute for Advanced Sciences, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: Interpretable Cross-Modal Alignment Network for EEG Visual Decoding With Algorithm Unrolling

Abstract:
Accurate decoding in electroencephalography (EEG) technology, particularly for rapid visual stimuli, remains challenging due to the low signal-to-noise ratio (SNR). Additionally, existing neural networks struggle with issues related to generalization and interpretability. This article proposes a cross-modal aligned network, E2IVAE, which leverages shared information from multiple modalities for self-supervised alignment of EEG to images in order to extract visual perceptual information. E2IVAE features a novel EEG encoder, ISTANet, based on algorithm unrolling. This network framework significantly enhances the accuracy and stability of EEG decoding for object recognition in novel classes while reducing the extensive neural data typically required for training neural decoders. The proposed ISTANet employs algorithm unrolling to transform the multilayer sparse coding algorithm into an end-to-end format, extracting features from noisy EEG signals while incorporating the interpretability of traditional machine learning. The experimental results demonstrate that our method achieves a state-of-the-art (SOTA) top-1 accuracy of 62.39% and top-5 accuracy of 88.98% on a comprehensive rapid serial visual presentation (RSVP) dataset for public comparison in a 200-class zero-shot neural decoding task. Additionally, ISTANet enables visualization and analysis of multiscale atom features and overall reconstruction features, exploring biological plausibility across temporal, spatial, and spectral dimensions. On another, more challenging large-scale RSVP dataset, the proposed framework also performs significantly above chance level, demonstrating its robustness and generalization. This research provides critical insights into neural decoding and brain–computer interfaces (BCIs) within the fields of cognitive science and artificial intelligence.
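
The abstract above refers to unrolling a sparse coding algorithm. As a rough illustration only, the sketch below runs a fixed number of plain ISTA (iterative shrinkage-thresholding) iterations, where each iteration plays the role of one layer of an unrolled network. It is not the authors' ISTANet: the function names, the dense dictionary `D`, and the fixed `step` and `lam` values are assumptions made for this example, whereas an unrolled network would typically learn such quantities per layer.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(D, x, lam, step, n_layers):
    """Approximately solve min_z 0.5*||x - D z||^2 + lam*||z||_1.

    D is an m-by-k dictionary given as a list of rows, x is a length-m signal.
    Each loop pass corresponds to one "layer" of an unrolled network
    (hypothetical framing; here every layer shares the same D, step, and lam).
    """
    m, k = len(D), len(D[0])
    z = [0.0] * k
    for _ in range(n_layers):
        # Residual D z - x of the current reconstruction.
        residual = [sum(D[i][j] * z[j] for j in range(k)) - x[i] for i in range(m)]
        # Gradient of the quadratic term: D^T (D z - x).
        grad = [sum(D[i][j] * residual[i] for i in range(m)) for j in range(k)]
        # Gradient step followed by soft thresholding.
        z = [soft_threshold(z[j] - step * grad[j], step * lam) for j in range(k)]
    return z
```

With an identity dictionary the iteration reduces to soft thresholding of the input, so small entries of `x` are set exactly to zero while large ones are shrunk by `lam`.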

PaperID: 900,

Authors: Jiufang Chen, Ye Yuan, Xin Luo, Xinbo Gao

Affiliations: School of Computer Science, Civil Aviation Flight University of China, Deyang, China; College of Computer and Information Science, Southwest University, Chongqing, China; School of Electronic Engineering, Xidian University, Xi’an, China

Title: An Adaptive Neighborhood-Resonated Graph Convolution Network for Undirected Weighted Graph Representation

Abstract:
An undirected weighted graph (UWG) is a fundamental data representation in various real applications. A graph convolutional network (GCN) is frequently utilized for representation learning on a UWG. Nevertheless, existing GCNs only consider a node’s neighborhood during the embedding propagation, which decreases their representation learning capability due to the information loss in the modeling phase. Motivated by this observation, this study proposes an adaptive neighborhood-resonated graph convolution network (ANR-GCN) with the following ideas: 1) establishing the weighted embedding propagation with the consideration of link weights in a UWG, thereby incorporating the interaction strength of each node pair into the ANR-GCN model; 2) building the neighborhood-regularization (NR) to make each node resonate with its neighborhood, thus reinforcing the informative neighborhood information for improving the ANR-GCN’s capability to represent the complex topology of the target UWG; and 3) diversifying the NR effects following the attention principle for guaranteeing the ANR-GCN’s learning capacity. The proposed ANR-GCN’s representation learning ability on a UWG is theoretically guaranteed from the perspectives of bounded generalization error and uniform stability. Extensive experiments on four UWG datasets illustrate that the proposed ANR-GCN significantly outperforms state-of-the-art GCNs in missing edge detection in a UWG.

PaperID: 901,

Authors: Leqi Shen, Tianxiang Hao, Tao He, Yifeng Zhang, Pengzhang Liu, Sicheng Zhao, Jungong Han, Guiguang Ding

Affiliations: Beijing National Research Center for Information Science and Technology (BNRist) and the School of Software, Tsinghua University, Beijing, China; GRG Banking Equipment Company Ltd., Guangzhou, China; JD.com Inc., Beijing, China; Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing, China

Title: Temporal Modeling With Frozen Vision-Language Foundation Models for Parameter-Efficient Text-Video Retrieval

Abstract:
Temporal modeling plays an important role in the effective adaptation of powerful pretrained text–image foundation models to text–video retrieval. However, existing methods often rely on additional heavy trainable modules, such as a transformer or BiLSTM, which are inefficient. In contrast, we avoid introducing such heavy components by leveraging frozen foundation models. To this end, we propose temporal modeling with frozen vision–language foundation models (TFVL) to model the temporal dynamics with fixed encoders. Specifically, text encoder temporal modeling (TextTemp) and image encoder temporal modeling (ImageTemp) apply frozen text and image encoders within the video head and video backbone, respectively. TextTemp uses a frozen text encoder to interpret frame representations as “visual words” within a temporal “sentence,” capturing temporal dependencies. On the other hand, ImageTemp uses a frozen image encoder to treat all frame tokens as a unified visual entity, learning spatiotemporal information. The total trainable parameters of our method, comprising a lightweight projection and several prompt tokens, are significantly fewer than those in other existing methods. We evaluate the effectiveness of our method on MSR-VTT, DiDeMo, ActivityNet, and LSMDC. Compared with full fine-tuning on MSR-VTT, our TFVL achieves an average 3.25% gain in R@1 with merely 0.35% of the parameters. Extensive experiments demonstrate that the proposed TFVL outperforms state-of-the-art methods with significantly fewer parameters.

PaperID: 902,

Authors: Wenbin He, Jianxu Mao, Yaonan Wang, Zhe Li, Hui Zhang

Affiliations: National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha, Hunan, China; National Engineering Research Center of Robot Visual Perception and Control Technology and the School of Robotics, Hunan University, Changsha, Hunan, China

Title: Contrastive Learning Framework With Cross-Sensor Adaptive Signal Representation for Fault Diagnosis

Abstract:
Although multisource sensor (MS) signal-based mechanical fault diagnosis (MFD) can significantly improve the diagnostic performance, the existing methods often lack sufficient adaptability and generalization when retraining on single-sensor signals or inferring from partial sensor signals. Thus, a general two-stage signal representation contrastive learning fault diagnosis framework (T-SCF) is proposed to adapt the trained model to varying numbers of sensor signals. This framework enhances model robustness and data fusion by comparing sensor signal views, offering a new approach for information fusion, fault detection, and classification in MFD. In the first stage, an adaptive contrastive algorithm is proposed to generate contrastive samples (C-Ss) and contrastive labels (C-Ls) for MS signals. Then, a supervised contrastive loss (SCL) is designed to minimize the similarity between MS signals of different faults while maximizing the similarity between those of identical faults. A parallel encoder architecture enables SCL to merge and contrast the features of different sensor signals during training. This strategy preserves the time-domain dimension properties of different sensors during the training of the second-stage classifier, thereby improving the adaptability of the model to different sensor signals without affecting the global information. The effectiveness of the method was verified along multiple evaluation dimensions using two public datasets and one self-built dataset.
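
The supervised contrastive idea described above (pull together embeddings that share a label, push apart those that do not) can be sketched with a generic supervised contrastive loss. This is a common textbook-style formulation, not the SCL designed in the paper: the function names, the cosine similarity, and the `temperature` value are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average, over anchors that have at least one positive, of the negative
    mean log-probability assigned to same-label samples."""
    n = len(embeddings)
    total, n_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarities to every other sample.
        logits = {
            a: cosine(embeddings[i], embeddings[a]) / temperature
            for a in range(n)
            if a != i
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[p] - log_denominator for p in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0
```

The loss is small when same-label embeddings are close and different-label embeddings are far apart, and it grows when the labels disagree with the geometry of the embeddings.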

PaperID: 903,

Authors: Yuhua Wen, Qifei Li, Yingying Zhou, Yingming Gao, Zhengqi Wen, Jianhua Tao, Ya Li

Affiliations: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China; Department of Automation, Tsinghua University, Beijing, China

Title: DashFusion: Dual-Stream Alignment With Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis

Abstract:
Multimodal sentiment analysis (MSA) integrates various modalities, such as text, image, and audio, to provide a more comprehensive understanding of sentiment. However, effective MSA is challenged by alignment and fusion issues. Alignment requires synchronizing both temporal and semantic information across modalities, while fusion involves integrating these aligned features into a unified representation. Existing methods often address alignment or fusion in isolation, leading to limitations in performance and efficiency. To tackle these issues, we propose a novel framework called dual-stream alignment with hierarchical bottleneck fusion (DashFusion). First, the dual-stream alignment module synchronizes multimodal features through temporal and semantic alignment. Temporal alignment employs cross-modal attention (CA) to establish frame-level correspondences among multimodal sequences. Semantic alignment ensures consistency across the feature space through contrastive learning. Second, supervised contrastive learning (SCL) leverages label information to refine the modality features. Finally, hierarchical bottleneck fusion (HBF) progressively integrates multimodal information through compressed bottleneck tokens, which achieves a balance between performance and computational efficiency. We evaluate DashFusion on three datasets: CMU-MOSI, CMU-MOSEI, and CH-SIMS. Experimental results demonstrate that DashFusion achieves state-of-the-art (SOTA) performance across various metrics, and ablation studies confirm the effectiveness of our alignment and fusion techniques. The codes for our experiments are available at https://github.com/ultramarineX/DashFusion

PaperID: 904,

Authors: Licheng Liu, Qibin Zhang, Tingyun Liu, C. L. Philip Chen

Affiliations: College of Electrical and Information Engineering, Hunan University, Changsha, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: TC3Net: Transformer and Convolution Coupled Contrastive Network for Single Image Super-Resolution

Abstract:
The convolutional neural network (CNN) and transformer have gained significant attention in the field of single image super-resolution (SISR), owing to their powerful capacity for nonlinear feature extraction. Nonetheless, these two types of approaches have their own limitations. For instance, the interaction between convolutional kernels and image content is agnostic in CNN, while the computational complexity increases quadratically with the spatial resolution in the transformer. To address these concerns, in this article, we propose a novel unified framework named transformer and convolution coupled contrastive network (TC3Net) for SISR, which has a triple-branch structure to integrate the merits of both CNN and transformer. The proposed TC3Net is mainly composed of several stacked CNN feature extraction (CFE) blocks, transformer feature extraction (TFE) blocks, and coupled contrastive blocks (CCBs) for diverse feature extraction. In particular, the CCB, which consists of the coupled attention block (CAB) and the local-global feature extraction (LGFE) block, is designed to fuse feature maps and extract coupled information for better image reconstruction. Moreover, a contrastive loss between the transformer and CNN feature maps is further introduced to enhance their discriminative characteristics and complement the fused features. Experimental results demonstrate that TC3Net outperforms several state-of-the-art (SOTA) methods by achieving a better balance between model size and performance.

PaperID: 905,

Authors: Jing Li, Yinghua Yao, Yuangang Pan, Xuanqian Wang, Ivor W. Tsang, Xiuju Fu

Affiliations: Center for Frontier AI Research and the Institute of High-Performance Computing, Agency for Science, Technology and Research (A*STAR), Connexis, Singapore; School of Computer Science and Engineering, Beihang University, Beijing, China; Institute of High-Performance Computing, Agency for Science, Technology and Research (A*STAR), Connexis, Singapore

Title: Alpha and Prejudice: Improving α-Sized Worst Case Fairness via Intrinsic Reweighting

Abstract:
Achieving worst case group fairness typically relies on maximizing the utility of the worst-off demographic group. However, in practice, demographic information is often unavailable, making direct max-min formulations infeasible. To address this, recent work introduces a relaxed setting, using a lower bound α on the minimal group size, referred to as “α-sized worst case fairness” in this article. We first motivate the importance of this setting by highlighting its relevance to data privacy, a critical yet underexplored perspective. Rather than simply retraining on worst-off samples, we propose a reweighting approach that assigns sample weights based on their intrinsic contributions to fairness. To handle the global nature of worst case objectives efficiently, we develop a stochastic learning algorithm that simplifies training without sacrificing performance. We also address the impact of outliers by introducing a robust variant of our method. Through theoretical analysis and extensive experiments on standard fairness benchmarks, we show that our methods not only connect naturally to existing fairness-through-reweighting approaches but also outperform strong baselines.

PaperID: 906,

Authors: Xiangyi Teng, Minghao Zhong, Jing Liu

Affiliations: Guangzhou Institute of Technology, Xidian University, Guangzhou, China

Title: A Self-Supervised Heterogeneous Graph Attention Model Based on Adaptable Step-Size Metapaths

Abstract:
Graphs are widely used to model networks in real-world applications, with heterogeneous graph neural networks gaining increasing attention in recent years. Existing methods generally rely on first-order or high-order neighbors to capture semantic relationships, with metapath-based approaches being the most popular. However, existing metapath-based models not only require predefined metapaths based on prior knowledge, but also fail to consider metapath sequence modeling. Additionally, labeled data are scarce in massive graph data, and existing self-supervised or semisupervised models rely heavily on data augmentation strategies and complex frameworks. To address these limitations, we propose a self-supervised heterogeneous graph attention model (HGAM) based on adaptable step-size metapaths. Our model requires no prior knowledge to select the type of metapath and can adaptively capture the specific step-size metapaths of high importance. The adaptable step-size metapaths module not only considers the attention weights at different step sizes, but also tracks the changing trend of attention, which expands the receptive field of the model and better integrates global information. To alleviate labeled data scarcity, our model employs a dual contrastive learning strategy. HGAM learns global representations by contrasting a high-order meta-graph against nodes, while preserving local structure through a cross-view comparison of first-order and high-order semantics. Extensive experiments on three different types of tasks, including node classification, clustering, and link prediction, are conducted on real-world datasets. Experimental results demonstrate that HGAM achieves superior performance compared to state-of-the-art methods.

PaperID: 907,

Authors: Dazi Li, Yanyang Bao, Xin Xu

Affiliations: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China; College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China

Title: Enhancing Graph Reconstruction: Uniting Dual-Level Graph Structure With Graph Reinforcement Learning

Abstract:
A combinatorial optimization problem is typically regarded as a 1-D sorting problem in most existing research. The representation ignores some information about the problem because of dimension compression. When applying reinforcement learning (RL) to this problem, convolutional neural networks (CNNs) used in conventional RL cannot directly extract the connection information between two elements in the feature matrix. A typical class of combinatorial optimization problems, the job shop scheduling problem (JSSP), is used in this article as an example. Considering the limitations in previous research, this article reexamines the task from the perspective of graph reconstruction and proposes a graph RL (GRL) method that combines a double deep Q-network (DDQN) and graph attention network (GAT) to achieve breakthroughs beyond the constraints of CNN performance. Moreover, a dual-level graph representation structure is constructed to comprehensively learn the features of scheduling information and overcome the difficulty of learning dynamic graphs. Experiments show that the quality of the obtained solution and generalization performance are both improved compared with models based on original deep RL (DRL) algorithms.
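
The double deep Q-network (DDQN) mentioned above decouples action selection from action evaluation when forming the learning target. The sketch below shows only that standard target computation for a single transition. The Q-values are passed in as plain lists, whereas in the paper's setting they would come from graph attention networks; the function and parameter names are assumptions for this example.

```python
def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Standard double DQN target for one transition.

    q_online_next / q_target_next: Q-values of the next state for every
    action, from the online and target networks respectively (hypothetical
    inputs; any function approximator could produce them).
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # The online network selects the greedy action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network evaluates it.
    return reward + gamma * q_target_next[best_action]
```

For example, with online values `[0.2, 0.8, 0.5]` the greedy action is index 1, so the target uses the target network's value at index 1 even if the target network itself rates another action higher.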

PaperID: 908,

Authors: Yuming Xiang, Sizhao Li, Rongpeng Li, Zhifeng Zhao, Honggang Zhang

Affiliations: College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China; College of Information Science and Electronic Engineering, Zhejiang University, Zhejiang Lab, Hangzhou, China; School of Computer Science and Engineering, Macau University of Science and Technology, Taipa, Macau, China

Title: Decentralized Consensus Inference-Based Hierarchical Reinforcement Learning for Multiconstrained UAV Pursuit-Evasion Game

Abstract:
Multiple quadrotor uncrewed aerial vehicle (UAV) systems have garnered widespread research interest and fostered many interesting applications, especially in multiconstrained pursuit-evasion games (MC-PEGs). The cooperative evasion and formation coverage (CEFC) task, where the UAV swarm aims to maximize formation coverage across multiple target zones while collaboratively evading predators, is one of the most challenging issues in MC-PEGs, especially under communication-limited constraints. This multifaceted problem, which intertwines responses to obstacles, adversaries, target zones, and formation dynamics, brings significant high-dimensional complications to locating a solution. In this article, we propose a novel two-level framework [i.e., consensus inference-based hierarchical reinforcement learning (CI-HRL)], which delegates target localization to a high-level policy, while adopting a low-level policy to manage obstacle avoidance, navigation, and formation. Specifically, in the high-level policy, we develop a novel multiagent reinforcement learning (RL) module, consensus-oriented multiagent communication (ConsMAC), to enable agents to perceive global information and establish consensus from local states by effectively aggregating neighbor messages. Meanwhile, we leverage an alternative training-based MAPPO (AT-M) and policy distillation to accomplish the low-level control. The experimental results, including the high-fidelity software-in-the-loop (SITL) simulations, validate that CI-HRL provides a superior solution with enhanced collaborative evasion and task completion capabilities for the swarm.

PaperID: 909,

Authors: Feng Yu, Zhongrui Rao, Neng Chen, Li Liu, Minghua Jiang

Affiliations: School of Computer Science and Artificial Intelligence and the Engineering Research Center of Hubei Province for Clothing Information, Wuhan Textile University, Wuhan, Hubei, China; School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, Hubei, China

Title: ArmBCIsys: Robot Arm BCI System With Time-Frequency Network for Multiobject Grasping

Abstract:
Brain–computer interface (BCI) offers a direct communication and control channel between the human brain and external devices, presenting new pathways for individuals with physical disabilities to operate robotic arms for complex tasks. However, achieving multiobject grasping tasks with low signal-to-noise ratio (SNR) consumer-grade EEG signals is a significant challenge due to the lack of robust decoding algorithms and precise visual tracking methods. This article proposes ArmBCIsys, an integrated robotic arm system that combines a novel dual-branch frequency-enhanced network (DBFENet), which robustly decodes EEG signals under noisy conditions, with a high-precision vision-guided grasping module. The proposed DBFENet designs the scaling temporal convolution block (STCB) to extract multiscale spatiotemporal features from the time domain, while the designed DropScale projected Transformer (DSPT) utilizes the discrete cosine transform (DCT) to capture key frequency-domain features, significantly improving decoding robustness. We fine-tune the masked-attention mask Transformer (Mask2Former) model on the Jacquard dataset and incorporate the multiframe centroid-intersection over union (IoU) tracking algorithm to build the visual grasp segmenter (VisGraspSeg), enabling reliable segmentation and dynamic tracking for diverse daily objects. Experimental validations on both a self-built code-modulated visual evoked potential (c-VEP) dataset (1344 samples) and two public c-VEP datasets demonstrate that DBFENet achieves state-of-the-art recognition performance. The system, which integrates DBFENet and the proposed vision-guided module, ensures stable multiobject selection and automatic object grasping in dynamic environments, with promising applications in healthcare robotics, assistive technology, and industrial automation. The self-built dataset has been made publicly accessible at https://github.com/wtu1020/ArmBCIsys-Self-built-cVEP-Dataset
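
For readers unfamiliar with the discrete cosine transform (DCT) mentioned above, the following minimal sketch computes an unnormalized DCT-II of a one-dimensional signal directly from its definition. It illustrates only the transform itself, not how DSPT applies it; the function name and the choice of the unnormalized DCT-II variant are assumptions for this example.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II: X[k] = sum_t x[t] * cos(pi * (t + 0.5) * k / N)."""
    n = len(signal)
    return [
        sum(signal[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]
```

A constant signal has all of its energy in the first coefficient, while faster oscillations show up in higher-index coefficients, which is why the transform is a convenient way to expose frequency-domain structure in a time series.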

PaperID: 910,

Authors: Mingjie Wang, Song Yuan, Xian-Feng Han, Zili Yi

Affiliations: Department of Mathematics, School of Science, Zhejiang Sci-Tech University, Hangzhou, China; School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Sci-Tech University, Hangzhou, China; College of Computer and Information Science, Southwest University, Chongqing, China; School of Intelligence Science and Technology, Nanjing University, Nanjing, China

Title: Draw What You Hear: High-Fidelity Image Generation and Manipulation via SoundAdapter

Abstract:
Currently, text-to-image (T2I) generation has established itself as a cornerstone within the realm of AI-generated content (AIGC), owing its remarkable success to the availability of extensive datasets comprising paired text-vision samples. Nevertheless, the absence of audio-visual pairs hinders the growth of audio-to-image (A2I) generation. Although prior approaches have pioneered the A2I task, the tight entanglement between initial audio and image encoders imposes the challenge of gathering audio-visual samples, resulting in degraded performance and limited sound flexibility. Therefore, this article proposes a novel SoundAdapter to draw what you hear. Specifically, the SoundAdapter’s structure is designed around transformer blocks, which are critical for capturing overarching patterns and dependencies within the data. In addition, it integrates a multigranularity approach coupled with a hybrid supervisory signal, ensuring both fine-grained semantic alignment and seamless optimization across various levels of representation. Extensive tests demonstrate that the SoundAdapter trains effectively and sets new benchmarks in zero-shot audio classification, as well as in creating and modifying images across a variety of datasets. The implementation code and several demos supporting this study are openly accessible at https://github.com/CV-MM-Lab/SoundAdapter, facilitating reproducibility and further research.

PaperID: 911,

Authors: Seung-Hyup Na, Seong-Whan Lee

Affiliations: Department of Artificial Intelligence, Korea University, Seoul, South Korea

Title: Counterfactual Explanation Through Latent Adjustment in Disentangled Space of Diffusion Model

Abstract:
With the rise of explainable artificial intelligence (XAI), counterfactual (CF) explanations have gained significant attention. Effective CFs must be valid (classified as the CF class), practical (minimally deviated from the input), and plausible (close to the CF data manifold). However, practicality and plausibility often conflict, making valid CF generation challenging. To address this, we propose a novel framework that generates CFs by adjusting only semantic information in the disentangled latent space of a diffusion model. This shifts the sample closer to the CF manifold and across the decision boundary. In our framework, the latent vector mapping step occasionally produces invalid CFs or CFs insufficiently close to the decision boundary, resulting in dissimilarity to the input. Our method overcomes this with a two-stage latent vector adjustment: 1) linear interpolation and 2) time-step-wise optimization during reverse diffusion within the space accommodating linear changes in class information from the input. Experiments demonstrate that our approach generates more valid, plausible, and practical CFs by effectively leveraging the properties of the disentangled latent space.

PaperID: 912,

Authors: Jinchao Zhu, Yuxuan Wang, Siyuan Pan, Pengfei Wan, Di Zhang, Gao Huang

Affiliations: College of Software, Nankai University, Tianjin, China; Department of Computing, Imperial College London, London, U.K.; Kuaishou Technology, Beijing, China; Department of Automation, BNRist, Tsinghua University, Beijing, China

Title: A-SDM: Accelerating Stable Diffusion Through Model Assembly and Feature Inheritance Strategies

Abstract:
The stable diffusion model (SDM) is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation. Existing acceleration efforts, such as sampler optimization, model distillation, and network quantization, typically maintain the original network architecture. The extensive parameter scale and substantial computational demands have limited research into adjusting the model architecture. This study focuses on reducing redundant computation in SDM and optimizes the model through both tuning and tuning-free methods: 1) for the tuning method, we design a model assembly strategy to reconstruct a lightweight model while preserving performance and ensuring semantic stability through distillation and 2) for the tuning-free method, we propose a feature inheritance strategy to accelerate inference by skipping local computations at the block, layer, or unit level within the network structure. We also examine multiple sampling modes for feature inheritance at the time-step level. Experiments demonstrate that both the proposed tuning and tuning-free methods can improve the speed and performance of the SDM. The lightweight model reconstructed by the model assembly strategy increases generation speed by 22.4%, while the feature inheritance strategy enhances the SDM generation speed by 40.0%.

PaperID: 913,

Authors: Xinghao Wu, Jianwei Niu, Xuefeng Liu, Guogang Zhu, Shaojie Tang, Wanyu Lin, Jiannong Cao

Affiliations: State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China; Department of Management Science and Systems, School of Management, Center for AI Business Innovation, University at Buffalo, Getzville, NY, USA; Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

Title: The Diversity Bonus: Learning From Dissimilar Clients in Personalized Federated Learning

Abstract:
Personalized federated learning (PFL) allows clients to collaboratively train their personalized models to handle situations where data from different clients are not independent and identically distributed (non-IID). Previous PFL research implicitly assumes that clients benefit most from those with similar data distributions. Correspondingly, methods such as personalized weight aggregation assign higher weights to similar clients during aggregation. We pose a question: can a client benefit from other clients with dissimilar data distributions, and if so, how? This question is particularly relevant in scenarios with a high degree of non-IID, where clients have widely different distributions, and learning from only similar clients will result in a loss of knowledge from many other clients. We note that when dealing with clients with similar distributions, current methods tend to force their models to be close in the parameter space. It is reasonable to conjecture that a client can benefit from dissimilar clients if we allow their models to depart from each other. Based on this idea, we propose DiversiFed, which allows each client to learn from clients with diverse distributions. DiversiFed pushes personalized models of clients with dissimilar distributions apart in the parameter space while pulling together those with similar distributions. In addition, to achieve the above effect without using prior knowledge of the distributions, we design a loss function that leverages model similarity to determine the degree of attraction and repulsion between any two models. Experiments on benchmark and medical datasets show that DiversiFed can outperform the state-of-the-art (SOTA) methods by up to 3.19%.

PaperID: 914,

Authors: Shuhuang Chen, Shiming Chen, Shuo Ye, Yuetian Wang, Xinge You

Affiliations: School of Electronic Information and Communication, Huazhong University of Science and Technology, Wuhan, China; Department of Computer Vision, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates; Department of Computer Science, Great Bay University, Dongguan, Guangdong, China

Title: Toward Disentangled and Controllable Deep Metric Learning With Human-Like Concept Decomposition

Abstract:
Deep metric learning (DML) has shown significant advancements in learning discriminative embeddings for images, playing a crucial role in various vision tasks. However, existing methods typically rely on deep neural networks to extract holistic embeddings, which are challenging to disentangle and interpret. To address this issue, we take inspiration from human cognition, where objects are decomposed into distinct concepts for better understanding. Specifically, we propose the concept metrics network (CMN) to achieve disentangled and controllable DML. CMN begins by initializing learnable concept vectors to represent various visual concepts. These vectors are then associated with regional visual features via a cross-attention mechanism, ensuring each vector corresponds to specific visual properties. Finally, the concept values, determined by their presence in the image, form the output embedding. Comprehensive experiments demonstrate that CMN effectively disentangles visual concepts, with each embedding dimension corresponding to a specific concept. Our method not only outperforms existing state-of-the-art methods in a conventional DML application (i.e., image retrieval), but also enables more flexible and controllable applications. The code is available at https://github.com/shchen0001/CMN

PaperID: 915,

Authors: Lingzhi Hu, Ding Wang, Jin Ren, Junfei Qiao

Affiliations: School of Information Science and Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing Laboratory of Smart Environmental Protection, and Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China

Title: Online Self-Triggered Transmission Control With Critic Learning for Discrete Nonlinear Systems

Abstract:
In this article, a novel online self-triggered transmission control (STTC) framework is constructed based on the critic learning technique, which aims at tackling the optimal regulation issue of discrete-time nonlinear systems. On the premise of ensuring the system stability, a self-sampling function that depends only on the sampling state is designed, so that the next triggering moment can be determined. This not only effectively reduces the computational burden, but also avoids the continuous evaluation of the triggering condition that traditional event-based methods require. Furthermore, the developed control method is shown through theoretical analysis to possess excellent triggering performance. Then, the model, critic, and action networks are established to execute the online critic learning algorithm, so that the control policy is adjusted in real time toward the optimal level. Finally, an experimental plant with nonlinear characteristics is given to illustrate the overall performance of the proposed online STTC method.

PaperID: 916,

Authors: Yaqi Xiao, Haiyin Zhou, Xuanying Zhou, Jiongqi Wang

Affiliations: College of Science, National University of Defense Technology, Changsha, Hunan, China

Title: Multilabel Transfer Learning Method With Dynamic Multimetric for Coupling Fault Diagnosis

Abstract:
In industrial practice, as systems become increasingly complex and integrated, the simultaneous failure of multiple components, namely, coupling faults, has become more prevalent, which can be viewed as multilabel data. In addition, due to the changing industrial tasks, coupling fault diagnosis problems under varying operating conditions can be treated as cross-domain multilabel learning problems, which can be solved by multilabel transfer learning methods. However, existing multilabel transfer learning methods are all preliminary explorations lacking an in-depth exploration of the multilevel similarity and complex features of coupling faults. To address this challenging problem, we propose a novel multilabel transfer learning method for coupling fault diagnosis, which achieves dual domain alignment at two levels. At the global feature level, the hypothesis space is reduced by minimizing the maximum mean discrepancy (MMD) at multiple stages of the network to align the global distribution. Furthermore, we decompose the overall similarity into a combination of multiple local similarities and design a dynamic multimetric structure to capture the diverse similarity characteristics of the data. By integrating the multimetric structure and a dynamic mapping selection technique to form mathematical representations of this diverse similarity, this approach constrains the consistency of the local spatial structure of the two domains to achieve local space structure alignment. This method shows clear superiority in multiple transfer tasks on the public and laboratory datasets, demonstrating its effectiveness.
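
As a rough illustration of the global alignment term mentioned above, the following sketch computes a biased estimate of the squared MMD between two small feature sets. The Gaussian kernel, the `bandwidth` value, and all function names are assumptions made for this example; the abstract does not specify the kernel or the estimator the authors use.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD: E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]."""
    return (mean_kernel(source, source, bandwidth)
            + mean_kernel(target, target, bandwidth)
            - 2.0 * mean_kernel(source, target, bandwidth))

# Hypothetical source-domain and target-domain features.
source_features = [[0.0, 0.0], [1.0, 0.0]]
target_features = [[5.0, 5.0], [6.0, 5.0]]
print(mmd_squared(source_features, target_features))
```

The estimate is zero when both sets coincide and grows as the two feature distributions move apart, which is why minimizing it at several network stages pulls the domains together.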

PaperID: 917,

Authors: Zhaolin Yuan, Long Ma, Wenjia Wei, Xia Zhu, Mingjie Sun, Duxin Chen, Xiaojuan Ban

Affiliations: Institute of Artificial Intelligence, Shunde Innovation School, University of Science and Technology Beijing, Beijing, China; Huawei Technologies Company Ltd., Shenzhen, China; School of Computer Science and Technology, Soochow University, Suzhou, China; School of Mathematics, Southeast University, Nanjing, China; Beijing Advanced Innovation Center for Materials Genome Engineering, the School of Intelligence Science and Technology, and the Key Laboratory of Intelligent Bionic Unmanned Systems, Ministry of Education, University of Science and Technology Beijing, Beijing, China

Title: NetEventCause: Event-Driven Root Cause Analysis for Large Network System Without Topology

Abstract:
Root cause analysis (RCA) is a crucial technique in network systems for uncovering the abnormal nodes that lead to the network alarm flood. Within private cloud network systems, the calling chains and topologies among entities, such as hosts, routes, and services, are always incomplete due to nonstandardized management. Existing topology-free RCA techniques, which rely on causal discovery, are inapplicable when the scale of the network system is extremely large or the triggered alarms are sparse. This article proposes NetEventCause (NEC), an event-driven, unsupervised, and nonintrusive RCA algorithm for large network systems, where the network topology is unknown. NEC learns from historical alarm events to model the occurrences of various alarm types using a multivariate neural temporal point process (TPP). Based on the conditional intensity predicted by the learned TPP, NEC can identify the root alarms from a cascade of alarm events and locate the causal alarms of derivative alarms using the attribution method. The experimental section evaluates NEC using both a synthetic event dataset and a large real-world dataset. The real-world dataset is exported from the Huawei Shennong Intelligent Maintenance and Operation Center (IMOC), a platform that is deployed at one of China’s largest airports and manages over 200000 entities. Results obtained from the two datasets demonstrate that NEC outperforms most state-of-the-art (SOTA) TPP models in modeling alarm events and surpasses general RCA methods in terms of identifying root alarms and recovering transmission chains of anomalies.

PaperID: 918,

Authors: Jiaqiang Zhang, Xinrui Wang, Songcan Chen

Affiliations: College of Computer Science and Technology and MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Title: You Never Walk Alone: A Generalizable and Nonparametric Structure Learning Framework

Abstract:
Graph-structured learning (GSL) aims to assist graph neural networks (GNNs) to yield effective node embeddings for downstream tasks, especially in scenarios with the absence of structures or the existence of unreliable edges. Most GSL models are built on the i.i.d. assumption across training and testing data. However, this assumption can be violated when testing data contain out-of-distribution (OOD) samples. Consequently, those models are limited in generalization, which leads to a poor structure. On the other hand, while they have made great progress, additional optimized parameters are required due to their implementation with parametric models. To tackle the above problems, we propose a novel generalizable and nonparametric structure learning framework named GNS, which can be easily and effectively applied to various tasks. GNS neither relies on the i.i.d. assumption nor involves any parameters to be optimized; instead, it finds an appropriate similarity between nodes and an associated threshold to establish desirable structures. Specifically, we first incorporate the candidate neighbor distributions for nodes to refine the similarity. Then, we introduce an adaptive threshold discovery method inspired by Fisher’s criterion to determine final structures. Extensive experiments demonstrate that GNS excels not only in OOD scenarios but also in the general classification and regression prediction tasks.

PaperID: 919,

Authors: Xuan Rao, Bo Zhao, Derong Liu, Cesare Alippi

Affiliations: School of Systems Science, Beijing Normal University, Beijing, China; School of System Design and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen, China; Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milan, Italy

Title: FX-DARTS: Designing Topology-Unconstrained Architectures With Differentiable Architecture Search and Entropy-Based Super-Network Shrinking

Abstract:
Strong priors are imposed on the search space of differentiable architecture search (DARTS), such that cells of the same type share the same topological structure and each intermediate node retains two operators from distinct nodes. While these priors reduce optimization difficulties and improve the applicability of searched architectures, they hinder the subsequent development of automated machine learning (auto-ML) and prevent the optimization algorithm from exploring more powerful neural networks through improved architectural flexibility. This article aims to reduce these prior constraints by eliminating restrictions on cell topology and modifying the discretization mechanism for super-networks. Specifically, the flexible DARTS (FX-DARTS) method, which leverages an entropy-based super-network shrinking (ESS) framework, is presented to address the challenges arising from the elimination of prior constraints. Notably, FX-DARTS enables the derivation of neural architectures without strict prior rules while maintaining the stability in the enlarged search space. Experimental results on image classification benchmarks demonstrate that FX-DARTS is capable of exploring a set of neural architectures with competitive trade-offs between performance and computational complexity within a single search procedure.

PaperID: 920,

Authors: Lei Gao, Zheng Guo, Ling Guan

Affiliations: Department of Electrical, Computer, and Biomedical Engineering, Toronto Metropolitan University, Toronto, ON, Canada

Title: ODMTCNet: An Interpretable Multiview Deep Neural Network Architecture for Feature Representation

Abstract:
Recently, deep cascade architecture-based algorithms have attracted wide attention and have been applied to numerous application domains successfully. Nevertheless, the black-box structure of such algorithms has always been considered the Achilles’ heel by the machine learning community. Moreover, due to its data-driven nature, the deep cascade architecture is likely to cause over-fitting problems when sufficient data are not available. In order to solve these pressing issues, this work proposes a novel multiview deep neural network (DNN) model, namely, optimal discriminant multiview tensor convolutional network (ODMTCNet), which integrates statistics-guided optimization (SGO) principles with the DNN architecture. Specifically, a discriminant multiview tensor convolution strategy is proposed and integrated with a deep cascade architecture. Different from the traditional DNN models, the parameters of the convolutional layers in ODMTCNet are determined by solving SGO problems. Based on the SGO principles, the relation between the optimal performance and parameters (e.g., the number of convolutional filters) can be analytically predicted, with each layer generating justified knowledge representations. In addition, information quality (IQ) is adopted to further improve multiview feature representation. Because of its unique design, ODMTCNet is able to handle different types of features (e.g., raw, hand-crafted, prior knowledge-based, and DNN-generated features), forming a general platform for multiview feature representation. To validate the generality and effectiveness of the ODMTCNet model, we conducted experiments on five datasets of different scales: the Olivetti Research Lab (ORL) database, the Facial Recognition Technology (FERET) database, the ETH-80 database, the Caltech 256 database, and the Nanyang Technological University (NTU) red green blue-depth (RGB+D) 120 database. Experimental results show the superiority of the presented solution over the state-of-the-art. Implementation codes will be made available in the final version.

PaperID: 921,

Authors: Tao Mao, Junlong Zhu, Mingchuan Zhang, Quanbo Ge, Ruijuan Zheng, Qingtao Wu

Affiliations: School of Information Engineering, Henan University of Science and Technology, Luoyang, China; Longmen Laboratory, Luoyang, China; School of Automation, Nanjing University of Information Science and Technology, Nanjing, China

Title: A Decentralized Actor-Critic Algorithm With Entropy Regularization and Its Finite-Time Analysis

Abstract:
Decentralized actor-critic (AC) is one of the most dominant algorithms for dealing with multiagent reinforcement learning (MARL) problems. However, exploration efficiency, sample efficiency, and communication efficiency are difficult to achieve simultaneously with existing decentralized AC methods. For this reason, this article develops a decentralized multiagent AC algorithm that incorporates entropy regularization to improve exploration with theoretical guarantees, referred to as the multi-agent AC algorithm with entropy regularization (MACE). Moreover, we rigorously prove that MACE can achieve a sample complexity of $\mathcal{O}(\epsilon^{-2}\ln \epsilon^{-1})$ and a communication complexity of $\mathcal{O}(\epsilon^{-1}\ln \epsilon^{-1})$, which match the best complexities at present. Finally, the performance of MACE is also evaluated on reinforcement learning (RL) tasks. The experimental results show that the proposed algorithm achieves better exploration efficiency than state-of-the-art decentralized AC-type algorithms.

PaperID: 922,

Authors: S. R. Shreyas

Affiliations: Department of Mathematics, Indian Institute of Technology (IIT) Indore, Indore, Madhya Pradesh, India

Title: Double Successive Over-Relaxation Q-Learning With an Extension to Deep Reinforcement Learning

Abstract:
Q-learning (QL) is a widely used algorithm in reinforcement learning (RL), but its convergence can be slow, especially when the discount factor is close to one. Successive over-relaxation (SOR) QL, which introduces a relaxation factor to speed up convergence, addresses this issue but has two major limitations. In the tabular setting, the relaxation parameter depends on transition probability, making it not entirely model-free, and it suffers from overestimation bias. To overcome these limitations, we propose a sample-based, model-free double SORQL (MF-DSORQL) algorithm. Theoretically and empirically, this algorithm is shown to be less biased than SORQL. Furthermore, in the tabular setting, the convergence analysis under boundedness assumptions on iterates is discussed. The proposed algorithm is extended to large-scale problems using deep RL. Finally, both the tabular version of the proposed algorithm and its deep RL extension are tested on benchmark examples.
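
For orientation, the sketch below shows one tabular update of the single-estimator SOR Q-learning rule as it is commonly stated in the literature, where a relaxation factor `w` mixes the usual bootstrapped target with the current state's own maximum. It is not the proposed MF-DSORQL algorithm; the exact form of the rule, the function name, and the numeric values are assumptions made for illustration.

```python
def sor_q_update(q, state, action, reward, next_state, lr, gamma, w):
    """One tabular SOR Q-learning step (assumed form of the update).

    q is a list of per-state lists of action values. With w == 1 the rule
    reduces to the standard Q-learning update.
    """
    bootstrapped = reward + gamma * max(q[next_state])
    target = w * bootstrapped + (1.0 - w) * max(q[state])
    q[state][action] = (1.0 - lr) * q[state][action] + lr * target
    return q[state][action]

# Hypothetical two-state, two-action table.
q_table = [[0.0, 1.0], [2.0, 0.5]]
new_value = sor_q_update(q_table, state=0, action=0, reward=1.0,
                         next_state=1, lr=0.5, gamma=0.9, w=1.2)
print(new_value)
```

Both maxima in the target are taken over the same table that is being updated, which is the source of the overestimation bias the abstract refers to and the motivation for a double-estimator variant.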

PaperID: 923,

Authors: Huanyu Chen, Weisheng Li, Bin Xiao, Xinbo Gao

Affiliations: College of Economics and Management, Chongqing University of Posts and Telecommunications, Chongqing, China; College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: MDFA: A Quantitative Framework for the Analysis of Multimodal Facial Esthetics

Abstract:
In the era of big data, the problem of facial beauty prediction (FBP) has been addressed using a combination of deep learning and esthetics based on data and models. Most existing methods are based on 2-D unimodal information processing. Owing to the high cost of 3-D data acquisition equipment, studies on the use of multimodal features of 2-D and 3-D for esthetic evaluation are scarce. Moreover, most existing methods are based on self-built 3-D datasets, which are limited to practical application scenarios of 2-D facial images. This study proposed a label distribution-based multimodal facial esthetic analysis framework (LDMFE). The LDMFE performed facial esthetic evaluation by combining 2-D and 3-D information following the process used by the human brain to conduct the 3-D esthetic evaluation. FBP was performed by extracting facial depth structure information using a depth information extraction network, DIENet, which comprises a facial structure perception layer (FSP-Layer) and an attention decision block (AD-Block). Furthermore, to ensure a high degree of agreement between the predicted label distribution of the network and the true distribution, a simple and efficient distribution measurement loss function called $\mathcal{L}_{\text{WD}}$ was proposed. Compared with the label distribution-based FBP loss and the latest FBP loss, $\mathcal{L}_{\text{WD}}$ was more stable and effective. The performance of LDMFE was evaluated using three datasets. The experimental results demonstrate that the LDMFE exhibits state-of-the-art performance.

PaperID: 924,

Authors: Nick-Marios T. Kokolakis, Zhen Zhang, Shanqing Liu, Kyriakos G. Vamvoudakis, Jérôme Darbon, George Em Karniadakis

Affiliations: Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, USA; Division of Applied Mathematics, Brown University, Providence, RI, USA; Daniel Guggenheim School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA, USA

Title: Safe Physics-Informed Machine Learning for Optimal Predefined-Time Stabilization: A Lyapunov-Based Approach

Abstract:
In this article, we introduce the notion of safe predefined-time stability and address an optimal safe predefined-time stabilization problem. In particular, safe predefined-time stability characterizes parameter-dependent nonlinear dynamical systems whose trajectories starting in a given set of admissible states remain in the set of admissible states for all time and converge to an equilibrium point in a predefined time. Furthermore, we provide a Lyapunov theorem establishing sufficient conditions for safe predefined-time stability. We address the optimal safe predefined-time stabilization problem by synthesizing feedback controllers that guarantee closed-loop system safe predefined-time stability while optimizing a given performance measure. Specifically, safe predefined-time stability of the closed-loop system is guaranteed via a Lyapunov function satisfying a differential inequality while simultaneously serving as a solution to the steady-state Hamilton–Jacobi–Bellman (HJB) equation ensuring optimality. Given that the HJB equation is generally difficult to solve, we develop a physics-informed machine learning-based algorithm for learning the safely predefined-time stabilizing solution to the steady-state HJB equation. Several simulation results are provided to demonstrate the efficacy of the proposed approach.

PaperID: 925,

Authors: Yamei Luo, Qingyi Ren, Zihao Zheng, Siyuan Chen, Xin Ma, Yu Liu, Xiaoli Li, Junzhi Yu, Zhijun Zhang

Affiliations: Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong; School of Automation Science and Engineering, South China University of Technology, Guangzhou, China; College of Engineering, Peking University, Beijing, China

Title: Binary Channel Fuzzy Self-Adjusted Neural Network for Solving Time-Changing QP Problems

Abstract:
A novel binary channel fuzzy self-adjusted neural network (BCF-SANN) is proposed and studied for solving time-changing quadratic programming (QP) problems in this article. Unlike the typical zeroing neural network, whose parameters are fixed, the main parameters of the proposed BCF-SANN are time-changing, and its errors converge quickly and adaptively. The biggest advantage of the novel neural network is that it combines neural networks with a fuzzy self-adjusted controller, which takes the errors and the derivatives of the errors as fuzzy inputs, further improving the convergence and robustness of the neural networks. To design the novel neural network, a time-changing QP problem is first established; then, using Lagrange’s law, the time-changing QP problem is transformed into a time-changing matrix equation; and finally, based on the time-changing parameter neural dynamics method, a novel BCF-SANN is proposed. The detailed design process is given in this article, and the convergence and robustness of the proposed BCF-SANN are proved by theoretical analysis. Through comparative experiments, it is demonstrated that the proposed BCF-SANN has a faster convergence rate and stronger robustness than the traditional zeroing neural network and 1-D fuzzy recurrent neural network (RNN).

PaperID: 926,

Authors: Ioannis Kordonis, Petros Maragos

Affiliations: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece

Title: Revisiting Tropical Polynomial Division: Theory, Algorithms, and Application to Neural Networks

Abstract:
Tropical geometry has recently found several applications in the analysis of neural networks with piecewise linear activation functions. This article presents a new look at the problem of tropical polynomial division and its application to the simplification of neural networks. We analyze tropical polynomials with real coefficients, extending earlier ideas and methods developed for polynomials with integer coefficients. We first prove the existence of a unique quotient-remainder pair and characterize the quotient in terms of the convex bi-conjugate of a related function. Interestingly, the quotient of tropical polynomials with integer coefficients does not necessarily have integer coefficients. Furthermore, we develop a relationship of tropical polynomial division with the computation of the convex hull of unions of convex polyhedra and use it to derive an exact algorithm for tropical polynomial division. An approximate algorithm is also presented, based on an alternation between data partition and linear programming. We also develop special techniques to divide composite polynomials, described as sums or maxima of simpler ones. Finally, we provide numerical results to demonstrate the efficiency of the proposed algorithms, using the MNIST handwritten digits, SVHN, CIFAR-10, and CIFAR-100 datasets, along with an application example in learning model predictive control (LMPC).
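
To make the link between tropical polynomials and piecewise linear networks concrete, here is a minimal sketch of max-plus tropical polynomials in one variable, stored as lists of `(slope, intercept)` terms. The representation and the function names are assumptions made for this example, and the sketch covers only evaluation, tropical addition, and tropical multiplication, not the division algorithms of the article.

```python
def tropical_eval(terms, x):
    """Evaluate max_i(slope_i * x + intercept_i) at the point x."""
    return max(a * x + b for a, b in terms)

def tropical_add(p, q):
    """Tropical sum is the pointwise maximum, i.e. the union of the terms."""
    return p + q

def tropical_mul(p, q):
    """Tropical product is the ordinary sum, i.e. all pairwise term sums."""
    return [(a1 + a2, b1 + b2) for a1, b1 in p for a2, b2 in q]

# ReLU(x) = max(x, 0) is a tropical polynomial with two terms.
relu = [(1.0, 0.0), (0.0, 0.0)]
neg = [(-1.0, 0.0)]

print(tropical_eval(relu, 3.0))
print(tropical_eval(tropical_add(relu, neg), -2.0))
print(tropical_eval(tropical_mul(relu, relu), 3.0))
```

Because slopes are stored as floats, the same representation handles real as well as integer slopes.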

PaperID: 927,

Authors: Bowen Xing, Ivor W. Tsang

Affiliations: Beijing Key Laboratory of Knowledge Engineering for Materials Science, School of Computer and Communication Engineering, Beijing Key Laboratory of SMART Traditional Chinese Medicine for Chronic Disease Prevention and Treatment, University of Science and Technology Beijing, Beijing, China; CFAR, Agency for Science, Technology and Research, IHPC, Agency for Science, Technology and Research, College of Computing and Data Science, Nanyang Technological University, Jurong West, Singapore

Title: DigNet: Digging Clues From Local-Global Interactive Graph for Aspect-Level Sentiment Classification

Abstract:
In aspect-level sentiment classification (ASC), state-of-the-art models encode either syntax graphs or relation graphs to capture the local syntactic information or global relational information. Despite the advantages of syntax and relation graphs, each has its own shortcomings, which are neglected, limiting the representation power in the graph modeling process. To resolve their limitations, we design a novel local–global interactive graph (LGIG), which marries their advantages by stitching the two graphs via interactive edges. To model this LGIG, we propose a novel neural network termed DigNet, whose core module is the stacked local–global interactive (LGI) layers performing two processes: intragraph message passing (IGMP) and cross-graph message passing (CGMP). In this way, the local syntactic and global relational information can be reconciled as a whole in understanding the aspect-level sentiment. Concretely, we design two variants of LGIGs with different kinds of interactive edges and three variants of LGI layers. We conduct experiments on several public benchmark datasets, and the results show that we outperform previous best scores by 3%, 2.32%, and 6.33% in terms of Macro-F1 on the Lap14, Res14, and Res15 datasets, respectively, confirming the effectiveness and superiority of the proposed LGIG and DigNet.

PaperID: 928,

Authors: Yiqun Zhang, Sen Feng, Pengkai Wang, Zexi Tan, Xiaopeng Luo, Yuzhu Ji, Rong Zou, Yiu-Ming Cheung

Affiliations: School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China; Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China

Title: Learning Self-Growth Maps for Fast and Accurate Imbalanced Streaming Data Clustering

Abstract:
Streaming data clustering is a popular research topic in data mining and machine learning. Since streaming data is usually analyzed in data chunks, it is more susceptible to encountering the dynamic cluster imbalance issue. That is, the imbalance ratio (IR) of clusters changes over time, which can easily lead to fluctuations in either the accuracy or the efficiency of streaming data clustering. Therefore, an accurate and efficient streaming data clustering approach is proposed to adapt to the drifting and imbalanced cluster distributions. We first design a self-growth map (SGM) that can automatically arrange neurons on demand according to local distribution, and thus achieve fast and incremental adaptation to the streaming distributions. Since SGM allocates an excess number of density-sensitive neurons to describe the global distribution, it can avoid missing small clusters among imbalanced distributions. We also propose a fast hierarchical merging (HM) strategy to combine the neurons that break up the relatively large clusters. It exploits the maintained SGM to quickly retrieve the intracluster distribution pairs for merging, which circumvents the most laborious global searching. It turns out that the proposed SGM can incrementally adapt to the distributions of new chunks, and the self-growth map-guided hierarchical merging for the imbalanced data clustering (SOHI) approach can quickly explore a true number of imbalanced clusters. Extensive experiments demonstrate that SOHI can efficiently and accurately explore cluster distributions for streaming data.

PaperID: 929,

Authors: Chun Zhou, Hua Meng, Ming Li, Zhengchun Zhou

Affiliations: School of Mathematics, Southwest Jiaotong University, Chengdu, Sichuan, China; Zhejiang Institute of Optoelectronics, Jinhua, China; School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan, China

Title: On Learning Label Noise Robust Networks via Regularization: A Topological View

Abstract:
Neural networks, especially those that update parameters by optimizing the difference between fit values and actual labels, often encounter challenges with real-world data containing mislabeled samples (called label noise). This label noise adversely affects the generalization performance of the network by disturbing local fit values. While existing network regularization methods such as data augmentation and label smoothing (LS) have shown usefulness in mitigating the damage caused by label noise, they primarily focus on global constraints and overlook the local impacts of label noise. Furthermore, the detailed influence of label noise on network function remains underexplored. To fill this gap, our article presents an in-depth analysis of the local effects of label noise on neural networks from a topological perspective. A novel regularization method, network boundary topology regularization (NBTR), based on persistent homology, is introduced. This method is specifically designed for local fit values of the network, with the aim of simplifying the topology of each class boundary. By doing so, it effectively reduces the tendency of a network to memorize label noise. Extensive experiments have been conducted across a range of datasets, network structures, and noise types to validate the effectiveness of this method. Our findings demonstrate that this method not only surpasses strong baseline methods in network generalization accuracy, especially in asymmetric noise conditions (improving average generalization accuracy by 7.72%), but also enhances the anti-noise capabilities of traditional methods when integrated as a complementary method.

PaperID: 930,

Authors: Kunchi Li, Hongyang Chen, Jun Wan, Shan Yu

Affiliations: School of Computer and Information Engineering and Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen University of Technology (XMUT), Xiamen, China; Research Center for Graph Computing, Zhejiang Laboratory, Hangzhou, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: CKDF-V2: Effectively Alleviating Representation Shift for Continual Learning With Small Memory

Abstract:
In continual learning (CL), the newly arrived data are often out-of-distribution from the previous ones, causing drastic representation shift (RS) when updating the old model on the new data, leading to catastrophic forgetting. In this work, we propose feature boosting calibration (FBC) to tackle this problem. Specifically, an expanded module is trained to learn all the classes, including the old and new classes, discovering critical features missed by the original/old model. Then, an FBC network (FBCN) is trained to exploit these missed features to calibrate the old representations. As the missed features increase the information needed for distinguishing between the old and new classes, FBCN generates the calibrated ones with more transferable features, thus alleviating the RS. Next, given the limited memory to store samples of the old/learned classes, the data are severely imbalanced between the old and new classes. To cope with this problem, we propose blockwise knowledge distillation (BWKD), which splits the softmax layer into blocks according to class frequency and then distills each block separately, resolving data imbalance effectively. Building upon the two improvements, we propose a two-stage training framework for CL, named CKDF-V2, providing an enhanced version of the cascaded knowledge distillation framework (CKDF). Furthermore, we integrate it with a task-token expansion method to develop a novel approach for CL based on the vision transformer (ViT). Extensive experiments show that both convolutional neural network (CNN)-based and ViT-based CKDF-V2 obtain favorable results across multiple CL benchmarks.
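
The blockwise distillation idea can be sketched as follows: the class logits are partitioned into blocks, a softmax is taken inside each block, and a distillation loss is accumulated per block. The KL-divergence loss, the temperature, the block partition, and every name below are assumptions made for illustration; the abstract does not give the authors' exact formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def blockwise_kd_loss(teacher_logits, student_logits, blocks, temperature=2.0):
    """Sum of per-block KL(teacher || student), each over a block-local softmax."""
    loss = 0.0
    for block in blocks:
        p = softmax([teacher_logits[i] for i in block], temperature)
        q = softmax([student_logits[i] for i in block], temperature)
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return loss

# Hypothetical partition: classes 0-1 are rare (old), classes 2-3 are frequent (new).
blocks = [[0, 1], [2, 3]]
teacher = [1.0, 2.0, 3.0, 4.0]
student = [1.5, 1.0, 3.0, 5.0]
print(blockwise_kd_loss(teacher, student, blocks))
```

Because each softmax is normalized only within its block, a large logit offset between frequent and rare classes does not affect the loss, which is one plausible reading of how the split counters class imbalance.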

PaperID: 931,

Authors: Hanzheng Wang, Wei Li, Xiang-Gen Xia, Qian Du

Affiliations: School of Information and Electronics, Beijing Key Laboratory of Fractional Signals and Systems, and the National Key Laboratory of Science and Technology on Space-Born Intelligent Information Processing, Beijing Institute of Technology, Beijing, China; Department of Electrical and Computer Engineering, University of Delaware, Newark, DE, USA; Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS, USA

Title: BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking

Abstract:
Hyperspectral object tracking (HOT) has many important applications, particularly in scenes where objects are camouflaged. The existing trackers can effectively retrieve objects via band regrouping because of the bias in the existing HOT datasets, where most objects tend to have distinguishing visual appearances rather than spectral characteristics. This bias allows a tracker to directly use the visual features obtained from the false-color images generated by hyperspectral images (HSIs) without extracting spectral features. To tackle this bias, the tracker should focus on the spectral information when object appearance is unreliable. Thus, we provide a new task called hyperspectral camouflaged object tracking (HCOT) and meticulously construct a large-scale HCOT dataset, BihoT, consisting of 41912 HSIs covering 49 video sequences. The dataset covers various artificial camouflage scenes, where objects have similar appearances, diverse spectrums, and frequent occlusion (OCC), making it a challenging dataset for HCOT. Besides, a simple but effective baseline model, named spectral prompt-based distractor-aware network (SPDAN), is proposed, comprising a spectral embedding network (SEN), a spectral prompt-based backbone network (SPBN), and a distractor-aware module (DAM). Specifically, the SEN extracts spectral-spatial features via 3-D and 2-D convolutions to form a refined prompt representation. Then, the SPBN fine-tunes powerful RGB trackers with spectral prompts and alleviates the insufficiency of training samples. Moreover, the DAM utilizes a novel statistic to capture the distractor caused by occlusion from objects and background and corrects the deterioration of the tracking performance via a novel motion predictor. Extensive experiments demonstrate that our proposed SPDAN achieves the state-of-the-art performance on the proposed BihoT and other HOT datasets.

PaperID: 932,

Authors: Haochen Han, Minnan Luo, Huan Liu, Fang Nan, Jun Liu

Affiliations: Pengcheng Laboratory, Shenzhen, China; School of Computer Science and Technology and the Ministry of Education Key Laboratory of Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an, China; Faculty of Electronic and Information Engineering and Shaanxi Province Key Laboratory of Big Data Knowledge Engineering, Xi’an Jiaotong University, Xi’an, China

Title: A Unified Optimal Transport Framework for Cross-Modal Retrieval With Noisy Labels

Abstract:
Cross-modal retrieval (CMR) aims to establish interaction between different modalities, among which supervised CMR is emerging due to its flexibility in learning semantic category discrimination. Despite the remarkable performance of previous supervised CMR methods, much of their success can be attributed to the well-annotated data. However, even for unimodal data, precise annotation is expensive and time-consuming, and it becomes more challenging with the multimodal scenario. In practice, massive multimodal data are collected from the Internet with coarse annotation, which inevitably introduces noisy labels. Training with such misleading labels would bring two key challenges: forcing the multimodal samples to align with incorrect semantics and widening the heterogeneous gap, both of which result in poor retrieval performance. To tackle these challenges, this work proposes UOT-RCL, a unified framework based on optimal transport (OT) for robust CMR. First, we propose a semantic alignment based on partial OT to progressively correct the noisy labels, where a novel cross-modal consistent cost function is designed to blend different modalities and provide precise transport cost. Second, to narrow the discrepancy in multimodal data, an OT-based relation alignment is proposed to infer the semantic-level cross-modal matching. Both of these components leverage the inherent correlation among multimodal data to facilitate an effective cost function. The experiments on three widely used CMR datasets demonstrate that our UOT-RCL surpasses the state-of-the-art approaches and significantly improves the robustness against noisy labels.
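
Illustrative sketch (not from the paper): the abstract does not give the partial-OT formulation or the cross-modal cost function, so the snippet below only shows the general idea of using a transport plan to reassign labels. It uses plain entropic OT solved with Sinkhorn iterations, which is a swapped-in simplification of partial OT. The toy cost matrix, the marginals, and all names are hypothetical.

```python
import math

def sinkhorn(cost, a, b, reg=0.5, n_iter=200):
    """Entropic-regularized OT plan between histograms a (rows) and b (columns)."""
    n, m = len(a), len(b)
    # Gibbs kernel K[i][j] = exp(-cost[i][j] / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows to match a and columns to match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical toy problem: 3 samples, 2 classes.
# cost[i][j] is assumed to be low when sample i looks like class j.
cost = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]
a = [1 / 3, 1 / 3, 1 / 3]   # each sample carries equal mass
b = [0.5, 0.5]              # assumed class prior
plan = sinkhorn(cost, a, b)

# Row i of the plan, normalized by a[i], can be read as a soft label for sample i.
soft_labels = [[p / a[i] for p in plan[i]] for i in range(len(a))]
```

Reading rows of the plan as soft labels is one common way to turn a transport plan into label corrections; whether UOT-RCL does exactly this is not stated in the abstract.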

PaperID: 933,

Authors: Zekang Li, Ruonan Liu, Dongyue Chen, Qinghua Hu

Affiliations: College of Intelligence and Computing and Tianjin Key Laboratory of Machine Learning, Tianjin University, Tianjin, China; Department of Automation, Shanghai Jiao Tong University, Shanghai, China

Title: OR-Gate Mixup Multiscale Spectral Graph Neural Network for Node Anomaly Detection

Abstract:
Graph node anomaly detection has important applications in practical scenarios. Although many graph neural networks (GNNs) have been proposed, how to design tailored spectral filters for node anomaly detection to fully mine high-frequency signals in the graph is still a challenge. Most GNNs are equivalent to low-pass filters and mine multiorder signals through a series structure. The computational cost increases as the number of layers increases and further leads to an over-smoothing problem. They mainly focus on low-frequency signals and suppress high-frequency signals, thus smoothing the differences between abnormal and normal nodes, making them indistinguishable. Due to the difficulty in mining high-frequency signals, the poorly distinguishable feature representations learned by low-pass GNNs can even harm the performance of data augmentation. To solve the above challenges, in this article, we propose an or-gate mixup multiscale spectral GNN (MMGNN) from the spectral domain. Specifically, we design multiorder multiscale bandpass filters through the superposition of polynomial spectral filters and then decompose them into preprocessing parts and training parts to form a double-parallel structure, which can effectively mine high-frequency signals in the graph and reduce computational cost. Finally, we propose or-gate mixup to perform data augmentation in the spectral space to improve model generalization. Experimental results on four real-world datasets demonstrate the effectiveness of the proposed MMGNN against the state-of-the-art methods.
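
Illustrative sketch (not from the paper): the abstract does not specify the form of the bandpass filters, so the snippet below only shows what a generic polynomial spectral filter g(L) x = sum_k c_k L^k x looks like on a small graph, using the combinatorial Laplacian L = D - A. The 4-node path graph, the coefficients, and the function names are hypothetical. It illustrates why a filter such as g(L) = L passes high-frequency (rapidly alternating) signals and removes constant ones.

```python
def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A for an adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def poly_filter(L, x, coeffs):
    """Apply g(L) x = sum_k coeffs[k] * L^k x without forming L^k explicitly."""
    out = [0.0] * len(x)
    power = list(x)  # starts as L^0 x
    for c in coeffs:
        out = [o + c * p for o, p in zip(out, power)]
        power = matvec(L, power)
    return out

# Hypothetical 4-node path graph 0-1-2-3
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
L = laplacian(adj)

high_pass = [0.0, 1.0]                       # g(L) = L
smooth = poly_filter(L, [1, 1, 1, 1], high_pass)       # constant signal is removed
oscillating = poly_filter(L, [1, -1, 1, -1], high_pass)  # alternating signal is amplified
```

Because the filter is applied through repeated matrix-vector products, the filtered features can be computed once before training, which is one plausible reading of the "preprocessing parts" mentioned in the abstract.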

PaperID: 934,

Authors: Jie Feng, Tianshu Zhang, Junpeng Zhang, Ronghua Shang, Weisheng Dong, Guangming Shi, Licheng Jiao

Affiliations: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University, Xi’an, China

Title: S4DL: Shift-Sensitive Spatial-Spectral Disentangling Learning for Hyperspectral Image Unsupervised Domain Adaptation

Abstract:
Unsupervised domain adaptation (UDA) techniques, extensively studied in hyperspectral image (HSI) classification, aim to use labeled source domain data and unlabeled target domain data to learn domain invariant features for cross-scene classification. Compared to natural images, numerous spectral bands of HSIs provide abundant semantic information, but they also increase the domain shift significantly. In most existing methods, both explicit alignment and implicit alignment simply align feature distribution, ignoring domain information in the spectrum. We observe that when the spectral channels of the source and target domains differ markedly, the transfer performance of these methods tends to deteriorate. Additionally, their performance fluctuates greatly owing to the varying domain shifts across various datasets. To address these problems, a novel shift-sensitive spatial-spectral disentangling learning (S4DL) approach is proposed. In S4DL, gradient-guided spatial-spectral decomposition (GSSD) is designed to separate domain-specific and domain-invariant representations by generating tailored masks under the guidance of the gradient from domain classification. A shift-sensitive adaptive monitor is defined to adjust the intensity of disentangling according to the magnitude of domain shift. Furthermore, a reversible neural network is constructed to retain domain information that lies not only in semantic features but also in shallow-level details. Extensive experimental results on several cross-scene HSI datasets consistently verify that S4DL outperforms the state-of-the-art UDA methods. Our source code will be available at https://github.com/xdu-jjgs/IEEE_TNNLS_S4DL.

PaperID: 935,

Authors: Jianxun Lou, Huasheng Wang, Xinbo Wu, John Cho Hui Ng, Richard D. White, Kaveri A. Thakoor, Padraig Corcoran, Ying Chen, Hantao Liu

Affiliations: School of Computer Science, Northeast Electric Power University, Jilin, China; School of Computer Science and Informatics, Cardiff University, Cardiff, U.K.; Department of Radiology, University Hospital of Wales, Cardiff, U.K.; Department of Ophthalmology, Columbia University Irving Medical Center, New York, NY, USA; Alibaba Group, Hangzhou, China

Title: Chest X-Ray Visual Saliency Modeling: Eye-Tracking Dataset and Saliency Prediction Model

Abstract:
Radiologists’ eye movements during medical image interpretation reflect their perceptual-cognitive processes of diagnostic decisions. The eye movement data can be modeled to represent clinically relevant regions in a medical image and potentially integrated into an artificial intelligence (AI) system for automatic diagnosis in medical imaging. In this article, we first conduct a large-scale eye-tracking study involving 13 radiologists interpreting 191 chest X-ray (CXR) images, establishing a best-of-its-kind CXR visual saliency benchmark. We then perform analysis to quantify the reliability and clinical relevance of saliency maps (SMs) generated for CXR images. We develop CXR image saliency prediction method (CXRSalNet), a novel saliency prediction model that leverages radiologists’ gaze information to optimize the use of unlabeled CXR images, enhancing training and mitigating data scarcity. We also demonstrate the application of our CXR saliency model in enhancing the performance of AI-powered diagnostic imaging systems.

PaperID: 936,

Authors: Xinran Qin, Yuhui Quan, Zhuojie Chen, Hui Ji

Affiliations: School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Department of Mathematics, National University of Singapore, Queenstown, Singapore

Title: Robust Unsupervised Deep Learning for Nonblind Image Deconvolution With Inaccurate Kernels

Abstract:
Nonblind image deconvolution/deblurring aims at restoring sharp images from their noisy blurred versions using an associated blur kernel with potential inaccuracy. Current deep learning (DL) models of nonblind image deconvolution (NBID) predominantly rely on ground truth (GT) images for supervision, which restricts their applicability to certain real-world scenarios such as scientific imaging. This article proposes a fully unsupervised DL approach for NBID, utilizing a GT-free end-to-end training process that adeptly handles both measurement noise and kernel error. Specifically, in the absence of GT images, a self-reconstruction loss is proposed to handle measurement noise, by effectively emulating its supervised counterpart. Recognizing the likely occurrence of kernel error in both training and testing data, we introduce a self-ensemble loss function and an ensemble inference scheme, anchored by a phase-keeping kernel perturbation strategy. Furthermore, a shifting mechanism is integrated into the loss functions to resolve the shift ambiguity caused by kernel error. Extensive experiments show the superiority of our proposed approach over existing unsupervised NBID methods, as well as its competitive performance against some of the recent supervised methods.

PaperID: 937,

Authors: Fandi Gou, Haikuo Du, Chenyu Zhao, Yunze Cai

Affiliations: Department of Automation, Shanghai Jiao Tong University, Shanghai, China; Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China

Title: A Policy-Guided Reinforcement Learning Method for Encirclement Control in Multiobstacle Environment

Abstract:
The problem of multiagent encirclement with multiobstacle collision avoidance (EMOCA) has been challenging since it is difficult to balance the tradeoff between surrounding a mobile target and avoiding obstacles simultaneously. To address the EMOCA problem, we propose a novel policy-guided reinforcement learning (RL) method, namely, multiregulator-assisted RL for encirclement control (MRA-RLEC), which leverages the jump-start learning and curriculum learning (CL) mechanisms to enhance training efficiency. MRA-RLEC divides the complex encirclement task into a sequence of subtasks, progressively increasing in difficulty. In this process, multiple regulators are utilized to adjust various training aspects, including encirclement condition, obstacle avoidance, and the transition from guide to learned policy execution. Besides, a global encirclement reward decomposition (GERD) method is presented to alleviate reward sparsity, and we design a bidirectional communication protocol to reduce communication. Extensive experiments are carried out to showcase the robustness and superiority of our method, and the practical applicability of MRA-RLEC is demonstrated through experiments conducted in the robot operating system 2 (ROS2)-based simulation platform, Gazebo, using a self-designed omnidirectional vehicle model.

PaperID: 938,

Authors: Xinhui Wang, Zunxian Li

Affiliations: Department of Mathematics, Tianjin University of Technology, Tianjin, China

Title: Turing Instability and Hopf Bifurcation in 2-D Coupled Cellular Neural Networks

Abstract:
The dynamics of 2-D two-grid coupled cellular neural networks (CNNs) are considered. Assuming the boundary conditions of zero-flux type, the linearized model is analyzed by using the decoupling method, which is described as matrix operations, such as the Kronecker product and Kronecker sum. Then, the local stability of the zero equilibrium related to system parameters is studied. Based on the results, the sufficient conditions that induce Turing instability are derived. Furthermore, as a special kind of Turing patterns, the occurrence conditions for Hopf bifurcations are considered. In addition, the global stability of the zero equilibrium is analyzed by constructing proper Lyapunov functions. Finally, numerical simulations are given to illustrate the theoretical results.

PaperID: 939,

Authors: Longyu Niu, Baihui Li, Xingjian Fan, Hao Fang, Jun Li, Junliang Xing, Jun Wan, Zhen Lei

Affiliations: State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China; School of Computer Science and Technology, Nanjing University of Posts and Telecommunications (NJUPT), Nanjing, Jiangsu, China; Department of Computer Science and Technology, Tsinghua University, Beijing, China

Title: UBG: An Unreal BattleGround Benchmark With Object-Aware Hierarchical Proximal Policy Optimization

Abstract:
Deep reinforcement learning (DRL) has made significant progress in various simulation environments. However, applying DRL methods to real-world scenarios poses certain challenges due to limitations in visual fidelity, scene complexity, and task diversity within existing environments. To address these limitations and explore the potential ability of DRL, we developed a 3-D open-world first-person shooter (FPS) game called Unreal BattleGround (UBG) using the Unreal Engine (UE). UBG provides a realistic 3-D environment with variable complexity, random scenes, diverse tasks, and multiple scene interaction methods. This benchmark involves far more complex state-action spaces than classic pseudo-3-D FPS games (e.g., ViZDoom), making it challenging for DRL to learn human-level decision sequences. Then, we propose the object-aware hierarchical proximal policy optimization (OaH-PPO) method in the UBG. It involves a two-level hierarchy, where the high-level controller is tasked with learning option control, and the low-level workers focus on mastering subtasks. To boost the learning of subtasks, we propose three modules: an object-aware module for extracting depth detection information from the environment, potential-based intrinsic reward shaping for efficient exploration, and annealing imitation learning (IL) to guide the initialization. Experimental results have demonstrated the broad applicability of the UBG and the effectiveness of the OaH-PPO. We will release the code of the UBG and OaH-PPO after publication.
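
Illustrative sketch (not from the paper): the abstract mentions potential-based intrinsic reward shaping but does not give the potential function used in UBG. The snippet below shows the standard potential-based form F(s, s') = gamma * phi(s') - phi(s), with a hypothetical 1-D potential (negative distance to a goal position) chosen only for illustration.

```python
def shaping_reward(phi_s, phi_next, gamma=0.99):
    """Standard potential-based shaping term: gamma * phi(s') - phi(s)."""
    return gamma * phi_next - phi_s

# Hypothetical potential: negative distance to a goal on a 1-D line.
GOAL = 5

def potential(position):
    return -abs(GOAL - position)

# Hypothetical trajectory of agent positions moving toward the goal.
trajectory = [0, 1, 3, 4]
bonuses = [
    shaping_reward(potential(s), potential(s_next), gamma=1.0)
    for s, s_next in zip(trajectory, trajectory[1:])
]
total_bonus = sum(bonuses)
```

With gamma = 1 the bonuses telescope, so their sum depends only on the first and last states; this property is why potential-based shaping can speed up exploration without changing which policies are optimal.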

PaperID: 940,

Authors: Yijin Huang, Pujin Cheng, Roger C. Tam, Xiaoying Tang

Affiliations: Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China; School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada

Title: Boosting Memory Efficiency in Transfer Learning for High-Resolution Medical Image Classification

Abstract:
The success of large-scale pretrained models has established fine-tuning as a standard method for achieving significant improvements in downstream tasks. However, fine-tuning the entire parameter set of a pretrained model is costly. Parameter-efficient transfer learning (PETL) has recently emerged as a cost-effective alternative for adapting pretrained models to downstream tasks. Despite its advantages, the increasing model size and input resolution present challenges for PETL, as the training memory consumption is not reduced as effectively as the parameter usage. In this article, we introduce fine-grained prompt tuning plus (FPT+), a PETL method designed for high-resolution medical image classification, which significantly reduces the training memory consumption compared to other PETL methods. FPT+ performs transfer learning by training a lightweight side network and accessing pretrained knowledge from a large pretrained model (LPM) through fine-grained prompts and fusion modules. Specifically, we freeze the LPM of interest and construct a learnable lightweight side network. The frozen LPM processes high-resolution images to extract fine-grained features, while the side network employs corresponding downsampled low-resolution images to minimize memory usage. To enable the side network to leverage pretrained knowledge, we propose fine-grained prompts and fusion modules, which collaborate to summarize information through the LPM’s intermediate activations. We evaluate FPT+ on eight medical image datasets of varying sizes, modalities, and complexities. Experimental results demonstrate that FPT+ outperforms other PETL methods, using only 1.03% of the learnable parameters and 3.18% of the memory required for fine-tuning an entire ViT-B model. Our code is available at https://github.com/YijinHuang/FPT.

PaperID: 941,

Authors: Shruti Shukla, Dimitris A. Pados, George Sklivanitis, Elizabeth Serena Bentley, Michael J. Medley

Affiliations: Department of Electrical Engineering and Computer Science, Center for Connected Autonomy and AI, Florida Atlantic University, Boca Raton, FL, USA; Air Force Research Laboratory, AFRL/RI, Rome, NY, USA; SUNY Polytechnic Institute, Utica, NY, USA

Title: Training Dataset Curation by L1-Norm Principal-Component Analysis for Support Vector Machines

Abstract:
Support vector machines (SVMs) have been the learning model of choice in numerous classification applications. While SVMs are widely successful in real-world deployments, they remain susceptible to mislabeled examples in training datasets where the presence of few faults can severely affect decision boundaries, thereby affecting the model’s performance on unseen data. In this brief, we develop and describe in implementation detail a novel method based on L1-norm principal-component data analysis and geometry that aims to filter out atypical data instances on a class-by-class basis before the training phase of SVMs and thus provide the classifier with robust support-vector candidates for making classification boundaries. The proposed dataset curation method is entirely data-driven (touch-free), unsupervised, and computationally efficient. Extensive experimental studies on real datasets included in this brief illustrate the L1-norm curation method and demonstrate its efficacy in protecting SVM models from data faults during learning.

PaperID: 942,

Authors: Yixia Li, Rong Xiang, Yanlin Song, Jing Li

Affiliations: Department of Computing, The Hong Kong Polytechnic University (PolyU), Hong Kong, SAR, China; Department of Computing, PolyU, Hong Kong, SAR, China; Department of Computing and the Research Centre on Data Science and Artificial Intelligence (RC-DSAI), PolyU, Hong Kong, SAR, China

Title: UniPoll: A Unified Social Media Poll Generation Framework via Multiobjective Optimization

Abstract:
Social media platforms are vital for expressing opinions and understanding public sentiment, yet many analytical tools overlook passive users who mainly consume content without engaging actively. To address this, we introduce UniPoll, an advanced framework designed to automatically generate polls from social media posts using sophisticated natural language generation (NLG) techniques. Unlike traditional methods that struggle with social media’s informal and context-sensitive nature, UniPoll leverages enriched contexts from user comments and employs multiobjective optimization to enhance poll relevance and engagement. To tackle the inherently noisy nature of social media data, UniPoll incorporates retrieval-augmented generation (RAG) and synthetic data generation, ensuring robust performance across real-world scenarios. The framework surpasses existing models, including T5, ChatGLM3, and GPT-3.5, in generating coherent and contextually appropriate question–answer pairs. Evaluated on the Chinese WeiboPolls dataset and the newly introduced English RedditPolls dataset, UniPoll demonstrates superior cross-lingual and cross-platform capabilities, making it a potent tool to boost user engagement and create a more inclusive environment for interaction.

PaperID: 943,

Authors: Julius Martinetz, Thomas Martinetz

Affiliations: Machine Learning Group, Technische Universität Berlin, Berlin, Germany; Institute for Neuro- and Bioinformatics, University of Lübeck, Lübeck, Germany

Title: Do Highly Over-Parameterized Neural Networks Generalize Since Bad Solutions are Rare?

Abstract:
We study over-parameterized classifiers where empirical risk minimization (ERM) for learning leads to zero training error. In these over-parameterized settings, there are many global minima with zero training error, some of which generalize better than others. We show that under certain conditions, the fraction of “bad” global minima with a true error larger than ε decays to zero exponentially fast with the number of training data n. The bound depends on the distribution of the true error over the set of classifier functions used for the given classification problem, and does not necessarily depend on the size or complexity (e.g., the number of parameters) of the classifier function set. This insight provides an alternative perspective on the unexpectedly good generalization even of highly over-parameterized neural networks. We substantiate our theoretical findings through experiments on synthetic data and a subset of MNIST. Additionally, we assess our hypothesis using VGG19 and ResNet18 on a subset of Caltech101.
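
Illustrative note (not the paper's bound): the abstract does not state the exact result or its conditions. The following textbook step gives some intuition for why exponential decay in n is plausible. The symbols h, R(h), and \hat{R}_n(h) are notation introduced here: a fixed classifier, its true error, and its training error on n i.i.d. samples.

```latex
% For a fixed classifier h with true error R(h) > \varepsilon,
% each i.i.d. training sample is classified correctly with probability 1 - R(h), so
\Pr\bigl[\hat{R}_n(h) = 0\bigr]
  \;=\; \bigl(1 - R(h)\bigr)^{n}
  \;<\; (1 - \varepsilon)^{n}
  \;\le\; e^{-\varepsilon n}.
```

Each individual "bad" classifier therefore becomes a zero-training-error solution with exponentially small probability. How this translates into a statement about the fraction of bad global minima depends on the distribution of the true error over the function set, as the abstract notes.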

PaperID: 944,

Authors: Runyu Lu, Yuanheng Zhu, Dongbin Zhao, Yu Liu, You He

Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Department of Electronic Engineering, Tsinghua University, Beijing, China

Title: Last-Iterate Convergence to Approximate Nash Equilibria in Multiplayer Imperfect Information Games

Abstract:
Imperfect information and multiple players are the two common features of real-world games. However, few of the existing game-theoretic methods are applicable to multiplayer imperfect information games (IIGs) when it comes to finding Nash equilibria. Moreover, the commonly used methods that rely on average-iterate convergence are not conducive to deep reinforcement learning (DRL), which is widely applied to large-scale problems, as it is costly to preserve average policies under function approximation. To deal with these problems, we construct a continuous-time dynamic named imperfect-information exponential-decay score-based learning (IESL) by considering the concept of Nash distribution [a type of quantal response equilibrium (QRE)] in IIGs. Theoretically, we prove the last-iterate convergence of IESL to approximate Nash equilibria in multiplayer IIGs under the assumption of individual concavity. Empirically, we verify that IESL converges in six poker scenarios, with the ultimate NashConv lower than that of the comparative methods (including counterfactual regret minimization (CFR), replicator dynamics (RDs), and their variants) in multiplayer Leduc hold’em. When compared with the existing equilibrium-finding algorithms in multiplayer normal-form games (NFGs), IESL also demonstrates a more stable performance. In addition, we observe a trade-off between the difficulty of IESL’s last-iterate convergence and the NashConv of the convergent policies, which aligns with our convergence analysis based on the hypomonotonicity of the game.

PaperID: 945,

Authors: Bo Yang, Changzhe Jiao, Jinjian Wu, Leida Li

Affiliations: School of Artificial Intelligence, Xidian University, Xi’an, China

Title: Variational Multiple-Instance Learning With Embedding Correlation Modeling for Hyperspectral Target Detection

Abstract:
Hyperspectral target detection has attracted wide attention in geoscience and remote sensing due to the abundant spectral information in hyperspectral imagery. However, the detection performance is highly dependent on high-quality target signatures or pixel-level supervised signals, which are extremely challenging and costly to obtain. In this article, we propose a variational multiple-instance neural network with embedding correlation modeling (VMIL-ECM) for weakly supervised hyperspectral target detection, which relaxes the rigid target prior (e.g., target signatures and/or pixel-level annotations), and only region-level labels are required. VMIL-ECM explicitly models the location of the targets within the region as a latent variable under the nonindependent and identically distributed (non-i.i.d.) assumption to estimate the underlying ground-truth target locations. The expectation-maximization (EM) algorithm is employed to iteratively optimize the posterior distribution of latent variables and learn discriminative spectral features for target detection. To fully utilize the contextual information within the hyperspectral region, a permutation-invariant transformer-based structure is devised to explore the embedding correlation among instances. Moreover, a dynamic thresholding strategy is adopted to produce reliable fine-grained supervised signals. Extensive experiments on three simulated datasets and two real-field datasets are conducted to verify the effectiveness of VMIL-ECM, and state-of-the-art performance has been achieved over the existing comparison methods. The code for the VMIL-ECM is publicly available at: https://github.com/BoYangXDU/VMIL-ECM.

PaperID: 946,

Authors: Bocheng Ren, Laurence T. Yang, Xin Nie, Jun Feng, Xianjun Deng, Chenlu Zhu

Affiliations: School of Computer Science and Technology and the Institute of Artificial Intelligence, Huazhong University of Science and Technology, Wuhan, China; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China; School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, China; Hubei Chutian Expressway Intelligent Industry Research Institute Company Ltd., Wuhan, China

Title: Zero-Shot Fault Diagnosis for Smart Process Manufacturing via Tensor Prototype Alignment

Abstract:
Identifying unseen faults is a crux of the digital transformation of process manufacturing. The ever-changing manufacturing process requires preset models to cope with unseen problems. However, most current works focus on recognizing objects seen during the training phase. Conventional zero-shot recognition methods perform poorly when they are applied directly to these tasks due to the different scenarios and limited generalizability. This article presents a tensor-based zero-shot fault diagnosis framework, termed MetaEvolver, which is dedicated to improving fault diagnosis accuracy and unseen domain generalizability for practical process manufacturing scenarios. MetaEvolver learns to evolve the dual prototype distributions for each uncertain meta-domain from seen faults and then adapt to unseen faults. We first propose the concept of the uncertain meta-domain and then construct corresponding sample prototypes with the guidance of class-level attributes, which produce the sample-attribute alignment at the prototype level. MetaEvolver further collaboratively evolves the uncertain meta-domain dual prototypes by injecting the prototype distribution information of another modality, boosting the sample-attribute alignment at the distribution level. Building on the uncertain meta-domain strategy, MetaEvolver achieves knowledge transfer and unseen domain generalization through the optimization of several devised loss functions. Comprehensive experimental results on five process manufacturing data groups and five zero-shot benchmarks demonstrate that our MetaEvolver has great superiority and potential to tackle zero-shot fault diagnosis for smart process manufacturing.

PaperID: 947,

Authors: Chao Pang, Yu Wang, Yi Jiang, Ruheng Wang, Xiaojun Yao, Quan Zou, Xiangxiang Zeng, Ran Su, Leyi Wei

Affiliations: School of Software and the Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China; Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macau, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; College of Intelligence and Computing, Tianjin University, Tianjin, China

Title: Multiview Deep Learning-Based Molecule Design and Structural Optimization Accelerates Inhibitor Discovery

Abstract:
In this work, we propose MEDICO, a multiview deep generative model for molecule generation, structural optimization, and SARS-CoV-2 inhibitor discovery. To the best of our knowledge, MEDICO is the first-of-its-kind graph generative model that can generate molecular graphs similar to the structure of targeted molecules, with a multiview representation learning framework to sufficiently and adaptively learn comprehensive structural semantics from targeted molecular topology and geometry. We show that our MEDICO significantly outperforms the state-of-the-art methods in generating valid, novel, and unique molecules under benchmarking comparisons, particularly achieving ~85% improvement compared with the state-of-the-art methods in terms of validity. Importantly, we showcase that the multiview deep learning model enables us to generate not only the molecules structurally similar to the targeted molecules but also the molecules with desired chemical properties. Moreover, case study results on targeted molecule generation for the SARS-CoV-2 main protease (Mpro) show that we successfully generate new small molecules with desired drug-like properties for the Mpro by integrating molecular docking into our model as a chemical prior, potentially accelerating the de novo design of COVID-19 drugs. Furthermore, we apply MEDICO to the structural optimization of three well-known Mpro inhibitors (N3, 11a, and GC376) and achieve ~88% improvement compared with the original inhibitors in their binding affinity to Mpro, demonstrating the application value of our model for the development of therapeutics for SARS-CoV-2 infection.

PaperID: 948,

Authors: José de Jesús Rubio

Affiliations: Sección de Estudios de Posgrado e Investigación, ESIME Azcapotzalco, Instituto Politécnico Nacional, Ciudad de México, Mexico

Title: Differential Evolution Algorithm for Fast Gains Learning in a High-Gain Controller

Abstract:
The twin delayed deep deterministic policy gradient (TD3) algorithm and genetic (G) algorithm can take significant time to converge. Hence, it would be interesting to propose an alternative algorithm for fast gains learning in a high-gain controller, which is reflected as fast trajectory tracking. In a differential evolution (DE) algorithm, the population is initialized, and the mutation, crossover, and selection operations are repeated until convergence is reached. In this way, compared with the TD3 and G algorithms, a DE algorithm can converge faster. In this article, the fast gains learning in a DE high-gain controller (DEHGC) is proposed. The DEHGC contains a high-gain controller for trajectory tracking and a DE algorithm for fast gains learning. The error stability of the high-gain controller is assured. The pseudocode of the DEHGC is detailed. The DE, TD3, and G algorithms are compared for fast gains learning in the high-gain controller.
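
Illustrative sketch (not the paper's DEHGC pseudocode): the abstract names the DE steps (population setup, then repeated mutation, crossover, and selection) but gives no controller model or cost. The snippet below is a generic DE/rand/1/bin loop. The quadratic `tracking_cost`, its assumed best gains (2.0, 0.5), the bounds, and all hyperparameters are hypothetical stand-ins for a real tracking-error measure.

```python
import random

def differential_evolution(cost, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Generic DE/rand/1/bin minimizer over box-bounded parameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Population setup: uniform samples inside the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = list(pop[i])
            for d in range(dim):
                # Binomial crossover between the target and the mutant
                if rng.random() < CR or d == j_rand:
                    lo, hi = bounds[d]
                    mutant = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    trial[d] = min(max(mutant, lo), hi)
            # Selection: keep the trial only if it is no worse
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=lambda i: costs[i])
    return pop[best], costs[best]

def tracking_cost(gains):
    """Hypothetical stand-in for a tracking-error cost over two gains."""
    kp, kd = gains
    return (kp - 2.0) ** 2 + (kd - 0.5) ** 2

best_gains, best_cost = differential_evolution(
    tracking_cost, bounds=[(0.0, 10.0), (0.0, 10.0)]
)
```

In a real gains-learning setup the cost would come from simulating or running the closed loop with the candidate gains; only the cost function would need to change.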

PaperID: 949,

Authors: Yiming Fei, Jiangang Li, Yanan Li

Affiliations: School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen, China; Department of Engineering and Design, University of Sussex, Brighton, U.K.

Title: Real-Time Progressive Learning: Accumulate Knowledge From Control With Neural-Network-Based Selective Memory

Abstract:
Memory, as the basis of learning, determines the storage, update, and forgetting of knowledge and further determines the efficiency of learning. Featured with the mechanism of memory, a radial basis function neural network (RBFNN)-based learning control scheme named real-time progressive learning (RTPL) is proposed to learn the unknown dynamics of the system with guaranteed stability and closed-loop performance. Instead of the Lyapunov-based weight update law of conventional neural network learning control (NNLC), which mainly concentrates on stability and control performance, RTPL uses the selective memory recursive least squares (SMRLS) algorithm to update the weights of the neural network and achieves the following merits: 1) improved learning speed without filtering; 2) robustness to hyperparameter setting of neural networks; 3) good generalization ability, i.e., reuse of learned knowledge in different tasks; and 4) guaranteed learning performance under parameter perturbation. Moreover, RTPL realizes continuous accumulation of knowledge as a result of its reasonably allocated memory while NNLC may gradually forget knowledge that it has learned. Corresponding theoretical analysis and simulation studies demonstrate the effectiveness of RTPL.

PaperID: 950,

Authors: Henghao Zhao, Kevin Qinghong Lin, Rui Yan, Zechao Li

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; Department of Computer Science and Technology, National University of Singapore, Queenstown, Singapore; Department of Computer Science and Technology, Nanjing University, Nanjing, China

Title: DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection

Abstract:
Video moment retrieval and highlight detection have received attention in the current era of video content proliferation, aiming to localize moments and estimate clip relevance based on user-specific queries. Most existing methods approach these challenges from a discriminative learning perspective, focusing on learning the correspondence between query and activity boundary locations through complex cross-modal interactions. However, the continuous nature of video content often results in unclear boundaries between temporal events. This boundary ambiguity may confuse models, resulting in subpar performance in predicting target boundaries. To alleviate this problem, we propose to solve the two tasks jointly from the perspective of denoising generation. Moreover, the target boundary can be localized clearly by iterative refinement from coarse to fine. Specifically, a novel framework, DiffusionVMR, is proposed to redefine the two tasks as a unified conditional denoising generation process by incorporating a diffusion model. During training, Gaussian noise is added to corrupt the ground truth (GT), with noisy candidates produced as input. The model is trained to reverse this noise addition process. In the inference phase, DiffusionVMR starts directly from Gaussian noise and progressively refines the proposals from noise to meaningful output. Notably, the proposed DiffusionVMR inherits the advantages of diffusion models that allow for iteratively refined results during inference, enhancing the boundary transition from coarse to fine. Furthermore, the training and inference of DiffusionVMR are decoupled. Arbitrary settings can be used during inference, without needing to match those of the training phase.
Extensive experiments conducted on five widely used benchmarks (i.e., QVHighlight, Charades-STA, TACoS, YouTubeHighlights, and TVSum) across two tasks (moment retrieval and/or highlight detection) demonstrate the effectiveness and flexibility of the proposed DiffusionVMR.
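
The training-time corruption step described above (Gaussian noise added to ground-truth moments) can be sketched with the standard DDPM forward process, x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps. The abstract does not specify the noise schedule or the span parameterization, so the linear beta schedule, the (center, width) encoding, and all names below are assumptions made for illustration.

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def corrupt_span(span, t, schedule, rng):
    """Sample a noisy span x_t from a clean (center, width) span x_0."""
    a = schedule[t]
    return tuple(math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
                 for v in span)

# Usage: corrupt a normalized ground-truth moment at a random timestep.
schedule = alpha_bars(1000)
rng = random.Random(0)
noisy = corrupt_span((0.5, 0.2), rng.randrange(1000), schedule, rng)
```

A denoising model would then be trained to recover the clean span from `noisy`, the timestep, and the query-conditioned video features.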

PaperID: 951,

Authors: Runmin Wang, Hua Chen, Yanbin Zhu, Juan Xu, Xiaofei Cao, Zhenlin Zhu, Shengyou Qian, Changxin Gao, Li Liu, Nong Sang

Affiliations: School of Information Science and Engineering, Hunan Normal University, Changsha, China; School of Physical and Electronic Sciences, Hunan Normal University, Changsha, China; School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China; College of Electronic Science and Technology, National University of Defense Technology, Changsha, China

Title: S3INet: Semantic-Information Space Sharing Interaction Network for Arbitrary Shape Text Detection

Abstract:
Detecting arbitrary-shape text is a challenging task due to the significant variation in text shape, size, and aspect ratio, as well as the complexity of scene backgrounds. Enhancing feature extraction capabilities is essential for boosting text detection accuracy. However, traditional text feature extraction methods face several issues, including insufficient multiscale feature fusion, limited information transfer between different feature levels, and constrained receptive field expansion when using asymmetric convolutional kernels for long text detection. To address these challenges, this article introduces an arbitrarily shaped scene text detector called the semantic-information space sharing interaction network (S3INet). The proposed network leverages the semantic-information space sharing module (S3M) to generate a single-level feature map capable of capturing multiscale features with rich semantic information and prominent foreground elements. In addition, we propose the multibranch parallel asymmetric convolutional module (MPACM) group to enhance the representation of text features, thereby further improving text detection performance. Extensive experimental evaluations on five publicly available natural scene text datasets (CTW-1500, Total-Text, MSRA-TD500, ICDAR2015, and ICDAR2017-MLT) and two traffic text datasets (CTST-1600 and TPD) demonstrate the superiority of our method. The results indicate that S3INet significantly outperforms most existing state-of-the-art methods in both accuracy and robustness. The code will be released at: https://github.com/runminwang/S3INet.

PaperID: 952,

Authors: Boyuan Yang, Jinyuan Zhang, Ruonan Liu, Di Lin, Ping Li, C. L. Philip Chen

Affiliations: Center for Advanced Control and Smart Operations, Nanjing University, Suzhou, China; Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Department of Automation, Shanghai Jiao Tong University, Shanghai, China; College of Intelligence and Computing, Tianjin University, Tianjin, China; Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: Point-to-Set Metric-Gated Mixture of Experts for Multisource Domain Adaptation Fault Diagnosis

Abstract:
The multisource unsupervised domain adaptation (MUDA) scenario poses a significant challenge in the field of intelligent fault diagnosis (IFD), where the goal is to transfer the knowledge learned from multiple labeled source domains to an unlabeled target domain. Existing IFD-oriented MUDA approaches frequently fail to recognize the distinct importance of each source domain relative to specific target samples, or lack flexibility in integrating diagnostic insights from multiple sources. In response, a novel MUDA approach is proposed for IFD, termed point-to-set metric-gated mixture of experts (PSMMoEs). This method leverages a mixture-of-experts (MoEs) framework to automatically integrate the complementary information from multiple source domains. It develops a deep point-to-set distance (PSD) metric learning technique within the MoE’s gating mechanism, effectively fusing domain-specific features by assessing the similarity between individual target samples and each source domain. The method ensures balanced training across progressive stages, harmonizing multitask learning with joint training for the MoE framework. Furthermore, a multilayer maximum mean discrepancy (MMD) measurement is employed for domain alignment, ensuring feature alignment across different domains at multiple levels. In order to assess the efficacy of the proposed method, it is compared with several leading domain adaptation methods on publicly available and laboratory-based rotating machinery fault datasets. The experimental results demonstrate superior classification and adaptation capabilities of the proposed fault diagnosis method.
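
As a reference point for the multilayer MMD term mentioned above, the sketch below computes the biased squared-MMD estimate with a Gaussian kernel and sums it over several feature layers. The kernel choice, the single fixed bandwidth, the unweighted sum over layers, and the function names are assumptions made for illustration; the abstract does not state which variant the authors use.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD between two sets of feature vectors."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

def multilayer_mmd(source_layers, target_layers, bandwidth=1.0):
    """Sum the per-layer MMD estimates over paired lists of layer features."""
    return sum(mmd_squared(s, t, bandwidth)
               for s, t in zip(source_layers, target_layers))
```

Used as a loss, this quantity shrinks toward zero as the source and target feature distributions at each layer become harder to tell apart.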

PaperID: 953,

Authors: Lingkai Hu, Feng Zhan, Wenkai Huang, Weiming Gan, Haoxiang Hu, Hao He, Kunbo Han

Affiliations: School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou, China; College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China; School of Politics and International Relations, East China Normal University, Shanghai, China

Title: Weird-Net: Weighted Relative Distance Attention for Efficient and Robust Sequence Processing

Abstract:
Sequence processing is a fundamental research area in artificial intelligence (AI) that encompasses various tasks and applications. Existing models—such as recurrent neural networks (RNNs) and transformers—have drawbacks such as slow computation, high complexity, and overfitting. In this article, we propose Weird-Net, a novel sequence processing model that leverages the weighted relative distance (Weird)-attention mechanism. Weird-Net can capture positional inductive relationships more robustly and efficiently than transformers. Moreover, it enables parallel computation and can handle extremely long sequences with near-linear complexity. We conduct extensive experiments on multiple datasets and tasks and evaluate Weird-Net using various metrics. The results demonstrate that Weird-Net achieves state-of-the-art (SOTA) performance on several language modeling benchmarks and surpasses other models in terms of accuracy, speed, and memory usage.

PaperID: 954,

Authors: Thanveer Shaik, Xiaohui Tao, Haoran Xie, Lin Li, Xiaofeng Zhu, Qing Li

Affiliations: School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, Australia; Division of Artificial Intelligence, School of Data Science, Lingnan University, TuenMun, Hong Kong; School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; Department of Computing, The Hong Kong Polytechnic University, Hong Kong, SAR, China

Title: Exploring the Landscape of Machine Unlearning: A Comprehensive Survey and Taxonomy

Abstract:
Machine unlearning (MU) is gaining increasing attention due to the need to remove or modify predictions made by machine learning (ML) models. While model training has become more efficient and accurate, unlearning previously learned information has become increasingly important in fields such as privacy, security, and ethics. This article presents a comprehensive survey of MU, covering current state-of-the-art techniques and approaches, including data deletion, perturbation, and model updates. In addition, commonly used metrics and datasets are presented. This article also highlights the challenges that need to be addressed, including attack sophistication, standardization, transferability, interpretability, training data, and resource constraints. The contributions of this article include discussions about the potential benefits of MU and its future directions. Additionally, this article emphasizes the need for researchers and practitioners to continue exploring and refining unlearning techniques to ensure that ML models can adapt to changing circumstances while maintaining user trust. The importance of unlearning is further highlighted in making artificial intelligence (AI) more trustworthy and transparent, especially with the growing importance of AI across various domains that involve large amounts of personal user data.

PaperID: 955,

Authors: Nicolò Romandini, Alessio Mora, Carlo Mazzocca, Rebecca Montanari, Paolo Bellavista

Affiliations: Department of Computer Science and Engineering (DISI), University of Bologna, Bologna, Italy; Department of Computer and Electrical Engineering and Applied Mathematics (DIEM), University of Salerno, Fisciano, Italy

Title: Federated Unlearning: A Survey on Methods, Design Guidelines, and Evaluation Metrics

Abstract:
Federated learning (FL) enables collaborative training of a machine learning (ML) model across multiple parties, helping preserve users’ and institutions’ privacy by keeping data stored locally. Instead of centralizing raw data, FL exchanges locally refined model parameters to build a global model incrementally. While FL is more compliant with emerging regulations such as the European General Data Protection Regulation (GDPR), how to ensure the right to be forgotten in this context (allowing FL participants to remove their data contributions from the learned model) remains unclear. In addition, it is recognized that malicious clients may inject backdoors into the global model through updates, e.g., to generate mispredictions on specially crafted data examples. Consequently, there is a need for mechanisms that can guarantee individuals the possibility to remove their data and erase malicious contributions even after aggregation, without compromising the already acquired “good” knowledge. This highlights the necessity for novel federated unlearning (FU) algorithms, which can efficiently remove specific clients’ contributions without full model retraining. This article provides background concepts, empirical evidence, and practical guidelines to design/implement efficient FU schemes. This study includes a detailed analysis of the metrics for evaluating unlearning in FL and presents an in-depth literature review categorizing state-of-the-art FU contributions under a novel taxonomy. Finally, we outline the most relevant and still open technical challenges and identify the most promising research directions in the field.

PaperID: 956,

Authors: Haoling Li, Jie Song, Mengqi Xue, Haofei Zhang, Mingli Song

Affiliations: School of Software Technology and the State Key Laboratory of Blockchain and Security, Zhejiang University, Hangzhou, China; School of Computer and Computing Science, Hangzhou City University, Hangzhou, China; Learning and Vision Laboratory, National University of Singapore, Queenstown, Singapore; College of Computer Science and Technology and the State Key Laboratory of Blockchain and Security, Zhejiang University, Hangzhou, China

Title: A Survey of Neural Trees: Co-Evolving Neural Networks and Decision Trees

Abstract:
Neural networks (NNs) and decision trees (DTs) are both popular machine learning models, yet they come with mutually exclusive advantages and limitations. To bring the best of the two worlds, a variety of approaches have been proposed to integrate NNs and DTs explicitly or implicitly. In this survey, these approaches are organized in a school which we term neural trees (NTs). This survey aims to present a comprehensive review of NTs and explore in detail how they enhance model interpretability. Our first contribution is a detailed taxonomy of NTs, which characterizes the seamless integration and co-evolution of NNs and DTs. Subsequently, we analyze NTs in terms of their interpretability and performance and suggest potential solutions to the remaining challenges. Finally, this survey concludes with a discussion about other considerations like conditional computation and promising directions for this field. A list of papers reviewed in this survey, along with their corresponding codes, is available at: https://github.com/zju-vipa/awesome-neural-trees.

PaperID: 957,

Authors: Jie Chen, Rongpei Zhou, Jie Wu, Hui Zhang, Weihua Gui

Affiliations: College of Electronic Engineering, National University of Defense Technology, Hefei, China; School of Information Engineering, Nanchang University, Nanchang, China; School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China; School of Biomedical Engineering, Anhui Medical University, Hefei, China; School of Automation, Central South University, Changsha, China

Title: Vertex Cover of Networks and Its Related Optimization Problems: An Overview

Abstract:
As a well-known NP-hard problem, the vertex cover problem has broad applications and has attracted the attention of many researchers. In recent years, its related optimization problems, including the weighted vertex cover problem, the ℓ-path vertex cover problem (ℓ ≥ 3), and the connected vertex cover problem, have come into the view of researchers, who have designed various optimization algorithms to solve them. First, based on the existing works, we give detailed descriptions of the vertex cover problem and its related optimization problems and then review the current research progress. Then, we present some main representative optimization algorithms and provide numerical results and corresponding analysis. Finally, we summarize the existing works and present the future research directions.
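
To make the base problem concrete, the sketch below implements the classical maximal-matching heuristic for the unweighted vertex cover problem, which returns a cover at most twice the optimal size. It is a textbook baseline chosen for illustration and is not claimed to be one of the algorithms reviewed in the article; the function name is hypothetical.

```python
def vertex_cover_2approx(edges):
    """Return a vertex cover built from a greedily chosen maximal matching."""
    cover = set()
    for u, v in edges:
        # An edge with both endpoints uncovered joins the matching,
        # and both of its endpoints are added to the cover.
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

# Usage: a path graph on four vertices.
path_cover = vertex_cover_2approx([(1, 2), (2, 3), (3, 4)])
```

Every edge ends up with at least one endpoint in the cover, and any optimal cover must contain at least one endpoint of each matched edge, which gives the factor-two guarantee.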

PaperID: 958,

Authors: Rongchao Zhang, Yu Huang, Yiwei Lou, Weiping Ding, Yongzhi Cao, Hanpin Wang

Affiliations: Key Laboratory of High Confidence Software Technologies, Ministry of Education, School of Computer Science, Peking University, Beijing, China; National Engineering Research Center for Software Engineering, Peking University, Beijing, China; School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China

Title: Synergistic Attention-Guided Cascaded Graph Diffusion Model for Complementarity Determining Region Synthesis

Abstract:
The complementarity determining region (CDR) is a specific region in antibody molecules that binds to antigens, where a small portion of residues undergoes particularly pronounced variations. Generating CDRs with high affinity and specificity is a pivotal milestone in accelerating drug development for daunting and unresolved diseases. However, existing approaches predominantly center on characterizing the attributes of residues through sequential generation models, thus falling short in effectively modeling the intricate spatial correlations among residues and frequently generating sequences that exhibit a high degree of arbitrariness. In this article, we propose a novel synergistic attention-guided cascaded graph diffusion model, termed GraphCas, which offers a pathway for optimized generation of high-affinity CDRs. Our approach is the first cascade-based graph diffusion model for CDR synthesis. Specifically, we design a graph propagation algorithm with a relation-aware synergistic attention mechanism, enabling the targeted acquisition of structural insights from diverse protein sequences and bolstering the global information representation of the graph by precisely localizing long-range key residue sites. We also design a cascaded conditional enhanced diffusion approach, providing the capability to incorporate additional control constraints into the input. Experimental results demonstrate that GraphCas can generate photo-realistic CDRs and achieve performance comparable to top-tier approaches. In particular, GraphCas reduces the RMSD by nearly 0.42 units in the H1 region and improves the ERRAT by 9.36 percentage points in the L1 region.

PaperID: 959,

Authors: Yuanze Li, Chun-Mei Feng, Qilong Wang, Guanglei Yang, Wangmeng Zuo

Affiliations: School of Computer Science and Technology, Harbin Institute of Technology (HIT), Harbin, China; Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore; College of Intelligence and Computing, Tianjin University, Tianjin, China

Title: Unprejudiced Training Auxiliary Tasks Makes Primary Better: A Multitask Learning Perspective

Abstract:
Human beings can leverage knowledge from related tasks to improve learning on a primary task. Similarly, multitask learning (MTL) methods suggest using auxiliary tasks to enhance a neural network’s performance on a specific primary task. However, previous methods often select auxiliary tasks carefully but treat them as secondary during training. The weights assigned to auxiliary losses are typically smaller than the primary loss weight, leading to insufficient training on auxiliary tasks and ultimately failing to support the main task effectively. To address this issue, we propose an uncertainty-based impartial learning method that ensures balanced training across all tasks. In addition, we consider both gradients and uncertainty information during backpropagation to further improve performance on the primary task. Extensive experiments show that our method achieves performance comparable to or better than state-of-the-art approaches. Moreover, our weighting strategy is effective and robust in enhancing the performance of the primary task regardless of noise in the auxiliary tasks’ pseudolabels.

PaperID: 960,

Authors: Shuyin Xia, Bolun Shi, Yifan Wang, Jiang Xie, Guoyin Wang, Xinbo Gao

Affiliations: Chongqing Key Laboratory of Computational Intelligence, the Key Laboratory of Cyberspace Big Data Intelligent Security, Ministry of Education, and the Key Laboratory of Big Data Intelligent Computing, Chongqing University of Posts and Telecommunications, Chongqing, China; Department of Computer Science, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: GBCT: Efficient and Adaptive Clustering via Granular-Ball Computing for Complex Data

Abstract:
Traditional clustering algorithms often focus on the most fine-grained information and achieve clustering by calculating the distance between each pair of data points or implementing other point-based calculations. This approach is inconsistent with the cognitive mechanism of “global precedence” in the human brain, resulting in poor efficiency, generalization ability, and robustness. To address this problem, we propose a new clustering algorithm called granular-ball clustering via granular-ball computing. First, the granular-ball-based clustering algorithm (GBCT) generates a smaller number of granular-balls to represent the original data and forms clusters according to the relationship between granular-balls, instead of the traditional point relationship. At the same time, its coarse-grained characteristics are not susceptible to noise, and the algorithm is efficient and robust; besides, as granular-balls can fit various complex data, GBCT performs much better on nonspherical datasets than other traditional clustering methods. The completely new coarse-granularity representation method and cluster formation mode of GBCT can also be used to improve other traditional methods. All code is available at https://github.com/wylbdthxbw/GBC.

PaperID: 961,

Authors: Danping Zeng, Yaonan Wang, Yiming Jiang, Haoran Tan, Zhiqiang Miao, Yun Feng

Affiliations: School of Robotics and the National Engineering Research Center for Robot Visual Perception and Control Technology, Hunan University, Changsha, China; College of Electrical and Information Engineering and the National Engineering Research Center for Robot Visual Perception and Control Technology, Hunan University, Changsha, China

Title: Distributed Neural Adaptive Impedance Control for Cooperative Manipulation With Unknown Objects

Abstract:
Existing cooperative manipulation methods for multiple manipulator systems usually assume that the grasp matrix and the desired trajectory of each manipulator are known in advance. In this work, distributed neural adaptive impedance control (AIC) strategies integrating fully distributed observers are proposed to remove both limitations. Specifically, two fully distributed finite-time observers are designed to estimate the actual and ideal states of the reference point without using global information. The estimates of the grasp matrix and the desired trajectory of each end-effector (EE) are then obtained from kinematic constraints and the estimates of the reference point’s states. For controller development, a distributed adaptive impedance model is established to achieve an adaptive trade-off between tracking performance and compliance. Then, distributed neural network (NN)-based tracking control strategies are developed to asymptotically realize the desired adaptive impedance dynamics in the presence of uncertainties. Additionally, a virtual energy tank (EK) is employed to interact with the impedance system to correct the adaptive impedance laws for system passivity. A simulation of four mobile manipulators cooperatively transporting an unknown object under tight coupling is carried out to demonstrate the established results.

PaperID: 962,

Authors: Yuxuan Du, Dacheng Tao

Affiliations: College of Computing and Data Science, Nanyang Technological University, Jurong West, Singapore

Title: On Exploring the Potential of Quantum Auto-Encoder for Learning Quantum Systems

Abstract:
The frequent interactions between quantum computing and machine learning are revolutionizing both fields. One prototypical achievement is the quantum auto-encoder (QAE), the leading strategy to relieve the curse of dimensionality ubiquitous in the quantum world. Despite its attractive capabilities, practical applications of QAE remain largely unexplored. To narrow this knowledge gap, we devise three effective QAE-based learning protocols to address three classically computationally hard problems in learning quantum systems: low-rank state fidelity estimation, quantum Fisher information (QFI) estimation, and Gibbs state preparation. Owing to the versatility of QAE, our proposals can be readily executed on near-term quantum machines. Besides, we analyze the error bounds of the trained protocols and showcase the necessary conditions to provide practical utility from the perspective of complexity theory. We conduct numerical simulations to confirm the effectiveness of the proposed three protocols. This work sheds new light on developing advanced quantum learning algorithms to accomplish hard quantum physics and quantum information processing tasks.

PaperID: 963,

Authors: Xian Wei, Yingjie Liu, Xuan Tang, Shui Yu, Mingsong Chen

Affiliations: MoE Engineering Research Center of Hardware/Software Co-Design Technology and Application, East China Normal University, Shanghai, China; School of Communication and Electronic Engineering, East China Normal University, Shanghai, China; School of Computer Science, University of Technology Sydney, Sydney, NSW, Australia

Title: Integrating Convolution and Sparse Coding for Learning Low-Dimensional Discriminative Image Representations

Abstract:
This work investigates the problem of efficiently learning discriminative low-dimensional (LD) representations of multiclass image objects. We propose a generic end-to-end approach that jointly optimizes sparse dictionary and convolutions for learning LOW-dimensional discriminative image representations, named SparConvLow, taking advantage of convolutional neural networks (CNNs), dictionary learning, and orthogonal projections. The whole learning process can be summarized as follows. First, a CNN module is employed to extract high-dimensional (HD) preliminary convolutional features. Second, to avoid the high computational cost of direct sparse coding on HD CNN features, we learn sparse representation (SR) over a task-driven dictionary in the space with the feature being orthogonally projected. We then exploit the discriminative projection on SR. The whole learning process is consistently treated as an end-to-end joint optimization problem of trace quotient maximization. The cost function is well-defined on the product of the CNN parameters space, the Stiefel manifold, the Oblique manifold, and the Grassmann manifold. By using the explicit gradient delivery, the cost function is optimized via a geometrical stochastic gradient descent (SGD) algorithm along with the chain rule and the backpropagation. The experimental results show that the proposed method can achieve a highly competitive performance with the state-of-the-art (SOTA) image classification, object categorization, and face recognition methods, under both supervised and semi-supervised settings. The code is available at https://github.com/MVPR-Group/SparConvLow.

PaperID: 964,

Authors: Zhijie Rao, Jingcai Guo, Luyao Tang, Yue Huang, Xinghao Ding, Song Guo

Affiliations: Department of Computing, The Hong Kong Polytechnic University, Hong Kong, SAR, China; School of Information Science and Engineering, Xiamen University, Xiamen, China; Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, SAR, China

Title: SRCD: Semantic Reasoning With Compound Domains for Single-Domain Generalized Object Detection

Abstract:
This article provides a novel framework for single-domain generalized object detection (i.e., Single-DGOD), where we are interested in learning and maintaining the semantic structures of self-augmented compound cross-domain samples to enhance the model’s generalization ability. Different from domain generalized object detection (DGOD) trained on multiple source domains, Single-DGOD is far more challenging to generalize well to multiple target domains with only one single source domain. Existing methods mostly adopt a similar treatment from DGOD to learn domain-invariant features by decoupling or compressing the semantic space. However, there may exist two potential limitations: 1) pseudo attribute-label correlation due to extremely scarce single-domain data and 2) the semantic structural information is usually ignored, i.e., we found the affinities of instance-level semantic relations in samples are crucial to model generalization. In this article, we introduce semantic reasoning with compound domains (SRCD) for Single-DGOD. Specifically, our SRCD contains two main components, namely, the texture-based self-augmentation (TBSA) module and the local-global semantic reasoning (LGSR) module. TBSA aims to eliminate the effects of irrelevant attributes associated with labels, such as light, shadow, and color, at the image level by a light-yet-efficient self-augmentation. Moreover, LGSR is used to further model the semantic relationships on instance features to uncover and maintain the intrinsic semantic structures. Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed SRCD. Code is available at github.com/zjrao/SRCD.

PaperID: 965,

Authors: Tian Qiu, Qianmu Li

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Title: A Framework for Counterfactual Explanation of Predictive Uncertainty in Multimodal Models

Abstract:
Both predictive uncertainty estimation and visual explanation are crucial elements in helping humans understand the artificial intelligence (AI) decision-making process and in building trustworthy AI. However, there has been comparatively limited investigation into the intersection of these two domains in multimodal scenarios. In this article, we propose a universal explanation framework to evaluate counterfactual samples of predictive uncertainty in multimodal models. Inspired by multimodal representation learning, our framework leverages a shared latent space of multimodal variational autoencoders (MVAEs) to generate counterfactual explanations (CEs) of predictive uncertainty, enabling us to identify the input features contributing to high predictive uncertainty. To further evaluate the quality of counterfactual samples, we propose a Bayesian local linear approximation (BLLA) method. This method models the overall linear space as an inverse chi-square distribution while representing feature importance and the error term as normal distributions. By doing so, it captures the uncertainty and feature importance of each modality. Through a comprehensive suite of experiments conducted on multimodal classification and regression tasks, we demonstrate that our framework successfully generates accurate CEs of predictive uncertainty, establishes the consistency of feature importance, and comprehensively facilitates users’ comprehension of multimodal model behavior.

PaperID: 966,

Authors: Tongjian Liu, Zidong Wang, Yang Liu, Rui Wang

Affiliations: Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, and the School of Control Science and Engineering, Dalian University of Technology, Dalian, China; College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao, China; School of Computing and Engineering, University of Huddersfield, Huddersfield, U.K.; Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, and the School of Mechanics and Aerospace Engineering, Dalian University of Technology, Dalian, China

Title: Token-Bucket-Protocol-Based Recursive Remote State Estimation for Complex Networks Under Amplify-and-Forward Relays

Abstract:
This article is concerned with a recursive remote estimation problem for a class of nonlinear complex networks subject to the token bucket protocol (TBP) and amplify-and-forward (AF) relays. The influence of the TBP is considered, for the first time, in the context of networked state estimation, where the token consumptions are modeled in a stochastic manner, so as to describe the potential size variability of transmitted measurement signals. Once processed by the TBP, the signals, with stochastic channel coefficients, are transmitted to the remote estimator via AF relays, where a failure in transmission under the TBP may occur due to insufficient tokens in the bucket. An extended-Kalman-filter-based novel recursive estimator is proposed, and by solving Riccati-like difference equations, an upper bound of prediction/estimation error covariance is determined and further minimized through the design of an appropriate estimator gain. The impact of the TBP on estimation performance is also investigated. Some numerical simulations are presented to demonstrate the effectiveness of the proposed estimator and the effects of the TBP.
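
As a rough illustration of the token bucket mechanism described in this abstract, the following minimal sketch simulates a bucket in which each measurement consumes a variable number of tokens and a transmission fails when the bucket holds too few. The function name, the refill-before-send ordering, and all parameter names are assumptions made for illustration. The paper's stochastic consumption model and the AF relay channel are not reproduced here.

```python
def simulate_token_bucket(capacity, refill, demands, initial=None):
    """Return one success flag per time step for a simple token bucket.

    capacity: maximum number of tokens the bucket can hold
    refill:   tokens added at the start of every step (assumed ordering)
    demands:  tokens required by the signal at each step (may vary in size)
    initial:  starting token count; defaults to a full bucket
    """
    tokens = capacity if initial is None else initial
    sent = []
    for demand in demands:
        # Refill first, never exceeding the bucket capacity.
        tokens = min(capacity, tokens + refill)
        if demand <= tokens:
            tokens -= demand      # enough tokens: transmit and consume
            sent.append(True)
        else:
            sent.append(False)    # insufficient tokens: transmission fails
    return sent


if __name__ == "__main__":
    print(simulate_token_bucket(capacity=5, refill=2, demands=[1, 3, 4, 1], initial=0))
```

In this toy run the third signal needs four tokens but only two are available, so it is dropped while the others go through.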

PaperID: 967,

Authors: Boyue Wang, Guangchao Wu, Xiaoyan Li, Junbin Gao, Yongli Hu, Baocai Yin

Affiliations: Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing Artificial Intelligence Institute, Faculty of Information Technology, Beijing University of Technology, Beijing, China; Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Camperdown, NSW, Australia

Title: Modality Perception Learning-Based Determinative Factor Discovery for Multimodal Fake News Detection

Abstract:
The dissemination of fake news, often fueled by exaggeration, distortion, or misleading statements, significantly jeopardizes public safety and shapes social opinion. Although existing multimodal fake news detection methods focus on multimodal consistency, they occasionally neglect modal heterogeneity, missing the opportunity to unearth the most related determinative information concealed within fake news articles. To address this limitation and extract more decisive information, this article proposes the modality perception learning-based determinative factor discovery (MoPeD) model. MoPeD optimizes the steps of feature extraction, fusion, and aggregation to adaptively discover determinants within both unimodality features and multimodality fusion features for the task of fake news detection. Specifically, to capture comprehensive information, the dual encoding module integrates a modal-consistent contrastive language-image pre-training (CLIP) pretrained encoder with a modal-specific encoder, catering to both explicit and implicit information. Motivated by the prompt strategy, the output features of the dual encoding module are complemented by learnable memory information. To handle modality heterogeneity during fusion, the multilevel cross-modality fusion module is introduced to deeply comprehend the complex implicit meaning within text and image. Finally, for aggregating unimodal and multimodal features, the modality perception learning module gauges the similarity between modalities to dynamically emphasize decisive modality features based on the cross-modal content heterogeneity scores. The experimental evaluations conducted on three public fake news datasets show that the proposed model is superior to other state-of-the-art fake news detection methods.

PaperID: 968,

Authors: Lei Zhao, Lin Cai, Wu-Sheng Lu

Affiliations: Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada

Title: Tailored Federated Learning With Adaptive Central Acceleration on Diversified Global Models

Abstract:
We consider a setting in which machines engage in collaborative learning while each individual machine has its own interests. How to effectively collaborate among machines with diverse requirements to maximize the profits of each participant poses a challenge in federated learning (FL). Our studies are motivated by the observation that in FL the global model attempts to acquire knowledge from each individual machine, while aggregating all local models into one optimal solution may not be desirable for some machines. To effectively leverage the knowledge of others while obtaining a customized solution for each individual machine, we propose accelerated federated training procedures with diversified global models. Based on the federated stochastic variance reduced gradient (FSVRG) framework, we propose a model-based grouping mechanism with adaptive central acceleration (MA-FSVRG) and a gradient-based grouping mechanism with adaptive central acceleration (GA-FSVRG) to tackle the challenges of heterogeneous demands. The simulation results demonstrate the advantages of the proposed MA-FSVRG and GA-FSVRG over the state-of-the-art FL baselines. MA-FSVRG exhibits greater stability in performance and significant savings in local computation cost compared to GA-FSVRG. On the other hand, GA-FSVRG attains higher test accuracy and faster convergence, particularly in scenarios with limited individual machine participation.

PaperID: 969,

Authors: Zhi-An Huang, Pengwei Hu, Lun Hu, Zhu-Hong You, Kay Chen Tan, Yu-An Huang

Affiliations: Research Office, City University of Hong Kong (Dongguan), Dongguan, China; Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China; School of Computer Science, Northwestern Polytechnical University, Xi’an, China; Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong, SAR, China

Title: Toward Multilabel Classification for Multiple Disease Prediction Using Gut Microbiota Profiles

Abstract:
Advancements in high-throughput technologies have yielded large-scale human gut microbiota profiles, sparking considerable interest in exploring the relationship between the gut microbiome and complex human diseases. Through extracting and integrating knowledge from complex microbiome data, existing machine learning (ML)-based studies have demonstrated their effectiveness in the precise identification of high-risk individuals. However, these approaches struggle to address the heterogeneity and sparsity of microbial features and to explore the intrinsic relatedness among human diseases. In this work, we reframe human gut microbiome-based disease detection as a multilabel classification (MLC) problem and integrate a range of innovative techniques within the proposed MLC framework, aptly named GutMLC. Specifically, the entity semantic similarity is incorporated as prior knowledge into multilabel feature selection and loss functions by capturing the shared attributes and inherent associations among diseases and microbes. To tackle the issue of label imbalance, both within and between labels, we adapt the focal loss (FL) function for MLC using debiased inverse weighting. Extensive experimental results consistently demonstrate the competitive performance of GutMLC in comparison with commonly used MLC and single-label classification (SLC) algorithms. This work seeks to unlock the potential of gut microbiota as robust biomarkers for multiple disease prediction.
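
For readers unfamiliar with the focal loss mentioned in this abstract, here is a minimal sketch of the standard binary focal loss averaged over labels, which is one common way to apply it in a multilabel setting. The function names are hypothetical, and the debiased inverse weighting described by the authors is not implemented here. This shows only the plain down-weighting of easy examples by the factor (1 - p_t)^gamma.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, eps=1e-12):
    """Focal loss for one label: -(1 - p_t)^gamma * log(p_t).

    p: predicted probability of the positive class
    y: ground-truth label, 0 or 1
    """
    p_t = p if y == 1 else 1.0 - p      # probability assigned to the true class
    p_t = min(max(p_t, eps), 1.0)       # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def multilabel_focal_loss(probs, labels, gamma=2.0):
    """Average the per-label focal loss over all labels of one sample."""
    losses = [binary_focal_loss(p, y, gamma) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # A confident correct prediction contributes far less than an uncertain one.
    print(binary_focal_loss(0.9, 1), binary_focal_loss(0.5, 1))
    print(multilabel_focal_loss([0.9, 0.2, 0.6], [1, 0, 1]))
```

With gamma set to 0 the expression reduces to ordinary binary cross-entropy, which makes the effect of the focusing term easy to check.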

PaperID: 970,

Authors: Rushuang Zhou, Weishan Ye, Zhiguo Zhang, Yanyang Luo, Li Zhang, Linling Li, Gan Huang, Yining Dong, Yuan-Ting Zhang, Zhen Liang

Affiliations: School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China; Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China; School of Data Science, City University of Hong Kong, Hong Kong, China; Department of Biomedical Engineering, City University of Hong Kong, Hong Kong, China

Title: EEGMatch: Learning With Incomplete Labels for Semisupervised EEG-Based Cross-Subject Emotion Recognition

Abstract:
Electroencephalography (EEG) is an objective tool for emotion recognition and shows promising performance. However, the label scarcity problem is a main challenge in this field, which limits the wide application of EEG-based emotion recognition. In this article, we propose a novel semisupervised transfer learning framework (EEGMatch) to leverage both labeled and unlabeled EEG data. First, an EEG-Mixup-based data augmentation method is developed to generate more valid samples for model learning. Second, a semisupervised two-step pairwise learning method is proposed to bridge prototypewise and instancewise pairwise learning, where the prototypewise pairwise learning measures the global relationship between EEG data and the prototypical representation of each emotion class and the instancewise pairwise learning captures the local intrinsic relationship among EEG data. Third, a semisupervised multidomain adaptation is introduced to align the data representation among multiple domains (labeled source domain, unlabeled source domain, and target domain), where the distribution mismatch is alleviated. Extensive experiments are conducted on three benchmark databases (SEED, SEED-IV, and SEED-V) under a cross-subject leave-one-subject-out cross-validation evaluation protocol. The results show the proposed EEGMatch performs better than the state-of-the-art methods under different incomplete label conditions (with 5.89% improvement on SEED, 0.93% improvement on SEED-IV, and 0.28% improvement on SEED-V), which demonstrates the effectiveness of the proposed EEGMatch in dealing with the label scarcity problem in emotion recognition using EEG signals. The source code is available at https://github.com/KAZABANA/EEGMatch.
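
The abstract does not spell out how EEG-Mixup works. As background, the generic mixup idea it builds on can be sketched as a convex combination of two samples and their soft labels. The function names below are hypothetical, and the sketch is the standard mixup formulation rather than the authors' EEG-specific variant.

```python
import random


def sample_lambda(alpha, rng):
    """Draw the mixing coefficient from Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)


def mixup(x1, y1, x2, y2, lam):
    """Blend two feature vectors and their (one-hot or soft) labels.

    lam weights the first sample and (1 - lam) weights the second.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    lam = sample_lambda(0.5, rng)
    print(mixup([0.0, 2.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], lam))
```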

PaperID: 971,

Authors: Jiangtong Li, Ziyuan Zhou, Jingkai Zhang, Dawei Cheng, Changjun Jiang

Affiliations: Department of Computer Science and Technology, Tongji University, Shanghai, China; School of Economics and Management, Tongji University, Shanghai, China; Department of Software Engineering, Tongji University, Shanghai, China

Title: HFTCRNet: Hierarchical Fusion Transformer for Interbank Credit Rating and Risk Assessment

Abstract:
As a prominent application of deep neural networks in financial literature, bank credit ratings play a pivotal role in safeguarding global economic stability and preventing crises. In the contemporary financial system, interconnectivity among banks has reached unprecedented levels. However, many existing credit risk models continue to assess each bank independently, resulting in inevitable suboptimal performance. Thus, developing advanced neural networks to model intricate temporal dynamics and interconnected relationships in the banking system is essential for an effective credit rating and risk assessment learning system. To this end, we propose a novel hierarchical fusion transformer for interbank credit rating and risk assessment (HFTCRNet), which includes the long-term temporal transformer (LT3) module, short-term cross-graph transformer (STCGT) module, attentive risk contagion transformer (ARCT) module, and hierarchical fusion transformer (HFT) module to capture the long-term growth trajectories of banks, the short-term interbank network variance, and the potential propagation of risks within the interbank network, and to integrate this information hierarchically. We further develop an interbank credit rating dataset, encompassing quarterly financial data, interbank lending networks, and key indicators such as credit ratings and systemic risk (SRISK) for 4548 banks from 2016Q1 to 2023Q1. Notably, we also adapt the minimum density algorithm to stabilize the interbank loan network over time, aiding in the analysis of long-term and short-term network effects. Our learning system uses semi-supervised learning to handle labels of varying sparsity, integrating credit ratings and SRISK for a comprehensive assessment of individual bank creditworthiness and systemic interbank risk.
Extensive experimental results on our interbank dataset show that HFTCRNet not only outperforms all the baselines in terms of credit rating accuracy but also can evaluate the systemic risk within the interbank network. Code will be available at: https://github.com/AI4Risk/HFTCRNet.

PaperID: 972,

Authors: Leyuan Qu, Cornelius Weber, Wei Wang, Jia Jin, Yingming Gao, Taihao Li, Stefan Wermter

Affiliations: Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China; Department of Informatics, University of Hamburg, Hamburg, Germany; International Cultural Exchange College, Xinjiang University, Ürümqi, China; School of Business and Management, Shanghai International Studies University, Shanghai, China; School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China

Title: Disentanglement of Prosody Representations via Diffusion Models and Scheduled Gradient Reversal

Abstract:
Prosody plays a fundamental role in human speech and communication, facilitating intelligibility and conveying emotional and cognitive states. Extracting accurate prosodic information from speech is vital for building assistive technology, such as controllable speech synthesis, speaking style transfer, and speech emotion recognition (SER). However, it is challenging to disentangle speaker-independent prosody representations since prosodic attributes, such as intonation, excessively entangle with speaker-specific attributes, e.g., pitch. In this article, we propose a novel model, called Diffsody, to disentangle and refine prosody representations: 1) to disentangle prosody representations, we leverage the expressive generative ability of a diffusion model by conditioning it on quantified semantic information and pretrained speaker embeddings. Additionally, a prosody encoder automatically learns prosody representations used for spectrogram reconstruction in an unsupervised fashion; and 2) to refine and learn speaker-invariant prosody representations, a scheduled gradient reversal layer (sGRL) is proposed and integrated into the prosody encoder of Diffsody. We extensively evaluate Diffsody through qualitative and quantitative means. t-SNE visualization and speaker verification experiments demonstrate the efficacy of the sGRL method in preventing speaker-specific information leakage. Experimental results on speaker-independent SER and automatic depression detection (ADD) tasks demonstrate that Diffsody can efficiently factorize speaker-independent prosody representations, resulting in a significant boost in SER and ADD. In addition, Diffsody synergistically integrates with the semantic representation model WavLM, which leads to a discernibly elevated performance, outperforming contemporary methods in both SER and ADD tasks. Furthermore, the Diffsody model exhibits promising potential for various practical applications, such as voice or style conversion. 
Some audio samples can be found at https://leyuanqu.github.io/Diffsody/demo.

PaperID: 973,

Authors: Yukun Li, Guansong Pang, Wei Suo, Chenchen Jing, Yuling Xi, Lingqiao Liu, Hao Chen, Guoqiang Liang, Peng Wang

Affiliations: School of Computer Science, Ningbo Institute, Northwestern Polytechnical University, Xi’an, China; School of Computing and Information Systems, Singapore Management University, Bras Basah, Singapore; College of Computer Science and Technology, Zhejiang University, Hangzhou, China; School of Computer Science, The University of Adelaide, Adelaide, Australia

Title: CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning

Abstract:
This article investigates the problem of continual learning (CL) of vision-language models (VLMs) in open domains, where models are required to perform continual updating and inference on a stream of datasets from diverse seen and unseen domains with novel classes. Such a capability is crucial for various applications in open environments, e.g., AI assistants, autonomous driving systems, and robotics. Current CL studies mostly focus on closed-set scenarios in a single domain with known classes. Large pretrained VLMs such as CLIP have showcased exceptional zero-shot recognition capabilities, and several recent studies have leveraged the unique characteristics of VLMs to mitigate catastrophic forgetting in CL. However, they primarily focus on closed-set CL in a single-domain dataset. Open-domain CL of large VLMs is significantly more challenging due to 1) large class correlations and domain gaps across the datasets and 2) the forgetting of zero-shot knowledge in the pretrained VLMs and the knowledge learned from the newly adapted datasets. In this work, we introduce a novel approach, termed CoLeCLIP, which learns an open-domain CL model based on CLIP. It addresses these challenges through joint learning of a set of task prompts and a cross-domain class vocabulary. Extensive experiments on 11 domain datasets show that CoLeCLIP achieves new state-of-the-art performance for open-domain CL under both task- and class-incremental learning (CIL) settings.

PaperID: 974,

Authors: Jingwei Chen, Shasha Fu, Hui Yang, Feiping Nie

Affiliations: Jiangxi Key Laboratory of Flood and Drought Disaster Defense, Jiangxi Academy of Water Science and Engineering, Nanchang, China; School of Electrical and Automation Engineering and the Key Laboratory of Advanced Control and Optimization of Jiangxi Province, East China Jiaotong University, Nanchang, Jiangxi, China; School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), School of Computer Science and the Key Laboratory of Intelligent Interaction and Applications, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi’an, China

Title: Harmonic Fast One-Step Cut: An Efficient Strategy for Spectral Clustering Optimization

Abstract:
Due to its excellent performance, spectral clustering (SC) has been widely used in many fields of application. However, its high computational complexity and two successive steps have limited SC’s development. In addition, the traditional SC is formulated to maximize the arithmetic mean of trace ratios, which is dominated by the larger objectives and may reduce the recognition accuracy in practical applications. In this article, we propose a novel graph cut criterion based on the harmonic mean of the trace-ratio objectives, which can avoid the worst-cluster issue without imposing any regularization or constraints. Furthermore, an efficient and effective coordinate descent (CD) method is exploited to achieve a one-step solution. Therefore, this article can simultaneously solve three main challenges in a unified framework. Extensive experiments verify that the harmonic fast one-step graph cut (HFOC) achieves superior clustering performance with relatively low time consumption compared to the other state-of-the-art clustering methods.
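
The contrast between arithmetic and harmonic means that motivates this criterion can be seen with a small numeric sketch. The per-cluster values below are made-up illustrative numbers, not results from the paper, and the helper name is hypothetical. The point is only that the arithmetic mean is dominated by the large terms while the harmonic mean is pulled toward the smallest one.

```python
from statistics import harmonic_mean, mean


def compare_means(values):
    """Return (arithmetic mean, harmonic mean) of positive per-cluster scores."""
    return mean(values), harmonic_mean(values)


if __name__ == "__main__":
    skewed = [0.9, 0.9, 0.1]      # two good clusters, one very poor cluster
    balanced = [0.6, 0.6, 0.6]    # uniformly moderate clusters
    print(compare_means(skewed))
    print(compare_means(balanced))
```

Under the arithmetic mean the skewed set scores slightly higher than the balanced one, whereas under the harmonic mean the balanced set scores much higher, which is why a harmonic-mean objective penalizes a single badly formed cluster.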

PaperID: 975,

Authors: Xingyang He, Jie Liu, Yutai Duan

Affiliations: College of Artificial Intelligence, Nankai University, Tianjin, China

Title: 2-D Transformer: Extending Large Language Models to Long-Context With Few Memory

Abstract:
The ability to process long contexts is crucial for large language models (LLMs), but training LLMs with a long-context window requires substantial computational resources. Many works have sought to mitigate this through the sparse attention mechanism. However, sparse attention faces a noticeable gap compared with full attention in capturing long-distance information, leading to limited long-context processing capabilities. To effectively address this issue, this article proposes a novel sparse transformer architecture called 2-D transformer (2D-former), aimed at extending the context windows of pretrained LLMs while reducing GPU memory requirements. The 2D-former incorporates a 2-D attention mechanism that consists of a long-distance information compressor (LDIC) and a blockwise attention (BA) mechanism. LDIC can self-adaptively extract blockwise representational features by convolution and compress long-distance information into a set of tokens based on the significance of each block. The BA mechanism integrates these features, enabling each token to directly communicate with any of its preceding tokens during the computation of sparse attention. In this way, sparse attention can fully utilize long-distance information to bridge the gap with full attention while greatly reducing computational requirements. The 2D-former only needs to add less than 0.14% of additional trainable parameters to extend the context length of LLaMA2 7B to 32k on 4 A100 GPUs with 40-GB memory. In addition, it is compatible with most current acceleration techniques and parameter-efficient fine-tuning (PEFT) methods. Furthermore, we conduct supervised fine-tuning with 2D-former using our self-collected long-instruction fine-tuning dataset, named LongTuning, which comprises over 11k long-context question-answer (QA) pairs.
Experimental results demonstrate that 2D-former achieves efficient long-context extension with minimal GPU memory and computational time consumption, while maintaining superior performance across both downstream long-context and short-context tasks.

PaperID: 976,

Authors: Fengyi Wang, Guanghui Zhu, Hongqing Ding, Pengfei Zhang, Chunfeng Yuan, Yihua Huang

Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; Department of Planning and Construction, China Mobile Communications Group Company Ltd., Beijing, China

Title: Boosting Temporal Graph Learning From Perspectives of Global and Local Structures

Abstract:
Learning on temporal graphs has attracted tremendous research interest due to its wide range of applications. Some works intuitively merge graph neural networks (GNNs) and recurrent neural networks (RNNs) to capture structural and temporal information, and recent works propose to aggregate information from neighbor nodes in local subgraphs based on message passing or random walks. These methods produce node embeddings from a global or local perspective and ignore the complementarity between them, thus facing limitations in capturing complex and entangled dynamic patterns when applied to diverse datasets or evaluated by more challenging evaluation protocols. To address the issues, we propose the global and local embedding network (GLEN) for effective and efficient temporal graph representation learning. Specifically, GLEN dynamically generates embeddings for graph nodes by considering both global and local perspectives using specially designed modules. Then, global and local embeddings are combined by a devised cross-perspective fusion module to extract high-order semantic relations of node embeddings. We evaluate GLEN on multiple real-world datasets and apply more stringent evaluation procedures. Extensive experimental results demonstrate that GLEN outperforms other baselines in both link prediction and dynamic node classification tasks. Moreover, with concise and effective modules, GLEN can achieve a better balance between inference precision and training efficiency.

PaperID: 977,

Authors: Ruijie Du, Deepan Muthirayan, Pramod P. Khargonekar, Yanning Shen

Affiliations: Department of Electrical Engineering and Computer Science and the Center for Pervasive Communications and Computing, University of California at Irvine, Irvine, CA, USA; Department of Computer Science and Artificial Intelligence, Plaksha University, Mohali, India; Department of Electrical Engineering and Computer Science, University of California at Irvine, Irvine, CA, USA

Title: Long-Term Fairness for Real-Time Decision Making: A Constrained Online Optimization Approach

Abstract:
As machine learning (ML)-driven decisions proliferate, particularly in cases involving sensitive attributes, such as gender, race, and age, to name a few, the need for equity and impartiality has emerged as a fundamental concern. In situations demanding real-time decision-making, fairness objectives become more nuanced and complex: instantaneous fairness to ensure equity in every time slot and long-term fairness to ensure fairness over a period of time. There is a growing awareness that real-world systems operating over long periods require fairness over different timelines. Most existing approaches mainly address dynamic costs with time-invariant fairness constraints, often disregarding the challenges posed by time-varying fairness constraints. Time-varying fairness constraints require the learners to adapt their decisions to meet the changing constraints. However, long-term dynamics are hard to assess and accurately predicting the changes in constraints can be difficult. To bridge this gap, this work introduces a framework for ensuring long-term fairness within dynamic decision-making systems characterized by time-varying fairness constraints. We formulate the decision problem with fairness constraints over a period as a constrained online optimization problem. A novel online algorithm, named long-term fairness-aware online learning algorithm (LoTFair), is presented that solves the problem “on the fly.” We demonstrate that long-term fairness for real-time decision making can be addressed flexibly and efficiently by LoTFair: it achieves overall fairness while maintaining performance over the long run.
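
To make the phrase "constrained online optimization" concrete, the following is a minimal sketch of a generic online primal-dual update with a time-varying constraint. It is not LoTFair. The quadratic per-round loss, the linear constraint x - b_t <= 0, the step sizes, and the function name are all assumptions chosen only to keep the example small.

```python
def online_primal_dual(targets, bounds, eta=0.1, mu=0.1):
    """Play decisions x_t in [0, 1] against losses (x - a_t)^2 under x <= b_t.

    targets: sequence of a_t, the unconstrained minimizer in each round
    bounds:  sequence of b_t, the time-varying constraint level in each round
    Returns the list of played decisions and the final dual variable.
    """
    x, lam = 0.0, 0.0
    decisions = []
    for a_t, b_t in zip(targets, bounds):
        decisions.append(x)                 # commit to x_t before seeing a_t, b_t
        grad = 2.0 * (x - a_t) + lam        # gradient of the Lagrangian in x
        violation = x - b_t                 # positive when the constraint is violated
        x = min(1.0, max(0.0, x - eta * grad))   # projected primal descent step
        lam = max(0.0, lam + mu * violation)     # projected dual ascent step
    return decisions, lam


if __name__ == "__main__":
    decisions, lam = online_primal_dual([1.0] * 500, [0.5] * 500)
    print(decisions[-1], lam)
```

In this toy setting the decisions settle near the constraint level rather than the unconstrained optimum, because the dual variable accumulates past violations and pushes later decisions back toward feasibility.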

PaperID: 978,

Authors: Yuerong Xue

Affiliations: State Grid Quanzhou Electric Power Supply Company, Quanzhou, Fujian, China

Title: Orthogonal Capsule Networks With Positional Information Preservation and Lightweight Feature Learning

Abstract:
Both transformer and convolutional neural network (CNN) models require supplementary elements to acquire positional information. To address this issue, we propose a novel orthogonal capsule network (OrthogonalCaps) that preserves location information during lightweight feature learning. The proposed network simplifies complex training processes and enables end-to-end training for object detection tasks. Specifically, there is no need to solve the regression problem of positions and the classification problem of objects separately, nor is there a need to encode the positional information as an additional token, as in transformer models. We generate the next capsule layer via orthogonality-based dynamic routing, which reduces the number of parameters and preserves positional information via its voting mechanism. Moreover, we propose Capsule ReLU as an activation function to avoid the problem of gradient vanishing and to facilitate capsule normalization across various scales, thus empowering OrthogonalCaps to better adapt to objects of diverse scales. The orthogonal capsule network (CapsNet) demonstrates accuracy and run-time performance on a par with those of Faster R-CNN on the VOC dataset. Our network outperforms the baseline approach in detecting small-scale samples. The simulation results suggest that the proposed network surpasses other capsule network models in achieving a favorable balance between parameters and accuracy. Furthermore, an ablation experiment indicates that both Capsule ReLU and orthogonality-based dynamic routing play essential roles in enhancing the classification performance. The training code and pretrained models are available at https://github.com/l1ack/OrthogonalCaps.

PaperID: 979,

Authors: Guogang Zhu, Xuefeng Liu, Shaojie Tang, Jianwei Niu, Xinghao Wu, Jiaxing Shen, Wanyu Lin

Affiliations: State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China; Department of Management Science and Systems, School of Management, Center for AI Business Innovation, University at Buffalo, Buffalo, NY, USA; Department of Computing and Decision Sciences, Lingnan University, Tuen Mun, Hong Kong; Department of Computing, The Hong Kong Polytechnic University, Cluny Road, Hong Kong

Title: Take Your Pick: Enabling Effective Distributed Learning Within Low-Dimensional Feature Space

Abstract:
Personalized federated learning (PFL) is a popular distributed learning framework that allows clients to have different models and has many applications where clients’ data are in different domains, including autonomous driving, traffic surveillance, and medical diagnosis. The typical model of a client in PFL features a global encoder trained by all clients to extract universal features from the raw data and personalized layers (e.g., a classifier) trained using the client’s local data. Nonetheless, due to the differences between the data distributions of different clients (also known as domain gaps), the universal features produced by the global encoder include numerous components irrelevant to a given client’s local task. Some recent PFL methods address the above problem by personalizing specific parameters within the encoder. However, these methods encounter substantial challenges attributed to the high dimensionality and nonlinearity of the neural network parameter space. In contrast, the feature space exhibits a lower dimensionality, providing greater intuitiveness and interpretability as compared to the parameter space. To this end, we propose a novel PFL framework named FedPick. FedPick achieves PFL within the low-dimensional feature space by adaptively selecting task-relevant features for each client from the features generated by the global encoder based on its local data distribution. It presents a more accessible and interpretable implementation of PFL compared to those methods working in the parameter space. Extensive experimental results on multiple cross-domain datasets show that FedPick can effectively select task-relevant features for each client and improve model performance in cross-domain FL.

PaperID: 980,

Authors: Kesheng Zhang, Wen Yu, Yao Jia, Tianyou Chai

Affiliations: Key Laboratory of Integrated Automation for Process Industry, Northeastern University, Shenyang, China; Departamento de Control Automatico, CINVESTAV-IPN (National Polytechnic Institute), Mexico City, Mexico

Title: Comprehensive Production Index Prediction Using Dual-Scale Deep Learning in Mineral Processing

Abstract:
In mineral processing, the dynamic nature of industrial data poses challenges for decision-makers in accurately assessing current production statuses. To enhance the decision-making process, it is crucial to predict comprehensive production indices (CPIs), which are influenced by both human operators and industrial processes, and demonstrate a strong dual-scale property. To improve the accuracy of CPIs’ prediction, we introduce the high-frequency (HF) unit and low-frequency (LF) unit within our proposed dual-scale deep learning (DL) network. This architecture enables the exploration of nonlinear dynamic mapping in dual-scale industrial data. By integrating the Cloud-Edge collaboration mechanism with DL, our training strategy mitigates the dominance of HF data and guides networks to prioritize different frequency information. Through self-tuning training via Cloud-Edge collaboration, the optimal model structure and parameters on the cloud server are adjusted, with the edge model self-updating accordingly. Validated through online industrial experiments, our method significantly enhances CPIs’ prediction accuracy compared to the baseline approaches.

PaperID: 981,

Authors: Yuji Cao, Huan Zhao, Yuheng Cheng, Ting Shu, Yue Chen, Guolong Liu, Gaoqi Liang, Junhua Zhao, Jinyue Yan, Yun Li

Affiliations: Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong, SAR, China; Department of Building Environment and Energy Engineering, The Hong Kong Polytechnic University, Hong Kong, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China; School of Electrical and Electronic Engineering, Nanyang Technological University, Jurong West, Singapore; School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen, China; Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, China

Title: Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods

Abstract:
With extensive pretrained knowledge and high-level general capabilities, large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as multitask learning, sample efficiency, and high-level task planning. In this survey, we provide a comprehensive review of the existing literature in LLM-enhanced RL and summarize its characteristics compared with conventional RL methods, aiming to clarify the research scope and directions for future studies. Utilizing the classical agent-environment interaction paradigm, we propose a structured taxonomy to systematically categorize LLMs’ functionalities in RL, including four roles: information processor, reward designer, decision-maker, and generator. For each role, we summarize the methodologies, analyze the specific RL challenges that are mitigated, and provide insights into future directions. Finally, the comparative analysis of each role, potential applications, prospective opportunities, and challenges of LLM-enhanced RL are discussed. By proposing this taxonomy, we aim to provide a framework for researchers to effectively leverage LLMs in the RL field, potentially accelerating RL applications in complex domains such as robotics, autonomous driving, and energy systems.

PaperID: 982,

Authors: Jinghua Zhang, Li Liu, Kai Gao, Dewen Hu

Affiliations: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China; College of Electronic Science, National University of Defense Technology, Changsha, China

Title: A Forward and Backward Compatible Framework for Few-Shot Class-Incremental Pill Recognition

Abstract:
Automatic pill recognition (APR) systems are crucial for enhancing hospital efficiency, assisting visually impaired individuals, and preventing cross-infection. However, most existing deep learning-based pill recognition systems can only perform classification on classes with sufficient training data. In practice, the high cost of data annotation and the continuous increase in new pill classes necessitate the development of a few-shot class-incremental pill recognition (FSCIPR) system. This article introduces the first FSCIPR framework, discriminative and bidirectional compatible few-shot class-incremental learning (DBC-FSCIL). It encompasses forward-compatible and backward-compatible learning components. In forward-compatible learning, we propose an innovative virtual class generation strategy and a center-triplet (CT) loss to enhance discriminative feature learning. These virtual classes serve as placeholders in the feature space for future class updates, providing diverse semantic knowledge for model training. For backward-compatible learning, we develop a strategy to synthesize reliable pseudo-features of old classes using uncertainty quantification, facilitating data replay (DR) and knowledge distillation (KD). This approach allows for the flexible synthesis of features and effectively reduces additional storage requirements for samples and models. Additionally, we construct a new pill image dataset for FSCIL and assess various mainstream FSCIL methods, establishing new benchmarks. Our experimental results demonstrate that our framework surpasses existing state-of-the-art (SOTA) methods.

PaperID: 983,

Authors: Heejo Kong, Sung-Jin Kim, Gunho Jung, Seong-Whan Lee

Affiliations: Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea; Department of Artificial Intelligence, Korea University, Seoul, Republic of Korea

Title: Diversify and Conquer: Open-Set Disagreement for Robust Semi-Supervised Learning With Outliers

Abstract:
Conventional semi-supervised learning (SSL) ideally assumes that labeled and unlabeled data share an identical class distribution; however, in practice, this assumption is easily violated, as unlabeled data often includes unknown class data, i.e., outliers. The outliers are treated as noise, considerably degrading the performance of SSL models. To address this drawback, we propose a novel framework, diversify and conquer (DAC), to enhance SSL robustness in the context of open-set SSL (OSSL). In particular, we note that existing OSSL methods rely on prediction discrepancies between inliers and outliers from a single model trained on labeled data. This approach can easily fail when the labeled data are insufficient, leading to performance that is worse than that of naive SSL, which does not account for outliers. In contrast, our approach exploits prediction disagreements among multiple models that are differently biased toward the unlabeled distribution. By leveraging the discrepancies arising from training on unlabeled data, our method enables robust outlier detection, even when the labeled data are underspecified. Our key contribution is constructing a collection of differently biased models through a single training process. By encouraging divergent heads to be differently biased toward outliers while making consistent predictions for inliers, we exploit the disagreement among these heads as a measure to identify unknown concepts. Extensive experiments demonstrate that our method significantly surpasses state-of-the-art OSSL methods across various protocols.
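
The idea of using disagreement among heads as an outlier signal can be sketched as follows. This is a minimal illustration under stated assumptions, not the DAC method itself: the mean pairwise L1 distance, the fixed threshold, and the names `disagreement_score` and `flag_outliers` are hypothetical choices, since the abstract does not specify how disagreement is measured.

```python
def disagreement_score(head_probs):
    """Mean pairwise L1 distance between the class-probability vectors of several heads.

    head_probs: list of probability vectors, one per head, for a single sample.
    The L1 distance is an assumed measure chosen for illustration.
    """
    n = len(head_probs)
    if n < 2:
        raise ValueError("need at least two heads")
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(abs(p - q) for p, q in zip(head_probs[i], head_probs[j]))
            pairs += 1
    return total / pairs


def flag_outliers(batch_head_probs, threshold):
    """Mark a sample as a suspected outlier when its heads disagree more than threshold."""
    return [disagreement_score(hp) > threshold for hp in batch_head_probs]


if __name__ == "__main__":
    consistent = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]  # heads agree: likely inlier
    divergent = [[1.0, 0.0], [0.0, 1.0]]               # heads disagree: likely outlier
    print(flag_outliers([consistent, divergent], threshold=0.5))
```

In this sketch, heads that make consistent predictions yield a score near zero, while divergent heads yield a large score; how the heads are trained to diverge on outliers is outside the scope of the example.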

PaperID: 984,

Authors: Shan Xue, Ning Zhao, Weidong Zhang, Biao Luo, Derong Liu

Affiliations: School of Information and Communication Engineering, Hainan University, Haikou, China; School of Automation, Central South University, Changsha, China; School of Automation and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen, China

Title: A Hybrid Adaptive Dynamic Programming for Optimal Tracking Control of USVs

Abstract:
This article presents an efficient method for solving the optimal tracking control policy of unmanned surface vehicles (USVs) using a hybrid adaptive dynamic programming (ADP) approach. This approach integrates data-driven integral reinforcement learning (IRL) and dynamic event-driven (DED) mechanisms into the solution of the control policy of the established augmented system while obtaining both the feedforward and feedback components of the tracking controller. For the USV model and the reference trajectory, an augmented system is established, and the tracking Hamilton–Jacobi–Bellman (HJB) equation is derived based on IRL, aiming to fully utilize system data information and reduce model dependency. For the solution of the tracking HJB equation, the DED-based controller update rule is used to further reduce the burden of network transmission. In implementing the ADP method, the DED experience replay-based weight update rule is utilized to recycle data resources. Experiments show that compared with the static event-driven (SED) approach, the DED approach reduces the sample size by 78% and increases the average interval by about four times.

PaperID: 985,

Authors: Yuhan Zhang, Zidong Wang, Lei Zou, Wei Qian, Shuxin Du

Affiliations: College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao, China; College of Information Science and Technology, Donghua University, Shanghai, China; School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China; Huzhou Key Laboratory of Intelligent Sensing and Optimal Control for Industrial Systems, School of Engineering, Huzhou University, Huzhou, China

Title: Neural-Network-Based Recursive State Estimation for Nonlinear Networked Systems With Binary-Encoding Mechanisms

Abstract:
This work addresses the problem of recursive state estimation for networked control systems with unknown nonlinearities and binary-encoding mechanisms (BEMs). To enhance transmission reliability and reduce network resource consumption, BEMs are used to convert measurement signals into binary bit strings (BBSs) of limited length, which are then transmitted to the estimator through noisy communication channels. During transmission, random bit errors may occur in the BBSs due to channel noise. For the considered nonlinear networked control systems affected by random bit errors, a neural-network (NN)-based recursive estimation strategy is proposed, where an NN with a time-varying tuning scalar is employed to approximate the unknown nonlinearity of the networked control systems. By using the proposed strategy, the upper bounds of the estimation error of the system state and the trace of the estimation error of the NN weight (NNW) are first derived. These bounds are then minimized by recursively designing both the estimator gain matrix and the tuning scalar of the NNW. Finally, the effectiveness of the proposed estimation strategy is demonstrated through a numerical example.
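
As a rough illustration of the binary-encoding step described above, the following sketch quantizes a scalar measurement into a fixed-length bit string, flips each bit independently with a given probability to mimic channel noise, and decodes the result. The uniform quantizer, the range `[lo, hi]`, and the function names `encode`, `transmit`, and `decode` are assumptions made for illustration; the abstract does not specify the encoding scheme used in the paper.

```python
import random


def encode(value, lo, hi, n_bits):
    """Uniformly quantize value (clipped to [lo, hi]) into n_bits bits, MSB first."""
    levels = (1 << n_bits) - 1
    clipped = min(max(value, lo), hi)
    index = round((clipped - lo) / (hi - lo) * levels)
    return [(index >> k) & 1 for k in reversed(range(n_bits))]


def transmit(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob (assumed channel model)."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


def decode(bits, lo, hi):
    """Map a bit string (MSB first) back to a value in [lo, hi]."""
    levels = (1 << len(bits)) - 1
    index = 0
    for b in bits:
        index = (index << 1) | b
    return lo + index / levels * (hi - lo)


if __name__ == "__main__":
    rng = random.Random(0)
    sent = encode(0.37, lo=-1.0, hi=1.0, n_bits=8)
    received = transmit(sent, flip_prob=0.05, rng=rng)
    print(sent, received, decode(received, -1.0, 1.0))
```

A flipped high-order bit changes the decoded value far more than a flipped low-order bit, which is one reason an estimator receiving such strings has to account for random bit errors explicitly.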

PaperID: 986,

Authors: Mingwen Shao, Lingzhuang Meng, Yuanjian Qiao, Lixu Zhang, Wangmeng Zuo

Affiliations: Shandong Key Laboratory of Intelligent Oil and Gas Industrial Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China; School of Computer Science, Harbin Institute of Technology, Harbin, China

Title: Latent Code Augmentation Based on Stable Diffusion for Data-Free Substitute Attacks

Abstract:
Since the training data of the target model is not available in the black-box substitute attack, most recent schemes utilize generative adversarial networks (GANs) to generate data for training the substitute model. However, these GAN-based schemes suffer from low training efficiency, as the generator needs to be retrained for each target model during the substitute training process, as well as low generation quality. To overcome these limitations, we consider utilizing the diffusion model (DM) to generate data and propose a novel data-free substitute attack scheme based on stable diffusion (SD) to improve the efficiency and accuracy of substitute training. Although the data generated by SD exhibits high quality, it presents a different distribution of domains and a large variation of positive and negative samples for the target model. For this problem, we propose latent code augmentation (LCA) to facilitate SD in generating data that aligns with the data distribution of the target model. Specifically, we augment the latent codes of the inferred member data with LCA and use them as guidance for SD. With the guidance of LCA, the data generated by SD not only meets the discriminative criteria of the target model but also exhibits high diversity. By utilizing this data, it is possible to more efficiently train a substitute model that closely resembles the target model. Extensive experiments demonstrate that our LCA achieves higher attack success rates (ASRs) and requires fewer query budgets compared to GAN-based schemes for different target models. Our codes are available at https://github.com/LzhMeng/LCA.

PaperID: 987,

Authors: Zhixiang Shen, Zhao Kang

Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

Title: When Heterophily Meets Heterogeneous Graphs: Latent Graphs Guided Unsupervised Representation Learning

Abstract:
Unsupervised heterogeneous graph representation learning (UHGRL) has gained increasing attention due to its significance in handling practical graphs without labels. However, heterophily has been largely ignored, despite its ubiquitous presence in real-world heterogeneous graphs. In this article, we define semantic heterophily and propose an innovative framework called latent graphs guided unsupervised representation learning (LatGRL) to handle this problem. First, we develop a similarity mining method that couples global structures and attributes, enabling the construction of fine-grained homophilic and heterophilic latent graphs (LGs) to guide the representation learning. Moreover, we propose an adaptive dual-frequency semantic fusion mechanism to address the problem of node-level semantic heterophily. To cope with the massive scale of real-world data, we further design a scalable implementation. Extensive experiments on benchmark datasets validate the effectiveness and efficiency of our proposed framework. The source code and datasets have been made available at https://github.com/zxlearningdeep/LatGRL.

PaperID: 988,

Authors: Wenjie Yuan, Xiaowei Zhang, Xuejuan Zhang, Shuangyan Wang, Tianzhi Wang, Tong Zhang, Qinglin Zhao, Bin Hu

Affiliations: Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China; Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: Discovery of Shared Latent Nonlinear Effective Connectivity for EEG-Based Depression Detection

Abstract:
Granger causality (GC) effective connectivity (EC) calculated from electroencephalogram (EEG) signals has been widely used in mental disorder detection. However, the existing methods only take into account linear dynamics or nonlinear dynamics within a single sample, ignoring the nonlinear dynamics shared by the same class of subjects. In this article, a model combining graph neural networks (GNNs) and variational autoencoders (VAEs) is proposed to construct shared latent nonlinear EC from raw EEG signals for depression detection. Several convolution modules and fully connected layers are used in the graph encoding network to learn embeddings of the connectivity between every pair of EEG channels. In the graph decoding network, a class-specific Gaussian mixture model (GMM) is introduced in the VAEs to model shared dynamics in EC of the same class of subjects, and the shared dynamics combine the encoded embeddings of the EC and the past time series to restore raw EEG signals. Through a node-to-edge encoding process and an edge-to-node decoding process, the shared latent nonlinear EC in EEG signals can ultimately be learned by gradually optimizing the model’s loss function. The performance of the proposed method is verified on several open-access datasets. The excellent results prove that the proposed neural networks can learn more generalized nonlinear EC representations, and shared latent dynamics discovery can also help to identify depression better. The code is available at https://github.com/william-yuan2012/DSLNEC-tscausality.

PaperID: 989,

Authors: Kuijie Zhang, Shanchen Pang, Huahui Yang, Yuanyuan Zhang, Wenhao Wu, Hengxiao Li, Jerry Chun-Wei Lin

Affiliations: College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong, China; College of Qilu Transportation, Shandong University, Jinan, Shandong, China; College of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China; Department of Distributed Systems and IT Devices, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland

Title: Convolution Bridge: An Effective Algorithmic Migration Strategy From CNNs to GNNs

Abstract:
Graph neural networks (GNNs), as a rising star in machine learning, are widely used in relational data models and have achieved outstanding performance in graph tasks. GNNs continuously take inspiration from mature models in other domains, such as computer vision and natural language processing, to motivate the development of graph algorithms. However, because data structures differ across domains, the cross-domain migration of models has to go through a long period of disassembly and reconstruction, which may not yield the desired results. To preserve the excellent properties of convolution and optimize the migration process from convolutional neural networks (CNNs) to GNNs, we propose a convolution bridge. The convolution bridge realizes the data alignment from CNN to GNN, so that the CNN-based model can be efficiently migrated to the graph structure model. To demonstrate the effectiveness of our migration strategy, we migrated the inception module and U-Net architecture from CNNs to GNNs, named GraInc and GraU-Net, for the node-level task and the graph-level task, respectively. Experimental results show that GraInc and GraU-Net are highly competitive compared to the current state-of-the-art models, particularly on dense graph datasets.

PaperID: 990,

Authors: Tianwei Yan, Shan Zhao, Wentao Ma, Shezheng Song, Chengyu Wang, Zhibo Rao, Shizhao Chen, Zhigang Luo, Xinwang Liu

Affiliations: College of Computer, Hefei University of Technology, Hefei, China; School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, China; College of Computer, National University of Defense Technology, Changsha, China; College of Electronic Information Engineering, Nanchang Hangkong University, Nanchang, China; Academy of Military Science Defense Innovation Institute, Beijing, China

Title: FRCL-MNER: A Finer Grained Rank-Based Contrastive Learning Framework for Multimodal NER

Abstract:
Multimodal named entity recognition (MNER) is an emerging field that aims to automatically detect named entities and classify their categories, utilizing input text and auxiliary resources such as images. While previous studies have leveraged object detectors to preprocess images and fuse textual semantics with corresponding image features, these methods often overlook the potential finer grained information within each modality and may exacerbate error propagation due to predetection. To address these issues, we propose a finer grained rank-based contrastive learning (FRCL) framework for MNER. This framework employs global-level contrastive learning to align multimodal semantic features and a Top-K rank-based mask strategy to construct positive-negative pairs, thereby learning a finer grained multimodal interaction representation. Experimental results on three well-known social media datasets reveal that our approach surpasses existing strong baselines and achieves up to a 1.54% improvement on the Twitter2015 dataset. Extensive discussions further confirm the effectiveness of our approach. We will release the source code at https://github.com/augusyan/FRCL.

PaperID: 991,

Authors: Yi Zhang, Guoxia Xu, Meng Zhao, Hao Wang, Fan Shi, Shengyong Chen

Affiliations: Key Laboratory of Computer Vision and System of Ministry of Education, School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China; School of Cyber Engineering, Xidian University, Xi’an, China

Title: TDSF-Net: Tensor Decomposition-Based Subspace Fusion Network for Multimodal Medical Image Classification

Abstract:
Data from multiple modalities bring complementary information for deep learning-based medical image classification models. However, data fusion methods that simply concatenate features or images barely consider the correlations or complementarities among different modalities and easily suffer from exponential growth in dimensions and computational complexity as the number of modalities increases. Consequently, this article proposes a subspace fusion network with tensor decomposition (TD) to improve multimodal medical image classification. We first introduce a Tucker low-rank TD module to map the high-dimensional tensor to the low-rank subspace, reducing the redundancy caused by multimodal data and high-dimensional features. Then, a cross-tensor attention mechanism is utilized to fuse features from the subspace into a high-dimensional tensor, enhancing the representation ability of extracted features and constructing the interaction information among components in the subspace. Extensive comparison experiments with state-of-the-art (SOTA) methods are conducted on one self-established and three public multimodal medical image datasets, verifying the effectiveness and generalization ability of the proposed method. The code is available at https://github.com/1zhang-yi/TDSFNet.

PaperID: 992,

Authors: Young-Eun Kim, Gyeong-Min Bak, Seong-Whan Lee

Affiliations: Department of Artificial Intelligence, Korea University, Seongbuk-gu, Seoul, South Korea; Vision AI Laboratory, NC Research, NCSOFT Corporation, Seongnam-si, South Korea

Title: Language-Driven Spatial-Semantic Cross-Attention for Face Attribute Recognition With Limited Labeled Data

Abstract:
Recent advances in deep learning have demonstrated excellent results for face attribute recognition (FAR), which is generally trained with large-scale labeled data. Despite the significant progress in this field, most existing works mainly rely on large-scale labeled data, which is not practical in many real-world FAR applications. Numerous studies have been conducted to address this problem, but they require either large external face datasets or complex auxiliary tasks for pretraining the backbone network. In this article, we propose a new method named language-driven spatial–semantic cross-attention (LSA) that does not require any pretraining steps with additional datasets or auxiliary tasks. Driven by the impressive outcomes of recent computer vision studies using language models, we harness language-based relational information to enhance attribute recognition. The core of LSA is to combine and balance the learned scaled dot-product attention with the attention constructed based on language-driven knowledge. To this end, we propose a correlation dictionary, obtained from the similarity between text embeddings of facial attributes and facial regions, to represent their relationships. The correlation dictionary then creates a cross-attention form and is combined into the cross-attention with balancing parameters. Thus, we can compensate for the lack of data information by providing prior knowledge directly to the network. Extensive experiments demonstrate that our method surpasses state-of-the-art techniques, achieving an average improvement of 0.29% on the CelebA dataset and 0.39% on the LFWA dataset with limited labeled data, even without additional dataset training.
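
The combination of learned attention with a language-derived prior described above can be pictured with a small sketch. It assumes a convex combination with a single scalar `balance`, and a `prior` matrix whose rows already sum to one; both are illustrative assumptions, as the abstract does not give the exact form of the balancing parameters or of the correlation dictionary. The function name `blended_attention` is hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def blended_attention(queries, keys, prior, balance):
    """Return (1 - balance) * softmax(Q K^T / sqrt(d)) + balance * prior, row by row.

    queries: list of query vectors (e.g., attribute embeddings)
    keys:    list of key vectors (e.g., facial-region features)
    prior:   one row of prior attention weights per query (assumed to sum to 1)
    balance: scalar in [0, 1]; an assumed stand-in for the balancing parameters
    """
    d = len(keys[0])
    blended = []
    for q, prior_row in zip(queries, prior):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        learned = softmax(scores)
        blended.append([(1 - balance) * l + balance * p
                        for l, p in zip(learned, prior_row)])
    return blended


if __name__ == "__main__":
    queries = [[1.0, 0.0]]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    prior = [[0.2, 0.8]]  # hypothetical language-derived weights
    print(blended_attention(queries, keys, prior, balance=0.5))
```

With `balance = 0` the result is ordinary scaled dot-product attention, and with `balance = 1` it reduces to the prior, so the parameter controls how much prior knowledge compensates for limited labeled data.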

PaperID: 993,

Authors: Ao Luo, Hui Ma, Hongru Ren, Hongyi Li

Affiliations: School of Automation, Guangdong-Hong Kong Joint Laboratory for Intelligent Decision and Cooperative Control, and Guangdong Provincial Key Laboratory for Intelligent Decision and Cooperative Control, Guangdong University of Technology, Guangzhou, China; School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, China; College of Electronic and Information Engineering, Chongqing Key Laboratory of Generic Technology and System of Service Robot, Southwest University, Chongqing, China

Title: Estimator-Based Reinforcement Learning Consensus Control for Multiagent Systems With Discontinuous Constraints

Abstract:
This article focuses on the optimal consensus control problem for multiagent systems (MASs) with discontinuous constraints. The case of discontinuous constraints is a particular instance of state constraints, which has been studied less but occurs in many practical situations. Due to the discontinuous constraint boundaries, the traditional barrier function-based backstepping methods cannot be used directly. In response to this thorny problem, a novel constraint boundary reconstruction technique is proposed by designing a class of switch-like functions. The technique can convert discontinuous constraint boundaries into continuous ones, and it is rigorously proven that when the states satisfy the transformed constraint boundaries, the original constraints are also satisfied. Meanwhile, with the aid of the barrier function and a distributed event-triggered estimator, an improved coordinate transformation is constructed, which can remove the “feasibility condition” and simplify the controller design. In addition, by introducing a prediction error and a revised term into the learning process of neural networks (NNs), the optimal consensus problem is resolved by constructing a modified reinforcement learning strategy. Finally, the stability of the MASs is verified through the Lyapunov stability theory, and a simulation example verifies the effectiveness of the proposed method.

PaperID: 994,

Authors: Jiaming Xing, Dengwei Wei, Shanghang Zhou, Tingting Wang, Yanjun Huang, Hong Chen

Affiliations: School of Automotive Studies, Tongji University, Shanghai, China; School of Communication Engineering, Jilin University, Changchun, China; College of Electronics and Information Engineering and the Clean Energy Automotive Engineering Center, Tongji University, Shanghai, China

Title: A Comprehensive Study on Self-Learning Methods and Implications to Autonomous Driving

Abstract:
As artificial intelligence (AI) has already seen numerous successful applications, the upcoming challenge lies in how to realize artificial general intelligence (AGI). Self-learning algorithms, which can autonomously acquire knowledge and adapt to new, demanding applications, are recognized as one of the most effective techniques to overcome this challenge. Although many related studies have been conducted, there is still no comprehensive and systematic review available, nor well-founded recommendations for the application of autonomous intelligent systems, especially autonomous driving. As a result, this article comprehensively analyzes and classifies self-learning algorithms into three categories: broad self-learning, narrow self-learning, and limited self-learning. These categories are used to describe the popular usage, the most promising techniques, and the current status of hybridization with self-supervised learning. Then, the narrow self-learning is divided into three parts based on the self-learning realization path: sample self-learning, model self-learning, and self-learning architecture. For each method, this article discusses in detail its self-learning capacity, challenges, and applications to autonomous driving. Finally, the future research directions of self-learning algorithms are pointed out. It is expected that this study has the potential to eventually contribute to revolutionizing autonomous driving technology.

PaperID: 995,

Authors: Anais Boumendil, Walid Bechkit, Karima Benatchba

Affiliations: INSA Lyon, Inria, CITI, UR, Villeurbanne, France; École nationale Supérieure d’Informatique, Laboratoire de Méthodes de conception des Systèmes, Algiers, Algeria

Title: On-Device Deep Learning: Survey on Techniques Improving Energy Efficiency of DNNs

Abstract:
Providing high-quality predictions is no longer the sole goal for neural networks. As we live in an increasingly interconnected world, these models need to match the constraints of resource-limited devices powering the Internet of Things (IoT) and embedded systems. Moreover, in the era of climate change, reducing the carbon footprint of neural networks is a critical step for green artificial intelligence, which is no longer an aspiration but a major need. Enhancing the energy efficiency of neural networks, in both training and inference phases, has become a predominant research topic in the field. Training optimization has grown in interest recently but remains challenging, as it involves changes in the learning procedure that can impact the prediction quality significantly. This article presents a study on the most popular techniques aiming to reduce the energy consumption of neural networks’ training. We first propose a classification of the methods before discussing and comparing the different categories. In addition, we outline some energy measurement techniques. We discuss the limitations identified during our study as well as some interesting directions, such as neuromorphic and reservoir computing (RC).

PaperID: 996,

Authors: Jun Zhang, Song Zhu, Xiaoyang Liu, Shiping Wen, Chaoxu Mu

Affiliations: School of Mathematics, JCAM, China University of Mining and Technology, Xuzhou, China; School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, China; Centre for Artificial Intelligence, University of Technology Sydney, Ultimo, NSW, Australia; School of Electrical and Automation Engineering, Tianjin University, Tianjin, China

Title: Finite-Time Stabilization of Inertial Memristive Neural Networks via Nonreduced Order Method

Abstract:
This article investigates the finite-time stabilization problem of inertial memristive neural networks (IMNNs) with bounded and unbounded time-varying delays, respectively. To simplify the theoretical derivation, the nonreduced order method is utilized for constructing appropriate comparison functions and designing a discontinuous state feedback controller. Then, based on the controller, the state of IMNNs can directly converge to 0 in finite time. Several criteria for finite-time stabilization of IMNNs are obtained, and the settling time is estimated. Compared with previous studies, the requirement of differentiability of the time delay is eliminated. Finally, numerical examples illustrate the usefulness of the analysis results in this article.

PaperID: 997,

Authors: Junfan Lin, Zhongzhan Huang, Keze Wang, Lingbo Liu, Liang Lin

Affiliations: Peng Cheng Laboratory, Shenzhen, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China

Title: Continuous Value Assignment: A Doubly Robust Data Augmentation for Off-Policy Learning

Abstract:
Deep reinforcement learning (RL) has witnessed remarkable success in a wide range of control tasks. To overcome RL’s notorious sample inefficiency, prior studies have explored data augmentation techniques leveraging collected transition data. However, these methods face challenges in synthesizing transitions adhering to the authentic environment dynamics, especially when the transition is high-dimensional and includes many features that are redundant or irrelevant to the task. In this article, we introduce continuous value assignment (CVA), an innovative optimization-level data augmentation approach that directly synthesizes novel training data in the state-action value space, effectively bypassing the need for explicit transition modeling. The key intuition of our method is that the transition plays an intermediate role in calculating the state-action value during optimization, and therefore directly augmenting the state-action value is more causally related to the optimization process. Specifically, our CVA combines parameterized value prediction and nonparametric value interpolation from neighboring states, resulting in doubly robust target values w.r.t. novel states and actions. Extensive experiments demonstrate CVA’s substantial improvements in sample efficiency across complex continuous control tasks, surpassing several advanced baselines.

PaperID: 998,

Authors: Mohd. Tasleem Khan, Mohammed A. Alhartomi

Affiliations: Institute of Sensors, Signals and Systems, School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, U.K.; Department of Electrical Engineering, University of Tabuk, Tabuk, Saudi Arabia

Title: Digit-Serial DA-Based Fixed-Point RNNs: A Unified Approach for Enhancing Architectural Efficiency

Abstract:
The next crucial step in artificial intelligence involves integrating neural network models into embedded and mobile systems. This requires designing compact and energy-efficient neural network models in silicon for optimized performance. This article introduces a unified approach for enhancing the architectural efficiency of long short-term memory (LSTM) recurrent neural networks (RNNs). Precisely, two new structures (I and II) based on the two’s complement (TC) digit-serial distributed arithmetic (DSDA) technique are presented. The block-circulant matrix-vector multiplications (MVMs) and element-wise multiplications (EWMs) are formulated using TC DSDA. In addition, a fixed-point (FxP) training procedure for quantized LSTM RNNs is considered and validated for speech recognition tasks. Both structures leverage the circular rotation of weights and generate partial products with input digit slices. A new partial-product generator (PPG) and partial-product selector (PPS) designed to work with both unsigned and signed digits are introduced. In Structure I, a nonpipelined MVM is realized with a few PPGs and PPSs, followed by a shift-accumulate unit (SAU). Conversely, in Structure II, a suitably chosen depth-pipelined MVM is achieved with multiple PPGs and PPSs, followed by a shift-to-add tree (SAT). A critical path delay (CPD) analysis for both the proposed structures is also presented. Compared with previous works, post-synthesis results on 28-nm fully depleted silicon-on-insulator (FDSOI) technology reveal that for a model size of 128 × 128, Structures I and II provide 39.87%, 95.63%, and 30.95%, 91.18% more area and energy efficiencies, respectively.

PaperID: 999,

Authors: Zhijie Zhong, Zhiwen Yu, Ziwei Fan, C. L. Philip Chen, Kaixiang Yang

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China

Title: Adaptive Memory Broad Learning System for Unsupervised Time Series Anomaly Detection

Abstract:
Time series anomaly detection is the process of identifying anomalies within time series data. The primary challenge of this task lies in the necessity for the model to comprehend the characteristics of time-independent and abnormal data patterns. In this study, a novel algorithm called adaptive memory broad learning system (AdaMemBLS) is proposed for time series anomaly detection. This algorithm leverages the rapid inference capabilities of the broad learning algorithm and the memory bank’s capacity to differentiate between normal and abnormal data. Furthermore, an incremental algorithm based on multiple data augmentation techniques is introduced and applied to multiple ensemble learners, thereby enhancing the model’s effectiveness in learning the characteristics of time series data. To bolster the model’s anomaly detection capabilities, a more diverse ensemble approach and a discriminative anomaly score are recommended. Extensive experiments conducted on various real-world datasets demonstrate that the proposed method exhibits superior inference speed and more accurate anomaly detection compared to the existing competitors. A detailed experimental investigation is presented to elucidate the effectiveness of the proposed method and the underlying reasons for its efficacy.

PaperID: 1000,

Authors: Kecheng Chen, Jie Liu, Renjie Wan, Victor Ho-fun Lee, Varut Vardhanabhuti, Hong Yan, Haoliang Li

Affiliations: Department of Electrical Engineering and the Center for Intelligent Multidimensional Data Analysis, City University of Hong Kong, Hong Kong, China; Department of Computer Science, Hong Kong Baptist University, Hong Kong, China; Department of Clinical Oncology, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, SAR, China; Department of Diagnostic Radiology, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, SAR, China

Title: Unsupervised Domain Adaptation for Low-Dose CT Reconstruction via Bayesian Uncertainty Alignment

Abstract:
Low-dose computed tomography (LDCT) image reconstruction techniques can reduce patient radiation exposure while maintaining acceptable imaging quality. Deep learning (DL) is widely used for this problem, but performance on testing data (also known as the target domain) is often degraded in clinical scenarios due to variations that were not encountered in the training data (also known as the source domain). Unsupervised domain adaptation (UDA) of LDCT reconstruction has been proposed to solve this problem through distribution alignment. However, existing UDA methods fail to explore the usage of uncertainty quantification, which is crucial for reliable intelligent medical systems in clinical scenarios with unexpected variations. Moreover, existing direct alignment for different patients would lead to content mismatch issues. To address these issues, we propose to leverage a probabilistic reconstruction framework to conduct a joint discrepancy minimization between source and target domains in both the latent and image spaces. In the latent space, we devise a Bayesian uncertainty alignment to reduce the epistemic gap between the two domains. This approach reduces the uncertainty level of target domain data, making it more likely to render well-reconstructed results on target domains. In the image space, we propose a sharpness-aware distribution alignment (SDA) to achieve a match of second-order information, which can ensure that the reconstructed images from the target domain have similar sharpness to normal-dose CT (NDCT) images from the source domain. Experimental results on two simulated datasets and one clinical low-dose imaging dataset show that our proposed method outperforms other methods in quantitative and visualized performance.

PaperID: 1001,

Authors: Xiaoqi Sheng, Hongmin Cai, Yongwei Nie, Shengfeng He, Yiu-Ming Cheung, Jiazhou Chen

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China; School of Computing and Information Systems, Singapore Management University, Victoria Street Singapore, Singapore; Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China

Title: Modality-Aware Discriminative Fusion Network for Integrated Analysis of Brain Imaging Genomics

Abstract:
Mild cognitive impairment (MCI) represents an early stage of Alzheimer’s disease (AD), characterized by subtle clinical symptoms that pose challenges for accurate diagnosis. The quest for the identification of MCI individuals has highlighted the importance of comprehending the underlying mechanisms of disease causation. Integrated analysis of brain imaging and genomics offers a promising avenue for predicting MCI risk before clinical symptom onset. However, most existing methods face challenges in: 1) mining the brain network-specific topological structure and addressing the single nucleotide polymorphisms (SNPs)-related noise contamination and 2) extracting the discriminative properties of brain imaging genomics, resulting in limited accuracy for MCI diagnosis. To this end, a modality-aware discriminative fusion network (MA-DFN) is proposed to integrate the complementary information from brain imaging genomics to diagnose MCI. Specifically, we first design two modality-specific feature extraction modules: the graph convolutional network with edge-augmented self-attention module (GCN-EASA) and the deep adversarial denoising autoencoder module (DAD-AE), to capture the topological structure of brain networks and the intrinsic distribution of SNPs. Subsequently, a discriminative-enhanced fusion network with correlation regularization module (DFN-CorrReg) is employed to enhance inter-modal consistency and between-class discrimination in brain imaging and genomics. Compared to other state-of-the-art approaches, MA-DFN not only exhibits superior performance in stratifying cognitively normal (CN) and MCI individuals but also identifies disease-related brain regions and risk SNP loci, which hold potential as putative biomarkers for MCI diagnosis.

PaperID: 1002,

Authors: Shangyang He, Yuanzheng Li, Yang Li, Yang Shi, C. Y. Chung, Zhigang Zeng

Affiliations: Department of Electrical and Electronic Engineering and the Research Center for Grid Modernisation, The Hong Kong Polytechnic University, Hong Kong, China; School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China; School of Electrical Engineering, Northeast Electric Power University, Jilin, China; Department of Mechanical Engineering, University of Victoria, Victoria, BC, Canada

Title: Boosting Communication Efficiency in Federated Learning for Multiagent-Based Multimicrogrid Energy Management

Abstract:
User privacy is becoming increasingly significant in constructing efficient multiagent energy management systems for multimicrogrids (MMGs). As an emerging privacy-protection method, federated learning (FL) has been used to prevent data breaches in the MMG-related field. However, with the ever-growing number of participants, the underlying communication burden of FL is evident. Besides, since the neural network layers collectively determine an agent’s performance, possible differences in layer convergence speeds would cause an inconsistency problem, that is, FL may degrade the convergence rate of the fast-converging layers, which weakens the overall performance of the agent. To address these issues, a communication-efficient FL (CEFL) algorithm is proposed in this study. Considering the cooperative relationship among layers, a layer evaluation (LE) mechanism is developed in CEFL to evaluate layer contribution through the Shapley value (SV), a profit distribution approach for coalitions. In this way, only the layers with the highest contributions are selected to be uploaded to the server. In addition, instead of averaging parameters, a communication-efficient parameter aggregation method is proposed in CEFL to update the parameters of the global model (GM), in which an aggregation model (AM) is developed to receive parameters for aggregation. The performance of the proposed CEFL is verified by the numerical analysis of MMGs with 3–8 MGs participating. Furthermore, experiments investigate the influence of the hyperparameter in the CEFL and also demonstrate performance improvements compared with four other state-of-the-art algorithms.
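
To make the layer evaluation idea concrete, the following Python sketch computes exact Shapley values for a handful of layers and keeps the top contributors for upload. It is only a toy under stated assumptions: the layer names, the `toy_value` coalition score, and the `select_top_layers` helper are hypothetical, and the abstract does not say how CEFL defines or approximates the coalition value.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            totals[p] += value(extended) - value(coalition)
            coalition = extended
    return {p: t / len(orderings) for p, t in totals.items()}


def select_top_layers(contributions, k):
    """Return the k layers with the largest contribution (hypothetical helper)."""
    return sorted(contributions, key=contributions.get, reverse=True)[:k]


def toy_value(coalition):
    """Hypothetical coalition score: additive gains plus a conv2/fc synergy."""
    base = {"conv1": 0.1, "conv2": 0.3, "fc": 0.2}
    score = sum(base[layer] for layer in coalition)
    if {"conv2", "fc"} <= coalition:
        score += 0.2
    return score


layers = ["conv1", "conv2", "fc"]
phi = shapley_values(layers, toy_value)
upload = select_top_layers(phi, 2)
```

The synergy term is split equally between `conv2` and `fc`, so their values exceed their standalone gains, and the values sum to the score of the full coalition. Enumerating all orderings costs factorial time, so a network with many layers would need a sampling-based approximation.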

PaperID: 1003,

Authors: Linqing Huang, Jinfu Fan, Alan Wee-Chung Liew

Affiliations: School of Cyber Science and Engineering and the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University (SJTU), Shanghai, China; College of Computer Science and Technology, Qingdao University (QDU), Qingdao, China; School of Information and Communication Technology, Griffith University (GU), Gold Coast, Australia

Title: Integration of Multikinds Imputation With Covariance Adaptation Based on Evidence Theory

Abstract:
For incomplete data classification, missing attribute values are often estimated by imputation methods before building classifiers. The estimated attribute values are not actual attribute values. Thus, the distributions of data will be changed after imputing, and this phenomenon often results in degradation of classification performance. Here, we propose a new framework called integration of multikinds imputation with covariance adaptation (MICA) based on evidence theory (ET) to effectively deal with the classification problem with incomplete training data and complete test data. In MICA, we first employ different kinds of imputation methods to obtain multiple imputed training datasets. In general, the distributions of each imputed training dataset and test dataset will be different. A covariance adaptation module (CAM) is then developed to reduce the distribution difference of each imputed training dataset and test dataset. Then, multiple classifiers can be learned on the multiple imputed training datasets, and they are complementary to each other. For a test pattern, we can combine the multiple pieces of soft classification results yielded by these classifiers based on ET to obtain better classification performance. However, the reliabilities/weights of different imputed training datasets are usually different, so the soft classification results cannot be treated equally during fusion. We propose to use covariance difference across datasets and accuracy of imputed training data to estimate the weights. Finally, the soft classification results discounted by the estimated weights are combined by ET to make the final class decision. MICA was compared with a variety of related methods on several datasets, and the experimental results demonstrate that this new method can significantly improve the classification performance.
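
The final fusion step described in this abstract can be illustrated with the standard tools of evidence theory: Shafer discounting followed by Dempster's rule of combination. The sketch below is a minimal Python version under assumptions, not the MICA implementation. The two-class frame, the soft outputs, and the reliability weights 0.9 and 0.5 are made up for the example, and the paper's own weight estimation from covariance difference and imputation accuracy is not reproduced.

```python
def discount(mass, weight, frame):
    """Shafer discounting: scale each mass by `weight`, move the rest to the frame."""
    out = {focal: weight * m for focal, m in mass.items() if focal != frame}
    out[frame] = 1.0 - weight + weight * mass.get(frame, 0.0)
    return out


def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses, intersect focal sets, renormalize."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b      # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {focal: m / (1.0 - conflict) for focal, m in combined.items()}


A, B = frozenset({"a"}), frozenset({"b"})
FRAME = A | B

# Hypothetical soft outputs of two classifiers trained on different imputations.
soft_1 = {A: 0.8, B: 0.2}
soft_2 = {A: 0.6, B: 0.4}

fused = dempster_combine(discount(soft_1, 0.9, FRAME), discount(soft_2, 0.5, FRAME))
decision = max((A, B), key=lambda c: fused.get(c, 0.0))
```

Discounting moves part of each classifier's belief to the whole frame, so a less reliable source has less influence on the combined result. Here the second classifier is trusted only half as much as its raw output suggests.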

PaperID: 1004,

Authors: Hongmin Cai, Fei Qi, Junyu Li, Yu Hu, Bin Hu, Yue Zhang, Yiu-Ming Cheung

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; School of Medical Technology, Beijing Institute of Technology, Beijing, China; School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China; Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China

Title: Uniform Tensor Clustering by Jointly Exploring Sample Affinities of Various Orders

Abstract:
Traditional clustering methods rely on pairwise affinity to divide samples into different subgroups. However, high-dimension, low-sample-size (HDLSS) data are affected by concentration effects, rendering traditional pairwise metrics unable to accurately describe relationships between samples and leading to suboptimal clustering results. This article proposes employing high-order affinities to characterize relationships among multiple samples as a means to circumvent the concentration effects. We establish a connection between affinities of different orders by constructing specialized decomposable high-order affinities, thereby formulating a uniform mathematical framework. Building upon this insight, a novel clustering method named uniform tensor clustering (UTC) is proposed, which learns a consensus low-dimensional embedding for clustering by jointly exploiting multiple-order affinities. Extensive experiments on synthetic and real-world datasets demonstrate two findings: 1) high-order affinities are better suited for characterizing sample relationships in complex data and 2) reasonable use of different order affinities can enhance clustering effectiveness, especially in handling high-dimensional data.

PaperID: 1005,

Authors: Jiajin He, Min Xiao, Wenwu Yu, Zhengxin Wang, Xiangyu Du, Wei Xing Zheng

Affiliations: College of Automation and the College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing, China; School of Mathematics, Southeast University, Nanjing, China; College of Science, Nanjing University of Posts and Telecommunications, Nanjing, China; School of Computer, Data and Mathematical Sciences, Western Sydney University, Sydney, NSW, Australia

Title: How Can Anomalous-Diffusion Neural Networks Under Connectomics Generate Optimized Spatiotemporal Dynamics

Abstract:
Spatiotemporal dynamics in the brain have been recognized as strongly related to the formation of perceptual and cognitive disorders, such as delusions and hallucinations in Alzheimer’s disease. However, two practical considerations are rarely mentioned in related mechanism research: the connectomics networking and the anomalous diffusion generated by the complex medium between neurons and the complex topology of neural networks, respectively. Furthermore, how to optimize the corresponding dynamic behaviors has important implications for treating brain diseases. This article first realizes the networking under connectomics for an anomalous-diffusion single-neuron model and applies a nonlinear state feedback control to generate optimized dynamic behaviors, which provides a paradigm of nonequilibrium self-organization driven by anomalous diffusion. Then, by tracing the root distribution of the characteristic equation, some controlled conditions causing or inhibiting Turing instability and Hopf bifurcation are deduced, and the effects of self-diffusion and cross diffusion on the Turing instability range are also revealed. Finally, thorough numerical simulations are presented to illustrate the results. It is emphasized that delay, self-diffusion, cross diffusion, and fractional order occupy dominant positions in determining the network’s spatiotemporal dynamics, and utilizing the control strategy can efficiently reduce Turing instability and delay Hopf bifurcation.

PaperID: 1006,

Authors: Yong He, Hongshan Yu, Zhengeng Yang, Xiaoyan Liu, Wei Sun, Ajmal Mian

Affiliations: College of Electrical and Information Engineering, School of Robotics, Quanzhou Institute of Industrial Design and Machine Intelligence Innovation, Hunan University, Yuelu, Changsha, China; College of Engineering and Design, Hunan Normal University, Yuelu, Changsha, China; Department of Computer Science, The University of Western Australia, Perth, WA, Australia

Title: Full Point Encoding for Local Feature Aggregation in 3-D Point Clouds

Abstract:
Point cloud processing methods exploit local point features and global context through aggregation, which does not explicitly model the internal correlations between local and global features. To address this problem, we propose full point encoding, which is applicable to convolution and transformer architectures. Specifically, we propose full point convolution (FuPConv) and full point transformer (FPTransformer) architectures. The key idea is to adaptively learn the weights from local and global geometric connections, where the connections are established through local and global correlation functions, respectively. FuPConv and FPTransformer simultaneously model the local and global geometric relationships as well as their internal correlations, demonstrating strong generalization ability and high performance. FuPConv is incorporated in classical hierarchical network architectures to achieve local and global shape-aware learning. In FPTransformer, we introduce full point position encoding in self-attention, which hierarchically encodes each point position in the global and local receptive fields. We also propose a shape-aware downsampling block that takes into account the local shape and the global context. Experimental comparison to existing methods on benchmark datasets shows the efficacy of FuPConv and FPTransformer for semantic segmentation, object detection, classification, and normal estimation tasks. In particular, we achieve state-of-the-art semantic segmentation results of 76.8% mIoU on S3DIS sixfold and 73.1% on S3DIS Area 5. Our code is available at https://github.com/hnuhyuwa/FullPointTransformer.

PaperID: 1007,

Authors: Yaozhong Zheng, Hai-Tao Zhang, Zuogong Yue, Jun Wang

Affiliations: School of Artificial Intelligence and Automation, the MOE Engineering Research Center of Autonomous Intelligent Unmanned Systems, and the State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan, China; Department of Computer Science and the School of Data Science, City University of Hong Kong, Kowloon, Hong Kong

Title: Identifying Community-Bridge Network Structures via Bayesian Learning With Mixed Sparsity Mode

Abstract:
Identifying structures of complex networks based on time series of nodal data is of considerable interest and significance in many fields of science and engineering. This article presents a sparse Bayesian learning (SBL) method for identifying structures of community-bridge networks, where nodes are grouped to form communities connected via bridges. Using the structural information of such networks with unknown nodal dynamics and community formations, network structure identification is tackled similarly to sparse signal reconstruction with mixed sparsity mode. The proposed method is theoretically proved to be convergent. Its superiority to mainstream baselines is demonstrated via extensive experiments without the need for manual adjustment of regularization parameters.

PaperID: 1008,

Authors: Xueli Geng, Licheng Jiao, Xu Liu, Lingling Li, Puhua Chen, Fang Liu, Shuyuan Yang

Title: A Spatial-Spectral Relation-Guided Fusion Network for Multisource Optical RS Image Classification

Abstract:
Multisource optical remote sensing (RS) image classification has obtained extensive research interest with demonstrated superiority. Existing approaches mainly improve classification performance by exploiting complementary information from multisource data. However, these approaches are insufficient in effectively extracting data features and utilizing correlations of multisource optical RS images. For this purpose, this article proposes a generalized spatial-spectral relation-guided fusion network (S2RGF-Net) for multisource optical RS image classification. First, we elaborate on spatial- and spectral-domain-specific feature encoders based on data characteristics to explore the rich feature information of optical RS data deeply. Subsequently, two relation-guided fusion strategies are proposed at the dual-level (intradomain and interdomain) to integrate multisource image information effectively. In the intradomain feature fusion, an adaptive de-redundancy fusion module (ADRF) is introduced to eliminate redundancy so that the spatial and spectral features are complete and compact, respectively. In interdomain feature fusion, we construct a spatial-spectral joint attention module (SSJA) based on interdomain relationships to sufficiently enhance the complementary features, so as to facilitate later fusion. Experiments on various multisource optical RS datasets demonstrate that S2RGF-Net outperforms other state-of-the-art (SOTA) methods.

PaperID: 1009,

Authors: Yao Zou, Yanghe Feng, Xiaocheng Song, Muhammad Arif Mughal, Wei He

Affiliations: School of Intelligence Science and Technology, the Institute of Artificial Intelligence, and the Key Laboratory of Intelligent Bionic Unmanned Systems, Ministry of Education, University of Science and Technology Beijing, Beijing, China; College of Systems Engineering, National University of Defense Technology, Changsha, China; Beijing Institute of Electronic Engineering, China Aerospace Science and Industry Corporation, Beijing, China

Title: Distributed GNE Seeking Strategy for Second-Integrator Multiplayer Systems Over Directed Topologies

Abstract:
This article studies the generalized Nash equilibrium (GNE) seeking problem of second-integrator multiplayer systems. In particular, each player is endowed with an individual payoff function with respect to collective decision variables, and simultaneously, a coupling inequality constraint and a set constraint are imposed on each player. The players communicate with their local neighbors over a directed topology. To begin with, a distributed-observer-based seeking strategy is synthesized by leveraging a proper composite variable. It is first demonstrated using nonsmooth analysis that the established distributed observer enables each player to accurately estimate the decision variables of others under a strongly connected topology condition. On this basis, all the decision variables are then shown, by means of convex theory, to converge asymptotically to the expected GNE. In addition, three extension results are also given under the built GNE seeking framework. First, under the postulation that the velocity information is unavailable, a velocity-free distributed GNE seeking strategy is synthesized for second-integrator systems by implementing proper auxiliary dynamics. Second, we consider nonlinear Euler-Lagrange systems with unknown inertia parameters and synthesize an improved distributed GNE seeking strategy resorting to an adaptation technique. Third, we focus on integrator chain systems and synthesize a modified distributed GNE seeking strategy using a new composite variable based on a proper coordinate transformation. For all three extension cases, we show in detail the achievement of the GNE seeking objective. Finally, a practical example is simulated to confirm the built GNE seeking results.

PaperID: 1010,

Authors: Chengzhong Ma, Deyu Yang, Tianyu Wu, Zeyang Liu, Houxue Yang, Xingyu Chen, Xuguang Lan, Nanning Zheng

Affiliations: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China

Title: Improving Offline Reinforcement Learning With in-Sample Advantage Regularization for Robot Manipulation

Abstract:
Offline reinforcement learning (RL) aims to learn a policy from a fixed dataset without real-time interactions with the environment. By avoiding risky exploration by the robot, this approach is expected to significantly improve the robot’s learning efficiency and safety. However, due to errors in value estimation from out-of-distribution actions, most offline RL algorithms constrain or regularize the policy to the actions contained within the dataset. The cost of such methods is the introduction of new hyperparameters and additional complexity. In this article, we aim to adapt offline RL to robotic manipulation with minimal changes and to avoid evaluating out-of-distribution actions as much as possible. Therefore, we improve offline RL with in-sample advantage regularization (ISAR). To mitigate the impact of unseen actions, ISAR learns the state-value function only with the dataset samples to regress the optimal action-value function. Our method calculates the advantage function of state-action pairs based on in-sample value estimation and adds a behavior cloning (BC) regularization term in the policy update. This improves sample efficiency with minimal changes, resulting in a simple and easy-to-implement method. Experiments on the D4RL robot benchmark and multigoal sparse-reward robotic tasks show that ISAR achieves excellent performance comparable to current state-of-the-art algorithms without the need for complex parameter tuning or excessive training time. In addition, we demonstrate the effectiveness of our method on a real-world robot platform.

PaperID: 1011,

Authors: Qiyao Peng, Yinghui Wang, Pengfei Jiao, Huaming Wu, Lin Pan

Affiliations: School of New Media and Communication, Tianjin University, Tianjin, China; Key Laboratory of Information System and Technology, Beijing Institute of Control and Electronic Technology, Beijing, China; School of Cyberspace and the Data Security Governance Zhejiang Engineering Research Center, Hangzhou Dianzi University, Hangzhou, China; Center for Applied Mathematics, Tianjin University, Tianjin, China; School of Marine Science and Technology, Tianjin University, Tianjin, China

Title: Alleviate the Impact of Heterogeneity in Network Alignment From Community View

Abstract:
Network alignment is a fundamental problem in various domains since it can establish bridges for the same entity (i.e., anchor nodes) between different networks. Most existing network alignment methods are based on the consistency assumption, i.e., anchor nodes exhibit similar local structures or neighbors across different networks. However, many anchor nodes have different local structures or neighbors across different networks, which could be regarded as anchor nodes’ heterogeneity. This poses a challenge to methods based on the assumption of consistency, as they lack abundant shared information, such as common neighbors. Fortunately, network communities provide an understanding of node relationships and group structures within networks, which could alleviate the information insufficiency. In this article, we propose to address the challenge of inadequate shared information triggered by nodes’ heterogeneity from a community perspective. Our model is based on joint optimization of node representation learning and community discovery, including: 1) a node-level constraint is employed to bring nodes with more anchor pairs as neighbors closer together and 2) a community-level constraint is utilized to bring nodes with higher order similarity closer together. We model the cross-network community alignment relations as asymmetric to mitigate the interference caused by anchor node heterogeneity when measuring community alignment relations. Furthermore, we leverage the learned cross-network community alignment relations to supplement node alignment, which could narrow down the search range of potential anchor nodes by focusing solely on aligning nodes within aligned cross-network communities. We conducted extensive experiments on real-world datasets, and the results show the effectiveness and efficiency of our proposed model on network alignment.

PaperID: 1012,

Authors: Jiajun Qian, Liang Xu, Xiaoqiang Ren, Xiao Fan Wang

Affiliations: School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, China; School of Future Technology, Shanghai University, Shanghai, China

Title: Structured Deep Neural Network-Based Backstepping Trajectory Tracking Control for Lagrangian Systems

Abstract:
Deep neural networks (DNNs) are increasingly being used to learn controllers due to their excellent approximation capabilities. However, their black-box nature poses significant challenges to closed-loop stability guarantees and performance analysis. In this brief, we introduce a structured DNN-based controller for the trajectory tracking control of Lagrangian systems using backstepping techniques. By properly designing neural network structures, the proposed controller can ensure closed-loop stability for any compatible neural network parameters. In addition, improved control performance can be achieved by further optimizing neural network parameters. Besides, we provide explicit upper bounds on tracking errors in terms of controller parameters, which allows us to achieve the desired tracking performance by properly selecting the controller parameters. Furthermore, when system models are unknown, we propose an improved Lagrangian neural network (LNN) structure to learn the system dynamics and design the controller. We show that in the presence of model approximation errors and external disturbances, the closed-loop stability and tracking control performance can still be guaranteed. The effectiveness of the proposed approach is demonstrated through simulations.

PaperID: 1013,

Authors: Yuhang Wang, Kaiquan Cai, Deyuan Meng

Affiliations: School of Electronics and Information Engineering and the State Key Laboratory of CNS/ATM, Beihang University (BUAA), Beijing, China; School of Automation Science and Electrical Engineering, the State Key Laboratory of CNS/ATM, and the Seventh Research Division, Beihang University (BUAA), Beijing, China

Title: Probabilistic Approximation of Stochastic Time Series Using Bayesian Recurrent Neural Network

Abstract:
In this brief, we investigate the approximation theory (AT) of Bayesian recurrent neural network (BRNN) for stochastic time series forecasting (TSF) from a probabilistic standpoint. Due to the cumulative dependencies present in stochastic time series, which are incompatible with the recurrent structure of BRNN and further complicate the analysis of AT, we first perform marginalization and transform the time series into a probabilistically equivalent latent variable model (LVM). Subsequently, we analyze the AT by evaluating the approximation error between the output mean of BRNN and that of the LVM, which are derived through Taylor expansion-based uncertainty propagation and distribution parameterization, respectively. Finally, leveraging Khinchin’s law of large numbers, we study the convergence in probability of the sampling-based training algorithm, i.e., Bayes by Backprop (BBB), and prove that increasing the number of Monte Carlo samples in BBB leads to a convergence probability approaching one. Numerical simulations are conducted to demonstrate the validity of our results.

PaperID: 1014,

Authors: Dehong Gao, Duanxiao Song, Guangyuan Shen, Xiaoyan Cai, Libin Yang, Gongshen Liu, Xiaoyong Li, Zhen Wang

Affiliations: School of Cybersecurity, Northwestern Polytechnical University, Xi’an, China; School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; School of Automation, Northwestern Polytechnical University, Xi’an, China

Title: FedSTS: A Stratified Client Selection Framework for Consistently Fast Federated Learning

Abstract:
In this article, we investigate random client selection in the context of horizontal federated learning (FL), whereby only a randomly selected subset of clients transmit their model updates to the server instead of yielding all clients involved. Many researchers have demonstrated that clustering-based client selection constitutes a simple yet efficacious approach to the identification of those clients possessing representative gradient information. Despite the extensive body of research on modified selection methodologies, the majority of prior work is predicated upon the assumption of consistently effective clustering. However, raw gradient-based clustering methods are subject to several challenges: 1) poor effectiveness, the raw high-dimensional gradient of a client is too complex to serve as an appropriate feature for grouping, resulting in large intra-cluster distances and 2) fluctuating effectiveness, due to inherent limitations in clustering, the effectiveness can vary significantly, leading to clusters with diverse levels of heterogeneity. In practice, suboptimal and inconsistent clustering effects can result in clusters with low intra-cluster similarity among clients. The selection of clients from such clusters may impede the overall convergence of training. In this article, we propose FedSTS, a novel client selection scheme to accelerate the FL convergence by variance reduction. The main idea of FedSTS is to stratify a compressed model update in order to ensure an excellent grouping effect, and at the same time reduce the cross-client variance by re-allocating the sample chance among different groups based on their diverse heterogeneity. It strikes this convergence acceleration by paying more attention to those client groups with relatively low similarity and then improving the representativeness of the selected subset as much as possible. 
Theoretically, we demonstrate the critical improvement of the proposed scheme in variance reduction and present equivalence conditions among different client selection methods. We also present a tighter convergence guarantee for the proposed method thanks to the variance reduction. Experimental results confirm the superior efficiency of our approach compared to alternatives.
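
The idea of giving more sampling chance to more heterogeneous client groups can be illustrated with classical Neyman allocation from stratified sampling. This is only a sketch under stated assumptions: the abstract does not give FedSTS's actual allocation rule, and the function name `neyman_allocation`, the per-group standard deviations, and the largest-remainder rounding are all hypothetical choices made for illustration.

```python
def neyman_allocation(sizes, stds, budget):
    """Split a sampling budget across strata in proportion to size * std.

    sizes  -- number of clients in each stratum (hypothetical input)
    stds   -- a heterogeneity score per stratum, e.g. a standard deviation
    budget -- total number of clients to select in one round
    Note: this sketch does not cap an allocation at the stratum size.
    """
    weights = [n * s for n, s in zip(sizes, stds)]
    total = sum(weights)
    if total == 0:
        # All strata are perfectly homogeneous: fall back to proportional allocation.
        weights = list(sizes)
        total = sum(weights)
    raw = [budget * w / total for w in weights]
    alloc = [int(r) for r in raw]  # floor of each ideal share
    # Hand out the remaining slots to the largest fractional remainders.
    leftover = budget - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Three equally sized groups; the most heterogeneous one gets the most samples.
    print(neyman_allocation([10, 10, 10], [1.0, 2.0, 3.0], 6))
```

With equal group sizes, the allocation is driven purely by the heterogeneity scores, which mirrors the abstract's statement that groups with relatively low similarity receive more attention.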

PaperID: 1015,

Authors: Mengxuan Shao, Haiqi Zhu, Debin Zhao, Kun Han, Feng Jiang, Shaohui Liu, Wei Zhang

Affiliations: Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

Title: Progressively Learning to Reach Remote Goals by Continuously Updating Boundary Goals

Abstract:
Training an effective policy on complex goal-reaching tasks with sparse rewards is an open challenge. It is more difficult for the task of reaching remote goals (RRG), as the unavailability of the original rewards and large Wasserstein distance between the distributions of desired goals and initial states make existing methods for common goal-reaching tasks inefficient or even completely ineffective. In this article, we propose progressively learning to reach remote goals by continuously updating boundary goals (PLUB), which solves RRG tasks by reducing the Wasserstein distance between the distributions of boundary goals and desired goals. Specifically, the concept of boundary goal is introduced, which is the set of the closest achieved goals for each desired goal. In addition, to reduce the computational complexity caused by the Wasserstein distance, the closest moving distance is introduced, which is its upper bound, and also the expectation of the distance between the desired goal and the closest boundary goal. By selecting the appropriate intermediate goal from all boundary goals and continuously updating boundary goals, both the closest moving distance and the Wasserstein distance can be reduced. As a result, RRG tasks degenerate into common goal-reaching tasks that can be efficiently solved by a combination of hindsight relabeling and the learning from demonstrations (LfD) method. Extensive experiments on several robotic manipulation tasks demonstrate that PLUB can bring substantial improvements over the existing methods.
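
The closest moving distance described above, i.e., the expected distance between a desired goal and its closest boundary goal, can be sketched for finite goal sets as follows. This is a minimal illustration, not the authors' implementation: it assumes goals are points in Euclidean space, that the expectation is a uniform average over a sample of desired goals, and the function name `closest_moving_distance` is hypothetical.

```python
from math import dist


def closest_moving_distance(desired_goals, boundary_goals):
    """Average, over desired goals, of the distance to the nearest boundary goal.

    Both arguments are non-empty sequences of equal-length coordinate tuples.
    Euclidean distance and uniform weighting are assumptions of this sketch.
    """
    if not desired_goals or not boundary_goals:
        raise ValueError("both goal sets must be non-empty")
    nearest = [min(dist(g, b) for b in boundary_goals) for g in desired_goals]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    desired = [(0.0, 0.0), (3.0, 4.0)]
    # Adding a boundary goal closer to the remote desired goal lowers the distance.
    print(closest_moving_distance(desired, [(0.0, 0.0)]))
    print(closest_moving_distance(desired, [(0.0, 0.0), (3.0, 0.0)]))
```

The second call returns a smaller value than the first, which illustrates how updating the boundary goals toward the desired goals reduces the quantity.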

PaperID: 1016,

Authors: Peng Wang, Zhongchen He, Bo Huang, Mauro Dalla Mura, Henry Leung, Jocelyn Chanussot

Affiliations: Key Laboratory of Radar Imaging and Microwave Photonics, Ministry of Education, Nanjing University of Aeronautics and Astronautics, Nanjing, China; Department of Geography, The University of Hong Kong, Hong Kong, China; Grenoble Images Parole Signals Automatics Laboratory, Grenoble Institute of Technology, Saint Martin d’Hères, France; Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB, Canada; Université Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, France

Title: VOGTNet: Variational Optimization-Guided Two-Stage Network for Multispectral and Panchromatic Image Fusion

Abstract:
Multispectral image (MS) and panchromatic image (PAN) fusion, also known as multispectral pansharpening, aims to obtain MS with high spatial resolution and high spectral resolution. However, because the noise and blur generated in the imaging and transmission phases of data are usually neglected during training, many deep learning (DL) pansharpening methods fail to perform well on datasets containing noise and blur. To tackle this problem, a variational optimization-guided two-stage network (VOGTNet) for multispectral pansharpening is proposed in this work. The performance of variational optimization (VO)-based pansharpening methods relies on prior information and on estimates of the spatial-spectral degradation from the target image to the other two original images. Concretely, we propose a dual-branch fusion network (DBFN) based on supervised learning and train it on datasets containing noise and blur to generate, in the initial stage, a prior fusion result that serves as prior information capable of removing noise and blur. Subsequently, we exploit the estimated spectral response function (SRF) and point spread function (PSF) to simulate the spectral and spatial degradation processes, respectively, so that the prior fusion result and the adaptive recovery model (ARM) jointly perform unsupervised learning on the original dataset to restore more image details and generate the high-resolution MSs in the second stage. Experimental results indicate that the proposed VOGTNet improves pansharpening performance and shows strong robustness against noise and blur. Furthermore, the proposed VOGTNet can be extended to be a general pansharpening framework, which can improve the ability of other supervised learning-based pansharpening methods to resist noise and blur. The source code is available at https://github.com/HZC-1998/VOGTNet.

PaperID: 1017,

Authors: Dan-Dan Li, Hong-Li Li, Cheng Hu, Haijun Jiang, Jinde Cao

Affiliations: College of Mathematics and System Sciences, Xinjiang University, Ürümqi, China; School of Mathematics, Southeast University, Nanjing, China

Title: Projective Synchronization of Discrete-Time Variable-Order Fractional Neural Networks With Time-Varying Delays

Abstract:
This article is committed to studying projective synchronization and complete synchronization (CS) issues for one kind of discrete-time variable-order fractional neural networks (DVFNNs) with time-varying delays. First, two new variable-order fractional (VF) inequalities are built by relying on nabla Laplace transform and some properties of Mittag-Leffler function, which are extensions of constant-order fractional (CF) inequalities. Moreover, the VF Halanay inequality in discrete-time sense is strictly proved. Subsequently, some sufficient projective synchronization and CS criteria are derived by virtue of VF inequalities and hybrid controllers. Finally, we exploit numerical simulation examples to verify the validity of the derived results, and a practical application of the obtained results in image encryption is also discussed.

PaperID: 1018,

Authors: Rui Zhang, Luziwei Leng, Kaiwei Che, Hu Zhang, Jie Cheng, Qinghai Guo, Jianxing Liao, Ran Cheng

Affiliations: Department of Computer Science and Engineering, Shenzhen Key Laboratory of Computational Intelligence, Southern University of Science and Technology, Shenzhen, China; ACSLab, Huawei Technologies Company Ltd., Shenzhen, China; Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China

Title: Accurate and Efficient Event-Based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network

Abstract:
Spiking neural networks (SNNs), known for their low-power, event-driven computation, and intrinsic temporal dynamics, are emerging as promising solutions for processing dynamic, asynchronous signals from event-based sensors. Despite their potential, SNNs face challenges in training and architectural design, resulting in limited performance in challenging event-based dense prediction tasks compared with artificial neural networks (ANNs). In this work, we develop an efficient spiking encoder-decoder network (SpikingEDN) for large-scale event-based semantic segmentation (EbSS) tasks. To enhance the learning efficiency from dynamic event streams, we harness the adaptive threshold which improves network accuracy, sparsity, and robustness in streaming inference. Moreover, we develop a dual-path spiking spatially adaptive modulation (SSAM) module, which is specifically tailored to enhance the representation of sparse events and multimodal inputs, thereby considerably improving network performance. Our SpikingEDN attains a mean intersection over union (MIoU) of 72.57% on the DDD17 dataset and 58.32% on the larger DSEC-Semantic dataset, showing competitive results to the state-of-the-art ANNs while requiring substantially fewer computational resources. Our results shed light on the untapped potential of SNNs in event-based vision applications. The source codes are publicly available at https://github.com/EMI-Group/spikingedn.
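
For readers unfamiliar with the metric quoted above, mean intersection over union (MIoU) can be computed from flattened label arrays as sketched below. This is a generic illustration of the standard metric, not the evaluation code of SpikingEDN; the function name `mean_iou` and the convention of skipping classes that appear in neither the ground truth nor the prediction are assumptions of this sketch.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection over union for integer class labels in [0, num_classes).

    y_true and y_pred are equal-length sequences of per-pixel labels.
    Classes absent from both sequences are skipped (an assumed convention).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            inter[t] += 1
            union[t] += 1
        else:
            # The pixel belongs to the ground-truth set of t and the predicted set of p.
            union[t] += 1
            union[p] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no class is present in the inputs")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```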

PaperID: 1019,

Authors: Guojun Liang, Prayag Tiwari, Slawomir Nowaczyk, Stefan Byttner, Fernando Alonso-Fernandez

Affiliations: CAISR, School of Information Technology, Halmstad University, Halmstad, Sweden

Title: Dynamic Causal Explanation Based Diffusion-Variational Graph Neural Network for Spatiotemporal Forecasting

Abstract:
Graph neural networks (GNNs), especially dynamic GNNs, have become a research hotspot in spatiotemporal forecasting problems. While many dynamic graph construction methods have been developed, relatively few of them explore the causal relationship between neighbor nodes. Thus, the resulting models lack strong explainability for the causal relationship between the neighbor nodes of the dynamically generated graphs, which can easily lead to a risk in subsequent decisions. Moreover, few of them consider the uncertainty and noise of dynamic graphs based on the time series datasets, which are ubiquitous in real-world graph structure networks. In this article, we propose a novel dynamic diffusion-variational GNN (DVGNN) for spatiotemporal forecasting. For dynamic graph construction, an unsupervised generative model is devised. Two layers of graph convolutional network (GCN) are applied to calculate the posterior distribution of the latent node embeddings in the encoder stage. Then, a diffusion model is used to infer the dynamic link probability and reconstruct causal graphs (CGs) in the decoder stage adaptively. The new loss function is derived theoretically, and the reparameterization trick is adopted in estimating the probability distribution of the dynamic graphs by evidence lower bound (ELBO) during the backpropagation period. After obtaining the generated graphs, dynamic GCN and temporal attention are applied to predict future states. Experiments are conducted on four real-world datasets of different graph structures in different domains. The results demonstrate that the proposed DVGNN model outperforms state-of-the-art approaches and achieves outstanding root mean square error (RMSE) results while exhibiting higher robustness. Also, by F1-score and probability distribution analysis, we demonstrate that DVGNN better reflects the causal relationship and uncertainty of dynamic graphs. The website of the code is https://github.com/gorgen2020/DVGNN.

PaperID: 1020,

Authors: Dongyuan Li, Zhen Wang, Yankai Chen, Renhe Jiang, Weiping Ding, Manabu Okumura

Affiliations: School of Information and Communication Engineering, Institute of Innovative Research, Tokyo Institute of Technology, Tokyo, Japan; School of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong; Center for Spatial Information Science, The University of Tokyo, Tokyo, Japan; School of Information Science and Technology, Nantong University, Nantong, China

Title: A Survey on Deep Active Learning: Recent Advances and New Frontiers

Abstract:
Active learning seeks to achieve strong performance with fewer training samples. It does this by iteratively asking an oracle to label newly selected samples in a human-in-the-loop manner. This technique has gained increasing popularity due to its broad applicability, yet its survey papers, especially for deep active learning (DAL), remain scarce. Therefore, we conduct an advanced and comprehensive survey on DAL. We first introduce reviewed paper collection and filtering. Second, we formally define the DAL task and summarize the most influential baselines and widely used datasets. Third, we systematically provide a taxonomy of DAL methods from five perspectives, including annotation types, query strategies, deep model architectures, learning paradigms, and training processes, and objectively analyze their strengths and weaknesses. Then, we comprehensively summarize the main applications of DAL in natural language processing (NLP), computer vision (CV), data mining (DM), and so on. Finally, we discuss challenges and perspectives after a detailed analysis of current studies. This work aims to serve as a useful and quick guide for researchers in overcoming difficulties in DAL. We hope that this survey will spur further progress in this burgeoning field.

PaperID: 1021,

Authors: Junwei Sun, Wanting Xu, Peng Liu, Yanfeng Wang

Affiliations: School of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China

Title: Design and Implementation of Pavlovian Associative Memory Based on DNA Neurons

Abstract:
In the field of biocomputing and neural networks, deoxyribonucleic acid (DNA) strand displacement (DSD) technology performs well in computation, programming, and information processing. In this article, the multiplication gate, addition gate, and threshold gate based on DSD are cascaded into a single DNA neuron. Multiple DNA neurons can be cascaded to form different neural networks. The DNA neural networks are designed to implement seven classical conditioned reflexes from Pavlovian associative memory experiments. A classical conditioned reflex is formed by pairing a conditioned stimulus (CS) with an unconditioned stimulus that carries a reward or punishment, so that the individual develops a conditioned reflex, similar to an unconditioned reflex, in response to the CS alone. The seven classical conditioned reflexes include acquisition and forgetting, interstimulus interval effect, blocking, conditioned inhibition, overshadowing, generation, and differentiation. The simulations are verified using the Visual DSD software. This article provides a direction for the integration of biology and psychology.

PaperID: 1022,

Authors: Xueru Bai, Minjia Yang, Bowen Chen, Feng Zhou

Affiliations: National Key Laboratory of Radar Signal Processing, Xidian University, Xi’an, China; Key Laboratory of Electronic Information Countermeasure and Simulation Technology of the Ministry of Education, Xidian University, Xi’an, China

Title: REMI: Few-Shot ISAR Target Classification Via Robust Embedding and Manifold Inference

Abstract:
Unknown image deformation and few-shot issues have posed significant challenges to inverse synthetic aperture radar (ISAR) target classification. To achieve robust feature representation and precise correlation modeling, this article proposes a novel two-stage few-shot ISAR classification network, dubbed as robust embedding and manifold inference (REMI). In the robust embedding stage, a multihead spatial transformation network (MH-STN) is designed to adjust unknown image deformations from multiple perspectives. Then, the grouped embedding network (GEN) integrates and compresses diverse information by grouped feature extraction, intermediate feature fusion, and global feature embedding. In the manifold inference stage, a masked Gaussian graph attention network (MG-GAT) is devised to capture the irregular manifold of samples in the embedding space. In particular, the node features are described by Gaussian distributions, with interactions guided by the masked attention mechanism. Experimental results on two ISAR datasets demonstrate that REMI significantly improves the performance of few-shot classification and exhibits robustness in various scenarios.

PaperID: 1023,

Authors: Sina Aeeneh, Nikola Zlatanov, Jiangshan Yu

Affiliations: Department of Electrical and Computer Systems Engineering, Monash University, Clayton, VIC, Australia; Department of Computer Science and Engineering, Innopolis University, Innopolis, Russia; School of Computer Science, The University of Sydney, Camperdown, NSW, Australia

Title: New Bounds on the Accuracy of Majority Voting for Multiclass Classification

Abstract:
Majority voting is a simple mathematical function that returns the most frequently occurring value within a given set. As a popular decision fusion technique (DFT), the majority voting function (MVF) finds applications in resolving conflicts, where several independent voters report their opinions on a classification problem. Despite its importance and its various applications in ensemble learning, data crowdsourcing, remote sensing, and data oracles for blockchains, the accuracy of the MVF for the general multiclass classification problem has remained unknown. In this article, we derive a new upper bound on the accuracy of the MVF for the multiclass classification problem. More specifically, we show that under certain conditions, the error rate of the MVF exponentially decays toward 0 as the number of independent voters increases. Conversely, the error rate of the MVF exponentially grows with the number of voters if these conditions are not met. We first explore the problem for voters with independent and identically distributed (i.i.d.) outputs, where we assume that, given the true classification of the data point, every voter follows the same conditional probability distribution of voting for different classes. Next, we extend our results to encompass independent but nonidentically distributed votes. Using the derived results, we then provide a discussion on the accuracy of the truth discovery algorithms. We show that in the best-case scenarios, truth discovery algorithms operate as an amplified MVF and thereby achieve a small error rate only when the MVF achieves a small error rate too, and vice versa, achieve a large error rate when the MVF also achieves a large error rate. However, in the worst case scenario, the truth discovery algorithms may exhibit a significantly higher error rate than the MVF. Finally, we confirm our theoretical results using numerical simulations.
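
The two regimes described above (error decaying or growing with the number of voters) can be seen in a simplified special case. The sketch below is an illustration only and does not reproduce the bounds derived in the article: it assumes binary classification, an odd number of i.i.d. voters, and ties in `majority_vote` broken in favor of the value seen first; both function names are hypothetical.

```python
from collections import Counter
from math import comb


def majority_vote(votes):
    """Return the most frequent value; ties go to the value encountered first."""
    return Counter(votes).most_common(1)[0][0]


def binary_mvf_error(p_correct, n_voters):
    """Exact majority-vote error rate for an odd number of i.i.d. binary voters.

    Each voter is correct independently with probability p_correct.
    The majority is wrong when at most (n_voters - 1) / 2 voters are correct.
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(n_voters // 2 + 1)
    )


if __name__ == "__main__":
    print(majority_vote(["cat", "dog", "cat", "bird"]))
    for n in (1, 3, 11, 51):
        # Voters better than chance: error shrinks; worse than chance: error grows.
        print(n, binary_mvf_error(0.6, n), binary_mvf_error(0.4, n))
```

In this binary setting the condition separating the two regimes is simply whether each voter is correct with probability above or below one half; the multiclass conditions studied in the article are more general.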

PaperID: 1024,

Authors: Wenqian Wang, Faliang Chang, Chunsheng Liu, Bin Wang, Zehao Liu

Affiliations: School of Control Science and Engineering, Shandong University, Jinan, China

Title: TODO-Net: Temporally Observed Domain Contrastive Network for 3-D Early Action Prediction

Abstract:
Early action prediction aiming to recognize which classes the actions belong to before they are fully conveyed is a very challenging task, owing to the insufficient discrimination information caused by the domain gaps among different temporally observed domains. Most of the existing approaches focus on using fully observed temporal domains to “guide” the partially observed domains while ignoring the discrepancies between the harder low-observed temporal domains and the easier highly observed temporal domains. The recognition models tend to learn the easier samples from the highly observed temporal domains and may lead to significant performance drops on low-observed temporal domains. Therefore, in this article, we propose a novel temporally observed domain contrastive network, namely, TODO-Net, to explicitly mine the discrimination information from the hard actions samples from the low-observed temporal domains by mitigating the domain gaps among various temporally observed domains for 3-D early action prediction. More specifically, the proposed TODO-Net is able to mine the relationship between the low-observed sequences and all the highly observed sequences belonging to the same action category to boost the recognition performance of the hard samples with fewer observed frames. We also introduce a temporal domain conditioned supervised contrastive (TD-conditioned SupCon) learning scheme to empower our TODO-Net with the ability to minimize the gaps between the temporal domains within the same action categories, meanwhile pushing apart the temporal domains belonging to different action classes. We conduct extensive experiments on two public 3-D skeleton-based activity datasets, and the results show the efficacy of the proposed TODO-Net.

PaperID: 1025,

Authors: Jiawei Wang, Teng Wang, Lele Xu, Zichen He, Changyin Sun

Affiliations: College of Electronics and Information Engineering, Tongji University, Shanghai, China; School of Automation, Southeast University, Nanjing, China

Title: Discovering Intrinsic Subgoals for Vision-and-Language Navigation via Hierarchical Reinforcement Learning

Abstract:
Vision-and-language navigation requires an agent to navigate in a photo-realistic environment by following natural language instructions. Mainstream methods employ imitation learning (IL) to let the agent imitate the behavior of the teacher. The trained model will overfit the teacher’s biased behavior, resulting in poor model generalization. Recently, researchers have sought to combine IL and reinforcement learning (RL) to overcome overfitting and enhance model generalization. However, these methods still face the problem of expensive trajectory annotation. We propose a hierarchical RL-based method—discovering intrinsic subgoals via hierarchical (DISH) RL—which overcomes the generalization limitations of current methods and gets rid of expensive label annotations. First, the high-level agent (manager) decomposes the complex navigation problem into simple intrinsic subgoals. Then, the low-level agent (worker) uses an intrinsic subgoal-driven attention mechanism for action prediction in a smaller state space. We place no constraints on the semantics that subgoals may convey, allowing the agent to autonomously learn intrinsic, more generalizable subgoals from navigation tasks. Furthermore, we design a novel history-aware discriminator (HAD) for the worker. The discriminator incorporates historical information into subgoal discrimination and provides the worker with additional intrinsic rewards to alleviate the reward sparsity. Without labeled actions, our method provides supervision for the worker in the form of self-supervision by generating subgoals from the manager. The final results of multiple comparison experiments on the Room-to-Room (R2R) dataset show that our DISH can significantly outperform the baseline in accuracy and efficiency.

PaperID: 1026,

Authors: Bin Chen, Zehong Cao, Quan Bai

Affiliations: STEM, University of South Australia, Adelaide, SA, Australia; School of ICT, University of Tasmania, Hobart, TAS, Australia

Title: SATF: A Scalable Attentive Transfer Framework for Efficient Multiagent Reinforcement Learning

Abstract:
It is challenging to train an efficient learning procedure with multiagent reinforcement learning (MARL) when the number of agents increases as the observation space exponentially expands, especially in large-scale multiagent systems. In this article, we proposed a scalable attentive transfer framework (SATF) for efficient MARL, which achieved goals faster and more accurately in homogeneous and heterogeneous combat tasks by transferring learned knowledge from a small number of agents (4) to a large number of agents (up to 64). To reduce and align the dimensionality of the observed state variations caused by increasing numbers of agents, the proposed SATF deployed a novel state representation network with a self-attention mechanism, known as dynamic observation representation network (DorNet), to extract the dominant observed information with excellent cost-effectiveness. The experiments on the MAgent platform showed that the SATF outperformed the distributed MARL (independent Q-learning (IQL) and A2C) in task sequences from 8 to 64 agents. The experiments on StarCraft II showed that the SATF demonstrated superior performance relative to the centralized training with decentralized execution MARL (QMIX) by presenting shorter training steps, achieving a desired win rate of up to approximately 90% when increasing the number of agents from 4 to 32. The findings of our study showed the great potential for enhancing the efficiency of MARL training in large-scale agent combat missions.

PaperID: 1027,

Authors: Zehong Wang, Donghua Yu, Shigen Shen, Shichao Zhang, Huawen Liu, Shuang Yao, Maozu Guo

Affiliations: Department of Computer Science and Engineering, Shaoxing University, Shaoxing, China; Department of Computer Science and Engineering and the Institute of Artificial Intelligence, Shaoxing University, Shaoxing, China; School of Information Engineering, Huzhou University, Huzhou, China; School of Computer Science and Engineering, Central South University, Changsha, China; College of Economics and Management, China Jiliang University, Hangzhou, China; Department of Computer Science, Beijing University of Civil Engineering and Architecture, Beijing, China

Title: Select Your Own Counterparts: Self-Supervised Graph Contrastive Learning With Positive Sampling

Abstract:
Contrastive learning (CL) has emerged as a powerful approach for self-supervised learning. However, it suffers from sampling bias, which hinders its performance. While the mainstream solutions, hard negative mining (HNM) and supervised CL (SCL), have been proposed to mitigate this critical issue, they do not effectively address graph CL (GCL). To address it, we propose graph positive sampling (GPS) and three contrastive objectives. The former is a novel learning paradigm designed to leverage the inherent properties of graphs for improved GCL models, which utilizes four complementary similarity measurements, including node centrality, topological distance, neighborhood overlapping, and semantic distance, to select positive counterparts for each node. Notably, GPS operates without relying on true labels and enables preprocessing applications. The latter aims to fuse positive samples and enhance representative selection in the semantic space. We release three node-level models with GPS and conduct extensive experiments on public datasets. The results demonstrate the superiority of GPS over state-of-the-art (SOTA) baselines and debiasing methods. In addition, the GPS has also been proven to be versatile, adaptive, and flexible.

PaperID: 1028,

Authors: Xia Dong, Feiping Nie, Danyang Wu, Rong Wang, Xuelong Li

Affiliations: School of Computer Science and Engineering, Nanyang Technological University, Jurong West, Singapore; School of Artificial Intelligence, Optics and Electronics (iOPEN), School of Computer Science, Northwestern Polytechnical University, Xi’an, P. R. China; College of Information Engineering and the Shaanxi Engineering Research Center for Intelligent Perception and Analysis of Agricultural Information, Northwest A&F University, Yangling, P. R. China; Key Laboratory of Intelligent Interaction and Applications (Northwestern Polytechnical University), Ministry of Industry and Information Technology, Xi’an, P. R. China

Title: Joint Structured Bipartite Graph and Row-Sparse Projection for Large-Scale Feature Selection

Abstract:
Feature selection plays an important role in data analysis, yet traditional graph-based methods often produce suboptimal results. These methods typically follow a two-stage process: constructing a graph with data-to-data affinities or a bipartite graph with data-to-anchor affinities and independently selecting features based on their scores. In this article, a large-scale feature selection approach based on structured bipartite graph and row-sparse projection (RS2BLFS) is proposed to overcome this limitation. RS2BLFS integrates the construction of a structured bipartite graph consisting of c connected components into row-sparse projection learning with k nonzero rows. This integration allows for the joint selection of an optimal feature subset in an unsupervised manner. Notably, the c connected components of the structured bipartite graph correspond to c clusters, each with multiple subcluster centers. This feature makes RS2BLFS particularly effective for feature selection and clustering on nonspherical large-scale data. An algorithm with theoretical analysis is developed to solve the optimization problem involved in RS2BLFS. Experimental results on synthetic and real-world datasets confirm its effectiveness in feature selection tasks.

PaperID: 1029,

Authors: Yuli Sun, Lin Lei, Dongdong Guan, Gangyao Kuang, Zhang Li, Li Liu

Affiliations: College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, China; College of Electronic Science, National University of Defense Technology, Changsha, China; High-Tech Institute of Xi’an, Xi’an, China

Title: Locality Preservation for Unsupervised Multimodal Change Detection in Remote Sensing Imagery

Abstract:
Multimodal change detection (MCD) is a topic of increasing interest in remote sensing. Due to different imaging mechanisms, the multimodal images cannot be directly compared to detect the changes. In this article, we explore the topological structure of multimodal images and construct the links between class relationships (same/different) and change labels (changed/unchanged) of pairwise superpixels, which are imaging modality-invariant. With these links, we formulate the MCD problem within a mathematical framework termed the locality-preserving energy model (LPEM), which is used to maintain the local consistency constraints embedded in the links: the structure consistency based on feature similarity and the label consistency based on spatial continuity. Because the foundation of LPEM, i.e., the links, is intuitively explainable and universal, the proposed method is very robust across different MCD situations. Notably, LPEM is built directly on the label of each superpixel, so it is a paradigm that outputs the change map (CM) directly without the need to generate an intermediate difference image (DI), as most previous algorithms have done. Experiments on different real datasets demonstrate the effectiveness of the proposed method. Source code of the proposed method is made available at https://github.com/yulisun/LPEM.

PaperID: 1030,

Authors: Lei Liu, Mengya Zhang, Cheng Li, Chenglong Li, Jin Tang

Affiliations: School of Computer Science and Technology, Anhui University, Hefei, China

Title: Cross-Modal Object Tracking via Modality-Aware Fusion Network and a Large-Scale Dataset

Abstract:
Visual object tracking often faces challenges such as invalid targets and decreased performance in low-light conditions when relying solely on RGB image sequences. While incorporating additional modalities like depth and infrared data has proven effective, existing multimodal imaging platforms are complex and lack real-world applicability. In contrast, near-infrared (NIR) imaging, commonly used in surveillance cameras, can switch between RGB and NIR based on light intensity. However, tracking objects across these heterogeneous modalities poses significant challenges, particularly due to the absence of modality switch signals during tracking. To address these challenges, we propose an adaptive cross-modal object tracking algorithm called modality-aware fusion network (MAFNet). MAFNet efficiently integrates information from both RGB and NIR modalities using an adaptive weighting mechanism, effectively bridging the appearance gap and enabling a modality-aware target representation. It consists of two key components: an adaptive weighting module and a modality-specific representation module. The adaptive weighting module predicts fusion weights to dynamically adjust the contribution of each modality, while the modality-specific representation module captures discriminative features specific to RGB and NIR modalities. MAFNet offers great flexibility as it can effortlessly integrate into diverse tracking frameworks. With its simplicity, effectiveness, and efficiency, MAFNet outperforms state-of-the-art methods in cross-modal object tracking. To validate the effectiveness of our algorithm and overcome the scarcity of data in this field, we introduce CMOTB, a comprehensive and extensive benchmark dataset for cross-modal object tracking. CMOTB consists of 61 categories and 1000 video sequences, comprising a total of over 799K frames. We believe that our proposed method and dataset offer a strong foundation for advancing cross-modal object-tracking research. 
The dataset, toolkit, experimental data, and source code will be publicly available at: https://github.com/mmic-lcl/Datasets-and-benchmark-code.

PaperID: 1031,

Authors: Vincenzo Dentamaro, Paolo Giglio, Donato Impedovo, Giuseppe Pirlo, Marco Di Ciano

Affiliations: Department of Computer Science, University of Bari Aldo Moro, Bari, Italy

Title: An Interpretable Adaptive Multiscale Attention Deep Neural Network for Tabular Data

Abstract:
Deep learning (DL) has been demonstrated to be a valuable tool for analyzing signals such as sounds and images, thanks to its capabilities of automatically extracting relevant patterns as well as its end-to-end training properties. When applied to tabular structured data, DL has exhibited some performance limitations compared to shallow learning techniques. This work presents a novel technique for tabular data called adaptive multiscale attention deep neural network architecture (also named excited attention). By exploiting parallel multilevel feature weighting, the adaptive multiscale attention can successfully learn the feature attention and thus achieve high levels of F1-score on seven different classification tasks (on small, medium, large, and very large datasets) and low mean absolute errors on four regression tasks of different sizes. In addition, adaptive multiscale attention provides four levels of explainability (i.e., comprehension of its learning process and therefore of its outcomes): 1) it calculates attention weights to determine which layers are most important for given classes; 2) it shows each feature’s attention across all instances; 3) it exposes the learned feature attention for each class, allowing feature attention and behavior to be explored for specific classes; and 4) it finds nonlinear correlations between co-behaving features to reduce dataset dimensionality and improve interpretability. These interpretability levels, in turn, allow for employing adaptive multiscale attention as a useful tool for feature ranking and feature selection.

PaperID: 1032,

Authors: Zicheng Pan, Xiaohan Yu, Miaohua Zhang, Weichuan Zhang, Yongsheng Gao

Affiliations: Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia

Title: DyCR: A Dynamic Clustering and Recovering Network for Few-Shot Class-Incremental Learning

Abstract:
Few-shot class-incremental learning (FSCIL) aims to continually learn novel data with limited samples. One of the major challenges is the catastrophic forgetting problem of old knowledge while training the model on new data. To alleviate this problem, recent state-of-the-art methods adopt a well-trained static network with fixed parameters at incremental learning stages to maintain old knowledge. These methods suffer from the poor adaptation of the old model with new knowledge. In this work, a dynamic clustering and recovering network (DyCR) is proposed to tackle the adaptation problem and effectively mitigate the forgetting phenomena on FSCIL tasks. Unlike static FSCIL methods, the proposed DyCR network is dynamic and trainable during the incremental learning stages, which makes the network capable of learning new features and better adapting to novel data. To address the forgetting problem and improve the model performance, a novel orthogonal decomposition mechanism is developed to split the feature embeddings into context and category information. The context part is preserved and utilized to recover old class features in future incremental learning stages, which can mitigate the forgetting problem with a much smaller size of data than saving the raw exemplars. The category part is used to optimize the feature embedding space by moving different classes of samples far apart and squeezing the sample distances within the same classes during the training stage. Experiments show that the DyCR network outperforms existing methods on four benchmark datasets. The code is available at: https://github.com/zichengpan/DyCR.

PaperID: 1033,

Authors: Qihui Han, Cheolkon Jung

Affiliations: School of Electronic Engineering, Xidian University, Xi’an, China

Title: Deep Selective Fusion of Visible and Near-Infrared Images Using Unsupervised U-Net

Abstract:
In low-light conditions, visible (VIS) images are of a low dynamic range (low contrast) with severe noise and color, while near-infrared (NIR) images contain clear textures without noise and color. Multispectral fusion of VIS and NIR images produces color images of high quality, rich textures, and little noise by taking advantage of both VIS and NIR images. In this article, we propose the deep selective fusion of VIS and NIR images using unsupervised U-Net. Existing image fusion methods are afflicted with the low contrast in VIS images and flash-like effect in NIR images. Thus, we adopt unsupervised U-Net to achieve deep selective fusion of multiple scale features. Due to the absence of the ground truth, we use unsupervised learning by formulating an energy function as a loss function. To deal with insufficient training data, we perform data augmentation by rotating images and adjusting their intensity. We synthesize training data by degrading clean VIS images and masking clean NIR images using a circle. First, we utilize a pretrained visual geometry group (VGG) network to extract features from VIS images. Second, we build an encoding network to obtain edge information from NIR images. Finally, we combine all features and feed them into a decoding network for fusion. Experimental results demonstrate that the proposed fusion network produces visually pleasing results with fine details, little noise, and natural color, and that it is superior to state-of-the-art methods in terms of visual quality and quantitative measurements.

PaperID: 1034,

Authors: Jifeng Guo, C. L. Philip Chen, Zhulin Liu, Xixin Yang

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; College of Computer Science and Technology and the School of Automation, Qingdao University, Qingdao, China

Title: Dynamic Neural Network Structure: A Review for its Theories and Applications

Abstract:
The dynamic neural network (DNN), in contrast to the static counterpart, offers numerous advantages, such as improved accuracy, efficiency, and interpretability. These benefits stem from the network’s flexible structures and parameters, making it highly attractive and applicable across various domains. As the broad learning system (BLS) continues to evolve, DNNs have expanded beyond deep learning (DL), orienting a more comprehensive range of domains. Therefore, this comprehensive review article focuses on two prominent areas where DNN structures have rapidly developed: 1) DL and 2) broad learning. This article provides an in-depth exploration of the techniques related to dynamic construction and inference. Furthermore, it discusses the applications of DNNs in diverse domains while also addressing open issues and highlighting promising research directions. By offering a comprehensive understanding of DNNs, this article serves as a valuable resource for researchers, guiding them toward future investigations.

PaperID: 1035,

Authors: Bahareh Nikpour, Dimitrios Sinodinos, Narges Armanfard

Affiliations: Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada

Title: Deep Reinforcement Learning in Human Activity Recognition: A Survey and Outlook

Abstract:
Human activity recognition (HAR) is a popular research field in computer vision that has already been widely studied. However, it is still an active research field since it plays an important role in many current and emerging real-world intelligent systems, like visual surveillance and human–computer interaction. Deep reinforcement learning (DRL) has recently been used to address the activity recognition problem with various purposes, such as finding attention in video data or obtaining the best network structure. DRL-based HAR has only been around for a short time, and it is a challenging, novel field of study. Therefore, to facilitate further research in this area, we have constructed a comprehensive survey on activity recognition methods that incorporate DRL. Throughout the article, we classify these methods according to their shared objectives and delve into how they are framed within the DRL framework. We conclude the survey by shedding light on the prominent challenges and lingering questions that await the attention of future researchers, paving the way for further advancements and breakthroughs in this exciting domain.

PaperID: 1036,

Authors: Diego Isla-Cernadas, Manuel Fernández Delgado, Eva Cernadas, Manisha Sanjay Sirsat, Haitham Maarouf, Senén Barro

Affiliations: Centro Singular de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS), University of Santiago de Compostela, Santiago de Compostela, Spain; Department of Data Management and Risk Analysis, InnovPlantProtect, Elvás, Portugal

Title: Closed-Form Gaussian Spread Estimation for Small and Large Support Vector Classification

Abstract:
The support vector machine (SVM) with Gaussian kernel often achieves state-of-the-art performance in classification problems, but requires the tuning of the kernel spread. Most optimization methods for spread tuning require training, which makes them slow and unsuited to large-scale datasets. We formulate an analytic expression to calculate, directly from data without iterative search, the spread minimizing the difference between Gaussian and ideal kernel matrices. The proposed direct gamma tuning (DGT) matches the performance of the state-of-the-art approaches on 30 small datasets while being one to two orders of magnitude faster. Combined with random sampling of training patterns, it also runs on large classification problems. In experiments with 20 large datasets of up to 31 million patterns, our method is very efficient: it is faster and performs significantly better than linear SVM, and it is also faster than iterative minimization. Code is available upon paper acceptance from this link: https://persoal.citius.usc.es/manuel.fernandez.delgado/papers/dgt/index.html and from CodeOcean: https://codeocean.com/capsule/4271163/tree/v1.
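
To make the objective mentioned in the abstract above more concrete, the sketch below measures the mismatch between a Gaussian kernel matrix and an ideal kernel matrix on a toy dataset. It is not the closed-form DGT expression, which the abstract does not give: the squared-difference measure, the ideal-kernel definition (1 for same-class pairs, 0 otherwise), the toy data, and the candidate grid are all assumptions, and the grid search is exactly the kind of iterative tuning that DGT is said to avoid.

```python
import math


def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_mismatch(X, labels, gamma):
    """Sum of squared differences between Gaussian and ideal kernel entries.

    Assumed ideal kernel: 1 if two patterns share a label, 0 otherwise.
    """
    total = 0.0
    for xi, yi in zip(X, labels):
        for xj, yj in zip(X, labels):
            ideal = 1.0 if yi == yj else 0.0
            total += (gaussian_kernel(xi, xj, gamma) - ideal) ** 2
    return total


# Toy one-dimensional data: two tight, well-separated classes.
X = [[0.0], [0.1], [1.0], [1.1]]
labels = [0, 0, 1, 1]
candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
best_gamma = min(candidates, key=lambda g: kernel_mismatch(X, labels, g))
```

On this toy data, very small spreads make cross-class entries too large and very large spreads make same-class entries too small, so the mismatch is minimized at an intermediate value.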

PaperID: 1037,

Authors: Yong-Sheng Ma, Wei-Wei Che

Affiliations: College of Information Science and Engineering, Northeastern University, Shenyang, China

Title: A Hierarchical Distributed Data-Driven Adaptive Learning Control for Nonaffine Nonlinear MASs

Abstract:
This article designs a new hierarchical distributed data-driven adaptive learning control algorithm to accomplish the leader-following tracking control objective for nonaffine nonlinear multiagent systems (MASs). The proposed hierarchical control structure is composed of a distributed observer and a decentralized data-driven adaptive learning controller. Considering that some followers cannot directly receive information from the leader, a distributed observer is designed to estimate the information of the leader. Based on this, a decentralized data-driven adaptive learning controller is further devised to enable the follower to track the estimated information of the leader, where the model parameter learning algorithm is developed to capture the dynamic characteristics of the original system. One advantage of the developed hierarchical control learning algorithm is that neither the leader’s system model nor the follower’s system model is needed. The other one is the elimination of the noncausal problem without the additional assumption. Simulation results exemplify the merits of the theoretical results by comparisons.

PaperID: 1038,

Authors: Tianyu Zhang, Chengbin Hou, Rui Jiang, Xuegong Zhang, Chenghu Zhou, Ke Tang, Hairong Lv

Affiliations: Department of Automation, Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China; School of Computer Science and Engineering, Fuyao Institute of Technology, Fuzhou, China; State Key Laboratory of Resources and Environmental Information Systems, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China

Title: Label Informed Contrastive Pretraining for Node Importance Estimation on Knowledge Graphs

Abstract:
Node importance estimation (NIE) is the task of inferring the importance scores of the nodes in a graph. Due to the availability of richer data and knowledge, recent research interests of NIE have been dedicated to knowledge graphs (KGs) for predicting future or missing node importance scores. Existing state-of-the-art NIE methods train the model by available labels, and they consider every interested node equally before training. However, the nodes with higher importance often require or receive more attention in real-world scenarios, e.g., people may care more about the movies or webpages with higher importance. To this end, we introduce Label Informed ContrAstive Pretraining (LICAP) to the NIE problem for being better aware of the nodes with high importance scores. Specifically, LICAP is a novel type of contrastive learning (CL) framework that aims to fully utilize continuous labels to generate contrastive samples for pretraining embeddings. Considering the NIE problem, LICAP adopts a novel sampling strategy called top nodes preferred hierarchical sampling to first group all interested nodes into a top bin and a nontop bin based on node importance scores, and then divide the nodes within the top bin into several finer bins also based on the scores. The contrastive samples are generated from those bins and are then used to pretrain node embeddings of KGs via a newly proposed predicate-aware graph attention networks (PreGATs), so as to better separate the top nodes from nontop nodes, and distinguish the top nodes within the top bin by keeping the relative order among finer bins. Extensive experiments demonstrate that the LICAP pretrained embeddings can further boost the performance of existing NIE methods and achieve new state-of-the-art performance regarding both regression and ranking metrics. The source code for reproducibility is available at https://github.com/zhangtia16/LICAP.

PaperID: 1039,

Authors: Kun Zhu, Pengyu Song, Chunhui Zhao

Affiliations: College of Control Science and Engineering, Zhejiang University, Hangzhou, China

Title: Fuzzy State-Driven Cross-Time Spatial Dependence Learning for Multivariate Time-Series Anomaly Detection

Abstract:
Cross-time spatial dependence (i.e., the interaction between different variables at different time points) is indispensable for detecting anomalies in multivariate time series, as certain anomalies may have time delays in their propagation from one variable to another. However, accurately capturing cross-time spatial dependence remains a challenge. Specifically, real-world time series usually exhibit complex and incomprehensible evolutions that may be compounded by multiple temporal states (i.e., temporal patterns, such as rising, fluctuating, and peak). These temporal states mix and overlap with each other and exhibit dynamic and heterogeneous evolution laws in different time series, making the cross-time spatial dependence extremely intricate and mutable. Therefore, a cross-time spatial graph network with fuzzy embedding is proposed to disentangle latent and mixing temporal states and exploit them to meticulously learn cross-time spatial dependence. First, considering that temporal states are diversiform and their mixing modes are unknown, we introduce a fuzzy state set to uniformly characterize potential temporal states and adaptively generate corresponding membership degrees to depict how these states mix. Further, we propose a cross-time spatial graph, quantifying similarities among fuzzy states and sensing their dynamic evolutions, to flexibly learn mutable cross-time spatial dependence. Finally, we design state diversity and temporal proximity constraints to ensure the differences among fuzzy states and the evolution continuity of fuzzy states. Experiments on real-world datasets show that the proposed model outperforms the state-of-the-art models.

PaperID: 1040,

Authors: Giulia Fracastoro, Sophie M. Fosson, Andrea Migliorati, Giuseppe Carlo Calafiore

Affiliations: Department of Electronics and Telecommunications (DET), Politecnico di Torino, Turin, Italy; Department of Control and Computer Engineering (DAUIN), Politecnico di Torino, Turin, Italy

Title: Playing the Lottery With Concave Regularizers for Sparse Trainable Neural Networks

Abstract:
The design of sparse neural networks, i.e., of networks with a reduced number of parameters, has been attracting increasing research attention in the last few years. The use of sparse models may significantly reduce the computational and storage footprint in the inference phase. In this context, the lottery ticket hypothesis (LTH) constitutes a breakthrough result that addresses the performance not only of the inference phase but also of the training phase. It states that it is possible to extract effective sparse subnetworks, called winning tickets, that can be trained in isolation. The development of effective methods to play the lottery, i.e., to find winning tickets, is still an open problem. In this article, we propose a novel class of methods to play the lottery. The key point is the use of concave regularization to promote the sparsity of a relaxed binary mask, which represents the network topology. We theoretically analyze the effectiveness of the proposed method in the convex framework. Then, we present extended numerical tests on various datasets and architectures, which show that the proposed method can improve the performance of state-of-the-art algorithms.
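
The abstract above does not say which concave regularizer is used. As a hedged illustration of why a concave penalty on a relaxed mask in [0, 1] favors sparsity, the sketch below compares the L1 norm with a log-sum penalty; the log-sum form, the `eps` value, and the function names are assumptions chosen for illustration, not the authors' formulation.

```python
import math


def l1_penalty(mask):
    """Convex baseline: sum of the (nonnegative) mask entries."""
    return sum(abs(m) for m in mask)


def log_sum_penalty(mask, eps=0.1):
    """A common concave sparsity penalty: sum of log(1 + m / eps)."""
    return sum(math.log(1.0 + abs(m) / eps) for m in mask)


# Two relaxed masks with the same L1 norm but different sparsity.
sparse_mask = [1.0, 0.0]
dense_mask = [0.5, 0.5]

l1_sparse, l1_dense = l1_penalty(sparse_mask), l1_penalty(dense_mask)
log_sparse, log_dense = log_sum_penalty(sparse_mask), log_sum_penalty(dense_mask)
```

The L1 penalty cannot distinguish the two masks, whereas the concave penalty is strictly smaller for the sparse one, so minimizing it pushes mask entries toward exactly zero or fully active values.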

PaperID: 1041,

Authors: Xiao Ma, Yuan Yuan, Lei Guo

Affiliations: School of Astronautics, Northwestern Polytechnical University, Xi’an, China; School of Automation Science and Electrical Engineering, Beihang University, Beijing, China

Title: Hierarchical Reinforcement Learning for UAV-PE Game With Alternative Delay Update Method

Abstract:
This article proposes a novel hierarchical reinforcement learning (HRL) algorithm for unmanned aerial vehicle pursuit-evasion (UAV-PE) game systems with an alternative delay update (ADU) method. In the proposed algorithm, the approximate solutions of the UAV-PE game problem are derived from a hierarchical learning process, which relies on a zero-sum game process of kinematics and a corresponding optimal process of dynamics. In this case, deep neural networks (NNs) are used to approximate the policy and value functions of UAV-PE game systems at the kinematics and dynamics levels. Furthermore, the ADU method is adopted to improve the training efficiency of the deep NNs by fixing one player of the UAV-PE game systems to form a stable environment. The goal of this article is to develop an HRL algorithm with an ADU method for obtaining approximate Nash equilibrium (NE) solutions of the considered UAV-PE game systems, which are subject to the coupling of kinematics and dynamics. Subsequently, sufficient conditions are provided for analyzing the convergence and optimality of the proposed HRL algorithm. Moreover, the inequalities of overload are obtained to guarantee that the state of dynamics tracks the control input of kinematics in UAV-PE game systems. Finally, simulation examples are provided to demonstrate the feasibility and usefulness of the proposed HRL algorithm and ADU method.

PaperID: 1042,

Authors: Bo Zhao, Shunchao Zhang, Derong Liu

Affiliations: School of Systems Science, Beijing Normal University, Beijing, China; School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou, China; School of System Design and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen, China

Title: Self-Triggered Approximate Optimal Neuro-Control for Nonlinear Systems Through Adaptive Dynamic Programming

Abstract:
In this article, a novel self-triggered approximate optimal neuro-control scheme is presented for nonlinear systems by utilizing adaptive dynamic programming (ADP). According to the Bellman principle of optimality, the cost function of the general nonlinear system is approximated by building a critic neural network with a nested updating weight vector. Thus, the Hamilton–Jacobi–Bellman equation is solved to indirectly obtain the approximate optimal neuro-control input. In order to reduce the computation, the communication bandwidth, and the energy consumption, an appropriate self-triggering condition is designed as an alternative way to predict the updating time instants of the approximate optimal neuro-control policy. On the basis of Lyapunov’s direct method, the stability of the closed-loop nonlinear system is analyzed and guaranteed to be uniformly ultimately bounded. Simulation results of two practical systems illustrate the present ADP-based self-triggered approximate optimal neuro-control scheme to be reasonable and effective.

PaperID: 1043,

Authors: Jialin Tian, Abdulmotaleb El Saddik, Xing Xu, Dongshuai Li, Zuo Cao, Heng Tao Shen

Affiliations: College of Electronic and Information Engineering, Tongji University, Shanghai, China; School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada; Meituan, Shanghai, China

Title: Intrinsic Consistency Preservation With Adaptively Reliable Samples for Source-Free Domain Adaptation

Abstract:
Unsupervised domain adaptation (UDA) aims to alleviate the domain shift by transferring knowledge learned from a labeled source dataset to an unlabeled target domain. Although UDA has seen promising progress recently, it requires access to data from both domains, making it problematic in source data-absent scenarios. In this article, we investigate a practical task, source-free domain adaptation (SFDA), that alleviates the limitations of the widely studied UDA in simultaneously acquiring source and target data. In addition, we further study the imbalanced SFDA (ISFDA) problem, which addresses the intra-domain class imbalance and inter-domain label shift in SFDA. We make two key observations in SFDA: 1) target data form clusters in the representation space regardless of whether the target data points are aligned with the source classifier and 2) target samples with higher classification confidence are more reliable and have less variation in their classification confidence during adaptation. Motivated by these observations, we propose a unified method, named intrinsic consistency preservation with adaptively reliable samples (ICPR), to jointly cope with SFDA and ISFDA. Specifically, ICPR first encourages the intrinsic consistency in the predictions of neighbors for unlabeled samples with weak augmentation (standard flip-and-shift), regardless of their reliability. ICPR then generates strongly augmented views specifically for adaptively selected reliable samples and is trained to fix the intrinsic consistency between weakly and strongly augmented views of the same image concerning predictions of neighbors and their own. Additionally, we propose to use a prototype-like classifier to avoid the classification confusion caused by severe intra-domain class imbalance and inter-domain label shift. We demonstrate the effectiveness and general applicability of ICPR on six benchmarks of both SFDA and ISFDA tasks.
The reproducible code of our proposed ICPR method is available at https://github.com/CFM-MSG/Code_ICPR.

PaperID: 1044,

Authors: Haoran Dou, Seppo Virtanen, Nishant Ravikumar, Alejandro F. Frangi

Affiliations: Center for Computational Imaging and Simulation Technologies in Biomedicine, School of Computing, University of Leeds, Leeds, U.K.; Christabel Pankhurst Institute, Division of Informatics, Imaging and Data Sciences, and the Department of Computer Science, The University of Manchester, Manchester, U.K.

Title: A Generative Shape Compositional Framework to Synthesize Populations of Virtual Chimeras

Abstract:
Generating virtual organ populations that capture sufficient variability while remaining plausible is essential to conduct in silico trials (ISTs) of medical devices. However, not all anatomical shapes of interest are always available for each individual in a population. The imaging examinations and modalities used can vary between subjects depending on their individualized clinical pathways. Different imaging modalities may have various fields of view and are sensitive to signals from other tissues/organs, or both. Hence, missing/partially overlapping anatomical information is often available across individuals. We introduce a generative shape model for multipart anatomical structures, learnable from sets of unpaired datasets, i.e., where each substructure in the shape assembly comes from datasets with missing or partially overlapping substructures from disjoint subjects of the same population. The proposed generative model can synthesize complete multipart shape assemblies coined virtual chimeras (VCs). We applied this framework to build VCs from databases of whole-heart shape assemblies that each contribute samples for heart substructures. Specifically, we propose a graph neural network-based generative shape compositional framework, which comprises two components, a part-aware generative shape model that captures the variability in shape observed for each structure of interest in the training population and a spatial composition network that assembles/composes the structures synthesized by the former into multipart shape assemblies (i.e., VCs). We also propose a novel self-supervised learning scheme that enables the spatial composition network to be trained with partially overlapping data and weak labels. We trained and validated our approach using shapes of cardiac structures derived from cardiac magnetic resonance (MR) images in the UK Biobank (UKBB). 
When trained with complete and partially overlapping data, our approach significantly outperforms a principal component analysis (PCA)-based shape model (trained with complete data) in terms of generalizability and specificity. This demonstrates the superiority of the proposed method, as the synthesized cardiac virtual populations are more plausible and capture a greater degree of shape variability than those generated by the PCA-based shape model.

PaperID: 1045,

Authors: Mou Wu, Haibin Liao, Zhengtao Ding, Yonggang Xiao

Affiliations: School of Computer Science and Technology and the Laboratory of Optoelectronic Information and Intelligent Control, Hubei University of Science and Technology, Xianning, China; School of Electronic and Electrical Engineering, Wuhan Textile University, Wuhan, China; Department of Electrical and Electronic Engineering, The University of Manchester, Manchester, U.K.

Title: MUSIC: Accelerated Convergence for Distributed Optimization With Inexact and Exact Methods

Abstract:
Gradient-type distributed optimization methods have blossomed into one of the most important tools for solving a minimization learning task over a networked agent system. However, only one gradient update per iteration makes it difficult to achieve a substantive acceleration of convergence. In this article, we propose an accelerated framework named multiupdates single-combination (MUSIC) allowing each agent to perform multiple local updates and a single combination in each iteration. More importantly, we equip inexact and exact distributed optimization methods into this framework, thereby developing two new algorithms that exhibit accelerated linear convergence and high communication efficiency. Our rigorous convergence analysis reveals the sources of steady-state errors arising from inexact policies and offers effective solutions. Numerical results based on synthetic and real datasets demonstrate both our theoretical motivations and analysis, as well as performance advantages.
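
The "multiple local updates, single combination" pattern described in the abstract above can be sketched on a toy problem. The code below is an assumption-laden illustration, not the authors' algorithm: it uses a simple inexact (diffusion-style, constant step size) scheme, scalar quadratic local costs 0.5 * (x - b_i)^2, and a hand-picked doubly stochastic combination matrix; the function name `local_then_combine` is hypothetical.

```python
def local_then_combine(x, targets, weights, step, local_updates):
    """One round: several local gradient steps per agent, then one combination."""
    updated = []
    for xi, bi in zip(x, targets):
        for _ in range(local_updates):
            xi -= step * (xi - bi)  # gradient of the toy cost 0.5 * (x - b)^2
        updated.append(xi)
    # Single combination step: each agent takes a weighted average of all iterates.
    return [sum(w * u for w, u in zip(row, updated)) for row in weights]


# Three agents; the minimizer of the summed cost is the mean of the targets (3.0).
targets = [0.0, 3.0, 6.0]
weights = [
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.25, 0.25, 0.5],
]
x = [0.0, 0.0, 0.0]
for _ in range(200):
    x = local_then_combine(x, targets, weights, step=0.1, local_updates=3)
network_mean = sum(x) / len(x)
```

In this toy setting the network average reaches the global minimizer, while the individual agents settle close to it but not exactly on it. That residual gap is the kind of steady-state error the abstract attributes to inexact policies.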

PaperID: 1046,

Authors: Wei Liu, Jianhang Zhao, Huanyu Zhao, Qian Ma, Shengyuan Xu, Ju H. Park

Affiliations: Faculty of Automation, Huaiyin Institute of Technology, Huaian, China; School of Automation, Nanjing University of Science and Technology, Nanjing, China

Title: Neural Preassigned Performance Control for State-Constrained Nonlinear Systems Subject to Disturbances

Abstract:
This article addresses the finite-time neural predefined performance control (PPC) issue for state-constrained nonlinear systems (NSs) with exogenous disturbances. By integrating the predefined-time performance function (PTPF) and the conventional barrier Lyapunov function (BLF), a new set of time-varying BLFs is designed to constrain the error variables. This establishes conditions for satisfying full-state constraints while ensuring that the tracking error meets the predefined performance indicators (PPIs) within a predefined time. Additionally, the incorporation of the nonlinear disturbance observer technique (NDOT) in the control design significantly enhances the ability of the system to reject disturbances and improves overall robustness. Leveraging recursive design based on dynamic surface control (DSC), a finite-time neural adaptive PPC strategy is devised to ensure that the closed-loop system is semi-globally practically finite-time stable (SPFS) and achieves the desired PPIs. Finally, the simulation results of two practical examples validate the efficacy and viability of the proposed approach.

PaperID: 1047,

Authors: Hongru Ren, Zeyi Liu, Hongjing Liang, Hongyi Li

Affiliations: School of Automation, Guangdong-Hong Kong Joint Laboratory for Intelligent Decision and Cooperative Control, and Guangdong Provincial Key Laboratory for Intelligent Decision and Cooperative Control, Guangdong University of Technology, Guangzhou, China; College of Information Science and Engineering, Northeastern University, Shenyang, China; School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China; College of Electronic and Information Engineering, Southwest University, Chongqing, China

Title: Pinning-Based Neural Control for Multiagent Systems With Self-Regulation Intermediate Event-Triggered Method

Abstract:
A pinning-based self-regulation intermediate event-triggered (ET) funnel tracking control strategy is proposed for uncertain nonlinear multiagent systems (MASs). Based on the backstepping framework, a pinning control strategy is designed to achieve the tracking control objective, which only uses the communication weight between the agents without additional feedback parameters. Moreover, by designing a self-regulation triggered condition based on the tracking error, the intermediate triggered signal is calculated to replace the continuous signal in the controller, so as to achieve the goal of discontinuous update of the controller signal. This mechanism does not require adding a compensation function to the controller signal. At the same time, the funnel method is adopted to restrict the error of step n and avoid the possible negative impact caused by the control signal. Furthermore, the nonlinear noncontinuous faults are compensated by the disturbance observer. Then, the Lyapunov stability theorem is used to prove that all signals of the closed-loop system are semiglobally uniformly ultimately bounded (SGUUB). Finally, some simulation results confirm the effectiveness of the proposed control scheme.

PaperID: 1048,

Authors: Boyu Qiao, Wei Zhou, Kun Li, Shilong Li, Songlin Hu

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China

Title: Dispelling the Fake: Social Bot Detection Based on Edge Confidence Evaluation

Abstract:
Social bot detection is essential for maintaining the safety and integrity of online social networks (OSNs). Graph neural networks (GNNs) have emerged as a promising solution. Mainstream GNN-based social bot detection methods learn rich user representations by recursively performing message passing along user–user interaction edges, where users are treated as nodes and their relationships as edges. However, these methods face challenges when detecting advanced bots interacting with genuine accounts. Interaction with real accounts results in the graph structure containing camouflaged and unreliable edges. These unreliable edges interfere with the differentiation between bot and human representations, and the iterative graph encoding process amplifies this unreliability. In this article, we propose a social Bot detection method based on Edge Confidence Evaluation (BECE). Our model incorporates an edge confidence evaluation module that assesses the reliability of the edges and identifies the unreliable edges. Specifically, we design features for edges based on the representation of user nodes and introduce parameterized Gaussian distributions to map the edge embeddings into a latent semantic space. We optimize these embeddings by minimizing Kullback–Leibler (KL) divergence from the standard distribution and evaluate their confidence based on edge representation. Experimental results on three real-world datasets demonstrate that BECE is effective and superior in social bot detection. Additionally, experimental results on six widely used GNN architectures demonstrate that our proposed edge confidence evaluation module can be used as a plug-in to improve detection performance.
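
The KL term mentioned in the abstract above has a well-known closed form when an embedding is modeled as a diagonal Gaussian and compared against a standard normal distribution. The sketch below computes that quantity. The diagonal-covariance parameterization by mean and log-variance, and reading "the standard distribution" as a standard normal, are assumptions; the abstract does not state how BECE parameterizes its edge distributions.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a diagonal Gaussian.

    Closed form: 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1).
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


# An embedding that already matches the standard normal has zero divergence.
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# Shifting the mean by one unit in one dimension costs 0.5 nats.
kl_shifted = kl_to_standard_normal([1.0], [0.0])
```

Minimizing this term pulls each edge distribution toward the standard normal; how the resulting representations are turned into confidence scores is not described in the abstract and is not modeled here.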

PaperID: 1049,

Authors: Alireza Ramezani Moghaddam, Hamed Kebriaei

Affiliations: School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

Title: Expected Policy Gradient for Network Aggregative Markov Games in Continuous Space

Abstract:
In this article, we investigate the Nash-seeking problem of a set of agents playing an infinite network aggregative Markov game. In particular, we focus on a noncooperative framework where each agent selfishly aims at maximizing its long-term average reward without having explicit information on the model of the environment dynamics and its own reward function. The main contribution of this article is to develop a continuous multiagent reinforcement learning (MARL) algorithm for the Nash-seeking problem in infinite dynamic games with a convergence guarantee. To this end, we propose an actor–critic MARL algorithm based on expected policy gradient (EPG) with two general function approximators to estimate the value function and the Nash policy of the agents. We consider continuous state and action spaces and adopt a newly proposed EPG to alleviate the variance of the gradient approximation. Based on this formulation and under some conventional assumptions (e.g., using linear function approximators), we prove that the policies of the agents converge to the unique Nash equilibrium (NE) of the game. Furthermore, an estimation error analysis is conducted to investigate the effects of the error arising from function approximation. As a case study, the framework is applied to a cloud radio access network (C-RAN) by modeling the remote radio heads (RRHs) as the agents and the congestion of baseband units (BBUs) as the dynamics of the environment.

PaperID: 1050,

Authors: Jianqiao Sun, Bo Chen, Ruiying Lu, Ziheng Cheng, Chunhui Qu, Xin Yuan

Affiliations: National Key Laboratory of Radar Signal Processing, Xidian University, Xi’an, China; School of Cyber Engineering, Xidian University, Xi’an, China; School of Engineering, Westlake University, Hangzhou, Zhejiang, China

Title: Advancing Hyperspectral and Multispectral Image Fusion: An Information-Aware Transformer-Based Unfolding Network

Abstract:
In hyperspectral image (HSI) processing, the fusion of the high-resolution multispectral image (HR-MSI) and the low-resolution HSI (LR-HSI) of the same scene, known as MSI-HSI fusion, is a crucial step in obtaining the desired high-resolution HSI (HR-HSI). With their powerful representation ability, convolutional neural network (CNN)-based deep unfolding methods have demonstrated promising performance. However, the limited receptive fields of CNNs often lead to inaccurate long-range spatial features, and inherent input and output images for each stage in unfolding networks restrict the feature transmission, thus limiting the overall performance. To this end, we propose a novel and efficient information-aware transformer-based unfolding network (ITU-Net) to model the long-range dependencies and transfer more information across the stages. Specifically, we employ a customized transformer block to learn representations from both the spatial and frequency domains as well as avoid the quadratic complexity with respect to the input length. For spatial feature extraction, we develop an information transfer guided linearized attention (ITLA), which transmits high-throughput information between adjacent stages and extracts contextual features along the spatial dimension in linear complexity. Moreover, we introduce frequency domain learning in the feedforward network (FFN) to capture token variations of the image and narrow the frequency gap. By integrating our proposed transformer blocks with the unfolding framework, our ITU-Net achieves state-of-the-art (SOTA) performance on both synthetic and real hyperspectral datasets.

PaperID: 1051,

Authors: Renwei Dian, Yuanye Liu, Shutao Li

Title: Hyperspectral Image Fusion via a Novel Generalized Tensor Nuclear Norm Regularization

Abstract:
Recently, low-rank tensor regularization has received more and more attention in hyperspectral and multispectral fusion (HMF). However, these methods often suffer from an inflexible low-rank tensor definition and are highly sensitive to the permutation of tensor modes, which hinders their performance. To tackle this problem, we propose a novel generalized tensor nuclear norm (GTNN)-based approach for HMF. First, we define a novel GTNN by extending the existing third-mode-based tensor nuclear norm (TNN) to an arbitrary mode, which conducts the Fourier transform on an arbitrary single mode and then computes the TNN for each mode. In this way, we can not only capture more extensive correlations for the three modes of a tensor but also avoid the adverse effect of the permutation of tensor modes. To utilize the correlations among spectral bands, the high-resolution hyperspectral image (HSI) is approximated as the product of a low-rank spectral basis and coefficients, and we estimate the spectral basis by conducting singular-value decomposition (SVD) on the HSI. Then, the coefficients are estimated by solving the proposed GTNN-regularized optimization problem. Specifically, to exploit the non-local similarities of the HSI, we first cluster the patches of the coefficients into a 3-D tensor, which contains spatial, spectral, and non-local modes. Since the collected tensors contain the strong non-local spatial-spectral similarities of the HSI, the proposed low-rank tensor regularization is imposed on these collected tensors, which fully models the non-local self-similarities. Fusion experiments on both simulated and real datasets demonstrate the advantages of this approach. The code is available at https://github.com/renweidian/GTNN.

PaperID: 1052,

Authors: Shiqi Fan, Guoxi Fan, Hongyi Nie, Quanming Yao, Yang Liu, Xuelong Li, Zhen Wang

Affiliations: Department of Electronic Engineering, Tsinghua University, Beijing, China; School of Artificial Intelligence, Optics and Electronics, Northwestern Polytechnical University, Xi’an, China; Institute of Artificial Intelligence (TeleAI), China Telecom, Beijing, China

Title: Flow to Candidate: Temporal Knowledge Graph Reasoning With Candidate-Oriented Relational Graph

Abstract:
Reasoning over temporal knowledge graphs (TKGs) is a challenging task that requires models to infer future events based on past facts. Currently, subgraph-based methods have become the state-of-the-art (SOTA) techniques for this task due to their superior capability to explore local information in knowledge graphs (KGs). However, while previous methods have been effective in capturing semantic patterns in TKGs, they struggle to capture more complex topological patterns. In contrast, path-based methods can efficiently capture relation paths between nodes and obtain relation patterns based on the order of relation connections, but subgraphs can retain much more information than a single path. Motivated by this observation, we propose a new subgraph-based approach to capture complex relational patterns. The method constructs candidate-oriented relational graphs to capture the local structure of TKGs and introduces a variant of a graph neural network model to learn the graph structure information between query-candidate pairs. In particular, we first design a prior directed temporal edge sampling method, which starts from the query node and generates multiple candidate-oriented relational graphs simultaneously. Next, we propose a recursive propagation architecture that can encode all relational graphs in the local structures in parallel. Additionally, we introduce a self-attention mechanism in the propagation architecture to capture the query’s preference. Finally, we design a simple scoring function to calculate the candidate nodes’ scores and generate the model’s predictions. Extensive experiments on four benchmark datasets (ICEWS14, ICEWS18, ICEWS0515, and YAGO) demonstrate that our proposed approach achieves stronger inference and faster convergence than the SOTA methods. In addition, our method provides a relational graph for each query-candidate pair, which offers interpretable evidence for TKG prediction results.

PaperID: 1053,

Authors: Linjun Zhong, C. L. Philip Chen, Jifeng Guo, Tong Zhang

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; School of Computer Science and Engineering and the Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, South China University of Technology, Guangzhou, China

Title: Robust Incremental Broad Learning System for Data Streams of Uncertain Scale

Abstract:
Due to its strong performance and remarkable scalability, the broad learning system (BLS) has attracted wide attention. However, its incremental learning suffers from low accuracy and long training time, especially when dealing with unstable data streams, making it difficult to apply in real-world scenarios. To overcome these issues and enrich the relevant research, a robust incremental BLS (RI-BLS) is proposed. In this method, the proposed weight update strategy introduces two memory matrices to store the learned information, which decomposes the computational procedure of ridge regression and yields a precomputed ridge regression. During incremental learning, RI-BLS updates the two memory matrices and efficiently renews the weights via the precomputed ridge regression. In addition, this update strategy is theoretically analyzed in terms of error, time complexity, and space complexity and compared with existing incremental BLSs. Unlike Greville’s method used in the original incremental BLS, its results are closer to the solution of the one-shot calculation. Compared with the existing incremental BLSs, the proposed method exhibits more stable time complexity and superior space complexity. Experiments show that RI-BLS outperforms other incremental BLSs when handling both stable and unstable data streams. Furthermore, experiments demonstrate that the proposed weight update strategy applies to other random neural networks as well.
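
One plausible reading of "two memory matrices" and "precomputed ridge regression" is sketched below. It assumes the memories accumulate AᵀA and AᵀY over incoming chunks, so that the weights W = (AᵀA + λI)⁻¹AᵀY can be renewed without revisiting old data. The class name, method names, and the small dense solver are illustrative assumptions, not the authors' implementation.

```python
def solve(a, b):
    """Solve a @ x = b (b may have several columns) by Gauss-Jordan
    elimination with partial pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


class PrecomputedRidge:
    """Hypothetical incremental ridge regression with two memory matrices."""

    def __init__(self, n_features, n_outputs, lam=1e-3):
        self.P = [[0.0] * n_features for _ in range(n_features)]  # sum of A^T A
        self.Q = [[0.0] * n_outputs for _ in range(n_features)]   # sum of A^T Y
        self.lam = lam

    def update(self, A, Y):
        """Fold a new chunk of rows (features A, targets Y) into the memories."""
        for a_row, y_row in zip(A, Y):
            for i, ai in enumerate(a_row):
                for j, aj in enumerate(a_row):
                    self.P[i][j] += ai * aj
                for k, yk in enumerate(y_row):
                    self.Q[i][k] += ai * yk

    def weights(self):
        """Return W = (P + lam * I)^-1 Q from the stored memories."""
        n = len(self.P)
        reg = [[self.P[i][j] + (self.lam if i == j else 0.0) for j in range(n)]
               for i in range(n)]
        return solve(reg, self.Q)


# Two chunks arriving one after the other.
model = PrecomputedRidge(n_features=2, n_outputs=1, lam=0.0)
model.update([[1.0, 0.0], [0.0, 1.0]], [[2.0], [3.0]])
model.update([[1.0, 1.0]], [[5.0]])
print(model.weights())  # close to [[2.0], [3.0]]
```

Because the memories are plain sums, the incremental result coincides with the one-shot ridge solution on all data seen so far, and the storage cost depends on the feature dimension rather than on the stream length.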

PaperID: 1054,

Authors: Wenqiang Cao, Jing Yan, Xian Yang, Cailian Chen, Xinping Guan

Affiliations: Institute of Electrical Engineering, Yanshan University, Qinhuangdao, China; Institute of Information Science and Engineering, Yanshan University, Qinhuangdao, China; School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China

Title: Bearing Rigidity-Based Flocking Control of AUVs via Semi-Supervised Incremental Broad Learning

Abstract:
Flocking control of autonomous underwater vehicles (AUVs) has been regarded as the basis of many sophisticated marine coordination missions. However, there is still a research gap on the flocking of AUVs in weak communication and complex marine environments. This article attempts to fill this gap from graph theory and intelligent learning perspectives. We first employ the bearing rigidity graph to describe the topology relationships of AUVs, through which an iterative gradient descent-based localization estimator is provided to obtain the position information. In order to improve the localization accuracy and energy efficiency, a min-weighted bearing rigidity graph generation strategy is developed. Along with this, we adopt the semi-supervised broad learning system (BLS) to design the model-free flocking controllers for AUVs in obstacle environments. The innovations of this article are summarized as follows: 1) the min-weighted bearing rigidity-based localization strategy can balance the localization accuracy and communication consumption as compared to the neighboring rule-based solutions and 2) the semi-supervised broad learning-based flocking controller can decrease the training time and overcome the label limitation of supervised learning-based controllers. Finally, simulation and experimental studies are provided to verify the effectiveness.

PaperID: 1055,

Authors: Ukjo Hwang, Songnam Hong

Affiliations: Department of Electronic Engineering, Hanyang University, Seoul, South Korea

Title: On Practical Robust Reinforcement Learning: Adjacent Uncertainty Set and Double-Agent Algorithm

Abstract:
Robust reinforcement learning (RRL) aims to seek a robust policy by optimizing the worst-case performance over an uncertainty set. This set contains some perturbed Markov decision processes (MDPs) from a nominal MDP (N-MDP) that generates samples for training, which reflects some potential mismatches between the training simulator (i.e., N-MDP) and real-world settings (i.e., the testing environments). Unfortunately, existing RRL algorithms are only applied to the tabular setting, and it is still an open problem to extend them to more general continuous state spaces. We contribute to this subject in the following ways. We first construct an elaborated uncertainty set, which, unlike the existing sets, contains only plausible (perturbed) MDPs. Based on this, we propose a sample-based RRL algorithm [named adjacent robust Q-learning (ARQ-Learning)] for the tabular setting and characterize its finite-time error bound. Also, it is proved that ARQ-Learning converges as fast as the standard Q-learning and robust Q-learning (Robust-Q) while guaranteeing better robustness. Our major contribution is to introduce an additional pessimistic agent that can address the major hurdle for the extension of ARQ-Learning to cases with large or continuous state spaces. Leveraging this double-agent approach, we develop, for the first time, (model-free) RRL algorithms for continuous state/action spaces. Via experiments, we demonstrate the effectiveness of our algorithms.

PaperID: 1056,

Authors: Zhehao Jin, Andong Liu, Wen-An Zhang, Li Yu, Chenguang Yang

Affiliations: Department of Information Engineering, Zhejiang Provincial United Key Laboratory of Embedded Systems, Zhejiang University of Technology, Hangzhou, China; Department of Computer Science, University of Liverpool, Liverpool, U.K.

Title: Learning an Autonomous Dynamic System to Encode Periodic Human Motion Skills

Abstract:
Learning an autonomous dynamic system (ADS) encoding human motion rules has been shown to be an effective way to transfer human motion skills. However, most existing approaches focus on goal-directed motion skills transfer, and studies on periodic motion skills transfer are rare. One popular approach for periodic motion skills transfer is learning a periodic dynamic movement primitive (DMP); however, periodic DMP is sensitive to spatial disturbances due to the introduction of the phase parameters. To solve this issue, this brief presents a novel approach to learn an ADS with a stable limit cycle without introducing phase parameters. First, a data-driven Lyapunov function (energy function) is learned, such that one of its level surfaces is consistent with periodic human demonstration trajectories. Then, an ADS is learned by sequentially solving energy function-related constrained optimization problems. With a proper design of constraint functions, we can ensure that the trajectory generated by the ADS will converge to an energy function-level surface whose shape is similar to periodic human demonstration trajectories. Experiments are conducted to show the effectiveness of the proposed approach (PA).

PaperID: 1057,

Authors: Xunbi A. Ji, Gábor Orosz

Affiliations: Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA; Department of Mechanical Engineering and the Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI, USA

Title: Trainable Delays in Time Delay Neural Networks for Learning Delayed Dynamics

Abstract:
In this article, the connection between time delay systems and time delay neural networks (TDNNs) is presented from a continuous-time perspective. TDNNs are utilized to learn the nonlinear dynamics of time delay systems from trajectory data. The concept of TDNN with trainable delay (TrTDNN) is established, and training algorithms are constructed for learning the time delays and the nonlinearities simultaneously. The proposed techniques are tested on learning the dynamics of autonomous systems from simulation data and on learning the delayed longitudinal dynamics of a connected automated vehicle (CAV) from real experimental data.

PaperID: 1058,

Authors: Yutong Gao, Congyan Lang, Fayao Liu, Chuan-Sheng Foo, Yuanzhouhan Cao, Lijuan Sun, Yunchao Wei

Affiliations: Key Laboratory of Big Data and Artificial Intelligence in Transportation (Ministry of Education), School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China; Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore; School of Computer Science and Information Technology, Beijing Jiaotong University, Beijing, China; Intellectual Property Information Service Center, Beihang University, Beijing, China; Institute of Information Science, Beijing Jiaotong University, Beijing, China

Title: Mining Semantic Correlations Between Mispredictions and Corrections for Interactive Semantic Segmentation

Abstract:
Interactive semantic segmentation pursues high-quality segmentation results at the cost of a small number of user clicks. It is attracting more and more research attention for its convenience in labeling semantic pixel-level data. Existing interactive segmentation methods often pursue higher interaction efficiency by mining the latent information of user clicks or exploring efficient interaction manners. However, these works neglect to explicitly exploit the semantic correlations between user corrections and model mispredictions, thus suffering from two flaws. First, similar prediction errors frequently occur in actual use, causing users to repeatedly correct them. Second, the interaction difficulty of different semantic classes varies across images, but existing models use monotonic parameters for all images which lack semantic pertinence. Therefore, in this article, we explore the semantic correlations existing in corrections and mispredictions by proposing a simple yet effective online learning solution to the above problems, named correction-misprediction correlation mining (CM2). Specifically, we leverage the correction-misprediction similarities to design a confusion memory module (CMM) for automatic correction when similar prediction errors reappear. Furthermore, we measure the semantic interaction difficulty by counting the correction-misprediction pairs and design a challenge adaptive convolutional layer (CACL), which can adaptively switch different parameters according to interaction difficulties to better segment the challenging classes. Our method requires no extra training besides the online learning process and can effectively improve interaction efficiency. Our proposed CM2 achieves state-of-the-art results on three public semantic segmentation benchmarks.

PaperID: 1059,

Authors: Shihua Fu, Jun-e Feng, Yuan Zhao, Jianjun Wang, Jinfeng Pan

Affiliations: Research Center of Semi-Tensor Product of Matrices: Theory and Applications, Liaocheng University, Liaocheng, Shandong, China; School of Mathematics, Shandong University, Jinan, Shandong, China; School of Automation, Qingdao University, Qingdao, Shandong, China; School of Science and Technology, University of Camerino, Camerino, Italy; School of Mathematics and Information Sciences, Weifang University, Weifang, Shandong, China

Title: Dimensionality Reduction Method for the Output Regulation of Boolean Control Networks

Abstract:
This article proposes a dimensionality reduction approach to study the output regulation problem (ORP) of Boolean control networks (BCNs), which has much lower computational complexity than previous results. First, an auxiliary system that is much smaller in scale than the augmented system in previous approaches is constructed. By analyzing the set stabilization of the auxiliary system as well as the original BCN, a necessary and sufficient condition to detect the solvability of the ORP is presented. Second, a method to design the state feedback controls for the ORP is proposed. Finally, two biological examples are given to demonstrate the effectiveness and advantage of the new results.

PaperID: 1060,

Authors: Jiarui Sun, Mingjing Du, Chen Sun, Yongquan Dong

Affiliations: School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, China

Title: Efficient Online Stream Clustering Based on Fast Peeling of Boundary Micro-Cluster

Abstract:
A growing number of applications generate streaming data, making data stream mining a popular research topic. Classification-based streaming algorithms require pre-training on labeled data. Manually labeling a large number of samples in the data stream is impractical and cost-prohibitive. Stream clustering algorithms rely on unsupervised learning. They have been widely studied for their ability to effectively analyze high-speed data streams without prior knowledge. Stream clustering plays a key role in data stream mining. Currently, most data stream clustering algorithms adopt the online–offline framework. In the online stage, micro-clusters are maintained, and in the offline stage, they are clustered using an algorithm similar to density-based spatial clustering of applications with noise (DBSCAN). When data streams have clusters with varying densities and ambiguous boundaries, traditional data stream clustering algorithms may be less effective. To overcome the above limitations, this article proposes a fully online stream clustering algorithm called fast boundary peeling stream clustering (FBPStream). First, FBPStream defines a decay-based kernel density estimation (KDE). It can discover clusters with varying densities and identify the evolving trend of streams well. Then, FBPStream implements an efficient boundary micro-cluster peeling technique to identify the potential core micro-clusters. Finally, FBPStream employs a parallel clustering strategy to effectively cluster core and boundary micro-clusters. The proposed algorithm is compared with ten popular algorithms on 15 data streams. Experimental results show that FBPStream is competitive with the other ten popular algorithms.
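
The idea of a decay-based kernel density estimate can be illustrated with a minimal one-dimensional sketch. It assumes a Gaussian kernel and an exponential fading weight 2^(-decay · age), a form commonly used in stream clustering. The exact definition used by FBPStream is not given in the abstract, so the function and its parameters are hypothetical.

```python
import math

def decayed_density(x, points, now, bandwidth=1.0, decay=0.1):
    """Decay-weighted Gaussian kernel density at `x`.

    `points` is a list of (value, timestamp) pairs. Each kernel is
    scaled by 2 ** (-decay * age), so older observations contribute
    less. The result is not renormalized by the total weight.
    All names and defaults are illustrative assumptions.
    """
    total = 0.0
    for value, timestamp in points:
        weight = 2.0 ** (-decay * (now - timestamp))
        u = (x - value) / bandwidth
        total += weight * math.exp(-0.5 * u * u)
    return total / (bandwidth * math.sqrt(2.0 * math.pi))

stream = [(0.0, 0.0), (0.1, 5.0), (3.0, 9.0)]
# Density near the old cluster at 0 versus the recent point at 3.
print(decayed_density(0.0, stream, now=10.0))
print(decayed_density(3.0, stream, now=10.0))
```

With this weighting, a region that stops receiving points gradually loses density, which is one way a stream algorithm can track evolving clusters.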

PaperID: 1061,

Authors: Jin Zheng, Qing Gao, Maciej Ogorzalek, Jinhu Lü, Yue Deng

Affiliations: School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Department of Information Technologies, Jagiellonian University, Kraków, Poland; School of Astronautics, Beihang University, Beijing, China

Title: A Quantum Spatial Graph Convolutional Neural Network Model on Quantum Circuits

Abstract:
This article proposes a quantum spatial graph convolutional neural network (QSGCN) model that is implementable on quantum circuits, providing a novel avenue for processing non-Euclidean data based on the state-of-the-art parameterized quantum circuit (PQC) computing platforms. Four basic blocks are constructed to formulate the whole QSGCN model, including the quantum encoding, the quantum graph convolutional layer, the quantum graph pooling layer, and the network optimization. In particular, the trainability of the QSGCN model is analyzed through discussions on the barren plateau phenomenon. Simulation results from various types of graph data are presented to demonstrate the learning, generalization, and robustness capabilities of the proposed quantum neural network (QNN) model.

PaperID: 1062,

Authors: Jiacheng Lin, Jiajun Chen, Kailun Yang, Alina Roitberg, Siyu Li, Zhiyong Li, Shutao Li

Affiliations: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; School of Robotics and the National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha, China; Karlsruhe Institute of Technology, Karlsruhe, Germany; College of Electrical and Information Engineering and the Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha, China

Title: AdaptiveClick: Click-Aware Transformer With Adaptive Focal Loss for Interactive Image Segmentation

Abstract:
Interactive image segmentation (IIS) has emerged as a promising technique for decreasing annotation time. Substantial progress has been made in pre- and post-processing for IIS, but the critical issue of interaction ambiguity, notably hindering segmentation quality, has been under-researched. To address this, we introduce AdaptiveClick, a click-aware transformer incorporating an adaptive focal loss (AFL) that tackles annotation inconsistencies with tools for mask- and pixel-level ambiguity resolution. To the best of our knowledge, AdaptiveClick is the first transformer-based, mask-adaptive segmentation framework for IIS. The key ingredient of our method is the click-aware mask-adaptive transformer decoder (CAMD), which enhances the interaction between click and image features. Additionally, AdaptiveClick enables pixel-adaptive differentiation of hard and easy samples in the decision space, independent of their varying distributions. This is primarily achieved by optimizing a generalized AFL with a theoretical guarantee, where two adaptive coefficients control the ratio of gradient values for hard and easy pixels. Our analysis reveals that the commonly used Focal and BCE losses can be considered special cases of the proposed AFL. With a plain ViT backbone, extensive experimental results on nine datasets demonstrate the superiority of AdaptiveClick compared to state-of-the-art methods. The source code is publicly available at https://github.com/lab206/AdaptiveClick.
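
For background on the losses named above, the sketch below shows the standard binary cross-entropy (BCE) and the standard focal loss for a single pixel, and checks that the focal loss reduces to BCE when its focusing parameter is zero. It does not implement the paper's AFL or its two adaptive coefficients; the function names are illustrative.

```python
import math

def bce_loss(p, y):
    """Binary cross-entropy for one pixel with predicted probability p
    (0 < p < 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0):
    """Standard focal loss: BCE scaled by (1 - p_t) ** gamma, which
    down-weights pixels that are already classified confidently."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# gamma = 0 recovers BCE; gamma > 0 shrinks the loss of easy pixels.
print(bce_loss(0.9, 1), focal_loss(0.9, 1, gamma=0.0))
print(focal_loss(0.9, 1, gamma=2.0), focal_loss(0.3, 1, gamma=2.0))
```

The modulating factor (1 - p_t) ** gamma is what shifts gradient mass from easy to hard pixels. According to the abstract, the AFL generalizes this idea by controlling the ratio of gradient values for hard and easy pixels adaptively.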

PaperID: 1063,

Authors: Yajing Fan, Shuyang Yu, Bin Gu, Ziran Xiong, Zhou Zhai, Heng Huang, Yi Chang

Affiliations: School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China; School of Computer Science, Michigan State University, East Lansing, MI, USA; School of Artificial Intelligence, Jilin University, Changchun, China; Department of Computer Science, University of Maryland, College Park, MD, USA

Title: Global Model Selection for Semi-Supervised Support Vector Machine via Solution Paths

Abstract:
Semi-supervised support vector machine (S3VM) is important because it can use plentiful unlabeled data to improve the generalization accuracy of traditional SVMs. To achieve good performance, S3VM needs an effective way to select hyperparameters. However, model selection for semi-supervised models is still a key open problem. Existing methods for semi-supervised models to search for the optimal parameter values are usually computationally demanding, especially those based on grid search. To address this challenging problem, in this article, we first propose solution paths of S3VM (SPS3VM), which can track the solutions of the nonconvex S3VM with respect to the hyperparameters. Specifically, we apply incremental and decremental learning methods to update the solution and let it satisfy the Karush–Kuhn–Tucker (KKT) conditions. Based on the SPS3VM and the piecewise linearity of the model function, we can find the model with the minimum cross-validation (CV) error for the entire range of candidate hyperparameters by computing the error path of S3VM. Our SPS3VM is the first solution path algorithm for the nonconvex optimization problems of semi-supervised learning models. We also provide the finite convergence analysis and computational complexity of SPS3VM. Experimental results on a variety of benchmark datasets not only verify that our SPS3VM can globally search the hyperparameters (regularization and ramp loss parameters) but also show a huge reduction of computational time while retaining similar or slightly better generalization performance compared with the grid search approach.

PaperID: 1064,

Authors: Jia Liu, Donghai Zhai, Wei Huang, Shenggong Ji, Junbo Zhang, Tianrui Li

Affiliations: School of Computer and Software Engineering, Xihua University, Chengdu, China; School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China; Tencent Inc., Shenzhen, China; JD Intelligent Cities Research, China and JD iCity, JD Tech, Beijing, China

Title: Daily Schedule Recommendation in Urban Life Based on Deep Reinforcement Learning

Abstract:
In daily life, people frequently plan daily schedules to meet their needs, such as going to a barbershop for a haircut, then eating in a restaurant, and finally shopping in a supermarket. Reasonable activity location [or point-of-interest (POI)] selection and activity sequencing will help people save a lot of time and get better services. In this article, we propose a reinforcement learning-based deep activity factor balancing model to recommend a reasonable daily schedule according to the user’s current location and needs. The proposed model consists of a deep activity factor balancing network (DAFB) and a reinforcement learning framework. First, the DAFB is proposed to fuse multiple factors that affect daily schedule recommendation (DSR). Then, a reinforcement learning framework based on policy gradient is used to learn the parameters of the DAFB. Further, we compress the feature storage space of the candidate POIs using a matrix-based method. Finally, the proposed method is compared with seven benchmark methods using two real-world datasets. Experimental results show that the proposed method is adaptive and effective.

PaperID: 1065,

Authors: Zhenghua Xu, Wenting Xu, Ruizhi Wang, Junyang Chen, Chang Qi, Thomas Lukasiewicz

Affiliations: State Key Laboratory of Reliability and Intelligence of Electrical Equipment and the Tianjin Key Laboratory of Bioelectromagnetic Technology and Intelligent Health, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; Department of Information Data, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; Institute of Logic and Computation, Vienna University of Technology, Vienna, Austria

Title: Hybrid Reinforced Medical Report Generation With M-Linear Attention and Repetition Penalty

Abstract:
To reduce doctors’ workload, deep-learning-based automatic medical report generation has recently attracted more and more research efforts, where deep convolutional neural networks (CNNs) are employed to encode the input images, and recurrent neural networks (RNNs) are used to decode the visual features into medical reports automatically. However, these state-of-the-art methods mainly suffer from three shortcomings: 1) incomprehensive optimization; 2) low-order and unidimensional attention; and 3) repeated generation. In this article, we propose a hybrid reinforced medical report generation method with m-linear attention and repetition penalty mechanism (HReMRG-MR) to overcome these problems. Specifically, a hybrid reward with different weights is employed to remedy the limitations of single-metric-based rewards, and a local optimal weight search algorithm is proposed to significantly reduce the complexity of searching the weights of the rewards from exponential to linear. Furthermore, we use m-linear attention modules to learn multidimensional high-order feature interactions and to achieve multimodal reasoning, while a new repetition penalty is proposed to apply penalties to repeated terms adaptively during the model’s training process. Extensive experimental studies on two public benchmark datasets show that HReMRG-MR greatly outperforms the state-of-the-art baselines in terms of all metrics. The effectiveness and necessity of all components in HReMRG-MR are also proved by ablation studies. Additional experiments are further conducted and the results demonstrate that our proposed local optimal weight search algorithm can significantly reduce the search time while maintaining superior medical report generation performances.

PaperID: 1066,

Authors: Jasdeep Singh, Subrahmanyam Murala, G. Sankara Raju Kosuru

Affiliations: Department of Electrical Engineering, Computer Vision and Pattern Recognition Laboratory, Indian Institute of Technology Ropar, Punjab, Rupnagar, India; Computer Vision and Pattern Recognition Laboratory, School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland; Department of Mathematics, Indian Institute of Technology Ropar, Punjab, Rupnagar, India

Title: KL-DNAS: Knowledge Distillation-Based Latency Aware-Differentiable Architecture Search for Video Motion Magnification

Abstract:
Video motion magnification is the task of making subtle, minute motions visible. Subtle motion is often invisible to the naked eye, e.g., slight deformations in the muscles of an athlete, small vibrations in objects, microexpressions, and chest movement while breathing. Magnification of such small motions has led to various applications, such as posture deformity detection, microexpression recognition, and the study of structural properties. State-of-the-art (SOTA) methods have fixed computational complexity, which makes them less suitable for applications with different time constraints, e.g., real-time respiratory rate measurement and microexpression classification. To solve this problem, we propose a knowledge distillation-based latency aware-differentiable architecture search (KL-DNAS) method for video motion magnification. To reduce memory requirements and to improve denoising characteristics, we use a teacher network to search the network by parts using knowledge distillation (KD). Furthermore, a search among different receptive fields and multifeature connections is applied for individual layers. Also, a novel latency loss is proposed to jointly optimize the target latency constraint and output quality. We find a model that is 2.8× smaller than the SOTA method and achieves better motion magnification with fewer distortions. https://github.com/jasdeep-singh-007/KL-DNAS.

PaperID: 1067,

Authors: Qiming Zou, Einoshin Suzuki

Affiliations: Graduate School of Systems Life Sciences, Kyushu University, Fukuoka, Japan; Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan

Title: Compact Goal Representation Learning via Information Bottleneck in Goal-Conditioned Reinforcement Learning

Abstract:
We propose an Information bottleneck (IB) for Goal representation learning (InfoGoal), a self-supervised method for generalizable goal-conditioned reinforcement learning (RL). Goal-conditioned RL learns a policy from reward signals to predict actions for reaching desired goals. However, the policy may overfit the task-irrelevant information contained in the goal and may be falsely or ineffectively generalized to reach other goals. A goal representation containing sufficient task-relevant information and minimum task-irrelevant information is guaranteed to reduce generalization errors. However, in goal-conditioned RL, it is difficult to balance the tradeoff between task-relevant information and task-irrelevant information because of the sparse and delayed learning signals, i.e., reward signals, and the inevitable sacrifice of task-relevant information caused by information compression. Our InfoGoal learns a minimal and sufficient goal representation with dense and immediate self-supervised learning signals. Meanwhile, InfoGoal adaptively adjusts the weight of information minimization to achieve maximum information compression with a reasonable sacrifice of task-relevant information. Consequently, InfoGoal enables the policy to generate a targeted trajectory toward states where the desired goal can be found with high probability and to explore those states broadly. We conduct experiments on both simulated and real-world tasks, and our method significantly outperforms baseline methods in terms of policy optimality and the success rate of reaching unseen test goals. Video demos are available at infogoal.github.io.

PaperID: 1068,

Authors: Mingjin Zhang, Haichen Bai, Wenteng Shang, Jie Guo, Yunsong Li, Xinbo Gao

Affiliations: State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: MDEformer: Mixed Difference Equation Inspired Transformer for Compressed Video Quality Enhancement

Abstract:
Deep learning methods have achieved impressive performance in compressed video quality enhancement tasks. However, these methods rely excessively on practical experience by manually designing the network structure and do not fully exploit the potential of the feature information contained in the video sequences, i.e., they do not take full advantage of the multiscale similarity of the compression artifact information and do not seriously consider the impact of the partition boundaries in the compressed video on the overall video quality. In this article, we propose a novel Mixed Difference Equation inspired Transformer (MDEformer) for compressed video quality enhancement, which provides a relatively reliable principle to guide the network design and yields a new insight into the interpretable transformer. Specifically, drawing on the graphical concept of the mixed difference equation (MDE), we utilize multiple cross-layer cross-attention aggregation (CCA) modules to establish long-range dependencies between encoders and decoders of the transformer, where partition boundary smoothing (PBS) modules are inserted as feedforward networks. The CCA module makes full use of the multiscale similarity of compression artifacts to effectively remove them and to recover the texture and detail information of the frame. The PBS module leverages the sensitivity of smoothing convolution to partition boundaries to eliminate the impact of partition boundaries on the quality of compressed video and improve its overall quality, without having much impact on nonboundary pixels. Extensive experiments on the MFQE 2.0 dataset demonstrate that the proposed MDEformer can eliminate compression artifacts to improve the quality of the compressed video, and that it surpasses state-of-the-art (SOTA) methods in terms of both objective metrics and visual quality.

PaperID: 1069,

Authors: Thomas Limbacher, Ozan Özdenizci, Robert Legenstein

Affiliations: Faculty of Computer Science and Biomedical Engineering, Graz University of Technology, Graz, Austria

Title: Memory-Dependent Computation and Learning in Spiking Neural Networks Through Hebbian Plasticity

Abstract:
Spiking neural networks (SNNs) are the basis for many energy-efficient neuromorphic hardware systems. While there has been substantial progress in SNN research, artificial SNNs still lack many capabilities of their biological counterparts. In biological neural systems, memory is a key component that enables the retention of information over a huge range of temporal scales, ranging from hundreds of milliseconds up to years. While Hebbian plasticity is believed to play a pivotal role in biological memory, it has so far been analyzed mostly in the context of pattern completion and unsupervised learning in artificial and spiking neural networks. Here, we propose that Hebbian plasticity is fundamental for computations in biological and artificial spiking neural systems. We introduce a novel memory-augmented SNN architecture that is enriched by Hebbian synaptic plasticity. We show that Hebbian enrichment renders SNNs surprisingly versatile in terms of their computational as well as learning capabilities. It improves their abilities for out-of-distribution generalization, one-shot learning, cross-modal generative association, language processing, and reward-based learning. This suggests that powerful cognitive neuromorphic systems can be built based on this principle.

PaperID: 1070,

Authors: Zhen Li, Yunfei Yang

Affiliations: Theory Laboratory, Huawei Technologies Company Ltd., Shenzhen, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China

Title: Universality and Approximation Bounds for Echo State Networks With Random Weights

Abstract:
We study the uniform approximation of echo state networks (ESNs) with randomly generated internal weights. These models, in which only the readout weights are optimized during training, have achieved empirical success in learning dynamical systems. Recent results showed that ESNs with ReLU activation are universal. In this article, we give an alternative construction and prove that the universality holds for general activation functions. Specifically, our main result shows that, under certain conditions on the activation function, there exists a sampling procedure for the internal weights so that the ESN can approximate any continuous causal time-invariant operator with high probability. In particular, for ReLU activation, we give an explicit construction for these sampling procedures. We also quantify the approximation error of the constructed ReLU ESNs for sufficiently regular operators.

PaperID: 1071,

Authors: Wang Liu, Xudong Kang, Puhong Duan, Zhuojun Xie, Xiaohui Wei, Shutao Li

Affiliations: College of Electrical and Information Engineering, Hunan University, Changsha, China; School of Robotics, Hunan University, Changsha, China

Title: SOSNet: Real-Time Small Object Segmentation via Hierarchical Decoding and Example Mining

Abstract:
Real-time semantic segmentation plays an important role in autonomous vehicles. However, most real-time semantic segmentation methods fail to obtain satisfactory performance on small objects, such as cars and sign symbols, since large objects usually tend to contribute more to the segmentation result. To solve this issue, we propose an efficient and effective architecture, termed small objects segmentation network (SOSNet), to improve the segmentation performance of small objects. The SOSNet works from two perspectives: methodology and data. Specifically, with the former, we propose a dual-branch hierarchical decoder (DBHD), which is viewed as a small-object sensitive segmentation head. The DBHD consists of a top segmentation head that predicts whether the pixels belong to a small object class and a bottom one that estimates the pixel class. In this situation, the latent correlation among small objects can be fully explored. With the latter, we propose a small object example mining (SOEM) algorithm for balancing examples between small objects and large objects automatically. The core idea of the proposed SOEM is that most of the hard examples on small-object classes are reserved for training while most of the easy examples on large-object classes are discarded. Experiments on three commonly used datasets show that the proposed SOSNet architecture greatly improves the accuracy compared to the existing real-time semantic segmentation methods while keeping efficiency. The code will be available at https://github.com/StuLiu/SOSNet.

PaperID: 1072,

Authors: Juntao Fei, Lei Zhang, Yunmei Fang

Affiliations: College of Mechanical and Electrical Engineering and the College of Artificial Intelligence and Automation, Jiangsu Key Laboratory of Power Transmission and Distribution Equipment Technology, Hohai University, Changzhou, China

Title: Self-Constructing Chebyshev Fuzzy Neural Complementary Sliding Mode Control and its Application

Abstract:
In this article, a complementary sliding mode (CSM) controller using a self-constructing Chebyshev fuzzy recurrent neural network (SCCFRNN) is proposed for harmonic suppression control of an active power filter (APF). The SCCFRNN, whose structure can be automatically learned through the designed structure self-learning algorithm, is introduced to approximate the unknown nonlinear term in the APF dynamic model, so as to improve modeling accuracy and reduce the burden of CSM control (CSMC). The SCCFRNN combines the advantages of a fuzzy neural network (FNN), recurrent neural network (RNN), and Chebyshev neural network (CNN), and all parameters can be adjusted according to the designed adaptive laws. Finally, the feasibility and superiority of the proposed control algorithm are verified through detailed simulation, hardware experiments, and fair comparison.

PaperID: 1073,

Authors: Yujuan Han, Wenlian Lu, Tianping Chen

Affiliations: College of Information Engineering, Shanghai Maritime University, Shanghai, China; School of Mathematical Sciences, Shanghai Center for Mathematical Sciences, Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, Shanghai, China

Title: Intralayer Synchronization and Interlayer Quasisynchronization in Multiplex Networks of Nonidentical Layers

Abstract:
In this article, we discuss synchronization in multiplex networks of different layers. Both the topologies and the uncoupled node dynamics in different layers are different. Novel sufficient criteria are derived for intralayer synchronization and interlayer quasisynchronization, in terms of the coupling matrices, the coupling strengths, and the intrinsic function of the uncoupled systems. We also investigate interlayer synchronization of multiplex networks with identical uncoupled node dynamics. Finally, we give some numerical examples to validate the effectiveness of these theoretical results.

PaperID: 1074,

Authors: Yuzhong Chen, Zhenxiang Xiao, Yu Du, Lin Zhao, Lu Zhang, Zihao Wu, Dajiang Zhu, Tuo Zhang, Dezhong Yao, Xintao Hu, Tianming Liu, Xi Jiang

Affiliations: Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; School of Automation, Northwestern Polytechnical University, Xi’an, China; Department of Computer Science, University of Georgia, Athens, GA, USA; Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, USA

Title: A Unified and Biologically Plausible Relational Graph Representation of Vision Transformers

Abstract:
Vision transformer (ViT) and its variants have achieved remarkable success in various tasks. The key characteristic of these ViT models is to adopt different aggregation strategies of spatial patch information within the artificial neural networks (ANNs). However, a unified representation of different ViT architectures for systematic understanding and assessment of model representation performance is still lacking. Moreover, how similar those well-performing ViT ANNs are to real biological neural networks (BNNs) is largely unexplored. To answer these fundamental questions, we, for the first time, propose a unified and biologically plausible relational graph representation of ViT models. Specifically, the proposed relational graph representation consists of two key subgraphs: an aggregation graph and an affine graph. The former considers ViT tokens as nodes and describes their spatial interaction, while the latter regards network channels as nodes and reflects the information communication between channels. Using this unified relational graph representation, we found that: 1) model performance was closely related to graph measures; 2) the proposed relational graph representation of ViT has high similarity with real BNNs; and 3) there was a further improvement in model performance when training with a superior model to constrain the aggregation graph.

PaperID: 1075,

Authors: Wenxuan Tu, Bin Xiao, Xinwang Liu, Sihang Zhou, Zhiping Cai, Jieren Cheng

Affiliations: School of Computer, National University of Defense Technology, Changsha, China; Department of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Intelligence Science and Technology, National University of Defense Technology, Changsha, China; School of Computer Science and Technology, Hainan University, Haikou, China

Title: Revisiting Initializing Then Refining: An Incomplete and Missing Graph Imputation Network

Abstract:
With the development of various applications, such as recommendation systems and social network analysis, graph data have become ubiquitous in the real world. However, graph data are often partially absent after data collection due to copyright restrictions or privacy-protecting policies. Graph absence can be roughly grouped into attribute-incomplete and attribute-missing cases. Specifically, attribute-incomplete indicates that a portion of the attribute vectors of all nodes are incomplete, while attribute-missing indicates that all attribute vectors of partial nodes are missing. Although various graph imputation methods have been proposed, none of them is custom-designed for a common situation where both types of graph absence exist simultaneously. To fill this gap, we develop a novel graph imputation network termed revisiting initializing then refining (RITR), where both attribute-incomplete and attribute-missing samples are completed under the guidance of a novel initializing-then-refining imputation criterion. Specifically, to complete attribute-incomplete samples, we first initialize the incomplete attributes using Gaussian noise before network learning, and then introduce a structure-attribute consistency constraint to refine incomplete values by approximating a structure-attribute correlation matrix to a high-order structure matrix. To complete attribute-missing samples, we first adopt structure embeddings of attribute-missing samples as the embedding initialization, and then refine these initial values by adaptively aggregating the reliable information of attribute-incomplete samples according to a dynamic affinity structure. To the best of our knowledge, this newly designed method is the first end-to-end unsupervised framework dedicated to handling hybrid-absent graphs. Extensive experiments on six datasets have verified that our method consistently outperforms the existing state-of-the-art competitors. Our source code is available at https://github.com/WxTu/RITR.

PaperID: 1076,

Authors: Jingchen Li, Haobin Shi, Wenbai Chen, Naijun Liu, Kao-Shing Hwang

Affiliations: School of Computer Science, Northwestern Polytechnical University, Xi’an, China; School of Automation, Beijing Information Science and Technology University, Beijing, China; Department of Computer Science and Technology, Tsinghua University, Beijing, China; Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan

Title: Semi-Supervised Detection Model Based on Adaptive Ensemble Learning for Medical Images

Abstract:
Introducing deep learning technologies into the medical image processing field requires an accuracy guarantee, especially for high-resolution images relayed through endoscopes. Moreover, works relying on supervised learning are powerless when labeled samples are inadequate. Therefore, for end-to-end medical image detection with critical efficiency and accuracy requirements in endoscope detection, an ensemble-learning-based model with a semi-supervised mechanism is developed in this work. To gain a more accurate result through multiple detection models, we propose a new ensemble mechanism, termed alternative adaptive boosting method (Al-Adaboost), combining the decision-making of two hierarchical models. Specifically, the proposal consists of two modules. One is a local region proposal model with attentive temporal–spatial pathways for bounding box regression and classification, and the other is a recurrent attention model (RAM) that provides more precise inferences for further classification according to the regression result. The proposed Al-Adaboost adjusts the weights of labeled samples and of the two classifiers adaptively, and the unlabeled samples are assigned pseudolabels by our model. We investigate the performance of Al-Adaboost on both the colonoscopy and laryngoscopy data coming from CVC-ClinicDB and the affiliated hospital of Kaohsiung Medical University. The experimental results demonstrate the feasibility and superiority of our model.
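
For orientation, the sample-reweighting step of textbook AdaBoost, on which adaptive boosting variants build, can be sketched as follows. This is the classical rule for binary labels in {-1, +1}, not the Al-Adaboost update itself, whose details the abstract does not give; the function name `adaboost_reweight` is hypothetical.

```python
import math

def adaboost_reweight(weights, labels, preds):
    """One classical AdaBoost round (NOT the paper's Al-Adaboost rule).

    weights: current sample weights, assumed to sum to 1
    labels, preds: true and predicted labels in {-1, +1}
    Returns the classifier weight alpha and the renormalized sample weights.
    """
    # Weighted error of the current classifier.
    err = sum(w for w, y, h in zip(weights, labels, preds) if y != h)
    # Clamp to avoid division by zero for perfect or fully wrong classifiers.
    err = min(max(err, 1e-12), 1.0 - 1e-12)
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (y * h = -1) are up-weighted, correct ones down-weighted.
    new_w = [w * math.exp(-alpha * y * h)
             for w, y, h in zip(weights, labels, preds)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]
```

A well-known property of this rule is that, after renormalization, the misclassified samples carry half of the total weight, which forces the next classifier to focus on them.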

PaperID: 1077,

Authors: Ilya Nachevsky, Olga G. Andrianova, Isaac Chairez, Alexander S. Poznyak

Affiliations: V. A. Trapeznikov Institute of Control Sciences, Russian Academy of Sciences (RAS), Moscow, Russia; Institute of Advanced Materials for Sustainable Manufacturing, Tecnológico de Monterrey, Zapopan, Jalisco, Mexico; Automatic Control Department, CINVESTAV-IPN, Av. Instituto Politécnico Nacional, Mexico City, Mexico

Title: Differential Neural Network Identifier for Dynamical Systems With Time-Varying State Constraints

Abstract:
This study presents a nonparametric state identifier based on neural networks with continuous dynamics, also known as differential neural networks (DNNs). The laws for adjusting their parameters are developed using control barrier Lyapunov functions (BLFs). The motivation for using BLFs comes from prior information about the system states, which remain in a predefined time-dependent set characterized by state or purely time-dependent functions. In this study, the time-dependent state constraints are assumed to be continuous-time functions known in advance. The obtained learning laws require solving differential continuous-time Riccati equations and nonlinear differential equations that depend on the identification error and the state restrictions. The developed identifier was evaluated against an identifier that does not consider the state restrictions. This comparison included the numerical evaluation of the identifier for a robotic arm intended to reproduce a nonstandard flight simulator. This evaluation confirmed that the identification results were improved using the proposed learning laws and that the state limits were not transgressed. The quality indicators based on the mean square error were 4.2 times smaller.

PaperID: 1078,

Authors: Tomas Kulvicius, Minija Tamosiunaite, Florentin Wörgötter

Affiliations: Department for Computational Neuroscience, and the University Medical Center Göttingen, Systemic Ethology and Developmental Science, Child and Adolescent Psychiatry and Psychotherapy, University of Göttingen, Göttingen, Germany; Department for Computational Neuroscience, University of Göttingen, Göttingen, Germany

Title: Combining Optimal Path Search With Task-Dependent Learning in a Neural Network

Abstract:
Finding optimal paths in connected graphs requires determining the smallest total cost for traveling along the graph’s edges. This problem can be solved by several classical algorithms, where, usually, costs are predefined for all edges. Conventional planning methods can, thus, normally not be used when costs need to change adaptively following the requirements of some task. Here, we show that one can define a neural network representation of path-finding problems by transforming cost values into synaptic weights, which allows for online weight adaptation using network learning mechanisms. When starting with an initial activity value of one, activity propagation in this network will lead to solutions that are identical to those found by the Bellman–Ford (BF) algorithm. The neural network has the same algorithmic complexity as BF, and, in addition, we can show that network learning mechanisms (such as Hebbian learning) can adapt the weights in the network, augmenting the resulting paths according to some task at hand. We demonstrate this by learning to navigate in an environment with obstacles as well as by learning to follow certain sequences of path nodes. Hence, the novel algorithm presented here may open up a different regime of applications where path augmentation (by learning) is directly coupled with path finding in a natural way.
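
The correspondence between cost minimization and activity propagation can be illustrated with a small sketch. The abstract does not state the cost-to-weight transformation, so the mapping w = exp(-cost) with max-product propagation used below is an assumption chosen only because it provably reproduces Bellman–Ford distances for positive costs; the function names are hypothetical.

```python
import math

def bellman_ford(n, edges, source):
    """Classical Bellman-Ford on directed edges (u, v, cost)."""
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(n - 1):
        for u, v, c in edges:
            if dist[u] + c < dist[v]:
                dist[v] = dist[u] + c
    return dist

def propagate_activity(n, edges, source):
    """Max-product activity propagation (assumed mapping, not the paper's).

    Each cost c becomes a weight exp(-c) in (0, 1]. The source starts with
    activity 1; every node keeps the largest weighted activity it receives.
    A product of weights equals exp(-sum of costs), so the maximal activity
    corresponds to the minimal path cost.
    """
    weights = [(u, v, math.exp(-c)) for u, v, c in edges]
    act = [0.0] * n
    act[source] = 1.0
    for _ in range(n - 1):
        new_act = act[:]
        for u, v, w in weights:
            new_act[v] = max(new_act[v], act[u] * w)
        act = new_act
    return act

# Small directed example graph: (from, to, cost).
example_edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
distances = bellman_ford(4, example_edges, 0)
activities = propagate_activity(4, example_edges, 0)
# Recover path costs from activities via cost = -log(activity).
recovered = [-math.log(a) for a in activities]
```

Under this assumed mapping, changing a weight (for example, by a learning rule) changes the effective edge cost, which is the kind of coupling between learning and path finding the abstract describes.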

PaperID: 1079,

Authors: Yucheng Wang, Min Wu, Ruibing Jin, Xiaoli Li, Lihua Xie, Zhenghua Chen

Affiliations: Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Connexis, Singapore; School of Electrical and Electronic Engineering, Nanyang Technological University, Jurong West, Singapore

Title: Local-Global Correlation Fusion-Based Graph Neural Network for Remaining Useful Life Prediction

Abstract:
Remaining useful life (RUL) prediction is an essential component for prognostics and health management of a system. Due to their powerful nonlinear modeling ability, deep learning (DL) models have emerged as leading solutions by capturing temporal dependencies within time series sensory data. However, in RUL prediction tasks, data are typically collected from multiple sensors, introducing spatial dependencies in the form of sensor correlations. Existing methods are limited in effectively modeling and capturing the spatial dependencies, restricting their ability to learn representative features for RUL prediction. To overcome these limitations, we propose a novel LOcal–GlObal correlation fusion-based framework (LOGO). Our approach combines both local and global information to model sensor correlations effectively. From a local perspective, we account for local correlations that represent dynamic changes of sensor relationships in local ranges. Simultaneously, from a global perspective, we capture global correlations that depict relatively stable relations between sensors. An adaptive fusion mechanism is proposed to automatically fuse the correlations from different perspectives. Subsequently, we define sequential micrographs for each sample to effectively capture the fused correlations. A graph neural network (GNN) is introduced to capture the spatial dependencies within each micrograph, and the temporal dependencies between these sequential micrographs are then captured. This approach allows us to effectively model and capture the dependency information within the data for accurate RUL prediction. Extensive experiments have been conducted, verifying the effectiveness of our method.

PaperID: 1080,

Authors: Dongseok Kwon, Sung Yun Woo, Joon Hwang, Hyeongsu Kim, Jong-Ho Bae, Wonjun Shin, Byung-Gook Park, Jong-Ho Lee

Affiliations: School of Electrical and Computer Engineering, Inter-University Semiconductor Research Center (ISRC), Seoul National University, Seoul, South Korea; School of Electronics Engineering, Kyungpook National University, Daegu, Republic of Korea; School of Electrical Engineering, Kookmin University, Seoul, South Korea; Ministry of Science and ICT, Sejong, Republic of Korea

Title: Efficient Hybrid Training Method for Neuromorphic Hardware Using Analog Nonvolatile Memory

Abstract:
Neuromorphic hardware using nonvolatile analog synaptic devices provides promising advantages of reducing energy and time consumption for performing large-scale vector-matrix multiplication (VMM) operations. However, the reported training methods for neuromorphic hardware have shown appreciably reduced accuracy due to the nonideal nature of analog devices, and they use conductance tuning protocols that require substantial cost for training. Here, we propose a novel hybrid training method that efficiently trains the neuromorphic hardware using nonvolatile analog memory cells, and experimentally demonstrate the high performance of the method using the fabricated hardware. Our training method does not rely on the conductance tuning protocol to reflect weight updates to analog synaptic devices, which significantly reduces online training costs. When the proposed method is applied, the accuracy of the hardware-based neural network approaches that of the software-based neural network after only one epoch of training, even if the fabricated synaptic array is trained for only the first synaptic layer. Also, the proposed hybrid training method can be efficiently applied to low-power neuromorphic hardware, including various types of synaptic devices whose weight update characteristics are extremely nonlinear. This successful demonstration of the proposed method in the fabricated hardware shows that neuromorphic hardware using nonvolatile analog memory cells is a promising platform for future artificial intelligence.

PaperID: 1081,

Authors: Haobo Jiang, Kaihao Lan, Le Hui, Guangyu Li, Jin Xie, Shangbing Gao, Jian Yang

Affiliations: PCA Laboratory, Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and the Jiangsu Key Laboratory of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, China

Title: Point Cloud Registration-Driven Robust Feature Matching for 3-D Siamese Object Tracking

Abstract:
Learning robust feature matching between the template and search area is crucial for 3-D Siamese tracking. The core of Siamese feature matching is how to assign high feature similarity to the corresponding points between the template and the search area for precise object localization. In this article, we propose a novel point cloud registration-driven Siamese tracking framework, with the intuition that spatially aligned corresponding points (via 3-D registration) tend to achieve consistent feature representations. Specifically, our method consists of two modules: a tracking-specific nonlocal registration (TSNR) module and a registration-aided Sinkhorn template-feature aggregation module. The registration module targets the precise spatial alignment between the template and the search area. A tracking-specific spatial distance constraint is proposed to refine the cross-attention weights in the nonlocal module for discriminative feature learning. Then, we use the weighted singular value decomposition (SVD) to compute the rigid transformation between the template and the search area and align them to achieve the desired spatially aligned corresponding points. For the feature aggregation module, we formulate the feature matching between the transformed template and the search area as an optimal transport problem and utilize the Sinkhorn optimization to search for the outlier-robust matching solution. Also, a registration-aided spatial distance map is built to improve the matching robustness in indistinguishable regions (e.g., smooth surfaces). Finally, guided by the obtained feature matching map, we aggregate the target information from the template into the search area to construct the target-specific feature, which is then fed into a CenterPoint-like detection head for object localization. Extensive experiments on KITTI, NuScenes, and Waymo datasets verify the effectiveness of our proposed method.
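
To make the optimal transport step concrete, the following minimal sketch runs plain entropy-regularized Sinkhorn iterations on a small matching cost matrix. It assumes uniform marginals and omits any outlier handling, so it illustrates the generic Sinkhorn scaling scheme rather than the paper's aggregation module; the function name `sinkhorn` and the parameters `eps` and `iters` are hypothetical.

```python
import math

def sinkhorn(cost, eps=0.5, iters=200):
    """Entropic optimal transport with uniform marginals (generic sketch).

    cost: n x m matrix of matching costs (e.g., feature distances)
    eps: entropy regularization strength
    Returns an n x m soft matching (transport plan) whose rows sum to 1/n
    and whose columns sum to 1/m, up to numerical tolerance.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # row marginals
    b = [1.0 / m] * m  # column marginals
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Example: template point i is cheapest to match with search point i.
example_cost = [[0.1, 0.9, 0.8],
                [0.7, 0.2, 0.9],
                [0.9, 0.6, 0.1]]
plan = sinkhorn(example_cost)
```

Smaller values of `eps` give sharper, closer to one-to-one matchings at the price of slower convergence, which is why the regularization strength is usually treated as a tuning parameter.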

PaperID: 1082,

Authors: Xingcai Zhou, Le Chang, Jinde Cao

Affiliations: School of Statistics and Data Science, Nanjing Audit University, Nanjing, China; School of Mathematics, Southeast University, Nanjing, China

Title: Communication-Efficient Nonconvex Federated Learning With Error Feedback for Uplink and Downlink

Abstract:
In large-scale online learning, the reliance on sophisticated model architectures often leads to nonconvex distributed optimization, which is more challenging than convex problems. Online recruited workers, such as mobile phones, laptops, and desktop computers, often have narrower uplink bandwidths than downlink bandwidths. In this article, we propose two communication-efficient nonconvex federated learning algorithms with error feedback 2021 (EF21) and lazily aggregated gradient (LAG) for adapting uplink and downlink communications. EF21 is a new and theoretically better EF, which consistently and substantially outperforms vanilla EF in practice. LAG is a gradient filtration technique for adapting communication. To reduce the communication costs of the uplink, we design an effective LAG rule and then give the EF21 with LAG (EF-LAG) algorithm, which combines EF21 and our LAG rule. We also present a bidirectional EF-LAG (BiEF-LAG) algorithm for reducing uplink and downlink communication costs. Theoretically, our proposed algorithms enjoy the same fast convergence rate $\mathcal{O}(1/T)$ as gradient descent (GD) for smooth nonconvex learning. That is, our algorithms greatly reduce communication costs without sacrificing the quality of learning. Numerical experiments on both synthetic data and deep learning benchmarks show significant empirical superiority of our algorithms in communication.
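
The interplay of error feedback and lazy uploads can be sketched on a toy problem. The code below follows the commonly stated EF21 scheme (each worker uploads a compressed difference between its fresh gradient and its stored gradient estimate) with a top-k compressor, and adds a simple skip rule governed by `lazy_tol`. That skip rule is a hypothetical stand-in, since the abstract does not specify the paper's LAG rule; the quadratic objectives and all names are likewise assumptions for illustration.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero the rest."""
    keep = sorted(range(len(vec)), key=lambda j: abs(vec[j]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for j in keep:
        out[j] = vec[j]
    return out

def ef21_lazy(targets, steps=600, lr=0.05, k=1, lazy_tol=0.0):
    """EF21-style compressed gradient descent with a hypothetical lazy rule.

    Worker i holds f_i(x) = 0.5 * ||x - targets[i]||^2, so the minimizer of
    the average objective is the mean of the targets.
    Returns the final iterate and the number of uplink messages sent.
    """
    n, d = len(targets), len(targets[0])
    x = [0.0] * d
    g_local = [[0.0] * d for _ in range(n)]  # per-worker gradient estimates
    g_server = [0.0] * d                     # server average of the estimates
    uploads = 0
    for _ in range(steps):
        # Server step using the current aggregated estimate.
        x = [xj - lr * gj for xj, gj in zip(x, g_server)]
        for i, a in enumerate(targets):
            grad = [xj - aj for xj, aj in zip(x, a)]
            residual = [gr - gl for gr, gl in zip(grad, g_local[i])]
            # Hypothetical lazy rule: skip the upload if little has changed.
            if sum(r * r for r in residual) <= lazy_tol:
                continue
            msg = top_k(residual, k)  # compressed uplink message
            uploads += 1
            g_local[i] = [gl + m for gl, m in zip(g_local[i], msg)]
            g_server = [gs + m / n for gs, m in zip(g_server, msg)]
    return x, uploads

example_targets = [[1.0, 2.0, 3.0], [3.0, 0.0, -1.0], [-1.0, 4.0, 1.0]]
x_final, n_uploads = ef21_lazy(example_targets)
```

With `lazy_tol=0.0` the sketch reduces to EF21 with top-k compression; a positive `lazy_tol` trades a small loss of final accuracy for fewer uplink messages.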

PaperID: 1083,

Authors: Peng Wan, Yufeng Zhou, Zhigang Zeng

Affiliations: School of Information Science and Engineering and the Engineering Research Center of Metallurgical Automation and Measurement Technology, Wuhan University of Science and Technology, Wuhan, China; School of Artificial Intelligence and Automation and the Key Laboratory of Image Processing and Intelligent Control of Education Ministry of China, Huazhong University of Science and Technology, Wuhan, China

Title: Adaptive Drive-Response Synchronization of Timescale-Type Neural Networks With Unbounded Time-Varying Delays

Abstract:
In recent years, adaptive drive-response synchronization (DRS) of two continuous-time delayed neural networks (NNs) has been investigated extensively. For two timescale-type NNs (TNNs), how to develop adaptive synchronization control schemes and prove them rigorously is still an open problem. This article concentrates on adaptive control design for synchronization of TNNs with unbounded time-varying delays. First, a timescale-type Barbalat lemma and novel timescale-type inequality techniques are proposed, which provide practical methods to investigate timescale-type nonlinear systems. Second, using timescale-type calculus, the novel timescale-type inequality, and the timescale-type Barbalat lemma, we demonstrate that global asymptotic synchronization can be achieved via adaptive control under algebraic and matrix inequality criteria even if the time-varying delays are unbounded and nondifferentiable. Adaptive DRS is discussed for TNNs, which implies that our control schemes are suitable for continuous-time NNs, their discrete-time counterparts, and any combination of them. Finally, numerical examples on TNNs and a timescale-type chaotic Ikeda-like oscillator with unbounded time-varying delays are carried out to verify the adaptive control schemes.

PaperID: 1084,

Authors: Zan Gao, Hongwei Wei, Weili Guan, Jie Nie, Meng Wang, Shengyong Chen

Affiliations: Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Faculty of Information Technology, Monash University, Clayton, VIC, Australia; College of Information Science and Engineering, Ocean University of China, Qingdao, China; School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China; Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology, Tianjin, China

Title: A Semantic-Aware Attention and Visual Shielding Network for Cloth-Changing Person Re-Identification

Abstract:
Cloth-changing person re-identification (ReID) is a newly emerging research topic that aims to retrieve pedestrians whose clothes are changed. Since the human appearance with different clothes exhibits large variations, it is very difficult for existing approaches to extract discriminative and robust feature representations. Current works mainly focus on body shape or contour sketches, but the human semantic information and the potential consistency of pedestrian features before and after changing clothes are not fully explored or are ignored. To solve these issues, in this work, a novel semantic-aware attention and visual shielding network for cloth-changing person ReID (abbreviated as SAVS) is proposed where the key idea is to shield clues related to the appearance of clothes and only focus on visual semantic information that is not sensitive to view/posture changes. Specifically, a visual semantic encoder is first employed to locate the human body and clothing regions based on human semantic segmentation information. Then, a human semantic attention (HSA) module is proposed to highlight the human semantic information and reweight the visual feature map. In addition, a visual clothes shielding (VCS) module is also designed to extract a more robust feature representation for the cloth-changing task by covering the clothing regions and focusing the model on the visual semantic information unrelated to the clothes. Most importantly, these two modules are jointly explored in an end-to-end unified framework. Extensive experiments demonstrate that the proposed method can significantly outperform state-of-the-art methods, and more robust features can be extracted for cloth-changing persons. Compared with multibiometric unified network (MBUNet) (published in TIP2023), this method can achieve improvements of 17.5% (30.9%) and 8.5% (10.4%) on the LTCC and Celeb-reID datasets in terms of mean average precision (mAP) (rank-1), respectively. 
When compared with the Swin Transformer (Swin-T), the improvements can reach 28.6% (17.3%), 22.5% (10.0%), 19.5% (10.2%), and 8.6% (10.1%) on the PRCC, LTCC, Celeb, and NKUP datasets in terms of rank-1 (mAP), respectively.

PaperID: 1085,

Authors: Chunyan Xiong, Chaoxing Zhang, Mengli Lu, Xiaotong Yu, Jian Cao, Zhong Chen, Di Guo, Xiaobo Qu

Affiliations: Institute of Electromagnetics and Acoustics School of Electronic Science and Engineering, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen, China; School of Electronic Science and Engineering, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Biomedical Intelligent Cloud Research and Development Center, Xiamen University, Xiamen, China; School of Computer and Information Engineering, University of Technology, Xiamen, China

Title: Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks With Soft-Thresholding

Abstract:
Soft-thresholding has been widely used in neural networks. Its basic network structure is a two-layer convolutional neural network with soft-thresholding. Because the network is nonlinear and nonconvex, the training process heavily depends on an appropriate initialization of network parameters, which makes it difficult to obtain a globally optimal solution. To address this issue, a convex dual network is designed here. We theoretically analyze the network convexity and prove that strong duality holds. Extensive results on both simulation and real-world datasets show that strong duality holds, that the dual network does not depend on initialization or the optimizer, and that it converges faster than the state-of-the-art two-layer network. This work provides a new way to convexify soft-thresholding neural networks. Furthermore, the convex dual network model of a deep soft-thresholding network with a parallel structure is deduced.
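
The abstract does not spell out the soft-thresholding operator, so the sketch below uses the standard definition, sign(x) * max(|x| - lam, 0), as an assumption about what the network's activation looks like. The function name `soft_threshold` and the sample inputs are illustrative only.

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink every entry of x toward zero by lam; entries with |x| <= lam become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Illustrative inputs: small values are zeroed, large ones are shrunk by lam.
sample = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
shrunk = soft_threshold(sample, 1.0)
```

The operator is piecewise linear but not linear, which is one source of the nonconvexity in the training problem that the abstract refers to.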

PaperID: 1086,

Authors: Dawen Wu, Abdel Lisser

Affiliations: CNRS, CentraleSupélec, Laboratoire des Signaux et Systèmes, Université Paris-Saclay, Gif-sur-Yvette, France

Title: Parallel Solution of Nonlinear Projection Equations in a Multitask Learning Framework

Abstract:
Nonlinear projection equations (NPEs) provide a unified framework for addressing various constrained nonlinear optimization and engineering problems. However, when it comes to solving multiple NPEs, traditional numerical integration methods are not efficient enough. This is because traditional methods solve each NPE iteratively and independently. In this article, we propose a novel approach based on multitask learning (MTL) for solving multiple NPEs. The solution procedure is outlined as follows. First, we model each NPE as a system of ordinary differential equations (ODEs) using neurodynamic optimization. Second, for each ODE system, we use a physics-informed neural network (PINN) as the solution. Third, we use a multibranch MTL framework, where each branch corresponds to a PINN model. This allows us to solve multiple NPEs in parallel by training a single neural network model. Experimental results show that our approach has superior computational performance, especially when the number of NPEs to be solved is large.

PaperID: 1087,

Authors: Zhaohui Qi, Yingqiang Ning, Lin Xiao, Zidong Wang, Yongjun He

Affiliations: Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing and MOE-LCSM, Hunan Normal University, Changsha, China; Department of Computer Science, Brunel University London, Middlesex, Uxbridge, U.K.

Title: Efficient Predefined-Time Adaptive Neural Networks for Computing Time-Varying Tensor Moore-Penrose Inverse

Abstract:
This article proposes predefined-time adaptive neural network (PTANN) and event-triggered PTANN (ET-PTANN) models to efficiently compute the time-varying tensor Moore–Penrose (MP) inverse. The PTANN model incorporates a novel adaptive parameter and activation function, enabling it to achieve strongly predefined-time convergence. Unlike traditional time-varying parameters that increase over time, the adaptive parameter is proportional to the error norm, thereby better allocating computational resources and improving efficiency. To further enhance efficiency, the ET-PTANN model combines an event trigger with the evolution formula, resulting in the adjustment of step size and reduction of computation frequency compared to the PTANN model. By conducting mathematical derivations, the article derives the upper bound of convergence time for the proposed neural network models and determines the minimum execution interval for the event trigger. A simulation example demonstrates that the PTANN and ET-PTANN models outperform other related neural network models in terms of computational efficiency and convergence rate. Finally, the practicality of the PTANN and ET-PTANN models is demonstrated through their application for mobile sound source localization.

PaperID: 1088,

Authors: Jisheng Dang, Huicheng Zheng, Xiaohao Xu, Longguang Wang, Qingyong Hu, Yulan Guo

Affiliations: School of Computer Science and Engineering, the Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, and the Guangdong Key Laboratory of Information Security Technology, Sun Yat-sen University, Guangzhou, China; Robotics Institute, University of Michigan, Ann Arbor, MI, USA; College of Electronic Science and Technology, National University of Defense Technology, Changsha, China; Department of Computer Science, University of Oxford, Oxford, U.K; School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen, China

Title: Adaptive Sparse Memory Networks for Efficient and Robust Video Object Segmentation

Abstract:
Recently, memory-based networks have achieved promising performance for video object segmentation (VOS). However, existing methods still suffer from unsatisfactory segmentation accuracy and inferior efficiency. The reasons are mainly twofold: 1) during memory construction, the inflexible memory storage mechanism results in a weak discriminative ability for similar appearances in complex scenarios, leading to video-level temporal redundancy, and 2) during memory reading, matching robustness and memory retrieval accuracy decrease as the number of video frames increases. To address these challenges, we propose an adaptive sparse memory network (ASM) that efficiently and effectively performs VOS by sparsely leveraging previous guidance while attending to key information. Specifically, we design an adaptive sparse memory constructor (ASMC) to adaptively memorize informative past frames according to dynamic temporal changes in video frames. Furthermore, we introduce an attentive local memory reader (ALMR) to quickly retrieve relevant information using a subset of memory, thereby reducing frame-level redundant computation and noise in a simpler and more convenient manner. To prevent key features from being discarded by the subset of memory, we further propose a novel attentive local feature aggregation (ALFA) module, which preserves useful cues by selectively aggregating discriminative spatial dependence from adjacent frames, thereby effectively increasing the receptive field of each memory frame. Extensive experiments demonstrate that our model achieves state-of-the-art performance with real-time speed on six popular VOS benchmarks. Furthermore, our ASM can be applied to existing memory-based methods as generic plugins to achieve significant performance improvements. More importantly, our method exhibits robustness in handling sparse videos with low frame rates.

PaperID: 1089,

Authors: Qiong Wu, Jiahan Li, Pingyang Dai, Qixiang Ye, Liujuan Cao, Yongjian Wu, Rongrong Ji

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Institute of Artificial Intelligence, Xiamen University, Xiamen, China; Faculty of Computing, Harbin Institute of Technology, Harbin, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen, China; Peng Cheng Laboratory, Shenzhen, China; Youtu Laboratory, Tencent, Shanghai, China

Title: Unsupervised Domain Adaptation on Person Reidentification Via Dual-Level Asymmetric Mutual Learning

Abstract:
Unsupervised domain adaptation (UDA) person reidentification (Re-ID) aims to identify pedestrian images within an unlabeled target domain with an auxiliary labeled source-domain dataset. Many existing works attempt to recover reliable identity information by considering multiple homogeneous networks and then use the generated labels to train the model in the target domain. However, these homogeneous networks identify people in approximate subspaces and equally exchange their knowledge with others or their mean net to improve their ability, which inevitably limits the scope of available knowledge and leads them to make the same mistakes. This article proposes a dual-level asymmetric mutual learning (DAML) method to learn discriminative representations from a broader knowledge scope with diverse embedding spaces. Specifically, two heterogeneous networks mutually learn knowledge from asymmetric subspaces through the pseudo label generation in a hard distillation manner. The knowledge transfer between the two networks is based on an asymmetric mutual learning (AML) manner. The teacher network learns to identify both the target and source domain while adapting to the target domain distribution based on the knowledge of the student. Meanwhile, the student network is trained on the target dataset and employs the ground-truth label through the knowledge of the teacher. Extensive experiments on the Market-1501, CUHK-SYSU, and MSMT17 public datasets verify the superiority of DAML over state-of-the-art (SOTA) methods.

PaperID: 1090,

Authors: Cheng Yu, Jiansheng Chen, Yu Wang, Youze Xue, Huimin Ma

Affiliations: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China; Department of Electronic Engineering, Tsinghua University, Beijing, China

Title: Improving Adversarial Robustness Against Universal Patch Attacks Through Feature Norm Suppressing

Abstract:
Universal adversarial patch attacks, which are readily implemented, have been shown to fool real-world deep convolutional neural networks (CNNs), posing a serious threat to practical computer vision systems based on CNNs. Unfortunately, current defense approaches are severely understudied and face the following problems. Patch detection–based methods suffer from dramatic performance drops against white-box or adaptive attacks since they rely heavily on empirical clues. Methods based on adversarial training or certified defense are difficult to scale up to large-scale datasets or complex practical networks due to prohibitively high computational overhead or overly strong assumptions on the network structure. In this article, we focus on two cases of widely adopted universal adversarial patch attacks, namely the universal targeted attack on image classifiers and the universal vanishing attack on object detectors. We find that, for popular CNNs, the attacking success of the adversarial patch relies on feature vectors centered at the patch location with large norm in classifiers and large channel-aware norm (CA-Norm) in detectors, and further present a mathematical explanation for this phenomenon. Based on this, we propose a simple but effective defense method using the feature norm suppressing (FNS) layer, which can renormalize the feature norm by nonincreasing functions. As a differentiable module, FNS can be adaptively inserted in various CNN architectures to achieve multistage suppression of the generation of large norm feature vectors. Moreover, FNS is efficient with no trainable parameters and very low computational overhead. We evaluate our proposed defense method across multiple CNN architectures and datasets against strong adaptive white-box attacks in both visual classification and detection tasks.
In both tasks, FNS significantly outperforms previous defending methods on adversarial robustness with a relatively low influence on the performance of benign images. Code is available at https://github.com/jschenthu/FNS.

PaperID: 1091,

Authors: Yang Liu, Xiaoqi Wang, Xi Wang, Zhen Wang, Jürgen Kurths

Affiliations: School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, China; School of Mechanical Engineering, Northwestern Polytechnical University, Xi’an, China; Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, CA, USA; School of Artificial Intelligence and the School of Cybersecurity, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, China; Potsdam Institute for Climate Impact Research, Potsdam, Germany

Title: Diffusion Source Inference for Large-Scale Complex Networks Based on Network Percolation

Abstract:
This article studies the diffusion-source-inference (DSI) problem, whose solution plays an important role in real-world scenarios such as combating misinformation and controlling diffusions of information or disease. The main task of the DSI problem is to optimize an estimator, such that the real source can be more precisely targeted. In this article, we assume that the state of a number of nodes, called the observer set, in a network could be investigated if necessary, and study what configuration of those nodes could facilitate a better solution for the DSI problem. In particular, we find that the conventional error distance metric cannot precisely evaluate the effectiveness of varied DSI approaches in heterogeneous networks, and thus propose a novel and more general measurement, the candidate set, which is formulated so that it is guaranteed to contain the diffusion source. We propose the percolation-based evolutionary framework (PrEF) to optimize the observer set such that the candidate set can be minimized. Hence, one could further conduct more intensive investigation or search on only a few nodes to target the source. To achieve that, we first theoretically show that the size of the candidate set is bounded by the size of the largest component cover, and demonstrate that there are some similarities between the DSI problem and the network immunization problem. We find that, given that the associated direction information of the diffusion is known on observers, the minimization of the candidate set is equivalent to the minimization of the order parameter if we view the observer set as the removal node set. Hence, PrEF is developed based on the network percolation and evolutionary algorithm. The effectiveness of the proposed method is validated on both synthetic and empirical networks in regard to varied circumstances.
Our results show that the developed approach could achieve much smaller candidate sets compared to the state of the art in almost all cases, e.g., it is better in 26 out of 27 empirical networks and 155 out of 162 cases regarding the critical threshold. Meanwhile, our approach is also more stable, i.e., it works well irrespective of varied infection probabilities, diffusion models, and underlying networks. More importantly, we provide a framework for the analysis of the DSI problem in large-scale networks.

PaperID: 1092,

Authors: Lei Zhao, Lin Cai, Wu-Sheng Lu

Affiliations: Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada

Title: Federated Learning for Data Trading Portfolio Allocation With Autonomous Economic Agents

Abstract:
In the rapidly advancing ubiquitous intelligence society, the role of data as a valuable resource has become paramount. As a result, there is a growing need for the development of autonomous economic agents (AEAs) capable of intelligently and autonomously trading data. These AEAs are responsible for acquiring, processing, and selling data to entities such as software companies. To ensure optimal profitability, an intelligent AEA must carefully allocate its portfolio, relying on accurate return estimation and well-designed models. However, a significant challenge arises due to the sensitive and confidential nature of data trading. Each AEA possesses only limited local information, which may not be sufficient for training a robust and effective portfolio allocation model. To address this limitation, we propose a novel data trading market where AEAs exclusively possess local market information. To overcome the information constraint, AEAs employ federated learning (FL) that allows multiple AEAs to jointly train a model capable of generating promising portfolio allocations for multiple data products. To account for the dynamic and ever-changing revenue returns, we introduce an integration of the histogram of oriented gradients (HoGs) with the discrete wavelet transformation (DWT). This innovative combination serves to redefine the representation of local market information to effectively handle the inherent nonstationarity of revenue patterns associated with data products. Furthermore, we leverage the transform domain of local model drifts in the global model update process, effectively reducing the communication burden and significantly improving training efficiency. Through simulations, we provide compelling evidence that our proposed schemes deliver superior performance across multiple evaluation metrics, including test loss, cumulative return, portfolio risk, and Sharpe ratio.

PaperID: 1093,

Authors: Jiaojiao Li, Songcheng Du, Rui Song, Yunsong Li, Qian Du

Affiliations: State Key Laboratory of Integrated Service Networks, Xidian University, Xi'an, China; Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS, USA

Title: Progressive Spatial Information-Guided Deep Aggregation Convolutional Network for Hyperspectral Spectral Super-Resolution

Abstract:
Fusion-based spectral super-resolution aims to yield a high-resolution hyperspectral image (HR-HSI) by integrating the available high-resolution multispectral image (HR-MSI) with the corresponding low-resolution hyperspectral image (LR-HSI). With the rise of deep convolutional neural networks, many fusion methods have achieved breakthroughs in reconstruction performance. Nevertheless, due to inadequate and improper utilization of cross-modality information, most current state-of-the-art (SOTA) fusion-based methods cannot produce satisfactory recovery quality and yield the desired results only at small upsampling scales, which limits their practical application. In this article, we propose a novel progressive spatial information-guided deep aggregation convolutional neural network (SIGnet) for enhancing the performance of hyperspectral image (HSI) spectral super-resolution (SSR), whose backbone is built from several dense residual channel affinity learning (DRCA) blocks cooperating with a spatial-guided propagation (SGP) module. Specifically, the DRCA block consists of an encoding part and a decoding part connected by a channel affinity propagation (CAP) module and several cross-layer skip connections. In detail, the CAP module exploits the channel affinity matrix to model correlations among channels of the feature maps and aggregate the channel-wise interdependencies of the middle layers, thereby further boosting the reconstruction accuracy. Additionally, to efficiently utilize the information from the two modalities, we develop an innovative SGP module equipped with a simulation of the degradation part and a deformable adaptive fusion part, which is capable of progressively refining the coarse HSI feature maps at the pixel level. Extensive experimental results demonstrate the superiority of our proposed SIGnet over several SOTA fusion-based algorithms.

PaperID: 1094,

Authors: Zichen Zhang, Yongquan Dong, Wei-Chiang Hong

Affiliations: School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, China; Department of Information Management, Asia Eastern University of Science and Technology, New Taipei, Taiwan

Title: Long Short-Term Memory-Based Twin Support Vector Regression for Probabilistic Load Forecasting

Abstract:
A probabilistic load forecast that is accurate and reliable is crucial not only to the efficient operation of power systems but also to the efficient use of energy resources. In order to estimate the uncertainties in forecasting models and nonstationary electric load data, this study proposes a probabilistic load forecasting model, namely BFEEMD-LSTM-TWSVRSOA. This model consists of a data filtering method named fast ensemble empirical mode decomposition (FEEMD), a twin support vector regression (TWSVR) whose features are extracted by deep learning-based long short-term memory (LSTM) networks, and parameters optimized by seeker optimization algorithms (SOAs). We compared the probabilistic forecasting performance of the BFEEMD-LSTM-TWSVRSOA and its point forecasting version with different machine learning and deep learning algorithms on Global Energy Forecasting Competition 2014 (GEFCom2014). The most representative monthly data of each season (four months in total) are collected from the one-year data in GEFCom2014, forming four datasets. Several bootstrap methods are compared in order to determine the best prediction intervals (PIs) for the proposed model. Various forecasting step sizes are also taken into consideration in order to obtain the most satisfactory point forecasting results. Experimental results on these four datasets indicate that the wild bootstrap method and 24-h step size are the best bootstrap method and forecasting step size for the proposed model. On average, the proposed model performs 46%, 11%, 36%, and 44% better than the suboptimal model on these four datasets with respect to point forecasting, and 53%, 48%, 46%, and 51% better with respect to probabilistic forecasting.

PaperID: 1095,

Authors: Sandipan Dhar, Nanda Dulal Jana, Swagatam Das

Affiliations: Department of Computer Science and Engineering, National Institute of Technology Durgapur, Durgapur, West Bengal, India; Institute for Advancing Intelligence (IAI), TCG CREST, Kolkata, India

Title: GLGAN-VC: A Guided Loss-Based Generative Adversarial Network for Many-to-Many Voice Conversion

Abstract:
Many-to-many voice conversion (VC) is a technique aimed at mapping speech features between multiple speakers during training and transferring the vocal characteristics of one source speaker to another target speaker, all while maintaining the content of the source speech unchanged. Existing research highlights a notable gap between the original and generated speech samples in terms of naturalness within many-to-many VC. Therefore, there is substantial room for improvement in achieving more natural-sounding speech samples for both parallel and nonparallel VC scenarios. In this study, we introduce a generative adversarial network (GAN) system with a guided loss (GLGAN-VC) designed to enhance many-to-many VC by focusing on architectural improvements and the integration of alternative loss functions. Our approach includes a pair-wise downsampling and upsampling (PDU) generator network for effective speech feature mapping (FM) in multidomain VC. In addition, we incorporate an FM loss to preserve content information and a residual connection (RC)-based discriminator network to improve learning. A guided loss (GL) function is introduced to efficiently capture differences in latent feature representations between source and target speakers, and an enhanced reconstruction loss is proposed for better contextual information preservation. We evaluate our model on various datasets, including VCC 2016, VCC 2018, VCC 2020, and an emotional speech dataset (ESD). Our results, based on both subjective and objective evaluation metrics, demonstrate that our model outperforms state-of-the-art (SOTA) many-to-many GAN-based VC models in terms of speech quality and speaker similarity in the generated speech samples.

PaperID: 1096,

Authors: Kunyang Lin, Yufeng Wang, Peihao Chen, Runhao Zeng, Yinjie Lei, Siyuan Zhou, Qing Du, Mingkui Tan, Chuang Gan

Affiliations: School of Software Engineering, South China University of Technology, Guangzhou, China; School of Future Technology, South China University of Technology, Guangzhou, China; Artificial Intelligence Research Institute, Shenzhen MSU-BIT University, Shenzhen, China; College of Electronics and Information Engineering, Sichuan University, Chengdu, China; The Hong Kong University of Science and Technology, Sai Kung, Hong Kong; UMass Amherst, Amherst, MA, USA

Title: When to Align: Dynamic Behavior Consistency for Multiagent Systems via Intrinsic Rewards

Abstract:
In multiagent systems, learning optimal behavior policies for individual agents remains a challenging yet crucial task. While recent research has made strides in this area, the issue of when agents should maintain consistent behaviors with one another is still not adequately addressed. This article proposes a novel approach to enable agents to autonomously decide whether their behaviors should align with those of their peers by leveraging intrinsic rewards to optimize their policies. We define behavior consistency as the divergence between the actions taken by two agents given the same observations. To encourage agents to be aware of each other’s behaviors, we propose dynamic consistency-based intrinsic reward (DCIR), which guides agents in determining when to synchronize their behaviors. In addition, we introduce a dynamic scaling network (DSN) that provides learnable scaling factors at each time step, enabling agents to dynamically decide the extent of rewarding consistent behavior. Our method is evaluated on environments including Multiagent Particle, Google Research Football, and StarCraft II Micromanagement. Experimental results demonstrate its effectiveness in learning optimal policies.
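
The notion of behavior consistency as a divergence between two agents' actions under the same observation can be made concrete with a small numerical sketch. The abstract does not state which divergence is used, so the Kullback-Leibler divergence below is an assumption, and the names `softmax`, `kl_divergence`, and the example logits are hypothetical; the DCIR reward and the dynamic scaling network are not modelled here.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of action logits into a probability distribution."""
    z = logits - logits.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions over the same action set."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Two agents' (hypothetical) policy outputs for the same observation.
policy_a = softmax(np.array([2.0, 0.5, -1.0]))
policy_b = softmax(np.array([0.0, 1.5, 0.2]))

consistency_gap = kl_divergence(policy_a, policy_b)   # larger means less consistent
self_gap = kl_divergence(policy_a, policy_a)          # identical behavior gives zero
```

A value of zero means the two agents would act identically on that observation; how such a quantity is scaled and turned into an intrinsic reward is specific to the paper and is not reproduced here.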

PaperID: 1097,

Authors: Die Hu, Shuyue Hu, Chunjiang Mu, Shiqi Fan, Chen Chu, Jinzhuo Liu, Zhen Wang

Affiliations: School of Mechanical Engineering, Northwestern Polytechnical University, Xi’an, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China; School of Cybersecurity, Northwestern Polytechnical University, Xi’an, China; Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong, China; School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, China

Title: Regret Minimization in Population Network Games: Vanishing Heterogeneity and Convergence to Equilibria

Abstract:
Understanding and predicting the behavior of large-scale multiagents in games remains a fundamental challenge in multiagent systems. This article examines the role of heterogeneity in equilibrium formation by analyzing how smooth regret matching drives a large number of heterogeneous agents with diverse initial policies toward unified behavior. By modeling the system state as a probability distribution of regrets and analyzing its evolution through the continuity equation, we uncover a key phenomenon in diverse multiagent settings: the variance of the regret distribution diminishes over time, leading to the disappearance of heterogeneity and the emergence of consensus among agents. This universal result enables us to prove convergence to quantal response equilibria in both competitive and cooperative multiagent settings. This work advances the theoretical understanding of multiagent learning and offers a novel perspective on equilibrium selection in diverse game-theoretic scenarios.

PaperID: 1098,

Authors: Yidan Zhang, Junlin Yu, Guo-Bo Li, Zhenan He, Gary G. Yen

Affiliations: College of Computer Science, Sichuan University, Chengdu, China; Department of Medicinal Chemistry, West China School of Pharmacy, Sichuan University, Chengdu, China

Title: REaMA: Building Biomedical Relation Extraction Specialized Large Language Models Through Instruction Tuning

Abstract:
Aiming to identify entity pairs with biomedical semantic relations and assign specific relation types, biomedical relation extraction (BioRE) plays a critical role in biomedical text mining and information extraction (IE). Recent studies indicate that general large language models (LLMs) have made some breakthroughs in general relation extraction (RE) tasks. However, even the advanced open-source LLMs struggle with BioRE tasks. For example, WizardLM-70B and LLaMA-2-70B achieve F-scores of 14.05 and 12.21 on the BioRED dataset, respectively, significantly lagging behind the state-of-the-art (SOTA) method which scores 65.17. To address this gap, a multitask instruction-tuning framework is proposed, which can transform general LLMs into BioRE-specialized models with our meticulously curated instruction dataset, REInstruct, comprising 150000 diverse and quality instruction-response pairs. Consequently, we introduce REaMA, a series of open-source LLMs with sizes of 7B and 13B specifically tailored for BioRE tasks. Experimental results on seven representative BioRE datasets show that both REaMA-2-7B and REaMA-2-13B acquire promising performance on all datasets. Remarkably, the larger REaMA-2-13B outperforms the current SOTA method on five out of seven datasets. The result exhibits the effectiveness of instruction-tuning on REInstruct in eliciting strong RE capabilities in LLMs. Furthermore, we show that incorporating chain of thought (CoT) into REInstruct can further enhance the generalization ability of REaMA. The project is available at https://github.com/stzpp/REaMA

PaperID: 1099,

Authors: Xi Wang, Xueyang Fu, Yurui Zhu, Zheng-Jun Zha

Affiliations: School of Information Science and Technology, University of Science and Technology of China, Hefei, China

Title: DDCNet: Advanced Decoupling of Degradation and Content for Adverse Weather Image Restoration

Abstract:
Adverse weather image restoration aims to recover clear images from those affected by weather conditions such as rain, haze, and snow. Different weather types affect images in distinct ways, necessitating specific degradation removal strategies, while content reconstruction generally benefits from a consistent approach since the underlying image structure remains largely consistent. Previous methods, despite their ability to handle multiple weather conditions within a single framework, often failed to adequately separate these two critical processes, thereby adversely affecting image restoration quality. In this article, we present DDCNet, a novel framework designed to explicitly decouple degradation removal and content reconstruction when processing various adverse weather conditions within a unified network. We achieve this by separating tailored degradation removal from uniform content reconstruction at the feature level, based on channel statistics. Additionally, we utilize the Fourier transform to enhance both processes. Furthermore, to address the differing optimization directions required by different adverse weather types, we propose a novel degradation mapping (DM) loss function to constrain their respective optimization paths. Extensive experiments show that DDCNet establishes new performance standards across multiple adverse weather scenarios.

PaperID: 1100,

Authors: Mingxin Wang, Song Zhu, Xiaoyang Liu, Shiping Wen, Chaoxu Mu

Affiliations: School of Mathematics, China University of Mining and Technology, Xuzhou, China; School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, China; Centre for Artificial Intelligence, University of Technology Sydney, Ultimo, NSW, Australia; School of Electrical and Automation Engineering, Tianjin University, Tianjin, China

Title: Observer-Based Event-Triggered Fault-Tolerant Synchronization for Memristive Neural Networks Subject to Multiple Failures

Abstract:
In this article, the synchronization problem of memristive neural networks (MNNs) subjected to multiple failures is investigated. First, a general form of fault model is introduced into the MNNs, which can represent and summarize various process faults, actuator faults, and their coupling. Subsequently, with the help of designing intermediate variables, two types of fault function observers based on state feedback and output feedback are constructed, and their effectiveness is verified through a generalization of Halanay-type inequalities. Then, based on the designed observers and the event-triggered strategy, two classes of fault-tolerant synchronization schemes are designed for the considered MNNs. By adjusting the controller parameter conditions, finite-time and fixed-time synchronization or quasi-synchronization of the considered MNNs system can be achieved, respectively. Finally, the effectiveness of the provided fault observers and synchronization strategies is verified through simulation and comparison experiments.

PaperID: 1101,

Authors: Zhen Zhou, Ziyuan Gu, Pan Liu, Wenwu Yu, Zhiyuan Liu

Affiliations: Jiangsu Key Laboratory of Urban ITS, Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, School of Transportation, Southeast University, Nanjing, China; School of Mathematics, Frontiers Science Center for Mobile Information Communication and Security, Southeast University, Nanjing, Jiangsu, China

Title: Leveraging Semi-Supervised Learning and Meta-Learning for Re-Identification in Few-Shot Spatiotemporal Anomaly Detection

Abstract:
Detecting spatiotemporal anomalies is imperative for addressing critical societal and engineering challenges, including public safety assurance, environmental hazard identification, epidemic surveillance, and transportation system optimization. Existing methodologies, however, face persistent limitations due to sparse labeled datasets and the inherent complexity of dynamic spatiotemporal systems. In order to bridge this gap, we present unsupervised-semi-supervised stacking (USemiS), a novel framework that synergizes semi-supervised learning with ensemble meta-learning. USemiS introduces three core innovations: 1) unsupervised component learners that extract low-level representations of heterogeneous anomalies, 2) a consensus-based tuning mechanism that dynamically weights robust learners via stability metrics, and 3) spatiotemporal MixUp (ST-MixUp), a tailored augmentation strategy that interpolates anomalies across spatial and temporal dimensions to enhance decision boundaries. By integrating these components, USemiS effectively disentangles latent anomaly patterns while mitigating label scarcity. Evaluated on large-scale traffic anomaly and crowd fall detection datasets, USemiS achieves state-of-the-art performance, outperforming existing methods by 1.3% and 2.1% in AUC under extreme low-label regimes (0.4% and 0.8% labeled data, respectively). These results underscore USemiS’s capacity to generalize across diverse spatiotemporal contexts, offering a scalable and robust solution for real-world applications where labeled anomalies are scarce yet critical.
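The augmentation idea behind ST-MixUp can be pictured with the generic MixUp recipe: two samples and their labels are blended with the same coefficient. The sketch below applies that recipe to a small grid indexed as [time][location]. It shows plain MixUp only; how ST-MixUp tailors the interpolation to spatial and temporal dimensions is not described in the abstract, and the function names and the Beta parameter are assumptions.

```python
import random


def mixup(sample_a, sample_b, label_a, label_b, lam):
    """Blend two [time][location] grids and their labels with coefficient lam."""
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(sample_a, sample_b)
    ]
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed, mixed_label


def sample_lambda(alpha=0.4, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha), as in standard MixUp."""
    return rng.betavariate(alpha, alpha)


if __name__ == "__main__":
    normal = [[0.0, 0.0], [0.0, 0.0]]   # two time steps, two locations
    anomaly = [[2.0, 4.0], [6.0, 8.0]]
    grid, label = mixup(normal, anomaly, 0.0, 1.0, lam=sample_lambda())
    print(grid, label)
```

The blended label is a soft target between "normal" (0) and "anomalous" (1), which is what lets interpolated samples shape the decision boundary.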

PaperID: 1102,

Authors: Chuang Zhao, Xiaomeng Li

Affiliations: Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, SAR, China

Title: Enhancing Domain Generalization in Medical Image Segmentation With Global and Local Prompts

Abstract:
Enhancing domain generalization (DG) is a crucial and compelling research pursuit within the field of medical image segmentation, owing to the inherent heterogeneity observed in medical images. The recent success with large-scale pre-trained vision models (PVMs), such as Vision Transformer (ViT), inspires us to explore their application in this specific area. While a straightforward strategy involves fine-tuning the PVM using supervised signals from the source domains, this approach overlooks the domain shift issue and neglects the rich knowledge inherent in the instances themselves. To overcome these limitations, we introduce a novel framework enhanced by global and local prompts (GLPs). Specifically, to adapt PVM in the medical DG scenario, we explicitly separate domain-shared and domain-specific knowledge in the form of GLPs. Furthermore, we develop an individualized domain adapter to intricately investigate the relationship between each target domain sample and the source domains. To harness the inherent knowledge within instances, we devise two innovative regularization terms from both the consistency and anatomy perspectives, encouraging the model to preserve instance discriminability and organ position invariance. Extensive experiments and in-depth discussions in both vanilla and semi-supervised DG scenarios deriving from five diverse medical datasets consistently demonstrate the superior segmentation performance achieved by GLP. Our code and datasets are publicly available at https://github.com/xmed-lab/GLP.

PaperID: 1103,

Authors: Fanghui Bi, Tiantian He, Yew-Soon Ong, Xin Luo

Affiliations: College of Computer and Information Science, Southwest University, Chongqing, China; Center for Frontier AI Research, Institute of High Performance Computing, Singapore Institute of Manufacturing Technology (SIMTech), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore; Center for Frontier AI Research, Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore

Title: Discovering Spatiotemporal-Individual Coupled Features From Nonstandard Tensors-A Novel Dynamic Graph Mixer Approach

Abstract:
In this article, we present the dynamic graph mixer (DGM), a novel model for learning spatiotemporal-individual coupled features from high-dimensional and incomplete (HDI) tensors, which frequently represent dynamic interactions among real-world data samples. In contrast to existing methods, the proposed DGM possesses the following three advantages when learning representations from HDI tensors. First, it performs light graph message passing based on the conjoint attentions learned by jointly modeling latent features and implicit structures to extract the high-order connectivity. Second, a multilayer nonlinear tensor neural network (TNN) is adopted to learn the intricate attribute features of node–node–time from different views. Third, it follows the Tucker decomposition paradigm in a data density-oriented modeling mechanism to integrate node representations, preserving the overall multidimensional interaction patterns. In addition, we provide theoretical evidence that the key components in DGM can significantly improve expressiveness. Extensive experiments conducted on eight testing datasets of HDI tensors demonstrate that DGM outperforms state-of-the-art methods in both learning accuracy and efficiency.

PaperID: 1104,

Authors: Honglong Yang, Hui Tang, Shanshan Song, Xiaomeng Li

Affiliations: Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Sai Kung, Hong Kong

Title: OS-RRG: Observation State-Aware Radiology Report Generation With Balanced Diagnosis and Attention Intervention

Abstract:
Radiology report generation (RRG) aims to automatically generate detailed textual descriptions and diagnoses for clinical radiography, alleviating radiologists’ workloads, aiding inexperienced radiologists, and minimizing errors. RRG is challenging due to the need to generate coherent and clinically accurate multisentence reports that describe various medical conditions. Although previous diagnosis-guided methods achieve impressive diagnostic accuracy by explicitly converting the identified observation states (OSs) (e.g., positive, negative, and uncertain) to descriptions, these methods still struggle in accurate observation-state identification and establishing precise state-to-description alignment. These challenges largely stem from the two aspects of imbalance (interclass and intraclass) inherent in observation states. In this article, we introduce a novel framework, observation state-aware radiology report generator (OS-RRG), designed to improve both the identification of states and their alignment with clinical descriptions. Our approach includes a state-aware balancing diagnosis (SBD) module to address both interclass and intraclass imbalances, an issue that previous methods have overlooked, resulting in suboptimal identification performance. In addition, we propose a novel technique called state-guided attention intervention (SAI), which dynamically adjusts focus on critical diagnostic features through a targeted filtering and enhancement mechanism. Furthermore, we propose a task-specific learning paradigm that decouples the identification and alignment processes into independent pathways, significantly enhancing the overall performance. Experiments on the MIMIC-CXR and IU-Xray benchmarks demonstrate the superior diagnostic accuracy of our method, which outperforms existing state-of-the-art techniques. The code will be made publicly available at https://github.com/xmed-lab/OS_RRG

PaperID: 1105,

Authors: Yuxuan Liu, Hongwei Ge, Yong Luo, Chunguo Wu

Affiliations: School of Computer Science and Technology, Dalian University of Technology, Dalian, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China

Title: Find Hidden Modality Divergence: Adversarial Aware Learning for Unsupervised Visible-Infrared Person Re-Identification

Abstract:
Unsupervised visible–infrared person re-identification (Unsupervised VI-ReID) aims to learn discriminative identity features under the large modality gap without any labeled data. Currently, the state-of-the-art methods optimize cross-modality differences by using contrastive learning as the underlying paradigm. However, they neglect the problem of modality divergence during the cross-modality optimization process. This problem means that the interclass instances between the cross-modality intraclass gaps can make cross-modality intraclass instances difficult to get closer to each other in the feature space due to the effect of contrastive learning on these interclass instances. To alleviate the negative impact of the modality divergence problem, we propose an adversarial aware learning (ADAL) framework to explore the instances that generate modal divergence and adversarially optimize these explored instances. Specifically, on the one hand, we explore the optimization directions of each cluster during the cross-modality optimization process, and the cluster centroids generating positive optimization are facilitated, while the others generating negative optimization are penalized. On the other hand, we further consider the instance-level optimization process, which increases the affinities of the positive instance pairs with large cross-modality gaps to further improve the centroid-level optimization. Extensive experiments conducted on the visible–infrared person Re-ID datasets show that the proposed method can be added as a universally applicable plug-in module to existing unsupervised VI-ReID methods, with the resulting models outperforming the existing state-of-the-art approaches.

PaperID: 1106,

Authors: Yidan Zhang, Zhenan He, Gary G. Yen

Affiliations: College of Computer Science, Sichuan University, Chengdu, China

Title: Uncovering Large Language Model Weaknesses in Character and Word Understanding and Manipulating

Abstract:
Recently, large language models (LLMs) have showcased remarkable capabilities across a diverse range of applications, including general natural language processing (NLP) and domain-specific tasks. Empirical evidence indicates that LLMs have matched or even surpassed human performance in various areas, such as language translation, reading comprehension, and logical reasoning. However, preliminary research reveals that LLMs struggle with basic character and word editing, which is crucial for practical tasks such as creating 1000-word articles or modifying specific text information. To comprehensively assess the capabilities of LLMs in character and word understanding and manipulation (CWUM), we introduce the CWUM benchmark in Chinese and English. CWUM comprises 23 tasks focusing on text editing, including counting, identification, insertion, and reversal. A comprehensive evaluation of nine advanced LLMs on CWUM is conducted, which highlights significant failures of existing LLMs on CWUM tasks that humans can solve perfectly with 100% accuracy. Meanwhile, specific deficiencies of LLMs in basic language understanding and manipulation are revealed by performing qualitative and quantitative analysis. Furthermore, in the experiment part, various methods are investigated to improve model performance, demonstrating the effectiveness of supervised fine-tuning (SFT) in enhancing model performance on CWUM while maintaining generalization abilities on unseen tasks.

PaperID: 1107,

Authors: Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He

Affiliations: School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; School of Informatics, Xiamen University, Xiamen, China; Department of Data Science and AI, Faculty of IT, Monash University, Melbourne, Australia; Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, U.K.; College of Computer Science, Sichuan University, Sichuan, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China; Technology Finance Center, China Construction Bank Fujian Province Branch, Fujian, China; University of Electronic Science and Technology of China, Chengdu, China; Department of Computer Science, Johns Hopkins University, Baltimore, USA; Shenzhen Institutes of Advanced Technology Chinese Academy of Sciences, Shenzhen, China

Title: SAM-Med3D: A Vision Foundation Model for General-Purpose Segmentation on Volumetric Medical Images

Abstract:
Existing volumetric medical image segmentation models are typically task-specific, excelling at specific targets but struggling to generalize across anatomical structures or modalities. This limitation restricts their broader clinical use. In this article, we introduce segment anything model (SAM)-Med3D, a vision foundation model (VFM) for general-purpose segmentation on volumetric medical images. Given only a few 3-D prompt points, SAM-Med3D can accurately segment diverse anatomical structures and lesions across various modalities. To achieve this, we gather and preprocess a large-scale 3-D medical image segmentation dataset, SA-Med3D-140K, from 70 public datasets and 8K licensed private cases from hospitals. This dataset includes 22K 3-D images and 143K corresponding masks. SAM-Med3D, a promptable segmentation model characterized by its fully learnable 3-D structure, is trained on this dataset using a two-stage procedure and exhibits impressive performance on both seen and unseen segmentation targets. We comprehensively evaluate SAM-Med3D on 16 datasets covering diverse medical scenarios, including different anatomical structures, modalities, targets, and zero-shot transferability to new/unseen tasks. The evaluation demonstrates the efficiency and efficacy of SAM-Med3D, as well as its promising application to diverse downstream tasks as a pretrained model. Our approach illustrates that substantial medical resources can be harnessed to develop a general-purpose medical AI for various potential applications. Our dataset, code, and models are available at: https://github.com/uni-medical/SAM-Med3D

PaperID: 1108,

Authors: Yang Nan, Huichi Zhou, Xiaodan Xing, Guang Yang

Affiliations: Bioengineering Department and Imperial-X, Imperial College London, London, U.K.; GSK, London, U.K.

Title: Beyond the Hype: A Dispassionate Look at Vision-Language Models in Medical Scenario

Abstract:
Recent advancements in large vision-language models (LVLMs) have demonstrated remarkable capabilities across diverse tasks, garnering significant attention in AI communities. However, their performance and reliability in specialized domains such as medicine remain insufficiently assessed. In particular, most assessments overconcentrate on evaluating VLMs based on simple visual question answering (VQA) on multimodality data while ignoring the in-depth characteristics of LVLMs. In this study, we introduce RadVUQA, a novel radiological visual understanding and question answering benchmark, to comprehensively evaluate existing LVLMs. RadVUQA mainly validates LVLMs across five dimensions: 1) anatomical understanding, assessing the models’ ability to visually identify biological structures; 2) multimodal comprehension, which involves the capability of interpreting linguistic and visual instructions to produce desired outcomes; 3) quantitative and spatial reasoning, evaluating the models’ spatial awareness and proficiency in combining quantitative analysis with visual and linguistic information; 4) physiological knowledge, measuring the models’ capability to comprehend functions and mechanisms of organs and systems; and 5) robustness, which assesses the models’ capabilities against unharmonized and synthetic data. The results indicate that both generalized LVLMs and medical-specific LVLMs have critical deficiencies with weak multimodal comprehension and quantitative reasoning capabilities. Our findings reveal the large gap between existing LVLMs and clinicians, highlighting the urgent need for more robust and intelligent LVLMs. The code is available at https://github.com/Nandayang/RadVUQA

PaperID: 1109,

Authors: Muhammad Ahmad, Manuel Mazzara, Salvatore Distefano, Adil Mehmood Khan, Muhammad Hassaan Farooq Butt, Danfeng Hong

Affiliations: SDAIA-KFUPM Joint Research Center for Artificial Intelligence (JRCAI), King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia; Institute of Software Development and Engineering, Innopolis University, Innopolis, Russia; Dipartimento di Matematica e Informatica-MIFT, University of Messina, Messina, Italy; School of Computer Science, University of Hull, Hull, U.K.; School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan, China; Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China

Title: PolicyMamba: Localized Policy Attention With State Space Model for Land Cover Classification

Abstract:
Multihead self-attention and cross-attention mechanisms often suffer from computational inefficiencies, limited scalability, and suboptimal contextual understanding, particularly in hyperspectral image (HSI) classification. These mechanisms struggle to effectively capture long-range dependencies while maintaining computational feasibility due to the quadratic complexity of self-attention. To address these challenges, this work proposes PolicyMamba, a spectral–spatial mamba model enhanced with a localized policy attention mechanism. This mechanism reduces computational overhead by restricting attention to nonoverlapping localized regions and enforcing sparsity constraints, ensuring that only the most informative interactions are retained. A hierarchical aggregation strategy further integrates patch-wise attention outputs, preserving spectral–spatial correlations across scales. In addition, a sliding window patch process enhances local feature continuity while mitigating information loss. The PolicyMamba framework integrates spectral–spatial token generation, token enhancement, localized attention, and state transition modules, significantly improving HSI feature representation. Extensive experiments demonstrate that PolicyMamba achieves superior classification accuracy, outperforming conventional and state-of-the-art methods in land cover classification (LCC) by efficiently modeling intricate dependencies in HSI data.

PaperID: 1110,

Authors: Luca Menicali, David H. Richter, Stefano Castruccio

Affiliations: Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA; Department of Civil and Environmental Engineering, University of Notre Dame, Notre Dame, IN, USA

Title: Bayesian Neural Networks With Physics-Informed Priors With Application to Boundary Layer Velocity

Abstract:
One of the most popular recent areas of machine learning predicates the use of neural networks (NNs) augmented by information about the underlying process in the form of partial differential equations (PDEs). These physics-informed NNs (PINNs) are obtained by penalizing the inference with a PDE and have been cast as a minimization problem currently lacking a formal approach to quantify the uncertainty. In this work, we propose a novel model-based framework that regards the PDE as prior information for a deep Bayesian NN (BNN), physics-informed prior (PIP)-BNN. The prior is calibrated without data to resemble the PDE solution in the prior mean, while our degree of confidence in the PDE with respect to the data is expressed in terms of the prior variance. The information embedded in the PDE is then propagated to the posterior, yielding physics-informed forecasts with uncertainty quantification. We apply our approach to a simulated viscous fluid and to an experimentally obtained turbulent boundary layer velocity in a water tunnel using an appropriately simplified Navier–Stokes (NS) equation. Our approach requires very few observations to produce physically consistent forecasts as opposed to nonphysical forecasts stemming from noninformed priors, thereby allowing forecasting of complex systems, where some amount of data as well as some contextual knowledge is available.

PaperID: 1111,

Authors: Yuning Yang, Xiurui Xie, Guowei Peng, Malu Zhang, Guangchun Luo, Yang Yang, Guisong Liu

Affiliations: Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China, Chengdu, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; Complex Laboratory of New Finance and Economics, School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics, Chengdu, China

Title: EMWQ: An Efficient Mixed Precision Weight Quantization Method for Large Language Models

Abstract:
Large language models (LLMs) have attracted considerable attention and achieved notable results recently because of their strong comprehension and generative abilities. However, the large-scale parameters of LLMs require considerable computational resources in the training and inference process, which restricts their wide application. To overcome this challenge, we propose an efficient mixed precision weight quantization (EMWQ) method for LLMs in this article. Specifically, we introduce a new outlier detection method by analyzing the weight distribution instead of the conventional weight magnitude. Then, we propose a dual-quantization strategy that quantizes both the outlier critical columns and the residual matrices with different precision. Besides, we introduce two effective EMWQ-based application frameworks, EMWQ-R and EMWQ-O, in our study. Comprehensive experiments are conducted on the Penn Treebank (PTB), C4, and ARC-Easy datasets and the MMLU benchmark across various tasks. The comparison results demonstrate that the proposed EMWQ achieves state-of-the-art performance in mixed precision quantization and further reduces computational memory cost. Besides, it has higher generalizability compared with conventional methods.
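To make the mixed-precision idea concrete, the sketch below quantizes a small weight matrix column by column, giving a caller-supplied set of outlier columns more bits than the rest. This is a minimal sketch under stated assumptions: the symmetric uniform quantizer, the per-column scaling, the bit widths, and the function names are illustrative, and the outlier columns are passed in by hand because the abstract does not detail the distribution-based detection rule.

```python
def quantize(values, bits):
    """Symmetric uniform quantize-then-dequantize of a list of floats."""
    levels = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits, 3 for 3 bits
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def mixed_precision_quantize(weights, outlier_cols, high_bits=8, low_bits=3):
    """Quantize outlier columns at high_bits and all other columns at low_bits.

    weights is a list of rows; outlier_cols is a set of column indices that
    the caller has flagged (the detection rule is outside this sketch).
    """
    rows, cols = len(weights), len(weights[0])
    out = [[0.0] * cols for _ in range(rows)]
    for c in range(cols):
        column = [weights[r][c] for r in range(rows)]
        bits = high_bits if c in outlier_cols else low_bits
        for r, q in enumerate(quantize(column, bits)):
            out[r][c] = q
    return out


if __name__ == "__main__":
    w = [[0.1, 5.0], [-0.2, -3.3], [0.3, 1.7]]
    print(mixed_precision_quantize(w, outlier_cols={1}))
```

The rounding error of each entry is at most half the column's step size, so spending more bits on the wide-ranged columns is where the extra precision pays off most.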

PaperID: 1112,

Authors: Zhijia Yang, Kun Gao, Yanzheng Zhang, Xiaodian Zhang, Zibo Hu, Junwei Wang, Jingyi Wang, Wei Li

Affiliations: School of Optics and Photonics and the Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education of China, Beijing Institute of Technology, Beijing, China; National Satellite Meteorological Center, China Meteorological Administration (NSMC/CMA), Beijing, China

Title: DSFuse: A Dual-Diffusion Structure for Feature Fidelity Infrared and Visible Image Fusion

Abstract:
Image fusion aims to combine the complementary features of different modalities to produce an informative fused image. Due to the different imaging mechanisms, information conflicts may arise from infrared and visible source images. Existing infrared and visible fusion methods are devoted to preserving the features of source images as much as possible. However, handling conflicting information is often overlooked. Thus, we leverage the powerful generative priors of diffusion and propose a dual-diffusion structure, termed DSFuse, to handle conflicting information and achieve feature fidelity during image fusion processing. Diffusion modules are introduced to guide the fusion network to understand the meaningful information of the source image easily. First, the fusion network is used to retain features in the fused image as much as possible. Then, diffusion modules are used to reconstruct source images from noise based on the output of the fusion network. Finally, feedback from the diffusion modules forces the fusion network to aggregate modality information to ensure fidelity; the high quality of the fusion result is also profitable for a better reconstruction of diffusion modules, forming a positive feedback loop. In addition, we release a new dataset for infrared/visible fusion to support the fusion network training and evaluation, named the multiscene infrared and visible (MSIV) images dataset. Extensive experiments demonstrate that DSFuse outperforms other state-of-the-art (SOTA) fusion methods.

PaperID: 1113,

Authors: Qiya Song, Jiajun Hu, Lin Xiao, Bin Sun, Xieping Gao, Shutao Li

Affiliations: College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China; School of Electrical and Information Engineering, Hunan University, Changsha, Hunan, China

Title: DiffCL: A Diffusion-Based Contrastive Learning Framework With Semantic Alignment for Multimodal Recommendations

Abstract:
Multimodal recommendation systems integrate diverse multimodal information into the feature representations of both items and users, thereby enabling a more comprehensive modeling of user preferences. However, existing methods are hindered by data sparsity and the inherent noise within multimodal data, which impedes the accurate capture of users’ interest preferences. Additionally, discrepancies in the semantic representations of items across different modalities can adversely impact the prediction accuracy of recommendation models. To address these challenges, we introduce a novel diffusion-based contrastive learning (DiffCL) framework for multimodal recommendation. DiffCL employs a diffusion model (DM) to generate contrastive views that effectively mitigate the impact of noise during the contrastive learning phase. Furthermore, it improves semantic consistency across modalities by aligning distinct visual and textual semantic information through stable ID embeddings. Finally, the introduction of the item-item graph (I-I graph) enhances multimodal feature representations, thereby alleviating the adverse effects of data sparsity on the overall system performance. We conduct extensive experiments on three public datasets, and the results demonstrate the superiority and effectiveness of the DiffCL.

PaperID: 1114,

Authors: Wenjin Huang, Conghui Luo, Baoze Zhao, Han Jiao, Yihua Huang

Affiliations: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China

Title: HCG: Streaming DCNN Accelerator With a Hybrid Computational Granularity Scheme on FPGA

Abstract:
With the growth of field-programmable gate array (FPGA) hardware resources, streaming DCNN accelerators leverage interconvolutional-layer parallelism to enhance throughput. In existing streaming accelerators, convolution nodes typically adopt layer- or column-based tiling methods, where the tiled input feature map (Ifmap) encompasses all input channels. This approach facilitates the comprehensive calculation of the output feature map (Ofmap) and maximizes interlayer parallelism. The computational granularity, defined in this study as the calculated rows or columns of Ofmap based on each tiled Ifmap data, significantly influences on-chip Ifmap storage and off-chip weight bandwidth (BW). The uniform application of computational granularity across all nodes inevitably impacts the memory-BW tradeoff. This article introduces a novel streaming accelerator with a hybrid computational granularity (HCG) scheme. Each node employs an independently optimized computational granularity, enabling a more flexible memory-BW tradeoff and more effective utilization of FPGA resources. However, this hybrid scheme can introduce pipeline bubbles and increase system pipeline complexity and control logic. To address these challenges, this article theoretically analyzes the impact of computational granularity on individual computing nodes and the overall system, aiming to establish a seamless system pipeline without pipeline bubbles and simplify system design. Furthermore, the article develops a hardware overhead model and employs a heuristic algorithm to optimize computational granularity for each computing node, achieving optimal memory-BW tradeoff and higher throughput. Finally, the effectiveness of the proposed design and optimization methodology is validated through the implementation of a 3-TOPS ResNet-18 accelerator on the Alveo U250 development board under BW constraints of 25, 20, and 15 GB/s. Additionally, accelerators for 4-TOPS VGG-16, 4-TOPS ResNet-34, 5-TOPS ResNet-50, 3-TOPS MobileNetV1, 4-TOPS ConvNeXt-T, and 4-TOPS ResNeXt-50 are implemented, surpassing the performance of most existing works.

PaperID: 1115,

Authors: Jian Cao, Chen Qian, Yihui Huang, Dicheng Chen, Yuncheng Gao, Jiyang Dong, Di Guo, Xiaobo Qu

Affiliations: Department of Electronic Science, Xiamen University-Neusoft Medical Magnetic Resonance Imaging Joint Research and Development Center, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen, China; School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China

Title: A Dynamics Theory of RMSProp-Based Implicit Regularization in Deep Low-Rank Matrix Factorization

Abstract:
Implicit regularization induced by gradient optimization is an important way to understand generalization in neural networks. Recent theory explains implicit regularization over the deep matrix factorization (DMF) model and analyzes the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics can mathematically characterize the practical learning rate of adaptive gradient (AdaGrad) optimization, such as root-mean-square propagation (RMSProp). Discrete gradient dynamics analysis has been successfully applied to shallow networks but encounters difficulty in complex computation for deep networks. In this work, we introduce another discrete gradient dynamics, landscape analysis, to theoretically and experimentally explain the implicit regularization of RMSProp-based deep networks. It mainly focuses on gradient regions like saddle points and local minima. We investigate the benefits of increasing learning rates in saddle point escaping (SPE) stages. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. Besides, we analyze the time it takes to escape from the plateau of the SPE stage. These conclusions are further experimentally verified on low-rank matrix, image reconstruction, and Hankel matrix reconstruction problems. Our proof is also applicable to gradient descent (GD) and adaptive moment estimation (Adam) but cannot apply to AdaGrad, further showing experimentally that the implicit regularization capability of RMSProp is stronger than GD and AdaGrad and weaker than Adam.

PaperID: 1116,

Authors: Zebin Chen, Yanwei Ding, Wenjian Tao, Jinxiu Zhang, Hui-Jie Sun

Affiliations: School of Aeronautics and Astronautics, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China

Title: ADP-Based Orbit Tracking Control for Deep Space Probe Flying Around Unknown Asteroid

Abstract:
This article investigates the orbit tracking control problem for the deep space probe flying around an unknown asteroid under completely unknown dynamics. First, an orbit tracking control model for the relative motion of the probe to the asteroid is established. Then, a model-based controller is designed for the optimal tracking control problem and the asymptotic stability of the closed-loop system is proved. Next, an adaptive dynamic programming (ADP) algorithm based on policy iteration is adopted to obtain a model-free suboptimal controller that approximates the previous model-based controller, followed by the convergence analysis. Specifically, by collecting some data online, a system of high-order linear equations is constructed and then solved to obtain parameters utilized in controller construction. Finally, numerical simulations are provided to validate the effectiveness and the performance of the proposed control method.

PaperID: 1117,

Authors: Xin-Xin Han, Kai-Ning Wu, Xin Yuan

Affiliations: Department of Mathematics, Nanjing Forestry University, Nanjing, Jiangsu, China; Department of Mathematics, Harbin Institute of Technology, Weihai, China; School of Electrical and Mechanical Engineering, The University of Adelaide, Adelaide, SA, Australia

Title: Asynchronous Boundary Stabilization of Stochastic Markovian Reaction-Diffusion Neural Networks With Mode-Dependent Delays

Abstract:
This article tackles the asynchronous control issue for a class of stochastic Markovian reaction-diffusion neural networks with mode-dependent delays (MDDs). Taking into account the spatio-temporal distribution of such networks, we propose a boundary control (BC) scheme combined with asynchronous control to reduce the control implementation cost and overcome environmental constraints. By incorporating a hidden Markov model to manage the mode asynchrony, we develop an integral asynchronous boundary controller for Neumann boundary conditions, as well as an innovative one for Dirichlet boundary conditions. We then derive an exponential stability criterion specific to MDDs and introduce a novel asynchronous BC synthesis approach. Additionally, we extend our findings to the leader-follower synchronization of these neural networks. The validity, superiority, and practicality of the proposed control design approach are demonstrated via three numerical examples, respectively.

PaperID: 1118,

Authors: Mengyao Du, Miao Zhang, Yuwen Pu, Qingming Li, Shouling Ji, Quanjun Yin

Affiliations: School of Systems Engineering, National University of Defense Technology, Changsha, China; School of Big Data and Software Engineering, Chongqing University, Chongqing, China; School of Computer Science, Zhejiang University, Hangzhou, China

Title: The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Robustness

Abstract:
To tackle the scarcity and privacy issues associated with domain-specific datasets, the integration of federated learning in conjunction with fine-tuning (FT) has emerged as a practical solution. However, our findings reveal that federated learning has the risk of skewing FT features and compromising the out-of-distribution (OOD) robustness of pretrained models. By introducing three robustness indicators and conducting experiments across diverse robust datasets, we elucidate these phenomena by scrutinizing the ability of data representations, transferability, and deviations within the model. To mitigate the negative impact of practical federated learning on model robustness, we introduce a general noisy projection (GNP)-based robust algorithm, ensuring no deterioration of accuracy on the target distribution. Specifically, the key strategy for enhancing model robustness entails the transfer of robustness from the pretrained model to the fine-tuned model, coupled with adding a small amount of Gaussian noise to augment the representative capacity of the model. The comprehensive experimental results demonstrate that our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient FT (PEFT) methods and confronting different levels of label distribution skew and quantity distribution skew.

PaperID: 1119,

Authors: Joseph A. Vincent, Mac Schwager

Affiliations: Department of Aeronautics and Astronautics, Stanford University, Stanford, CA, USA

Title: Reachable Polyhedral Marching (RPM): An Exact Analysis Tool for Deep-Learned Control Systems

Abstract:
Neural networks are increasingly used in robotics as policies, state transition models, state estimation models, or all of the above. With these components being learned from data, it is important to be able to analyze what behaviors were learned and how this affects closed-loop performance. In this article, we take steps toward this goal by developing methods for computing control invariant sets and regions of attraction (ROAs) of dynamical systems represented as neural networks. We focus our attention on feedforward neural networks with the rectified linear unit (ReLU) activation, which are known to implement continuous piecewise-affine (PWA) functions. We describe the reachable polyhedral marching (RPM) algorithm for enumerating the affine pieces of a neural network through an incremental connected walk. We then use this algorithm to compute exact forward and backward reachable sets, from which we provide methods for computing control invariant sets and ROAs. Our approach is unique in that we find these sets incrementally, without Lyapunov-based tools. In our examples, we demonstrate the ability of our approach to find nonconvex control invariant sets and ROAs on tasks with learned van der Pol oscillator and pendulum models. Further, we provide an accelerated algorithm for computing ROAs that leverages the incremental and connected enumeration of affine regions that RPM provides. We show this acceleration to lead to a 15× speedup in our examples. Finally, we apply our methods to find a set of states that are stabilized by an image-based controller for an aircraft runway control problem.
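The piecewise-affine property mentioned in the abstract can be illustrated with a small sketch. It is not the RPM algorithm: it only recovers the single affine piece that contains a given input, by reading off the ReLU activation pattern. The function names, the random weights, and the convention that the last layer is linear are assumptions made for illustration.

```python
import numpy as np

def local_affine_piece(weights, biases, x):
    """Return (A, c) such that the network equals A @ z + c on the
    activation region containing x. The last layer has no ReLU."""
    A = np.eye(len(x))
    c = np.zeros(len(x))
    h = np.asarray(x, dtype=float)
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        pre = W @ h + b                # pre-activation at the query point
        A, c = W @ A, W @ c + b        # compose the affine map with this layer
        if i < last:
            active = (pre > 0).astype(float)  # activation pattern of this layer
            A = active[:, None] * A    # inactive units contribute a zero row
            c = active * c
            h = active * pre
        else:
            h = pre
    return A, c

def forward(weights, biases, x):
    """Plain forward pass of the same ReLU network, for comparison."""
    h = np.asarray(x, dtype=float)
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = W @ h + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

# Hypothetical 2-4-3-1 network with random parameters.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 2)), rng.normal(size=(3, 4)), rng.normal(size=(1, 3))]
biases = [rng.normal(size=4), rng.normal(size=3), rng.normal(size=1)]
x = np.array([0.3, -0.7])
A, c = local_affine_piece(weights, biases, x)
```

At the query point, `A @ x + c` reproduces `forward(weights, biases, x)`. Enumerating neighboring pieces, as the abstract describes, would additionally require the polyhedral constraints of each region, which this sketch does not compute.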

PaperID: 1120,

Authors: Qiulei Dong, Zhengming Zhou, Xiaolan Qiu, Liting Zhang

Affiliations: State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of Technology in Geo-Spatial Information Processing and Application System and the National Key Laboratory of Microwave Imaging Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China

Title: A Survey on Self-Supervised Monocular Depth Estimation Based on Deep Neural Networks

Abstract:
Monocular depth estimation aims to predict the scene depth map corresponding to an input image and has wide application prospects in various fields, such as robot navigation, autonomous driving, and augmented reality. Because only images, rather than ground truth depth maps, are required for model training, self-supervised monocular depth estimation methods have received increasing attention in recent years. Although numerous self-supervised monocular depth estimation methods have been proposed, there has been no comprehensive survey of them yet. To address this gap, we review recent developments in the community of self-supervised monocular depth estimation in this article. First, 89 existing works in the literature are categorized and reviewed. Then, we introduce the public datasets and evaluation metrics used in monocular depth estimation. Next, the performances of some state-of-the-art methods are compared and analyzed. Finally, we summarize several open problems and possible future developments in this community.

PaperID: 1121,

Authors: Chao Sun, Xing Wu, Yanxu Su, Xiasheng Shi, Changyin Sun

Affiliations: School of Artificial Intelligence, Anhui University, Hefei, China; College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Title: Multithreaded Asynchronous Deep Reinforcement Learning With Multisensor Fusion for Robot Collision Avoidance

Abstract:
To develop a safe and efficient navigation system for robotic vehicles in dynamic scenes, a new collision-avoidance method using deep reinforcement learning (DRL) is presented. First, a novel DRL method based on multithreaded asynchronous proximal policy optimization (MAPPO) is developed. It can convert expensive online calculation into an offline training process, improving the sample efficiency during policy learning. Then, a multisensor fusion measurement (MSFM) method is presented that combines global reference path (GRP), laser scanner measurement (LSM), and motion energy (ME) to observe the state space of the environment to the maximum extent. By multireward refining at each timestep, the sparsity of rewards is avoided. On this basis, a collision-avoidance neural network (CANN) with multiscale and multilevel fusion is devised to generate high-quality obstacle features, which can enable the MAPPO to master collision threats effectively. Besides, a premature collision prediction (PCP) module supervised by GRP is devised as an auxiliary task to learn high-level feature representation and further improve safety during robot collision avoidance. Finally, a two-stage training strategy from 2-D Stage to 3-D Gazebo is presented to realize sufficient robot-environment interaction. This way, the policy model can maximize its degree of exploration in complex dynamic scenarios. Extensive navigation experiments are conducted in complex simulated and real-world scenarios with a variety of obstacles, along with multiple comparative experiments to verify the effectiveness and robustness of our approach in robot collision avoidance. Experimental results reveal that our method can make farsighted navigation decisions in complex dynamic environments to dodge collisions successfully while moving toward the goal.

PaperID: 1122,

Authors: Kaiqun Zhu, Zidong Wang, Derui Ding, Jun Hu, Hongli Dong

Affiliations: Department of Control Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China; College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao, China; Department of Applied Mathematics, School of Automation, Harbin University of Science and Technology, Harbin, China; Artificial Intelligence Energy Research Institute, Northeast Petroleum University, Daqing, China

Title: Proportional-Integral-Observer-Based Fusion Estimation for Artificial Neural Networks: Implementing a One-Bit Encoding Scheme

Abstract:
This article is concerned with the proportional-integral-observer (PIO)-based fusion estimation problem for a class of artificial neural networks (ANNs) equipped with multiple sensors, which are constrained by bandwidth and subjected to unknown-but-bounded noises (UBBNs). For the purpose of efficient information communication, an approach known as the one-bit encoding mechanism (OBEM) is proposed that enables the encoding of scalar data using merely a single bit. Then, a local PIO-based set-membership estimator is devised for each sensor node, with the aim of achieving the desired estimation task while considering the possible data distortion due to OBEM and the existence of UBBNs. Subsequently, sufficient conditions are established to ensure the existence and effectiveness of the PIO-based set-membership estimator. Moreover, to enhance the global estimation performance, an ellipsoid-based fusion rule is introduced for all local PIO-based set-membership estimators. The performance of fusion estimation is then analyzed using set theory and the optimization method, leading to the determination of relevant parameters. Finally, the effectiveness and advantages of the proposed estimation algorithm are demonstrated through a simulation example.

PaperID: 1123,

Authors: Jingyang Chen, Ping Li, Jiancheng Lv, Hongyuan Zha, Kai Zhang, Jie Zhang

Affiliations: Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China; School of Computer Science and Software Engineering, Southwest Petroleum University, Chengdu, China; College of Computer Science, Sichuan University, Chengdu, China; School of Data Science, Chinese University of Hong Kong at Shenzhen, Shenzhen, China; School of Computer Science and Technology, East China Normal University, Shanghai, China

Title: Learning Temporal Features With Alternated Similarity and Proximity Attention for Time-Series Prediction

Abstract:
Time-series prediction is a fundamental problem in various scientific and engineering domains. Recently, attention-based models have shown great promise in long-term time-series forecasting. However, we prove that vanilla attention is equivalent to a one-step random walk on a bipartite graph between the query and the keys, in which the limited number of walks and simplified graph structure could make it less powerful in capturing complex, high-order featural and temporal dependencies. Inspired by how human brains iteratively reactivate memories through reminding, we propose “Alternated Similarity And Proximity Attention,” or ASAP-attention. ASAP-attention employs a random walk on two concurrent views (graphs) that, respectively, capture the featural similarity and the temporal proximity between time points. In particular, the random walk alternately visits the two graphs, each time remembering the previous probability configuration to build a coherent chain of distributions to retrieve useful historical data. This dynamic interplay between temporal and featural clues enhances the model’s ability to capture implicit and heterogeneous data dependencies without using positional encoding. When incorporating ASAP-attention with encoder-only Transformer architecture, we observed highly promising results against a wide collection of state-of-the-art methods on various benchmark datasets for long time-series forecasts (e.g., weather, electricity, illness, and exchange-rate data). Our source code is available at https://github.com/jychen01/ASAP-attention
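The random-walk reading of vanilla attention stated in the abstract can be made concrete with a minimal sketch. It shows only standard scaled dot-product attention, not ASAP-attention; the function name and the random inputs are assumptions for illustration.

```python
import numpy as np

def attention_transition(Q, K):
    """Row-stochastic matrix P with P[i, j] = softmax_j(q_i . k_j / sqrt(d)).
    Row i can be read as the one-step transition distribution of a walker
    that starts at query node i and moves to a key node j."""
    d = Q.shape[1]
    scores = Q @ K.T / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(scores)
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query nodes
K = rng.normal(size=(5, 8))   # 5 key nodes
V = rng.normal(size=(5, 4))   # one value vector per key node

P = attention_transition(Q, K)
out = P @ V                   # expected value vector after one step of the walk
```

Each row of `P` sums to one, so the attention output for a query is the expectation of the value vectors under a single step from that query to the keys. A multi-step or alternating walk, as proposed in the abstract, would need transition matrices on additional graphs that are not modeled here.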

PaperID: 1124,

Authors: Boyue Wang, Xiaoqian Ju, Junbin Gao, Xiaoyan Li, Yongli Hu, Baocai Yin

Affiliations: Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing Artificial Intelligence Institute, School of Information Science and Technology, Beijing University of Technology, Beijing, China; Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Sydney, NSW, Australia

Title: Counterfactual Dual-Bias VQA: A Multimodality Debias Learning for Robust Visual Question Answering

Abstract:
Visual question answering (VQA) models often face two language bias challenges. First, they tend to rely solely on the question to predict the answer, often overlooking relevant information in the accompanying images. Second, even when considering the question, they may focus only on the wh-words, neglecting other crucial keywords that could enhance interpretability and question sensitivity. Existing debiasing methods attempt to address this by training a bias model using question-only inputs to enhance the robustness of the target VQA model. However, this approach may not fully capture the language bias present. In this article, we propose a multimodality counterfactual dual-bias model to mitigate the linguistic bias issue in target VQA models. Our approach involves designing a shared-parameterized dual-bias model that incorporates both visual and question counterfactual samples as inputs. By doing so, we aim to fully model language biases, with the visual and question counterfactual samples emphasizing, respectively, the important objects and keywords relevant to the answers. To ensure that our dual-bias model behaves similarly to an ordinary model, we freeze the parameters of the target VQA model while using the cross-entropy and Kullback-Leibler (KL) divergence as the loss function to train the dual-bias model. Subsequently, to mitigate language bias in the target VQA model, we freeze the parameters of the dual-bias model to generate pseudo-labels and then incorporate a margin loss to re-train the target VQA model. Experimental results on the VQA-CP datasets demonstrate the superior effectiveness of our proposed counterfactual dual-bias model. Additionally, we conduct an analysis of the unsatisfactory performance on the VQA v2 dataset. The source code of the proposed model is available at https://github.com/Arrow2022jv/MCD

PaperID: 1125,

Authors: Yoleidy Huérfano-Maldonado, Karina Vilches-Ponce, Marco Mora, Clovis Tauber, Miguel A. Vera

Affiliations: Doctorado en Modelamiento Matemático Aplicado, Universidad Católica del Maule, Talca, Chile; Facultad de Ciencias de la Ingeniería, Laboratory of Technological Research in Pattern Recognition, Universidad Católica del Maule, Talca, Chile; U, iBraiN, Inserm Laboratory, Tours, France; Departamento de Ciencias, Universidad Simón Bolívar, Cúcuta, Colombia

Title: Single Hidden Layer Neural Networks With Random Weights Based on Nondifferentiable Functions

Abstract:
Computational algorithms that utilize nondifferentiable functions have proven highly effective in machine learning applications. This study introduces a novel framework for incorporating nondifferentiable functions into the objective functions of random-weight neural networks, specifically focusing on random vector functional-link (RVFL) networks and extreme learning machines (ELMs). Our framework explores six nondifferentiable functions: the norms L_1,1, L_1,2, and L_2,2 and the functions AbsMin, AbsMax, and a seminorm MaxMin. To enhance robustness, Fourier random assignments are applied as activation functions within these networks. The integration of these nondifferentiable functions into the objective functions of RVFL and ELM aims to reduce computational time in both training and testing stages, without compromising accuracy. We conducted extensive experiments on 12 benchmark datasets, encompassing small, medium, and large datasets, to evaluate the proposed algorithms against the L_2,1-regularized random Fourier feature ELM (L_2,1-RF-ELM), which uses joint-norm regularization (L_r,p) as documented in previous studies. Our findings indicate that the algorithms based on nondifferentiable functions not only achieve high accuracy but also significantly reduce computation time compared to the L_2,1-based algorithm and other standard machine learning approaches.

PaperID: 1126,

Authors: Jie Yang, Lingyun Xiaodiao, Guoyin Wang, Witold Pedrycz, Shuyin Xia, Qinghua Zhang, Di Wu

Affiliations: Key Laboratory of Cyberspace Big Data Intelligent Security, Chongqing University of Posts and Telecommunications, Chongqing, China; National Center for Applied Mathematics in Chongqing, Chongqing Normal University, Chongqing, China; Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada; College of Computer and Information Science, Southwest University, Chongqing, China

Title: A Robust Three-Way Classifier With Shadowed Granular Balls Based on Justifiable Granularity

Abstract:
The granular-ball (GB)-based classifier introduced by Xia exhibits adaptability in creating coarse-grained information granules for input, thereby enhancing its generality and flexibility. Nevertheless, the current GB-based classifiers rigidly assign a specific class label to each data instance and lack the necessary strategies to address uncertain instances. Forcing a definite class label onto uncertain instances in this way can carry considerable risk. To solve this problem, we construct a robust three-way classifier with shadowed GBs (3WC-SGBs) for uncertain data. First, combined with information entropy, we propose an enhanced GB generation method with the principle of justifiable granularity. Subsequently, based on minimum uncertainty, a shadowed mapping is utilized to partition a GB into a core region (COR), an important region (IMP), and an unessential region (UNE). Based on the constructed shadowed GBs, we establish a three-way classifier to categorize data instances into certain classes and an uncertain case. Finally, extensive comparative experiments are conducted with two three-way classifiers, three state-of-the-art GB-based classifiers, and three classical machine learning classifiers on 12 public benchmark datasets. The results show that our model demonstrates robustness in managing uncertain data and effectively mitigates classification risks. Furthermore, our model almost always outperforms the other comparison methods in both effectiveness and efficiency.

PaperID: 1127,

Authors: Zhen Zhang, Yongming Han, Zhiqiang Geng

Affiliations: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China

Title: Learning to Detect Industrial Time-Series Anomalies From Imputation Consistency With Sparse Observations

Abstract:
Time-series anomaly detection plays an important role in ensuring industrial safety. Currently, many anomaly detection methods mainly target complete time series and ignore the widespread problem of missing data in the real world. Therefore, this article proposes a novel anomaly detection method for time series with sparse observations based on imputation consistency using a mixture of patch information inference network (MoPIN). Because the imputation model is robust to random masking, series imputed from the same normal time series under different random masks should be consistent with one another. This imputation consistency is then used to detect anomalies in sparsely observed series. Moreover, the MoPIN imputes series by a two-step imputation and multiscale modeling of patch information. Meanwhile, the similarity of imputed series under different masks is used to measure imputation consistency, which establishes the relationship between sparse observation series and anomaly scores. Finally, the MoPIN can accurately detect anomalies while imputing series. Extensive experiments on four real-world benchmarks in different domains of imputation and anomaly detection tasks and a real fluid catalytic cracking (FCC) process case demonstrate the effectiveness of the proposed method. Specifically, the MoPIN achieved relative improvements of at least 8.05% in mean absolute error (MAE) for imputation and 3.74% in F1 for anomaly detection.
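The imputation-consistency idea can be sketched in a few lines, under strong simplifying assumptions. The imputer below is plain linear interpolation standing in for MoPIN, the score is the per-step standard deviation across imputations, and all names and parameter values are hypothetical.

```python
import numpy as np

def impute_linear(series, observed):
    """Stand-in imputer: linear interpolation over observed points (not MoPIN)."""
    t = np.arange(len(series))
    return np.interp(t, t[observed], series[observed])

def consistency_scores(series, n_masks=100, keep_prob=0.7, seed=0):
    """Per-step disagreement between imputations under different random masks.
    Higher values mean lower imputation consistency."""
    rng = np.random.default_rng(seed)
    imputations = []
    for _ in range(n_masks):
        observed = rng.random(len(series)) < keep_prob
        observed[[0, -1]] = True   # keep endpoints so interpolation covers the series
        imputations.append(impute_linear(series, observed))
    return np.std(np.stack(imputations), axis=0)

# Toy series: a sine wave with one injected spike at index 120.
t = np.linspace(0, 4 * np.pi, 200)
series = np.sin(t)
series[120] += 3.0

scores = consistency_scores(series)
suspect = int(np.argmax(scores))
```

On smooth, normal segments the imputations agree whichever points are masked, so the score stays near zero. At the spike, the imputed value depends heavily on whether the spike itself was observed, so the imputations disagree and the score rises. A learned imputer and a similarity measure, as the abstract describes, would replace the two stand-ins used here.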

PaperID: 1128,

Authors: Kai Yao, Zhaorui Tan, Zixian Su, Xi Yang, Jie Sun, Kaizhu Huang

Affiliations: Department of Intelligent Science, Department of Computer Science, Xi’an Jiaotong-Liverpool University, Suzhou, China; Digital Innovation Research Center and Jiangsu Provincial University Key (Construction) Laboratory for Smart Diagnosis, Duke Kunshan University, Kunshan, China

Title: SCMix: Stochastic Compound Mixing for Open Compound Domain Adaptation in Semantic Segmentation

Abstract:
Open compound domain adaptation (OCDA) aims to transfer knowledge from a labeled source domain to a mix of unlabeled homogeneous compound target domains while generalizing to open unseen domains. Existing OCDA methods solve the intradomain gaps by a divide-and-conquer strategy, which decomposes the problem into several individual and parallel domain adaptation (DA) tasks. In this work, starting from the general DA theory, we establish a novel generalization bound for the setting of OCDA. Built upon this, we argue that conventional OCDA approaches may substantially underestimate the inherent variance inside the compound target domains for model generalization, constraining the model’s performance. We subsequently present stochastic compound mixing (SCMix), an augmentation strategy with the primary objective of mitigating the divergence between the source and mixed target distributions. Theoretical analyses are conducted to substantiate the superiority of SCMix, proving that single-target mixing is a subgroup of our method. Extensive experiments show that our method attains a lower empirical risk on OCDA semantic segmentation tasks, thus supporting our theories. In particular, when combined with the transformer architecture, SCMix achieves a notable performance boost over state-of-the-art (SOTA) results.

PaperID: 1129,

Authors: Tao Li, Cheng Meng, Hongteng Xu, Jun Zhu

Affiliations: Institute of Statistics and Big Data, Renmin University of China, Beijing, China; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China

Title: Efficient Variants of Wasserstein Distance in Hyperbolic Space via Space-Filling Curve Projection

Abstract:
Hyperbolic spaces have been widely used for embedding hierarchically structured data over the past decade. However, there is a lack of studies focusing on efficient distance metrics for comparing probability distributions in hyperbolic spaces. To bridge the gap, we propose a novel metric called the hyperbolic space-filling curve projection Wasserstein (SFW) distance. The idea is to first project two probability distributions onto a space-filling curve to obtain a closed-form coupling between them and then calculate the transport distance between these two distributions in the hyperbolic space accordingly. Theoretically, we show that the SFW distance is a proper metric and is well-defined for probability measures with bounded supports. Statistical convergence rates for the proposed estimator are provided as well. Moreover, we propose two variants of the SFW distance based on geodesic and horospherical projections, respectively, to combat the curse of dimensionality. Empirical results on synthetic and real-world data indicate that the SFW distance can effectively serve as a low-complexity surrogate for the popular Wasserstein distance.
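A rough sketch of the two steps named in the abstract (order points along a space-filling curve to get a coupling, then measure transport cost in hyperbolic space) is given below. It makes several assumptions that the abstract does not specify: points live in the 2-D Poincaré disk, a Z-order (Morton) curve on a grid over the disk's bounding square stands in for the authors' curve, and both samples have equal size with uniform weights. All function names are hypothetical.

```python
import numpy as np

def poincare_distance(x, y):
    """Geodesic distance between points of the open unit ball (Poincare model)."""
    sq = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - np.sum(x ** 2, axis=-1)) * (1.0 - np.sum(y ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq / denom)

def morton_key(points, bits=10):
    """Z-order index of 2-D points in (-1, 1)^2, by interleaving grid-cell bits."""
    scaled = (points + 1.0) / 2.0 * (1 << bits)
    cells = np.clip(scaled.astype(np.int64), 0, (1 << bits) - 1)
    keys = np.zeros(len(points), dtype=np.int64)
    for b in range(bits):
        keys |= ((cells[:, 0] >> b) & 1) << (2 * b)
        keys |= ((cells[:, 1] >> b) & 1) << (2 * b + 1)
    return keys

def curve_projection_distance(X, Y, p=2):
    """Pair points by rank along the curve, then average the hyperbolic costs."""
    assert len(X) == len(Y), "sketch assumes equal-size, uniformly weighted samples"
    Xs = X[np.argsort(morton_key(X), kind="stable")]
    Ys = Y[np.argsort(morton_key(Y), kind="stable")]
    return np.mean(poincare_distance(Xs, Ys) ** p) ** (1.0 / p)

def sample_disk(rng, n, radius):
    """Uniform (Euclidean) samples from a disk of the given radius."""
    r = radius * np.sqrt(rng.random(n))
    theta = 2 * np.pi * rng.random(n)
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

rng = np.random.default_rng(0)
X = sample_disk(rng, 64, 0.5)
Y = sample_disk(rng, 64, 0.9)
d_xy = curve_projection_distance(X, Y)
```

Because rank matching along the curve is one valid coupling among many, the value it produces can never fall below the exact Wasserstein distance computed with the same ground cost. The cost of the sketch is dominated by the two sorts, whereas an exact solver must optimize over all couplings.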

PaperID: 1130,

Authors: Xiaoting Sun, Zhong Li, Changjun Jiang

Affiliations: School of Computer Science and Technology and the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai, China; College of Information Science and Technology, Donghua University, Shanghai, China

Title: Federated Aggregation With Interlayer Personalized Contribution: Preference-Based Optimization Between Performance and Privacy

Abstract:
Because data distributions differ across users, many personalized federated learning (PFL) methods have emerged to meet the personalized needs of different users. However, existing methods have two problems: 1) in the aggregation process, the contribution between the internal layers of the client model is not considered and 2) it is difficult to match the quantitative weight information of both user privacy protection and performance with their qualitative preferences during the training process. Therefore, we first propose a framework for federated aggregation with interlayer personalized contribution named FedIPC, which completes model aggregation based on the contribution of internal layers and improves client model performance. Based on the above framework, we design a multiobjective federated optimization method based on adaptive preference indicators named FedAPI-NSGA-II, where NSGA-II denotes the nondominated sorting genetic algorithm II. This method can match quantitative weights with qualitative user preferences and adaptively select Pareto-optimal solutions during the optimization process. Extensive experiments on two image datasets and a tabular dataset show that our proposed method not only accelerates model convergence but also achieves good improvements in model performance. In addition, our proposed method can accurately match the qualitative preferences of users, balancing the performance of the model and privacy protection based on preferences.

PaperID: 1131,

Authors: Mengjie Chen, Ming Zhang, Guiying Yan, Guanghui Wang, Cunquan Qu

Affiliations: Data Science Institute, School of Mathematics, Shandong University, Jinan, China; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Data Science Institute, Shandong University, Jinan, China

Title: MRHGNN: Enhanced Multimodal Relational Hypergraph Neural Network for Synergistic Drug Combination Forecasting

Abstract:
Drug combinations are vital for treating complex diseases and advancing drug development, but accurately identifying synergistic combinations remains a significant challenge. Although graph neural networks (GNNs) have recently been used to predict drug combinations, the complex interactions between drugs and multimodal data (e.g., target proteins) and the prevalent high-order relations among drugs have yet to be fully exploited. The hypergraph offers a natural methodology for modeling high-order relations and provides profound insights for multimodal fusion. Here, we introduce the multimodal relational hypergraph neural network (MRHGNN), a novel framework for predicting synergistic drug combinations. Specifically, we design a dual-channel architecture to capture the physicochemical attributes of drugs and their interactive synergies, thereby facilitating the generation of multimodal drug representations. To obtain comprehensive representations of drugs, we use an attention mechanism to explore complementarity among multimodal drug embeddings. In addition, the unified framework jointly learns primary and self-supervised learning tasks, fostering a robust predictive capability. Experimental results demonstrate that MRHGNN accurately predicts synergistic drug combinations, and the effectiveness of the dual-channel setup and motif structures has been validated through ablation studies. Further literature searches illustrate that our model holds significant promise in accelerating the discovery of novel synergistic drug combinations, particularly in cancer therapy. This study not only introduces a novel computational tool but also paves the way for advanced methodologies in drug discovery and development.

PaperID: 1132,

Authors: Gaofu Yang, Ruizhuo Song, Qing Li, Lina Xia

Affiliations: Beijing Engineering Research Center of Industrial Spectrum Imaging, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China

Title: Nash Equilibrium in Multiplayer Graphical Games via Reinforcement Learning and Distributed Observers

Abstract:
Multiplayer game theory has been widely studied, with most existing research focusing on fully connected network structures. In contrast, multiplayer graphical games consider sparser communication topologies, making them more practical for large-scale systems. This article, based on a reinforcement learning (RL) method, investigates the problem of computing Nash equilibrium (NE) strategies in a class of multiplayer graphical games where the system is influenced by an external system. To estimate the unknown states of the external system, we propose a distributed adaptive observer and prove that its observation error asymptotically converges to zero. Furthermore, we derive a range of discount factor values that preserve system stability. To solve for the NE strategy, we develop an off-policy algorithm integrated with the distributed adaptive observer for policy evaluation. To enhance convergence speed, we introduce a distributed policy improvement mechanism, which ensures policy convergence to equilibrium while maintaining system stability. The effectiveness of the proposed algorithm is validated through simulations on a voltage synchronization system.

PaperID: 1133,

Authors: Jonggyu Jang, Hyeonsu Lyu, Seongjin Hwang, Hyun Jong Yang

Affiliations: Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA; Department of Electrical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea; Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea

Title: Unveiling Hidden Visual Information: A Reconstruction Attack Against Adversarial Visual Information Hiding

Abstract:
This article investigates the security vulnerabilities of adversarial example-based image encryption by executing data reconstruction (DR) attacks on encrypted images. A representative image encryption method is the adversarial visual information hiding (AVIH), which uses type-I adversarial example training to protect gallery datasets used in image recognition tasks. In the AVIH method, the type-I adversarial example approach creates images that appear completely different but are still recognized by machines as the original ones. Additionally, the AVIH method can restore encrypted images to their original forms using a predefined private key generative model. For the best security, assigning a unique key to each image is recommended; however, storage limitations may necessitate some images sharing the same key model. This raises a crucial security question for AVIH: How many images can safely share the same key model without being compromised by a DR attack? To address this question, we introduce a dual-strategy DR attack against the AVIH encryption method by incorporating 1) generative-adversarial loss and 2) augmented identity loss, which prevent DR from overfitting, an issue akin to that in machine learning. Our numerical results validate this approach through image recognition and re-identification benchmarks, demonstrating that our strategy can significantly enhance the quality of reconstructed images, thereby requiring fewer key-sharing encrypted images. The source code to reproduce the results will be available at https://github.com/jonggyujang0123/Hiding_person.

PaperID: 1134,

Authors: Lei Chen, Jiajun Tang, Ying Zou, Xuxin Liu, Xingquan Xie, Guangyang Deng

Affiliations: School of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan, China

Title: Lightweight and Fast Time-Series Anomaly Detection via Point-Level and Sequence-Level Reconstruction Discrepancy

Abstract:
Unsupervised time-series anomaly detection (TSAD) aims to identify anomalies in industrial sensing signals to ensure production safety. As Industry 4.0 emerges, TSAD deployment must migrate from the resource-rich cloud to resource-limited edge devices for real-time and fine-grained control. This shift sets new targets for TSAD: high accuracy, high timeliness, and low resource consumption. However, existing models focus solely on achieving high accuracy by building neural networks with deep structures and large numbers of parameters. Consequently, these models demand prohibitive training durations and computational overhead, which makes them unsuitable for edge deployment. To solve this issue, an unsupervised lightweight and fast TSAD model, namely, LFTSAD, is proposed via point-level and sequence-level reconstruction discrepancy in this article. First, to achieve high timeliness and low consumption, LFTSAD uses two two-layer multilayer perceptron networks (MLPs) to construct a lightweight contrastive architecture with few parameters. Second, leveraging the lightweight architecture, a dual-branch reconstruction network is designed to generate corresponding reconstruction discrepancies from point-level and sequence-level perspectives. Finally, a novel anomaly scoring scheme is designed to combine point-level and sequence-level reconstruction discrepancies for more accurate anomaly detection. To the best of our knowledge, this is the first work to develop a lightweight All-MLP-based TSAD model for resource-limited edge devices. Extensive experiments demonstrate that LFTSAD is 3–10 times faster, consumes only half of the resources, and achieves accuracy that is comparable or superior to that of several deep SOTA models. The source code of LFTSAD is available at https://github.com/infogroup502/LFTSAD.

PaperID: 1135,

Authors: Shan Zhong, Gang Wang, Kah Chan Teh, Jiacheng He, Tee Hiang Cheng, Bei Peng

Affiliations: School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Electrical and Electronic Engineering, Nanyang Technological University, Jurong West, Singapore

Title: Online Graph Models: Tackling the Challenges of Non-Gaussian Noise in Adaptive Filtering

Abstract:
Adaptive filtering faces significant challenges in handling complex non-Gaussian noise, while graph signal processing (GSP) excels at processing data with intricate structures. This brief introduces a novel method for solving non-Gaussian noise from the perspective of the graph domain for the first time. Specifically, we develop an online time-varying graph model based on the filter error signal and propose a corresponding graph topology transformation strategy. Utilizing a graph smoothness measure, we introduce a new adaptive filtering cost function, in which the graph Laplacian matrix plays a direct role in the filter update process. Subsequently, we derive the graph smoothness recursive adaptive filtering (GS-RAF) algorithm, rigorously analyze its theoretical performance, and validate its efficacy through simulations and echo cancellation experiments. The corresponding MATLAB (MathWorks, USA) codes of the simulations are publicly available at: https://github.com/smartXiaoz/GS-RAF.git.

PaperID: 1136,

Authors: Chang Liu, Lixin Tang, Kainan Zhang, Xuanqi Xu

Affiliations: National Frontiers Science Center for Industrial Intelligence and Systems Optimization and the Key Laboratory of Data Analytics and Optimization for Smart Industry (Northeastern University), Ministry of Education, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang, China; Liaoning Engineering Laboratory of Data Analytics and Optimization for Smart Industry and Liaoning Key Laboratory of Manufacturing System and Logistics Optimization, Shenyang, China

Title: Multiobjective Evolutionary Learning for Multitask Quality Prediction Problems in Continuous Annealing Process

Abstract:
In industrial production processes, the mechanical properties of materials will directly determine the stability and consistency of product quality. However, detecting the current mechanical property is time-consuming and labor-intensive, and the material quality cannot be controlled in time. To achieve high-quality steel materials, developing a novel intelligent manufacturing technology that can satisfy multitask predictions for material properties has become a new research trend. This article proposes a multiobjective evolutionary learning method based on a two-stage model with topological sparse autoencoder (TSAE) and ensemble learning. For the structure characteristics of a typical autoencoder (AE), a topology-related constraint is incorporated into the loss function of the AE, thus maintaining the global relationship among multistage input data to improve the data reconstruction quality. Then, a sparse representation of the data is added to the AE to achieve dimensionality reduction. Moreover, the extreme gradient boosting (XGBoost) method is applied to predict the mechanical properties of steel materials through collaboration learning mechanisms. To enhance the model accuracy, a multiobjective evolutionary algorithm (MOEA) with a knee solution strategy is used to optimize the network structure and hyperparameters of the two-stage model. Experiments are conducted using real steel production data from a continuous annealing process (CAP). The results verify that the proposed method obtains a higher prediction accuracy than other state-of-the-art methods and can guide practical production and new material design.

PaperID: 1137,

Authors: Sorin Mihai Grigorescu, Mihai V. Zaha

Affiliations: Department of Automation and Information Technology, Robotics, Vision and Control Laboratory (RovisLab), Transilvania University of Brasov, Brasov, Romania

Title: Inverse RL Scene Dynamics Learning for Nonlinear Predictive Control in Autonomous Vehicles

Abstract:
This article introduces the deep learning-based nonlinear model predictive controller with scene dynamics (DL-NMPC-SD) method for autonomous navigation. DL-NMPC-SD uses an a priori nominal vehicle model in combination with a scene dynamics model learned from temporal range sensing information. The scene dynamics model is responsible for estimating the desired vehicle trajectory, as well as for adjusting the true system model used by the underlying model predictive controller. We propose to encode the scene dynamics model within the layers of a deep neural network, which acts as a nonlinear approximator for the high-order state space of the operating conditions. The model is learned based on temporal sequences of range-sensing observations and system states, both integrated by an Augmented Memory component. We use inverse reinforcement learning (IRL) and the Bellman optimality principle to train our learning controller with a modified version of the deep Q-learning (DQL) algorithm, enabling us to estimate the desired state trajectory as an optimal action-value function. We have evaluated DL-NMPC-SD against the baseline dynamic window approach (DWA), as well as against two state-of-the-art methods, one End2End and one RL-based. The performance has been measured in three experiments: 1) in our GridSim virtual environment; 2) on indoor and outdoor navigation tasks using our RovisLab autonomous mobile test unit (AMTU) platform; and 3) on a full-scale autonomous test vehicle driving on public roads.

PaperID: 1138,

Authors: Yuanxin Lin, Zhiwen Yu, Kaixiang Yang, C. L. Philip Chen

Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China

Title: Ensemble Denoising Autoencoders Based on Broad Learning System for Time-Series Anomaly Detection

Abstract:
Time-series anomaly detection has gained considerable prominence in numerous practical applications across various domains. Nonetheless, the scarcity of labels, which leads to the neglect of anomalous patterns in data, together with the inherent complexities and variances in the definitions of temporal anomalies, poses significant challenges and results in insufficient recognition of anomaly patterns. In addition, real-time anomaly detection demands low computational cost and high model robustness, presenting substantial obstacles for unsupervised time-series anomaly detection. In this article, we propose a data-driven spontaneous perturbation based on the sequence-image strategy and a temporal anomaly knowledge enhancement strategy based on artificial anomalous data pairs to enhance the cognition of abnormal knowledge in unsupervised scenarios. On this basis, we propose the denoising autoencoder based on the broad learning system (DBLS-AE), which sufficiently learns the anomalous patterns, achieving efficient anomaly detection with low computational costs. To enhance the robustness in handling complex and diverse temporal anomalies, we further propose the progressive diversity denoising autoencoders based on the broad learning system (PddBLS-AE), which gradually prioritizes challenging samples and constructs a diverse ensemble of DBLS-AEs, markedly improving both performance and robustness. By utilizing the broad learning system (BLS), PddBLS-AE achieves accelerated training compared with advanced deep learning models. Comprehensive evaluations across multiple datasets substantiate the efficacy of PddBLS-AE.

PaperID: 1139,

Authors: Weiying Xie, Jitao Ma, Tianen Lu, Yunsong Li, Jie Lei, Leyuan Fang, Qian Du

Affiliations: State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, China; College of Electrical and Information Engineering, Hunan University, Changsha, China; Department of Electronic and Computer Engineering, Mississippi State University, Starkville, MS, USA

Title: Distributed Deep Learning With Gradient Compression for Big Remote Sensing Image Interpretation

Abstract:
Fast and reliable interpretation of high-dimensional hyperspectral images (HSIs) can provide great support to remote sensing-based Earth observations. Targets of interest in HSI can be detected using deep neural networks (DNNs) for background learning on an acquired image where the occurrence probability of background samples is much greater than that of targets, accounting for more than 95% of the whole scene. However, there is an increasing gap between theory and feasible application, because of the contradiction between massive hyperspectral data and resource-limited Internet of Things (IoT)/edge device hardware such as satellites. To facilitate the deployment of hyperspectral target detection (HTD) in an edge computing environment, we introduce distributed background learning, a decentralized deep learning approach to meet the computing requirements of exploding high-dimensional data and larger DNNs. To address the communication bottleneck caused by gradient exchange during distributed learning, the proposed gradient compression solution, named gradient compression via centroid (GCC), uniquely compresses the most replaceable gradients with redundant information, thereby reducing communication overhead while maintaining accuracy. To illustrate the feasibility of the proposed method, we test it over two very large hyperspectral datasets with a total size of about 3.2 gigabytes (GB) on a distributed system based on Ring All-reduce. We show that HTD based on distributed background learning outperforms HTD developed on a single node in terms of speed. Besides, the GCC compresses 50% of the gradients with only a 0.01% loss of target detection accuracy, greatly reducing the communication overhead and surpassing existing gradient compression methods. It is expected that this framework will accelerate the introduction of distributed training on IoT/edge devices.

PaperID: 1140,

Authors: Chuang Wang, Zidong Wang, Hongli Dong, Stanislao Lauria, Weibo Liu, Yiming Wang, Futra Fadzil, Xiaohui Liu

Affiliations: Artificial Intelligence Energy Research Institute and Heilongjiang Provincial Key Laboratory of Networking and Intelligent Control, Northeast Petroleum University, Daqing, China; Department of Computer Science, Brunel University London, Middlesex, Uxbridge, U.K.; School of Psychology and Neuroscience, University of Glasgow, Glasgow, U.K.

Title: Fusionformer: A Novel Adversarial Transformer Utilizing Fusion Attention for Multivariate Anomaly Detection

Abstract:
Multivariate time series forecasting (MTSF) is of significant importance in the enhancement and optimization of real-world applications. The task of MTSF poses substantial challenges due to the unpredictability of temporal patterns and the complexity in modeling the influence of all nonpredictive sequences on the target sequence at different time stages. Recent research has demonstrated the potential held by the Transformer algorithm to augment long-term forecasting capability. However, certain obstacles considerably obstruct the direct application of the Transformer to MTSF, such as an unsuitable embedding method, inadequate consideration of intervariable associations, and the intrinsic restriction of the point-wise objective function. To overcome these challenges, the Fusionformer, an effective Transformer-based forecasting model, is put forth in this article, which is characterized by three distinctive features: 1) the introduction of a segment-wise sequence embedding (SWSE) method allows for the conversion of the input sequence into multiple informative segments; 2) the implementation of a fusion attention mechanism (FAM), designed to capture predominant features across the time dimension and to model intricate intervariable dependencies; and 3) the development of an adversarial learning method, equipped with an auxiliary discriminator, facilitates the learning of data distribution, instead of progressively correcting the prediction error, thus substantially enhancing the MTSF’s accuracy. Furthermore, a Fusionformer-based risk assessment (FRA) method is structured for open-pit mine slope failure early warning issue (SFEW), which aims to prevent potential disasters by accurately predicting future slope movement trends and assessing the probabilities of landslide occurrences. Experimental outcomes validate that Fusionformer outperforms existing forecasting methods, while the FRA framework provides valuable insights and practical guidance for real-world applications.

PaperID: 1141,

Authors: Yajie Lei, Yujie Mo, Luping Ji, Xiaofeng Zhu

Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

Title: Dual Consistency Constraint-Based Self-Supervised Representation Learning for Heterogeneous Graphs With Missing Attributes

Abstract:
Missing attribute completion for unattributed nodes in heterogeneous graphs has received increasing attention, but previous works still suffer from the following issues: 1) they ignore the noise in the raw attributes, resulting in noise propagation and even inaccurate information generation during attribute completion, thus further influencing the representation learning; and 2) they ignore constraints on unattributed nodes when conducting consistency learning across augmented graph views, resulting in data inconsistency across views. To solve these issues, in this article, we propose a new dual consistency constraint-based self-supervised representation learning method for heterogeneous graphs with missing attributes. Specifically, we first investigate the representation completion and the within-view consistency loss to complete missing information in the representation space, and then, we investigate the cross-view consistency loss to ensure data consistency across views. We further reconstruct the masked data to avoid information loss due to the masking process. As a result, our method effectively filters out noise and inaccurate information by the representation completion process as well as achieves discriminative representation learning for heterogeneous graphs with missing attributes. Experimental results on various downstream tasks verify the superiority of our method.

PaperID: 1142,

Authors: Chaoying Yang, Jie Liu, Yue Wang, Shuangye Yang, Tielin Shi

Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China; National Center of Technology Innovation for Digital Construction, School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, China; Cummins Inc., Columbus, IN, USA; School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China

Title: A Fast Graph Construction-Driven Rotating Machine Fault Diagnosis Method Using Edge Predictor

Abstract:
Graph-based machine fault diagnosis methods have been used successfully to extract relationship information. However, the heavy computational burden of constructing a K-nearest neighbor graph (KNNG) has limited their application. To overcome this limitation, a fast graph construction-driven rotating machine fault diagnosis method using an edge predictor is proposed in this article. The edge predictor, pretrained on an edge connection prediction task, is designed to learn how to get a distance matrix from an initial KNNG (IKNNG). Subsequently, numerous samples are directly input to the edge predictor, obtaining the generated distance matrix and enabling fast KNNG construction. Compared to the traditional KNNG construction method, this approach outputs the graph directly without calculating the distance matrix, significantly reducing the computational burden. The experimental results show that the performance of the proposed method is as good as that of existing graph data-driven methods. Furthermore, theoretical analysis reveals that the quality of the constructed KNNG is similar to that of the KNNG obtained by traditional distance matrix calculations, but with a significantly reduced computational load.
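
For context, the traditional KNNG construction that this abstract treats as the costly baseline can be sketched as a brute-force procedure. The sketch is generic and is not taken from the paper: it assumes Euclidean distance between feature vectors, and the function name `knn_graph` is illustrative. The edge predictor itself is not shown.

```python
import math


def knn_graph(samples, k):
    """Brute-force K-nearest neighbor graph.

    Builds the full pairwise distance matrix (O(n^2) distance evaluations),
    then keeps the k closest other samples for each node.
    Returns a list where entry i holds the neighbor indices of sample i.
    """
    n = len(samples)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of samples")
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    neighbors = []
    for i in range(n):
        # Sort the other nodes by distance, breaking ties by index.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: (dist[i][j], j))
        neighbors.append(order[:k])
    return neighbors


# Two well-separated pairs of 2-D feature vectors (toy data).
graph = knn_graph([(0, 0), (1, 0), (5, 0), (6, 0)], k=1)
```

The quadratic distance matrix is the step that grows expensive as the number of samples increases, which is the burden the abstract says the edge predictor is meant to avoid.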

PaperID: 1143,

Authors: Xing-Yuan Li, Yuan Xu, Qun-Xiong Zhu, Yan-Lin He

Affiliations: College of Information Science and Technology (CIST), Beijing University of Chemical Technology (BUCT), Beijing, China

Title: Industrial Data Imputation Based on Multiscale Spatiotemporal Information Embedding With Asymmetrical Transformer

Abstract:
In the process industry, the challenge of missing data significantly impairs the efficacy of data-driven process monitoring systems and soft sensor modeling, particularly due to issues, such as unbalanced sampling intervals and sensor malfunctions. Process data, inherently nonlinear and characterized by spatiotemporal coupling, are prone to distribution shifts, which traditional imputation techniques often fail to address comprehensively. To overcome these limitations, this article introduces a novel data imputation framework, termed multiscale spatiotemporal information embedding with asymmetrical Transformer (MSST-Former). This framework reconceptualizes the missing data problem by integrating both global and local perspectives on time series and input variables. The proposed approach initiates with a hybrid 1-D convolutional network module that effectively captures local spatiotemporal correlations and dependencies within the time-series data. This is followed by an encoder-decoder structure, incorporating an inverted Transformer (iTransformer) in conjunction with a Transformer block, to embed series representations with a focus on long-term multivariate correlations and overarching spatiotemporal dependencies. Finally, a multilayer residual network executes the data imputation by leveraging the features embedded at multiple scales. Comparative experiments with several baseline and state-of-the-art models on two real-world industrial datasets verify the superiority and robustness of the proposed MSST-Former.

PaperID: 1144,

Authors: Qianyu Zhao, Song Zhu, Zhen Zhang, Weiwei Luo, Shiping Wen

Affiliations: School of Mathematics, JCAM, China University of Mining and Technology, Xuzhou, China; Faculty of Engineering and Information Technology, Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, NSW, Australia

Title: Multistability of Almost Periodic Solutions for Fuzzy Competitive NNs With Time-Varying Delays

Abstract:
In this article, the multistability problem of almost periodic solutions of fuzzy competitive neural networks (FCNNs) with time-varying delays is investigated. Considering more general activation functions, which are nonmonotonic and nonlinear, and incorporating the almost periodic property of the parameters in FCNNs, sufficient conditions for the multistability of almost periodic solutions are given. A total of $\prod_{r=1}^{n}(L_r+1)$ stable almost periodic solutions are obtained, where $L_r$ depends on the geometric features of the activation functions, which enriches and extends the research on multistability in fuzzy systems. Furthermore, the extended domain of attraction based on the original state space is presented. Finally, numerical simulations are provided to verify the conclusions of this article.
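
As a purely illustrative reading of the count stated in this abstract, consider hypothetical values that are not taken from the paper: a network with $n = 3$ neurons and $L_1 = 1$, $L_2 = 2$, $L_3 = 1$. The number of stable almost periodic solutions would then be

```latex
\prod_{r=1}^{3}(L_r+1) = (1+1)(2+1)(1+1) = 2 \cdot 3 \cdot 2 = 12 .
```

Because each neuron contributes a factor of $L_r + 1$, the count grows multiplicatively with the number of neurons.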

PaperID: 1145,

Authors: Wenjun Xiong, Xiaoxiao Wang, Lei Zou, Guanrong Chen

Affiliations: School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics, Chengdu, China; College of Information Science and Technology and the Engineering Research Center of Digitalized Textile and Fashion Technology, Ministry of Education, Donghua University, Shanghai, China; Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China

Title: Designing Iterative Learning Schemes for Cooperative-Antagonistic Systems With Random Access Communication Protocols

Abstract:
In this article, the design issue for an iterative learning controller is investigated for the cooperative-antagonistic system under the scheduling effects of random access protocol (RAP). In order to reflect the heterogeneous characteristic of the underlying system, the dynamics of each node in the cooperative-antagonistic system are described by a two-time-scale system. For the purpose of avoiding data collisions in signal transmissions, the so-called RAP is introduced to schedule the data exchanges among nodes, where the transmission opportunities of nodes are modeled by a sequence of random variables with certain transition probabilities. Considering that the mutual relationships may be cooperative and also competitive among nodes in many real-world networks, a novel cooperative-antagonistic-based iterative learning controller is developed to handle the tracking problem of the system dynamics. Sufficient conditions are obtained for the design of the controller parameters. Furthermore, the derived results are extended to the case that the transition probabilities for the RAP are partially unknown. Finally, a numerical example is presented to illustrate the effectiveness of the proposed iterative learning control (ILC) scheme.

PaperID: 1146,

Authors: Zizhuo Li, Jie Jiang, Jiayi Ma

Affiliations: Electronic Information School, Wuhan University, Wuhan, China; Tencent, Shenzhen, China

Title: Seed to Prune: A Seeded Graph Neural Network for Two-View Correspondence Learning

Abstract:
We present a simple yet tough-to-beat method dubbed SGNNet, for correspondence learning. Instead of focusing on devising sophisticated geometric extractors to explore the global or local contextual information involving all sparse correspondences as most existing studies have done, which may be biased by heavy outliers, we propose to first delve into elaborate contextual information encoded in several specific reliable correspondences, and later leverage it to achieve per-correspondence representation updating. To this end, the proposed network contains three pivotal modules: 1) dynamic seeding module, which aims to dynamically sample a set of reliable matches from the putative set as seeds to guide the network learning; 2) intraseed attention module (ISAM), which intends to capture the geometrical relations among seed matches and further leverage them to enhance seed features; and 3) dynamic unseeding module, which is designed to sufficiently aggregate favorable contextual information from seed matches and broadcast it back to features of original matches. With all the aforementioned components, the proposed SGNNet is capable of rejecting outliers from putative correspondences effectively. Extensive experiments indicate that our method beats current solid baselines and sets new SOTA scores across multiple domains and datasets. Notably, SGNNet attains an AUC@5° of 56.43% on YFCC100M without RANSAC, surpassing the most cutting-edge model by 4.51 absolute percentage points and exceeding the 55% AUC@5° bar for the first time. Project page: https://github.com/ZizhuoLi/SGNNet.

PaperID: 1147,

Authors: Qianyi Chen, Jiannong Cao, Yu Yang, Wanyu Lin, Sumei Wang, Youwu Wang

Affiliations: Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; Centre for Learning, Teaching and Technology, The Education University of Hong Kong, Tai Po, Hong Kong; Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

Title: Multistage Graph Convolutional Network With Spatial Attention for Multivariate Time Series Imputation

Abstract:
In multivariate time series (MTS) analysis, data loss is a critical issue that degrades analytical model performance and impairs downstream tasks such as structural health monitoring (SHM) and traffic flow monitoring. In real-world applications, MTS is usually collected by multiple types of sensors, making MTS and correlations between variates heterogeneous. However, existing MTS imputation methods overlook the heterogeneous correlations by treating heterogeneous MTS as a homogeneous entity, leading to inaccurate imputation results. Besides, correlations between different data types vary due to ever-changing environmental conditions, forming dynamic correlations in MTS. How to properly learn the hidden correlations from heterogeneous MTS for accurate data imputation remains unresolved. To solve the problem, we propose a multistage graph convolutional network with spatial attention (MSA-GCN). In the first stage, we decompose heterogeneous MTS into several clusters with homogeneous data collected from identical sensor types and learn intracluster correlations. Then, we devise a GCN with spatial attention to explore dynamic intercluster correlations, which is the second stage of MSA-GCN. In the last stage, we decode the learned features from previous stages via stacked convolutional neural networks. We jointly train these three stages to predict the missing data in MTS. This multistage architecture and spatial attention mechanism enable MSA-GCN to effectively learn heterogeneous and dynamic correlations among MTS, resulting in superior imputation performance. We tested MSA-GCN with the monitoring data from a large-span bridge and the Wetterstation weather dataset. The results affirm its superiority over baseline models, demonstrating reduced imputation errors across diverse datasets.

PaperID: 1148,

Authors: Lirong Wu, Haitao Lin, Guojiang Zhao, Cheng Tan, Stan Z. Li

Affiliations: AI Laboratory, Research Center for Industries of the Future, Westlake University, Hangzhou, China

Title: Learning to Model Graph Structural Information on MLPs via Graph Structure Self-Contrasting

Abstract:
Recent years have witnessed great success in handling graph-related tasks with graph neural networks (GNNs). However, most existing GNNs are based on message passing to perform feature aggregation and transformation, where the structural information is explicitly involved in the forward propagation by coupling with node features through graph convolution at each layer. As a result, subtle feature noise or structure perturbation may cause severe error propagation, resulting in extremely poor robustness. In this article, we rethink the roles played by graph structural information in graph data training and identify that message passing is not the only path to modeling structural information. Inspired by this, we propose a simple but effective graph structure self-contrasting (GSSC) framework that learns graph structural information without message passing. The proposed framework is based purely on multilayer perceptrons (MLPs), where the structural information is only implicitly incorporated as prior knowledge to guide the computation of supervision signals, substituting the explicit message propagation as in GNNs. Specifically, it first applies structural sparsification (STR-Sparse) to remove potentially uninformative or noisy edges in the neighborhood, and then performs structural self-contrasting (STR-Contrast) in the sparsified neighborhood to learn robust node representations. Finally, STR-Sparse and self-contrasting are formulated as a bilevel optimization problem and solved in a unified framework. Extensive experiments have qualitatively and quantitatively demonstrated that the GSSC framework can produce truly encouraging performance with better generalization and robustness than other leading competitors. Codes are publicly available at: https://github.com/LirongWu/GSSC.

PaperID: 1149,

Authors: Wenjun Huang, Yunduan Cui, Huiyun Li, Xinyu Wu

Affiliations: Guangdong-Hong Kong-Macao Joint Laboratory of Human Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

Title: Practical Probabilistic Model-Based Reinforcement Learning by Integrating Dropout Uncertainty and Trajectory Sampling

Abstract:
This article addresses the prediction stability, prediction accuracy, and control capability of the current probabilistic model-based reinforcement learning (MBRL) built on neural networks. A novel approach to dropout-based probabilistic ensembles with trajectory sampling (DPETS) is proposed, where the system uncertainty is stably predicted by combining the Monte Carlo dropout (MC Dropout) and trajectory sampling in one framework. Its loss function is designed to correct the fitting error of neural networks for more accurate prediction of probabilistic models. The state propagation in its policy is extended to filter the aleatoric uncertainty for superior control capability. Evaluated on several MuJoCo benchmark control tasks under additional disturbances and one practical robot arm manipulation task, DPETS outperforms related MBRL approaches in both average return and convergence speed while achieving better performance than well-known model-free baselines with significant sample efficiency. The open-source code of DPETS is available at https://github.com/mrjun123/DPETS.
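
As a rough illustration of the Monte Carlo dropout idea mentioned in this abstract, the sketch below runs repeated stochastic forward passes through a toy linear model and reports the mean and variance of the outputs. It is a generic MC Dropout example, not the DPETS implementation; the function name `mc_dropout_predict`, the toy model, and all parameter values are assumptions made for illustration.

```python
import random
import statistics


def mc_dropout_predict(x, weights, drop_prob=0.2, n_samples=200, seed=0):
    """Estimate predictive mean and variance with Monte Carlo dropout.

    Hypothetical toy model: y = sum(w_i * x_i), with dropout applied to
    each input term independently on every forward pass.
    """
    rng = random.Random(seed)
    keep = 1.0 - drop_prob
    outputs = []
    for _ in range(n_samples):
        y = 0.0
        for xi, wi in zip(x, weights):
            # Keep the term with probability `keep`; rescale so the
            # expected output matches the dropout-free model.
            if rng.random() < keep:
                y += wi * xi / keep
        outputs.append(y)
    # The spread across passes serves as a simple uncertainty estimate.
    return statistics.fmean(outputs), statistics.pvariance(outputs)


if __name__ == "__main__":
    mean, var = mc_dropout_predict([1.0, 2.0], [0.5, 0.25], drop_prob=0.5)
    print(f"mean={mean:.3f} variance={var:.3f}")
```

With `drop_prob=0.0` every pass is identical, so the variance collapses to zero; larger dropout rates widen the spread of sampled predictions.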

PaperID: 1150,

Authors: Taewook Kim, Jong-Seok Lee

Affiliations: Manufacturing Intelligence Center, LG Energy Solution, Gwacheon, Republic of Korea; Department of Industrial and Systems Engineering, KAIST, Daejeon, Republic of Korea

Title: Gently Sloped and Extended Classification Margin for Overconfidence Relaxation of Out-of-Distribution Samples

Abstract:
Recently, machine learning models are expected to be capable of detecting out-of-distribution (OOD) samples for safe use. However, the existing OOD detection methods have limitations. Post hoc calibration techniques used for OOD detection during the inference phase suffer from slow inference and low OOD detection accuracy because pretrained classifiers were not originally designed for this task. Training-phase methods require auxiliary data, entail slow training, and result in a decrease in classification accuracy. To address these issues, this article proposes jointly employing discriminative representation learning through angular margin loss and weight regularization during neural network training. Angular margin loss extends the classification margin, whereas weight regularization ensures a gently sloped margin in the learned embedding space. By constructing a classification margin that is both gently sloped and enlarged, the proposed approach mitigates the overconfidence of OOD samples and overcomes the shortcomings of previous methods. The experimental results demonstrate that the proposed method outperforms state-of-the-art detectors in identifying OOD samples without any side effects.
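
The abstract does not state which angular margin loss is used, so the following is only a minimal sketch of one common form (an additive angular margin on the target class, followed by scaled softmax cross-entropy). The function names, the margin of 0.5, and the scale of 30 are assumptions for illustration, and the weight regularization term described above is not shown.

```python
import math


def angular_margin_logits(cosines, target, margin=0.5, scale=30.0):
    """Return scaled logits after adding an angular margin to the target class.

    `cosines[j]` is assumed to be the cosine similarity between a normalized
    embedding and the normalized weight vector of class j.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard against rounding outside [-1, 1]
        if j == target:
            theta = math.acos(c)
            # Penalize the target class by enlarging its angle.
            c = math.cos(min(theta + margin, math.pi))
        logits.append(scale * c)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


if __name__ == "__main__":
    cos_sim = [0.8, 0.1, -0.3]
    plain = cross_entropy(angular_margin_logits(cos_sim, 0, margin=0.0), 0)
    with_margin = cross_entropy(angular_margin_logits(cos_sim, 0), 0)
    print(plain, with_margin)
```

Because the margin lowers the target logit during training, the loss stays higher until the embedding moves well inside its class region, which is the sense in which the margin is extended.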

PaperID: 1151,

Authors: Yajie Zhang, Yao Hu, Chengjun Cai, Yu-An Huang, Zhi-An Huang, Kay Chen Tan

Affiliations: Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong, SAR; Research Office, City University of Hong Kong (Dongguan), Dongguan, China; School of Computer Science, Northwestern Polytechnical University, Xi’an, China

Title: Anti-Confounding Hashing: Enhancing Radiological Image Retrieval via Debiased Weighting and Counterfactual Reasoning

Abstract:
Content-based medical image retrieval (CBMIR) enables physicians to make evidence-based diagnoses by retrieving similar medical images and recalling previous cases stored in databases. However, existing CBMIR models are prone to capturing superficial correlations due to confounding factors such as complex host organs and lesions, imaging discrepancies, artifacts, and inconsistent protocols. To address this issue, we propose a plug-and-play anti-confounding hashing (ACH) method, which uses debiased sample weighting and lesion counterfactual reasoning (LCR) to directly capture the natural direct effect (NDE) of lesions on query medical images without bias. The devised debiased weighting (DBW) loss adopts a backdoor adjustment to separate lesions from confounders. To effectively locate salient areas of lesions, we present a coarse-to-fine lesion positioning (C2F-LP) module by counterfactual reasoning. On two real-world radiological image datasets, ACH achieves 0.2%–9% improvement in mean average precision (mAP) over the six state-of-the-art methods, when using code lengths ranging from 8-bit to 32-bit. Its robustness to confounding factors is demonstrated through explainable visual analysis.

PaperID: 1152,

Authors: Zenglin Shi, Jie Jing, Ying Sun, Joo-Hwee Lim, Mengmi Zhang

Affiliations: College of Computing and Data Science, Nanyang Technological University (NTU), Jurong West, Singapore; NTU, Jurong West, Singapore; A*STAR, Fusionopolis, Singapore

Title: Unveiling the Tapestry: The Interplay of Generalization and Forgetting in Continual Learning

Abstract:
In artificial intelligence (AI), generalization refers to a model’s ability to perform well on out-of-distribution data related to the given task, beyond the data it was trained on. For an AI agent to excel, it must also possess the continual learning capability, whereby an agent incrementally learns to perform a sequence of tasks without forgetting the previously acquired knowledge to solve the old tasks. Intuitively, generalization within a task allows the model to learn underlying features that can readily be applied to novel tasks, facilitating quicker learning and enhanced performance in subsequent tasks within a continual learning framework. Conversely, continual learning methods often include mechanisms to mitigate catastrophic forgetting, ensuring that knowledge from earlier tasks is retained. This preservation of knowledge over tasks plays a role in enhancing generalization for the ongoing task at hand. Despite the intuitive appeal of the interplay of both abilities, existing literature on continual learning and generalization has proceeded separately. In the preliminary effort to promote studies that bridge both fields, we first present empirical evidence showing that each of these fields has a mutually positive effect on the other. Next, building upon this finding, we introduce a simple and effective technique known as shape-texture consistency regularization (STCR), which caters to continual learning. STCR learns both shape and texture representations for each task, consequently enhancing generalization and thereby mitigating forgetting. Remarkably, extensive experiments validate that STCR can be seamlessly integrated with existing continual learning methods, including replay-free approaches. Its performance surpasses these continual learning methods in isolation or when combined with established generalization techniques by a large margin. Our data and source code are available at https://github.com/ZhangLab-DeepNeuroCogLab/distillation-style-cnn.

PaperID: 1153,

Authors: Shihong Chen, Haicheng Yi, Zhuhong You, Xuequn Shang, Yu-An Huang, Lei Wang, Zhen Wang

Affiliations: School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi, China; School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China

Title: Local-Global Structure-Aware Geometric Equivariant Graph Representation Learning for Predicting Protein-Ligand Binding Affinity

Abstract:
Predicting protein-ligand binding affinities is a critical problem in drug discovery and design. A majority of existing methods fail to accurately characterize and exploit the geometrically invariant structures of protein-ligand complexes for predicting binding affinities. In this study, we propose Geo-protein-ligand binding affinity (PLA), a geometric equivariant graph representation learning framework with local-global structure awareness, to predict binding affinity by capturing the geometric information of protein-ligand complexes. Specifically, the local structural information of 3-D protein-ligand complexes is extracted by using an equivariant graph neural network (EGNN), which iteratively updates node representations while preserving the equivariance of coordinate transformations. Meanwhile, a graph transformer is utilized to capture long-range interactions among atoms, offering a global view that adaptively focuses on complex regions with a significant impact on binding affinities. Furthermore, the multiscale information from the two channels is integrated to enhance the predictive capability of the model. Extensive experimental studies on two benchmark datasets confirm the superior performance of Geo-PLA. Moreover, the visual interpretation of the learned protein-ligand complexes further indicates that our model offers valuable biological insights for virtual screening and drug repositioning.

PaperID: 1154,

Authors: Ye Qian, Xiaoyan Wang, Fuhui Sun, Li Pan

Affiliations: Institute of Cyber Science and Technology, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Information Technology Service Center of People’s Court, Beijing, China

Title: Compressing Transfer: Mutual Learning-Empowered Knowledge Distillation for Temporal Knowledge Graph Reasoning

Abstract:
With the widespread application of temporal knowledge graph reasoning (TKGR) models, there is an increasing demand to reduce the memory consumption and enhance the reasoning efficiency. Knowledge distillation (KD) is a classical approach to achieve model compression and acceleration, which has been gradually introduced into the TKGR domain. Through KD, the expertise of a high-capability teacher TKGR model can be transferred to a lightweight student TKGR model. The effective transfer of reasoning knowledge primarily faces two major challenges. The first is how to extract high-quality and high-value knowledge from the teacher to the student to achieve better teaching outcomes. The second is how to encourage the teacher to improve the teaching pattern so that the knowledge is more easily assimilated by the student. Motivated by these challenges, this article firstly designs a soft-label evaluation mechanism to mitigate the problem of anomaly diffusion and knowledge transfer redundancy by measuring the confidence and entropy changes of soft labels, then proposes a mutual learning-empowered KD (MLEMKD) framework for compressing TKGR models. It refines the distribution of knowledge by analyzing the cognitive differences between teacher and student models on training samples, which enhances the acceptability of knowledge. Extensive experiments conducted on four benchmark datasets demonstrate that MLEMKD significantly outperforms existing KD methods.
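
To make the notions of soft-label confidence and entropy concrete, here is a minimal sketch of how a teacher's temperature-softened output could be scored. It is a generic knowledge-distillation illustration, not the MLEMKD evaluation mechanism; the function names and the temperature value are assumptions.

```python
import math


def softmax_with_temperature(logits, temperature=2.0):
    """Convert teacher logits to soft labels; higher temperature flattens them."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_and_entropy(probs):
    """Return (max probability, Shannon entropy in nats) of a soft label."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return max(probs), entropy


if __name__ == "__main__":
    teacher_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidates
    for t in (1.0, 4.0):
        soft = softmax_with_temperature(teacher_logits, temperature=t)
        print(t, confidence_and_entropy(soft))
```

A soft label with low confidence and high entropy carries little discriminative signal, which is one plausible reason a distillation scheme might down-weight or filter it.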

PaperID: 1155,

Authors: Shutao Chen, Ke Yan, Xuelong Li, Bin Liu

Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Institute of Artificial Intelligence (TeleAI) of China Telecom, Beijing, China

Title: Protein Language Pragmatic Analysis and Progressive Transfer Learning for Profiling Peptide-Protein Interactions

Abstract:
Protein complex structural data are growing at an unprecedented pace, but their complexity and diversity pose significant challenges for protein function research. Although deep learning models have been widely used to capture the syntactic structure, word semantics, or semantic meanings of polypeptide and protein sequences, these models often overlook the complex contextual information of sequences. Here, we propose interpretable interaction deep learning (IIDL)-peptide-protein interaction (PepPI), a deep learning model designed to tackle these challenges using data-driven and interpretable pragmatic analysis to profile PepPIs. IIDL-PepPI constructs bidirectional attention modules to represent the contextual information of peptides and proteins, enabling pragmatic analysis. It then adopts a progressive transfer learning framework to simultaneously predict PepPIs and identify binding residues for specific interactions, providing a solution for multilevel in-depth profiling. We validate the performance and robustness of IIDL-PepPI in accurately predicting peptide-protein binary interactions and identifying binding residues compared with the state-of-the-art methods. We further demonstrate the capability of IIDL-PepPI in peptide virtual drug screening and binding affinity assessment, which is expected to advance artificial intelligence-based peptide drug discovery and protein function elucidation.

PaperID: 1156,

Authors: Weiming Li, Xuelong Wu, Shuaishuai Fan, Songjie Wei, Glyn Gowing

Affiliations: School of Information and Electronic Engineering, Shandong Technology and Business University (SDTBU), Yantai, China; School of Computer Science and Engineering, Nanjing University of Science and Technology (NJUST), Nanjing, China; Department of Computer Science, LeTourneau University (LETU), Longview, TX, USA

Title: INGC-GAN: An Implicit Neural-Guided Cycle Generative Approach for Perceptual-Friendly Underwater Image Enhancement

Abstract:
The key requirement for underwater image enhancement (UIE) is to overcome the unpredictable color degradation caused by the underwater environment and light attenuation, while addressing issues, such as color distortion, reduced contrast, and blurring. However, most existing unsupervised methods fail to effectively solve these problems, resulting in a visual disparity in metric-optimal qualitative results compared with undegraded images. In this work, we propose an implicit neural-guided cyclic generative model for UIE tasks, and the bidirectional mapping structure solves the aforementioned ill-posed problem from the perspective of bridging the gap between the metric-favorable and the perceptual-friendly versions. The multiband-aware implicit neural normalization effectively alleviates the degradation distribution. The U-shaped generator simulates human visual attention mechanisms, which enables the aggregation of global coarse-grained and local fine-grained features, and enhances the texture and edge features under the guidance of shallow semantics. The discriminator ensures perception-friendly visual results through a dual-branch structure via appearance and color. Extensive experiments and ablation analyses on the full-reference and nonreference underwater benchmarks demonstrate the superiority of our proposed method. It can restore degraded images in most underwater scenes with good generalization and robustness, and the code is available at https://github.com/SUIEDDM/INGC-GAN.

PaperID: 1157,

Authors: Haotian Liu, Bowen Hu, Yadong Zhou, Yuxun Zhou

Affiliations: Ministry of Education Key Laboratory for Intelligent Networks and Network Security, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China; Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA

Title: Responding to News Sensitively in Stock Attention Networks via Prompt-Adaptive Trimodal Model

Abstract:
Modern quantitative finance and portfolio-based investment hinge on multimedia news and historical price trends for stock movement prediction. However, prior studies overlook the long tail effect in the feature distribution of stocks, inevitably leading to biased attention and thus degrading the efficiency of utilizing news. To this end, we propose a prompt-adaptive trimodal model (PA-TMM) to overcome the biased stock attention networks and tail feature scarcity problem. In this model, sentiments automatically extracted from trimodal information serve as prompts reflecting the market’s collective mood for other entities, and the interactions among stocks are dynamically inferred for integrating both news- and price-induced movements. By leveraging the movement prompt adaptation (MPA) strategy, our model proactively adapts to the feature-imbalanced phenomenon and converges toward being responsive to the news sensitively. Extensive experiments conducted on real-world datasets consistently demonstrate not only the superiority of the proposed framework over various state-of-the-art baselines, but also its effectiveness, profitability, and robustness in Fintech. The code is accessible at https://github.com/lauht/PA-TMM.

PaperID: 1158,

Authors: Jielong Chen, Yan Pan, Yunong Zhang

Affiliations: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China

Title: Model-Free and Pseudoinverse-Free Zhang Neurodynamics Scheme for Robotic Arms' Path Tracking Control

Abstract:
Path tracking control of robotic arms is regarded as a fundamental problem in the field of robotics. However, obtaining an accurate model of the robotic arm in practical engineering poses significant challenges. As a result, model-free schemes have become a focus of investigation. In contrast to traditional model-free schemes used for estimating the Jacobian matrix of the robotic arm, in this work, a novel estimator directly for the pseudoinverse (PI) of the Jacobian matrix based on Zhang neurodynamics (ZN) is proposed for the first time. In addition, a novel model-free and PI-free ZN (MFPIFZN) scheme for path tracking control of robotic arms is proposed. The MFPIFZN scheme not only significantly reduces the operation complexity by eliminating the requirement to compute the PI of the Jacobian matrix but also enhances the accuracy by eliminating the potential errors that may arise from the computation of the PI. Theoretical analyses provide guarantees for the convergence and stability of the MFPIFZN scheme. Finally, experimental results conducted on planar four-link and Kinova Jaco2 robotic arms vividly illustrate the excellent performance of the MFPIFZN scheme. Comparison experiments with four other model-free schemes further confirm the superiority of the MFPIFZN scheme.

PaperID: 1159,

Authors: Amin Ghafourian, Huanyi Shui, Devesh Upadhyay, Rajesh Gupta, Dimitar P. Filev, Iman Soltani

Affiliations: Department of Mechanical and Aerospace Engineering, University of California at Davis, Davis, CA, USA; Ford Motor Company, Dearborn, MI, USA

Title: Targeted Collapse Regularized Autoencoder for Anomaly Detection: Black Hole at the Center

Abstract:
Autoencoders have been extensively used in the development of recent anomaly detection techniques. The premise of their application is based on the notion that after training the autoencoder on normal training data, anomalous inputs will exhibit a significant reconstruction error. Consequently, this enables a clear differentiation between normal and anomalous samples. In practice, however, it is observed that autoencoders can generalize beyond the normal class and achieve a small reconstruction error on some of the anomalous samples. To improve the performance, various techniques propose additional components and more sophisticated training procedures. In this work, we propose a remarkably straightforward alternative: instead of adding neural network components, involved computations, and cumbersome training, we complement the reconstruction loss with a computationally light term that regulates the norm of representations in the latent space. The simplicity of our approach minimizes the requirement for hyperparameter tuning and customization for new applications which, paired with its permissive data modality constraint, enhances the potential for successful adoption across a broad range of applications. We test the method on various visual and tabular benchmarks and demonstrate that the technique matches and frequently outperforms more complex alternatives. We further demonstrate that implementing this idea in the context of state-of-the-art methods can further improve their performance. We also provide a theoretical analysis and numerical simulations that help demonstrate the underlying process that unfolds during training and how it helps with anomaly detection. This mitigates the black-box nature of autoencoder-based anomaly detection algorithms and offers an avenue for further investigation of advantages, fail cases, and potential new directions.
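
The core idea described above, a reconstruction loss complemented by a light term on the norm of the latent representation, can be sketched in a few lines. The exact form of the regularizer in the paper is not given in this abstract, so the Euclidean norm, the weight `beta`, and the function name below are assumptions for illustration only.

```python
import math


def regularized_autoencoder_loss(x, x_hat, z, beta=0.1):
    """Mean squared reconstruction error plus a penalty on the latent norm.

    x     : input vector
    x_hat : reconstruction produced by the decoder
    z     : latent representation produced by the encoder
    beta  : assumed weight of the norm term (hypothetical value)
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    latent_norm = math.sqrt(sum(v * v for v in z))
    return reconstruction + beta * latent_norm


if __name__ == "__main__":
    # Perfect reconstruction: only the latent-norm term contributes.
    print(regularized_autoencoder_loss([1.0, 2.0], [1.0, 2.0], [3.0, 4.0]))
```

Minimizing the norm term on normal training data pulls their representations toward the origin of the latent space, which matches the "black hole at the center" image in the title.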

PaperID: 1160,

Authors: Botai Yuan, Chen Gong, Dacheng Tao, Jie Yang

Affiliations: Department of Automation, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China; College of Computing and Data Science, Nanyang Technological University, Jurong West, Singapore

Title: Weighted Contrastive Learning With Hard Negative Mining for Positive and Unlabeled Learning

Abstract:
Positive and unlabeled (PU) learning aims to train a suitable classifier simply based on a set of positive data and unlabeled data. The state-of-the-art methods usually formulate PU learning as a cost-sensitive learning problem, in which every unlabeled example is treated as negative with modified class weights. However, existing methods fail to generate high-quality data representations, which brings about negative-prediction preference and performance decline. To overcome this problem, this article proposes a novel algorithm dubbed weighted contrastive learning with hard negative mining for positive and unlabeled learning (termed WConPU), which specifically designs a new prototypical contrastive strategy for gaining discriminative representations for PU learning. Specifically, our proposed WConPU consists of a contrastive learning (CL) module and a classifier training module, which can benefit from each other in an iterative manner. Moreover, a novel weighted contrastive objective function equipped with a prototype-based hard negative mining module is proposed to further enhance the representation quality. Theoretically, we show that our WConPU can be justified from the perspective of the expectation-maximization (EM) algorithm. Empirically, we compare our method with state-of-the-art PU algorithms on a wide range of real-world benchmark datasets, and the experimental results firmly demonstrate the advantage of our proposed method over the existing PU learning approaches.
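
The cost-sensitive formulation that the abstract attributes to prior PU methods (every unlabeled example treated as negative, with modified class weights) can be sketched as a weighted binary cross-entropy. This illustrates that baseline formulation only, not WConPU; the function name and the weight values are assumptions.

```python
import math


def weighted_pu_loss(probs, is_positive, w_pos=1.0, w_unlabeled=0.5):
    """Weighted binary cross-entropy with unlabeled examples treated as negatives.

    probs       : predicted probabilities of the positive class
    is_positive : True for labeled positives, False for unlabeled examples
    w_pos, w_unlabeled : assumed class weights (hypothetical values)
    """
    eps = 1e-12
    total = 0.0
    for p, positive in zip(probs, is_positive):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if positive:
            total += -w_pos * math.log(p)
        else:
            # Down-weighted negative term: some unlabeled examples are positives.
            total += -w_unlabeled * math.log(1.0 - p)
    return total / len(probs)


if __name__ == "__main__":
    print(weighted_pu_loss([0.9, 0.2, 0.6], [True, False, False]))
```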

PaperID: 1161,

Authors: Hao Ma, Yongkang Xu, Lixia Tian

Affiliations: Beijing Key Laboratory of Traffic Data Analysis and Mining and the School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China

Title: RS-MAE: Region-State Masked Autoencoder for Neuropsychiatric Disorder Classifications Based on Resting-State fMRI

Abstract:
Dynamic functional connectivity (DFC) extracted from resting-state functional magnetic resonance imaging (fMRI) has been widely used for neuropsychiatric disorder classifications. However, serious information redundancy within DFC matrices can significantly undermine the performance of classification models based on them. Moreover, traditional deep models cannot adapt well to connectivity-like data, and insufficient training samples further hinder their effective training. In this study, we proposed a novel region-state masked autoencoder (RS-MAE) for proficient representation learning based on DFC matrices and ultimately neuropsychiatric disorder classifications based on fMRI. Three strategies were taken to address the aforementioned limitations. First, masked autoencoder (MAE) was introduced to reduce redundancy within DFC matrices and learn effective representations of human brain function simultaneously. Second, region-state (RS) patch embedding was proposed to replace space-time patch embedding in video MAE to adapt to DFC matrices, in which only topological locality, rather than spatial locality, exists. Third, random state concatenation (RSC) was introduced as a DFC matrix augmentation approach, to alleviate the problem of training sample insufficiency. Neuropsychiatric disorder classifications were attained by fine-tuning the pretrained encoder included in RS-MAE. The performance of the proposed RS-MAE was evaluated on four publicly available datasets, achieving accuracies of 76.32%, 77.25%, 88.87%, and 76.53% for the attention deficit and hyperactivity disorder (ADHD), autism spectrum disorder (ASD), Alzheimer’s disease (AD), and schizophrenia (SCZ) classification tasks, respectively. These results demonstrate the efficacy of the RS-MAE as a proficient deep learning model for neuropsychiatric disorder classifications.

PaperID: 1162,

Authors: Xun Jiang, Xing Xu, Liqing Zhu, Zhe Sun, Andrzej Cichocki, Heng Tao Shen

Affiliations: Center for Future Media, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; Graduate School of Medicine, Faculty of Health Data Science, Juntendo University, Tokyo, Japan; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Title: Resisting Noise in Pseudo Labels: Audible Video Event Parsing With Evidential Learning

Abstract:
Perceiving temporal events and discriminating their modality types in audible videos, which is also called audio-visual video parsing (AVVP), is becoming a research hotspot in multimodal video understanding. The AVVP task generally follows weakly supervised learning settings, since only video-level labels are provided. Most existing works usually generate modalitywise pseudo labels (PLs) first and then learn to parse audio or visual events from the audible videos. However, this paradigm inevitably results in two defects: 1) the generated PLs for each modality are not fully reliable, which may confuse models if they are adopted as supervision signals for discriminating modalities; and 2) the absence of temporal annotations increases the ambiguities in localizing foregrounds in videos, further making models prone to being disturbed by noisy labels. To tackle these problems, we propose a novel AVVP framework termed noise-resistant event parsing (NREP), which introduces evidential deep learning (EDL) to overcome the limitations of noisy pseudo supervision. Specifically, our NREP framework consists of three key components: 1) modalitywise evidential learning (MEL) that discriminates the modality-class dependency; 2) temporalwise evidential learning (TEL) that explores meaningful foregrounds; and 3) foreground-background consistency learning (FBCL) for coordinating the two evidential learning branches above. Through perceiving meaningful video content and learning evidence for modality dependencies, our method suppresses the disturbance of noise in generated PLs, thus achieving remarkable performance with different PL generation strategies. We evaluate our NREP method on two AVVP benchmark datasets and demonstrate that it consistently establishes a new state of the art. Our implementation codes are available at https://github.com/CFM-MSG/NREP.

PaperID: 1163,

Authors: Dewen Qiao, Liangxin Qian, Songtao Guo, Jun Zhao, Pengzhan Zhou

Affiliations: Key Laboratory of Dependable Service Computing in Cyber Physical Society (Ministry of Education) and the College of Computer Science, Chongqing University, Chongqing, China; College of Computing and Data Science, Nanyang Technological University, Jurong West, Singapore

Title: AMFL: Resource-Efficient Adaptive Metaverse-Based Federated Learning for the Human-Centric Augmented Reality Applications

Abstract:
The emergence of 5G technology has enabled the development of Metaverse applications that provide users with immersive experiences through augmented reality (AR) devices, and the integration of federated learning (FL) with the Metaverse AR (MAR) systems can enable many edge intelligence services in 5G. However, the presence of nonindependent and identically distributed (Non-IID) data across all AR users’ devices, coupled with limited edge communication resources, makes it challenging to achieve human-centric Metaverse-related applications such as target detection or image classification that combine virtual content with the real world. To address these challenges, we propose a novel adaptive resource-efficient Metaverse-based FL (AMFL) algorithm for AR applications that mitigates the negative effect of Non-IID data and reduces resource costs as well as improves the quality of experience (QoE). We first analyze the impact of wireless communication factors such as CPU frequency, bandwidth, and transmission power on FL training performance by a toy example in the MAR systems. Based on this analysis, we further formulate a QoE maximization problem related to the Non-IID degree, model accuracy, and resource consumption under given resource budgets, which is a stochastic optimization problem with strongly coupled variables, including bandwidth, CPU frequency, and transmission power. To solve this problem, guided by the theoretical analysis, AMFL employs a deep reinforcement learning (DRL)-based method to adaptively allocate resources. Numerical results demonstrate that AMFL can significantly improve the QoE by up to 30.28%, and reduce communication rounds and energy costs by up to 81.08% and 72.20%, respectively, even under the worst Non-IID case, compared to benchmarks.

PaperID: 1164,

Authors: Maosheng Gao, Juan Yu, Salah Kamel, Zhifang Yang

Affiliations: State Key Laboratory of Power Transmission Equipment Technology, College of Electrical Engineering, Chongqing University, Chongqing, China; Department of Electrical Engineering, Faculty of Engineering, Aswan University, Aswan, Egypt

Title: A Trustable Data-Driven Optimal Power Flow Computational Method With Robust Generalization Ability

Abstract:
The data-driven optimal power flow (OPF) approach has been a research focus in recent years. However, the current data-driven OPF approaches face the following difficulties: 1) the data-driven solutions may have large deviations and are not trustable when facing out-of-distribution (OOD) samples and 2) it is hard to judge whether the solutions of the data-driven approach can be trusted. To handle these problems, this article first improves the generalization ability of the data-driven OPF method by embedding the inherent pattern of the OPF solution into the data-driven learning process. As an optimization problem, the OPF solution has certain fixed patterns that are not influenced by the distribution of samples. For example, the load balance constraints should always be satisfied. This leads to an inherent requirement of output vectors, which can be utilized to guide the learning process of the data-driven OPF method. Second, an adaptability judging method based on the decoder neural network is proposed to determine whether the data-driven OPF approach can produce trustable solutions. By measuring the decoding error from latent features to input features, the adaptability of neural networks for input samples can be accurately judged. According to extensive results on various systems, the proposed method can improve the calculation accuracy of OOD data by an average of 30.19% compared with state-of-the-art methods. With the adaptability judgment method, the accuracy of the data-driven approach can exceed 98% for OOD data, whereas the accuracy of other methods ranges from 34.08% to 94.50% on the same set of OOD test data.

PaperID: 1165,

Authors: Zhihao Wu, Chengliang Liu, Jie Wen, Yong Xu, Jian Yang, Xuelong Li

Affiliations: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; Department of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, Shaanxi, China

Title: Spatial Continuity and Nonequal Importance in Salient Object Detection With Image-Category Supervision

Abstract:
Due to the inefficiency of pixel-level annotations, weakly supervised salient object detection with image-category labels (WSSOD) has been receiving increasing attention. Previous works usually endeavor to generate high-quality pseudolabels to train the detectors in a fully supervised manner. However, we find that the detection performance is often limited by two types of noise contained in pseudolabels: 1) holes inside the object or at the edge and outliers in the background and 2) missing object portions and redundant surrounding regions. To mitigate the adverse effects caused by them, we propose local pixel correction (LPC) and key pixel attention (KPA), respectively, based on two key properties of desirable pseudolabels: 1) spatial continuity, meaning an object region consists of a cluster of adjacent points; and 2) nonequal importance, meaning pixels have different importance for training. Specifically, LPC fills holes and filters out outliers based on summary statistics of the neighborhood as well as its size. KPA directs the focus of training toward ambiguous pixels in multiple pseudolabels to discover more accurate saliency cues. To evaluate the effectiveness of our method, we design a simple yet strong baseline we call weakly supervised saliency detector with Transformer (WSSDT) and unify the proposed modules into WSSDT. Extensive experiments on five datasets demonstrate that our method significantly improves the baseline and outperforms all existing congeneric methods. Moreover, we establish the first benchmark to evaluate WSSOD robustness. The results show that our method can improve detection robustness as well. The code and robustness benchmark are available at https://github.com/Horatio9702/SCNI.
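
The hole-filling and outlier-filtering idea behind LPC can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `local_pixel_correction`, the fixed square window, and the two thresholds are assumptions chosen for illustration, and the only neighborhood statistic used here is the fraction of foreground neighbors.

```python
def local_pixel_correction(mask, radius=1, fill_thresh=0.75, drop_thresh=0.15):
    """Return a corrected copy of a binary pseudolabel mask (list of lists of 0/1).

    Hypothetical rule: a background pixel becomes foreground when most of its
    neighbors are foreground (hole filling); a foreground pixel becomes
    background when almost none of its neighbors are (outlier removal).
    """
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]  # decisions always read the original mask
    for y in range(h):
        for x in range(w):
            # Collect neighbors inside the image, excluding the pixel itself.
            vals = [mask[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))
                    if (j, i) != (y, x)]
            frac = sum(vals) / len(vals)  # fraction of foreground neighbors
            if mask[y][x] == 0 and frac >= fill_thresh:
                out[y][x] = 1
            elif mask[y][x] == 1 and frac <= drop_thresh:
                out[y][x] = 0
    return out


# Toy pseudolabel: a 3x3 object with a hole at (1, 1) and a stray pixel at (4, 4).
toy_mask = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
corrected = local_pixel_correction(toy_mask)
```

On this toy mask the hole inside the object is filled and the isolated background pixel is removed, while the object's border pixels are kept because enough of their neighbors are foreground.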

PaperID: 1166,

Authors: Xinzhi Wang, Yudong Chang, Luyao Kou, Xiangfeng Luo, Hui Zhang

Affiliations: School of Computer Engineering and Science, Shanghai University, Shanghai, China; National Academy of Governance, Emergency Management Research Institute, Beijing, China; School of Safety Science and the Institute of Public Safety Research, Tsinghua University, Beijing, China

Title: Public Behavior and Emotion Correlation Mining Driven by Aspect From News Corpus

Abstract:
Emotion motivates behavior. Investigating the correlation between behavior and emotion, an often overlooked perspective, plays a significant role in uncovering the underlying motives behind behaviors and the intrinsic cause-effects of social events. This article proposes a methodology for mining the correlation between public behavior and emotion using daily news data. Initially, aspect-emotion-reaction (A-E-R) triplets are extracted and generalized, encompassing both explicit and implicit patterns. Then, a knowledge representation model based on hypothetical context (KRHC) with a self-reflection mechanism is proposed to uncover implicit relationships between emotion and behavior through attention mechanisms. By combining rule-based methods for explicit relationships and deep learning for implicit ones, an understanding of emotion-behavior patterns is achieved. In this study, behaviors are divided into three categories (prosocial, antisocial, and normal) with ten secondary types, and seven categories of emotions are adopted. The proposed deep learning model KRHC is validated on A-E-R datasets and public KINSHIP datasets. The experimental results reveal such correlations; for example, when the emotions “fear,” “sad,” and “surprise” appear, the behavior “panic” is the most probable outcome. These findings could provide insights for both human-computer interaction and public safety management applications.

PaperID: 1167,

Authors: Ao Jin, Fan Zhang, Ganghui Shen, Bingxiao Huang, Panfeng Huang

Affiliations: Shaanxi Province Innovation Team of Intelligent Robotic Technology, Xi’an, China

Title: Learning-Based Modeling and Predictive Control for Unknown Nonlinear System With Stability Guarantees

Abstract:
This work focuses on the safety of learning-based control for unknown nonlinear systems, considering the stability of the learned dynamics and the modeling mismatch between the learned dynamics and the true dynamics. A learning-based scheme imposing a stability constraint is proposed for the modeling and stable control of unknown nonlinear systems. Specifically, a linear representation of the unknown nonlinear dynamics is established using Koopman theory. Then, a deep learning approach is utilized to approximate the embedding functions of the Koopman operator for the unknown system. For safe use of the proposed scheme in real-world applications, a stability constraint on the learned dynamics and a Lipschitz constraint on the embedding functions are imposed to learn a stable model for prediction and control. Moreover, a robust predictive control scheme is adopted to eliminate the effect of the modeling mismatch between the learned dynamics and the true dynamics, such that stabilization of the unknown nonlinear system is achieved. Finally, the effectiveness of the proposed scheme is demonstrated on a tethered space robot (TSR) with unknown nonlinear dynamics.

PaperID: 1168,

Authors: Zhijun Zhang, Zhongwen Cao, Xingru Li

Affiliations: School of Automation Science and Engineering, South China University of Technology, Guangzhou, China

Title: Neural Dynamic Fault-Tolerant Scheme for Collaborative Motion Planning of Dual-Redundant Robot Manipulators

Abstract:
To avoid task failure caused by joint breakdown during the collaborative motion planning of dual-redundant robot manipulators, a neural dynamic fault-tolerant (NDFT) scheme is proposed and applied. To do so, a joint fault-tolerant strategy is first designed and formulated as a time-varying equality constraint. Second, by combining the robot position and orientation control, the joint limit constraint, and the joint fault-tolerant equality constraint, and by considering the repetitive motion optimization criterion, a fault-tolerant framework for dual-redundant robot manipulators based on quadratic programming (QP) is constructed. Then, a varying-parameter recurrent neural network (VP-RNN) is designed to solve the QP problem. The fault-tolerant framework and the VP-RNN constitute the NDFT scheme. With the NDFT scheme, the impact of faulty joints on the whole system can be remedied by healthy joints, so that the end-effectors of the robot can complete the given end-effector task. Finally, computer simulations and physical experiments are implemented to verify the availability, physical realizability, and accuracy of the proposed NDFT scheme in the collaborative execution of end-effector tasks. Comparative experiments against conventional repetitive motion planning schemes based on neural networks show higher accuracy and smaller joint angle drift.

PaperID: 1169,

Authors: Fan Fan, Yilei Shi, Tobias Guggemos, Xiao Xiang Zhu

Affiliations: Chair of Data Science in Earth Observation (SiPEO), Technical University of Munich (TUM), Munich, Germany; School of Engineering and Design, TUM, Munich, Germany; IMF, DLR, Weßling, Germany; SiPEO, TUM, Munich, Germany

Title: Hybrid Quantum Deep Learning With Superpixel Encoding for Earth Observation Data Classification

Abstract:
Earth observation (EO) has inevitably entered the Big Data era. The computational challenge associated with analyzing large EO data using sophisticated deep learning models has become a significant bottleneck. To address this challenge, there has been a growing interest in exploring quantum computing as a potential solution. However, the process of encoding EO data into quantum states for analysis potentially undermines the efficiency advantages gained from quantum computing. This article introduces a hybrid quantum deep learning model that effectively encodes and analyzes EO data for classification tasks. The proposed model uses an efficient encoding approach called superpixel encoding, which reduces the quantum resources required for large image representation by incorporating the concept of superpixels. To validate the effectiveness of our model, we conducted evaluations on multiple EO benchmarks, including Overhead-MNIST, So2Sat LCZ42, and SAT-6 datasets. In addition, we studied the impacts of different interaction gates and measurements on classification performance to guide model optimization. The experimental results suggest the validity of our model for accurate classification of EO data. Our models and code are available on https://github.com/zhu-xlab/SEQNN.

PaperID: 1170,

Authors: Yuanyang Zhu, Zhi Wang, Yuanheng Zhu, Chunlin Chen, Dongbin Zhao

Affiliations: Department of Control Science and Intelligence Engineering, School of Management and Engineering, and the Laboratory of Data Intelligence and Interdisciplinary Innovation, Nanjing University, Nanjing, China; Department of Control Science and Intelligence Engineering, School of Management and Engineering, and the Research Center for Novel Technology of Intelligent Equipment, Nanjing University, Nanjing, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Discretizing Continuous Action Space With Unimodal Probability Distributions for On-Policy Reinforcement Learning

Abstract:
For on-policy reinforcement learning (RL), discretizing action space for continuous control can easily express multiple modes and is straightforward to optimize. However, without considering the inherent ordering between the discrete atomic actions, the explosion in the number of discrete actions can possess undesired properties and induce a higher variance for the policy gradient (PG) estimator. In this article, we introduce a straightforward architecture that addresses this issue by constraining the discrete policy to be unimodal using Poisson probability distributions. This unimodal architecture can better leverage the continuity in the underlying continuous action space using explicit unimodal probability distributions. We conduct extensive experiments to show that the discrete policy with the unimodal probability distribution provides significantly faster convergence and higher performance for on-policy RL algorithms in challenging control tasks, especially in highly complex tasks such as Humanoid. We provide theoretical analysis on the variance of the PG estimator, which suggests that our attentively designed unimodal discrete policy can retain a lower variance and yield a stable learning process.
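
As an illustration of how a Poisson distribution can make a discrete policy unimodal, the sketch below maps a single positive rate to probabilities over ordered action bins. It is an assumption-laden sketch rather than the authors' architecture: the names `poisson_unimodal_probs` and `bin_to_action`, the truncation to a fixed number of bins with renormalization, and the temperature parameter are all hypothetical.

```python
import math


def poisson_unimodal_probs(rate, num_bins, temperature=1.0):
    """Probabilities over bins 0..num_bins-1 from a truncated Poisson pmf.

    log p(k) = k*log(rate) - rate - log(k!) rises up to the mode and falls
    afterwards, so the renormalized probabilities have a single peak.
    """
    log_p = [(k * math.log(rate) - rate - math.lgamma(k + 1)) / temperature
             for k in range(num_bins)]
    top = max(log_p)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(v - top) for v in log_p]
    total = sum(exps)
    return [e / total for e in exps]


def bin_to_action(k, num_bins, low, high):
    """Map bin index k to an evenly spaced value in [low, high]."""
    return low + (high - low) * k / (num_bins - 1)


# Example: a (hypothetical) policy network output rate = 3.5 for one action dimension.
example_probs = poisson_unimodal_probs(3.5, num_bins=11)
```

Because the bins keep their natural order, neighboring actions receive similar probability mass, which is the continuity property the abstract refers to. In this sketch a policy network would only need to output one rate per action dimension.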

PaperID: 1171,

Authors: Junyi Liu, Dawei Cheng, Changjun Jiang

Affiliations: School of Computing, National University of Singapore, Queenstown, Singapore; Department of Computer Science and Technology, Tongji University, Shanghai, China

Title: Preferential Selective-Aware Graph Neural Network for Preventing Attacks in Interbank Credit Rating

Abstract:
Accurately assessing and forecasting bank credit ratings at an early stage is vitally important for a healthy financial environment and sustainable economic development. However, the evaluation process faces challenges due to individual attacks on the rating model. Some participants may provide manipulated information in an attempt to undermine the rating model and secure higher scores, further complicating the evaluation process. Therefore, we propose a novel approach called the preferential selective-aware graph neural network (PSAGNN) model to simultaneously defend against feature and structural nontarget poisoning attacks on Interbank credit ratings. In particular, the model establishes a phased optimization approach combined with biased perturbation and explores the Interbank preferences and scale-free nature of networks, to adaptively prioritize the poisoning training data and simulate a clean graph. Finally, we apply a weighted penalty on the opposition function to optimize the model so that the model can distinguish between attackers. Extensive experiments on our newly collected Interbank quarter dataset and case studies demonstrate the superior performance of our proposed approach in preventing credit rating attacks compared to state-of-the-art baselines.

PaperID: 1172,

Authors: Si-Sheng Young, Chia-Hsiang Lin, Zi-Chao Leng

Affiliations: Department of Electrical Engineering, Institute of Computer and Communication Engineering, National Cheng Kung University, Tainan, Taiwan; Department of Electrical Engineering, Miin Wu School of Computing, National Cheng Kung University, Tainan, Taiwan

Title: Unsupervised Abundance Matrix Reconstruction Transformer-Guided Fractional Attention Mechanism for Hyperspectral Anomaly Detection

Abstract:
Hyperspectral anomaly detection (HAD), a challenging inverse problem, has found numerous scientific applications. Although extant HAD algorithms have achieved remarkable results, several issues remain unresolved: 1) low spatial resolution (and spectral redundancy) in typical hyperspectral images prevents effectively distinguishing abnormal pixels from normal ones and 2) the reconstruction from existing residual-based frameworks would not completely remove anomaly effects, making detection solely from the residual impractical. In this article, we propose a novel HAD method, termed transformer-guided fractional attention within the abundance domain (TGFA-AD), which substitutes the raw input image with the abundance matrix obtained via blind source separation (BSS). First, the proposed abundance spatial-channel reconstruction transformer (ASCR-Former) is customized for rebuilding the abundance matrix. According to the image self-similarity, the abundance is encoded patch-wise with class (CLS) tokens. The transformer encoders intensify the spatial and channel characteristics between tokens for reconstructing the abundance, followed by deriving the initial detection from the abundance residual matrix. Second, a novel fractional abundance attention (FAA) mechanism is proposed, where the attention weights coming from a specific linear combination of abundances are guided by the initial detection with a convex Q-quadratic norm. Finally, the fractional convolution is incorporated to fuse the abundance and residual into the fractional feature for yielding the final detection result. Real data experiments quantitatively and qualitatively exhibit the state-of-the-art performance of TGFA-AD.

PaperID: 1173,

Authors: Yushi Li, George Baciu, Rong Chen, Chenhui Li, Hao Wang, Yushan Pan, Weiping Ding

Affiliations: School of Advanced Technologies, Xi’an Jiaotong-Liverpool University, Suzhou, China; Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; College of Information Science and Technology, Dalian Maritime University, Dalian, China; School of Computer Science and Technology, East China Normal University, Shanghai, China; School of Cyber Engineering, Xidian University, Xi’an, China; School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China

Title: DSANet: Dynamic and Structure-Aware GCN for Sparse and Incomplete Point Cloud Learning

Abstract:
Learning 3-D structures from incomplete point clouds with extreme sparsity and random distributions is a challenge since it is difficult to infer topological connectivity and structural details from fragmentary representations. Missing large portions of informative structures further aggravates this problem. To overcome this, a novel graph convolutional network (GCN) called dynamic and structure-aware NETwork (DSANet) is presented in this article. This framework is formulated based on a pyramidic auto-encoder (AE) architecture to address accurate structure reconstruction on the sparse and incomplete point clouds. A PointNet-like neural network is applied as the encoder to efficiently aggregate the global representations of coarse point clouds. On the decoder side, we design a dynamic graph learning module with a structure-aware attention (SAA) to take advantage of the topology relationships maintained in the dynamic latent graph. Relying on gradually unfolding the extracted representation into a sequence of graphs, DSANet is able to reconstruct complicated point clouds with rich and descriptive details. To associate analogous structure awareness with semantic estimation, we further propose a mechanism, called structure similarity assessment (SSA). This method allows our model to surmise semantic homogeneity in an unsupervised manner. Finally, we optimize the proposed model by minimizing a new distortion-aware objective end-to-end. Extensive qualitative and quantitative experiments demonstrate the impressive performance of our model in reconstructing unbroken 3-D shapes from deficient point clouds and preserving semantic relationships among different regional structures.

PaperID: 1174,

Authors: Jung-Min Yang, Chun-Kyung Lee, Kwang-Hyun Cho

Affiliations: School of Electronics Engineering, Kyungpook National University, Daegu, Republic of Korea; Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea

Title: Output Stabilizing Control of Complex Biological Networks Based on Boolean Algebra Analysis

Abstract:
Output stabilizing control of biological systems is of utmost importance in systems biology since key phenotypes of biological networks are often encoded by a small subset of their phenotypic marker nodes. This study addresses the challenge of output stabilizing control for complex biological systems modeled by Boolean networks (BNs). The objective is to identify a set of constant control inputs capable of driving the BN toward a desirable long-term behavior with respect to specified output nodes. Leveraging the algebraic properties of Boolean logic, we develop a novel control algorithm that reformulates the output stabilizing control problem into a simple graph theoretic problem involving auxiliary BNs, the scale of which significantly decreases compared to the original BN. The proposed method ensures superiority over previous results in terms of both the number of control inputs and computational loads, since it searches for the solution within the reduced BNs while retaining essential structures needed for output stabilization. The efficacy of the proposed control scheme is demonstrated through extensive numerical experiments with complex random BNs and real biological networks. To support the reproducible research initiative, detailed results of numerical experiments are provided in the supplementary material, and all the implementation codes are made accessible at https://github.com/choonlog/OutputStabilization.
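
To make the problem statement concrete, the sketch below checks by brute force whether a set of constant control inputs stabilizes the specified output nodes of a small synchronous BN. It is not the reduction-based algorithm proposed in the paper: the helper names (`step`, `attractor_from`, `outputs_stabilized`), the synchronous update assumption, and the three-node example network are hypothetical, and exhaustive enumeration is only feasible for very small networks.

```python
from itertools import product


def step(state, update_rules, control):
    """One synchronous update; controlled nodes are pinned to constant values."""
    nxt = tuple(rule(state) for rule in update_rules)
    return tuple(control.get(i, v) for i, v in enumerate(nxt))


def attractor_from(state, update_rules, control):
    """Follow the trajectory from `state` and return the cycle it falls into."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, update_rules, control)
    return trajectory[seen[state]:]


def outputs_stabilized(update_rules, control, target):
    """True if every attractor holds each output node at its target value."""
    n = len(update_rules)
    for init in product((0, 1), repeat=n):
        init = tuple(control.get(i, v) for i, v in enumerate(init))
        for s in attractor_from(init, update_rules, control):
            if any(s[i] != v for i, v in target.items()):
                return False
    return True


# Hypothetical 3-node network: x0' = x1, x1' = x0, x2' = x0 AND x1; output node is x2.
example_rules = [lambda s: s[1], lambda s: s[0], lambda s: s[0] & s[1]]
without_control = outputs_stabilized(example_rules, control={}, target={2: 1})
with_control = outputs_stabilized(example_rules, control={0: 1}, target={2: 1})
```

In this toy network, pinning node 0 to 1 drives node 1 and then the output node 2 to 1 from every initial state, whereas the uncontrolled network can remain in the all-zero fixed point.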

PaperID: 1175,

Authors: Shuyin Xia, Xiaoyu Lian, Guoyin Wang, Xinbo Gao, Jiancu Chen, Xiaoli Peng

Affiliations: Chongqing Key Laboratory of Computational Intelligence, Key Laboratory of Big Data Intelligent Computing, Key Laboratory of Cyberspace Big Data Intelligent Security, Ministry of Education, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: GBSVM: An Efficient and Robust Support Vector Machine Framework via Granular-Ball Computing

Abstract:
Granular-ball support vector machine (GBSVM) is a significant attempt to construct a classifier using the coarse-to-fine granularity of a granular ball as input, rather than a single data point. It is the first classifier whose input contains no points. However, the existing model has some errors, and its dual model has not been derived. As a result, the current algorithm cannot be implemented or applied. To address these problems, we fix the errors of the original model of the existing GBSVM and derive its dual model. Furthermore, a particle swarm optimization (PSO) algorithm is designed to solve the dual problem. A sequential minimal optimization (SMO) algorithm is also carefully designed to solve the dual problem. The latter is faster and more stable. The experimental results on the UCI benchmark datasets demonstrate that GBSVM is more robust and efficient. All code has been released in the open-source library available at http://www.cquptshuyinxia.com/GBSVM.html or https://github.com/syxiaa/GBSVM.

PaperID: 1176,

Authors: Yejiang Yang, Tao Wang, Weiming Xiang

Affiliations: School of Electrical Engineering, Southwest Jiaotong University, Chengdu, China; School of Computer and Cyber Sciences, Augusta University, Augusta, GA, USA

Title: A Distributed Neural Hybrid System Learning Framework in Modeling Complex Dynamical Systems

Abstract:
In this article, a distributed neural network modeling framework including a novel neural hybrid system model is proposed for enhancing the scalability of neural network models in modeling dynamical systems. First, high-dimensional training data samples are mapped to a low-dimensional feature space through the principal component analysis (PCA) featuring process. Following that, the feature space is bisected into multiple partitions based on the variation of the Shannon entropy under the maximum entropy (ME) bisecting process. The behavior of the subsystems in the prespecified state space partitions is then approximated using a group of shallow neural networks (SNNs) known as extreme learning machines (ELMs), and the model can be further simplified by merging redundant lattices based on their training error performance. The proposed modeling framework can handle high-dimensional dynamical system modeling problems with the advantages of reducing model complexity and improving model performance in training and verification. To demonstrate the effectiveness of the proposed modeling framework, examples of modeling the LASA dataset and an industrial robot are presented.
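
For readers unfamiliar with ELMs, the following sketch shows the general technique for a single shallow network: hidden-layer weights are drawn at random and kept fixed, and only the output weights are fitted by regularized least squares. It does not reproduce the paper's framework (no PCA, entropy-based bisecting, or lattice merging). The names `train_elm` and `solve_linear`, the tanh activation, the ridge term, and the toy data are assumptions made for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_elm(xs, ys, hidden=8, ridge=1e-3, seed=0):
    """Fit a single-output ELM and return it as a prediction function."""
    rng = random.Random(seed)
    dim = len(xs[0])
    # Random hidden layer: these weights and biases are never trained.
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]

    def features(x):
        acts = [math.tanh(sum(wi * xi for wi, xi in zip(wr, x)) + br)
                for wr, br in zip(w, b)]
        return acts + [1.0]  # constant feature acts as an output bias

    h = [features(x) for x in xs]
    k = hidden + 1
    # Ridge-regularized normal equations: (H^T H + ridge * I) beta = H^T y.
    gram = [[sum(r[i] * r[j] for r in h) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(h, ys)) for i in range(k)]
    beta = solve_linear(gram, rhs)
    return lambda x: sum(bi * fi for bi, fi in zip(beta, features(x)))


# Toy one-dimensional regression problem (hypothetical data).
toy_xs = [[i / 10] for i in range(-10, 11)]
toy_ys = [math.sin(3 * x[0]) for x in toy_xs]
toy_model = train_elm(toy_xs, toy_ys)
```

Because training reduces to one linear solve, fitting a separate small ELM for each partition of the feature space stays cheap. That low cost is a plausible reason for pairing ELMs with a partitioned model, although the abstract does not state the authors' motivation.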

PaperID: 1177,

Authors: Qiming Liu, Guangzhan Wang, Zhe Liu, Hesheng Wang

Affiliations: Department of Automation, Shanghai Jiao Tong University, Shanghai, China; School of Software, Shanghai Jiao Tong University, Shanghai, China; MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; Department of Automation, Key Laboratory of System Control and Information Processing of Ministry of Education, Key Laboratory of Marine Intelligent Equipment and System of Ministry of Education, Shanghai Engineering Research Center of Intelligent Control and Management, Shanghai Jiao Tong University, Shanghai, China

Title: Visuomotor Navigation for Embodied Robots With Spatial Memory and Semantic Reasoning Cognition

Abstract:
The fundamental prerequisite for embodied agents to make intelligent decisions lies in autonomous cognition. Typically, agents optimize decision-making by leveraging extensive spatiotemporal information from episodic memory. Concurrently, they utilize long-term experience for task reasoning and foster conscious behavioral tendencies. However, due to the significant disparities in the heterogeneous modalities of these two cognitive abilities, existing literature falls short in designing effective coupling mechanisms, thus failing to endow robots with comprehensive intelligence. This article introduces a navigation framework, the hierarchical topology-semantic cognitive navigation (HTSCN), which seamlessly integrates both memory and reasoning abilities within a singular end-to-end system. Specifically, we represent memory and reasoning abilities with a topological map and a semantic relation graph, respectively, within a unified dual-layer graph structure. Additionally, we incorporate a neural-based cognition extraction process to capture cross-modal relationships between hierarchical graphs. HTSCN forges a link between two different cognitive modalities, thus further enhancing decision-making performance and the overall level of intelligence. Experimental results demonstrate that in comparison to existing cognitive structures, HTSCN significantly enhances the performance and path efficiency of image-goal navigation. Visualization and interpretability experiments further corroborate the promoting role of memory, reasoning, as well as their online learned relationships, on intelligent behavioral patterns. Furthermore, we deploy HTSCN in real-world scenarios to further verify its feasibility and adaptability.

PaperID: 1178,

Authors: Luming Zhang, Guifeng Wang, Ming Chen, Ling Shao

Affiliations: Intelligent Manufacturing College, Jinhua University of Vocational Technology, Jinhua, China; Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates

Title: A UHD Aerial Photograph Categorization System by Learning a Noise-Tolerant Topology Kernel

Abstract:
With thousands of observation satellites orbiting the Earth, massive-scale ultrahigh-definition (UHD) images are captured daily, covering vast areas of land, often extending across millions of square kilometers. These images commonly feature a wide range of ground objects, such as vehicles and rooftops, numbering from tens to hundreds. The ability to categorize the diverse types of objects in UHD aerial photographs is essential for a variety of real-world applications, including intelligent transportation systems, disaster prediction, and precision agriculture. In this study, we introduce a novel framework for categorizing UHD aerial photographs. The core of our approach is to represent the spatial configurations of ground objects topologically and encode these layouts using a binary matrix factorization (MF) technique that robustly addresses the challenge of noisy image-level labels. Specifically, for each UHD aerial photograph, we identify visually and semantically important object patches. These patches are then connected spatially to form graphlets, small graphs that capture the layout and relations between adjacent objects. To enhance the understanding of these graphlets, we propose a binary MF approach that captures their semantic content. The method integrates four key components: 1) learning binary hash codes; 2) refining noisy labels; 3) incorporating deep image-level semantics; and 4) adaptively updating the data graph. The binary MF is solved iteratively, with each graphlet being transformed into a set of discrete hash codes. These hash codes, which represent the spatial and semantic information of the graphlets, are subsequently encoded into a feature vector using a kernel machine, enabling multilabel categorization of the aerial photographs. For validation, we compiled a large-scale dataset of UHD aerial photographs, sourced from 100 of the top-ranked cities worldwide. Experimental results demonstrate that: 1) our method excels in learning categorization models from imperfect labels and 2) the integration of the four proposed attributes enables effective encoding of the graphlets into hash codes, providing a powerful representation of the UHD aerial photographs.

PaperID: 1179,

Authors: Conghua Wang, Yuan Zhang, Jinde Cao, Zhichun Yang

Affiliations: National Center for Applied Mathematics, Chongqing Normal University, Chongqing, China; Department of Mathematics, Yuxi Normal University, Yuxi, China; School of Mathematics, Southeast University, Nanjing, China; School of Mathematical Sciences, Chongqing Normal University, Chongqing, China

Title: Oscillatory Dynamics and Regulatory Mechanisms of the p53-Per2 Network in DNA-Damaged Cells

Abstract:
Circadian rhythm disruptions are linked to increased cancer risk and unfavorable prognosis in patients with cancer, highlighting the critical role of the interplay between the circadian rhythm factor Per2 and the tumor suppressor p53. This brief presents, for the first time, a mathematical model to capture the dynamics of the p53–Per2 network in DNA-damaged cells. The model accurately describes the different stages of the process from unstressed cells to cellular repair and finally to apoptosis as the degree of DNA damage increases. Furthermore, it is found that increasing the inhibition of Per2 by p53 leads to the phase advance of Per2 oscillations, whereas by modulating the inhibition of Mdm2 by Per2, an independent amplitude modulation of active p53 can be achieved, with the range of modulation increasing with the strength of the inhibition. Moreover, the effects of time delays inherent in the transcription, translation, and nuclear translocation of Per2 on the circadian rhythm of DNA-damaged cells are quantitatively investigated by theoretical analyses. It is found that time delays can induce stable oscillations through a supercritical Hopf bifurcation, thereby maintaining the circadian function of DNA-damaged cells and enhancing their DNA-damage repair capacity. This study proposes new insights into cancer prevention and treatment strategies.

PaperID: 1180,

Authors: Yaoming Cai, Zijia Zhang, Xiaobo Liu, Yao Ding, Fei Li, Jinhua Tan

Affiliations: School of Information Engineering, Zhongnan University of Economics and Law, Wuhan, China; School of Artificial Intelligence, Hubei University, Wuhan, China; School of Automation, China University of Geosciences, Wuhan, China; Xi’an Research Institute of High Technology, Xi’an, Shaanxi, China

Title: Learning Unified Anchor Graph for Joint Clustering of Hyperspectral and LiDAR Data

Abstract:
The joint clustering of multimodal remote sensing (RS) data poses a critical and challenging task in Earth observation. Although recent advances in multiview subspace clustering have shown remarkable success, existing methods become computationally prohibitive when dealing with large-scale RS datasets. Moreover, they neglect intrinsic nonlinear and spatial interdependencies among heterogeneous RS data and lack generalization ability for out-of-sample data, thereby restricting their applicability. This article introduces a novel unified framework called anchor-based multiview kernel subspace clustering with spatial regularization (AMKSC). It learns a scalable anchor graph in the kernel space, leveraging contributions from each modality instead of seeking a consensus full graph in the feature space. To ensure spatial consistency, we incorporate a spatial smoothing operation into the formulation. The method is efficiently solved using an alternating optimization strategy, and we provide theoretical evidence of its scalability with linear computational complexity. Furthermore, an out-of-sample extension of AMKSC based on multiview collaborative representation-based classification is introduced, enabling the handling of larger datasets and unseen instances. Extensive experiments on three real heterogeneous RS datasets confirm the superiority of our proposed approach over state-of-the-art methods in terms of clustering performance and time efficiency. The source code is available at https://github.com/AngryCai/AMKSC.

PaperID: 1181,

Authors: Xuejie Que, Zhenlei Wang, Yanqi Zhang, Guanghao Su

Affiliations: State Key Laboratory of Industrial Control Technology and the Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China

Title: Two-Time Scale Tracking Control of Flexible Robots With Primal-Dual Inverse Reinforcement Learning

Abstract:
Flexible robots (FRs) are generally designed to be lightweight to achieve rapid motion. However, accompanying vibrations and modeling errors influence tracking control, especially in situations involving reference signal loss. This article develops a two-time scale primal-dual inverse reinforcement learning (PD-IRL) framework for FRs to perform tracking tasks with incomplete reference signals. First, consider the admissible policy as a nonconvex input constraint to guarantee the stable operation of the equipment. Then, FRs imitate the demonstration behaviors of an expert, including both rigid and flexible motions, to achieve a balance in tracking speed and vibration suppression. During the imitation process, nonconvex optimization problems of FRs are transformed into corresponding dual problems to obtain the global optimal policy. Moreover, employing multiple linearly independent paths to explore the state space simultaneously can improve convergence speed. Convergence and stability are studied rigorously. Finally, simulations and comparisons show the effectiveness and superiority of the proposed method.

PaperID: 1182,

Authors: Aurele Tohokantche Gnanha, Wenming Cao, Xudong Mao, Si Wu, Hau-San Wong, Qing Li

Affiliations: Huawei Noah’s Ark Laboratory, Hong Kong, SAR, China; School of Mathematics and Statistics, Chongqing Jiaotong University, Chongqing, China; School of Artificial Intelligence, Sun Yat-sen University, Guangzhou, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Department of Computer Science, City University of Hong Kong, Hong Kong, SAR, China; Department of Computing, The Hong Kong Polytechnic University, Hong Kong, SAR, China

Title: EviD-GAN: Improving GAN With an Infinite Set of Discriminators at Negligible Cost

Abstract:
Ensemble learning improves the capability of convolutional neural network (CNN)-based discriminators, whose performance is crucial to the quality of generated samples in generative adversarial networks (GANs). However, this learning strategy results in a significant increase in the number of parameters along with computational overhead. Meanwhile, the suitable number of discriminators required to enhance GAN performance is still being investigated. To mitigate these issues, we propose an evidential discriminator for GAN (EviD-GAN), with code available at https://github.com/Tohokantche/EviD-GAN, to learn both the model (epistemic) and data (aleatoric) uncertainties. Specifically, by analyzing three GAN models, we discover the relation between the distribution of the discriminator’s output and the generator performance, yielding a general formulation of the GAN framework. Building on this analysis, the evidential discriminator learns the degree of aleatoric and epistemic uncertainties by imposing a higher order distribution constraint over the likelihood as expressed in the discriminator’s output. This constraint can learn an ensemble of likelihood functions corresponding to an infinite set of discriminators. Thus, EviD-GAN aggregates knowledge through ensemble learning of the discriminator, which allows the generator to benefit from an informative gradient flow at a negligible computational cost. Furthermore, inspired by the gradient direction in maximum mean discrepancy (MMD)-repulsive GAN, we design an asymmetric regularization scheme for EviD-GAN. Unlike MMD-repulsive GAN, which operates at the distribution level, our regularization scheme is based on a pairwise loss function, operates at the sample level, and is characterized by an asymmetric behavior during the training of the generator and discriminator. Experimental results show that the proposed evidential discriminator is cost-effective, consistently improves GAN in terms of Frechet inception distance (FID) and inception score (IS), and performs better than other competing models that use multiple discriminators.

PaperID: 1183,

Authors: Zepeng Yan, Wen Sun, Wanli Guo, Biwen Li, Shiping Wen, Jinde Cao

Affiliations: Huangshi Key Laboratory of Metaverse and Virtual Simulation, School of Mathematics and Statistics, Hubei Normal University, Huangshi, Hubei, China; School of Mathematics and Physics, China University of Geosciences, Wuhan, China; Australian Artificial Intelligence Institute, University of Technology Sydney, Ultimo, NSW, Australia; School of Mathematics, Southeast University, Nanjing, China

Title: Complete Stability of Delayed Recurrent Neural Networks With New Wave-Type Activation Functions

Abstract:
Activation functions have a significant effect on the dynamics of neural networks (NNs). This study proposes new nonmonotonic wave-type activation functions and examines the complete stability of delayed recurrent NNs (DRNNs) with these activation functions. Using the geometrical properties of the wave-type activation function and a subsequent iteration scheme, sufficient conditions are provided to ensure that a DRNN with n neurons has exactly (2m + 3)^n equilibria, of which (m + 2)^n are locally exponentially stable and the remaining (2m + 3)^n - (m + 2)^n are unstable, where m is a positive integer related to the wave-type activation functions. Furthermore, the DRNN with the proposed activation function is completely stable. Compared with the previous literature, the total number of equilibria and the number of stable equilibria increase significantly, thereby enhancing the memory storage capacity of the DRNN. Finally, several examples are presented to demonstrate our proposed results.
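
As a quick arithmetic check of the equilibrium counts quoted in this abstract, the following minimal Python sketch evaluates the three expressions for small n and m. The function name `equilibria_counts` is hypothetical and introduced only for illustration; the code reproduces the stated formulas and does not verify the sufficient conditions under which they hold.

```python
def equilibria_counts(n: int, m: int) -> dict:
    """Evaluate the equilibrium counts quoted in the abstract.

    n: number of neurons in the DRNN.
    m: positive integer related to the wave-type activation function.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive integers")
    total = (2 * m + 3) ** n   # all equilibria
    stable = (m + 2) ** n      # locally exponentially stable equilibria
    return {"total": total, "stable": stable, "unstable": total - stable}


# Print the counts for a few small configurations.
for n, m in [(1, 1), (2, 1), (2, 2)]:
    print(n, m, equilibria_counts(n, m))
```

For example, with n = 2 and m = 1 the formulas give 25 equilibria in total, of which 9 are locally exponentially stable and 16 are unstable.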

PaperID: 1184,

Authors: Jiawei Li, Yuanfei Deng, Yixiu Qin, Shun Mao, Yuncheng Jiang

Affiliations: School of Computer Science, South China Normal University, Guangzhou, China

Title: Dual-Channel Adaptive Scale Hypergraph Encoders With Cross-View Contrastive Learning for Knowledge Tracing

Abstract:
Knowledge tracing (KT) refers to predicting learners’ future performance from their historical responses and has become an essential task in intelligent tutoring systems. Most deep learning-based methods model the learners’ knowledge states via recurrent neural networks (RNNs) or attention mechanisms. Recently emerging graph neural networks (GNNs) help KT models capture relationships such as question–skill and question–learner. However, non-pairwise and complex higher-order information among responses is ignored. In addition, a single-channel encoded hidden vector struggles to represent multigranularity knowledge states. To tackle the above problems, we propose a novel KT model named dual-channel adaptive scale hypergraph encoders with cross-view contrastive learning (HyperKT). Specifically, we design an adaptive scale hyperedge distillation component for generating knowledge-aware hyperedges and pattern-aware hyperedges that reflect non-pairwise higher-order features among responses. Then, we propose dual-channel hypergraph encoders to capture multigranularity knowledge states from global and local state hypergraphs. The encoders consist of a simplified hypergraph convolution network and a collaborative hypergraph convolution network. To enhance the supervisory signal in the state hypergraphs, we introduce a cross-view contrastive learning mechanism, which operates between state hypergraph views and their transformed line graph views. Extensive experiments on three real-world datasets demonstrate the superior performance of our HyperKT over the state-of-the-art (SOTA).

PaperID: 1185,

Authors: Yapeng Liu, Kun Zhou, Shouming Zhong, Kaibo Shi, Xuezhi Li

Affiliations: School of Mathematics and Information Science, Henan Normal University, Xinxiang, China; College of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou, China; School of Mathematics Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Electronic Information and Electrical Engineering, Chengdu University, Chengdu, China

Title: Parametric Stability Criteria for Delayed Recurrent Neural Networks via Flexible Delay-Dividing Method

Abstract:
This article investigates the stability of recurrent neural networks (RNNs) with interval time-varying delays (TVDs) based on a flexible delay-dividing method whose parameters are related to the delay derivative. First, the delay interval is separated into parametric subintervals via the linear combination technique. Then, a Lyapunov-Krasovskii functional (LKF) is constructed in connection with these parameters, and a novel linear technique is proposed to handle the integral terms in the derivative of the constructed functional. Finally, the validity and advantages of the derived criteria are demonstrated through comparisons on representative simulation examples.
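
The delay-dividing step can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each breakpoint is a convex (linear) combination of the lower and upper delay bounds with a dividing parameter in (0, 1). The function name `divide_delay_interval` and the parameter list `alphas` are hypothetical, and the abstract does not state how the parameters depend on the delay derivative, so that dependence is not modeled here.

```python
def divide_delay_interval(h_lower, h_upper, alphas):
    """Split [h_lower, h_upper] into subintervals at convex-combination points.

    alphas: strictly increasing dividing parameters, each in (0, 1).
    This is an assumed, illustrative parameterization, not the paper's.
    """
    if not h_lower < h_upper:
        raise ValueError("h_lower must be smaller than h_upper")
    if any(not 0.0 < a < 1.0 for a in alphas):
        raise ValueError("each dividing parameter must lie in (0, 1)")
    if any(a >= b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("dividing parameters must be strictly increasing")
    # Each breakpoint is a linear combination of the two delay bounds.
    points = [h_lower]
    points += [(1.0 - a) * h_lower + a * h_upper for a in alphas]
    points.append(h_upper)
    # Pair consecutive breakpoints into subintervals.
    return list(zip(points[:-1], points[1:]))


print(divide_delay_interval(0.0, 1.0, [0.25, 0.5]))
```

Changing the entries of `alphas` moves the breakpoints without changing the overall delay bounds, which is the sense in which such a division is flexible.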

PaperID: 1186,

Authors: Meng Xu, Xinhong Chen, Jianping Wang

Affiliations: Department of Computer Science, City University of Hong Kong, Hong Kong, SAR, China

Title: Policy Correction and State-Conditioned Action Evaluation for Few-Shot Lifelong Deep Reinforcement Learning

Abstract:
Lifelong deep reinforcement learning (DRL) approaches are commonly employed to adapt continuously to new tasks without forgetting previously acquired knowledge. While current lifelong DRL methods have shown promising advancements in retaining acquired knowledge, they suffer from significant adaptation efforts (i.e., longer training duration) and suboptimal policies when transferring to a new task that significantly deviates from previously learned tasks, a phenomenon known as the few-shot generalization challenge. In this work, we propose a generic approach that equips existing lifelong DRL methods with the capability of few-shot generalization. First, we employ selective experience reuse by leveraging the experience of encountered states, improving adaptation training for new tasks. Then, a relaxed softmax function is applied to the target Q values to improve the accuracy of evaluated Q values, leading to better policies. Finally, we measure and reduce the discrepancy in data distribution between the policy and off-policy samples, resulting in improved adaptation efficiency. Extensive experiments have been conducted on three typical benchmarks to compare our approach with six representative lifelong DRL methods and two state-of-the-art (SOTA) few-shot DRL methods regarding their training speed, episode return, and average return of all episodes. Experimental results substantiate that our method improves the return of six lifelong DRL methods by at least 25%.
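
The abstract does not define the relaxed softmax, so the sketch below illustrates only the general idea of replacing a hard maximum over target Q values with a temperature-scaled, softmax-weighted average. The function name `softmax_target`, the temperature `tau`, and the discount `gamma` are assumptions made for illustration and should not be read as the authors' formulation.

```python
import math


def softmax_target(q_next, reward, gamma=0.99, tau=1.0, done=False):
    """One-step target using a softmax-weighted average of next-state Q values.

    q_next: Q values of all actions in the next state (non-empty sequence).
    tau:    temperature (assumed hyperparameter); small tau approaches the
            hard max, large tau approaches the plain mean.
    """
    if done:
        return reward  # no bootstrapping from a terminal state
    q_max = max(q_next)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp((q - q_max) / tau) for q in q_next]
    norm = sum(weights)
    soft_value = sum(w * q for w, q in zip(weights, q_next)) / norm
    return reward + gamma * soft_value


q_values = [1.0, 2.0, 3.0]
print(softmax_target(q_values, reward=0.0, gamma=1.0, tau=1e-3))  # near the max
print(softmax_target(q_values, reward=0.0, gamma=1.0, tau=1e6))   # near the mean
```

A softmax-weighted target of this kind always lies between the mean and the maximum of the next-state Q values, which is one common way to temper the overestimation that a hard maximum can introduce.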

PaperID: 1187,

Authors: Brent A. Wallace, Jennie Si

Affiliations: Department of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA

Title: Continuous-Time Reinforcement Learning: New Design Algorithms With Theoretical Insights and Performance Guarantees

Abstract:
Continuous-time reinforcement learning (CT-RL) methods hold great promise in real-world applications. Adaptive dynamic programming (ADP)-based CT-RL algorithms, especially their theoretical developments, have achieved great success. However, these methods have not been demonstrated for solving realistic or meaningful learning control problems. Thus, the goal of this work is to introduce a suite of new excitable integral reinforcement learning (EIRL) algorithms for control of CT affine nonlinear systems. This work develops a new excitation framework to improve persistence of excitation (PE) and numerical performance via input/output insights from classical control. Furthermore, when the system dynamics afford a physically motivated partition into distinct dynamical loops, the proposed methods break the control problem into smaller subproblems, resulting in reduced complexity. By leveraging the known affine nonlinear dynamics, the methods achieve well-behaved system responses and considerable data efficiency. The work provides convergence, solution optimality, and closed-loop stability guarantees of the proposed methods, and it demonstrates these guarantees on a significant application problem of controlling an unstable, nonminimum phase hypersonic vehicle (HSV).

PaperID: 1188,

Authors: Zongyan Han, Zhenyong Fu, Shuo Chen, Le Hui, Guangyu Li, Jian Yang, Chang Wen Chen

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China; Department of Computing, The Hong Kong Polytechnic University, Hong Kong, SAR, China

Title: ZS-VAT: Learning Unbiased Attribute Knowledge for Zero-Shot Recognition Through Visual Attribute Transformer

Abstract:
In zero-shot learning (ZSL), attribute knowledge plays a vital role in transferring knowledge from seen classes to unseen classes. However, most existing ZSL methods learn biased attribute knowledge, which usually results in biased attribute prediction and a decline in zero-shot recognition performance. To solve this problem and learn unbiased attribute knowledge, we propose a visual attribute Transformer for zero-shot recognition (ZS-VAT), which is an effective and interpretable Transformer designed specifically for ZSL. In ZS-VAT, we design an attribute-head self-attention (AHSA) that is capable of learning unbiased attribute knowledge. Specifically, each attribute head in AHSA first transforms the local features into attribute-reinforced features and then accumulates the attribute knowledge from all corresponding reinforced features, reducing the mutual influence between attributes and avoiding information loss. AHSA finally preserves unbiased attribute knowledge through attribute embeddings. We also propose an attribute fusion model (AFM) that learns to recover the correct category knowledge from the attribute knowledge. In particular, AFM takes all features from AHSA as input and generates global embeddings. We carried out experiments to demonstrate that the attribute knowledge from AHSA and the category knowledge from AFM are able to assist each other. During the final semantic prediction, we combine the attribute embedding prediction (AEP) and global embedding prediction (GEP). We evaluated the proposed scheme on three benchmark datasets. ZS-VAT outperformed the state-of-the-art generalized ZSL (GZSL) methods on two datasets and achieved competitive results on the other dataset.

PaperID: 1189,

Authors: Meng Pang, Binghui Wang, Mang Ye, Yiu-Ming Cheung, Yintao Zhou, Wei Huang, Bihan Wen

Affiliations: School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China; Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA; School of Computer Science, Wuhan University, Wuhan, China; Department of Computer Science, Hong Kong Baptist University, Hong Kong, China; School of Electrical and Electronic Engineering, Nanyang Technological University, Jurong West, Singapore

Title: Heterogeneous Prototype Learning From Contaminated Faces Across Domains via Disentangling Latent Factors

Abstract:
This article studies an emerging practical problem called heterogeneous prototype learning (HPL). Unlike the conventional heterogeneous face synthesis (HFS) problem that focuses on precisely translating a face image from a source domain to another target one without removing facial variations, HPL aims at learning the variation-free prototype of an image in the target domain while preserving the identity characteristics. HPL is a compounded problem involving two cross-coupled subproblems, that is, domain transfer and prototype learning (PL), thus making most of the existing HFS methods that simply transfer the domain style of images unsuitable for HPL. To tackle HPL, we advocate disentangling the prototype and domain factors in their respective latent feature spaces and then replacing the source domain with the target one for generating a new heterogeneous prototype. In doing so, the two subproblems in HPL can be solved jointly in a unified manner. Based on this, we propose a disentangled HPL framework, dubbed DisHPL, which is composed of one encoder–decoder generator and two discriminators. The generator and discriminators play adversarial games such that the generator embeds contaminated images into a prototype feature space only capturing identity information and a domain-specific feature space, while generating realistic-looking heterogeneous prototypes. Experiments on various heterogeneous datasets with diverse variations validate the superiority of DisHPL.

PaperID: 1190,

Authors: Yuanduo Hong, Huihui Pan, Yisong Jia, Weichao Sun, Huijun Gao

Affiliations: Research Institute of Intelligent Control and Systems, Harbin Institute of Technology, Harbin, China

Title: ResDNet: Efficient Dense Multi-Scale Representations With Residual Learning for High-Level Vision Tasks

Abstract:
Deep feature fusion plays a significant role in the strong learning ability of convolutional neural networks (CNNs) for computer vision tasks. Recent works continue to demonstrate the advantages of efficient aggregation strategies, some of which rely on multiscale representations. In this article, we describe a novel network architecture for high-level computer vision tasks where densely connected feature fusion provides multiscale representations for the residual network. We term our method ResDNet, a simple and efficient backbone made up of sequential ResDNet modules containing variants of dense blocks named sliding dense blocks (SDBs). Compared with DenseNet, ResDNet enhances the feature fusion and reduces the redundancy by using shallower densely connected architectures. Experimental results on three classification benchmarks, including CIFAR-10, CIFAR-100, and ImageNet, demonstrate the effectiveness of ResDNet. ResDNet always outperforms DenseNet using much less computation on CIFAR-100. On ImageNet, ResDNet-B-129 achieves 1.94% and 0.89% top-1 accuracy improvement over ResNet-50 and DenseNet-201 with similar complexity. Besides, ResDNet with more than 1000 layers achieves remarkable accuracy on CIFAR compared with other state-of-the-art results. Based on the MMDetection implementation of RetinaNet, ResDNet-B-129 improves mAP from 36.3 to 39.5 compared with ResNet-50 on the COCO dataset.

PaperID: 1191,

Authors: Wujie Zhou, Qinling Guo, Jingsheng Lei, Lu Yu, Jenq-Neng Hwang

Affiliations: School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China; Institute of Information and Communication Engineering, Zhejiang University, Hangzhou, China; Department of Electrical Engineering, University of Washington, Seattle, WA, USA

Title: IRFR-Net: Interactive Recursive Feature-Reshaping Network for Detecting Salient Objects in RGB-D Images

Abstract:
Using attention mechanisms in saliency detection networks enables effective feature extraction, and using linear methods can promote proper feature fusion, as verified in numerous existing models. Current networks usually combine depth maps with red–green–blue (RGB) images for salient object detection (SOD). However, fully leveraging depth information complementary to RGB information by accurately highlighting salient objects deserves further study. We combine a gated attention mechanism and a linear fusion method to construct a dual-stream interactive recursive feature-reshaping network (IRFR-Net). The streams for RGB and depth data communicate through a backbone encoder to thoroughly extract complementary information. First, we design a context extraction module (CEM) to obtain low-level depth foreground information. Subsequently, the gated attention fusion module (GAFM) is applied to the RGB depth (RGB-D) information to obtain advantageous structural and spatial fusion features. Then, adjacent depth information is globally integrated to obtain complementary context features. We also introduce a weighted atrous spatial pyramid pooling (WASPP) module to extract the multiscale local information of depth features. Finally, global and local features are fused in a bottom-up scheme to effectively highlight salient objects. Comprehensive experiments on eight representative datasets demonstrate that the proposed IRFR-Net outperforms 11 state-of-the-art (SOTA) RGB-D approaches in various evaluation indicators.

PaperID: 1192,

Authors: Jianjian Cao, Xiameng Qin, Sanyuan Zhao, Jianbing Shen

Affiliations: Department of Computer Science, Beijing Institute of Technology, Beijing, China; Baidu Inc., Beijing, China; Department of Computer and Information Science, State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China

Title: Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering

Abstract:
Answering semantically complicated questions about an image is challenging in the visual question answering (VQA) task. Although the image can be well represented by deep learning, the question is usually embedded in a simple way that cannot fully convey its meaning. Besides, because the visual and textual features come from different modalities, there is a gap between them that makes it difficult to align and utilize the cross-modality information. In this article, we focus on these two problems and propose a graph matching attention (GMA) network. First, it builds a graph not only for the image but also for the question, using both syntactic and embedding information. Next, we explore the intramodality relationships by a dual-stage graph encoder and then present a bilateral cross-modality GMA to infer the relationships between the image and the question. The updated cross-modality features are then sent into the answer prediction module for final answer prediction. Experiments demonstrate that our network achieves state-of-the-art performance on the GQA dataset and the VQA 2.0 dataset. The ablation studies verify the effectiveness of each module in our GMA network.

PaperID: 1193,

Authors: Bowen Pang, Liyi Huang, Qingsong Li, Wei Wei

Affiliations: School of Mathematical Sciences, Beihang University, Beijing, China; Key Laboratory of Mathematics Informatics Behavioral Semantics, Ministry of Education, Beijing, China

Title: A Continuous Volatility Forecasting Model Based on Neural Differential Equations and Scale-Similarity

Abstract:
Volatility forecasting is a problem in finance that attracts the attention of both academia and industry. Existing approaches typically use a discrete-time latent process that governs the volatility to forecast its future level. However, volatility is considered to evolve continuously, so discrete-time modeling inevitably loses some critical information about its evolution. In this article, a novel neural-network-based model, the continuous volatility forecasting model (CVFM), is proposed to tackle this problem. First, CVFM introduces a continuous-time latent process, whose evolution is modeled with neural differential equations (NDEs), to govern volatility, which effectively captures the continuous evolutionary behavior of volatility in a data-driven way. Second, a scale-similarity-based mechanism is designed to calibrate the evolution equation of the latent process with real-world observations in the absence of high-frequency data. CVFM is tested on six real-world stock index datasets. The main experimental results show that CVFM can significantly outperform existing models in terms of both forecasting accuracy and high-volatility recognition.

PaperID: 1194,

Authors: Mingkun Xu, Faqiang Liu, Yifan Hu, Hongyi Li, Yuanyuan Wei, Shuai Zhong, Jing Pei, Lei Deng

Affiliations: Guangdong Institute of Intelligence Science and Technology, Zhuhai, China; Department of Precision Instrument, Tsinghua University, Center for Brain-Inspired Computing Research, the Optical Memory National Engineering Research Center, Tsinghua University-China Electronics Technology HIK Group Corporation Joint Research Center for Brain-Inspired Computing, the IDG/McGovern Institute for Brain Research, Beijing, China; Department of Biomedical Engineering, The Chinese University of Hong Kong, Hong Kong, SAR, China

Title: Adaptive Synaptic Scaling in Spiking Networks for Continual Learning and Enhanced Robustness

Abstract:
Synaptic plasticity plays a critical role in the expressive power of brain neural networks. Among diverse plasticity rules, synaptic scaling has an indispensable effect on homeostasis maintenance and synaptic strength regulation. In the current modeling of brain-inspired spiking neural networks (SNNs), backpropagation through time is widely adopted because it can achieve high performance using a small number of time steps. Nevertheless, the synaptic scaling mechanism has not yet been well explored. In this work, we propose an experience-dependent adaptive synaptic scaling mechanism (AS-SNN) for SNNs. The learning process has two stages: First, in the forward path, adaptive short-term potentiation or depression is triggered for each synapse according to afferent stimuli intensity accumulated by presynaptic historical neural activities. Second, in the backward path, long-term consolidation is executed through gradient signals regulated by the corresponding scaling factor. This mechanism shapes the pattern selectivity of synapses and the information transfer they mediate. We theoretically prove that the proposed adaptive synaptic scaling function follows a contraction map and finally converges to an expected fixed point, and the method achieves state-of-the-art results on three tasks: perturbation resistance, continual learning, and graph learning. Specifically, for the perturbation resistance and continual learning tasks, our approach improves the accuracy on the N-MNIST benchmark over the baseline by 44% and 25%, respectively. An expected firing rate callback and sparse coding can be observed in graph learning. Extensive ablation and cost-evaluation experiments demonstrate the effectiveness and efficiency of our nonparametric adaptive scaling method, which shows the great potential of SNNs in continual learning and robust learning.
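
The convergence claim rests on the standard fact that repeatedly applying a contraction map drives any starting value to a unique fixed point. The following minimal Python sketch illustrates that fact with a toy update rule; `toy_scaling`, its `rate` and `target` parameters, and `iterate_to_fixed_point` are hypothetical names, and the toy rule is not the adaptive synaptic scaling function proposed in the paper.

```python
def iterate_to_fixed_point(f, s0, tol=1e-10, max_iter=10_000):
    """Apply f repeatedly until successive values differ by less than tol."""
    s = s0
    for step in range(1, max_iter + 1):
        s_next = f(s)
        if abs(s_next - s) < tol:
            return s_next, step
        s = s_next
    raise RuntimeError("iteration did not converge within max_iter steps")


def toy_scaling(s, rate=0.5, target=1.0):
    """Toy contraction: move s a fraction `rate` of the way toward `target`.

    For 0 < rate < 2 the map has Lipschitz constant |1 - rate| < 1,
    so it is a contraction whose unique fixed point is `target`.
    """
    return s + rate * (target - s)


for start in (5.0, -3.0, 0.2):
    value, steps = iterate_to_fixed_point(toy_scaling, start)
    print(start, round(value, 6), steps)
```

Every starting value ends at the same fixed point, which is the behavior a contraction-map argument guarantees for a scaling factor regardless of its initialization.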

PaperID: 1195,

Authors: Wen-Shuai Hu, Wei Li, Heng-Chao Li, Feng-Hua Huang, Ran Tao

Affiliations: School of Information and Electronics, Beijing Institute of Technology, Beijing, China; School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China; College of Artificial Intelligence, Yango University, Fuzhou, China

Title: Global Clue-Guided Cross-Memory Quaternion Transformer Network for Multisource Remote Sensing Data Classification

Abstract:
Multisource remote sensing data classification is a challenging research topic, and how to address the inherent heterogeneity between multimodal data while exploring their complementarity is crucial. Existing deep learning models usually directly adopt feature-level fusion designs, most of which, however, fail to overcome the impact of heterogeneity, limiting their performance. As such, a multimodal joint classification framework, called global clue-guided cross-memory quaternion transformer network (GCCQTNet), is proposed for multisource data [i.e., hyperspectral image (HSI) and synthetic aperture radar (SAR)/light detection and ranging (LiDAR)] classification. First, a three-branch structure is built to extract the local and global features, where an independent squeeze-expansion-like fusion (ISEF) structure is designed to update the local and global representations by considering the global information as an agent, suppressing the negative impact of multimodal heterogeneity layer by layer. A cross-memory quaternion transformer (CMQT) structure is further constructed to model the complex inner relationships between the intramodality and intermodality features to capture more discriminative fusion features that fully characterize multimodal complementarity. Finally, a cross-modality comparative learning (CMCL) structure is developed to impose the consistency constraint on global information learning, which, in conjunction with a classification head, is used to guide the end-to-end training of GCCQTNet. Extensive experiments on three public multisource remote sensing datasets illustrate the superiority of our GCCQTNet over other state-of-the-art methods.

PaperID: 1196,

Authors: Jing Ping, Song Zhu, Weiwei Luo, Zhen Zhang

Affiliations: School of Mathematics, JCAM, China University of Mining and Technology, Xuzhou, China

Title: Hyper-Exponential Stabilization of Neural Networks by Event-Triggered Impulsive Control With Actuation Delay

Abstract:
This brief studies the hyper-exponential stabilization of neural networks (NNs) by event-triggered impulsive control, where the impulse instants are determined by the event-triggered conditions. In the presence of actuation delay, an event-triggered impulsive control scheme is devised. To reduce the sampling burden of continuous detection, a periodic-detection scheme is also introduced. Within these frameworks, the occurrence of Zeno behavior is rigorously precluded, and some criteria are formulated to achieve the stabilization of the system with a hyper-exponential convergence rate. Moreover, a numerical simulation is provided to illustrate the validity of the theoretical findings.

PaperID: 1197,

Authors: Qinge Xiao, Ben Niu, Ying Tan, Zhile Yang, Xingzheng Chen

Affiliations: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; College of Management, Shenzhen University, Shenzhen, China; Department of Machine Intelligence, School of Electronics Engineering and Computer Science, Key Laboratory of Machine Perception, Ministry of Education, Peking University, Beijing, China; College of Engineering and Technology, Southwest University, Chongqing, China

Title: Generative Upper-Level Policy Imitation Learning With Pareto-Improvement for Energy-Efficient Advanced Machining Systems

Abstract:
The potential intelligence behind advanced machining systems (AMSs) offers positive contributions toward process improvement. Imitation learning (IL) offers an appealing approach to accessing this intelligence by observing demonstrations from skilled technologists. However, existing IL algorithms that implement single policy strategies have yet to consider realistic scenarios for complex AMS tasks, where the available demonstrations may have come from various experts. Moreover, most IL assumes that the expert’s policy is optimal, preventing the learning from fulfilling the previously ignored green missions. This article introduces a novel three-phase policy search algorithm based on IL, enabling the learning of heterogeneous expert policies while balancing energy preferences. The first phase equips the agent with machining basics through upper-level policy learning, generating an imitation policy distribution with various decision-making principles. The second phase enhances energy conservation capabilities by employing Pareto-improvement learning and fine-tuning the agent’s policies to a Pareto-policy manifold. The third phase produces outcomes and amplifies the efficacy of human feedback by utilizing ensemble policies. The experimental results indicate that the proposed method outperforms meta-heuristics, exhibiting superior solution quality and faster computation times compared to four diverse baseline methods, each with diverse samples.

PaperID: 1198,

Authors: Yang Shi, Chenling Ding, Shuai Li, Bin Li, Xiaobing Sun

Affiliations: School of Information Engineering and Jiangsu Province Engineering Research Center of Knowledge Management and Intelligent Service, Yangzhou University, Yangzhou, China; Faculty of Information Technology and Electrical Engineering, University of Oulu, Oulu, Finland

Title: New RNN Algorithms for Different Time-Variant Matrix Inequalities Solving Under Discrete-Time Framework

Abstract:
Solving a series of discrete time-variant matrix inequalities is generally regarded as one of the challenging problems in science and engineering. Although the problem is discrete and time-variant, existing solving schemes generally rely on theoretical support from the continuous-time framework, and there is no independent solving scheme under the discrete-time framework. This theoretical deficiency greatly limits the theoretical research and practical application of discrete time-variant matrix inequalities. In this article, new discrete-time recurrent neural network (RNN) algorithms are proposed, analyzed, and investigated for solving different time-variant matrix inequalities under the discrete-time framework, including discrete time-variant matrix vector inequality (discrete time-variant MVI), discrete time-variant generalized matrix inequality (discrete time-variant GMI), discrete time-variant generalized-Sylvester matrix inequality (discrete time-variant GSMI), and discrete time-variant complicated-Sylvester matrix inequality (discrete time-variant CSMI), and all solving processes are based on the idea of direct discretization. Specifically, four discrete time-variant matrix inequalities are first presented as the target problems of this research. Second, for solving such problems, we propose corresponding discrete-time RNN (DT-RNN) algorithms (termed DT-RNN-MVI algorithm, DT-RNN-GMI algorithm, DT-RNN-GSMI algorithm, and DT-RNN-CSMI algorithm), which differ from the traditional DT-RNN design idea because second-order Taylor expansion is applied to derive the DT-RNN algorithms. This process avoids any reliance on the continuous-time framework. Then, theoretical analyses are presented, which show the convergence and precision of the DT-RNN algorithms. Extensive numerical experiments further confirm the excellent properties of the DT-RNN algorithms.

PaperID: 1199,

Authors: Lei Yan, Junhe Liu, Guanyu Lai, C. L. Philip Chen, Zongze Wu, Zhi Liu

Affiliations: School of Automation, Guangdong University of Technology, Guangzhou, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China

Title: Adaptive Critic Learning-Based Optimal Bipartite Consensus for Multiagent Systems With Prescribed Performance

Abstract:
Developing a distributed bipartite optimal consensus scheme while ensuring user-predefined performance is essential in practical applications. Existing approaches to this problem typically require a complex controller structure because they adopt an identifier–actor–critic framework, and they cannot guarantee prescribed performance. In this work, an adaptive critic learning (ACL)-based optimal bipartite consensus scheme is developed to bridge the gap. A newly designed error scaling function, which defines the user-predefined settling time and steady accuracy without relying on the initial conditions, is integrated into a cost function. The backstepping framework combines the ACL and integral reinforcement learning (IRL) algorithm to develop the adaptive optimal bipartite consensus scheme, which yields a critic-only controller structure by removing the identifier and actor networks used in the existing methods. The adaptive law of the critic network is derived by the gradient descent algorithm and experience replay to minimize the IRL-based residual error. It is shown that a compute-saving learning mechanism can achieve the optimal consensus, and the error variables of the closed-loop system are uniformly ultimately bounded (UUB). Moreover, for any bounded initial condition, the evolution of bipartite consensus is confined to a user-prescribed boundary. The illustrative simulation results validate the efficacy of the approach.

PaperID: 1200,

Authors: Chong Yu, Shuaiqi Shen, Shiqiang Wang, Kuan Zhang, Hai Zhao

Affiliations: Department of Computer Science, University of Cincinnati, Cincinnati, OH, USA; Department of Electrical Engineering, University of Wisconsin–Milwaukee, Milwaukee, WI, USA; IBM T. J. Watson Research Center, Yorktown Heights, NY, USA; Department of Electrical and Computer Engineering, University of Nebraska–Lincoln, Lincoln, NE, USA; Department of Computer Science and Engineering, Northeastern University, Shenyang, China

Title: Communication-Efficient Hybrid Federated Learning for E-Health With Horizontal and Vertical Data Partitioning

Abstract:
Electronic healthcare (e-health) allows smart devices and medical institutions to collaboratively collect patients’ data, on which artificial intelligence (AI) models are trained to help doctors make diagnoses. By allowing multiple devices to train models collaboratively, federated learning is a promising solution to address the communication and privacy issues in e-health. However, applying federated learning in e-health faces many challenges. First, medical data are both horizontally and vertically partitioned. Since horizontal federated learning (HFL) or vertical federated learning (VFL) techniques alone cannot deal with both types of data partitioning, directly applying them may incur excessive communication cost because part of the raw data must be transmitted when high modeling accuracy is required. Second, a naive combination of HFL and VFL has limitations including low training efficiency, unsound convergence analysis, and lack of parameter tuning strategies. In this article, we provide a thorough study on an effective integration of HFL and VFL, to achieve communication efficiency and overcome the above limitations when data are both horizontally and vertically partitioned. Specifically, we propose a hybrid federated learning framework with one intermediate result exchange and two aggregation phases. Based on this framework, we develop a hybrid stochastic gradient descent (HSGD) algorithm to train models. Then, we theoretically analyze the convergence upper bound of the proposed algorithm. Using the convergence results, we design adaptive strategies to adjust the training parameters and shrink the size of transmitted data. The experimental results validate that the proposed HSGD algorithm can achieve the desired accuracy while reducing communication cost, and they also verify the effectiveness of the adaptive strategies.

PaperID: 1201,

Authors: Zhengxin Li, Feiping Nie, Rong Wang, Xuelong Li

Affiliations: College of Equipment Management and UAV Engineering, Air Force Engineering University, Xi’an, Shaanxi, China; School of Computer Science and School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, Shaanxi, China

Title: A Revised Formation of Trace Ratio LDA for Small Sample Size Problem

Abstract:
Linear discriminant analysis (LDA) is a classic tool for supervised dimensionality reduction. Because the projected samples can be classified effectively, LDA has been successfully applied in many applications. Among the variants of LDA, trace ratio LDA (TR-LDA) is a classic form due to its explicit meaning. Unfortunately, when the sample size is much smaller than the data dimension, the algorithm for solving TR-LDA does not converge. The so-called small sample size (SSS) problem severely limits the application of TR-LDA. To solve this problem, we propose a revised formation of TR-LDA, which can be applied to datasets with different sizes in a unified form. Then, we present an optimization algorithm to solve the proposed method, explain why it can avoid the SSS problem, and analyze the convergence and computational complexity of the optimization algorithm. Next, based on the introduced theorems, we quantitatively elaborate on when the SSS problem will occur in TR-LDA. Finally, the experimental results on real-world datasets demonstrate the effectiveness of the proposed method.
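For orientation, the classic trace ratio objective mentioned above, max over orthonormal W of tr(WᵀS_bW) / tr(WᵀS_wW), is commonly solved by an iterative eigen-decomposition scheme. The following is a minimal sketch of that textbook iteration only, not the revised formulation proposed in the paper; the function names `scatter_matrices` and `trace_ratio` are hypothetical, and the sketch assumes S_w is well conditioned (more samples than dimensions), which is exactly the assumption that fails in the SSS setting.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (Sb) and within-class (Sw) scatter of the rows of X."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
        Sw += (Xc - mc).T @ (Xc - mc)
    return Sb, Sw

def trace_ratio(Sb, Sw, k, iters=100, tol=1e-10):
    """Textbook iterative trace ratio solver (illustrative, assumes tr(W^T Sw W) > 0)."""
    d = Sb.shape[0]
    W = np.eye(d)[:, :k]  # arbitrary orthonormal starting point
    lam = np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)
    history = [lam]
    for _ in range(iters):
        # W <- top-k eigenvectors of the symmetric matrix Sb - lam * Sw
        _, vecs = np.linalg.eigh(Sb - lam * Sw)
        W = vecs[:, -k:]
        new_lam = np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)
        history.append(new_lam)
        if new_lam - lam < tol:
            break
        lam = new_lam
    return W, history
```

When the sample size is much smaller than the dimension, S_w becomes singular and the denominator can vanish, which is one way to see why this plain iteration becomes problematic in the regime the abstract addresses.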

PaperID: 1202,

Authors: Haixing Zhu, Weipeng Liu, Zhifan Gao, Heye Zhang

Affiliations: State Key Laboratory of Reliability and Intelligence of Electrical Equipment and the School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin, China; School of Biomedical Engineering, Sun Yat-sen University, Guangzhou, China

Title: Explainable Classification of Benign-Malignant Pulmonary Nodules With Neural Networks and Information Bottleneck

Abstract:
Computerized tomography (CT) is a primary clinical technique to differentiate benign-malignant pulmonary nodules for lung cancer diagnosis. Early classification of pulmonary nodules is essential to slow down the degenerative process and reduce mortality. The interactive paradigm assisted by neural networks is considered to be an effective means for early lung cancer screening in large populations. However, some inherent characteristics of pulmonary nodules in high-resolution CT images, e.g., diverse shapes and sparse distribution over the lung fields, lead to inaccurate results. In addition, most existing neural network methods are unsatisfactory owing to a lack of transparency. To overcome these obstacles, a united framework comprising classification and feature visualization stages is proposed to learn distinctive features and provide visual results. Specifically, a bilateral scheme is employed to synchronously extract and aggregate global-local features in the classification stage, where the global branch is constructed to perceive deep-level features and the local branch is built to focus on the refined details. Furthermore, an encoder is built to generate some features, and a decoder is constructed to simulate decision behavior, followed by the information bottleneck viewpoint to optimize the objective. Extensive experiments are performed to evaluate our framework on two publicly available datasets, namely, 1) the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) and 2) the Lung and Colon Histopathological Image Dataset (LC25000). For instance, our framework achieves 92.98% accuracy and presents additional visualizations on LIDC-IDRI. The experimental results show that our framework obtains outstanding performance and effectively facilitates explainability. They also demonstrate that this united framework is a serviceable tool that can scale to clinical research.

PaperID: 1203,

Authors: Zhongguo Li, Wen-Hua Chen, Jun Yang, Cunjia Liu

Affiliations: Department of Electrical and Electronic Engineering, The University of Manchester, Manchester, U.K.; Department of Aeronautical and Automotive Engineering, Loughborough University, Loughborough, U.K.

Title: Cooperative Active Learning-Based Dual Control for Exploration and Exploitation in Autonomous Search

Abstract:
In this article, a computationally efficient multi-estimator-based algorithm is developed for autonomous search in an unknown environment with an unknown source. Different from the existing approaches that require massive computational power to support nonlinear Bayesian estimation and complex decision-making processes, an efficient cooperative active-learning-based dual control for exploration and exploitation (COAL-DCEE) is developed for source estimation and path planning. Multiple cooperative estimators are deployed for the environment learning process, which helps improve the search performance and robustness against noisy measurements. The number of estimators used in COAL-DCEE is much smaller than that of the particles required for Bayesian estimation in information-theoretic approaches. Consequently, the computational load is significantly reduced. As an important feature of this study, the convergence and performance of COAL-DCEE are established in relation to the characteristics of sensor noises and turbulence disturbances. Numerical and experimental studies have been carried out to verify the effectiveness of the proposed framework. Compared with the existing approaches, COAL-DCEE not only provides a convergence guarantee but also yields comparable search performance using much less computational power.

PaperID: 1204,

Authors: Youshen Xia, Tiantian Ye, Liqing Huang

Affiliations: College of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China; Department of Mathematics and Big Data, Chaohu University, Hefei, Anhui, China

Title: Analysis and Application of Matrix-Form Neural Networks for Fast Matrix-Variable Convex Optimization

Abstract:
Matrix-variable optimization is a generalization of vector-variable optimization and has been found to have many important applications. To reduce computation time and storage requirements, this article presents two matrix-form recurrent neural networks (RNNs), one continuous-time and one discrete-time, for solving matrix-variable optimization problems with linear constraints. The two proposed matrix-form RNNs have low complexity and are suitable for parallel implementation in terms of matrix state space. The proposed continuous-time matrix-form RNN significantly generalizes existing continuous-time vector-form RNNs. The proposed discrete-time matrix-form RNN can be effectively used in blind image restoration, where the storage requirement and computational cost are largely reduced. Theoretically, the two proposed matrix-form RNNs are guaranteed to be globally convergent to the optimal solution under mild conditions. Computed results show that the proposed matrix-form RNN-based algorithm is superior to related vector-form RNN-based and matrix-form RNN-based algorithms in terms of computation time.

PaperID: 1205,

Authors: Vahid Saranirad, Shirin Dora, Thomas Martin McGinnity, Damien Coyle

Affiliations: Intelligent Systems Research Center, School of Computing, Engineering and Intelligent Systems, Ulster University, Londonderry, U.K.; Department of Computer Science, Loughborough University, Loughborough, U.K.

Title: CDNA-SNN: A New Spiking Neural Network for Pattern Classification Using Neuronal Assemblies

Abstract:
Spiking neural networks (SNNs) mimic their biological counterparts more closely than their predecessors and are considered the third generation of artificial neural networks. It has been proven that networks of spiking neurons have a higher computational capacity and lower power requirements than sigmoidal neural networks. This article introduces a new type of SNN that draws inspiration and incorporates concepts from neuronal assemblies in the human brain. The proposed network, termed class-dependent neuronal activation-based SNN (CDNA-SNN), assigns each neuron learnable values known as CDNAs, which indicate the neuron’s average relative spiking activity in response to samples from different classes. A new learning algorithm that categorizes the neurons into different class assemblies based on their CDNAs is also presented. These neuronal assemblies are trained via a novel training method based on spike-timing-dependent plasticity (STDP) to have high activity for their associated class and a low firing rate for other classes. Also, using CDNAs, a new type of STDP that controls the amount of plasticity based on the assemblies of pre- and postsynaptic neurons is proposed. The performance of CDNA-SNN is evaluated on five datasets from the University of California, Irvine (UCI) machine learning repository, as well as Modified National Institute of Standards and Technology (MNIST) and Fashion MNIST, using nested cross-validation (N-CV) for hyperparameter optimization. Our results show that CDNA-SNN significantly outperforms synaptic weight association training (SWAT) (p < 0.0005) and SpikeProp (p < 0.05) on 3/5 and self-regulating evolving spiking neural (SRESN) (p < 0.05) on 2/5 UCI datasets while using a significantly lower number of trainable parameters. Furthermore, compared to other supervised, fully connected SNNs, the proposed SNN reaches the best performance for Fashion MNIST and comparable performance for MNIST and neuromorphic-MNIST (N-MNIST), while also using far fewer (1%–35%) parameters.
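To make the notion of a class-dependent activation concrete, the sketch below computes, for each neuron, its mean spike count per class normalized across classes, and then assigns each neuron to the class where that value is largest. This is only one plausible reading of "average relative spiking activity"; the exact CDNA definition, its learning rule, and the assembly assignment used in the paper are not given in the abstract, and the function name `class_dependent_activation` is hypothetical.

```python
import numpy as np

def class_dependent_activation(spike_counts, labels, n_classes):
    """Illustrative per-class relative activity of each neuron.

    spike_counts: array of shape (n_samples, n_neurons)
    labels:       integer class label per sample, in [0, n_classes)
    Returns (cdna, assemblies) where cdna has shape (n_classes, n_neurons)
    and each column sums to 1; assemblies[j] is the class with the largest
    relative activity for neuron j.
    """
    spike_counts = np.asarray(spike_counts, dtype=float)
    labels = np.asarray(labels)
    # Mean spike count of every neuron for each class.
    means = np.stack([spike_counts[labels == c].mean(axis=0)
                      for c in range(n_classes)])
    totals = means.sum(axis=0, keepdims=True)
    # Silent neurons (total of 0) get a uniform value instead of a division by zero.
    cdna = np.divide(means, totals,
                     out=np.full_like(means, 1.0 / n_classes),
                     where=totals > 0)
    assemblies = cdna.argmax(axis=0)
    return cdna, assemblies
```

In this reading, a neuron whose column is concentrated on one class would belong to that class's assembly, while a neuron with a flat column is uninformative.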

PaperID: 1206,

Authors: Bin Zhou, Bing Huang, Yumin Su, Cheng Zhu

Affiliations: Department of Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin, China

Title: Interleaved Periodic Event-Triggered Communications-Based Distributed Formation Control for Cooperative Unmanned Surface Vessels

Abstract:
This article addresses the distributed formation control issue of cooperative unmanned surface vessels (USVs) under interleaved periodic event-triggered communications. First, an adaptive event-based control protocol is designed, where the event-based neural network (NN) scheme is developed to compensate for uncertain model dynamics. Upon the designed control protocol, an interleaved periodic event-triggered mechanism (IPETM) is subsequently proposed to achieve the communication objective. Unlike the common continuous event-triggered methods and periodic event-triggered methods, in which multiple nodes are allowed to trigger their events at the same time, the proposed IPETM ensures that USVs detect their events at different times to avoid the simultaneous event triggering of different nodes. By this virtue, traffic jamming in common wireless environments can be prevented, such that potential communication delays and faults are naturally avoided. In addition, the event detecting instants of the presented IPETM are also discrete and periodic, such that it can be performed under low-computational frequencies. Through Lyapunov-based analysis, it is verified that all closed-loop signals can converge to an arbitrary small compact set with exponential convergence rates. Simulation results demonstrate the effectiveness and superiority of the proposed control scheme.
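The interleaving idea can be pictured with a simple staggered schedule: time is divided into ticks, and vessel i evaluates its triggering condition only at ticks i, i + N, i + 2N, and so on, so that no two vessels evaluate (and hence transmit) at the same tick. The sketch below is an assumption-laden illustration of that scheduling idea, not the IPETM from the paper; the names `detection_ticks` and `should_broadcast` and the threshold rule are hypothetical.

```python
def detection_ticks(agent, n_agents, n_ticks):
    """Ticks at which `agent` checks its event condition (assumed round-robin offsets)."""
    return list(range(agent, n_ticks, n_agents))

def should_broadcast(current_state, last_sent_state, threshold):
    """Hypothetical trigger rule: broadcast when the state has drifted far enough
    from the value most recently sent to neighbours."""
    return abs(current_state - last_sent_state) > threshold

# Example: 4 vessels over 12 ticks. Each tick belongs to exactly one vessel.
schedules = [detection_ticks(i, 4, 12) for i in range(4)]
```

Because each vessel's detection instants are both periodic and offset from every other vessel's, simultaneous triggering is excluded by construction in this toy schedule.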

PaperID: 1207,

Authors: Shuang Li, Rui Zhang, Kaixiong Gong, Mixue Xie, Wenxuan Ma, Guangyu Gao

Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China

Title: Source-Free Active Domain Adaptation via Augmentation-Based Sample Query and Progressive Model Adaptation

Abstract:
Active domain adaptation (ADA), which enormously improves the performance of unsupervised domain adaptation (UDA) at the expense of annotating limited target data, has attracted a surge of interest. However, in real-world applications, the source data in conventional ADA are not always accessible due to data privacy and security issues. To alleviate this dilemma, we introduce a more practical and challenging setting, dubbed source-free ADA (SFADA), where one can select a small quota of target samples for label query to assist the model learning, but labeled source data are unavailable. Therefore, how to query the most informative target samples and how to mitigate the domain gap without the aid of source data are two key challenges in SFADA. To address SFADA, we propose a unified method, SQAdapt, via augmentation-based Sample Query and progressive model Adaptation. Specifically, an active selection module (ASM) is built for target label query, which exploits data augmentation to select the most informative target samples with high predictive sensitivity and uncertainty. Then, we further introduce a classifier adaptation module (CAM) to leverage both the labeled and unlabeled target data for progressively calibrating the classifier weights. Meanwhile, the source-like target samples with low selection scores are taken as source surrogates to realize the distribution alignment in the source-free scenario by the proposed distribution alignment module (DAM). Moreover, as a general active label query method, SQAdapt can be easily integrated into other source-free UDA (SFUDA) methods and improve their performance. Comprehensive experiments on multiple benchmarks have shown that SQAdapt can achieve superior performance and even surpass most of the ADA methods.

PaperID: 1208,

Authors: Jiahui Qu, Wenqian Dong, Yufei Yang, Tongzhen Zhang, Yunsong Li, Qian Du

Affiliations: State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, China; Department of Electronic and Computer Engineering, Mississippi State University, Starkville, MS, USA

Title: Cycle-Refined Multidecision Joint Alignment Network for Unsupervised Domain Adaptive Hyperspectral Change Detection

Abstract:
Hyperspectral change detection, which provides abundant information on land cover changes on the Earth’s surface, has become one of the most crucial tasks in remote sensing. Recently, deep-learning-based change detection methods have shown remarkable performance, but the acquisition of labeled data is extremely expensive and time-consuming. It is intuitive to learn changes from a scene with sufficient labeled data and adapt them to a new unlabeled scene. However, the nonnegligible domain shift between different scenes leads to inevitable performance degradation. In this article, a cycle-refined multidecision joint alignment network (CMJAN) is proposed for unsupervised domain adaptive hyperspectral change detection, which realizes progressive alignment of the data distributions between the source and target domains with cycle-refined high-confidence labeled samples. There are two key characteristics: 1) progressively mitigating the distribution discrepancy to learn a domain-invariant difference feature representation and 2) updating the high-confidence training samples of the target domain in a cyclic manner. The benefit is that the domain shift between the source and target domains is progressively alleviated to promote change detection performance on the target domain in an unsupervised manner. Experimental results on different datasets demonstrate that the proposed method can achieve better performance than the state-of-the-art change detection methods.

PaperID: 1209,

Authors: Qiang Zhu, Feiyu Chen, Shuyuan Zhu, Yu Liu, Xue Zhou, Ruiqin Xiong, Bing Zeng

Affiliations: School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Computer Science, Peking University, Beijing, China

Title: DVSRNet: Deep Video Super-Resolution Based on Progressive Deformable Alignment and Temporal-Sparse Enhancement

Abstract:
Video super-resolution (VSR) is used to compose high-resolution (HR) video from low-resolution video. Recently, deformable alignment-based VSR methods have become increasingly popular. In these methods, the features extracted from video are aligned to eliminate the motion error, targeting high super-resolution (SR) quality. However, these methods often suffer from misalignment and a lack of sufficient temporal information to compose HR frames, which in turn induces artifacts in the SR result. In this article, we design a deep VSR network (DVSRNet) based on the proposed progressive deformable alignment (PDA) module and temporal-sparse enhancement (TSE) module. Specifically, the PDA module is designed to accurately align features and to eliminate artifacts via bidirectional information propagation. The TSE module is constructed to further eliminate artifacts and to generate clear details for the HR frame. In addition, we construct a lightweight deep optical flow network (OFNet) to obtain the bidirectional optical flows for the implementation of the PDA module. Moreover, two new loss functions are designed for our proposed method. The first one is adopted in OFNet and the second one is constructed to guarantee the generation of sharp and clear details for the HR frames. The experimental results demonstrate that our method performs better than the state-of-the-art methods.

PaperID: 1210,

Authors: Xiaohong Chen, Guanying Xu, Xuesong Xu, Haichong Jiang, Zhiping Tian, Tao Ma

Affiliations: Business School of Central South University, Xiang Jiang Laboratory, Changsha, China; Changsha Social Laboratory of Artificial Intelligence, Hunan University of Technology and Business, Changsha, China; School of Advanced Interdisciplinary Studies, Hunan University of Technology and Business, Changsha, China; Hope Innovation Company Ltd., Changsha, China

Title: Multicenter Hierarchical Federated Learning With Fault-Tolerance Mechanisms for Resilient Edge Computing Networks

Abstract:
In the realm of federated learning (FL), the conventional dual-layered architecture, comprising a central parameter server and peripheral devices, often encounters challenges due to its significant reliance on the central server for communication and security. This dependence becomes particularly problematic in scenarios involving potential malfunctions of devices and servers. While existing device-edge-cloud hierarchical FL (HFL) models alleviate some dependence on central servers and reduce communication overheads, they primarily focus on load balancing within edge computing networks and fall short of achieving complete decentralization and edge-centric model aggregation. Addressing these limitations, we introduce the multicenter HFL (MCHFL) framework. This innovative framework replaces the traditional single central server architecture with a distributed network of robust global aggregation centers located at the edge, inherently enhancing fault tolerance crucial for maintaining operational integrity amidst edge network disruptions. Our comprehensive experiments with the MNIST, FashionMNIST, and CIFAR-10 datasets demonstrate the MCHFL’s superior performance. Notably, even under high paralysis ratios of up to 50%, the MCHFL maintains high accuracy levels, with maximum accuracy reductions of only 2.60%, 5.12%, and 16.73% on these datasets, respectively. This performance significantly surpasses the notable accuracy declines observed in traditional single-center models under similar conditions. To the best of our knowledge, the MCHFL is the first edge multicenter FL framework with theoretical underpinnings. Our extensive experimental results across various datasets validate the MCHFL’s effectiveness, showcasing its higher accuracy, faster convergence speed, and stronger robustness compared to single-center models, thereby establishing it as a pioneering paradigm in edge multicenter FL.

PaperID: 1211,

Authors: Miao Liu, Jing Wang, Fei Wang, Fei Xiang, Jingdong Chen

Affiliations: School of Information and Electronics, Beijing Institute of Technology, Beijing, China; Xiaomi Company, Beijing, China; Center of Intelligent Acoustics and Immersive Communications, Northwestern Polytechnical University, Xi’an, China

Title: Non-Intrusive Speech Quality Assessment Based on Deep Neural Networks for Speech Communication

Abstract:
Traditionally, speech quality evaluation relies on subjective assessments or intrusive methods that require reference signals or additional equipment. However, over recent years, non-intrusive speech quality assessment has emerged as a promising alternative, capturing much attention from researchers and industry professionals. This article presents a deep learning-based method that exploits large-scale intrusive simulated data to improve the accuracy and generalization of non-intrusive methods. The major contributions of this article are as follows. First, it presents a data simulation method, which generates degraded speech signals and labels their speech quality with the perceptual objective listening quality assessment (POLQA). The generated data is proven to be useful for pretraining the deep learning models. Second, it proposes to apply an adversarial speaker classifier to reduce the impact of speaker-dependent information on speech quality evaluation. Third, an autoencoder-based deep learning scheme is proposed following the principle of representation learning and adversarial training (AT) methods, which is able to transfer the knowledge learned from a large amount of simulated speech data labeled by POLQA. With the help of discriminative representations extracted from the autoencoder, the prediction model can be trained well on a relatively small amount of speech data labeled through subjective listening tests. Fourth, an end-to-end speech quality evaluation neural network is developed, which takes magnitude and phase spectral features as its inputs. This phase-aware model is more accurate than the model using only the magnitude spectral features. A large number of experiments are carried out with three datasets: one simulated with labels obtained using POLQA and two recorded with labels obtained using subjective listening tests. The results show that the presented phase-aware method improves the performance of the baseline model and the proposed model with latent representations extracted from the adversarial autoencoder (AAE) outperforms the state-of-the-art objective quality assessment methods, reducing the root mean square error (RMSE) by 10.5% and 12.2% on the Beijing Institute of Technology (BIT) dataset and Tencent Corpus, respectively. The code and supplementary materials are available at https://github.com/liushenme/AAE-SQA.

PaperID: 1212,

Authors: Jung Uk Kim, Yong Man Ro

Affiliations: Image and Video Systems Laboratory, School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea

Title: Enabling Visual Object Detection With Object Sounds via Visual Modality Recalling Memory

Abstract:
When humans hear the sound of an object, they recall associated visual information and integrate the sound with recalled visual modality to detect the object. In this article, we present a novel sound-based object detector that mimics this process. We design a visual modality recalling (VMR) memory to recall information of a visual modality based on an audio modal input (i.e., sound). To achieve this goal, we propose a VMR loss and an audio–visual association loss to guide the VMR memory to memorize visual modal information by establishing associations between audio and visual modalities. With the visual modal information recalled through the VMR memory along with the original audio input, we perform audio–visual integration. In this step, we introduce an integrated feature contrastive loss that allows the integrated feature to be embedded as if it were encoded using both audio and visual modal inputs. This guidance enables our sound-based object detector to effectively perform visual object detection even when only sound is provided. We believe that our work is a cornerstone study that offers a new perspective to conventional object detection studies that solely rely on the visual modality. Comprehensive experimental results demonstrate the effectiveness of the proposed method with the VMR memory.

PaperID: 1213,

Authors: Yabo Liu, Jinghua Wang, Linhui Xiao, Chengliang Liu, Zhihao Wu, Yong Xu

Affiliations: Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, China; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China

Title: Foregroundness-Aware Task Disentanglement and Self-Paced Curriculum Learning for Domain Adaptive Object Detection

Abstract:
Unsupervised domain adaptive object detection (UDA-OD) is a challenging problem since it needs to locate and recognize objects while maintaining the generalization ability across domains. Most existing UDA-OD methods directly integrate the adaptive modules into the detectors. Although this integration enhances the generalization ability, it can significantly sacrifice detection performance. To solve this problem, we propose an effective framework, named foregroundness-aware task disentanglement and self-paced curriculum adaptation (FA-TDCA), to disentangle the UDA-OD task into four independent subtasks of source detector pretraining, classification adaptation, location adaptation, and target detector training. The disentanglement can transfer the knowledge effectively while maintaining the detection performance of our model. In addition, we propose a new metric, i.e., foregroundness, and use it to evaluate the confidence of the location result. We use both foregroundness and classification confidence to assess the label quality of the proposals. For effective knowledge transfer across domains, we utilize a self-paced curriculum learning paradigm to train adaptors and gradually improve the quality of the pseudolabels associated with the target samples. Experimental results indicate that our method achieves state-of-the-art results on four cross-domain object detection tasks.

PaperID: 1214,

Authors: Xin Han, Xing He, Xingxing Ju, Hangjun Che, Tingwen Huang

Affiliations: Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronics and Information Engineering, Southwest University, Chongqing, China; College of Mathematics, Sichuan University of Arts and Science, Dazhou, Sichuan, China; College of Electronics and Information Engineering, Sichuan University, Chengdu, Sichuan, China; Department of Mathematics, Texas A&M University at Qatar, Doha, Qatar

Title: Distributed Neurodynamic Models for Solving a Class of System of Nonlinear Equations

Abstract:
This article investigates a class of systems of nonlinear equations (SNEs). Three distributed neurodynamic models (DNMs), namely a two-layer model (DNM-I) and two single-layer models (DNM-II and DNM-III), are proposed to search for such a system’s exact solution or a solution in the sense of least-squares. Combining a dynamic positive definite matrix with the primal–dual method, DNM-I is designed and it is proved to be globally convergent. To obtain a concise model, based on the dynamic positive definite matrix, time-varying gain, and activation function, DNM-II is developed and it enjoys global convergence. To inherit DNM-II’s concise structure and improved convergence, DNM-III is proposed with the aid of time-varying gain and activation function, and this model possesses global fixed-time consensus and convergence. For the smooth case, DNM-III’s globally exponential convergence is demonstrated under the Polyak–Łojasiewicz (PL) condition. Moreover, for the nonsmooth case, DNM-III’s globally finite-time convergence is proved under the Kurdyka–Łojasiewicz (KL) condition. Finally, the proposed DNMs are applied to tackle quadratic programming (QP), and some numerical examples are provided to illustrate the effectiveness and advantages of the proposed models.
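As background for the smooth-case result, the standard argument for why the Polyak–Łojasiewicz condition gives exponential convergence can be written out for a plain gradient flow. This is a textbook derivation under the assumption of a differentiable objective f with minimum value f^\star and PL constant \mu > 0; it is not the DNM-III dynamics, whose time-varying gain and activation function are not specified in the abstract.

```latex
% PL condition with constant \mu > 0, where f^\star = \min_x f(x):
\tfrac{1}{2}\,\lVert \nabla f(x) \rVert^2 \;\ge\; \mu\,\bigl(f(x) - f^\star\bigr)
\quad \text{for all } x.

% Along the gradient flow \dot{x}(t) = -\nabla f(x(t)), the chain rule gives
\frac{d}{dt}\bigl(f(x(t)) - f^\star\bigr)
  = -\lVert \nabla f(x(t)) \rVert^2
  \;\le\; -2\mu\,\bigl(f(x(t)) - f^\star\bigr),

% and Gronwall's inequality then yields exponential decay of the optimality gap:
f(x(t)) - f^\star \;\le\; e^{-2\mu t}\,\bigl(f(x(0)) - f^\star\bigr).
```

Note that the PL condition does not require convexity, which is why it is a natural hypothesis for residual-type objectives arising from nonlinear equations.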

PaperID: 1215,

Authors: Ye Liu, Huifang Li, Chao Hu, Shuang Luo, Yan Luo, Chang Wen Chen

Affiliations: School of Resource and Environmental Sciences, Wuhan University, Wuhan, China; Changjiang Spatial Information Technology Engineering Company Ltd., Wuhan, China; Department of Computing, The Hong Kong Polytechnic University, Hong Kong, SAR, China

Title: Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images

Abstract:
The task of instance segmentation in remote sensing images, aiming at performing per-pixel labeling of objects at the instance level, is of great importance for various civil applications. Despite previous successes, most existing instance segmentation methods designed for natural images encounter sharp performance degradations when they are directly applied to top-view remote sensing images. Through careful analysis, we observe that the challenges mainly come from the lack of discriminative object features due to severe scale variations, low contrasts, and clustered distributions. In order to address these problems, a novel context aggregation network (CATNet) is proposed to improve the feature extraction process. The proposed model exploits three lightweight plug-and-play modules, namely, dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE), to aggregate global visual context at feature, spatial, and instance domains, respectively. DenseFPN is a multi-scale feature propagation module that establishes more flexible information flows by adopting interlevel residual connections, cross-level dense connections, and a feature reweighting strategy. Leveraging the attention mechanism, SCP further augments the features by aggregating global spatial context into local regions. For each instance, HRoIE adaptively generates RoI features for different downstream tasks. Extensive evaluations of the proposed scheme on iSAID, DIOR, NWPU VHR-10, and HRSID datasets demonstrate that the proposed approach outperforms state-of-the-art methods under similar computational costs. Source code and pretrained models are available at https://github.com/yeliudev/CATNet.

PaperID: 1216,

Authors: Peng Wei, Han-Xiong Li

Affiliations: School of Automation, Wuhan University of Technology, Wuhan, China; Department of Systems Engineering, City University of Hong Kong, Hong Kong, China

Title: Spatiotemporal Transformation-Based Neural Network With Interpretable Structure for Modeling Distributed Parameter Systems

Abstract:
Many industrial processes can be described by distributed parameter systems (DPSs) governed by partial differential equations (PDEs). In this research, a spatiotemporal network is proposed for DPS modeling without any process knowledge. Since traditional linear modeling methods may not work well for nonlinear DPSs, the proposed method considers the nonlinear space-time separation, which is transformed into a Lagrange dual optimization problem under the orthogonal constraint. The optimization problem can be solved by the proposed neural network with good structural interpretability. The spatial construction method is employed to derive the continuous spatial basis functions (SBFs) based on the discrete spatial features. The nonlinear temporal model is derived by the Gaussian process regression (GPR). Benefiting from spatial construction and GPR, the proposed method enables spatially continuous modeling and provides a reliable output range under the given confidence level. Experiments on a catalytic reaction process and a battery thermal process demonstrate the effectiveness and superiority of the proposed method.

PaperID: 1217,

Authors: Lingzhi Zhang, Lei Xie, Yi Jiang, Zhishan Li, Xueqin Amy Liu, Hongye Su

Affiliations: State Key Laboratory of Industrial Control Technology and the Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China; Department of Biomedical Engineering, City University of Hong Kong, Hong Kong, China; School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast, U.K.

Title: Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Abstract:
The state and input constraints of nonlinear systems could greatly impede the realization of their optimal control when using reinforcement learning (RL)-based approaches since the commonly used quadratic utility functions cannot meet the requirements of solving constrained optimization problems. This article develops a novel optimal control approach for constrained discrete-time (DT) nonlinear systems based on safe RL. Specifically, a barrier function (BF) is introduced and incorporated with the value function to help transform a constrained optimization problem into an unconstrained one. Meanwhile, the minimum of such an optimization problem can be guaranteed to occur at the origin. Then a constrained policy iteration (PI) algorithm is developed to realize the optimal control of the nonlinear system and to enable the state and input constraints to be satisfied. The constrained optimal control policy and its corresponding value function are derived through the implementation of two neural networks (NNs). Performance analysis shows that the proposed control approach still retains the convergence and optimality properties of the traditional PI algorithm. Simulation results of three examples reveal its effectiveness.
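As a rough illustration of how a barrier term can fold state and input constraints into an otherwise quadratic utility, consider the sketch below. The logarithmic barrier form, the symmetric bounds `x_max` and `u_max`, the scalar state and input, and the function names are assumptions for illustration; the article's actual BF may differ.

```python
import math

def barrier(z: float, c: float) -> float:
    """Hypothetical log barrier for the symmetric constraint |z| < c.

    It is zero at z = 0, grows without bound as |z| approaches c,
    and is treated as infinite outside the admissible interval.
    """
    if abs(z) >= c:
        return math.inf
    return -math.log(1.0 - (z / c) ** 2)

def utility(x: float, u: float, x_max: float, u_max: float,
            q: float = 1.0, r: float = 1.0) -> float:
    """Quadratic stage cost augmented with state and input barrier terms."""
    return q * x * x + r * u * u + barrier(x, x_max) + barrier(u, u_max)

# The augmented cost stays finite inside the constraints and vanishes at the origin.
print(utility(0.0, 0.0, 1.0, 1.0), utility(0.9, 0.0, 1.0, 1.0))
```

Because the barrier vanishes at the origin and is nonnegative elsewhere, the minimum of the augmented cost in this sketch remains at the origin, which mirrors the property stated in the abstract.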

PaperID: 1218,

Authors: Vandan Gorade, Azad Singh, Deepak Mishra

Affiliations: Department of Computer Science, University of Pune, Pune, India; Computer Science and Engineering Department, Indian Institute of Technology Jodhpur, Jodhpur, India

Title: Large Scale Time-Series Representation Learning via Simultaneous Low- and High-Frequency Feature Bootstrapping

Abstract:
Learning representations from unlabeled time series data is a challenging problem. Most existing self-supervised and unsupervised approaches in the time-series domain fall short in capturing low- and high-frequency features at the same time. As a result, the generalization ability of the learned representations remains limited. Furthermore, some of these methods employ large-scale models like transformers or rely on computationally expensive techniques such as contrastive learning. To tackle these problems, we propose a noncontrastive self-supervised learning (SSL) approach that efficiently captures low- and high-frequency features in a cost-effective manner. The proposed framework comprises a Siamese configuration of a deep neural network with two weight-sharing branches which are followed by low- and high-frequency feature extraction modules. The two branches of the proposed network allow bootstrapping of the latent representation by taking two different augmented views of raw time series data as input. The augmented views are created by applying random transformations sampled from a single set of augmentations. The low- and high-frequency feature extraction modules of the proposed network contain a combination of multilayer perceptron (MLP) and temporal convolutional network (TCN) heads, respectively, which capture the temporal dependencies from the raw input data at various scales due to the varying receptive fields. To demonstrate the robustness of our model, we performed extensive experiments and ablation studies on five real-world time-series datasets. Our method achieves state-of-the-art performance on all the considered datasets.

PaperID: 1219,

Authors: Hasita Veluri, Umesh Chand, Chun-Kuei Chen, Aaron Voon-Yew Thean

Affiliations: Department of Electrical and Computer Engineering, National University of Singapore, Queenstown, Singapore

Title: A Low-Latency DNN Accelerator Enabled by DFT-Based Convolution Execution Within Crossbar Arrays

Abstract:
Analog resistive random access memory (RRAM) devices enable parallelized nonvolatile in-memory vector-matrix multiplications for neural networks, eliminating the bottlenecks posed by the von Neumann architecture. While using RRAMs improves the accelerator performance and enables their deployment at the edge, the high tuning time needed to update the RRAM conductance states adds significant burden and latency to real-time system training. In this article, we develop an in-memory discrete Fourier transform (DFT)-based convolution methodology to reduce system latency and input regeneration. By storing the static DFT/inverse DFT (IDFT) coefficients within the analog arrays, we keep the computational operations performed by digital circuits to a minimum. By performing the convolution in reciprocal Fourier space, our approach minimizes connection weight updates, which significantly accelerates both neural network training and inference. Moreover, by minimizing RRAM conductance update frequency, we mitigate the endurance limitations of resistive nonvolatile memories. We show that by leveraging the symmetry and linearity of DFT/IDFTs, we can reduce the power by 1.57× for convolution over conventional execution. The designed hardware-aware deep neural network (DNN) inference accelerator enhances the peak power efficiency by 28.02× and area efficiency by 8.7× over state-of-the-art accelerators. This article paves the way for ultrafast, low-power, compact hardware accelerators.
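The principle behind DFT-based convolution can be sketched in software. The example below builds fixed DFT and IDFT coefficient matrices once, loosely analogous to static coefficients held in an array, and computes a circular convolution as an element-wise product in Fourier space. It is a numerical illustration of the convolution theorem only, not a model of the RRAM hardware; the function names and the zero-padding of the kernel are assumptions.

```python
import numpy as np

def dft_matrix(n: int) -> np.ndarray:
    """Static n-by-n DFT coefficient matrix F with F[k, m] = exp(-2*pi*i*k*m/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def circular_conv_dft(x, h) -> np.ndarray:
    """Circular convolution via two matrix-vector DFTs, a product, and one IDFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.pad(np.asarray(h, dtype=float), (0, n - len(h)))  # zero-pad kernel to length n
    F = dft_matrix(n)
    F_inv = np.conj(F) / n  # IDFT matrix
    return np.real(F_inv @ ((F @ x) * (F @ h)))

def circular_conv_direct(x, h) -> np.ndarray:
    """Reference circular convolution computed directly in the signal domain."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.pad(np.asarray(h, dtype=float), (0, n - len(h)))
    return np.array([sum(x[j] * h[(i - j) % n] for j in range(n)) for i in range(n)])

x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.0, -1.0]
print(circular_conv_dft(x, h), circular_conv_direct(x, h))
```

In this sketch the matrices `F` and `F_inv` depend only on the signal length, never on the data or the kernel, which is the property that lets such coefficients remain static.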

PaperID: 1220,

Authors: Liao Zhu, Qinglai Wei, Ping Guo

Affiliations: International Academic Center of Complex Systems, Beijing Normal University, Zhuhai, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Synergetic Learning Neuro-Control for Unknown Affine Nonlinear Systems With Asymptotic Stability Guarantees

Abstract:
For completely unknown affine nonlinear systems, in this article, a synergetic learning algorithm (SLA) is developed to learn an optimal control. Unlike the conventional Hamilton–Jacobi–Bellman equation (HJBE) with system dynamics, a model-free HJBE (MF-HJBE) is deduced by means of off-policy reinforcement learning (RL). Specifically, the equivalence between HJBE and MF-HJBE is first bridged from the perspective of the uniqueness of the solution of the HJBE. Furthermore, it is proven that once the solution of MF-HJBE exists, its corresponding control input renders the system asymptotically stable and optimizes the cost function. To solve the MF-HJBE, the two agents composing the synergetic learning (SL) system, the critic agent and the actor agent, can evolve in real time using only the system state data. By building an experience replay (ER)-based learning rule, it is proven that when the critic agent evolves toward the optimal cost function, the actor agent not only evolves toward the optimal control, but also guarantees the asymptotic stability of the system. Finally, simulations of the F16 aircraft system and the Van der Pol oscillator are conducted and the results support the feasibility of the developed SLA.

PaperID: 1221,

Authors: Baiyang He, Ying Meng, Lixin Tang

Affiliations: National Frontiers Science Center for Industrial Intelligence and Systems Optimization and the Liaoning Engineering Laboratory of Data Analytics and Optimization for Smart Industry, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization and the Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang, China

Title: An Off-Policy Reinforcement Learning-Based Adaptive Optimization Method for Dynamic Resource Allocation Problem

Abstract:
In this article, an adaptive optimization method is proposed for the dynamic resource allocation problem (RAP) with multiple objectives in the manufacturing industry. In the proposed method, a novel reinforcement learning method (DSAC-ERCE) is designed to adaptively set the weights for multiple objectives, and then the optimization method is adopted to generate the noninferior solutions in each time period. To ensure DSAC-ERCE’s performance in dynamic and complex resource allocation environments, we develop a state-encoding network with a proposed information entropy attention mechanism to encode the state. Then, we introduce a new reward function to escape from the local optima of the policy and further present a conditional entropy policy to enhance the policy network. In addition, we demonstrate the feasibility of improving the quality of actions and present a boundary method for high-quality actions. We also introduce an optimization model to automatically adjust the temperature parameter in DSAC-ERCE. Furthermore, we compare and analyze our approach with other state-of-the-art reinforcement learning methods. The experiments illustrate that DSAC-ERCE outperforms state-of-the-art reinforcement learning methods. Moreover, DSAC-ERCE can be generalized to solve optimization problems with two to five objectives, problems with linear, quadratic, cubic, logarithmic, or inverse objectives, and problems with diverse structures.

PaperID: 1222,

Authors: Xuexiong Luo, Jia Wu, Jian Yang, Hongyang Chen, Zhao Li, Hao Peng, Chuan Zhou

Affiliations: School of Computing, Macquarie University, Sydney, NSW, Australia; Research Center for Graph Computing, Zhejiang Laboratory, Hangzhou, China; School of Cyber Science and Technology, Beihang University, Beijing, China; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China

Title: Knowledge Distillation Guided Interpretable Brain Subgraph Neural Networks for Brain Disorder Exploration

Abstract:
The human brain is a highly complex neurological system that has been the subject of continuous exploration by scientists. With the help of modern neuroimaging techniques, significant progress has been made in brain disorder analysis. In recent years, there has been increasing interest in utilizing artificial intelligence techniques to improve the efficiency of disorder diagnosis. However, these methods rely only on neuroimaging data for disorder diagnosis and do not explore the pathogenic mechanism behind the disorder or provide an interpretable result toward the diagnosis decision. Furthermore, the scarcity of medical data limits the performance of existing methods. Given the success of graph neural networks (GNNs) in molecular graphs and drug discovery, owing to their strong ability to learn from graph-structured data, a natural question is whether GNNs can also play a major role in the field of brain disorder analysis. Thus, in this work, we model brain neuroimaging data as graph-structured data and propose knowledge distillation (KD) guided brain subgraph neural networks to extract discriminative subgraphs between patient and healthy brain graphs to explain which brain regions and abnormal functional connectivities cause the disorder. Specifically, we introduce the KD technique to transfer the knowledge of a pretrained teacher model to guide the training of the brain subgraph neural networks and alleviate the problem of insufficient training data. These discriminative subgraphs also help learn better brain graph-level representations for disorder prediction. We conduct extensive experiments on two functional magnetic resonance imaging datasets, i.e., Parkinson’s disease (PD) and attention-deficit/hyperactivity disorder (ADHD), and the experimental results demonstrate the superiority of our method over other brain graph analysis methods in disorder prediction accuracy. The interpretable experimental results given by our method are consistent with corresponding medical research, which is encouraging and suggests potential for deeper brain disorder study.

PaperID: 1223,

Authors: Hejia Qiu, Chao Li, Ying Weng, Zhun Sun, Qibin Zhao

Affiliations: School of Computer Science, University of Nottingham Ningbo China, Ningbo, China; RIKEN AIP, Tokyo, Japan; Graduate School of Information Sciences, Tohoku University, Sendai, Japan

Title: Fractional Tensor Recurrent Unit (fTRU): A Stable Forecasting Model With Long Memory

Abstract:
The tensor recurrent model is a family of nonlinear dynamical systems, of which the recurrence relation consists of a $p$-fold (called degree-$p$) tensor product. Despite such models frequently appearing in advanced recurrent neural networks (RNNs), to this date, there are limited studies on their long memory properties and stability in sequence tasks. In this article, we propose a fractional tensor recurrent model, where the tensor degree $p$ is extended from the discrete domain to the continuous domain, so it is effectively learnable from various datasets. Theoretically, we prove that a large degree $p$ is essential to achieve the long memory effect in a tensor recurrent model, yet it could lead to unstable dynamical behaviors. Hence, our new model, named fractional tensor recurrent unit (fTRU), is expected to seek the saddle point between long memory property and model stability during the training. We experimentally show that, compared to various advanced RNNs, the proposed model achieves competitive performance with long memory and stable behavior in several forecasting tasks.

PaperID: 1224,

Authors: Shizhe Hu, Chengkun Zhang, Guoliang Zou, Zhengzheng Lou, Yangdong Ye

Affiliations: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China

Title: Deep Multiview Clustering by Pseudo-Label Guided Contrastive Learning and Dual Correlation Learning

Abstract:
Deep multiview clustering (MVC) aims to learn and utilize the rich relations across different views to enhance the clustering performance under a human-designed deep network. However, most existing deep MVCs face two challenges. First, most current deep contrastive MVCs usually select the same instance across views as positive pairs and the remaining instances as negative pairs, which always leads to inaccurate contrastive learning (CL). Second, most deep MVCs only consider learning feature or cluster correlations across views, failing to explore the dual correlations. To tackle the above challenges, in this article, we propose a novel deep MVC framework by pseudo-label guided CL and dual correlation learning. Specifically, a novel pseudo-label guided CL mechanism is designed by using the pseudo-labels in each iteration to help remove false negative sample pairs, so that the CL for the feature distribution alignment can be more accurate, thus benefiting the discriminative feature learning. Different from most deep MVCs learning only one kind of correlation, we investigate both the feature and cluster correlations among views to discover the rich and comprehensive relations. Experiments on various datasets demonstrate the superiority of our method over many state-of-the-art deep MVCs. The source implementation code will be provided at https://github.com/ShizheHu/Deep-MVC-PGCL-DCL.
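One way to picture the removal of false negative pairs is the sketch below: a cross-view contrastive loss in which any negative that shares the anchor's pseudo-label is dropped from the denominator. This is an assumed InfoNCE-style formulation written for illustration, not the authors' loss; the function name, the temperature value, and the toy embeddings are hypothetical.

```python
import numpy as np

def masked_contrastive_loss(z1, z2, pseudo_labels, temperature=0.5):
    """Cross-view contrastive loss that ignores likely false negatives.

    z1, z2: (n, d) embeddings of the same n instances in two views.
    pseudo_labels: length-n cluster assignments from the current iteration.
    Pairs (i, j), i != j, with equal pseudo-labels are excluded as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = np.exp(z1 @ z2.T / temperature)       # (n, n) exponentiated cosine similarities
    labels = np.asarray(pseudo_labels)
    keep = labels[:, None] != labels[None, :]   # keep only differently labeled pairs
    np.fill_diagonal(keep, True)                # always keep the positive pair (i, i)
    pos = np.diag(sim)
    denom = (sim * keep).sum(axis=1)
    return float(np.mean(-np.log(pos / denom)))

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))
print(masked_contrastive_loss(z, z, [0, 0, 1, 1, 2, 2]))
```

Passing all-distinct pseudo-labels recovers the usual setting where every other instance is treated as a negative, so the masked loss is never larger than that unmasked baseline in this sketch.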

PaperID: 1225,

Authors: Peng Wan, Zhigang Zeng

Title: Convergence-Rate-Based Event-Triggered Mechanisms for Quasi-Synchronization of Delayed Nonlinear Systems on Time Scales

Abstract:
Most of the existing event-triggered mechanisms (ETMs) were designed according to the difference between the quadratic form of measurement errors and the quadratic form of sampling states (or real-time states). In order to reduce the amount of data transmission and develop ETMs for continuous-time and discrete-time delayed nonlinear systems (NSs) simultaneously, this article investigates quasi-synchronization (QS) of NSs on time scales based on a novel ETM, which is designed according to the convergence rate instead of measurement errors of the addressed systems. First, a novel ETM is designed under known nonlinear dynamics, and it is demonstrated that QS with a given convergence rate and error level can be achieved under matrix inequality criteria. Second, if the nonlinear functions are unknown, we adapt our ETM to handle this special case. Not only QS but also complete synchronization with a given convergence rate can be achieved under the ETMs. If the constructed Lyapunov function passes through 0, the designed ETM will keep it at the origin. In this case, finite-time synchronization is achieved. Third, under the designed ETMs, it is proved that Zeno behavior can be excluded. Finally, four numerical simulations are presented to demonstrate the feasibility and the advantage of the designed ETMs in this article.

PaperID: 1226,

Authors: Wei Yu, Ning Yang, Zhijiong Wang, Hung Chun Li, Anguo Zhang, Chaoxu Mu, Sio-Hang Pun

Affiliations: Chongqing Key Laboratory of Autonomous Systems and the School of Automation, Chongqing University, Chongqing, China; State Key Laboratory of Analog and Mixed-Signal VLSI, Institute of Microelectronics, University of Macau, Macau, China; Joint Laboratory, Zhuhai UM Science and Technology Research Institute—Lingyange Semiconductor Inc., Zhuhai, China; School of Artificial Intelligence, the Research Center of Autonomous Unmanned System Technology, Ministry of Education, and the Anhui Provincial Engineering Research Center for Unmanned System and Intelligent Technology, Anhui University, Hefei, China; School of Electrical and Information Engineering, Tianjin University, Tianjin, China

Title: Fault-Tolerant Attitude Tracking Control Driven by Spiking NNs for Unmanned Aerial Vehicles

Abstract:
In this article, we propose a novel fault-tolerant control scheme for quadrotor unmanned aerial vehicles (UAVs) based on spiking neural networks (SNNs), which leverages the inherent features of neural network computing to significantly enhance the reliability and robustness of UAV flight control. Traditional control methods are known to be inadequate in dealing with complex and real-time sensor data, which results in poor performance and reduced robustness in fault-tolerant control. In contrast, the temporal processing, parallelism, and nonlinear capacity of SNNs enable the fault-tolerant control scheme to process vast amounts of sensory data with the ability to accurately identify and respond to faults. Furthermore, SNNs can learn and adjust to new environments and fault conditions, providing effective and adaptive flight control. The proposed SNN-based fault-tolerant control scheme demonstrates significant improvements in control accuracy and robustness compared with conventional methods, indicating its potential applicability and suitability for a range of UAV flight control scenarios.

PaperID: 1227,

Authors: Chenlin Zhang, Shijun Lin, Hao Wang, Ziyang Chen, Shaochen Wang, Zhen Kan

Affiliations: Department of Automation, University of Science and Technology of China, Hefei, Anhui, China

Title: Data-Driven Safe Policy Optimization for Black-Box Dynamical Systems With Temporal Logic Specifications

Abstract:
Learning-based policy optimization methods have shown great potential for building general-purpose control systems. However, existing methods still struggle to achieve complex task objectives while ensuring policy safety during learning and execution phases for black-box systems. To address these challenges, we develop data-driven safe policy optimization (D2SPO), a novel reinforcement learning (RL)-based policy improvement method that jointly learns a control barrier function (CBF) for system safety and a linear temporal logic (LTL) guided RL algorithm for complex task objectives. Unlike many existing works that assume known system dynamics, by carefully constructing the data sets and redesigning the loss functions of D2SPO, a provably safe CBF is learned for black-box dynamical systems, which continuously evolves for improved system safety as RL interacts with the environment. To deal with complex task objectives, we take advantage of the capability of LTL in representing the task progress and develop LTL-guided RL policy for efficient completion of various tasks with LTL objectives. Extensive numerical and experimental studies demonstrate that D2SPO outperforms most state-of-the-art (SOTA) baselines and can achieve over 95% safety rate and nearly 100% task completion rates. The experiment video is available at https://youtu.be/2RgaH-zcmkY.

PaperID: 1228,

Authors: Shengju Yu, Suyuan Liu, Siwei Wang, Chang Tang, Zhigang Luo, Xinwang Liu, En Zhu

Affiliations: School of Computer, National University of Defense Technology, Changsha, China; Intelligent Game and Decision Laboratory, Beijing, China; School of Computer Science, China University of Geosciences, Wuhan, China

Title: Sparse Low-Rank Multi-View Subspace Clustering With Consensus Anchors and Unified Bipartite Graph

Abstract:
Anchor technology is popularly employed in multi-view subspace clustering (MVSC) to reduce the complexity cost. However, because the sampling operation is performed on each individual view independently and does not consider the distribution of samples in all views, the produced anchors are usually only slightly distinguishable, failing to characterize the whole data. Moreover, it is necessary to fuse multiple separated graphs into one, which makes the final clustering performance heavily dependent on the fusion algorithm adopted. What is worse, existing MVSC methods generate dense bipartite graphs, where each sample is associated with all anchor candidates. We argue that this densely connected mechanism will fail to capture the essential local structures and degrade the discrimination of samples belonging to the respective near anchor clusters. To alleviate these issues, we devise a clustering framework named SL-CAUBG. Specifically, we do not utilize a sampling strategy but optimize to generate the consensus anchors within all views so as to explore the information between different views. Based on the consensus anchors, we skip the fusion stage and directly construct the unified bipartite graph across views. Most importantly, the $\ell_1$ norm and Laplacian-rank constraints employed on the unified bipartite graph make it capture both local and global structures simultaneously. The $\ell_1$ norm helps eliminate the scatters between anchors and samples by constructing sparse links and guarantees that our graph has a clear anchor-sample affinity relationship. The Laplacian-rank constraint helps extract the global characteristics by measuring the connectivity of the unified bipartite graph. To deal with the nondifferentiable objective function caused by the $\ell_1$ norm, we adopt an iterative re-weighted method and Newton’s method. To handle the nonconvex Laplacian-rank constraint, we equivalently transform it into a convex trace constraint. We also devise a four-step alternate method with linear complexity to solve the resultant problem. Substantial experiments show the superiority of our SL-CAUBG.

PaperID: 1229,

Authors: Weiling Li, Renfang Wang, Xin Luo

Affiliations: School of Computer Science and Technology, Dongguan University of Technology, Dongguan, China; College of Big Data and Software Engineering, Zhejiang Wanli University, Ningbo, China; College of Computer and Information Science, Southwest University, Chongqing, China

Title: A Generalized Nesterov-Accelerated Second-Order Latent Factor Model for High-Dimensional and Incomplete Data

Abstract:
High-dimensional and incomplete (HDI) data are frequently encountered in big data-related applications for describing restricted observed interactions among large node sets. How to perform accurate and efficient representation learning on such HDI data is a hot yet thorny issue. A latent factor (LF) model has proven to be efficient in addressing it. However, the objective function of an LF model is nonconvex. Commonly adopted first-order methods cannot approach its second-order stationary point, thereby resulting in accuracy loss. On the other hand, traditional second-order methods are impractical for LF models since they suffer from high computational costs due to the required operations on the objective’s huge Hessian matrix. In order to address this issue, this study proposes a generalized Nesterov-accelerated second-order LF (GNSLF) model that integrates twofold conceptions: 1) acquiring proper second-order step efficiently by adopting a Hessian-vector algorithm and 2) embedding the second-order step into a generalized Nesterov’s acceleration (GNA) method for speeding up its linear search process. The analysis focuses on the local convergence of GNSLF’s nonconvex cost function rather than its global convergence; its local convergence properties are established with theoretical proofs. Experimental results on six HDI data cases demonstrate that GNSLF performs better than state-of-the-art LF models in accuracy for missing data estimation with high efficiency, i.e., a second-order model can be accelerated by incorporating GNA without accuracy loss.
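The reason a Hessian-vector algorithm avoids the cost of the full Hessian can be seen in a small sketch: the product Hv is approximated from two gradient evaluations, so the matrix H is never formed. The central-difference scheme, the function names, and the quadratic test objective below are assumptions for illustration and may differ from the Hessian-vector algorithm used in GNSLF.

```python
import numpy as np

def hessian_vector_product(grad, x, v, eps=1e-5):
    """Approximate H(x) @ v by a central difference of the gradient.

    Only two gradient calls are needed; the Hessian itself is never built.
    """
    return (grad(x + eps * v) - grad(x - eps * v)) / (2.0 * eps)

# Hypothetical test objective f(x) = 0.5 * x^T A x, whose Hessian is exactly A.
A = np.array([[2.0, 1.0], [1.0, 3.0]])

def grad_f(x):
    return A @ x

x = np.array([0.3, -0.7])
v = np.array([1.0, -1.0])
print(hessian_vector_product(grad_f, x, v), A @ v)
```

For a model with many parameters, each such product costs about as much as two gradient evaluations, whereas storing the Hessian scales quadratically with the parameter count.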

PaperID: 1230,

Authors: Liu Liu, Xuanqing Liu, Cho-Jui Hsieh, Dacheng Tao

Affiliations: Institute of Artificial Intelligence and the State Key Laboratory of Software Development Environment, Beihang University, Beijing, China; Amazon Web Services (AWS), Seattle, WA, USA; Department of Computer Science, University of California at Los Angeles, Los Angeles, CA, USA; School of Computer Science, Sydney AI Centre, Faculty of Engineering, The University of Sydney, Sydney, NSW, Australia

Title: Stochastic Optimization for Nonconvex Problem With Inexact Hessian Matrix, Gradient, and Function

Abstract:
Trust region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties for nonconvex optimization by concurrently computing function value, gradient, and Hessian matrix to obtain the next search direction and the adjusted parameters. Although stochastic approximations help largely reduce the computational cost, it is challenging to theoretically guarantee the convergence rate. In this article, we explore a family of stochastic TR (STR) and stochastic ARC (SARC) methods that can simultaneously provide inexact computations of the Hessian matrix, gradient, and function values. Our algorithms require much less propagation overhead per iteration than TR and ARC. We prove that the iteration complexity to achieve $\epsilon$-approximate second-order optimality is of the same order as that of the exact computations demonstrated in previous studies. In addition, the mild conditions on inexactness can be met by leveraging a random sampling technology in the finite-sum minimization problem. Numerical experiments with a nonconvex problem support these findings and demonstrate that, with the same or a similar number of iterations, our algorithms require less computational overhead per iteration than current second-order methods.

PaperID: 1231,

Authors: Shuangming Yang, Badong Chen

Affiliations: School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China

Title: Effective Surrogate Gradient Learning With High-Order Information Bottleneck for Spike-Based Machine Intelligence

Abstract:
Brain-inspired computing presents a promising approach to promote the rapid development of artificial general intelligence (AGI). As one of the most critical aspects, spiking neural networks (SNNs) have demonstrated advantages for AGI, such as low power consumption. Effective training of SNNs with high generalization ability, high robustness, and low power consumption simultaneously is a significantly challenging problem for the development and success of applications of spike-based machine intelligence. In this research, we present a novel and flexible learning framework termed high-order spike-based information bottleneck (HOSIB) leveraging the surrogate gradient technique. The presented HOSIB framework, including second-order and third-order formulations, i.e., second-order information bottleneck (SOIB) and third-order information bottleneck (TOIB), comprehensively explores the common latent architecture and the spike-based intrinsic information and discards the superfluous information in the data, which improves the generalization capability and robustness of SNN models. Specifically, HOSIB relies on the information bottleneck (IB) principle to promote the sparse spike-based information representation and flexibly balance its exploitation and loss. Extensive classification experiments are conducted to empirically show the promising generalization ability of HOSIB. Furthermore, we apply the SOIB and TOIB algorithms in deep spiking convolutional networks to demonstrate their improvement in robustness with various categories of noise. The experimental results show that the HOSIB framework, especially TOIB, can achieve better generalization ability, robustness, and power efficiency in comparison with the current representative studies.

PaperID: 1232,

Authors: Sheng Yu, Di-Hua Zhai, Yuyin Guan, Yuanqing Xia

Affiliations: School of Automation, Beijing Institute of Technology, Beijing, China; Beijing Building Materials Academy of Science Research, Beijing, China

Title: Category-Level 6-D Object Pose Estimation With Shape Deformation for Robotic Grasp Detection

Abstract:
Category-level 6-D object pose estimation plays a crucial role in achieving reliable robotic grasp detection. However, the disparity between synthetic and real datasets hinders the direct transfer of models trained on synthetic data to real-world scenarios, leading to ineffective results. Additionally, creating large-scale real datasets is a time-consuming and labor-intensive task. To overcome these challenges, we propose CatDeform, a novel category-level object pose estimation network trained on synthetic data but capable of delivering good performance on real datasets. In our approach, we introduce a transformer-based fusion module that enables the network to leverage multiple sources of information and enhance prediction accuracy through feature fusion. To ensure proper deformation of the prior point cloud to align with scene objects, we propose a transformer-based attention module that deforms the prior point cloud from both geometric and feature perspectives. Building upon CatDeform, we design a two-branch network for supervised learning, bridging the gap between synthetic and real datasets and achieving high-precision pose estimation in real-world scenes using predominantly synthetic data supplemented with a small amount of real data. To minimize reliance on large-scale real datasets, we train the network in a self-supervised manner by estimating object poses in real scenes based on the synthetic dataset without manual annotation. We conduct training and testing on CAMERA25 and REAL275 datasets, and our experimental results demonstrate that the proposed method outperforms state-of-the-art (SOTA) techniques in both self-supervised and supervised training paradigms. Finally, we apply CatDeform to object pose estimation and robotic grasp experiments in real-world scenarios, showcasing a higher grasp success rate.

PaperID: 1233,

Authors: Qinwei Fan, Shuai Zhao, Jacek M. Zurada, Tingwen Huang, Xiaolong Qin, Rui Zhang

Affiliations: School of Mathematics and Information Science, Guangzhou University, Guangzhou, China; School of Mathematics and Statistics, Xidian University, Xi’an, China; Department of Electrical and Computer Engineering, University of Louisville, Louisville, KY, USA; Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China; Department of Mathematics, Hangzhou Normal University, Hangzhou, China; Medical Big Data Research Center, Northwest University, Xi’an, China

Title: Bidirectional Multiscale Efficient Dilated Convolutional Recurrent Neural Network Improved by Swarm Intelligence Optimization

Abstract:
In recent years, bidirectional convolutional recurrent neural networks (RNNs) have made significant breakthroughs in addressing a wide range of challenging problems related to time series and prediction applications. However, the performance of the models is highly dependent on the hyperparameters chosen. Hence, we propose an automatic method for hyperparameter optimization and apply a bidirectional convolutional RNN based on the improved swarm intelligence optimization (sparrow search) to solve regression prediction problems. Specifically, a parallel multiscale dilated convolution (PMDC) module was designed to capture both local and global spatial correlations. This method utilizes convolution with different dilation rates to expand the receptive field without increasing the complexity of the model. Meanwhile, it integrates parallel multiscale structures to extract features at different scales and enhance the model’s understanding of the input data. Then, the bidirectional gated recurrent units (BGRUs) learn temporal information from the convolutional features. To address the limitations of empirical hyperparameter selection, such as slow training and low efficiency, a novel PMDC-BGRU model integrated with a pretrained sparrow search algorithm (SSA) was proposed for hyperparameter optimization. Finally, experiments on multiple datasets verified the superiority of the algorithm and explained the flexibility of intelligent optimization algorithms in solving model parameter optimization.
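
The claim that dilation widens the receptive field without adding parameters can be checked with simple arithmetic. The sketch below is illustrative only and is not the PMDC module itself: the function names, the kernel size of 3, and the 16-channel layer are assumptions chosen for the example.

```python
def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated 1-D convolution kernel."""
    return dilation * (kernel_size - 1) + 1


def parameter_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a 1-D convolution layer (bias omitted); dilation does not appear."""
    return kernel_size * in_channels * out_channels


# Hypothetical parallel branches with dilation rates 1, 2, and 4.
for dilation in (1, 2, 4):
    print(dilation, receptive_field(3, dilation), parameter_count(3, 16, 16))
```

Each branch keeps the same number of weights while the span it covers grows linearly with the dilation rate, which is why parallel branches with different rates can mix local and wider context at similar cost.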

PaperID: 1234,

Authors: Benke Gao, Hao Chen, Quan Liu, Hanqiang Deng, Jian Huang, Yan-Jun Liu

Affiliations: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan, China; College of Science, Liaoning University of Technology, Jinzhou, Liaoning, China

Title: CRL: An Efficient Autonomous Exploration Framework for Large-Scale Environments With Contrastive-Driven Reinforcement Learning

Abstract:
Autonomous exploration in large-scale environments is impeded by two critical challenges, namely, suboptimal viewpoint selection resulting from inadequate feature extraction and the continuously rising computational costs as the environment expands. Existing methods struggle to simultaneously tackle these dual challenges within cohesive frameworks. In response, we present an efficient autonomous exploration framework with contrastive-driven reinforcement learning. Inspired by human cognitive mechanisms that reinforce crucial information recognition through contrast, our study implements contrastive constraints on nodes of varying utility levels within high-dimensional feature spaces, achieving a decoupling of their latent representations. This capability empowers decision networks to explicitly capture key regional characteristics, thereby enhancing the precision of optimal viewpoint selection. Moreover, to mitigate the issues of backtracking and redundant exploration, we design specialized training rules that enforce effective action constraints, further enhancing viewpoint selection. Additionally, we propose a novel graph rarefaction algorithm to tackle computational costs, simplifying computational complexities while maintaining performance standards. Compared to the state-of-the-art (SOTA) approaches, our method achieves 6.7% shorter path lengths, while also demonstrating robust generalization capabilities through robotic experiments across multiple real-world scenarios.

PaperID: 1235,

Authors: Bing Wang, Tianjing Wang, Yong Tang, Yanhao Huang

Affiliations: Electric Power System Department, China Electric Power Research Institute, Beijing, China; Department of Electrical Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong

Title: Knowledge-GPT Guided Generalizable Reinforcement Learning for Intelligent Emergency Generator Tripping in Power System

Abstract:
Emergency control is essential for ensuring transient stability in power systems after faults. This study addresses the limitations in existing methods by proposing a knowledge-generative pretrained transformer (GPT)-guided generalizable reinforcement learning (RL) approach for intelligent emergency generator tripping. This approach incorporates general electrical principles and knowledge-GPT to assist deep reinforcement learning (DRL). The general electrical principles involve identifying severely disturbed generators and selecting appropriate control actions through dynamic probability. The knowledge-GPT model extracts insights from an expert strategy knowledge base, reshaping the DRL reward structure by comparing the DRL strategy with the knowledge-GPT outputs. This paradigm is designed to leverage electrical laws and domain expertise to guide the DRL training process, thereby enhancing both training efficiency and electrical consistency. To enhance generalization capability under topological changes, message passing neural networks (NNs) are integrated into the DRL architecture, effectively simulating power flow dynamics in transmission lines. The proposed method is validated through simulations on the IEEE 39-bus system and the Northeast power grid of China, demonstrating superior control effectiveness and adaptability compared to existing approaches, offering a more robust solution for emergency control in complex power systems.

PaperID: 1236,

Authors: Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu

Affiliations: Wangxuan Institute of Computer Technology, Peking University, Beijing, China; Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, China; Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong

Title: A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

Abstract:
With the significant development of large models in recent years, large vision–language models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared with traditional large language models (LLMs), LVLMs present great potential and challenges due to their closer proximity to the multiresource real-world applications and the complexity of multimodal processing. However, the vulnerability of LVLMs is relatively underexplored, posing potential security risks in the daily use of LVLM applications. In this article, we provide a comprehensive review of the various forms of existing LVLM attacks. Specifically, we first introduce the background of attacks targeting LVLMs, including the attack preliminary, attack challenges, and attack resources. Then, we systematically review the development of LVLM attack methods, such as adversarial attacks that manipulate model outputs, jailbreak attacks that exploit model vulnerabilities for unauthorized actions, prompt injection attacks that engineer the prompt type and pattern, and data poisoning that affects model training. Finally, we discuss promising future research directions in LVLM attacks. We believe that our survey provides insights into the current landscape of LVLM vulnerabilities, inspiring more researchers to explore and mitigate potential safety issues in LVLM developments.

PaperID: 1237,

Authors: Rui Wang, Shaocheng Jin, Zhenyu Cai, Ziheng Chen, Xiao-jun Wu, Josef Kittler

Affiliations: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China; Department of Information Engineering and Computer Science, University of Trento, Trento, Italy

Title: Learning a Better SPD Network for Signal Classification: A Riemannian Batch Normalization Method

Abstract:
Symmetric positive definite (SPD) matrices have been widely used as Riemannian feature descriptors in various scientific fields, due to their capacity to encode effective manifold-valued representations. Inspired by the architectural principles of Euclidean deep learning, the emerging SPD neural networks have achieved more robust signal classification. Among these advancements, Riemannian batch normalization (RBN) based on the affine-invariant Riemannian metric (AIRM) has emerged as a key technique for enhancing the learning capability of SPD-based networks. Nevertheless, the reliance on singular value decomposition (SVD) makes this metric relatively unstable for the computation of SPD matrices, especially for the ill-conditioned case. To address this limitation, we propose a novel RBN algorithm based on the recently introduced log-Cholesky metric (LCM), which leverages Cholesky decomposition. Unlike AIRM, the LCM offers enhanced numerical stability and allows for more efficient computation. Specifically, the LCM-based Riemannian operators such as Fréchet mean and parallel transport (PT) are much simpler than those of AIRM, and both have closed forms. Besides, since LCM is the pullback metric from the Cholesky manifold via Cholesky decomposition, the LCM-based RBN on the SPD manifold can be computed in the Cholesky manifold, further boosting the efficiency. Extensive experiments conducted on four benchmarking datasets certify the effectiveness of our proposed algorithm. The source code is now available at: https://github.com/jjscc/CBN.git.
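
For orientation, the closed-form Fréchet mean under the log-Cholesky metric is commonly written as below. The notation is an assumption and is not taken from this abstract: $L_i$ is the Cholesky factor of the SPD matrix $P_i$, $\lfloor L_i \rfloor$ is its strictly lower triangular part, and $\mathbb{D}(L_i)$ is its diagonal part, with $\exp$ and $\log$ applied to the diagonal entries.

```latex
P_i = L_i L_i^{\top}, \qquad L_i = \lfloor L_i \rfloor + \mathbb{D}(L_i), \\
\bar{L} = \frac{1}{n}\sum_{i=1}^{n} \lfloor L_i \rfloor
        + \exp\!\left(\frac{1}{n}\sum_{i=1}^{n} \log \mathbb{D}(L_i)\right), \qquad
\bar{P} = \bar{L}\,\bar{L}^{\top}
```

The off-diagonal entries are averaged arithmetically and the diagonal entries geometrically, so no iterative solver or eigendecomposition is needed, which is consistent with the efficiency argument above.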

PaperID: 1238,

Authors: Fuxiang Zhang, Junyou Li, Yi-Chen Li, Zongzhang Zhang, Yang Yu, Deheng Ye

Affiliations: National Key Laboratory for Novel Software Technology and School of Artificial Intelligence, Nanjing University, Nanjing, China; Tencent, Shenzhen, China

Title: Improving Sample Efficiency of Reinforcement Learning With Background Knowledge From Large Language Models

Abstract:
Low sample efficiency is an enduring challenge of reinforcement learning (RL). With the advent of versatile large language models (LLMs), recent works impart common-sense knowledge to accelerate policy learning for RL processes. However, we note that such guidance is often tailored for one specific task but loses generalizability. In this article, we introduce a framework that harnesses LLMs to extract background knowledge of an environment, which contains general understandings of the entire environment, making various downstream RL tasks benefit from one-time knowledge representation. We ground LLMs by feeding a few precollected experiences and requesting them to delineate background knowledge of the environment. Afterward, we represent the output knowledge as potential functions for potential-based reward shaping, which has a good property for maintaining policy optimality from task rewards. We instantiate three variants to prompt LLMs for background knowledge, including writing code, annotating preferences, and assigning goals. Our experiments show that these methods achieve significant sample efficiency improvements in a spectrum of downstream tasks from Minigrid and Crafter domains.
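
Potential-based reward shaping in its standard form adds the term gamma * Phi(s') - Phi(s) to the task reward. A minimal sketch follows; the grid-world potential (negative Manhattan distance to a goal cell) is a hypothetical stand-in for whatever potential an LLM would supply, and all names are illustrative.

```python
def potential(state, goal):
    """Hypothetical potential: negative Manhattan distance to the goal cell."""
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))


def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Task reward plus the standard potential-based shaping term."""
    return reward + gamma * phi_next - phi_s


goal = (3, 3)
state, next_state = (0, 0), (0, 1)  # one step toward the goal
bonus = shaped_reward(0.0, potential(state, goal), potential(next_state, goal), gamma=1.0)
print(bonus)
```

With gamma set to 1 the shaping terms telescope along a trajectory to Phi(s_T) - Phi(s_0), so they reward progress without changing which policy is optimal for the task reward.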

PaperID: 1239,

Authors: Hong-Bing Zeng, Zong-Jun Zhu, Shen-Ping Xiao, Xian-Ming Zhang

Affiliations: School of Transportation and Electrical Engineering, Hunan University of Technology, Zhuzhou, Hunan, China; School of Engineering, Swinburne University of Technology, Melbourne, VIC, Australia

Title: A Switched System Model for Exponential Stability and Dissipativity of Delayed Neural Networks

Abstract:
This article investigates the problems of exponential stability and dissipativity for neural networks with time-varying delays. To capture more information on the delay and its derivative in constructing Lyapunov–Krasovskii functionals (LKFs), the original delayed neural network (DNN) is modeled as a switching system with two modes, corresponding to cases where the delay derivative is positive or negative. This model provides extra freedom in constructing a proper LKF, allowing for the selection of different Lyapunov matrices in each mode. By applying the average dwell time (ADT) technique, several criteria for exponential stability and exponential dissipativity are obtained for DNNs. Two extensively studied benchmark examples and a quadruple-tank process control system are provided to demonstrate the superiority of the proposed criteria over some existing methods and to verify the practical applicability of the approach.
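
For reference, the average dwell time condition is usually stated as follows. The symbols are assumptions and do not come from this abstract: $N_\sigma(t_0, t)$ counts the switches of the switching signal $\sigma$ on the interval $(t_0, t)$, $N_0 \ge 0$ is the chatter bound, and $\tau_a > 0$ is the average dwell time.

```latex
N_\sigma(t_0, t) \;\le\; N_0 + \frac{t - t_0}{\tau_a}, \qquad \forall\, t \ge t_0 \ge 0
```

In the two-mode model described above, a switch occurs whenever the delay derivative changes sign, so a bound of this kind limits how often that may happen on average.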

PaperID: 1240,

Authors: Hyeong-Gun Joo, Songnam Hong, Dong-Joon Shin

Affiliations: Department of Electronic Engineering, Hanyang University, Seoul, South Korea

Title: FedLSC: Improving Communication Efficiency and Robustness in Federated Learning With Stragglers and Adversaries

Abstract:
Despite significant progress in federated learning (FL), persistent challenges, such as stragglers, adversaries, and communication costs, remain. To address these issues, we propose FedLSC, a novel FL framework that leverages layer-selected correlation (LSC) to enhance both robustness and efficiency. In contrast to the existing methods, FedLSC does not rely on public data during model training, making it more practical and resilient in real-world scenarios. FedLSC introduces three key innovations: 1) preprocessing of layer selection (LS), which identifies significant layers to reduce communication costs and performance degradation; 2) local updates using LS-based scaled sign-stochastic gradient descent (SSS), introducing a layer-specific scaling mechanism to mitigate performance loss from quantization and significantly reduce communication costs; and 3) model aggregation via LSC-based schemes, which enhances robustness by processing only the significant layers and mitigating the impact of stragglers and adversaries. Furthermore, integrating the SSS scheme into FedLSC reduces communication costs to as little as 0.01% of those in state-of-the-art (SOTA) methods while maintaining performance. Evaluations conducted across various FL scenarios show that FedLSC effectively supports robust performance and efficiency, even in bandwidth-constrained environments, thereby confirming its practicality in modern FL applications.
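
To make the idea of sign quantization with a per-layer scale concrete, here is a generic sketch. It is not the SSS scheme of the paper: using the mean absolute gradient as the scale is an assumption, and the function names are hypothetical.

```python
def scaled_sign(layer_grad):
    """Compress one layer's gradient to a single scale plus one sign per entry."""
    # Assumed scale: mean absolute value of the layer's gradient entries.
    scale = sum(abs(g) for g in layer_grad) / len(layer_grad)
    signs = [1 if g >= 0 else -1 for g in layer_grad]
    return scale, signs


def decompress(scale, signs):
    """Rebuild an approximate gradient from the scale and the signs."""
    return [scale * s for s in signs]


scale, signs = scaled_sign([0.5, -1.5, 1.0])
print(scale, signs, decompress(scale, signs))
```

Each entry then costs one bit instead of a full floating-point value, plus one scalar per layer, which is where the communication savings of this family of methods come from.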

PaperID: 1241,

Authors: Yifan Xu, Chao Zhang, Hanqi Jiang, Xiaoyan Wang, Ruifei Ma, Yiwei Li, Zihao Wu, Zeju Li, Xiangde Liu

Affiliations: School of Computer Science and Engineering, Beihang University, Beijing, China; Beijing Digital Native Digital City Research Center, Beijing, China; School of Computing, University of Georgia, Athens, GA, USA; School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China; Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China

Title: Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models

Abstract:
Advancements in foundation models have made it possible to conduct applications in various downstream tasks. Especially, the new era has witnessed a remarkable capability to extend large language models (LLMs) for tackling tasks of 3-D scene understanding. Current methods rely heavily on 3-D point clouds, but the 3-D point cloud reconstruction of an indoor scene often results in information loss. Some textureless planes or repetitive patterns are prone to omission and manifest as voids within the reconstructed 3-D point clouds. Besides, objects with complex structures tend to introduce distortion of details caused by misalignments between the captured images and the dense reconstructed point clouds. The 2-D multiview images present visual consistency with 3-D point clouds and provide more detailed representations of scene components, which can naturally compensate for these deficiencies. Based on these insights, we propose Argus, a novel 3-D multimodal framework that leverages multiview images for enhanced 3-D scene understanding with LLMs. In general, Argus can be treated as a 3-D large multimodal foundation model (3D-LMM) since it takes various modalities as input (text instructions, 2-D multiview images, and 3-D point clouds) and expands the capability of LLMs to tackle 3-D tasks. Argus involves fusing and integrating multiview images and camera poses into view-as-scene features, which interact with the 3-D features to create comprehensive and detailed 3-D-aware scene embeddings. Our approach compensates for the information loss while reconstructing 3-D point clouds and helps LLMs better understand the 3-D world. Extensive experiments demonstrate that our method outperforms existing 3D-LMMs in various downstream tasks.

PaperID: 1242,

Authors: Daizong Liu, Yang Liu, Wencan Huang, Wei Hu

Affiliations: Wangxuan Institute of Computer Technology, Peking University, Beijing, China

Title: A Survey on Text-Guided 3-D Visual Grounding: Elements, Recent Advances, and Future Directions

Abstract:
Text-guided 3-D visual grounding (T-3DVG), which aims to locate a specific object that semantically corresponds to a language query from a complicated 3-D scene, has drawn increasing attention in the 3-D research community over the past few years. Compared to 2-D visual grounding, this task presents great potential and challenges due to its closer proximity to the real world, the complexity of data collection, and 3-D point cloud source processing. In this survey, we attempt to provide a comprehensive overview of the T-3DVG progress, including its fundamental elements, recent research advances, and future research directions. To the best of our knowledge, this is the first systematic survey on the T-3DVG task. Specifically, we first provide a general structure of the T-3DVG pipeline with detailed components in a tutorial style, presenting a complete background overview. Then, we summarize the existing T-3DVG approaches into different categories and analyze their strengths and weaknesses. We also present the benchmark datasets and evaluation metrics to assess their performances. Finally, we discuss the potential limitations of existing T-3DVG and share some insights on several promising research directions.

PaperID: 1243,

Authors: Cunbo Li, Zehong Cao, Yue Pan, Pengcheng Zhu, Peiyang Li, Fali Li, Huafu Chen, Bao-Liang Lu, Feng Wan, Dezhong Yao, Peng Xu

Affiliations: Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; STEM, Mawson Lakes Campus, University of South Australia, Adelaide, SA, Australia; School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, China; Department of Computer Science and Engineering, Center for Brain-Like Computing and Machine Intelligence, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Brain Science and Technology Research Center, Qing Yuan Research Institute, Shanghai Jiao Tong University, Shanghai, China; Department of Electrical and Computer Engineering, Faculty of Science and Technology, and the Center for Cognitive and Brain Sciences and the Centre for Artificial Intelligence and Robotics, Institute of Collaborative Innovation, University of Macau, Macau, China

Title: EEG-Based Emotion Monitoring and Regulation System by Learning the Discriminative Brain Network Manifold

Abstract:
Emotion recognition based on electroencephalogram (EEG) is fundamentally associated with human-like intelligence systems. However, due to the noise-sensitive characteristics of EEGs and the individual variability of emotions, it is very challenging to extract inherent emotion-dependent patterns from emotional EEG signals. In this work, we propose an L1-norm space defined discriminative brain network manifold learning model (L1-SGL), in which the EEG noise outliers can be effectively separated and the pseudolabeled samples caused by subjective feelings can be automatically corrected. Offline experimental results consistently indicate that the L1-SGL can effectively suppress the influence of noise and achieve superior performance over other existing methods in EEG emotion recognition. Besides, benefiting from the time efficiency of the L1-SGL, an online emotion monitoring and regulation system is further implemented in this work. Online emotion decoding experimental results (86.30%) of 25 participants prove that the L1-SGL can effectively satisfy the real-time requirements of online emotional monitoring applications, and the significant negative emotion regulation experimental results (p < 0.001) further confirm the feasibility and effectiveness of the L1-SGL model in real-time emotion regulation and interactive applications. Overall, the L1-SGL provides a promising solution for the real-time online affective brain-computer interfaces (aBCIs) and the intelligent clinical closed-loop treatments.

PaperID: 1244,

Authors: Jonggwon Park, Soobum Kim, Byungmu Yoon, Jihun Hyun, Kyoyun Choi

Affiliations: DEEPNOID Inc., Seoul, South Korea

Title: M4CXR: Exploring Multitask Potentials of Multimodal Large Language Models for Chest X-Ray Interpretation

Abstract:
The rapid evolution of artificial intelligence, especially in large language models (LLMs), has significantly impacted various domains, including healthcare. In chest X-ray (CXR) analysis, previous studies have employed LLMs, but with limitations: either underutilizing the LLMs’ capability for multitask learning or lacking clinical accuracy. This article presents M4CXR, a multimodal LLM designed to enhance CXR interpretation. The model is trained on a visual instruction-following dataset that integrates various task-specific datasets in a conversational format. As a result, the model supports multiple tasks such as medical report generation (MRG), visual grounding, and visual question answering (VQA). M4CXR achieves state-of-the-art clinical accuracy in MRG by employing a chain-of-thought (CoT) prompting strategy, in which it identifies findings in CXR images and subsequently generates corresponding reports. The model is adaptable to various MRG scenarios depending on the available inputs, such as single-image, multiimage, and multistudy contexts. In addition to MRG, M4CXR performs visual grounding at a level comparable to specialized models and demonstrates outstanding performance in VQA. Both quantitative and qualitative assessments reveal M4CXR’s versatility in MRG, visual grounding, and VQA, while consistently maintaining clinical accuracy.

PaperID: 1245,

Authors: Jingxin Mao, Yu Yang, Zhiwei Wei, Yanlong Bi, Rongqing Zhang

Affiliations: School of Computer Science and Technology, Tongji University, Shanghai, China; Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA; Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai, China; Department of Ophthalmology, Tongji Hospital, School of Medicine, the Center for Vision Science and Translational Research and the Tongji Eye Institute, Tongji University, Shanghai, China

Title: D2Fed: Federated Semi-Supervised Learning With Dual-Role Additive Local Training and Dual-Perspective Global Aggregation

Abstract:
Federated semi-supervised learning (FSSL) has recently emerged as a promising approach for enhancing the performance of federated learning (FL) using ubiquitous unlabeled data. However, this approach encounters challenges when learning a global model using both fully labeled and fully unlabeled clients. Previous works overlook the dissimilarities between labeled and unlabeled clients, predominantly using shared parameters for local training across these two types of clients, thereby inducing intertask interference during local training. Moreover, these works typically adopt a single-perspective aggregation strategy, primarily focusing on data-volume-aware aggregation (i.e., FedAvg), leading to a lack of comprehensive consideration in model aggregation. In this article, we propose a novel FSSL method termed D²Fed, which addresses these issues by rethinking the roles of labeled clients and unlabeled ones to mitigate intertask interference during local training and by integrating client-type-aware with data-volume-aware to provide a more comprehensive perspective for model aggregation. Specifically, in local training, our proposed D²Fed distinguishes between the primary and accessory roles of labeled and unlabeled clients, respectively, performing dual-role additive local training (DALT) accordingly. In global aggregation, D²Fed uses a dual-perspective global aggregation (DGA) strategy, transitioning from data-volume-aware aggregation to client-type-aware aggregation. The proposed method simultaneously improves both local training and global model aggregation for FSSL without compromising privacy. We demonstrate the effectiveness and robustness of the proposed method through extensive experiments and elaborate ablation studies conducted on the CIFAR-10/100, SVHN, FMNIST, and STL-10 datasets. Experimental results show that D²Fed outperforms state-of-the-arts on five datasets under diverse data settings.
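
The data-volume-aware baseline mentioned above (FedAvg) weights each client's parameters by its sample count. The sketch below shows only that baseline, not the DGA strategy of the paper; the flat-list parameter representation and the function name are assumptions made for brevity.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighted by each client's data volume."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients: one with 1 sample, one with 3 samples.
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_params)
```

Because the weights depend only on data volume, a labeled and an unlabeled client with the same number of samples are treated identically, which is the single-perspective limitation the abstract points to.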

PaperID: 1246,

Authors: Xue-juan Han, Zhong Qu, Shi-Yan Wang, Shu-Fang Xia

Affiliations: School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China

Title: Object Detection With Physical Prior and AWConv in Foggy Weather for Traffic Scenes

Abstract:
Despite significant advances in object detection methods for traffic scenes, object detection under adverse weather conditions is still a challenging task. Especially in foggy weather, the presence of fog reduces visibility, thus weakening the feature information of traffic objects in images, and foggy weather occurs frequently. To cope with this problem, we propose an object detection method with physical prior and adaptive weight convolution (AWConv), and evaluate it on datasets such as Foggy Cityscapes and RTTS. We apply gamma correction in the improved defogging algorithm to enhance the key regions in the image, thus improving the separability of the features. Meanwhile, the feature extraction and representation ability of the model is enhanced by an adaptive weighting mechanism, which in turn improves the model detection performance. In addition, we explore the relationship between image quality and detection accuracy and observe that they are not linearly positively correlated. Due to the complexity of traffic objects in foggy weather, we conduct experiments on Foggy Cityscapes (synthetic fog), RTTS (real-world multiple adverse weather), Cityscapes (normal weather), and an extended dataset (different fog concentrations) to validate the model’s effectiveness, generalization ability, and robustness. Experimental results show that the small model alone improves mean average precision (mAP) by 1.4% with only 24.6 giga floating-point operations (GFLOPs) on the Foggy Cityscapes dataset, and reduces GFLOPs by 3.8 and improves recall (R) by 1.1% on the RTTS dataset.
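
Gamma correction itself is a pointwise power law on normalized intensities. The sketch below shows only that generic operation; how the paper applies it inside its defogging algorithm is not specified in the abstract, and the function name and sample values are illustrative.

```python
def gamma_correct(pixels, gamma):
    """Apply out = in ** gamma to intensities normalized to the range [0, 1]."""
    return [p ** gamma for p in pixels]


# A gamma below 1 brightens dark regions; a gamma above 1 darkens them.
brightened = gamma_correct([0.0, 0.25, 1.0], 0.5)
darkened = gamma_correct([0.0, 0.5, 1.0], 2.0)
print(brightened, darkened)
```

The endpoints 0 and 1 are fixed by the mapping, so the adjustment redistributes contrast among mid-range intensities rather than clipping the image.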

PaperID: 1247,

Authors: Jiqing Li, Zhendong Yin, Dasen Li, Yanlong Zhao

Affiliations: School of Electronic and Information Engineering, Harbin Institute of Technology, Harbin, China

Title: Utilizing TOP2 Class for Hybrid Decision-Making to Enhance TOP1 Accuracy of Ensemble Models

Abstract:
In the domain of deep learning for visual tasks, ensemble models combine several less accurate models to form a more precise composite model, improving overall performance. Traditionally, majority voting and average probabilities have been the main decision-making techniques in ensemble learning, focusing only on the TOP1 Class of base models, hence overlooking other significant information. This article introduces a new algorithm, TOP2 hybrid decision (TOP2 HD), which enhances the TOP1 accuracy of the ensemble model. TOP2 HD categorizes base models into hierarchies based on their TOP1 Class and uses the TOP2 Class for ranking, leading to better performance. Extensive experiments across various models and datasets demonstrate that TOP2 HD not only surpasses traditional ensemble methods, such as majority voting, average probabilities, and stacking, but also exceeds many of the latest ensemble strategies in the image domain. In addition, our experiments revealed a functional relationship between the test accuracy of the ensemble model and the number of base models. This enables us to predict the upper limit of the ensemble model’s performance using only a fraction of the models, providing a crucial reference for the performance after the deployment of the ensemble model.
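
The two traditional decision rules named above can disagree on the same set of base-model outputs. The sketch below implements only those baselines, not TOP2 HD, and the three probability vectors are made-up values for illustration.

```python
from collections import Counter


def majority_vote(probs):
    """Pick the class that is the TOP1 prediction of the most base models."""
    top1 = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return Counter(top1).most_common(1)[0][0]


def average_probabilities(probs):
    """Pick the class with the highest mean probability across base models."""
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Hypothetical outputs of three base models on a two-class problem.
outputs = [[0.6, 0.4], [0.6, 0.4], [0.1, 0.9]]
print(majority_vote(outputs), average_probabilities(outputs))
```

Here two weakly confident models outvote one strongly confident model under majority voting, while averaging favors the confident one, which illustrates how information beyond the TOP1 class can change the ensemble decision.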

PaperID: 1248,

Authors: Jitendra Kumar, Deepika Saxena, Kishu Gupta, Satyam Kumar, Ashutosh Kumar Singh

Affiliations: Department of Mathematics, Bioinformatics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, India; School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Japan; Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan; Carelon Global Solutions, National Institute of Technology Tiruchirappalli, Bengaluru, Karnataka, India; Department of Computer Science, VIZJA University, Warsaw, Poland

Title: A Comprehensively Adaptive Architectural Optimization-Ingrained Quantum Neural Network Model for Cloud Workloads Prediction

Abstract:
Accurate workload prediction and advanced resource reservation are crucial for managing dynamic cloud services. Traditional neural networks and deep learning models frequently encounter challenges with diverse, high-dimensional workloads, especially during sudden resource demand changes, leading to inefficiencies. This issue arises from their limited optimization during training, relying only on parametric (interconnection weights) adjustments using conventional algorithms. To address this issue, this work proposes a novel comprehensively adaptive architectural optimization-based variable quantum neural network (CA-QNN), which combines the efficiency of quantum computing with complete structural and qubit vector parametric learning. The model converts workload data into qubits, processed through qubit neurons with controlled-NOT-gated activation functions for intuitive pattern recognition. In addition, a comprehensive architecture optimization algorithm for networks is introduced to facilitate the learning and propagation of the structure and parametric values in variable-sized quantum neural networks (VQNNs). This algorithm incorporates quantum adaptive modulation (QAM) and size-adaptive recombination during the training process. The performance of the CA-QNN model is thoroughly investigated against seven state-of-the-art methods across four benchmark datasets of heterogeneous cloud workloads. The proposed model demonstrates superior prediction accuracy, reducing prediction errors by up to 93.40% and 91.27% compared to existing deep learning and QNN-based approaches, respectively.

PaperID: 1249,

Authors: Ruoheng Wang, Xiaowen Bi, Siqi Bu, Zhixian Tang

Affiliations: Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong, China; Department of Statistics and Data Science, Guangdong Provincial/Zhuhai Key Laboratory of IRADS, Beijing Normal-Hong Kong Baptist University, Zhuhai, China; Department of Electrical and Electronic Engineering, Shenzhen Research Institute, the Research Center for Grid Modernisation, the Research Institute for Smart Energy, the Policy Research Center for Innovation and Technology, the International Center of Urban Energy Nexus, and the Center for Advances in Reliability and Safety, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

Title: Deep Reinforcement Learning Approach for Dynamic Distribution Network Reconfiguration Based on Sequential Masking

Abstract:
Dynamic distribution network reconfiguration (DDNR) is a widely used technique for the secure and economic operation of power distribution networks (PDNs), especially in the presence of high-penetration renewable energy sources (RESs). DDNR is realized by controlling the on/off status of remotely controlled switches (RCSs) installed on power lines in PDNs to optimize power flows. Thanks to the enhanced data availability of PDNs, data-driven solutions to DDNR, such as deep reinforcement learning (DRL), have gained growing attention recently. However, DDNR solves a sequence of combinatorial problems featuring a vast and sparse action space incurred by a so-called “radiality constraint,” which is highly challenging for DRL methods to handle. Existing DRL methods either do not scale to large problems or potentially restrict optimality. Hence, we propose a sequential masking strategy to decompose the complex action space of DDNR into a sequence of maskable sub-action spaces. A gated recurrent unit (GRU)-based agent and an adapted soft actor critic (SAC) algorithm are designed accordingly, producing a data-efficient, safety-guaranteed, and scalable DRL solution to the DDNR problem. Comprehensive comparisons with existing data-driven methods and model-based benchmarks are conducted via various case studies, demonstrating the advantages of the proposed method in both algorithmic performance and scalability.
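
The idea of splitting one large combinatorial action into a sequence of masked sub-actions can be illustrated with a small sketch. It assumes a simplified setting that the abstract does not spell out: every line carries a switch, the network starts fully closed and connected, and one line is opened per step until the network is a spanning tree (radial). A line is masked when opening it would disconnect a node. The `scores` list stands in for the agent's preferences; all names are hypothetical, and this is not the authors' GRU/SAC agent.

```python
from collections import deque

def is_connected(n_nodes, edges):
    """Breadth-first search from node 0 over the given closed lines."""
    adjacency = {node: [] for node in range(n_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n_nodes

def action_mask(n_nodes, lines, closed):
    """mask[i] is True if line i is closed and can be opened without islanding a node."""
    mask = []
    for i in range(len(lines)):
        if i not in closed:
            mask.append(False)
            continue
        remaining = [lines[j] for j in closed if j != i]
        mask.append(is_connected(n_nodes, remaining))
    return mask

def sequential_reconfigure(n_nodes, lines, scores):
    """Open one line per step, always choosing the best-scored unmasked line."""
    closed = set(range(len(lines)))
    opened = []
    while len(closed) > n_nodes - 1:  # a radial (tree) network has n_nodes - 1 lines
        mask = action_mask(n_nodes, lines, closed)
        candidates = [i for i, allowed in enumerate(mask) if allowed]
        best = max(candidates, key=lambda i: scores[i])
        closed.remove(best)
        opened.append(best)
    return sorted(closed), opened

# Hypothetical 4-bus network: a square with one diagonal.
lines = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
scores = [0.1, 0.9, 0.2, 0.3, 0.8]
closed, opened = sequential_reconfigure(4, lines, scores)
print(closed, opened)
```

Because the mask is recomputed after every sub-action, each intermediate choice is feasible by construction, so the final topology is radial without enumerating all switch combinations.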

PaperID: 1250,

Authors: Biswajit Sadhu, Trijit Sadhu, S. Anand

Affiliations: Health Physics Division, Health Safety and Environment Group, Bhabha Atomic Research Center, Mumbai, India; Birla Institute of Technology and Science, Pilani, India

Title: RadDQN: A Deep Q Learning-Based Architecture for Finding Time-Efficient Minimum Radiation Exposure Pathway

Abstract:
Recent advances in deep reinforcement learning (DRL) have expanded its use in various automation sectors, including the nuclear industry. While DRL shows promise for optimizing radiation exposure, the development of radiation-aware autonomous unmanned aerial vehicles (UAVs) is hindered by inefficient reward functions and exploration strategies. In this article, we introduce a radiation-aware deep Q-learning network (RadDQN), designed to provide time-efficient, minimum radiation-exposure pathways in radiation zones. RadDQN operates on a radiation-sensitive reward function considering surrounding radiation intensity through the inverse square law and prioritizes reaching the final destination. Departing from the traditional ε-greedy algorithm, RadDQN implements unique exploration strategies that guide the agent to transform random actions into model-directed ones if transitioning to a future state projects higher radiation exposure compared to its current state. This approach ensures minimal radiation exposure while efficiently progressing toward the goal. We validate RadDQN’s accuracy against a grid-based deterministic method. Our results demonstrate that the formulated reward function and exploration strategy effectively manage various radiation field distributions. In addition, RadDQN shows superior convergence rates and higher training stability compared to the baseline model, indicating its effectiveness in optimizing radiation-aware UAV navigation and potential for real-world applications.
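
A minimal sketch of the two ingredients described above, under stated assumptions: point sources on a 2-D grid, exposure computed with the inverse square law, a reward that penalizes exposure and distance to the goal, and an exploration rule that replaces a random action with the greedy one when the random move would raise exposure. The weights, the goal bonus, and all function names are hypothetical; the abstract does not give the exact reward or exploration formulas.

```python
import math
import random

def radiation_intensity(position, sources):
    """Sum strength / distance^2 over all point sources (inverse square law)."""
    total = 0.0
    for sx, sy, strength in sources:
        d2 = (position[0] - sx) ** 2 + (position[1] - sy) ** 2
        total += strength / max(d2, 1e-12)  # guard against division by zero
    return total

def reward(position, goal, sources, radiation_weight=1.0, goal_weight=0.1, goal_bonus=10.0):
    """Assumed reward shape: bonus at the goal, otherwise penalize exposure and distance."""
    if position == goal:
        return goal_bonus
    exposure = radiation_intensity(position, sources)
    return -radiation_weight * exposure - goal_weight * math.dist(position, goal)

def choose_action(position, actions, q_values, sources, epsilon, rng):
    """Epsilon-greedy variant: a random move that raises exposure is replaced by the greedy move."""
    greedy = max(range(len(actions)), key=lambda i: q_values[i])
    if rng.random() >= epsilon:
        return greedy
    candidate = rng.randrange(len(actions))
    dx, dy = actions[candidate]
    next_position = (position[0] + dx, position[1] + dy)
    if radiation_intensity(next_position, sources) > radiation_intensity(position, sources):
        return greedy
    return candidate

sources = [(3, 0, 9.0)]        # one source of strength 9 at (3, 0)
actions = [(1, 0), (-1, 0)]    # step toward or away from the source
print(radiation_intensity((0, 0), sources))
print(choose_action((0, 0), actions, [0.0, 1.0], sources, 1.0, random.Random(0)))
```

With `epsilon = 1.0` every step is exploratory, yet the agent at `(0, 0)` never steps toward the source here, because that random move is overridden by the model-directed one.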

PaperID: 1251,

Authors: Tianyi Zhang, Zhiling Yan, Chunhui Li, Nan Ying, Yanli Lei, Shangqing Lyu, Yunlu Feng, Yu Zhao, Guanglei Zhang

Affiliations: Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China; School of Biological Sciences, Nanyang Technological University, Jurong West, Singapore; School of Artificial Intelligence, Nanjing University, Nanjing, China; PuzzleLogic Pte Ltd., Singapore, Singapore; Department of Gastroenterology, Peking Union Medical College Hospital, Beijing, China; Department of Pathology, Peking Union Medical College Hospital, Beijing, China

Title: CellMix: A General Instance Relationship-Based Method for Data Augmentation Toward Pathology Image Classification

Abstract:
In pathology image analysis, obtaining and maintaining high-quality annotated samples is an extremely labor-intensive task. To overcome this challenge, mixing-based methods have introduced new relationships to traditional preprocessing data augmentation techniques. Nonetheless, these methods fail to fully consider the unique features of pathology images, such as local specificity, global distribution, and inner/outer sample instance relationships. To better comprehend these characteristics and create valuable pseudosamples, we propose the CellMix framework, which employs a novel distribution-oriented in-place shuffle approach. The images are divided into patches based on the granularity of pathology instances, and the patches are in-place shuffled within the same batch. Thus, the locational relationships among instances can be effectively preserved while new relationships can be further introduced. Moreover, inspired by curriculum learning (CL), a loss-driven strategy is designed to control the relationship augmentation. This strategy enables the model to adaptively explore the instances at multiple scales and efficiently handle distribution-related noise under various difficulties. Our experiments in pathology image classification tasks demonstrate state-of-the-art (SOTA) performance on seven distinct datasets. This innovative instance relationship-centered method sheds light on general data augmentation for pathology image classification. The associated code is available at: https://github.com/sagizty/CellMix.
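
The in-place shuffle can be sketched as follows, reading "in-place" to mean that a patch keeps its grid position and is exchanged only with patches at the same position in other images of the batch. That reading, the fixed `shuffle_ratio` (which the paper drives with a loss-based strategy), and all names are assumptions made for illustration; the linked repository is the authoritative implementation.

```python
import random

def inplace_shuffle(batch, shuffle_ratio, rng):
    """batch[i][r][c] is the patch at grid cell (r, c) of image i.

    A fraction of grid cells is selected; at each selected cell the patches
    are permuted across the images of the batch, so every patch keeps its location.
    """
    n_images = len(batch)
    rows, cols = len(batch[0]), len(batch[0][0])
    mixed = [[row[:] for row in image] for image in batch]
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    n_shuffle = round(shuffle_ratio * len(cells))
    for r, c in rng.sample(cells, n_shuffle):
        order = list(range(n_images))
        rng.shuffle(order)
        for target, source in enumerate(order):
            mixed[target][r][c] = batch[source][r][c]
    return mixed

# Each toy "patch" records (image index, row, column) so its origin stays visible.
batch = [[[(i, r, c) for c in range(3)] for r in range(2)] for i in range(4)]
mixed = inplace_shuffle(batch, 0.5, random.Random(0))
print(mixed[0])
```

Every pseudosample produced this way still has a patch from row `r`, column `c` at row `r`, column `c`, which is how locational relationships are kept while new cross-image relationships are introduced.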

PaperID: 1252,

Authors: Dongyue Guo, Shiyu Zhang, Jianwei Zhang, Bo Yang, Yi Lin

Affiliations: College of Computer Science, Sichuan University, Chengdu, China

Title: Exploring Contextual Knowledge-Enhanced Speech Recognition in Air Traffic Control Communication: A Comparative Study

Abstract:
Accurate recognition of named entities from spoken instructions remains a significant challenge for automatic speech recognition (ASR) techniques in air traffic control (ATC), which limits the reliability of ASR-based applications. A promising solution to overcome this challenge is to integrate prior contextual knowledge into ASR since it contains rich named entities used in ATC communications. Although existing studies have investigated ATC-related contextual ASR techniques, there is a lack of benchmarks to evaluate the advantages of different approaches. In this article, a comprehensive comparative study is presented to explore effective contextual ASR approaches for the ATC domain. Specifically, several typical contextual ASR approaches are introduced in ATC to conduct a comprehensive comparison. Moreover, a novel contextual ASR model, denoted CATCNet, is presented specifically to address the domain-specific problems in ATC, such as limited resources, fast speech, and volatile noise. Several evaluation metrics are proposed to validate the performance of the compared approaches based on the practical requirements of ATC efforts. Extensive experiments are conducted across two real-world ATC speech corpora to build the benchmark. The experimental results demonstrate that integrating context knowledge is effective in improving the recognition performance of named entities. Crucially, the proposed CATCNet outperforms the other baseline models, with all of its technical improvements confirmed, achieving 80.0% and 86.54% instruction recognition accuracy (IRA) on the ATCSpeech and C-ATCSpeech corpora, respectively. It is believed that this work not only overcomes the bottleneck of ASR performance in the ATC domain, but also provides an applicable solution for ATC-related ASR applications.

PaperID: 1253,

Authors: Chen Zhang, Guorong Li, Yuankai Qi, Hanhua Ye, Laiyun Qing, Ming-Hsuan Yang, Qingming Huang

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Computer Science and Technology, Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing, China; School of Computing, Macquarie University, Sydney, NSW, Australia; Department of Electrical Engineering and Computer Science, University of California at Merced, Merced, CA, USA

Title: Dynamic Erasing Network With Adaptive Temporal Modeling for Weakly Supervised Video Anomaly Detection

Abstract:
Weakly supervised video anomaly detection aims to learn a detection model using only video-level labeled data. Prior studies ignore the complexity or duration of anomalies present in abnormal videos during temporal modeling. Moreover, existing works usually detect only the most abnormal segments, potentially overlooking the completeness of anomalies. To address these limitations, we propose a dynamic erasing network (DE-Net) for weakly supervised video anomaly detection, which learns video-specific temporal features via adaptive temporal modeling (ATM). Specifically, to handle duration variations of abnormal events, we propose an ATM module capable of adaptively selecting and aggregating the most appropriate K temporal scale features for each video. Then, we design a dynamic erasing (DE) strategy that dynamically assesses the completeness of the detected anomalies and erases prominent abnormal segments to encourage the model to discover subtler abnormal segments. The proposed method achieves favorable performance compared to several state-of-the-art approaches on the widely used XD-Violence, TAD, and UCF-Crime datasets.

PaperID: 1254,

Authors: Zhipeng Huang, Jianhao Ding, Zhiyu Pan, Haoran Li, Ying Fang, Zhaofei Yu, Jian K. Liu

Affiliations: Digital Fujian Internet-of-Things Laboratory of Environmental Monitoring, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, Fujian, China; School of Computer Science and the State Key Laboratory of Multimedia Information Processing, Peking University, Beijing, China; Institute for Artificial Intelligence and the School of Computer Science, Peking University, Beijing, China; Shaanxi Key Laboratory of Intelligent Human-Computer Interaction and Wearable Technology, Guangzhou Institute of Technology, Xidian University, Guangzhou, China; Fujian Provincial Key Laboratory of Statistics and Artificial Intelligence, Fujian Normal University, Fuzhou, Fujian, China; School of Computer Science, University of Birmingham, Birmingham, U.K.

Title: Converting High-Performance and Low-Latency SNNs Through Explicit Modeling of Residual Error in ANNs

Abstract:
Spiking neural networks (SNNs) have garnered interest due to their energy efficiency and superior effectiveness on neuromorphic chips compared with traditional artificial neural networks (ANNs). One of the mainstream approaches to implementing deep SNNs is the ANN–SNN conversion, which integrates the efficient training strategy of ANNs with the energy-saving potential and fast inference capability of SNNs. However, under extremely low-latency conditions, the existing conversion theory suggests that the problem of SNNs’ neurons firing more or fewer spikes than expected within each layer, i.e., residual error, leads to a performance gap in the converted SNNs compared with the original ANNs. This severely limits the possibility of the practical application of SNNs on delay-sensitive edge devices. Existing conversion methods addressing this problem usually involve modifying the state of the conversion spiking neurons. However, these methods do not consider their adaptability and compatibility with neuromorphic chips. We propose a new approach based on explicit modeling of residual errors as additive noise. The noise is incorporated into the activation function of the source ANN, effectively reducing the impact of residual error on SNN performance. Our experiments on the CIFAR10/100 and Tiny-ImageNet datasets verify that our approach exceeds the prevailing ANN–SNN conversion methods and directly trained SNNs in terms of accuracy and the required time steps. Overall, our method provides new ideas for improving SNN performance under ultralow-latency conditions and is expected to promote the further development of practical neuromorphic hardware applications. The code for our NQ framework is available at https://github.com/hzp2022/ANN2SNN_NQ.

PaperID: 1255,

Authors: Heshan Wang, Zhepeng Wang, Mingyuan Yu, Jing Liang, Jinzhu Peng, Yaonan Wang

Affiliations: School of Electronic and Information Engineering, Zhengzhou University, Zhengzhou, China

Title: Grouped Vector Autoregression Reservoir Computing Based on Randomly Distributed Embedding for Multistep-Ahead Prediction

Abstract:
As an efficient recurrent neural network (RNN), reservoir computing (RC) has found various applications in time-series forecasting. Nevertheless, it remains poorly explained why RC and deep RCs succeed in handling time-series prediction despite completely randomized weights. This study constructs a grouped vector autoregressive RC (GVARC) time-series forecasting model based on the randomly distributed embedding (RDE) theory. In RDE-GVARC, the deep structures are constructed by multiple GVARCs, which makes the established RDE-GVARC evolve into a deterministic deep RC model with few hyperparameters. Then, the spatial output information of the GVARC is mapped into the future temporal states of an output variable based on RDE equations. The main advantages of the RDE-GVARC can be summarized as follows: 1) RDE-GVARC solves the problems of uncertainty in the weight matrix and difficulty in large-scale parameter selection in the input and hidden layers of deep RCs; 2) the GVARC can avoid massive deep RC hyperparameter design and make the design of deep RC more straightforward and effective; and 3) the proposed RDE-GVARC shows good performance, strong stability, and robustness in several chaotic and real-world sequences for multistep-ahead prediction. The simulation results confirm that the RDE-GVARC not only outperforms some recent deep RCs and RNNs, but also maintains the rapidity of RC with an interpretable structure.

PaperID: 1256,

Authors: Yuanpeng Gong, Shuxian Lun, Ming Li

Affiliations: College of Mathematical Sciences, Bohai University, Jinzhou, China; College of Control Science and Engineering, Bohai University, Jinzhou, China

Title: Broad-ESN Based on Radical Activation Function for Predicting Time Series With Multiple Variables

Abstract:
Multidimensional time series (MTS) are characterized by high dimensionality and multiple features, which makes the choice of prediction model particularly important. Therefore, this article proposes a novel broad echo state network (Broad-ESN) based on a radical activation function (RB-ESN). First, a radical activation function is proposed to solve the vanishing gradient problem in the iterative process; it is also more conducive to dealing with complex data patterns. Second, a sliding window is used to extract the features of the MTS, and the number of reservoirs is determined by the number of features. Third, by using Cubic chaotic mapping to initialize the pied kingfisher optimizer (PKO) population, the search space can be effectively expanded, and high-quality random sequences can be generated. Then, the exponential spiral equation is used to optimize the position update equation of the pied kingfisher, which addresses the problem of becoming trapped in local optima. Finally, the results show that the proposed model is significantly superior to other models in forecasting performance, with high prediction accuracy and low error.
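
Chaotic initialization of an optimizer population can be sketched as below. The sketch assumes one commonly used form of the Cubic map, x(n+1) = ρ·x(n)·(1 − x(n)²) with ρ = 2.595 and x in (0, 1); the abstract does not state which form or parameters the authors use, and the function names, seed value, and bounds are hypothetical.

```python
def cubic_map_sequence(length, x0=0.3, rho=2.595):
    """Iterate the assumed Cubic map x <- rho * x * (1 - x^2), starting from x0 in (0, 1)."""
    values, x = [], x0
    for _ in range(length):
        x = rho * x * (1.0 - x * x)
        values.append(x)
    return values

def init_population(pop_size, lower, upper, x0=0.3):
    """Scale the chaotic sequence into the search bounds, one value per coordinate."""
    dim = len(lower)
    chaos = cubic_map_sequence(pop_size * dim, x0)
    return [
        [lower[d] + chaos[i * dim + d] * (upper[d] - lower[d]) for d in range(dim)]
        for i in range(pop_size)
    ]

# Hypothetical 2-D search space for five candidate solutions.
lower, upper = [-5.0, 0.0], [5.0, 1.0]
population = init_population(5, lower, upper)
for individual in population:
    print(individual)
```

The sequence is deterministic given `x0`, which makes runs reproducible while still spreading the initial candidates over the search space.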

PaperID: 1257,

Authors: Guoqing Zhang, Zhihao Li, Jiqiang Li, Weidong Zhang, Bin Qiu

Affiliations: Navigation College, Dalian Maritime University (DMU), Dalian, Liaoning, China; School of Information and Communication Engineering, Hainan University, Haikou, Hainan, China; School of Computer Science and Engineering and Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, China

Title: Prescribed Performance Path-Following Control for Rotor-Assisted Vehicles via an Improved Reinforcement Learning Mechanism

Abstract:
This article investigates an adaptive prescribed performance path-following control algorithm for rotor-assisted vehicles, incorporating reinforcement learning (RL) to execute energy-saving cruising missions. To obtain a high-performance path-following controller, a concise prescribed performance control (PPC) algorithm is designed to tightly constrain the output errors within the defined boundaries, while a shifting function is introduced to solve the problem of initial condition restrictions. Furthermore, by integrating the backstepping method and the optimal control technique, an improved RL mechanism in the form of actor–critic neural networks (AC-NNs) is proposed to offer an innovative approach to the challenges of model uncertainties and external disturbances. In this approach, the actor NN is employed to create an appropriate control policy, while the critic NN evaluates the cost-to-go function to modify the system action. The semi-globally uniformly ultimately bounded (SGUUB) stability of the proposed algorithm is guaranteed via Lyapunov theory. Finally, the superiority and feasibility of the proposed algorithm are verified by two numerical experiments.

PaperID: 1258,

Authors: Pengzhou Cheng, Zongru Wu, Wei Du, Haodong Zhao, Wei Lu, Gongshen Liu

Affiliations: Department of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; StatNLP Research Group, Singapore University of Technology and Design, Tampines, Singapore

Title: Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review

Abstract:
Language models (LMs) are becoming increasingly popular in real-world applications. Outsourcing model training and data hosting to third-party platforms has become a standard method for reducing costs. In such a situation, the attacker can manipulate the training process or data to inject a backdoor into models. Backdoor attacks are a serious threat where malicious behavior is activated when triggers are present; otherwise, the model operates normally. However, there is still no systematic and comprehensive review of LMs from the attacker’s capabilities and purposes on different backdoor attack surfaces. Moreover, there is a shortage of analysis and comparison of the diverse emerging backdoor countermeasures. Therefore, this work aims to provide the natural language processing (NLP) community with a timely review of backdoor attacks and countermeasures. According to the attackers’ capability and the affected stage of the LMs, the attack surfaces are formalized into four categories: attacking the pretrained model with fine-tuning (APMF) or parameter-efficient fine-tuning (PEFT), attacking the final model with training (AFMT), and attacking large language models (ALLM). Attacks under each category are then surveyed. The countermeasures are categorized into two general classes: sample inspection and model inspection. We then review the countermeasures and analyze their advantages and disadvantages. Also, we summarize the benchmark datasets and provide comparable evaluations for representative attacks and defenses. Drawing on the insights from the review, we point out the crucial areas for future research on backdoors, especially soliciting more efficient and practical countermeasures.

PaperID: 1259,

Authors: Ying Kong, Xi Chen, Yunliang Jiang, Danfeng Sun, Jun Zhang

Affiliations: Department of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China; Department of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China; Zhejiang Normal University, Jinhua, China

Title: Novel Discretized Zeroing Neural Network Models for Time-Varying Optimization Aided With Predictor-Corrector Methods

Abstract:
In this article, we derive predictor-corrector (PC) methods with third-order convergence precision, together with a class of specific general linear three-step (GLTS) rules. Afterward, a time-varying optimization (TVO) problem, treated as a discrete TVO problem, is formulated and studied. The classical discrete zeroing neural network via Zhang et al. discretization (ZD-DZNN) is often utilized to obtain the solution. The stepsize domain of the DZNN model is an important factor in its dynamical stability. To enlarge the stepsize domain of the DZNN model, specific GLTS-type PC-DZNN models are applied to solve the TVO problem. Theoretical analyses show that better stability of the DZNN can be achieved by PC methods. Numerical simulation comparisons between the proposed PC-DZNN models and the ZD-DZNN in terms of stability are provided for further illustration. In addition, motion planning of a PA10 manipulator and physical kinematics on a UR5, formulated as TVO problems, are solved efficiently by applying the specific GLTS-type PC-DZNN models.
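
For readers unfamiliar with the predictor-corrector idea, the sketch below uses Heun's method, a textbook one-step predictor-corrector scheme for ordinary differential equations: an explicit Euler step predicts the next value and a trapezoidal step corrects it. This is a stand-in chosen for brevity, not the GLTS-type rules or the PC-DZNN models of the paper, and the test problem y' = −y is an assumption made for illustration.

```python
import math

def euler(f, t0, y0, h, steps):
    """Explicit Euler: predictor only."""
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y

def heun(f, t0, y0, h, steps):
    """Heun's method: Euler predictor followed by a trapezoidal corrector."""
    t, y = t0, y0
    for _ in range(steps):
        slope = f(t, y)
        y_pred = y + h * slope                          # predictor step
        y = y + 0.5 * h * (slope + f(t + h, y_pred))    # corrector step
        t += h
    return y

def decay(t, y):
    """Test problem y' = -y with exact solution exp(-t)."""
    return -y

exact = math.exp(-1.0)
euler_error = abs(euler(decay, 0.0, 1.0, 0.1, 10) - exact)
heun_error = abs(heun(decay, 0.0, 1.0, 0.1, 10) - exact)
print(euler_error, heun_error)
```

On this test problem the corrected scheme is markedly more accurate than the predictor alone at the same stepsize, which is the general motivation for pairing a predictor with a corrector.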

PaperID: 1260,

Authors: Wei Wei, Zi'ang Wang, Bowen Pang, Jiannan Wang, Xue Liu

Affiliations: School of Software, Beihang University, Beijing, China; Key Laboratory of Mathematics, Informatics and Behavioral Semantics, Ministry of Education, Beijing, China; Institute of Artificial Intelligence, Beihang University, Beijing, China

Title: Wavelet Transformer: An Effective Method on Multiple Periodic Decomposition for Time Series Forecasting

Abstract:
Time series forecasting has attracted significant interest across various fields in recent years. Notably, Transformers have been extensively investigated for long-term time series forecasting (LTSF) due to their remarkable ability to model sequential data. However, the point-wise calculation of their self-attention makes it challenging to accurately capture the local and global characteristics of real-world time series, especially those with multiple seasonal periodic components and outliers. In this article, we leverage wavelet analysis to recognize different frequency patterns and design an effective attention mechanism for time series forecasting to address this issue. In detail, we employ the maximal overlap discrete wavelet transform (MODWT) to construct a novel wavelet attention (WA) mechanism and propose the wavelet transformer (Waveformer) prediction technique. This approach effectively extracts multiple periodic features, mitigates the influence of anomalies, and improves the precision of time series prediction under seasonal-trend decomposition methods. Experimental evaluations on six real-world datasets from various application fields demonstrate that the multiple periodic decomposition strategy of Waveformer successfully captures time series seasonal patterns and improves forecasting performance in comparison with many state-of-the-art methods.
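
The MODWT itself can be sketched compactly. The example below assumes the Haar filter and circular boundary handling, neither of which is specified in the abstract, and covers only the transform, not the wavelet attention mechanism built on it. Unlike the decimated DWT, every level keeps the full series length, and the energy of the input is split exactly between the detail and approximation coefficients.

```python
import math

def haar_modwt(x, levels):
    """Haar MODWT with circular boundaries.

    Returns (details, approx): details[j - 1] holds the level-j wavelet
    coefficients, and approx holds the final scaling coefficients.
    """
    n = len(x)
    approx = list(x)
    details = []
    for j in range(1, levels + 1):
        lag = 2 ** (j - 1)  # the level-j filter compares points 2^(j-1) steps apart
        prev = approx
        details.append([(prev[t] - prev[(t - lag) % n]) / 2.0 for t in range(n)])
        approx = [(prev[t] + prev[(t - lag) % n]) / 2.0 for t in range(n)]
    return details, approx

# Hypothetical series with two periodic components (periods 4 and 16).
series = [math.sin(2 * math.pi * t / 4) + 0.5 * math.sin(2 * math.pi * t / 16) for t in range(32)]
details, approx = haar_modwt(series, 3)

energy_in = sum(v * v for v in series)
energy_out = sum(v * v for level in details for v in level) + sum(v * v for v in approx)
print(energy_in, energy_out)
```

Each level isolates variation at a coarser time scale, which is what lets a model treat several seasonal periods separately.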

PaperID: 1261,

Authors: Zhanglu Yan, Kaiwen Tang, Jun Zhou, Weng-Fai Wong

Affiliations: School of Computing, National University of Singapore, Cluny Road, Singapore; Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore

Title: Low Latency Conversion of Artificial Neural Network Models to Rate-Encoded Spiking Neural Networks

Abstract:
Spiking neural networks (SNNs) are well suited for resource-constrained applications as they do not need expensive multipliers. In a typical rate-encoded SNN, a series of binary spikes within a globally fixed time window is used to fire the neurons. The time window size determines both the latency of the network in performing a single inference and the overall energy efficiency of the model. The aim of this article is to reduce this window size while maintaining accuracy when converting artificial neural networks (ANNs) to their equivalent SNNs. The state-of-the-art conversion schemes yield SNNs with accuracies comparable with ANNs only for large window sizes. In this article, we start with understanding the information loss when converting from preexisting ANN models to standard rate-encoded SNN models. From these insights, we propose a suite of techniques that includes a novel SNN encoding scheme, a new spike generation model, an input channel expansion strategy, and a threshold training technique. Together, these methods enabled us to achieve state-of-the-art accuracies using the lowest latencies reported in the literature. In particular, our method achieved a top-1 SNN accuracy of 98.73% (using a single time step) on the MNIST dataset, 76.38% (with eight time steps) on the CIFAR-100 dataset, and 93.71% (eight time steps) on the CIFAR-10 dataset. On ImageNet, an SNN accuracy of 81.9% was achieved using 40 time steps.
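
The link between window size and information loss in standard rate encoding can be seen in a small sketch. It assumes an integrate-and-fire neuron with a constant input, a unit threshold, and reset by subtraction; these are common modeling choices, not details taken from the paper, and the sketch shows the baseline encoding rather than the authors' proposed scheme.

```python
def rate_encode(activation, time_steps, threshold=1.0):
    """Integrate a constant input and emit a spike whenever the membrane crosses the threshold."""
    membrane, spikes = 0.0, []
    for _ in range(time_steps):
        membrane += activation
        if membrane >= threshold:
            spikes.append(1)
            membrane -= threshold  # reset by subtraction keeps the remainder
        else:
            spikes.append(0)
    return spikes

def rate_decode(spikes, threshold=1.0):
    """Recover the encoded value as the firing rate scaled by the threshold."""
    return threshold * sum(spikes) / len(spikes)

short_window = rate_encode(0.3, 8)
long_window = rate_encode(0.3, 64)
print(short_window, rate_decode(short_window))
print(rate_decode(long_window))
```

With T time steps the decoded value can only take multiples of 1/T, so an activation of 0.3 is represented as 0.25 at T = 8. Shrinking the window reduces latency but coarsens this quantization, which is the accuracy-latency tension the abstract describes.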

PaperID: 1262,

Authors: Xu Zhang, Biao Luo, Zi-Peng Wang, Xiaodong Xu, Chunhua Yang

Affiliations: School of Automation, Central South University, Changsha, China; School of Information Science and Technology, Beijing Laboratory of Smart Environmental Protection, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China

Title: Adaptive Boundary Control for Synchronization of Reaction-Diffusion Neural Networks With Random Time-Varying Delay

Abstract:
This article addresses the synchronization problem of reaction-diffusion neural networks (RDNNs) with random time-varying delay (RTVD) via boundary control (BC) (including adaptive BC and BC with constant-valued gain) under distributed measurements or boundary measurements. First, a novel BC strategy with constant-valued gain is designed, which considers three cases of the measurements, namely, distributed measurements, boundary measurements, and the coexistence of both. Subsequently, an adaptive BC scheme under boundary measurements is proposed, where the control gain is regulated effectively. Next, based on inequality techniques and the Lyapunov direct approach, delay-dependent synchronization conditions are obtained and several linear matrix inequality (LMI)-based theorems are given. Then, the BC design for the delayed RDNNs is transformed into an LMI feasibility problem. Finally, the developed BC approaches are validated by simulation results.

PaperID: 1263,

Authors: Sheng Yu, Di-Hua Zhai, Yufeng Zhan, Wencai Wang, Yuyin Guan, Yuanqing Xia

Affiliations: School of Automation, Beijing Institute of Technology, Beijing, China; Beijing Building Materials Academy of Science Research, Beijing, China

Title: 6-D Object Pose Estimation Based on Point Pair Matching for Robotic Grasp Detection

Abstract:
6-D pose estimation is essential to achieving reliable robotic grasping. Currently, the prevalent method relies on keypoint correspondence. However, this approach hinges on the determination of object keypoint locations, alongside their detection and localization in real scenes. It also employs the random sample consensus (RANSAC)-based perspective-n-point (PnP) algorithm to solve the pose, which is nondifferentiable and therefore does not allow the loss to be backpropagated during the training phase. Alternatively, the direct regression method, while fast and differentiable, falls short in terms of pose estimation performance and thus needs enhancement. In view of these gaps, we propose PPM6D, a new method for 6-D object pose estimation based on regression and point pair matching. Our methodology begins with a proposed cross-fusion module, designed to achieve the fusion and complementation of RGB features and point cloud features. Subsequently, an attention module adjusts the features of the object’s 3-D model. Finally, we design a point pair matching module for effective matching of points and characteristics, resulting in an integral matching and fusion. PPM6D is extensively trained and tested on benchmark datasets including LINEMOD, occlusion LINEMOD (LINEMOD-occ), YCB-Video, and T-LESS. Experimental results show that PPM6D can outperform many keypoint-based pose estimation methods while maintaining a relatively high speed, thereby offering novel regression-based pose estimation ideas. When applied to real-world object pose estimation tasks and grasp tasks on an actual Baxter robot, PPM6D demonstrates superior performance compared to most alternatives.

PaperID: 1264,

Authors: Renzhi Lu, Xiaotao Wang, Yiyu Ding, Hai-Tao Zhang, Feng Zhao, Lijun Zhu, Yong He

Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China; School of Artificial Intelligence and Automation, Institute of Artificial Intelligence, MOE Engineering Research Center of Autonomous Intelligent Unmanned Systems, State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan, China; China Ship Scientific Research Center, Wuxi, China; School of Artificial Intelligence and Automation and the MOE Engineering Research Center of Autonomous Intelligent Unmanned Systems, Huazhong University of Science and Technology, Wuhan, China; School of Automation, Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, and the Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, China University of Geosciences, Wuhan, China

Title: Adaptive Optimal Surrounding Control of Multiple Unmanned Surface Vessels via Actor-Critic Reinforcement Learning

Abstract:
In this article, an optimal surrounding control algorithm is proposed for multiple unmanned surface vessels (USVs), in which actor-critic reinforcement learning (RL) is utilized to optimize the merging process. Specifically, the multiple-USV optimal surrounding control problem is first transformed into the Hamilton-Jacobi-Bellman (HJB) equation, which is difficult to solve due to its nonlinearity. An adaptive actor-critic RL control paradigm is then proposed to obtain the optimal surrounding strategy, wherein the Bellman residual error is utilized to construct the network update laws. In particular, a virtual controller representing intermediate transitions and an actual controller operating on a dynamics model are employed as surrounding control solutions for second-order USVs; thus, optimal surrounding control of the USVs is guaranteed. In addition, the stability of the proposed controller is analyzed by means of Lyapunov theory. Finally, numerical simulation results demonstrate that the proposed actor-critic RL-based surrounding controller can achieve the surrounding objective while optimizing the evolution process, and that it achieves reductions of 9.76% and 20.85% in trajectory length and energy consumption, respectively, compared with the existing controller.

PaperID: 1265,

Authors: Jianming Huang, Xun Su, Zhongxi Fang, Hiroyuki Kasai

Affiliations: Department of Computer Science and Communication Engineering, Graduate School of Fundamental Science and Engineering, Waseda University, Tokyo, Japan; Department of Computer Science and Communication Engineering, Graduate School of Fundamental Science and Engineering, and the Department of Communication and Computer Engineering, School of Fundamental Science and Engineering, Waseda University, Tokyo, Japan

Title: Anchor Space Optimal Transport as a Fast Solution to Multiple Optimal Transport Problems

Abstract:
In machine learning, optimal transport (OT) theory is extensively utilized to compare probability distributions across various applications, such as graph data represented by node distributions and image data represented by pixel distributions. In practical scenarios, it is often necessary to solve multiple OT problems. Traditionally, these problems are treated independently, with each OT problem being solved sequentially. However, the computational complexity required to solve a single OT problem is already substantial, making the resolution of multiple OT problems even more challenging. Although many applications of fast solutions to OT are based on the premise of a single OT problem with arbitrary distributions, few efforts handle such multiple OT problems with multiple distributions. Therefore, we propose the anchor space OT (ASOT) problem: an approximate OT problem designed for multiple OT problems. This proposal stems from our finding that in many tasks the mass transport tends to be concentrated in a reduced space from the original feature space. By restricting the mass transport to a learned anchor point space, ASOT avoids pairwise instantiations of cost matrices for multiple OT problems and simplifies the problems by canceling insignificant transports. This simplification greatly reduces its computational costs. We then prove the upper bounds of its 1-Wasserstein distance error between the proposed ASOT and the original OT problem under different conditions. Building upon this accomplishment, we propose three methods to learn anchor spaces for reducing the approximation error. Furthermore, our proposed methods present great advantages for handling distributions of different sizes with GPU parallelization.
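
As a point of reference for the setting described above, the following minimal sketch shows the conventional baseline in which multiple OT problems are solved independently, one pair at a time. It is not the ASOT method. It assumes the simplest possible case (equal-size, uniformly weighted samples on the real line), where the 1-Wasserstein distance has a closed form; the function names `wasserstein_1d` and `pairwise_distances` are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical distributions on the real line."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("expects two non-empty samples of equal size")
    # In 1-D with uniform weights, the optimal plan matches sorted samples in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def pairwise_distances(distributions):
    """Baseline: solve every pairwise OT problem independently (n*(n-1)/2 solves)."""
    n = len(distributions)
    return {
        (i, j): wasserstein_1d(distributions[i], distributions[j])
        for i in range(n)
        for j in range(i + 1, n)
    }
```

The number of independent solves grows quadratically with the number of distributions, which is the cost that motivates sharing structure across problems.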

PaperID: 1266,

Authors: Meng Xu, Xinhong Chen, Jianping Wang

Affiliations: Department of Computer Science, City University of Hong Kong, Hong Kong, SAR, China

Title: A Novel Topology Adaptation Strategy for Dynamic Sparse Training in Deep Reinforcement Learning

Abstract:
Deep reinforcement learning (DRL) has been widely adopted in various applications, yet it faces practical limitations due to high storage and computational demands. Dynamic sparse training (DST) has recently emerged as a prominent approach to reduce these demands during training and inference phases, but existing DST methods achieve high sparsity levels by sacrificing policy performance as they rely on the absolute magnitude of connections for pruning and randomly generating connections. Addressing this, our study presents a generic method that can be seamlessly integrated into existing DST methods in DRL to enhance their policy performance while preserving their sparsity levels. Specifically, we develop a novel method for calculating the importance of connections within the model. Subsequently, we dynamically adjust the sparse network topology by dropping existing connections and introducing new connections based on their respective importance values. Through validation on eight widely used simulation tasks, our method improves two state-of-the-art (SOTA) DST approaches by up to 70% in episode return and average return across all episodes under various sparsity levels.
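
The baseline topology update that the abstract attributes to existing DST methods (magnitude-based pruning followed by random regrowth) can be sketched as below. This is an illustrative sketch under stated assumptions, not the proposed importance-based method: weights are a flat list, new connections are initialized to zero, and the name `prune_and_regrow` and its parameters are hypothetical.

```python
import random


def prune_and_regrow(weights, mask, k, rng):
    """One topology update: drop the k smallest-magnitude active weights,
    then grow k connections chosen uniformly at random among inactive ones."""
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = min(k, len(active), len(inactive))
    # Drop: rank active connections by absolute magnitude.
    dropped = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow: sample from connections that were inactive before the drop.
    grown = rng.sample(inactive, k)
    new_weights, new_mask = list(weights), list(mask)
    for i in dropped:
        new_mask[i] = False
        new_weights[i] = 0.0
    for i in grown:
        new_mask[i] = True
        new_weights[i] = 0.0  # assumption: new connections start at zero
    return new_weights, new_mask
```

Because the same number of connections is dropped and grown, the sparsity level is unchanged by the update; only the choice of which connections to drop and grow differs between methods.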

PaperID: 1267,

Authors: Huibin Lin, Xiaofeng Huang, Zhuyun Chen, Guolin He, Ciyang Xi, Weihua Li

Affiliations: School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou, China

Title: Matching Pursuit Network: An Interpretable Sparse Time-Frequency Representation Method Toward Mechanical Fault Diagnosis

Abstract:
Rotating machinery commonly operates in complex environments with strong noise and variable working conditions. Time-frequency representation offers a valuable method for capturing and analyzing nonstationary characteristics, making it particularly suitable for identifying transient fault-related features. However, despite these advantages, extracting robust and interpretable fault features in machinery operating under variable speeds remains a challenge with existing techniques. In this article, a novel sparse time-frequency representation (STFR) method, named matching pursuit network (MPNet), is proposed for mechanical fault diagnosis. First, a deep network structure with signal decomposition capability is constructed from well-defined interpretable matching pursuit (MP) units to automatically learn discriminative features from time-frequency inputs. Then, the weights with which each effective component signal reconstructs the raw input are designed to measure the components' contributions. Accordingly, an optimization criterion with a structural similarity metric is produced to update the model parameters in an end-to-end manner. Finally, phenomenological model-based fault simulation signals and real fault signals from gearbox experiments are used for model training and testing, respectively. The results show that the proposed approach can effectively extract robust and interpretable time-frequency features and clearly outperforms the state-of-the-art time-frequency representation methods.
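
For readers unfamiliar with the classical algorithm behind the MP units, a minimal sketch of standard matching pursuit is given below. It is the textbook greedy decomposition, not the MPNet architecture; it assumes a fixed dictionary of unit-norm atoms, and the names `dot` and `matching_pursuit` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse decomposition of `signal` over unit-norm `atoms`."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Select the atom most correlated with the current residual.
        corrs = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corrs[i]))
        coeffs[best] += corrs[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - corrs[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual
```

Each coefficient records how much one atom contributes to the reconstruction, which is what makes the decomposition interpretable.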

PaperID: 1268,

Authors: Wujie Zhou, Xinyu Sun, Xiaohong Qian, Meixin Fang

Affiliations: School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China; Zhejiang University School of Medicine, Hangzhou, China

Title: Asymmetrical Contrastive Learning Network via Knowledge Distillation for No-Service Rail Surface Defect Detection

Abstract:
Owing to extensive research on deep learning, significant progress has recently been made in trackless surface defect detection (SDD). Nevertheless, existing algorithms face two main challenges. First, while depth features contain rich spatial structure features, most models only accept red-green-blue (RGB) features as input, which severely constrains performance. Thus, this study proposes a dual-stream teacher model termed the asymmetrical contrastive learning network (ACLNet-T), which extracts both RGB and depth features to achieve high performance. Second, the introduction of the dual-stream model leads to an exponential increase in the number of parameters. As a solution, we designed a single-stream student model (ACLNet-S) that extracts RGB features. We leveraged a contrastive distillation loss via knowledge distillation (KD) techniques to transfer rich multimodal features from the ACLNet-T to the ACLNet-S pixel by pixel and channel by channel. Furthermore, because the contrastive distillation loss focuses exclusively on local features, we employed multiscale graph mapping to establish long-range dependencies and transfer global features to the ACLNet-S through a multiscale graph mapping distillation loss. Finally, an attentional distillation loss based on the adaptive attention decoder (AAD) was designed to further improve the performance of the ACLNet-S. Consequently, we obtained the ACLNet-S, which achieved performance similar to that of ACLNet-T, despite having a nearly eightfold parameter count gap. Through comprehensive experimentation using the industrial RGB-D dataset NEU RSDDS-AUG, the ACLNet-S (ACLNet-S with KD) was confirmed to outperform 16 state-of-the-art methods. Moreover, to showcase the generalization capacity of ACLNet-S, the proposed network was evaluated on three additional public datasets, and ACLNet-S achieved comparable results. The code is available at https://github.com/Yuride0404127/ACLNet-KD.

PaperID: 1269,

Authors: Litong Fan, Dengxiu Yu, Kang Hao Cheong, Zhen Wang

Affiliations: School of Mechanical Engineering and the School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, China; School of Artificial Intelligence, Optics and Electronics, iOPEN, Northwestern Polytechnical University, Xi’an, China; Science, Mathematics and Technology Cluster, Singapore University of Technology and Design, Tampines, Singapore; School of Mechanical Engineering, iOPEN, and the School of Cybersecurity, Northwestern Polytechnical University, Xi’an, China

Title: Optimal Evolution Strategy for Continuous Strategy Games on Complex Networks via Reinforcement Learning

Abstract:
This article presents an optimal evolution strategy for continuous strategy games on complex networks via reinforcement learning (RL). In the past, evolutionary game theory usually assumed that agents use the same selection intensity when interacting, ignoring the differences in their learning abilities and learning willingness. Individuals are reluctant to change their strategies too much. Therefore, we design an adaptive strategy updating framework with various selection intensities for continuous strategy games on complex networks based on imitation dynamics, allowing agents to achieve the optimal state and a higher cooperation level with the minimal strategy changes. The optimal updating strategy is acquired using a coupled Hamilton-Jacobi-Bellman (HJB) equation by minimizing the performance function. This function aims to maximize individual payoffs while minimizing strategy changes. Furthermore, a value iteration (VI) RL algorithm is proposed to approximate the HJB solutions and learn the optimal strategy updating rules. The RL algorithm employs actor and critic neural networks to approximate strategy changes and performance functions, along with the gradient descent weight update approach. Meanwhile, the stability and convergence of the proposed methods have been proved by the designed Lyapunov function. Simulations validate the convergence and effectiveness of the proposed methods in different games and complex networks.

PaperID: 1270,

Authors: Pengfei Guo, Yunong Zhang, Shuai Li

Affiliations: School of Computational Science, Zhongkai University of Agriculture and Engineering, Guangzhou, China; School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, China; Faculty of Information Technology and Electrical Engineering, University of Oulu, Oulu, Finland

Title: Reciprocal-Kind Zhang Neurodynamics Method for Temporal-Dependent Sylvester Equation and Robot Manipulator Motion Planning

Abstract:
It is noteworthy that the Sylvester equation plays a pivotal role in the field of industrial intelligence control. To meet the demands of real-time applications, temporal-dependent Sylvester equations (TDSEs) are employed to formulate motion planning problems for robot manipulators. Traditionally, the classical Zhang neurodynamics (ZN) method is utilized to address the TDSE problems, which encounters challenges associated with temporal-dependent inverse matrix computations. In this article, we introduce an inverse-free approach based on energy zeroing, termed the reciprocal-kind ZN (RKZN) model, specifically designed to tackle the TDSE problem. Additionally, we propose a discrete RKZN (DRKZN) algorithm to address future Sylvester equation (FSE) problems and the motion planning challenges of robot manipulators. Furthermore, we conduct a thorough analysis of the convergence property and robustness of the RKZN method for addressing the TDSE problem. This analysis is grounded in the Lyapunov stability theory of nonlinear systems and a comparative method for nonlinear systems with temporal-dependent error-feedback-related uncertainty disturbances. Numerical experiments, simulations, and physical experiments substantiate the effectiveness and superiority of the developed RKZN method in addressing both the TDSE problem and the motion planning challenges of robot manipulators.

PaperID: 1271,

Authors: Tianyuan Liu, Libin Hou, Xiyu Song, Linyuan Wang, Bin Yan

Affiliations: Henan Key Laboratory of Imaging and Intelligent Processing, Information Engineering University, Zhengzhou, China

Title: Dense Optimizer: An Information Entropy-Guided Structural Search Method for Dense-Like Neural Network Design

Abstract:
The dense convolutional network has been continuously refined toward a highly efficient and compact architecture, owing to its lightweight structure. However, as the current dense-like architectures are mainly designed manually, it becomes increasingly difficult to adjust the channels and reuse level based on past experience. As such, we propose an architecture search method called dense optimizer that can automatically search for high-performance dense-like networks. In dense optimizer, we view the dense network as a hierarchical information system, maximizing the network’s information entropy while constraining the effectiveness and the distribution of the entropy across each stage via a power law, thereby constructing an optimization problem. We also propose a branch-and-bound optimization algorithm that tightly integrates the power-law principle with search space scaling to solve the optimization problem efficiently. The superiority of dense optimizer has been validated on different computer vision benchmark datasets. Our searched model DenseNet-OPT achieved a top-1 accuracy of 84.3% on CIFAR-100, which is 5.97% higher than that of the original network. Notably, dense optimizer achieves high-quality search results while only requiring 4 h of computation time on a single CPU.

PaperID: 1272,

Authors: Yanbin Lin, Zhen Ni

Affiliations: Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA

Title: A Robust Multi-Virtual-Agent Inverse Reinforcement Learning Approach With Data Aggregation for Perturbed Environments

Abstract:
Learning control in environments with uncertainties and perturbations remains a challenging issue in the field of artificial intelligence. Though conventional imitation learning (IL) and inverse reinforcement learning (IRL) methods have made some progress in handling perturbations, their repeatability and resilience are somewhat limited. To alleviate this issue, we propose a multi-virtual-agent IRL (MVIRL) method to produce stable policies. Specifically, we design multiple virtual agents interacting with pertinent environments. The proposed MVIRL method can recover a resilient reward function from multiple demonstration sources. This recovered reward function provides adequate information and comprehensive coverage of perturbations by considering the upper and lower bounds. Moreover, using maximum discrimination for the worst case and applying data aggregation, the proposed method requires fewer demonstrations than existing methods and improves the ability to handle uncertainties. Case studies with gravity and noise interruptions are considered to validate the effectiveness of the proposed method. The proposed MVIRL method obtains better performance than comparable IL and IRL methods in terms of average return (Avg Return) and standard deviation (SD) metrics, and it is more robust to the level of uncertainties.

PaperID: 1273,

Authors: Nilay Kushawaha, Lorenzo Fruzzetti, Enrico Donato, Egidio Falotico

Affiliations: Department of Excellence in Robotics and AI, Scuola Superiore Sant’Anna, The BioRobotics Institute, Pisa, Italy

Title: SynapNet: A Complementary Learning System Inspired Algorithm With Real-Time Application in Multimodal Perception

Abstract:
Catastrophic forgetting is a phenomenon in which a neural network, upon learning a new task, struggles to maintain its performance on previously learned tasks. It is a common challenge in the realm of continual learning (CL) through neural networks. The mammalian brain addresses catastrophic forgetting by consolidating memories in different parts of the brain, involving the hippocampus and the neocortex. Taking inspiration from this brain strategy, we present a CL framework that combines a plastic model simulating the fast learning capabilities of the hippocampus and a stable model representing the slow consolidation nature of the neocortex. To supplement this, we introduce a variational autoencoder (VAE)-based pseudo memory for rehearsal purposes. In addition, by applying lateral inhibition masks on the gradients of the convolutional layer, we aim at damping the activity of adjacent neurons, and we introduce a sleep phase to reorganize the learned representations. Empirical evaluation demonstrates the positive impact of such additions on the performance of our proposed framework; we evaluate the proposed model on several class-incremental and domain-incremental datasets and compare it with the standard benchmark algorithms, showing significant improvements. To showcase practical applicability, we implement the algorithm in a physical environment for object classification using a soft pneumatic gripper. The algorithm learns new classes incrementally in real time and also exhibits significant backward knowledge transfer (KT).

PaperID: 1274,

Authors: Xiaobing Dai, Zewen Yang, Sihua Zhang, Di-Hua Zhai, Yuanqing Xia, Sandra Hirche

Affiliations: Chair of Information-Oriented Control (ITR), TUM School of Computation, Information and Technology (CIT), Technical University of Munich (TUM), Munich, Germany; School of Automation, Beijing Institute of Technology, Beijing, China

Title: Cooperative Online Learning for Multiagent System Control via Gaussian Processes With Event-Triggered Mechanism

Abstract:
In the realm of the cooperative control of multiagent systems (MASs) with unknown dynamics, Gaussian process (GP) regression is widely used to infer the uncertainties due to its modeling flexibility of nonlinear functions and the existence of a theoretical prediction error bound. Online learning, which involves incorporating newly acquired training data into GP models, promises to improve control performance by enhancing predictions during the operation. Therefore, this article investigates the online cooperative learning algorithm for MAS control. Moreover, an event-triggered data selection mechanism, inspired by the analysis of a centralized event-trigger (CET), is introduced to reduce the model update frequency and enhance the data efficiency. With the proposed learning-based control, the practical convergence of the MAS is validated with guaranteed tracking performance via the Lyapunov theory. Furthermore, the exclusion of the Zeno behavior for individual agents is shown. Finally, the effectiveness of the proposed event-triggered online learning method is demonstrated in simulations.

PaperID: 1275,

Authors: Jingqi Li, Yuzhen Zhang, Yi Zeng, Changxin Ye, Wenzheng Xu, Xianye Ben, Fei-Yue Wang, Junping Zhang

Affiliations: School of Computer Science, Fudan University, Shanghai, China; Shenzhen Research Institute of Shandong University, Shenzhen, China; DeSci Center of Parallel Intelligence, Óbuda University, Budapest, Hungary

Title: Rethinking Appearance-Based Deep Gait Recognition: Reviews, Analysis, and Insights From Gait Recognition Evolution

Abstract:
Gait recognition is a prominent biometric recognition technique extensively employed in public security. Appearance-based and model-based gait recognition are two categories of methods commonly used. Specifically, appearance-based methods, which use silhouettes to represent body information, typically outperform model-based methods that rely on skeleton data, making them more popular. Recently, the shift from single-frame templates to multiframe silhouettes has advanced appearance-based gait recognition with better spatiotemporal representation. However, there is a notable lack of comprehensive studies that deepen the understanding of multiframe appearance-based gait recognition methods. This article reviews various methods to trace the evolution of gait recognition. Furthermore, we unify various performant models in one framework, study the overlooked effects on data arrangement, and explore the scaling ability of existing methods. Besides the advancement in gait recognition, we also summarize the current challenges and future prospects to foster future research.

PaperID: 1276,

Authors: Tomomasa Yamasaki, Zhehui Wang, Tao Luo, Niangjun Chen, Bo Wang

Affiliations: Information System Technology and Design Pillar, Singapore University of Technology and Design (SUTD), Tampines, Singapore; Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Fusionopolis, Singapore

Title: RBFleX-NAS: Training-Free Neural Architecture Search Using Radial Basis Function Kernel and Hyperparameter Detection

Abstract:
Neural architecture search (NAS) is an automated technique to design optimal neural network architectures for a specific workload. Conventionally, evaluating candidate networks in NAS involves extensive training, which requires significant time and computational resources. To address this, training-free NAS has been proposed to expedite network evaluation with minimal search time. However, state-of-the-art training-free NAS algorithms struggle to precisely distinguish well-performing networks from poorly performing networks, resulting in inaccurate performance predictions and consequently suboptimal top-one network accuracy. Moreover, they are less effective in activation function exploration. To tackle the challenges, this article proposes RBFleX-NAS, a novel training-free NAS framework that accounts for both activation outputs and input features of the last layer with a radial basis function (RBF) kernel. We also present a detection algorithm to identify optimal hyperparameters using the obtained activation outputs and input feature maps. We verify the efficacy of RBFleX-NAS over a variety of NAS benchmarks. RBFleX-NAS significantly outperforms state-of-the-art training-free NAS methods in terms of top-one accuracy, achieving this with short search time in NAS-Bench-201 and NAS-Bench-SSS. In addition, it demonstrates a higher Kendall correlation compared to layer-based training-free NAS algorithms. Furthermore, we propose the neural network activation function benchmark (NAFBee), a new activation design space that extends the activation type to encompass various commonly used functions. In this extended design space, RBFleX-NAS demonstrates its superiority by accurately identifying the best-performing network during activation function search, providing a significant advantage over other NAS algorithms.
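
The radial basis function kernel mentioned above has the standard form k(x, y) = exp(-gamma * ||x - y||^2). The sketch below only computes that kernel and the resulting similarity matrix for a batch of vectors; it does not reproduce the RBFleX-NAS scoring or its hyperparameter detection algorithm. The bandwidth `gamma` and the function names are hypothetical and chosen for illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(vectors, gamma):
    """Pairwise RBF similarities for a batch of feature vectors."""
    return [[rbf_kernel(u, v, gamma) for v in vectors] for u in vectors]
```

The matrix is symmetric with ones on the diagonal, and its entries shrink toward zero as two vectors move apart, at a rate controlled by `gamma`.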

PaperID: 1277,

Authors: Ke Wang, Chaoxu Mu, Anguo Zhang, Changyin Sun

Affiliations: School of Electrical and Information Engineering, Tianjin University, Tianjin, China; School of Artificial Intelligence, Anhui University, Hefei, China; School of Automation, Southeast University, Nanjing, China

Title: Data-Model Hybrid-Driven Safe Reinforcement Learning for Adaptive Avoidance Control Against Unsafe Moving Zones

Abstract:
With the gradual application of reinforcement learning (RL), safety has emerged as a paramount concern. This article presents a novel data-model hybrid-driven safe RL (SRL) scheme to address the challenge of avoidance control in the operation domain containing multiple moving unsafe zones. First, the avoidance problem is transformed into the optimal control problem of an augmented system by encoding a barrier function (BF) term into the cost function. Then, using the idea of integral RL (IRL), an adaptive learning algorithm is proposed for generating safe control policies, in which the actor-critic neural network (NN) structure is established with the aid of state-following (StaF) kernel function. The policy iteration process is executed by this structure; specifically, the critic network undergoes gradient-descent adaptation, while the actor network employs gradient projection updating. Particularly, via a state extrapolation technique, both real-time experience and simulated experience are utilized in the learning process. Next, closed-loop stability and weight convergence are theoretically substantiated. Finally, the effectiveness of the proposed scheme is demonstrated on a single integrator system, a nonlinear numerical system, and a unicycle kinematic system; besides, its advantages over the existing control methods are illustrated by comparisons.

PaperID: 1278,

Authors: Yinyan Zhang, Yuhua Zheng, Feng Gao, Shuai Li

Affiliations: College of Cyber Security and Guangdong Key Laboratory of Data Security and Privacy Preserving, Jinan University, Guangzhou, China; School of Intelligent Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China; Zhejiang Lab, Hangzhou, China

Title: Image-Based Visual Servoing of Manipulators With Unknown Depth: A Recurrent Neural Network Approach

Abstract:
The image-based visual servoing (IBVS) of manipulators is important for intelligent manipulation using visual feedback. While the traditional IBVS methods for manipulators require knowledge of the depth information in the interaction matrix, in this article, we propose a novel IBVS method for manipulators without depth estimation by leveraging the property of the associated image Jacobian. Through a novel transformation, the IBVS problem is converted into a convex optimization problem subject to the kinematic constraint, joint constraints, and other constraints that are not explicitly related to the depth information. The problem is then solved by developing a recurrent neural network of global asymptotic convergence, and a dynamic neural control law without depth estimation emerges for the IBVS of manipulators. Theoretical guarantees and simulation results are provided to show the efficacy of the proposed method.

PaperID: 1279,

Authors: Yang Lv, Jinlong Lei, Peng Yi

Affiliations: Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai, China; Department of Control Science and Engineering, Tongji University, Shanghai, China

Title: A Local Information Aggregation-Based Multiagent Reinforcement Learning for Robot Swarm Dynamic Task Allocation

Abstract:
In this article, we explore how to optimize task allocation for robot swarms in dynamic environments, emphasizing the necessity of formulating robust, flexible, and scalable strategies for robot cooperation. We introduce a novel framework using a decentralized partially observable Markov decision process (Dec-POMDP), specifically designed for distributed robot swarm networks. At the core of our methodology is the local information aggregation multiagent deep deterministic policy gradient (LIA-MADDPG) algorithm, which merges centralized training with distributed execution. During the centralized training phase, a local information aggregation (LIA) module is meticulously designed to gather critical data from neighboring robots, enhancing decision-making efficiency. In the distributed execution phase, a strategy improvement method is proposed to dynamically adjust task allocation based on changing and partially observable environmental conditions. Our empirical evaluations show that the LIA module can be seamlessly integrated into various centralized training and decentralized execution (CTDE)-based multiagent reinforcement learning (MARL) methods, significantly enhancing their performance. Additionally, by comparing LIA-MADDPG with six conventional reinforcement learning algorithms and a heuristic algorithm, we demonstrate its superior scalability, rapid adaptation to environmental changes, and ability to maintain both stability and convergence speed. These results underscore LIA-MADDPG’s outstanding performance and its potential to significantly improve dynamic task allocation in robot swarms through enhanced local collaboration and adaptive strategy execution.

PaperID: 1280,

Authors: Qinmin Yang, Huaying Li, Zhengwei Ruan, Bo Fan, Shuzhi Sam Ge

Affiliations: State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, China; Hangzhou Huawei Communication Technology Company Ltd., Hangzhou, China; School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an, China; Department of Electrical and Computer Engineering, Institute for Functional Intelligent Materials, National University of Singapore, Queenstown, Singapore

Title: Reinforcement Learning-Based Fault-Tolerant Control of Uncertain Strict-Feedback Nonlinear Systems With Intermittent Actuator Faults

Abstract:
In this work, a novel reinforcement learning-based adaptive fault-tolerant control (FTC) scheme with actuator redundancy is presented for a nonlinear strict-feedback system with nonlinear dynamics and uncertainties. A learning-based switching function technique is established to steer different groups of actuators automatically and successively to mitigate the impact of faulty actuators by observing a switching performance index. The optimal tracking control problem (OTCP) of strict-feedback nonlinear systems is transformed into an equivalent optimal regulation problem of each affine subsystem via adaptive feedforward controllers. Subsequently, the designed objective functions associated with Hamilton–Jacobi–Bellman (HJB) estimate errors caused by neural network (NN) approximations can be minimized by the reinforcement learning algorithm without value or policy iterations. It is proved that the tracking objective can be achieved and all signals in the closed-loop system can be guaranteed to be bounded, as long as the minimum time interval between two successive failures is bounded. Theoretical results are verified by simulations.

PaperID: 1281,

Authors: Luyang Luo, Yanwen Li, Zhizhong Chai, Huangjing Lin, Pheng-Ann Heng, Hao Chen

Affiliations: Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Sai Kung, Hong Kong; Imsight AI Research Lab, Shenzhen, China; Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong; Department of Computer Science and Engineering, Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China

Title: Scale-Aware Super-Resolution Network With Dual Affinity Learning for Lesion Segmentation From Medical Images

Abstract:
Convolutional neural networks (CNNs) have shown remarkable progress in medical image segmentation. However, lesion segmentation remains a challenge for state-of-the-art CNN-based algorithms due to the variance in lesion scales and shapes. On the one hand, tiny lesions are hard to delineate precisely from medical images, which are often of low resolution. On the other hand, segmenting large lesions requires large receptive fields, which exacerbates the first challenge. In this article, we present a scale-aware super-resolution (SR) network to adaptively segment lesions of various sizes from low-resolution (LR) medical images. Our proposed network contains dual branches to simultaneously conduct lesion mask SR (LMSR) and lesion image SR (LISR). Meanwhile, we introduce scale-aware dilated convolution (SDC) blocks into the multitask decoders to adaptively adjust the receptive fields of the convolutional kernels according to the lesion sizes. To guide the segmentation branch to learn from richer high-resolution (HR) features, we propose a feature affinity (FA) module and a scale affinity (SA) module to enhance the multitask learning of the dual branches. On multiple challenging lesion segmentation datasets, our proposed network achieved consistent improvements compared with other state-of-the-art methods. Code will be available at: https://github.com/poiuohke/SASR_Net.

PaperID: 1282,

Authors: Chenglin Zhang, Hong Yu, Guoyin Wang, Yongfang Xie

Affiliations: Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Information Science and Engineering, Central South University, Changsha, China

Title: LiNGAM-SF: Causal Structural Learning Method With Linear Non-Gaussian Acyclic Models for Streaming Features

Abstract:
Causal structure learning for streaming features (CSLSF) faces the following challenges: 1) the precision of learned causal structures is limited by the score-based learning method and 2) existing methods fail to detect latent confounders. To address these challenges, this article proposes a novel causal structure learning method with linear non-Gaussian acyclic models for streaming features (LiNGAM-SF), which utilizes the causal identifiability of the data. This is the first time LiNGAMs have been utilized for online causal structure learning. First, we utilize the classical SF algorithm to learn the causal skeleton. This article provides the property of this skeleton, proving that the two adjacent variables on an edge form one of three possible structures. Second, we give two propositions and propose the identify causal directions in the presence of latent variables (ICDPLV) subalgorithm to distinguish among the three structures and precisely identify the causal directions. In addition, the subalgorithm can output a candidate set of latent confounders from a local perspective. Finally, the detecting latent confounder (DLC) subalgorithm detects the latent confounders in the candidate set from a global perspective. The precision of the proposed method is at least 11% higher on average than that of the state-of-the-art methods. Furthermore, the experiments verify that the LiNGAM-SF method is able to detect the latent confounders.

PaperID: 1283,

Authors: Yu Wang, Haodong Zhang, Xingli Yang, Jihong Li

Affiliations: School of Modern Educational Technology, Shanxi University, Taiyuan, China; School of Mathematical Sciences, Shanxi University, Taiyuan, China

Title: Deep CNN Feature Resampling and Ensemble Based on Cross Validation for Image Classification

Abstract:
Deep convolutional neural networks (CNNs) such as AlexNet, VGGNet, ResNet, EfficientNet, and MobileNet have been extensively employed in image classification tasks. A common solution is to feed deep CNN features extracted from a deep network directly into a classification function. However, this solution may easily result in poor accuracy and robustness because it relies on a single experimental result. One alternative is to utilize an ensemble of multiple deep networks, but this incurs very expensive, even unacceptable, computational complexity. Thus, we propose a new deep CNN feature ensemble framework based on multiple cross-validation resampling results of a single feature layer to cope with the above two issues. Theoretically, the proposed method is proved to have a smaller error rate than the single feature layer method and the same Rademacher complexity. Moreover, extensive experiments on several challenging image classification databases demonstrate the superiority of the proposed method.
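
As a rough illustration of the resampling-and-ensemble idea in this abstract, the sketch below fits one simple classifier per cross-validation fold on pre-extracted feature vectors and combines the fold models by majority vote. The nearest-centroid classifier, the interleaved fold assignment, the voting rule, and all function names are assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter


def kfold_indices(n, k):
    """Assign sample i to fold i % k (interleaved folds)."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_centroids(features, labels):
    """Stand-in classifier (assumption): one mean feature vector per class."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def cv_ensemble_fit(features, labels, k=3):
    """Train one model per fold, each on the samples outside that fold."""
    models = []
    for held_out in kfold_indices(len(features), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(features) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        models.append(fit_centroids(train_x, train_y))
    return models


def cv_ensemble_predict(models, x):
    """Majority vote over the fold models."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

All fold models here share one feature layer, so the deep network runs only once per image; only the lightweight classifier is retrained per fold.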

PaperID: 1284,

Authors: Jiangyi Shao, Junjie Chen, Bin Liu

Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China

Title: ProFun-SOM: Protein Function Prediction for Specific Ontology Based on Multiple Sequence Alignment Reconstruction

Abstract:
Protein function prediction is crucial for understanding species evolution, including viral mutations. Gene ontology (GO) is a standardized representation framework for describing protein functions with annotated terms. Each ontology is a specific functional category containing multiple child ontologies, and the relationships of parent and child ontologies create a directed acyclic graph. Protein functions are categorized using GO, which divides them into three main groups: cellular component ontology, molecular function ontology, and biological process ontology. Therefore, the GO annotation of proteins is a hierarchical multilabel classification problem. This hierarchical relationship introduces complexities such as the mixed ontology problem, leading to performance bottlenecks in existing computational methods due to label dependency and data sparsity. To overcome the bottleneck issues brought by the mixed ontology problem, we propose ProFun-SOM, an innovative multilabel classifier that utilizes multiple sequence alignments (MSAs) to accurately annotate gene ontologies. ProFun-SOM enhances the initial MSAs through a reconstruction process and integrates them into a deep learning architecture. It then predicts annotations within the cellular component, molecular function, biological process, and mixed ontologies. Our evaluation results on three datasets (CAFA3, SwissProt, and NetGO2) demonstrate that ProFun-SOM surpasses state-of-the-art methods. This study confirms that utilizing MSAs of proteins can effectively overcome the two main bottleneck issues, label dependency and data sparsity, thereby alleviating the root problem, mixed ontology. A freely accessible web server is available at http://bliulab.net/ProFun-SOM/.

PaperID: 1285,

Authors: Vítor M. Hanriot, Luiz C. B. Torres, Antônio P. Braga

Affiliations: Graduate Program in Electrical Engineering, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Department of Computer and Systems, Universidade Federal de Ouro Preto, João Monlevade, Brazil

Title: Multiclass Graph-Based Large Margin Classifiers: Unified Approach for Support Vectors and Neural Networks

Abstract:
While large margin classifiers are originally an outcome of an optimization framework, support vectors (SVs) can be obtained from geometric approaches. This article presents advances in the use of Gabriel graphs (GGs) in binary and multiclass classification problems. For Chipclass, a hyperparameterless and optimization-less GG-based binary classifier, we discuss how activation functions and support edge (SE)-centered neurons affect the classification, proposing smoother functions and structural SV (SSV)-centered neurons to achieve margins with low probabilities and smoother classification contours. We extend the neural network architecture, which can be trained with backpropagation with a softmax function and a cross-entropy loss, or by solving a system of linear equations. A new subgraph-/distance-based membership function for graph regularization is also proposed, along with a new GG recomputation algorithm that is less computationally expensive than the standard approach. Experimental results with the Friedman test show that our method was better than previous GG-based classifiers and statistically equivalent to tree-based models.

PaperID: 1286,

Authors: Hong-An Tang, Yanhong Wang, Xiaofang Hu, Shukai Duan, Lidan Wang

Affiliations: School of Artificial Intelligence, Chongqing University of Technology, Chongqing, China; College of Artificial Intelligence, Southwest University, Chongqing, China

Title: Fixed-Time Passivity and Synchronization of Multiweighted Coupled Memristive Neural Networks With Adaptive Couplings

Abstract:
Two types of multiweighted coupled memristive neural networks (CMNNs) with adaptive couplings are introduced in this article, and the fixed-time passivity (FXTP) and fixed-time synchronization (FXTS) of such networks are considered. First, under the developed adaptive scheme, a sufficient condition to guarantee the FXTP for multiweighted CMNNs with adaptive couplings is obtained. Second, the FXTP, fixed-time input-strict passivity, and fixed-time output-strict passivity for multiweighted CMNNs with adaptive couplings and coupling delays are investigated by devising an appropriate state feedback controller. Third, by applying the Lyapunov functional method, FXTS criteria are established for the two kinds of networks presented. Finally, numerical examples are provided to demonstrate the effectiveness of the derived results.

PaperID: 1287,

Authors: Ziyu Sheng, Yuting Cao, Yin Yang, Zhong-kai Feng, Kaibo Shi, Tingwen Huang, Shiping Wen

Affiliations: Australian AI Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, Australia; College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar; College of Hydrology and Water Resources, Hohai University, Nanjing, China; School of Information Science and Engineering, Chengdu University, Chengdu, China; Department of Science Program, Texas A&M University at Qatar, Doha, Qatar

Title: Residual Temporal Convolutional Network With Dual Attention Mechanism for Multilead-Time Interpretable Runoff Forecasting

Abstract:
As a pivotal subfield within the domain of time series forecasting, runoff forecasting plays a crucial role in water resource management and scheduling. Recent advancements in the application of artificial neural networks (ANNs) and attention mechanisms have markedly enhanced the accuracy of runoff forecasting models. This article introduces an innovative hybrid model, ResTCN-DAM, which synergizes the strengths of the deep residual network (ResNet), temporal convolutional networks (TCNs), and dual attention mechanisms (DAMs). The proposed ResTCN-DAM is designed to leverage the unique attributes of these three modules. The TCN has an outstanding capability to process time series data in parallel. By combining it with a modified ResNet, multiple TCN layers can be densely stacked to capture more hidden information in the temporal dimension. The DAM module captures the interdependencies within both temporal and feature dimensions, accentuating relevant time steps/features while diminishing less significant ones at minimal computational cost. Furthermore, the snapshot ensemble method obtains the effect of training multiple models through a single training process, which ensures the accuracy and robustness of the forecasts. The deep integration and cooperation of these modules comprehensively enhance the model’s forecasting capability from various perspectives. Ablation studies validate the efficacy of each module, and multiple sets of comparative experiments show that the proposed ResTCN-DAM has exceptional and consistent performance across varying lead times. We also employ visualization techniques to display heatmaps of the model’s weights, thereby enhancing the interpretability of the model. When compared with the prevailing neural network-based runoff forecasting models, ResTCN-DAM exhibits state-of-the-art accuracy, temporal robustness, and interpretability, positioning it at the forefront of contemporary research.

PaperID: 1288,

Authors: Chenying Jin, Xiang Feng, Huiqun Yu

Affiliations: Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China

Title: Embracing Multiheterogeneity and Privacy Security Simultaneously: A Dynamic Privacy-Aware Federated Reinforcement Learning Approach

Abstract:
With growing demand for privacy-preserving reinforcement learning (RL) applications, federated RL (FRL) has emerged as a potential solution. However, existing FRL methods struggle with multiple sources of heterogeneity while lacking robust privacy guarantees. In this study, we propose DPA-FedRL, a dynamic privacy-aware FRL framework, to mitigate both issues simultaneously. First, we put forward the concept of “multiheterogeneity” and embed the environmental heterogeneity into agents’ state representations. Next, to ensure privacy during model aggregation, we incorporate a differentially private mechanism in the form of Gaussian noise and modify its global sensitivity to suit FRL’s unique characteristics. Our approach dynamically allocates the privacy budget based on heterogeneity levels, which strikes a balance between privacy and utility. From the theoretical perspective, we give rigorous convergence, privacy, and sensitivity guarantees for our proposed method. Through extensive experiments on diverse datasets, we demonstrate that DPA-FedRL surpasses state-of-the-art approaches (PPO-DP-SGD, PAvg, and QAvg) in some highly heterogeneous environments. Notably, our novel privacy attack simulations enable quantitative privacy assessment, validating that DPA-FedRL offers over 1.359× stronger protection than baselines.
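
To make the Gaussian-noise aggregation step more concrete, here is a minimal sketch of the classical Gaussian mechanism applied to clipped model updates before averaging. It is not the DPA-FedRL algorithm: the proportional budget rule in `allocate_budgets`, the use of the clipping norm as the sensitivity, and all names are hypothetical, and the noise formula is the textbook one, which is valid only for 0 < epsilon < 1.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def gaussian_sigma(epsilon, delta, sensitivity):
    """Textbook Gaussian-mechanism noise scale (assumes 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def allocate_budgets(total_epsilon, heterogeneity):
    """Hypothetical rule: split the budget in proportion to heterogeneity scores."""
    total = sum(heterogeneity)
    return [total_epsilon * h / total for h in heterogeneity]


def private_average(updates, epsilons, delta, clip_norm, rng):
    """Clip each client update, add per-client Gaussian noise, then average."""
    noisy = []
    for update, eps in zip(updates, epsilons):
        clipped = clip_update(update, clip_norm)
        # Assumption: sensitivity is taken to be the clipping norm.
        sigma = gaussian_sigma(eps, delta, sensitivity=clip_norm)
        noisy.append([v + rng.gauss(0.0, sigma) for v in clipped])
    n = len(noisy)
    return [sum(column) / n for column in zip(*noisy)]
```

A smaller per-client epsilon yields a larger noise scale, which is the privacy-utility trade-off the abstract refers to; how the budget should actually depend on heterogeneity is the paper's contribution and is not reproduced here.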

PaperID: 1289,

Authors: Predrag S. Stanimirovic, Spyridon D. Mourtas, Dijana Mosic, Vasilios N. Katsikis, Xinwei Cao, Shuai Li

Affiliations: Faculty of Sciences and Mathematics, University of Niš, Niš, Serbia; Laboratory “Hybrid Methods of Modeling and Optimization in Complex Systems,” Siberian Federal University, Krasnoyarsk, Russia; School of Business, Jiangnan University, Wuxi, China; Faculty of Information Technology, University of Oulu, Oulu, Finland

Title: A Zeroing Neural Network Approach for Calculating Time-Varying G-Outer Inverse of Arbitrary Matrix

Abstract:
Calculation of the time-varying (TV) matrix generalized inverse has grown into an essential tool in many fields, such as computer science, physics, engineering, and mathematics, in order to tackle TV challenges. This work investigates the challenge of finding a TV extension of a subclass of inner inverses on real matrices, known as generalized-outer (G-outer) inverses. More precisely, our goal is to construct TV G-outer inverses (TV-GOIs) by utilizing the zeroing neural network (ZNN) process, which is presently thought to be a state-of-the-art solution for tackling TV matrix challenges. Using known advantages of ZNN dynamic systems, a novel ZNN model, called ZNNGOI, is presented in the literature for the first time in order to compute TV-GOIs. The ZNNGOI performs excellently in numerical simulations and in an application to localization problems. In terms of solving linear TV matrix equations, its performance is comparable to that of the standard ZNN model for computing the pseudoinverse.

PaperID: 1290,

Authors: Cong Guan, Feng Chen, Lei Yuan, Zongzhang Zhang, Yang Yu

Affiliations: National Key Laboratory for Novel Software Technology and the School of Artificial Intelligence, Nanjing University, Nanjing, China

Title: Efficient Communication via Self-Supervised Information Aggregation for Online and Offline Multiagent Reinforcement Learning

Abstract:
Utilizing messages from teammates can improve coordination in cooperative multiagent reinforcement learning (MARL). Previous works typically combine raw messages of teammates with local information as inputs for the policy. However, neglecting message aggregation causes significant inefficiency in policy learning. Motivated by recent advances in representation learning, we argue that efficient message aggregation is essential for good coordination in cooperative MARL. In this article, we propose Multiagent communication via Self-supervised Information Aggregation (MASIA), where agents can aggregate the received messages into compact representations with high relevance to augment the local policy. Specifically, we design a permutation-invariant message encoder to generate a common information-aggregated representation from messages and optimize it via reconstructing and shooting future information in a self-supervised manner. Each agent then utilizes the most relevant parts of the aggregated representation for decision-making via a novel message extraction mechanism. Furthermore, considering the potential of offline learning for real-world applications, we build offline benchmarks for multiagent communication, which, to the best of our knowledge, are the first of their kind. Empirical results demonstrate the superiority of our method in both online and offline settings. We also release the built offline benchmarks in this article as a testbed for communication ability validation to facilitate further research in this direction.

PaperID: 1291,

Authors: Lukas Gonon, Antoine Jacquier

Affiliations: School of Computer Science, University of St. Gallen, St. Gallen, Switzerland; Department of Mathematics, Imperial College London, London, U.K.

Title: Universal Approximation Theorem and Error Bounds for Quantum Neural Networks and Quantum Reservoirs

Abstract:
Universal approximation theorems are the foundations of classical neural networks, providing theoretical guarantees that the latter are able to approximate maps of interest. Recent results have shown that this can also be achieved in a quantum setting, whereby classical functions can be approximated by parameterized quantum circuits. We provide here precise error bounds for specific classes of functions and extend these results to the interesting new setup of randomized quantum circuits, mimicking classical reservoir neural networks. Our results show in particular that a quantum neural network with $\mathcal{O}(\varepsilon^{-2})$ weights and $\mathcal{O}(\lceil \log_2(\varepsilon^{-1}) \rceil)$ qubits suffices to achieve approximation error $\varepsilon > 0$ when approximating functions with integrable Fourier transforms.
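
For a sense of scale, the stated orders can be evaluated at a sample target error. The choice $\varepsilon = 10^{-2}$ is purely illustrative, and the constants hidden in the $\mathcal{O}$ notation are not given in the abstract, so the figures below are orders of magnitude only.

```latex
\[
\varepsilon = 10^{-2}
\;\Longrightarrow\;
\varepsilon^{-2} = 10^{4},
\qquad
\left\lceil \log_2\!\left(\varepsilon^{-1}\right) \right\rceil
= \left\lceil \log_2 100 \right\rceil
= \left\lceil 6.64\ldots \right\rceil
= 7 .
\]
```

Under these assumptions, the number of weights grows on the order of $10^{4}$ while the number of qubits grows only on the order of 7, reflecting the logarithmic dependence on $\varepsilon^{-1}$.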

PaperID: 1292,

Authors: Pengcheng Xia, Yixiang Huang, Chengliang Liu, Jie Liu

Affiliations: State Key Laboratory of Mechanical System and Vibration and the MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; Department of Mechanical and Aerospace Engineering, Carleton University, Ottawa, ON, Canada

Title: Learn to Supervise: Deep Reinforcement Learning-Based Prototype Refinement for Few-Shot Motor Fault Diagnosis

Abstract:
Motor fault diagnosis is a fundamental aspect of ensuring the reliability of industrial equipment. However, industrial scenarios exhibit an inherent data scarcity problem, which imposes significant restrictions on the practical application of traditional deep learning-based intelligent fault diagnosis (IFD) methods. Typically, only a small volume of labeled data along with limited informative unlabeled data are available from industrial motors. Effectively utilizing informative unlabeled samples in the context of few-shot fault diagnosis poses a substantial challenge. In this article, a prototype refinement method for semi-supervised few-shot fault diagnosis based on deep reinforcement learning (DRL) is proposed. First, we formalize an iterative semi-supervised meta-learning strategy, involving the selection of informative unlabeled samples and the refinement of category prototypes, as a Markov decision process (MDP). Subsequently, we develop a mirror prototypical network (ProtoNet) structure for interaction with a DRL agent, which learns to adaptively select valuable samples to supervise the diagnosis process. Moreover, a state space involving feature embedding and category information is designed, and a comprehensive reward taking into account selection confidence, effectiveness, and representativeness is proposed. Extensive experiments on several motor experimental datasets verify the method’s effectiveness in few-shot diagnosis of unseen faults and new working conditions.

PaperID: 1293,

Authors: Ye Yuan, Ying Wang, Xin Luo

Title: A Node-Collaboration-Informed Graph Convolutional Network for Highly Accurate Representation to Undirected Weighted Graph

Abstract:
An undirected weighted graph (UWG) is regularly adopted to portray the interactions among a single set of nodes in big-data-related applications, such as the interactive confidence between proteins in a protein network. A graph convolutional network (GCN) is able to represent a UWG for subsequent pattern analysis tasks such as missing link estimation. However, existing GCNs mostly neglect the local collaborative information hidden in connected node pairs, which leads to severe information loss. To address this issue, this study proposes a node-collaboration-informed graph convolutional network (NGCN) model for precise UWG representation learning, built on three ideas: 1) extracting the nodes’ global graph characteristics by incorporating the residual connection and weighted representation propagation into the GCN module; 2) learning the nodes’ local collaborative information from the observed interactive node pairs via a symmetric latent factor analysis (SLFA) module; and 3) designing an effective strategy to adaptively fuse the nodes’ global graph characteristics and local collaborative information for a highly accurate representation of the target UWG. Its high representation ability for the target UWG is proved theoretically. Empirical studies on six UWGs generated by real-world applications indicate that, owing to its elegant modeling of the node collaborations, the proposed NGCN significantly outperforms several leading-edge models in estimating the missing links of a UWG. Its high scalability ensures its compatibility with other GCN extensions, which will be investigated in the future.

PaperID: 1294,

Authors: Zhengguo Huang, Mou Chen, Peng Shi, Hao Shen

Affiliations: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China; School of Electrical and Electronic Engineering, The University of Adelaide, Adelaide, SA, Australia; Anhui Province Key Laboratory of Special Heavy Load Robot and the School of Electrical and Information Engineering, Anhui University of Technology, Ma’anshan, China

Title: Adaptive Neural Network Control for Fixed-Wing UAV With Disturbance Observer Under Switching Disturbance

Abstract:
The adaptive neural network (NN) control for the fixed-wing unmanned aerial vehicle (FUAV) under unmodeled dynamics and the time-varying switching disturbance (TVSD) is investigated in this article. To better describe the TVSD induced by the change in the flight area of the FUAV, a switching augmented model (SAM) based on the known information about the TVSD is proposed first. The parameter adaptation technique is used to estimate the related TVSD. Thereafter, the time-varying disturbance that cannot be described by the SAM is estimated by the disturbance observer (DO). The radial basis function NN (RBFNN) is adopted to approximate the unknown unmodeled dynamics. The coupling terms derived from the co-design of the DO and the parameter adaptation (PA) are separated by some inequality techniques. Then, the separated unknown terms are eliminated by designing the parameters of the controller and those of the adaptive law. The separated known terms are tackled by adding robust control terms to the controller. In addition, to improve the estimation performance for the TVSD and RBFNN, an auxiliary system in the DO form is designed. Sufficient stability conditions for the closed-loop switched system (CLSS) are obtained with and without the inequality about the switching times. Finally, an illustrative example using the attitude model of the FUAV is given to show the feasibility and advantage of the proposed control strategy.

PaperID: 1295,

Authors: Fuhong Min, Chengjie Chen, Neil G. R. Broderick

Affiliations: School of Electrical and Automation Engineering, Nanjing Normal University, Nanjing, China; School of Computer and Electronic Information and the School of Artificial Intelligence, Nanjing Normal University, Nanjing, China; Department of Physics, Dodd Walls Centre for Photonics and Quantum Technologies, The University of Auckland, Auckland, New Zealand

Title: Coupled Homogeneous Hopfield Neural Networks: Simplest Model Design, Synchronization, and Multiplierless Circuit Implementation

Abstract:
When using a synapse as a coupler to connect neurons, parameter-based synchronization transitions have been investigated. However, the dependence on initial conditions has not been comprehensively discussed in the literature. This work presents an electrical-synapse-coupled model consisting of two homogeneous Hopfield neural networks (HNNs), which is the simplest network-to-network coupling model known for HNN. The model possesses several fixed points, which are found to be unstable. Simulation results of peak differences, bifurcation diagrams, and normalized mean synchronization errors indicate that complex synchronization transitions occur, depending on both the electrical coupling strength and initial conditions. Particularly, we focus here on mapping the basins of attraction between periodic and chaotic synchronization for bistable patterns. Finally, a multiplierless electrical neuron circuit is developed to validate initial condition-induced synchronization phenomena, which provides a new perspective for the study of collective dynamics of brain-like networks and the development of lightweight neuromorphic circuits.

PaperID: 1296,

Authors: Yingjie Song, Zhi Liu, Gongyang Li, Jiawei Xie, Qiang Wu, Dan Zeng, Lihua Xu, Tianhong Zhang, Jijun Wang

Affiliations: Global Big Data Technologies Centre (GBDTC), School of Electrical and Data Engineering, University of Technology Sydney, Ultimo, NSW, Australia; School of Communication and Information Engineering, Shanghai University, Shanghai, China; Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China

Title: EMS: A Large-Scale Eye Movement Dataset, Benchmark, and New Model for Schizophrenia Recognition

Abstract:
Schizophrenia (SZ) is a common and disabling mental illness, and most patients encounter cognitive deficits. Eye-tracking technology has been increasingly used to characterize cognitive deficits because of its reasonable time and economic costs. However, there is no large-scale, publicly available eye movement dataset and benchmark for SZ recognition. To address these issues, we release a large-scale Eye Movement dataset for SZ recognition (EMS), which consists of eye movement data from 104 schizophrenics and 104 healthy controls (HCs) based on the free-viewing paradigm with 100 stimuli. We also conduct the first comprehensive benchmark, which has been absent for a long time in this field, to compare 13 related psychosis recognition methods using six metrics. Besides, we propose a novel mean-shift-based network (MSNet) for eye movement-based SZ recognition, which combines the mean shift algorithm with convolution to extract the cluster center as the subject feature. In MSNet, a stimulus feature branch (SFB) is first adopted to enhance each stimulus feature with similar information from all stimulus features, and then the cluster center branch (CCB) is utilized to generate the cluster center as the subject feature and update it by the mean shift vector. MSNet outperforms prior contenders; thus, it can act as a powerful baseline to advance subsequent studies. To pave the road in this research field, the EMS dataset, the benchmark results, and the code of MSNet are publicly available at https://github.com/YingjieSong1/EMS.
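
The mean shift update that this abstract builds on can be sketched in a few lines. The code below is the classical Gaussian-kernel mean shift on plain feature vectors, not MSNet itself: the convolutional components, the bandwidth value, and the function names are assumptions for illustration.

```python
import math


def mean_shift_step(center, points, bandwidth):
    """Move the center to the Gaussian-kernel-weighted mean of the points."""
    weights = []
    for p in points:
        sq = sum((a - b) ** 2 for a, b in zip(p, center))
        weights.append(math.exp(-sq / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed (center far from every point): stay put.
        return list(center)
    dim = len(center)
    return [sum(w * p[j] for w, p in zip(weights, points)) / total
            for j in range(dim)]


def mean_shift(start, points, bandwidth, tol=1e-6, max_iter=100):
    """Iterate the update until the shift falls below tol."""
    center = list(start)
    for _ in range(max_iter):
        new_center = mean_shift_step(center, points, bandwidth)
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_center, center)))
        center = new_center
        if shift < tol:
            break
    return center
```

The difference between the new and old center in each step is the mean shift vector; iterating it drives the center toward a local density mode, which is the sense in which a cluster center can summarize a set of per-stimulus features.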

PaperID: 1297,

Authors: Jiahui Qu, Jizhou Cui, Wenqian Dong, Qian Du, Xiaoyang Wu, Song Xiao, Yunsong Li

Affiliations: State Key Laboratory of Integrated Service Network, Xidian University, Xi’an, China; Department of Electronic and Computer Engineering, Mississippi State University, Starkville, MS, USA

Title: A Principle Design of Registration-Fusion Consistency: Toward Interpretable Deep Unregistered Hyperspectral Image Fusion

Abstract:
For hyperspectral image (HSI) and multispectral image (MSI) fusion, it is often overlooked that multisource images acquired under different imaging conditions are difficult to register perfectly. Although some works attempt to fuse unregistered images, two thorny challenges remain. One is that registration and fusion are usually modeled as two independent tasks, and there is not yet a unified physical model to tightly couple them. The other is that deep learning (DL)-based methods may lack sufficient interpretability and generalization. In response to the above challenges, we propose an unregistered HSI fusion framework energized by a unified model of registration and fusion. First, a novel registration-fusion consistency physical perception model (RFCM) is designed, which uniformly models the image registration and fusion problem to greatly reduce the sensitivity of fusion performance to registration accuracy. Then, an HSI fusion framework (MoE-PNP) is proposed to learn the knowledge reasoning process for solving RFCM. Each basic module of MoE-PNP corresponds one-to-one to an operation in the optimization algorithm of RFCM, which ensures clear interpretability of the network. Moreover, MoE-PNP captures the general fusion principle for different unregistered images and therefore generalizes well. Extensive experiments demonstrate that MoE-PNP achieves state-of-the-art performance for unregistered HSI and MSI fusion. The code is available at https://github.com/Jiahuiqu/MoE-PNP.

PaperID: 1298,

Authors: Mingzhu Wang, Xiaodi Li, Shiji Song

Affiliations: School of Mathematics and Statistics, Shandong Normal University, Jinan, China; Department of Automation, Tsinghua University, Beijing, China

Title: Local Synchronization for Delayed Complex Dynamical Networks via Self-Triggered Impulsive Control Involving Delays

Abstract:
This article investigates the local synchronization of delayed complex dynamical networks (CDNs) under self-triggered impulsive control (STIC) approaches involving delays. With the help of Lyapunov-Razumikhin methods and the comparison principle, some design criteria for STIC strategies ensuring local synchronization of delayed CDNs with delayed impulses are provided, and Zeno behavior can be avoided. Compared with the existing results on synchronization of CDNs under STIC, in this article, time delays in both continuous and discrete system dynamics are considered. Moreover, the proposed self-triggered mechanism (STM) is an explicit expression, under which the next triggering instant can be derived directly, with a simple structure and easy implementation. Finally, two numerical examples are provided to validate the proposed theoretical criteria.

PaperID: 1299,

Authors: Baifan Chen, Xiaotian Lv, Yuqian Zhao, Lingli Yu

Affiliations: School of Automation, Central South University, Changsha, China

Title: TPDC: Point Cloud Completion by Triangular Pyramid Features and Divide-and-Conquer in Complex Environments

Abstract:
Point cloud completion recovers complete point clouds from partial ones, providing rich point cloud information for downstream tasks such as 3-D reconstruction and target detection. However, previous methods usually suffer from unstructured prediction of points in local regions and the discrete nature of the point cloud. To resolve these problems, we propose a point cloud completion network called TPDC. Representing the point cloud as a set of unordered features of points with local geometric information, we devise a Triangular Pyramid Extractor (TPE), using the simplest 3-D structure, a triangular pyramid, to convert the point cloud to a sequence of local geometric information. Our insight for revealing local geometric information in a complex environment is to design a Divide-and-Conquer Splitting Module in a Divide-and-Conquer Splitting Decoder (DCSD) to learn point-splitting patterns that best fit local regions. This module employs the Divide-and-Conquer approach to handle in parallel the tasks of fitting base points to ground-truth values and predicting the displacement of split points. This approach aims to make the base points align more closely with the ground-truth values while also forecasting the displacement of split points relative to the base points. Furthermore, we propose a more realistic and challenging benchmark, ShapeNetMask, with more random point cloud input, more complex random item occlusion, and more realistic random environmental perturbations. The results show that our method outperforms existing methods on both widely used benchmarks and the new benchmark.

PaperID: 1300,

Authors: Zhangchi Qiu, Ye Tao, Shirui Pan, Alan Wee-Chung Liew

Affiliations: School of Information and Communication Technology, Griffith University, Gold Coast, Australia

Title: Knowledge Graphs and Pretrained Language Models Enhanced Representation Learning for Conversational Recommender Systems

Abstract:
Conversational recommender systems (CRSs) utilize natural language interactions and dialog history to infer user preferences and provide accurate recommendations. Due to the limited conversation context and background knowledge, existing CRSs rely on external sources such as knowledge graphs (KGs) to enrich the context and model entities based on their interrelations. However, these methods ignore the rich intrinsic information within entities. To address this, we introduce the knowledge-enhanced entity representation learning (KERL) framework, which leverages both the KG and a pretrained language model (PLM) to improve the semantic understanding of entities for CRS. In our KERL framework, entity textual descriptions are encoded via a PLM, while a KG helps reinforce the representation of these entities. We also employ positional encoding to effectively capture the temporal information of entities in a conversation. The enhanced entity representation is then used to develop a recommender component that fuses both entity and contextual representations for more informed recommendations, as well as a dialog component that generates informative entity-related information in the response text. A high-quality KG with aligned entity descriptions is constructed to facilitate this study, namely, the Wiki Movie Knowledge Graph (WikiMKG). The experimental results show that KERL achieves state-of-the-art results in both recommendation and response generation tasks. Our code is publicly available at the link: https://github.com/icedpanda/KERL.

PaperID: 1301,

Authors: Yulin Zhu, Yuni Lai, Kaifa Zhao, Xiapu Luo, Mingquan Yuan, Jun Wu, Jian Ren, Kai Zhou

Affiliations: Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong; Sam’s Club Innovation Center, Dallas, TX, USA; Institute of Cyber Science and Technology, Shanghai Jiaotong University, Shanghai, China; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, USA

Title: From Bi-Level to One-Level: A Framework for Structural Attacks to Graph Anomaly Detection

Abstract:
The success of graph neural networks stimulates the prosperity of graph mining and the corresponding downstream tasks including graph anomaly detection (GAD). However, it has been shown that those graph mining methods are vulnerable to structural manipulations on relational data. That is, the attacker can maliciously perturb the graph structures to assist the target nodes in evading anomaly detection. In this article, we explore the structural vulnerability of two typical GAD systems: unsupervised FeXtra-based GAD and supervised graph convolutional network (GCN)-based GAD. Specifically, structural poisoning attacks against GAD are formulated as complex bi-level optimization problems. Our first major contribution is then to transform the bi-level problem into a one-level problem by leveraging different regression methods. Furthermore, we propose a new way of utilizing gradient information to optimize the one-level optimization problem in the discrete domain. Comprehensive experiments demonstrate the effectiveness of our proposed attack algorithm BinarizedAttack.

PaperID: 1302,

Authors: Guanchun Wang, Xiangrong Zhang, Zelin Peng, Tianyang Zhang, Xu Tang, Huiyu Zhou, Licheng Jiao

Affiliations: School of Artificial Intelligence, Xidian University, Xi’an, China; MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; School of Computing and Mathematical Sciences, University of Leicester, Leicester, U.K.

Title: Negative Deterministic Information-Based Multiple Instance Learning for Weakly Supervised Object Detection and Segmentation

Abstract:
Weakly supervised object detection (WSOD) and semantic segmentation with image-level annotations have attracted extensive attention due to their high label efficiency. Multiple instance learning (MIL) offers a feasible solution for the two tasks by treating each image as a bag with a series of instances (object regions or pixels) and identifying foreground instances that contribute to bag classification. However, conventional MIL paradigms often suffer from issues such as discriminative instance domination and missing instances. In this article, we observe that negative instances usually contain valuable deterministic information, which is the key to solving the two issues. Motivated by this, we propose a novel MIL paradigm based on negative deterministic information (NDI), termed NDI-MIL, which is based on two core designs with a progressive relation: NDI collection and negative contrastive learning (NCL). In NDI collection, we identify and distill NDI from negative instances online by a dynamic feature bank. The collected NDI is then utilized in an NCL mechanism to locate and punish those discriminative regions, by which the discriminative instance domination and missing instances issues are effectively addressed, leading to improved object- and pixel-level localization accuracy and completeness. In addition, we design an NDI-guided instance selection (NGIS) strategy to further enhance the systematic performance. Experimental results on several public benchmarks, including PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO, show that our method achieves satisfactory performance. The code is available at: https://github.com/GC-WSL/NDI.

PaperID: 1303,

Authors: Yan-Jun Liu, Xuebin Shang, Li Tang, Sai Zhang

Affiliations: College of Science, Liaoning University of Technology, Jinzhou, China

Title: Finite-Time Consensus Adaptive Neural Network Control for Nonlinear Multiagent Systems Under PDE Models

Abstract:
In this article, a novel adaptive control method based on neural networks is proposed for a class of multiagent systems (MASs) with nonlinear functions and external disturbances. First, the approximation properties of neural networks are used to approximate the MAS partial differential equation (PDE) model with nonlinear terms containing two variables, time t and spatial variable x. Second, an adaptive controller is constructed to actuate the parabolic MAS to reach consensus under external disturbances. Based on this, the finite-time theorem and special inequalities are applied to prove the stability of the closed-loop system. Thus, MASs with nonlinear functions and external disturbances achieve finite-time consensus. Finally, the effectiveness of the proposed control method is demonstrated by numerical simulations.

PaperID: 1304,

Authors: Duc M. Le, Omkar Sudhir Patil, Cristian F. Nino, Warren E. Dixon

Affiliations: Aurora Flight Sciences, a Boeing Company, Cambridge, MA, USA; Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, FL, USA

Title: Accelerated Gradient Approach For Deep Neural Network-Based Adaptive Control of Unknown Nonlinear Systems

Abstract:
Recent connections in the adaptive control literature to continuous-time analogs of Nesterov’s accelerated gradient method have led to the development of new real-time adaptation laws based on accelerated gradient methods. However, previous results assume that the system’s uncertainties are linear-in-the-parameters (LIP). To compensate for non-LIP uncertainties, our preliminary results developed a neural network (NN)-based accelerated gradient adaptive controller to achieve trajectory tracking for nonlinear systems; however, the development and analysis only considered single-hidden-layer NNs. In this article, a generalized deep NN (DNN) architecture with an arbitrary number of hidden layers is considered, and a new DNN-based accelerated gradient adaptation scheme is developed to generate estimates of all the DNN weights in real time. A nonsmooth Lyapunov-based analysis is used to guarantee that the developed accelerated gradient-based DNN adaptation design achieves global asymptotic tracking error convergence for general nonlinear control affine systems subject to unknown (non-LIP) drift dynamics and exogenous disturbances. A comprehensive set of simulation studies is conducted on a two-state nonlinear system, a robotic manipulator, and a complex 20-D nonlinear system to demonstrate the improved performance of the developed method. Our simulation studies demonstrate enhanced tracking and function approximation performance from both DNN architectures and accelerated gradient adaptation.

PaperID: 1305,

Authors: Lijiang Li, Xiang Chang, Fei Chao, Chih-Min Lin, Tuan-Tu Huynh, Longzhi Yang, Changjing Shang, Qiang Shen

Affiliations: Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen, China; Institute of Mathematics, Physics and Computer Science, Aberystwyth University, Aberystwyth, U.K.; Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan; Faculty of Mechatronics and Electronics, Lac Hong University, Biên Hòa, Vietnam; Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, U.K.

Title: Self-Organizing Type-2 Fuzzy Double Loop Recurrent Neural Network for Uncertain Nonlinear System Control

Abstract:
Nonlinear systems, such as robotic systems, play an increasingly important role in our modern daily life and have become more dominant in many industries; however, robotic control still faces various challenges due to diverse and unstructured work environments. This article proposes a double-loop recurrent neural network (DLRNN) with the support of a Type-2 fuzzy system and a self-organizing mechanism for improved performance in nonlinear dynamic robot control. The proposed network has a double-loop recurrent structure, which enables better dynamic mapping. In addition, the network combines a Type-2 fuzzy system with a double-loop recurrent structure to improve the ability to deal with uncertain environments. To achieve an efficient system response, a self-organizing mechanism is proposed to adaptively adjust the number of layers in a DLRNN. This work integrates the proposed network into a conventional sliding mode control (SMC) system to theoretically and empirically prove its stability. The proposed system is applied to a three-joint robot manipulator, leading to a comparative study that considers several existing control approaches. The experimental results confirm the superiority of the proposed system and its effectiveness and robustness in response to various external system disturbances.

PaperID: 1306,

Authors: Shuiqing Xu, Lejing Wang, Haosong Dai, Hai Wang, Hongtian Chen, Yi Chai, Wei Xing Zheng

Affiliations: College of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China; Discipline of Engineering and Energy, Murdoch University, Perth, WA, Australia; Department of Automation, Shanghai Jiao Tong University, Shanghai, China; College of Electrical Engineering and Automation, Chongqing University, Chongqing, China; School of Computer, Data and Mathematical Sciences, Western Sydney University, Penrith, NSW, Australia

Title: A Segmented Iterative Learning Scheme-Based Distributed Fault Estimation for Switched Interconnected Nonlinear Systems

Abstract:
In this article, a distributed fault estimation (DFE) approach for switched interconnected nonlinear systems (SINSs) with time delays and external disturbances is proposed using a novel segmented iterative learning scheme (SILS). First, through the utilization of interrelated information among subsystems, a distributed iterative learning observer is developed to enhance the accuracy of fault estimation results, which can realize the fault estimation of all subsystems under time delays and external disturbances. Simultaneously, to facilitate rapid fault information tracking and significantly reduce sensitivity to interference, a new SILS-based fault estimation law is constructed by combining the idea of segmented design with the method of variable gain. Then, an assessment of the convergence of the established fault estimation methodology is conducted, and the configurations of observer gain matrices and iterative learning gain matrices are duly accomplished. Finally, simulation results are showcased to demonstrate the superiority and feasibility of the developed fault estimation approach.

PaperID: 1307,

Authors: Jian Xu, Shian Du, Junmei Yang, Qianli Ma, Delu Zeng

Affiliations: School of Mathematics, South China University of Technology, Guangzhou, China; Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: Neural Operator Variational Inference Based on Regularized Stein Discrepancy for Deep Gaussian Processes

Abstract:
Deep Gaussian process (DGP) models offer a powerful nonparametric approach for Bayesian inference, but exact inference is typically intractable, motivating the use of various approximations. However, existing approaches, such as mean-field Gaussian assumptions, limit the expressiveness and efficacy of DGP models, while stochastic approximation can be computationally expensive. To tackle these challenges, we introduce neural operator variational inference (NOVI) for DGPs. NOVI uses a neural generator to obtain a sampler and minimizes the regularized Stein discrepancy (RSD) between the generated distribution and true posterior in $\mathcal{L}_2$ space. We solve the minimax problem using Monte Carlo estimation and subsampling stochastic optimization techniques and demonstrate that the bias introduced by our method can be controlled by multiplying the Fisher divergence with a constant, which leads to robust error control and ensures the stability and precision of the algorithm. Our experiments on datasets ranging in size from hundreds to millions of data points demonstrate the effectiveness and the faster convergence rate of the proposed method. We achieve a classification accuracy of 93.56 on the CIFAR10 dataset, outperforming state-of-the-art (SOTA) Gaussian process (GP) methods. We are optimistic that NOVI possesses the potential to enhance the performance of deep Bayesian nonparametric models and could have significant implications for various practical applications.

PaperID: 1308,

Authors: Xiaoqing Zhang, Zunjie Xiao, Xiao Wu, Yanlin Chen, Jilu Zhao, Yan Hu, Jiang Liu

Affiliations: Department of Computer Science and Engineering and Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen, China; Department of Computer Science and Engineering and Research Institute of Trustworthy Autonomous Systems, Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Southern University of Science and Technology, Shenzhen, China

Title: Pyramid Pixel Context Adaption Network for Medical Image Classification With Supervised Contrastive Learning

Abstract:
The spatial attention (SA) mechanism has been widely incorporated into deep neural networks (DNNs), significantly lifting the performance in computer vision tasks via long-range dependency modeling. However, it may perform poorly in medical image analysis. Unfortunately, the existing efforts are often unaware that long-range dependency modeling has limitations in highlighting subtle lesion regions. To overcome this limitation, we propose a practical yet lightweight architectural unit, the pyramid pixel context adaption (PPCA) module, which exploits multiscale pixel context information to dynamically recalibrate pixel position in a pixel-independent manner. PPCA first applies a well-designed cross-channel pyramid pooling (CCPP) to aggregate multiscale pixel context information, then eliminates the inconsistency among them by the well-designed pixel normalization (PN), and finally estimates per-pixel attention weights via a pixel context integration. By embedding PPCA into a DNN with negligible overhead, the PPCA network (PPCANet) is developed for medical image classification. In addition, we introduce supervised contrastive learning to enhance feature representation by exploiting the potential of label information via supervised contrastive loss (CL). Extensive experiments on six medical image datasets show that the PPCANet outperforms state-of-the-art (SOTA) attention-based networks and recent DNNs. We also provide visual analysis and an ablation study to explain the behavior of PPCANet in the decision-making process.

PaperID: 1309,

Authors: Chenyu Li, Bing Zhang, Danfeng Hong, Xiuping Jia, Antonio Plaza, Jocelyn Chanussot

Affiliations: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; School of Engineering and Information Technology, The University of New South Wales, Canberra, ACT, Australia; Department of Technology of Computers and Communications, Escuela Politécnica, Hyperspectral Computing Laboratory, University of Extremadura, Cáceres, Spain

Title: Learning Disentangled Priors for Hyperspectral Anomaly Detection: A Coupling Model-Driven and Data-Driven Paradigm

Abstract:
Accurately distinguishing between background and anomalous objects within hyperspectral images poses a significant challenge. The primary obstacle lies in the inadequate modeling of prior knowledge, leading to a performance bottleneck in hyperspectral anomaly detection (HAD). In response to this challenge, we put forth a groundbreaking coupling paradigm that combines model-driven low-rank representation (LRR) methods with data-driven deep learning techniques by learning disentangled priors (LDP). LDP seeks to capture complete priors for effectively modeling the background, thereby extracting anomalies from hyperspectral images more accurately. LDP follows a model-driven deep unfolding architecture, where the prior knowledge is separated into the explicit low-rank prior formulated by expert knowledge and implicit learnable priors by means of deep networks. The internal relationships between explicit and implicit priors within LDP are elegantly modeled through a skip residual connection. Furthermore, we provide a mathematical proof of the convergence of our proposed model. Our experiments, conducted on multiple widely recognized datasets, demonstrate that LDP surpasses most of the current advanced HAD techniques, excelling in both detection performance and generalization capability.

PaperID: 1310,

Authors: Chen Ou, Hongqiu Zhu, Yuri A. W. Shardt, Lingjian Ye, Xiaofeng Yuan, Yalin Wang, Chunhua Yang

Affiliations: School of Automation, Central South University, Changsha, China; Department of Automation Engineering, Technical University of Ilmenau, Ilmenau, Germany; School of Engineering, Huzhou University, Huzhou, China

Title: Quality-Driven Regularization for Deep Learning Networks and Its Application to Industrial Soft Sensors

Abstract:
The growth of data collection in industrial processes has led to a renewed emphasis on the development of data-driven soft sensors. A key step in building an accurate, reliable soft sensor is feature representation. Deep networks have shown great ability to learn hierarchical data features using unsupervised pretraining and supervised fine-tuning. For typical deep networks like stacked auto-encoder (SAE), the pretraining stage is unsupervised, in which some important information related to quality variables may be discarded. In this article, a new quality-driven regularization (QR) is proposed for deep networks to learn quality-related features from industrial process data. Specifically, a QR-based SAE (QR-SAE) is developed, which changes the loss function to control the weights of the different input variables. By choosing an appropriate inductive bias for the weight matrix, the model provides quality-relevant information for predictive modeling. Finally, the proposed QR-SAE is used to predict the quality of a real industrial hydrocracking process. Comparative experiments show that QR-SAE can extract quality-related features and achieve accurate prediction performance.

PaperID: 1311,

Authors: Yuki Hirayama, Shinya Takamaeda-Yamazaki

Affiliations: Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan

Title: Scalable Moment Propagation and Analysis of Variational Distributions for Practical Bayesian Deep Learning

Abstract:
Bayesian deep learning is one of the key frameworks employed in handling predictive uncertainty. Variational inference (VI), an extensively used inference method, derives the predictive distributions by Monte Carlo (MC) sampling. The drawback of MC sampling is its extremely high computational cost compared to that of ordinary deep learning. In contrast, the moment propagation (MP)-based approach propagates the output moments of each layer to derive predictive distributions instead of MC sampling. Because of this computational property, it is expected to realize faster inference than MC-based approaches. However, the applicability of the MP-based method in deep models has not been explored sufficiently, even though some studies have demonstrated the effectiveness of MP in small toy models. One of the reasons is that it is difficult to train deep models by MP because of the large variance in activations. To realize MP in deep models, some normalization layers are required but have not yet been studied. In addition, it is still difficult to design well-calibrated MP-based models, because the effectiveness of MP-based methods under various variational distributions has also not been investigated. In this study, we propose a fast and reliable MP-based Bayesian deep-learning method. First, to train deep-learning models using MP, we introduce a batch normalization layer extended to random variables to prevent increases in the variance of activations. Second, to identify the appropriate variational distribution in MP, we investigate the treatment of moments of several variational distributions and evaluate the quality of their predictive uncertainty. Experiments with regression tasks demonstrate that the MP-based method provides qualitatively and quantitatively equivalent predictive performance to MC-based methods regardless of variational distributions. In the classification tasks, we show that we can train MP-based deep models by extended batch normalization. We also show that the MP-based approach realizes 2.0–2.8 times faster inference than the MC-based approach while maintaining the predictive performance. The results of this study can help realize a fast and well-calibrated uncertainty estimation method that can be deployed in a wider range of reliability-aware applications.
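
As a rough illustration of the moment propagation idea described in this abstract, the sketch below pushes a mean and a variance through a single linear layer in closed form instead of sampling. It is not the authors' implementation: the function name `propagate_linear` is hypothetical, and the formula assumes that all weights and inputs are mutually independent, with no bias term, activation, or normalization layer.

```python
def propagate_linear(in_mean, in_var, w_mean, w_var):
    """Propagate first and second moments through y = W x (illustrative only).

    Assumes every weight and every input is an independent random variable.
    w_mean and w_var are lists of rows, one row per output unit.
    """
    out_mean, out_var = [], []
    for row_mean, row_var in zip(w_mean, w_var):
        # E[y] = sum_i E[w_i] * E[x_i] under independence.
        mean = sum(wm * xm for wm, xm in zip(row_mean, in_mean))
        # Var[w x] = E[w]^2 Var[x] + Var[w] E[x]^2 + Var[w] Var[x],
        # and variances of independent terms add.
        var = sum(
            wm * wm * xv + wv * xm * xm + wv * xv
            for wm, wv, xm, xv in zip(row_mean, row_var, in_mean, in_var)
        )
        out_mean.append(mean)
        out_var.append(var)
    return out_mean, out_var
```

A single deterministic pass of this kind replaces the repeated stochastic forward passes that MC sampling would need, which is the source of the speedup the abstract refers to.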

PaperID: 1312,

Authors: Songqiao Hu, Zeyi Liu, Minyue Li, Xiao He

Affiliations: Department of Automation, Tsinghua University, Beijing, China; Meta Platforms, Stockholm, Sweden

Title: CADM+: Confusion-Based Learning Framework With Drift Detection and Adaptation for Real-Time Safety Assessment

Abstract:
Real-time safety assessment (RTSA) of dynamic systems holds substantial implications across diverse fields, including industrial and electronic applications. However, the complexity and rapid flow of data streams, coupled with the expensive label cost, pose significant challenges. To address these issues, a novel confusion-based learning framework, termed confusion-and-detection method plus (CADM+), is proposed in this article. When drift occurs, the model is updated with uncertain samples, which may cause confusion between existing and new concepts, resulting in performance differences. The cosine similarity is used to measure the degree of such conceptual confusion in the model. Furthermore, the change of standard deviation within a fixed-size cosine similarity window is introduced as an indicator for drift detection. Theoretical demonstrations show the asymptotic increase of cosine similarity. In addition, the change in standard deviation is shown to be approximately independent of the number of trained samples. Finally, the extreme value theory (EVT) is applied to determine the threshold for judging drifts. Several experiments are conducted to verify its effectiveness. Experimental results show that the proposed framework is more suitable for RTSA tasks than state-of-the-art algorithms. The source code is available at https://github.com/THUFDD/CADM-plus.
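
The windowed indicator mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the CADM+ code (which is in the linked repository): the helper names are hypothetical, the abstract does not say which vectors the cosine similarity compares, the change is taken here as the absolute difference between consecutive window standard deviations, and the EVT-based threshold is left out entirely.

```python
from collections import deque
from statistics import pstdev


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def std_change_indicator(similarities, window_size):
    """Return |std(current window) - std(previous window)| for each step.

    The window slides over a stream of cosine similarity values; no value
    is emitted until the window has been filled at least twice over
    consecutive steps.
    """
    window = deque(maxlen=window_size)
    previous_std = None
    changes = []
    for value in similarities:
        window.append(value)
        if len(window) < window_size:
            continue
        current_std = pstdev(window)
        if previous_std is not None:
            changes.append(abs(current_std - previous_std))
        previous_std = current_std
    return changes
```

In a drift detector built along these lines, each emitted change would be compared against a threshold; the abstract states that the threshold is derived with extreme value theory, which this sketch does not attempt.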

PaperID: 1313,

Authors: Swapnil Dey, Vipul Arora, Sachchida Nand Tripathi

Affiliations: Department of Electrical Engineering, Indian Institute of Technology at Kanpur, Kanpur, India; Department of Civil Engineering and the Department of Sustainable Energy Engineering, Indian Institute of Technology at Kanpur, Kanpur, India

Title: Leveraging Unsupervised Data and Domain Adaptation for Deep Regression in Low-Cost Sensor Calibration

Abstract:
Air quality monitoring is becoming an essential task with rising awareness about air quality. Low-cost air quality sensors are easy to deploy but are not as reliable as the costly and bulky reference monitors. The low-cost sensors can be calibrated against the reference monitors with the help of deep learning. In this article, we translate the task of sensor calibration into a semi-supervised domain adaptation problem and propose a novel solution for it. The problem is challenging, because it is a regression problem with a covariate shift and label gap. We use histogram loss instead of mean-squared or mean absolute error (MAE), which are commonly used for regression, and find it useful against covariate shift. To handle the label gap, we propose the weighting of samples for adversarial entropy optimization. In experimental evaluations, the proposed scheme outperforms many competitive baselines, which are based on semi-supervised and supervised domain adaptation, in terms of R^2 score and MAE. Ablation studies show the relevance of each proposed component in the entire scheme.
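
For readers unfamiliar with histogram loss, the sketch below shows one common form of the idea: the regression target is turned into a distribution over fixed bins, and the model is trained with cross-entropy against its predicted bin probabilities. The abstract does not specify the exact variant used, so everything here is an assumption made for illustration, including the function names, the Gaussian smoothing of the target, and the `sigma` parameter.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_histogram(y, bin_centers, sigma):
    """Spread a scalar target over the bins with a Gaussian kernel (assumed)."""
    weights = [math.exp(-0.5 * ((c - y) / sigma) ** 2) for c in bin_centers]
    total = sum(weights)
    return [w / total for w in weights]


def histogram_loss(logits, y, bin_centers, sigma=0.5):
    """Cross-entropy between the target histogram and the predicted one."""
    predicted = softmax(logits)
    target = target_histogram(y, bin_centers, sigma)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


def predicted_value(logits, bin_centers):
    """Point prediction: expected bin center under the predicted histogram."""
    return sum(p * c for p, c in zip(softmax(logits), bin_centers))
```

The sketch expects the target to lie within or near the range of `bin_centers`; a target far outside it would make the kernel weights underflow to zero.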

PaperID: 1314,

Authors: Hui He, Qi Zhang, Kun Yi, Kaize Shi, Zhendong Niu, Longbing Cao

Affiliations: School of Medical Technology, Beijing Institute of Technology, Beijing, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Data Science and Machine Intelligence Laboratory, University of Technology Sydney, Sydney, NSW, Australia; DataX Research Centre, School of Computing, Macquarie University, Sydney, NSW, Australia

Title: Distributional Drift Adaptation With Temporal Conditional Variational Autoencoder for Multivariate Time Series Forecasting

Abstract:
Due to their nonstationary nature, the distribution of real-world multivariate time series (MTS) changes over time, which is known as distribution drift. Most existing MTS forecasting models suffer greatly from distribution drift, and their forecasting performance degrades over time. Existing methods address distribution drift via adapting to the latest arrived data or self-correcting per the meta knowledge derived from future data. Despite their great success in MTS forecasting, these methods hardly capture the intrinsic distribution changes, especially from a distributional perspective. Accordingly, we propose a novel framework, the temporal conditional variational autoencoder (TCVAE), to model the dynamic distributional dependencies over time between historical observations and future data in MTSs and infer the dependencies as a temporal conditional distribution to leverage latent variables. Specifically, a novel temporal Hawkes attention (THA) mechanism represents temporal factors that are subsequently fed into feedforward networks to estimate the prior Gaussian distribution of latent variables. The representation of temporal factors further dynamically adjusts the structures of the Transformer-based encoder and decoder to distribution changes by leveraging a gated attention mechanism (GAM). Moreover, we introduce conditional continuous normalization flow (CCNF) to transform the prior Gaussian to a complex and form-free distribution to facilitate flexible inference of the temporal conditional distribution. Extensive experiments conducted on six real-world MTS datasets demonstrate the TCVAE’s superior robustness and effectiveness over the state-of-the-art MTS forecasting baselines. We further illustrate the TCVAE’s applicability through multifaceted case studies and visualization in real-world scenarios.

PaperID: 1315,

Authors: Ronghua Shang, Jingyu Zhong, Weitong Zhang, Songhua Xu, Yangyang Li

Affiliations: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an, Shaanxi, China; Department of Health Management and the Institute of Medical Artificial Intelligence, The Second Affiliated Hospital of Xi’an Jiaotong University, Xi’an, China

Title: Multilabel Feature Selection via Shared Latent Sublabel Structure and Simultaneous Orthogonal Basis Clustering

Abstract:
Multilabel feature selection solves the dimension distress of high-dimensional multilabel data by selecting the optimal subset of features. Noisy and incomplete labels of raw multilabel data hinder the acquisition of label-guided information. In existing approaches, mapping the label space to a low-dimensional latent space by semantic decomposition to mitigate label noise is considered an effective strategy. However, the decomposed latent label space contains redundant label information, which misleads the capture of potential label relevance. To eliminate the effect of redundant information on the extraction of latent label correlations, a novel method named SLOFS via shared latent sublabel structure and simultaneous orthogonal basis clustering for multilabel feature selection is proposed. First, a latent orthogonal base structure shared (LOBSS) term is engineered to guide the construction of a redundancy-free latent sublabel space via the separated latent clustering center structure. The LOBSS term simultaneously retains latent sublabel information and latent clustering center structure. Moreover, the structure and relevance information of nonredundant latent sublabels are fully explored. The introduction of graph regularization ensures structural consistency in the data space and latent sublabels, thus helping the feature selection process. SLOFS employs a dynamic sublabel graph to obtain a high-quality sublabel space and uses regularization to constrain label correlations on dynamic sublabel projections. Finally, an effective and provably convergent optimization scheme is proposed to solve the SLOFS method. Experimental studies on 18 datasets demonstrate that the presented method performs consistently better than previous feature selection methods.

PaperID: 1316,

Authors: Zefeng Lu, Ronghao Lin, Haifeng Hu

Affiliations: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China

Title: Disentangling Modality and Posture Factors: Memory-Attention and Orthogonal Decomposition for Visible-Infrared Person Re-Identification

Abstract:
Striving to match the person identities between visible (VIS) and near-infrared (NIR) images, VIS-NIR reidentification (Re-ID) has attracted increasing attention due to its wide applications in low-light scenes. However, owing to the modality and pose discrepancies exhibited in heterogeneous images, the extracted representations inevitably comprise various modality and posture factors, impacting the matching of cross-modality person identity. To solve the problem, we propose a disentangling modality and posture factors (DMPF) model to disentangle modality and posture factors by fusing the information of feature memory and pedestrian skeleton. Specifically, the DMPF comprises three modules: three-stream features extraction network (TFENet), modality factor disentanglement (MFD), and posture factor disentanglement (PFD). First, aiming to provide memory and skeleton information for modality and posture factors disentanglement, the TFENet is designed as a three-stream network to extract VIS-NIR image features and skeleton features. Second, to eliminate modality discrepancy across different batches, we maintain memory queues of previous batch features through the momentum updating mechanism and propose MFD to integrate features in the whole training set by memory-attention layers. These layers explore intramodality and intermodality relationships between features from the current batch and memory queues under the optimization of the optimal transport (OT) method, which encourages the heterogeneous features with the same identity to present higher similarity. Third, to decouple the posture factors from representations, we introduce the PFD module to learn posture-unrelated features with the assistance of the skeleton features. Besides, we perform subspace orthogonal decomposition on both image and skeleton features to separate the posture-related and identity-related information. The posture-related features are adopted to disentangle the posture factors from representations by a designed posture-features consistency (PfC) loss, while the identity-related features are concatenated to obtain more discriminative identity representations. The effectiveness of DMPF is validated through comprehensive experiments on two VIS-NIR pedestrian Re-ID datasets.

PaperID: 1317,

Authors: Mingduo Lin, Bo Zhao, Derong Liu

Title: Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis

Abstract:
Optimal learning output tracking control (OLOTC) in a model-free manner has received increasing attention in both the intelligent control and the reinforcement learning (RL) communities. Although model-free tracking control has been achieved via off-policy learning and Q-learning, another popular RL idea, direct policy learning, is still rarely considered despite being easy to implement. To fill this gap, this article develops a novel model-free policy optimization (PO) algorithm to achieve the OLOTC for unknown linear discrete-time (DT) systems. The iterative control policy is parameterized to directly improve the discounted value function of the augmented system via a gradient-based method. To implement this algorithm in a model-free manner, a two-point policy gradient (PG) algorithm is designed to approximate the gradient of the discounted value function from the sampled states and the reference trajectories. The global convergence of the model-free PO algorithm to the optimal value function is demonstrated given a sufficient number of samples and proper conditions. Finally, numerical simulation results are provided to validate the effectiveness of the present method.
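The two-point gradient idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a plain state-feedback regulation problem rather than the augmented tracking system, it uses a finite-horizon rollout as a stand-in for the discounted value function, and the names `discounted_cost` and `two_point_gradient`, along with all default parameters, are hypothetical. The estimator perturbs the policy parameters in a random unit direction, evaluates the cost at both perturbed points, and scales the difference.

```python
import numpy as np


def discounted_cost(K, A, B, Q, R, x0, gamma=0.9, horizon=50):
    """Finite-horizon approximation of the discounted cost of u = -K x.

    Assumption: the rollout only queries the system (A, B) as a simulator,
    the way a model-free method would query the real plant.
    """
    x, cost = np.array(x0, dtype=float), 0.0
    for k in range(horizon):
        u = -K @ x
        cost += gamma**k * (x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return float(cost)


def two_point_gradient(f, theta, radius=0.05, num_samples=200, rng=None):
    """Zeroth-order two-point estimate of the gradient of f at theta.

    Each sample draws a direction u uniformly on the unit sphere and uses
    (d / (2 r)) * (f(theta + r u) - f(theta - r u)) * u, where d is the
    number of parameters. Only function values of f are needed.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = theta.size
    grad = np.zeros(theta.shape, dtype=float)
    for _ in range(num_samples):
        u = rng.standard_normal(theta.shape)
        u /= np.linalg.norm(u)  # Frobenius norm for matrix-shaped theta
        delta = f(theta + radius * u) - f(theta - radius * u)
        grad += (d / (2.0 * radius)) * delta * u
    return grad / num_samples
```

A policy-gradient step would then read `K = K - step * two_point_gradient(lambda G: discounted_cost(G, A, B, Q, R, x0), K)`, with `step` a hypothetical learning rate. The estimate is noisy, so the number of samples per step trades accuracy against the number of rollouts.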

PaperID: 1318,

Authors: Xia-An Bi, Ke Chen, Siyu Jiang, Sheng Luo, Wenyan Zhou, Zhao-Xu Xing, Luyun Xu, Zhengliang Liu, Tianming Liu

Affiliations: College of Information Science and Engineering and the Key Laboratory of Computing and Stochastic Mathematics (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Hunan, Changsha, China; College of Business, Hunan Normal University, Changsha, China; Department of Computer Science and Bioimaging Research Center, The University of Georgia, Athens, GA, USA

Title: Community Graph Convolution Neural Network for Alzheimer's Disease Classification and Pathogenetic Factors Identification

Abstract:
The brain is a complex neural network system in which brain regions and genes collaborate to effectively store and transmit information. We abstract the collaboration correlations as the brain region gene community network (BG-CN) and present a new deep learning approach, the community graph convolutional neural network (Com-GCN), for investigating the transmission of information within and between communities. The results can be used for diagnosing and extracting causal factors for Alzheimer’s disease (AD). First, an affinity aggregation model for BG-CN is developed to describe intercommunity and intracommunity information transmission. Second, we design the Com-GCN architecture with intercommunity convolution and intracommunity convolution operations based on the affinity aggregation model. Through sufficient experimental validation on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, we show that the design of Com-GCN matches the physiological mechanism better and improves the interpretability and classification performance. Furthermore, Com-GCN can identify lesioned brain regions and disease-causing genes, which may assist precision medicine and drug design in AD and serve as a valuable reference for other neurological disorders.

PaperID: 1319,

Authors: Xiaokang Zhou, Xuzhe Zheng, Tian Shu, Wei Liang, Kevin I-Kai Wang, Lianyong Qi, Shohei Shimizu, Qun Jin

Affiliations: Faculty of Data Science, Shiga University, Hikone, Japan; School of Frontier Crossover Studies, Hunan University of Technology and Business, Changsha, China; Computer Science Institute, Hunan University of Technology and Business, Changsha, China; Changsha Social Laboratory of Artificial Intelligence, Hunan University of Technology and Business, Changsha, China; Department of Electrical, Computer, Software Engineering, University of Auckland, Auckland, New Zealand; College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China; Faculty of Human Sciences, Waseda University, Tokorozawa, Japan

Title: Information Theoretic Learning-Enhanced Dual-Generative Adversarial Networks With Causal Representation for Robust OOD Generalization

Abstract:
Recently, machine/deep learning techniques have achieved remarkable success in a variety of intelligent control and management systems, promising to change the future of artificial intelligence (AI) scenarios. However, they still suffer from intractable difficulties or limitations in model training, such as the out-of-distribution (OOD) issue, in modern smart manufacturing or intelligent transportation systems (ITSs). In this study, we design and introduce a deep generative model framework, which seamlessly incorporates information theoretic learning (ITL) and causal representation learning (CRL) in a dual-generative adversarial network (Dual-GAN) architecture, aiming to enhance robust OOD generalization in modern machine learning (ML) paradigms. In particular, an ITL- and CRL-enhanced Dual-GAN (ITCRL-DGAN) model is presented, which includes an autoencoder with CRL (AE-CRL) structure to aid the dual-adversarial training with causality-inspired feature representations and a Dual-GAN structure to improve the data augmentation at both the feature and data levels. Following a newly designed feature separation strategy, a causal graph is built and improved based on information theory, which can enhance the causally related factors among the separated core features and further enrich the feature representation with the counterfactual features via interventions based on the refined causal relationships. The ITL is incorporated to improve the extraction of low-dimensional feature representations and learn the optimized causal representations based on the idea of “information flow.” A dual-adversarial training mechanism is then developed, which not only enables the generator to expand the boundary of feature distribution in accordance with the optimized feature representation from AE-CRL, but also allows the discriminator to further verify and improve the quality of the augmented data for OOD generalization. Experimental and evaluation results based on an open-source dataset demonstrate the outstanding learning efficiency and classification performance of our proposed model for robust OOD generalization in modern smart applications compared with three baseline methods.

PaperID: 1320,

Authors: Ke Wang, Mingjia Zhu, Zicong Chen, Jian Weng, Ming Li, Siu-Ming Yiu, Weiping Ding, Tianlong Gu

Affiliations: College of Information and Science, Jinan University, Guangzhou, China; College of Cyber Security, Jinan University, Guangzhou, China; Department of Computer Science, The University of Hong Kong, Hong Kong, China; School of Information and Science, Nantong University, Nantong, China; Engineering Research Center of Trustworthy AI, Ministry of Education, Jinan University, Guangzhou, China

Title: A Statistical Physics Perspective: Understanding the Causality Behind Convolutional Neural Network Adversarial Vulnerability

Abstract:
The adversarial vulnerability of convolutional neural networks (CNNs) refers to the performance degradation of CNNs under adversarial attacks, leading to incorrect decisions. However, the causes of adversarial vulnerability in CNNs remain unknown. To address this issue, we propose a unique cross-scale analytical approach from a statistical physics perspective. It reveals that the huge amount of nonlinear effects inherent in CNNs is the fundamental cause for the formation and evolution of system vulnerability. Vulnerability is spontaneously formed on the macroscopic level after the symmetry of the system is broken through the nonlinear interaction between microscopic state order parameters. We develop a cascade failure algorithm, visualizing how micro perturbations on neurons’ activation can cascade and influence macro decision paths. Our empirical results demonstrate the interplay between microlevel activation maps and macrolevel decision-making and provide a statistical physics perspective to understand the causality behind CNN vulnerability. Our work will help subsequent research to improve the adversarial robustness of CNNs.

PaperID: 1321,

Authors: Xingqun Qi, Muyi Sun, Zijian Wang, Jiaming Liu, Qi Li, Fang Zhao, Shanghang Zhang, Caifeng Shan

Affiliations: Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Computer Science, The University of Sydney, Sydney, NSW, Australia; National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China; School of Intelligence Science and Technology, Nanjing University, Nanjing, China

Title: Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network With Graph Representation Learning

Abstract:
Biphasic face photo-sketch synthesis has significant practical value in wide-ranging fields such as digital entertainment and law enforcement. Previous approaches, which directly generate the photo-sketch in a global view, suffer from the low quality of sketches and complex photograph variations, leading to unnatural and low-fidelity results. In this article, we propose a novel semantic-driven generative adversarial network, combined with graph representation learning, to address the above issues. Considering that human faces have distinct spatial structures, we first inject class-wise semantic layouts into the generator to provide style-based spatial information for synthesized face photographs and sketches. In addition, to enhance the authenticity of details in generated faces, we construct two types of representational graphs via semantic parsing maps upon input faces, dubbed the intraclass semantic graph (IASG) and the interclass structure graph (IRSG). Specifically, the IASG effectively models the intraclass semantic correlations of each facial semantic component, thus producing realistic facial details. To keep the generated faces structurally coordinated, the IRSG models interclass structural relations among every facial component by graph representation learning. To further enhance the perceptual quality of synthesized images, we present a biphasic interactive cycle training strategy that takes full advantage of the multilevel feature consistency between the photograph and sketch. Extensive experiments demonstrate that our method outperforms the state-of-the-art competitors on the CUHK Face Sketch (CUFS) and CUHK Face Sketch FERET (CUFSF) datasets.

PaperID: 1322,

Authors: Zhiwen Xiao, Huagang Tong, Rong Qu, Huanlai Xing, Shouxi Luo, Zonghai Zhu, Fuhong Song, Li Feng

Affiliations: School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China; College of Economic and Management, Nanjing Tech University, Nanjing, China; School of Computer Science, University of Nottingham, Nottingham, U.K.; School of Information, Guizhou University of Finance and Economics, Guiyang, China

Title: CapMatch: Semi-Supervised Contrastive Transformer Capsule With Feature-Based Knowledge Distillation for Human Activity Recognition

Abstract:
This article proposes CapMatch, a semi-supervised contrastive capsule transformer method with feature-based knowledge distillation (KD) that simplifies the existing semi-supervised learning (SSL) techniques for wearable human activity recognition (HAR). CapMatch gracefully hybridizes supervised learning and unsupervised learning to extract rich representations from input data. In unsupervised learning, CapMatch leverages the pseudolabeling, contrastive learning (CL), and feature-based KD techniques to construct similarity learning on lower and higher level semantic information extracted from two augmented versions of the data, “weak” and “timecut,” to recognize the relationships among the obtained features of classes in the unlabeled data. CapMatch combines the outputs of the weak- and timecut-augmented models to form pseudolabeling and thus CL. Meanwhile, CapMatch uses the feature-based KD to transfer knowledge from the intermediate layers of the weak-augmented model to those of the timecut-augmented model. To effectively capture both local and global patterns of HAR data, we design a capsule transformer network consisting of four capsule-based transformer blocks and one routing layer. Experimental results show that compared with a number of state-of-the-art semi-supervised and supervised algorithms, the proposed CapMatch achieves decent performance on three commonly used HAR datasets, namely, HAPT, WISDM, and UCI_HAR. With only 10% of data labeled, CapMatch achieves $F_1$ values higher than 85.00% on these datasets, outperforming 14 semi-supervised algorithms. When the proportion of labeled data reaches 30%, CapMatch obtains $F_1$ values of no lower than 88.00% on the datasets above, which is better than several classical supervised algorithms, e.g., decision tree and $k$-nearest neighbor (KNN).

PaperID: 1323,

Authors: Hideaki Hayashi

Affiliations: Institute for Datability Science, Osaka University, Suita, Japan

Title: A Hybrid of Generative and Discriminative Models Based on the Gaussian-Coupled Softmax Layer

Abstract:
Generative models offer advantageous characteristics for classification tasks, such as the availability of unsupervised data and calibrated confidence. In contrast, discriminative models have advantages in terms of their potential to outperform their generative counterparts and the simplicity of their model structures and learning algorithms. In this article, we propose a method to train a hybrid of discriminative and generative models in a single neural network (NN), which exhibits the characteristics of both models. The key idea is the Gaussian-coupled softmax layer, which is a fully connected layer with a softmax activation function coupled with Gaussian distributions. This layer can be embedded into an NN-based classifier and allows the classifier to estimate both the class posterior distribution and the input data distribution. We demonstrate that the proposed hybrid model can be applied to semi-supervised learning and confidence calibration.

PaperID: 1324,

Authors: Cong Jin, Cong Luo, Ming Yan, Guangzhe Zhao, Guixuan Zhang, Shuwu Zhang

Affiliations: School of Information and Communication Engineering and the State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, China; School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China; Institute of Automation (IA), Chinese Academy of Sciences (CAS), Beijing, China; School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China

Title: Weakening the Dominant Role of Text: CMOSI Dataset and Multimodal Semantic Enhancement Network

Abstract:
Multimodal sentiment analysis (MSA) is important for quickly and accurately understanding people’s attitudes and opinions about an event. However, existing sentiment analysis methods suffer from the dominant contribution of text modality in the dataset; this is called text dominance. In this context, we emphasize that weakening the dominant role of text modality is important for MSA tasks. To solve the above two problems, from the perspective of datasets, we first propose the Chinese multimodal opinion-level sentiment intensity (CMOSI) dataset. Three different versions of the dataset were constructed: manually proofreading subtitles, generating subtitles using machine speech transcription, and generating subtitles using human cross-language translation. The latter two versions radically weaken the dominant role of the textual model. We randomly collected 144 real videos from the Bilibili video site and manually edited 2557 clips containing emotions from them. From the perspective of network modeling, we propose a multimodal semantic enhancement network (MSEN) based on a multiheaded attention mechanism by taking advantage of the multiple versions of the CMOSI dataset. Experiments with our proposed CMOSI show that the network performs best with the text-unweakened version of the dataset. The loss of performance is minimal on both versions of the text-weakened dataset, indicating that our network can fully exploit the latent semantics in nontext patterns. In addition, we conducted model generalization experiments with MSEN on MOSI, MOSEI, and CH-SIMS datasets, and the results show that our approach is also very competitive and has good cross-language robustness.

PaperID: 1325,

Authors: Minghao Han, Kiwan Wong, Jacob Euler-Rolle, Lixian Zhang, Robert K. Katzschmann

Affiliations: Soft Robotics Laboratory, ETH Zürich, Zürich, Switzerland; Department of Control Science and Technology, Harbin Institute of Technology, Harbin, China

Title: Robust Learning-Based Control for Uncertain Nonlinear Systems With Validation on a Soft Robot

Abstract:
Existing modeling and control methods for real-world systems typically deal with uncertainty and nonlinearity on a case-by-case basis. We present a universal and robust control framework for the general class of uncertain nonlinear systems. Our data-driven deep stochastic Koopman operator (DeSKO) model and robust learning control framework guarantee robust stability. DeSKO learns the uncertainty of dynamical systems by inferring a distribution of observables. The inferred distribution is used in our robust and stabilizing closed-loop controller for dynamical systems. We also develop a model predictive control framework with integral action to compensate for run-time parametric uncertainty, such as manipulating unknown objects. Modeling and control experiments in simulation show that our presented framework is more robust and scalable for robotic systems than state-of-the-art controllers using deep Koopman operators and reinforcement learning (RL) methods. We demonstrate that our method resists previously unseen uncertainties, such as external disturbances, at a magnitude of up to five times the maximum control input. Furthermore, we test our DeSKO-based control framework on a real-world soft robotic arm. It shows that our framework outperforms model-based controllers that have full knowledge of the model parameters, and the controller can conduct object pick-and-place tasks without further training. Our approach opens up new possibilities in robustly managing internal or external uncertainty while controlling high-dimensional nonlinear systems in a learning framework. This approach serves as a foundation to greatly simplify high-level control and decision-making for robots.

PaperID: 1326,

Authors: Zhibin Zhang, Wanli Xue, Kaihua Zhang, Bo Liu, Chengwei Zhang, Jingen Liu, Shengyong Chen

Affiliations: School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China; JD Finance America Corporation, Mountain View, CA, USA; School of Information Science and Technology, Dalian Maritime University, Dalian, China

Title: Learning Self-Corrective Network via Adaptive Self-Labeling and Dynamic NMS for High-Performance Long-Term Tracking

Abstract:
This article presents a self-corrective network-based long-term tracker (SCLT) including a self-modulated tracking reliability evaluator (STRE) and a self-adjusting proposal postprocessor (SPPP). The targets in long-term sequences often suffer from severe appearance variations. Existing long-term trackers often update their models online to adapt to the variations, but inaccurate tracking results introduce cumulative error into the updated model, which may cause severe drift. To this end, a robust long-term tracker should have a self-corrective capability: it should judge whether the tracking result is reliable, and it should be able to recapture the target when severe drift is caused by serious challenges (e.g., full occlusion and out-of-view). To address the first issue, the STRE designs an effective tracking reliability classifier that is built on a modulation subnetwork. The classifier is trained using the samples with pseudo labels generated by an adaptive self-labeling strategy. The adaptive self-labeling can automatically label the hard negative samples that are often neglected in existing trackers according to the statistical characteristics of target state, and the network modulation mechanism can guide the backbone network to learn more discriminative features without extra training data. To address the second issue, after the STRE has been triggered, the SPPP follows it with a dynamic non-maximum suppression (NMS) to recapture the target promptly and accurately. In addition, the STRE and the SPPP demonstrate good transportability: they improve performance when combined with multiple baselines. Compared to the commonly used greedy NMS, the proposed dynamic NMS leverages an adaptive strategy to effectively handle the different conditions of in view and out of view, thereby being able to select the most probable object box, which is essential for accurately updating the basic tracker online. Extensive evaluations on four large-scale and challenging benchmark datasets including VOT2021LT, OxUvALT, TLP, and LaSOT demonstrate the superiority of the proposed SCLT over a variety of state-of-the-art long-term trackers in terms of all measures. Source codes and demos can be found at https://github.com/TJUT-CV/SCLT.
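For readers unfamiliar with the baseline this abstract compares against, the following is a minimal sketch of standard greedy NMS. It is not the proposed dynamic NMS, whose adaptive rule the abstract does not specify. The function names `iou` and `greedy_nms`, the `[x1, y1, x2, y2]` box layout, and the 0.5 threshold are assumptions made for illustration, and boxes are assumed to have positive area.

```python
import numpy as np


def iou(box, boxes):
    """Intersection over union between one box and an array of boxes.

    Boxes use the assumed layout [x1, y1, x2, y2] with x2 > x1 and y2 > y1.
    """
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(np.asarray(scores, dtype=float))[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        # A fixed threshold decides suppression; this is the part a dynamic
        # variant would presumably adapt to the tracking state.
        order = rest[overlaps <= iou_threshold]
    return keep
```

Because the threshold here is fixed, the same rule applies whether the target is in view or out of view, which is the limitation the abstract attributes to greedy NMS.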

PaperID: 1327,

Authors: Zhong Ji, Zhanyu Jiao, Qiang Wang, Yanwei Pang, Jungong Han

Affiliations: School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Department of Computer Science, The University of Sheffield, Sheffield, U.K.

Title: Imbalance Mitigation for Continual Learning via Knowledge Decoupling and Dual Enhanced Contrastive Learning

Abstract:
Continual learning (CL) aims at studying how to learn new knowledge continuously from data streams without catastrophically forgetting the previous knowledge. One of the key problems is catastrophic forgetting, that is, the performance of the model on previous tasks declines significantly after learning the subsequent task. Several studies addressed it by replaying samples stored in the buffer when training new tasks. However, the data imbalance between old and new task samples results in two serious problems: information suppression and weak feature discriminability. The former refers to the information in the sufficient new task samples suppressing that in the old task samples, which is harmful to maintaining the knowledge since the biased output worsens the consistency of the same sample’s output at different moments. The latter refers to the feature representation being biased to the new task, which lacks discrimination to distinguish both old and new tasks. To this end, we build an imbalance mitigation for CL (IMCL) framework that incorporates a decoupled knowledge distillation (DKD) approach and a dual enhanced contrastive learning (DECL) approach to tackle both problems. Specifically, the DKD approach alleviates the suppression of the new task on the old tasks by decoupling the model output probability during the replay stage, which better maintains the knowledge of old tasks. The DECL approach enhances both low- and high-level features and fuses the enhanced features to construct contrastive loss to effectively distinguish different tasks. Extensive experiments on three popular datasets show that our method achieves promising performance under task incremental learning (Task-IL), class incremental learning (Class-IL), and domain incremental learning (Domain-IL) settings.

PaperID: 1328,

Authors: Kangping Gao, Jianquan Lu, Wei Xing Zheng, Xiangyong Chen

Affiliations: School of Mathematics, Southeast University, Nanjing, China; School of Computer, Data and Mathematical Sciences, Western Sydney University, Sydney, NSW, Australia; School of Automation and Electrical Engineering, Linyi University, Linyi, China

Title: Synchronization in Coupled Neural Networks With Hybrid Delayed Impulses: Average Impulsive Delay-Gain Method

Abstract:
In this article, we propose a new concept called average impulsive delay-gain (AIDG) for studying the synchronization of coupled neural networks (CNNs). Based on the viewpoints of impulsive control and impulsive perturbation, we establish some globally exponential synchronization criteria for CNNs. Our methods are well-suited for addressing the synchronization problems of systems subject to hybrid delayed impulses with time-varying impulsive delay and gain. Moreover, we prove that the AIDG has both positive and negative effects on synchronization. Compared to existing research, our conclusions are more applicable and less conservative as the considered hybrid delayed impulses involve more flexible cases. Finally, we validate the effectiveness of our proposed results by applying them to small-world and scale-free network models.

PaperID: 1329,

Authors: Junwei Sun, Yu Zhai, Peng Liu, Yanfeng Wang

Affiliations: School of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China

Title: Memristor-Based Neural Network Circuit of Associative Memory With Overshadowing and Emotion Congruent Effect

Abstract:
Most memristor-based neural network circuits consider only a single pattern of overshadowing or emotion, but the relationship between overshadowing and emotion is ignored. In this article, a memristor-based neural network circuit of associative memory with overshadowing and emotion congruent effect is designed, and overshadowing under multiple emotions is taken into account. The designed circuit mainly consists of an emotion module, a memory module, an inhibition module, and a feedback module. The generation and recovery of different emotions are realized by the emotion module. The functions of overshadowing under different emotions and recovery from overshadowing are achieved by the inhibition module and the memory module. Finally, the blocking caused by long-term overshadowing is implemented by the feedback module. The proposed circuit can be applied to bionic emotional robots and offers some references for brain-like systems.

PaperID: 1330,

Authors: Tatsuya Akutsu, Avraham A. Melkman

Affiliations: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan; Department of Computer Science, Ben-Gurion University of the Negev, Be’er-Sheva, Israel

Title: On the Size and Width of the Decoder of a Boolean Threshold Autoencoder

Abstract:
In this brief paper, we study the size and width of autoencoders consisting of Boolean threshold functions, where an autoencoder is a layered neural network whose structure can be viewed as consisting of an encoder, which compresses an input vector to a lower dimensional vector, and a decoder, which transforms the low-dimensional vector back to the original input vector exactly (or approximately). We focus on the decoder part and show that $\Omega(\sqrt{Dn/d})$ and $O(\sqrt{Dn})$ nodes are required to transform $n$ vectors in $d$-dimensional binary space to $D$-dimensional binary space. We also show that the width can be reduced if we allow small errors, where the error is defined as the average of the Hamming distance between each vector input to the encoder part and the resulting vector output by the decoder.
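As a rough illustration of how far apart the two bounds are, the worked instance below plugs hypothetical values into them. The values n = 1024, d = 16, and D = 64 are assumptions chosen only to keep the arithmetic clean, and asymptotic notation hides constant factors, so the results indicate growth rates rather than actual node counts. The second display writes out the average Hamming error described in the abstract, with $x_i$ and $\hat{x}_i$ as assumed symbols for the i-th encoder input and its decoder output.

```latex
\[
\sqrt{\frac{Dn}{d}} = \sqrt{\frac{64 \cdot 1024}{16}} = \sqrt{4096} = 64,
\qquad
\sqrt{Dn} = \sqrt{64 \cdot 1024} = \sqrt{65536} = 256 .
\]
\[
\mathrm{err} = \frac{1}{n} \sum_{i=1}^{n} d_H\bigl(x_i, \hat{x}_i\bigr)
\]
```

In this instance the lower and upper expressions differ by a factor of $\sqrt{d} = 4$, which is the gap between the two bounds in general.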

PaperID: 1331,

Authors: Zhaoyang Sun, Yaxiong Chen, Shengwu Xiong

Affiliations: School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China

Title: SSAT++: A Semantic-Aware and Versatile Makeup Transfer Network With Local Color Consistency Constraint

Abstract:
The purpose of makeup transfer (MT) is to transfer makeup from a reference image to a target face while preserving the target’s content. Existing methods have made remarkable progress in generating realistic results but do not perform well in terms of semantic correspondence and color fidelity. In addition, the straightforward extension of processing videos frame by frame tends to produce flickering results in most methods. These limitations restrict the applicability of previous methods in real-world scenarios. To address these issues, we propose a symmetric semantic-aware transfer network (SSAT++) to improve makeup similarity and video temporal consistency. For MT, the feature fusion (FF) module first integrates the content and semantic features of the input images, producing multiscale fusion features. Then, the semantic correspondence from the reference to the target is obtained by measuring the correlation of fusion features at each position. According to semantic correspondence, the symmetric mask semantic transfer (SMST) module aligns the reference makeup features with the target content features to generate MT results. Meanwhile, the semantic correspondence from the target to the reference is obtained by transposing the correlation matrix and applied to the makeup removal task. To enhance color fidelity, we propose a novel local color loss that forces the transferred results to have the same color histogram distribution as the reference. Furthermore, a morphing simulation is designed to ensure temporal consistency for video MT without requiring additional video frame input and optical flow estimation. To evaluate the effectiveness of our SSAT++, extensive experiments have been conducted on the MT dataset, which has a variety of makeup styles, and on the MT-Wild dataset, which contains images with diverse poses and expressions. The experiments show that SSAT++ outperforms existing MT methods in both qualitative and quantitative evaluation and provides more flexible makeup control. Code and the trained model will be available at https://gitee.com/sunzhaoyang0304/ssat-msp and https://github.com/Snowfallingplum/SSAT.

PaperID: 1332,

Authors: Weilin Chen, Jie Qiao, Ruichu Cai, Zhifeng Hao

Affiliations: School of Computer, Guangdong University of Technology, Guangzhou, China; School of Computer Science, Guangdong University of Technology, Guangzhou, China; College of Science, Shantou University, Guangdong, China

Title: On the Role of Entropy-Based Loss for Learning Causal Structure With Continuous Optimization

Abstract:
Causal discovery from observational data is an important but challenging task in many scientific fields. A recent line of work formulates the structure learning problem as a continuous constrained optimization task using an algebraic characterization of directed acyclic graphs (DAGs) and the least-square loss function. Though the least-square loss function is well justified under the standard Gaussian noise assumption, it is limited if the assumption does not hold. In this work, we theoretically show that the violation of the Gaussian noise assumption will hinder the causal direction identification, making the causal orientation fully determined by the causal strength as well as the variances of noises in the linear case and by the strong non-Gaussian noises in the nonlinear case. Consequently, we propose a more general entropy-based loss that is theoretically consistent with the likelihood score under any noise distribution. We run extensive empirical evaluations on both synthetic data and real-world data to validate the effectiveness of the proposed method and show that our method achieves the best results on the structural Hamming distance, false discovery rate (FDR), and true-positive rate (TPR) metrics.
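The continuous formulation that this abstract builds on can be sketched as follows. The abstract does not name the algebraic characterization it refers to, so this sketch assumes the trace-of-matrix-exponential form popularized by the NOTEARS line of work, with the series truncated at d terms (the truncation keeps the same zero set, because any cycle in a d-node graph has length at most d). The function names are hypothetical, and the entropy-based loss proposed in the paper is not shown because the abstract does not give its form.

```python
import numpy as np


def acyclicity_penalty(W):
    """Return sum_{k=1..d} tr((W * W)^k) / k! for a d x d weight matrix W.

    W * W is the elementwise square, so all entries are nonnegative and
    tr(M^k) sums the weights of closed walks of length k. The penalty is
    zero exactly when the weighted graph of W has no directed cycle.
    """
    d = W.shape[0]
    M = W * W
    power = np.eye(d)
    factorial = 1.0
    total = 0.0
    for k in range(1, d + 1):
        power = power @ M
        factorial *= k
        total += np.trace(power) / factorial
    return float(total)


def least_squares_loss(X, W):
    """Least-square loss (1 / 2n) * ||X - X W||_F^2 for a linear model.

    X has one sample per row; column j of W holds the weights of the
    parents of variable j.
    """
    n = X.shape[0]
    residual = X - X @ W
    return 0.5 / n * float(np.sum(residual**2))
```

Structure learning in this style minimizes the data-fit loss subject to the penalty being zero. The abstract's argument is that the least-square term is the part that depends on the Gaussian noise assumption.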

PaperID: 1333,

Authors: Cheng Zhang, Hai Wang, Long Chen, Yicheng Li, Yingfeng Cai

Affiliations: School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang, China; Automotive Engineering Research Institute, Jiangsu University, Zhenjiang, China

Title: MixedFusion: An Efficient Multimodal Data Fusion Framework for 3-D Object Detection and Tracking

Abstract:
The performance of environmental perception is critical for the safe driving of intelligent connected vehicles (ICVs). Currently, the most prevalent technical solutions are based on multimodal data fusion to achieve a comprehensive perception of the surrounding environment. However, existing fusion perception methods suffer from issues such as low sensor data utilization and unreasonable fusion strategies, which severely limit their performance in adverse weather conditions. To address these issues, this article proposes a novel multimodal data fusion framework called MixedFusion. In this framework, we introduce two innovative fusion strategies tailored to the data characteristics of each sensor: high-level semantic guidance (HLSG) and multipriority matching (MPM). Together, these strategies not only use the multimodal data efficiently but also achieve complementary fusion across modalities. We perform extensive experiments on the nuScenes and K-radar datasets. The experimental results demonstrate that the fusion framework proposed in this article significantly improves the performance of 3-D object detection and tracking in severe weather conditions.

PaperID: 1334,

Authors: Mincan Li, Zidong Wang, Kenli Li, Xiangke Liao

Affiliations: College of Computer Science and Electronic Engineering, National Supercomputing Center, Hunan University, Changsha, Hunan, China; Department of Computer Science, Brunel University London, Uxbridge, U.K.; Collaborative Innovation Center of High Performance Computing, National University of Defense Technology, Changsha, China

Title: Multiagent-System-Based Attention Mechanism for Predicting Product Popularity: Handling Positive-Negative Diffusion Over Social Networks

Abstract:
This brief is concerned with the prediction problem of product popularity under a social network (SN) with positive-negative diffusion (PND). First, a PND model is proposed to enable the simulation of product diffusion, and three user states are defined. Second, an optimal and precise feature vector of every user is extracted through a multi-agent-system-based attention mechanism (MASAM) that is devised. The weight matrix shared in the mechanism of all agents is learned using a distributed learning algorithm provided in MASAM. Third, an MAS model for product diffusion on SN is established based on the feature representations from MASAM. Rules for agent interaction during PND diffusion are suggested, which accelerate the simulation of information spread in SN. Finally, comprehensive experiments are conducted to verify the effectiveness and efficiency of the proposed models and algorithms in prediction and to compare their performance with baseline methods. Furthermore, a case study is provided to illustrate the applicability and extendibility of the developed algorithm.

PaperID: 1335,

Authors: Enrico Civitelli, Alessio Sortino, Matteo Lapucci, Francesco Bagattini, Giulio Galvan

Affiliations: Dipartimento di Ingegneria dell’Informazione, Università di Firenze, Florence, Italy; Flair-Tech, Florence, Italy

Title: A Robust Initialization of Residual Blocks for Effective ResNet Training Without Batch Normalization

Abstract:
Batch normalization is an essential component of all state-of-the-art neural network architectures. However, since it introduces many practical issues, much recent research has been devoted to designing normalization-free architectures. In this brief, we show that weight initialization is key to training ResNet-like normalization-free networks. In particular, we propose a slight modification to the operation that sums a block's output with the skip-connection branch, so that the whole network is correctly initialized. We show that this modified architecture achieves competitive results on CIFAR-10, CIFAR-100, and ImageNet without further regularization or algorithmic modifications.

PaperID: 1336,

Authors: Qinghao Liu, Yuehao Zhu, Min Liu, Zhao Yao, Yaonan Wang, Erik Meijering

Affiliations: School of Artificial Intelligence and Robotics, the National Engineering Research Center of Robot Visual Perception and Control Technology, and the International Scientific and Technological Innovation Cooperation Base for Biomedical Image Processing, Hunan University, Changsha, Hunan, China; School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia

Title: MBUNeXt: Multibranch Encoder Aggregation Network Based on Layer-Fusion Strategy for Multimodal Brain Tumor Segmentation

Abstract:
Multimodal brain tumor segmentation (BraTS), integrated with surgical robots and navigation systems, enables accurate surgical interventions while maximizing the preservation of surrounding healthy brain tissue. However, multimodal brain scans suffer from large interclass differences in brain tumor subregions and information redundancy, leading to inadequate fusion of multimodal information and significantly affecting the accuracy of BraTS. To address the above problems, we propose a multibranch encoder aggregation (MEA) network based on a layer-fusion strategy called multibranch UNeXt (MBUNeXt). The network comprises three well-designed modules: the multimodal feature attention (MFA) module, the MEA module, and the large-kernel convolution skip (LCS)-connection module. These modules work together to achieve precise segmentation of brain tumors. Specifically, the MFA module preserves the intermodality similarity structure through attention mechanisms and Gaussian modulation functions, thereby filtering redundant information. Then, the MEA module exploits the correlations among multiple modalities to effectively integrate multimodal hybrid feature representation and optimize multimodal information fusion. In addition, the LCS module constructs multiple groups of depthwise separable convolutions with large kernels, which guide the network to attend to features at different scales, thereby addressing the issue of significant interclass differences in brain tumor subregions. The experimental results on the large-scale public datasets BraTS2019 and BraTS2021, which consist of approximately 5000 3-D brain scans, demonstrate that our proposed method achieves state-of-the-art (SOTA) performance, with average Dice scores of 85.84% and 91.11%, respectively. It also performs well on the BraTS-Africa2024 dataset with low imaging quality, confirming its robustness. The code is available at https://github.com/liuqinghao2018/MBUNeXt
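For readers unfamiliar with the reported metric, here is a minimal sketch of the Dice score for one binary mask. The flat 0/1 list representation and the convention of returning 1.0 when both masks are empty are assumptions of this sketch; the abstract does not describe how the authors average across tumor subregions or cases.

```python
def dice(pred, truth):
    """Dice = 2 * |P and T| / (|P| + |T|) for binary masks given as flat 0/1 sequences."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total
```

A score of 1.0 means the predicted and reference masks coincide and 0.0 means they do not overlap, so the percentages quoted above are this quantity averaged and scaled by 100.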

PaperID: 1337,

Authors: Tianjiao An, Xiaogang Dong, Bo Dong, Hucheng Jiang, Lei Liu, Bing Ma

Affiliations: School of Mathematics and Statistics, Changchun University of Technology, Changchun, Jilin, China; Department of Control Science and Engineering, Changchun University of Technology, Changchun, Jilin, China; College of Science, Liaoning University of Technology, Jinzhou, China

Title: Event-Triggered Mixed Nonzero-Sum Game Optimal Control for Modular Robotic Manipulator Performing Coordinated Operation Tasks

Abstract:
Using high-performance intelligent robots to solve coordination control problems such as assembly, handling, installation, and transportation is gradually becoming a frontier subject with great scientific research value in the field of robotics. However, due to possible conflicts and inconsistencies between the manipulator and the operating object, it is challenging to design the optimal coordination control scheme between human and robot. This article presents an event-triggered mixed nonzero-sum game optimal control method, which considers both nonzero-sum game and cooperative game cases, for modular robotic manipulator (MRM) systems performing coordinated operation tasks. First, the joint torque feedback technique and joint task assignment method are employed to establish the dynamic model of the MRM subsystem, and then, the global state-space description is deduced. For the unknown information containing interconnected dynamic coupling (IDC) terms and friction modeling errors, an adaptive neural network (NN) identifier is established by utilizing the measured input–output data of each joint module. The adaptive updating law guarantees that the NN weight error finally converges to a small neighborhood of zero. To ensure the optimality of the overall system performance, the corresponding value functions reflecting the interconnectedness among each joint subsystem and the manipulated object are constructed. Based on the idea of differential games, the coordination control problem of the MRM system is transformed into a mixed nonzero-sum game problem among each joint module and the operated object. Next, by constructing a single critic NN with a learning structure, the optimal value function is approximated to solve the event-based Hamiltonian equations, and then, the optimal control strategy of each player is obtained. Finally, the Lyapunov theory is used to analyze system stability, and the effectiveness of the presented method is supported by experimental results.

PaperID: 1338,

Authors: Youli Fang, Guosun Zeng

Affiliations: Department of Computer Science and Technology, Embedded System and Service Computing Key Laboratory of the Ministry of Education, Tongji University, Shanghai, China

Title: Toward the Extension and Enhanced Representation for Ambiguous Query With Search Heterogeneous Graph Learning

Abstract:
In an online search, users often input an ambiguous short query, which prevents search engines from accurately understanding the users’ true query intent. Thus, enhancing the users’ query intent is necessary. Traditional methods of guessing and inferring user intentions are based on either personal past search data or the group’s search history data. The former faces the cold-start problem for new users due to the lack of search history data, while the latter cannot accurately get the intent of new search requests because different users have different intentions even for the same search query. To solve the above issues and to enhance the representation of search requests by adding some query keywords, we construct a user-query-document search heterogeneous graph with users’ search history data of their friend networks, which can express the behavioral features and interrelationships of searches. To facilitate the enhanced representation of a query intent, we present TAHAN, a type-aware heterogeneous graph attention network (GAT) model. Extensive experiments on real-world datasets show that our method not only outperforms the state-of-the-art models but also achieves superior performance in addressing the data sparsity and cold-start problems.

PaperID: 1339,

Authors: Xiaohong Chen, Yuhang Zhang, Xuesong Xu, Dongbin Hu, Guanying Xu

Affiliations: Business School of Central South University, Xiang Jiang Laboratory, School of Frontier Crossover Studies, School of Management Science and Engineering, Hunan University of Technology and Business, Changsha, China; Business School of Central South University, Xiang Jiang Laboratory, Changsha, China; School of Frontier Crossover Studies, Hunan University of Technology and Business, Xiang Jiang Laboratory, Changsha, China; Business School of Central South University, Changsha, China

Title: Vector Quantization-Based Clustered Federated Learning With Global Feature Anchors for Improved Representation and Generalization

Abstract:
Clustered federated learning (CFL) addresses the challenge of data heterogeneity in federated learning (FL) by customizing models for different groups of clients. However, existing CFL methods heavily rely on indirect metrics, such as model parameters, gradient information, or loss function values, for client clustering. These approaches often fail to fully capture the diversity and intrinsic characteristics of client data distributions, leading to inaccurate representations of client data features. To address this issue, we propose a novel CFL framework called vector quantization-based CFL (VQCFL). First, we introduce a vector quantization network (VQNet), which effectively captures the intrinsic structure of client data by mapping the local feature space into discrete feature dictionary vectors. In addition, to prevent drift in the feature dictionary vectors, we propose a global feature anchor strategy that aligns feature dictionary vectors across clients, ensuring consistent updates within the same feature space. Furthermore, we present a novel cross-cluster knowledge-sharing mechanism that integrates feature information from different clusters through global aggregation of feature dictionary vectors. Combined with a personalized cross-cluster classifier weight adjustment strategy, this mechanism significantly enhances the model’s generalization performance in the presence of mixed data heterogeneity. Experimental results under various settings demonstrate that VQCFL achieves superior local personalization and global generalization performance.
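The core operation of vector quantization, mapping a feature to its nearest entry in a discrete dictionary, can be sketched as follows. This is a generic illustration rather than VQNet itself: the squared Euclidean distance, the function names, and in particular `usage_histogram` (one hypothetical way discrete codes could summarize a client's data) are assumptions not stated in the abstract.

```python
def quantize(feature, codebook):
    """Return (index, vector) of the codebook entry nearest to `feature`."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    idx = min(range(len(codebook)), key=lambda i: sqdist(feature, codebook[i]))
    return idx, codebook[idx]

def usage_histogram(features, codebook):
    """Fraction of features assigned to each codebook entry (hypothetical client summary)."""
    counts = [0] * len(codebook)
    for f in features:
        counts[quantize(f, codebook)[0]] += 1
    return [c / len(features) for c in counts]
```

Because every client quantizes against dictionary vectors in the same feature space, summaries of this kind are directly comparable across clients, which is the property the global feature anchor strategy is described as protecting.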

PaperID: 1340,

Authors: Qiong Wang, Luyun Xu, Yinglu Shan, Wenzhuo Shen, Lou Li, Xia-An Bi, Zhonghua Liu

Affiliations: College of Business, Hunan Normal University, Changsha, China; College of Information Science and Engineering, Hunan Normal University, Changsha, China

Title: CPST-GAN: Conditional Probabilistic State Transition Generative Adversarial Network With the Biomedical Large Foundation Models

Abstract:
The risk prediction of Alzheimer’s disease (AD) is crucial for its early prevention and treatment. However, current risk prediction methods face challenges in effectively extracting and fusing multiomics features, particularly overlooking the multilevel evolutionary mechanisms of AD. This article combines biomedical large foundation models with the conditional generative adversarial network (GAN) to mine the evolutionary patterns of AD by considering the regulatory effect of genes on brain lesions. Specifically, we first use biomedical large foundation models to effectively construct high-quality imaging genetic features. Next, a conditional probabilistic state transition mathematical model is constructed to describe AD progression as state transitions of brain regions under genetic regulations. Based on the mathematical model, a conditional probabilistic state transition GAN (CPST-GAN) is proposed. This algorithm can mine the dynamic evolutionary patterns of AD by fusing brain imaging and genetic features to achieve risk prediction of AD. Finally, experiments on the public imaging genetics datasets validate the effectiveness and superiority of CPST-GAN in evolutionary pattern mining and risk prediction of AD. This article not only provides a reliable intelligence algorithm for early intervention of AD but also offers new insights for future research on AD pathogenesis. The code has been published at github.com/fmri123456/CPST-GAN.

PaperID: 1341,

Authors: Lei Ren, Shixiang Li, Haiteng Wang, Yuanjun Laili

Affiliations: School of Automation Science and Electrical Engineering, Beihang University, Beijing, China

Title: ABNN: Adaptive-Gating Binary Neural Network With Dynamic Activation Quantization for Industrial Health Status Prediction

Abstract:
Complex industrial equipment plays a critical role in specific tasks within industrial edge scenarios. Predicting their health status accurately is essential to ensuring safety and reliability in the production process. However, real-world industrial edge scenarios often have limited resources and stringent real-time requirements, making it difficult to deploy high-precision deep learning models directly at the edge. To address this issue, this article proposes an efficient adaptive-gating binary neural network (ABNN). First, a trend-aware encoder (TAE) is proposed to optimize the binarization process of the input layer. Next, a learnable precision indicator (LPI) is proposed to adjust the inference precision level. Finally, an adaptive-gating convolution is proposed to improve the representational capabilities while maintaining the fitting ability without significantly increasing the computational cost. Additionally, a field-programmable gate array (FPGA) hardware accelerator is designed for the proposed network. ABNN achieves approximately a 7% improvement in accuracy and a 45% gain in efficiency compared to the baseline model.
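The reason binary networks suit edge hardware such as FPGAs is that a dot product of two ±1 vectors reduces to bitwise operations and a bit count. The sketch below shows that generic identity, dot = n − 2 · popcount(a XOR b); it does not reproduce ABNN's encoder, precision indicator, or gating, and the sign convention (zero maps to +1) and bit packing are assumptions.

```python
def binarize(values):
    """Pack signs into an int: bit i is 1 when values[i] >= 0 (treated as +1), else 0 (-1)."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors from their packed bits.

    Matching positions contribute +1 and mismatching ones -1,
    so the result is n - 2 * (number of mismatches).
    """
    mask = (1 << n) - 1
    mismatches = bin((a_bits ^ b_bits) & mask).count("1")
    return n - 2 * mismatches
```

On hardware, the XOR and bit count replace the multiply-accumulate units that a full-precision layer would need.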

PaperID: 1342,

Authors: Handi Zhang, Langchen Liu, Kangyu Weng, Lu Lu

Affiliations: Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, PA, USA; Department of Statistics and Data Science, Yale University, New Haven, CT, USA

Title: Federated Scientific Machine Learning for Approximating Functions and Solving Differential Equations With Data Heterogeneity

Abstract:
By leveraging neural networks, the emerging field of scientific machine learning (SciML) offers novel approaches to address complex problems governed by partial differential equations (PDEs). In practical applications, challenges arise due to the distributed nature of data, concerns about data privacy, or the impracticality of transferring large volumes of data. Federated learning (FL), a decentralized framework that enables the collaborative training of a global model while preserving data privacy, offers a solution to the challenges posed by isolated data pools and sensitive data issues. This article explores the integration of FL and SciML to approximate complex functions and solve differential equations. We propose two novel models: federated physics-informed neural networks (FedPINNs) and federated deep operator networks (FedDeepONets). We further introduce various data generation methods to control the degree of nonindependent and identically distributed (non-i.i.d.) data and utilize the 1-Wasserstein distance to quantify data heterogeneity in function approximation and PDE learning. We systematically investigate the relationship between data heterogeneity and federated model performance. In addition, we propose a measure of weight divergence and develop a theoretical framework to establish growth bounds for weight divergence in FL compared with centralized learning. To demonstrate the effectiveness of our methods, we conducted ten experiments, including two on function approximation, five PDE problems on FedPINN, and four PDE problems on FedDeepONet. These experiments demonstrate that the proposed federated methods surpass the models trained only using local data and achieve accuracy competitive with centralized models trained using all data.
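As an illustration of the heterogeneity measure named in this abstract, the sketch below computes the 1-Wasserstein distance between two one-dimensional empirical samples. It covers only the simplest case (equal sample sizes, uniform weights), where the distance is the mean absolute difference between sorted values; which quantities the authors compare, and in how many dimensions, is not specified in the abstract, so the setup is an assumption.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical samples.

    With uniform weights and equal sizes, the optimal transport plan
    pairs the i-th smallest value of xs with the i-th smallest of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)
```

For example, two clients whose training points are drawn from the same region give a distance near zero, while clients sampling disjoint regions of the domain give a larger value, matching the intuition of a higher degree of non-i.i.d. data.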

PaperID: 1343,

Authors: Wujie Zhou, Bitao Jian, Yuanyuan Liu, Qiuping Jiang

Affiliations: School of Artificial Intelligence and Information Engineering, Zhejiang University of Science and Technology, Hangzhou, China; School of Computer and Technology, China University of Geosciences, Wuhan, China; School of Information Science and Engineering, Ningbo University, Ningbo, China

Title: Multiattentive Perception and Multilayer Transfer Network Using Knowledge Distillation for RGB-D Indoor Scene Parsing

Abstract:
Scene parsing has gained wide attention in the field of computer vision, with emerging methods and techniques providing superior solutions. Although some methods have improved performance, they tend to neglect the number of model parameters and computational size, which makes achieving real-time operation in practical applications challenging. To address these limitations, we propose a multiattentive perception and multilayer transfer network that employs knowledge distillation (MPMTNet-KD), which is generated by a student network (MPMTNet-S) under the guidance of a teacher network (MPMTNet-T) with the aid of our proposed multilayer transfer knowledge distillation (KD) methods. To capture complete information from different modalities, a multiattentive perception module (MAPM) is introduced to mine features from various perspectives, and hetero-oriented sensing (HOS) convolution is utilized to integrate cross-layer features in a single and holistic manner. Importantly, we introduce multilayer transfer KD to explore the different knowledge types between layers, as well as intraclass and interclass correlations. In addition, we use the discrete cosine transform (DCT) approach combined with filtering during the KD process to mitigate noise that may be induced by the depth map, thereby improving the depth information and further enhancing the knowledge transfer effect. We conducted comprehensive experiments on two challenging indoor benchmark datasets, namely NYUDv2 and SUN RGB-D. Compared with existing methods, the proposed MPMTNet-KD reduces the number of parameters from 125.8 M in MPMTNet-T to 28.3 M in MPMTNet-S, achieving a mean intersection over union (mIoU) of 54.9% in the indoor scene parsing task. MPMTNet-KD was also evaluated on two additional public datasets, namely MFNet and PST900, to demonstrate its generalization capacity. The source code is available at https://github.com/XUEXIKUAIL/MPMTNet

PaperID: 1344,

Authors: Meng Wang, Jinshuo Liu, Víctor Gutiérrez-Basulto, Lina Wang, Jeff Z. Pan

Affiliations: School of Cyber Science and Engineering, Wuhan University, Wuhan, China; School of Computer Science and Informatics, Cardiff University, Wales, U.K.; School of Informatics, The University of Edinburgh, Edinburgh, U.K.

Title: AQE-RF: An Adaptive Quantifier Extension and Rule-Filtering Graph Network for Logical Reasoning of Text

Abstract:
Logical reasoning of text requires neural models to possess strong contextual comprehension and logical reasoning ability to draw conclusions from limited information. To improve the logical reasoning capabilities of pretrained language models (PLMs), existing approaches can be broadly categorized into neural architecture-based methods and large language model (LLM)-driven strategies. While neural methods struggle with fine-grained logic that fails to capture detailed semantic roles and constraints, LLM-driven approaches, despite generating multistep reasoning sequences, lack explicit inference control and suffer from error accumulation due to their implicit and stochastic nature. Some works have tried using logical expressions, like first-order logic, but these approaches often fail to handle quantifiers systematically or support clear reasoning processes. Inspired by first-order logic and generalized quantifier (GQ) theory, we propose AQE-RF, a model based on an adaptive quantifier extension and rule-filtering graph network to address this challenge. The first component constructs a fine-grained text logical graph (FTLG) and then performs GQ instantiation based on option attention. The second component performs rule-filtered deductive reasoning, using conflict scores and dynamic programming (DP) to select coherent, interpretable inference paths. Extensive experiments on the LogiQA, ReClor, and AR-LSAT datasets demonstrate the effectiveness and robustness of AQE-RF.

PaperID: 1345,

Authors: Zuo Zuo, Jiahao Dong, Yao Wu, Yanyun Qu, Zongze Wu

Affiliations: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence and the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China; Guangdong Laboratory of Artificial Intelligence and Digital Economy, Shenzhen, China; School of Informatics, Xiamen University, Xiamen, China; College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen, China

Title: PADiff: Reconstruction From Patch to Pixel With Normality-Guided Diffusion Model for Unsupervised Anomaly Localization

Abstract:
Anomaly localization (AL) is an indispensable and challenging task in manufacturing. Recently, diffusion models have been widely used to localize anomalies through discrepancies between original and reconstructed representations, based on the hypothesis that diffusion models regard anomalies as noise and reconstruct them as normal representations. However, anomalies usually deviate from the prior standard Gaussian distribution, and diffusion models cannot reconstruct anomalous parts as normal patterns well because of their powerful generalization. These issues hinder the application of diffusion models in AL and lead to suboptimal performance. As a remedy, we present a novel framework for AL based on the diffusion model, dubbed PADiff. To enable the diffusion model to reconstruct abnormal regions as normal regions in an anomaly image, we propose to guide the diffusion model in the reconstruction process using its normal counterpart. A high-quality normal counterpart for guidance plays a key role in our method. Therefore, we propose a patch-substitution strategy to obtain it. Specifically, we first construct a normal patch memory bank using normal training samples. With the normal memory bank, we find potential anomaly patches in testing images and substitute them with the most similar normal patches in the memory bank. After substitution, pseudo-normal images are generated to guide the diffusion model. To make our method more data-efficient, we divide an image into patches and propose patch-wise training and reconstruction. As one of our innovations, we propose to encode each patch into a positional embedding and add it to the time embedding, which introduces patch-level representation and position information into the diffusion model. Extensive experiments are conducted on three commonly used anomaly detection datasets (MVTec-AD, VisA, and BTAD) to showcase the state-of-the-art (SOTA) performance of the proposed PADiff. The source code is publicly available at https://github.com/Jay-zzcoder/padiff

PaperID: 1346,

Authors: Zongsheng Huang, Tieshan Li, Yue Long, Hongjing Liang

Affiliations: School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China

Title: Prescribed-Time Human-in-the-Loop Optimal Synchronization Control for Multiagent Systems Under DoS Attacks via Reinforcement Learning

Abstract:
The prescribed-time (PT) human-in-the-loop (HiTL) optimal synchronization control problem for multiagent systems (MASs) under link-based denial-of-service (DoS) attacks is investigated. First, the HiTL framework enables the human operator to govern the MASs by transmitting commands to the leader. The link-based DoS attacks cause communication blockages between agents, resulting in topology switching. Under the switching communication topology, a fully distributed observer is proposed for each follower, which simultaneously integrates a prescribed finite-time function to estimate the leader’s output within the PT. This observer is characterized by a bounded gain at the PT point and guarantees global practical PT convergence, while avoiding the use of global topology information. By combining the follower dynamics with the proposed observer, an augmented system is developed. Subsequently, the model-free Q-learning algorithm is used to learn the optimal synchronization policy directly from real system data. To reduce computational burden, the Q-learning algorithm is implemented using a single critic neural network (NN) structure, with the least-squares method applied to train the NN weights. The convergence of the Q-functions generated by the proposed Q-learning algorithm is proven. Finally, simulation results verify the effectiveness of the proposed control scheme.

PaperID: 1347,

Authors: Yanli Li, Zhongliang Guo, Nan Yang, Huaming Chen, Dong Yuan, Weiping Ding

Affiliations: School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China; School of Computer Science, University of St Andrews, St Andrews, U.K.; School of Electrical and Computer Engineering, The University of Sydney, Sydney, Australia

Title: Threats and Defenses in the Federated Learning Life Cycle: A Comprehensive Survey and Challenges

Abstract:
Federated learning (FL) offers innovative solutions for privacy-preserving distributed machine learning (ML). Different from centralized data collection algorithms, FL enables participants to locally train their model and only share the model updates for aggregation. Since private data never leaves the end node, FL effectively mitigates privacy leakage during collaborative training. Despite its promising potential, FL is vulnerable to various attacks due to its distributed nature, affecting the entire life cycle of FL services. These threats can harm the model’s utility or compromise participants’ privacy, either directly or indirectly. In response, numerous defense frameworks have been proposed, demonstrating effectiveness in specific settings and scenarios. To provide a clear understanding of the current research landscape, this article reviews the most representative and state-of-the-art threats and defense frameworks throughout the FL service life cycle. We start by identifying FL threats that harm utility and privacy, including those with potential or direct impacts. Then, we dive into the defense frameworks, analyze the relationship between threats and defenses, and compare the trade-offs among different defense strategies. We subsequently revisit these studies to evaluate their practicality in real-world scenarios and conclude by summarizing existing research bottlenecks and outlining future directions. We hope this survey sheds light on trustworthy FL research and contributes to the FL community.

PaperID: 1348,

Authors: Jinzong Dong, Zhaohui Jiang, Dong Pan, Zhiwen Chen, Qingyi Guan, Hongbin Zhang, Gui Gui, Weihua Gui

Affiliations: School of Automation, Central South University, Changsha, China

Title: A Survey on Confidence Calibration of Deep Learning-Based Classification Models Under Class Imbalance Data

Abstract:
Confidence calibration in classification models is a vital technique for accurately estimating the posterior probabilities of predicted results, which is crucial for assessing the likelihood of correct decisions in real-world applications. Class imbalance data, which biases the model’s learning and subsequently skews predicted posterior probabilities, makes confidence calibration more challenging. Especially for underrepresented classes, which are often more important and tend to have higher uncertainty, confidence calibration is more complex and essential. Unlike previous surveys that typically investigate confidence calibration or class imbalance separately, this article comprehensively investigates confidence calibration methods for deep learning-based classification models under class imbalance. First, the problem of confidence calibration under class imbalance data is outlined. Second, this article explores the impact of class imbalance data on confidence calibration in theory, providing some explanations for empirical findings in existing studies. Third, this article reviews 60 state-of-the-art confidence calibration methods under class imbalance data, divides these methods into six groups according to method differences, and systematically compares seven properties to evaluate their superiority. Then, some commonly used and emerging evaluation methodologies are summarized, including public datasets and evaluation metrics. Subsequently, this article performs necessary comparative experiments to provide better guidelines and insights to the readership. Finally, we discuss several application fields and promising research directions that serve as a guideline for future studies.
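As background for the evaluation metrics mentioned here, the sketch below computes the expected calibration error (ECE), a commonly used calibration metric. The abstract does not list which metrics the survey covers, so the choice of ECE, the ten equal-width bins, and the half-open bin edges are assumptions of this sketch.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Sample-weighted mean of |accuracy - mean confidence| over equal-width bins.

    `confidences` are predicted probabilities in [0, 1] for the chosen class;
    `correct` holds truthy values where the prediction was right.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # confidence 1.0 joins the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece
```

Because each bin is weighted by its sample count, a pooled score like this is dominated by frequent classes, which is one reason calibration under class imbalance needs separate attention for underrepresented classes.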

PaperID: 1349,

Authors: Jianhai Zhang, Tonghua Wan, M. Ethan MacDonald, Bijoy K. Menon, Aravind Ganesh, Wu Qiu

Affiliations: School of Life Science and Technology, Advanced Biomedical Imaging Facility, Huazhong University of Science and Technology, Wuhan, China; Department of Biomedical Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada; Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada

Title: Synchronous Image-Label Diffusion Probability Model With Application to Stroke Lesion Segmentation on Non-Contrast CT

Abstract:
The stroke lesion volume is a key radiologic measurement for assessing the prognosis of acute ischemic stroke (AIS) patients, but it is challenging to measure automatically on noncontrast CT (NCCT) scans. Recent diffusion probabilistic models (DPMs) in the domain of image generation have shown potential for lesion volume segmentation on medical images. In this article, a novel synchronous image-label diffusion probability model (SDPM) is proposed for stroke lesion segmentation on NCCT using a dual-Markov diffusion process with shared noise. The proposed SDPM is fully based on a generative latent variable model (LVM), offering a probabilistic elaboration from stem to stem. To adapt the strengths of generative models to our segmentation tasks, we develop a network architecture in which an additional net-stream, parallel with a noise prediction stream, is introduced to obtain the initial label estimates with noise for efficiently inferring the final labels. By optimizing the specified variational boundaries, the trained model can infer the final label estimates given the input images at any scale of time using four different label-inference methods, which gives more flexibility to the proposed SDPM. The proposed model was assessed on three stroke lesion datasets, including one public and two private datasets. Compared with several U-Net, transformer, and DPM-based segmentation methods, the proposed SDPM achieves state-of-the-art accuracy.

PaperID: 1350,

Authors: Kehua Yuan, Duoqian Miao, Hongyun Zhang, Witold Pedrycz

Affiliations: Pilot Software Engineering School, With Chinese Characteristics, School of Computer Science and Technology National, Tongji University, Shanghai, China; Department of Measurement and Control Systems, Silesian University of Technology (SUT), Gliwice, Poland

Title: An Efficient and Robust Feature Selection Approach Based on Zentropy Measure and Neighborhood-Aware Model

Abstract:
Feature selection based on rough set (RS) theory has been an active research topic in data mining and knowledge discovery. Fuzzy RSs (FRSs), an efficient tool for processing the inconsistency between features and decisions, have attracted attention for feature selection problems. However, most FRS-based feature selection methods pay much attention to the approximation space while ignoring the interaction between different levels. Note that the single-level feature selection method, depending on the boundary objects, is easily influenced by noisy data and cannot integrate multiple granular levels to evaluate features accurately. Therefore, this article proposes an efficient and robust feature selection approach based on the neighborhood-aware model and zentropy measure. Specifically, we first define a neighborhood-aware FRS (NAFRS) with weighted fuzzy relation to improve the antinoise ability of FRSs. Then, we propose a fuzzy granule zentropy (FGZE) measure based on zentropy by analyzing the granular level relation in NAFRS. Moreover, a significance measure with FGZE is designed and applied to feature selection. Finally, experimental results on 22 datasets, comparing our method with 12 representative feature selection methods, demonstrate the antinoise ability and the classification ability of the proposed method.

PaperID: 1351,

Authors: Ying Kong, Xi Chen, Yunliang Jiang, Danfeng Sun

Affiliations: Department of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China; Zhejiang Key Laboratory of Intelligent Education Technology and Application and the School of Computer Science and Technology, Zhejiang Normal University, Jinhua, China; Department of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China

Title: Milne-Hamming Method With Zeroing Neural Network for Time-Varying Nonlinear Optimization and Redundant Manipulator Application

Abstract:
The continuous zeroing neural network (ZNN) and its discrete counterpart (DZNN) have been extensively developed for many optimization systems. In this article, a Milne–Hamming method with DZNN, classified as an implicit method, is proposed and discussed, building on previous research. Specifically, the Milne–Hamming discrete ZNN (MHDZNN) model targets the time-varying nonlinear optimization (TV-NO) problem with functional limitations. The Milne–Hamming (MH) method is a four-step discretization formula with fixed parameters and is introduced to discretize the ZNN model. Theoretical analyses show that the MHDZNN model possesses a larger stepsize domain of absolute stability, $\mu \in (0, 1/2)$. Its convergence error is of order $O(\tau^5)$, and the corresponding truncation error constant is $1/40$, which is closely related to the accuracy. Compared with existing DZNN models, such as four-step explicit methods with the same $O(\tau^5)$ pattern, the convergence error constant of MHDZNN is smaller by a factor and its maximal stability domain is larger. Finally, numerical simulations and an application to redundant manipulators are provided to verify the effectiveness of the proposed MHDZNN model.

PaperID: 1352,

Authors: Yaxin Fan, Peifeng Li, Fang Kong, Qiaoming Zhu

Affiliations: School of Computer Science and Technology, Soochow University, Suzhou, China

Title: Enhancing Multiparty Dialog Discourse Parsing With Dynamic Task-Adaptive Graph Transformer and Difficulty-Aware Task Scheduling

Abstract:
Multiparty dialog discourse parsing (MDDP) aims to identify the links between pairs of utterances and recognize their discourse relations. Previous research has attempted to address data sparsity in discourse parsing through multitask learning, but these efforts often relied on manually annotated fine-grained information, limiting their practical applicability. In this study, we propose dynamic task-adaptive graph transformer with difficulty-aware task scheduling (DTGT-DTS), an innovative multitask approach that enhances discourse parsing by leveraging neighboring tasks like addressee recognition and speaker identification, without requiring additional annotations. These tasks share common discourse links with discourse parsing but also possess distinct private links. To tackle this, we design a dynamic task-adaptive graph transformer (DTGT) that captures shared links between discourse parsing and its neighboring tasks while distinguishing the private links of neighboring tasks. In addition, we develop a difficulty-aware task scheduling (DTS) strategy that promotes multitask learning by dynamically adjusting training priorities based on the relative difficulty of different tasks. Experimental results on two widely used discourse datasets—Molweni (78 245 links and relations) and STAC (12 691 links and relations)—show that our DTGT-DTS model achieves a 6.07% and 5.31% performance improvement in link identification, respectively, and a 7.27% and 6.02% improvement in relation recognition.

PaperID: 1353,

Authors: Ying Yan, Jiayue Sun, Huaguang Zhang, Xin Liu, Jian Pan

Affiliations: College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China; College of Mechanical Engineering and Automation, Northeastern University, Shenyang, Liaoning, China

Title: Mapping Tracking Control to Cascading Optimization in Discrete Strict-Feedback Systems: A Hierarchical Learning Approach

Abstract:
This article introduces a hierarchical learning (HL) framework for discrete-time (DT) systems with strict feedback structures to enable tracking control. Unlike the backstepping approach, our method creates dynamically adjustable virtual targets (VTs) for state variables at each layer, forming a cascading optimization structure. This innovative framework enables each layer to learn by approximating the solution to the DT Hamilton–Jacobi–Bellman (HJB) equation, thereby facilitating inter-layer self-optimization and directing the modification of adjacent VTs. To tackle the noncausal problem, we implement an iterative predictive learning framework that maps the current measurable state and known reference trajectories to VTs. This process allows the VTs to gradually align with the optimal trajectory during policy evaluation and update, achieving indirect tracking of state variables toward the desired targets. Additionally, the action network is transformed into a tracking network, incorporating future tracking errors to optimize its weights. This approach reduces tracking costs in the subsequent policy update while improving tracking performance. Rigorous convergence analysis and numerical simulations confirm the effectiveness of our method, highlighting its considerable potential in adaptive control.

PaperID: 1354,

Authors: Yuxuan Wen, Yunfei Yin

Affiliations: College of Computer Science, Chongqing University, Chongqing, China; Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, OH, USA

Title: Bridging Task-Specific and Task-Interactive Features With Opportune Branching and Adaptive Attention for Object Detection

Abstract:
Object detection is a fundamental task that usually requires the optimization of two sub-tasks (i.e., localization and classification). However, there exists a lack of understanding regarding the changing pattern of their preferred interest locations. Existing work adopts alternating detection head designs in terms of handling task-interactive and task-specific features. To tackle this issue, we conduct a thorough analysis to investigate the contradicting focus-shifting patterns of these sub-tasks. Specifically, we first collect data points on the MS-COCO dataset and conduct numerical analysis to pinpoint the optimal branching point by evaluating the effect size metrics of feature similarity and by calculating the 2-D inter-cluster distances between features among potential branching points. Then, qualitative analysis regarding the feature representation is carried out to further justify the results. Finally, we demonstrate the potential generalizability of our analysis pipelines across various architectures, label assignment methods, training techniques, and datasets. In light of the above findings, we propose the opportune branching head that leverages the conflict between task-interactive and task-specific features by decoupling the sub-tasks at the appropriate point to maximize the preference. We further extend the concept of opportune branching and propose the adaptive attention mechanism to enable more effective attention allocation in a concise manner, magnifying the effect of opportune branching. We conduct extensive experiments on the MS-COCO benchmark, the PASCAL VOC benchmark, and the Cityscape benchmark, where our method achieves competitive results. We achieve 50.0 AP with the ResNeXt-101-4d-64 backbone and 59.8 AP with the Swin-L transformer backbone on the MS-COCO benchmark, representing the best performance among nontransformer-based methods while also outperforming many state-of-the-art transformer-based methods by a clear margin.

PaperID: 1355,

Authors: Tong Zhang, Wenhua Jiao, Jiguo Yu, Yudou Xiong

Affiliations: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China; Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, China; School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China

Title: Short-Term Residential Load Forecasting Framework Based on Spatial-Temporal Fusion Adaptive Gated Graph Convolution Networks

Abstract:
Enhancing the prediction of volatile and intermittent electric loads is one of the pivotal elements that contribute to the smooth functioning of modern power grids. However, conventional deep learning-based forecasting techniques fall short in simultaneously taking into account both the temporal dependencies of historical loads and the spatial structure between residential units, resulting in subpar prediction performance. Furthermore, the representation of the spatial graph structure is frequently inadequate and constrained, which, along with the complexities inherent in spatial–temporal data, impedes effective learning among different households. To alleviate those shortcomings, this article proposes a novel framework: spatial–temporal fusion adaptive gated graph convolution networks (STFAG-GCNs), tailored for residential short-term load forecasting (STLF). Spatial–temporal fusion graph construction is introduced to compensate for several existing correlations where additional information is not known or not reflected in advance. Through an innovative gated adaptive fusion graph convolution (AFG-Conv) mechanism, the spatial–temporal fusion graph convolution network (STFGCN) dynamically models the spatial–temporal correlations implicitly. Meanwhile, by integrating a gated temporal convolutional network (Gated TCN) and multiple STFGCNs into a unified spatial–temporal fusion layer, STFAG-GCN handles long sequences by stacking layers. Experimental results on real-world datasets validate the accuracy and robustness of STFAG-GCN in forecasting short-term residential loads, highlighting its advancements over state-of-the-art methods. Ablation experiments further reveal its effectiveness and superiority.

PaperID: 1356,

Authors: Htoo Wai Aung, Jiao Jiao Li, Yang An, Steven Weidong Su

Affiliations: School of Biomedical Engineering, Faculty of Engineering and IT, University of Technology Sydney, Sydney, NSW, Australia; College of Medical Information and Artificial Intelligence, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China

Title: A Real-Time Framework for EEG Signal Decoding With Graph Neural Networks and Reinforcement Learning

Abstract:
Brain–computer interfaces (BCIs) rely on accurately decoding electroencephalography (EEG) motor imagery (MI) signals for effective device control. Graph neural networks (GNNs) outperform convolutional neural networks (CNNs) in this regard by leveraging the spatial relationships between EEG electrodes through adjacency matrices. The EEG graph lottery ticket framework, EEG_GLT-Net, featuring the state-of-the-art (SOTA) EEG_GLT adjacency matrix method, has notably enhanced EEG MI signal classification, evidenced by an average accuracy of 83.95% across 20 subjects on the PhysioNet dataset. This significantly exceeds the 76.10% accuracy achieved using the Pearson correlation coefficient (PCC) method in the same framework. In this research, we advance the field by applying a reinforcement learning (RL) approach to the classification of EEG MI signals. Our method enables the RL agent not only to classify EEG MI data points with higher accuracy but also to effectively identify EEG MI data points that are less distinct. We present EEG_RL-Net, an enhancement of the EEG_GLT-Net framework, which incorporates the trained EEG_GCN Block from EEG_GLT-Net at an adjacency matrix density of 13.39% alongside the RL-centric dueling deep Q network (Dueling DQN) block. The EEG_RL-Net model showcases exceptional classification performance, achieving an unprecedented average accuracy of 96.40% across 20 subjects within 25 ms. This model illustrates the transformative effect of RL in EEG MI time point classification.
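
For readers unfamiliar with the dueling architecture mentioned above, the sketch below shows the standard way a dueling DQN head combines a state-value estimate with per-action advantages. It is a generic illustration, not the EEG_RL-Net implementation: the abstract does not describe the aggregation used there, and the function name `dueling_q_values` and the mean-subtraction form are assumptions taken from the commonly used dueling DQN formulation.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with mean-advantage subtraction.

    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')

    state_value: scalar estimate V(s)
    advantages:  list of per-action advantage estimates A(s, a)
    """
    if not advantages:
        raise ValueError("advantages must contain at least one action")
    mean_advantage = sum(advantages) / len(advantages)
    # Subtracting the mean keeps V and A identifiable from Q.
    return [state_value + a - mean_advantage for a in advantages]


def greedy_action(q_values):
    """Index of the action with the largest Q-value (first one on ties)."""
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

In a classification setting, each action would plausibly correspond to a class decision, though how actions are defined in the paper is not stated in the abstract.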

PaperID: 1357,

Authors: Hairong Dong, Lingbin Ning, Min Zhou, Haifeng Song, Weiqi Bai

Affiliations: School of Automation and Intelligence, Beijing Jiaotong University, Beijing, China; School of Electronic and Information Engineering, Beihang University, Beijing, China

Title: Deep Reinforcement Learning for Integration of Train Trajectory Optimization and Timetable Rescheduling Under Disturbances

Abstract:
High-speed trains are susceptible to unexpected events such as strong winds and equipment failures, which can result in deviations from the scheduled timetable. As the density of traffic increases, these delays can quickly spread to other trains, eventually leading to conflicts in the timetable. To ensure the efficiency of high-speed railways, quickly resolving potential conflicts and generating appropriate rescheduling schemes are essential. The existing hierarchical structure of train control and online rescheduling tends to be inefficient in terms of information communication and can even lead to infeasible rescheduled timetables and trajectories. To address these issues, an integrated structure of timetable rescheduling and train trajectory optimization is proposed by introducing the train minimum running time into the process of timetable rescheduling and using the adjusted running time as the objective of trajectory optimization. The integration model is formulated by considering the constraints of timetable rescheduling such as the maximum number of trains overtaking trains, platforms at stations, and the priority of the train, as well as the constraints of trajectory optimization. A deep reinforcement learning (DRL)-based approach is proposed to solve the problem. Numerical experiments are conducted on a segment of the Beijing-Shanghai high-speed railway line, using adapted data to demonstrate the effectiveness of the proposed method in rescheduling timetables and optimizing train trajectories. The results show that the integrated rescheduled timetable and the optimized train trajectory can be generated simultaneously and that the computation time increases linearly with the size of the problem.

PaperID: 1358,

Authors: Chuhan Zhang, Wei Pan, Cosimo Della Santina

Affiliations: Department of Cognitive Robotics, Faculty of Mechanical Engineering, Delft University of Technology, Delft, The Netherlands; Department of Computer Science, The University of Manchester, Manchester, U.K.

Title: NiSNN-A: Noniterative Spiking Neural Network With Attention With Application to Motor Imagery EEG Classification

Abstract:
Motor imagery (MI), an important category in electroencephalogram (EEG) research, often intersects with scenarios demanding low energy consumption, such as portable medical devices and isolated environment operations. Traditional deep learning (DL) algorithms, despite their effectiveness, are characterized by significant computational demands accompanied by high energy usage. As an alternative, spiking neural networks (SNNs), inspired by the biological functions of the brain, emerge as a promising energy-efficient solution. However, SNNs typically exhibit lower accuracy than their counterpart convolutional neural networks (CNNs). Although attention mechanisms successfully increase network accuracy by focusing on relevant features, their integration in the SNN framework remains an open question. In this work, we combine the SNN and the attention mechanisms for the EEG classification, aiming to improve precision and reduce energy consumption. To this end, we first propose a noniterative leaky integrate-and-fire (NiLIF) neuron model, overcoming the gradient issues in traditional SNNs that use iterative LIF neurons for long time steps. Then, we introduce the sequence-based attention mechanisms to refine the feature map. We evaluated the proposed noniterative SNN with attention (NiSNN-A) model on two MI EEG datasets, OpenBMI and BCIC IV 2a. Experimental results demonstrate that: 1) our model outperforms other SNN models by achieving higher accuracy and 2) our model increases energy efficiency compared with the counterpart CNN models (i.e., by 2.13 times) while maintaining comparable accuracy.
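
To make the contrast with the proposed noniterative neuron concrete, the sketch below simulates a conventional iterative leaky integrate-and-fire (LIF) neuron, where the membrane potential at each time step depends on the previous one. This is a generic textbook-style discretization, not the NiLIF model from the paper; the function name `simulate_lif`, the parameters `decay`, `threshold`, and `v_reset`, and the hard-reset rule are assumptions chosen for illustration.

```python
def simulate_lif(input_current, decay=0.9, threshold=1.0, v_reset=0.0):
    """Simulate one iterative LIF neuron over a sequence of input currents.

    At each step the membrane potential leaks (multiplied by `decay`),
    integrates the input, and emits a spike with a hard reset once it
    reaches `threshold`. Returns a list of 0/1 spikes, one per time step.
    """
    v = v_reset
    spikes = []
    for current in input_current:
        v = decay * v + current      # leak, then integrate
        if v >= threshold:
            spikes.append(1)         # fire
            v = v_reset              # hard reset after the spike
        else:
            spikes.append(0)
    return spikes
```

Because each step depends on the previous membrane potential, training through many time steps requires backpropagating through this whole recurrence, which is the setting in which the abstract reports gradient issues.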

PaperID: 1359,

Authors: Ruochen Li, Stamos Katsigiannis, Tae-Kyun Kim, Hubert P. H. Shum

Affiliations: Department of Computer Science, Durham University, DH LE Durham, U.K.; Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Yuseong-gu, South Korea

Title: BP-SGCN: Behavioral Pseudo-Label Informed Sparse Graph Convolution Network for Pedestrian and Heterogeneous Trajectory Prediction

Abstract:
Trajectory prediction allows better decision-making in applications of autonomous vehicles (AVs) or surveillance by predicting the short-term future movement of traffic agents. It is classified into pedestrian or heterogeneous trajectory prediction. The former exploits the relatively consistent behavior of pedestrians, but is limited in real-world scenarios with heterogeneous traffic agents such as cyclists and vehicles. The latter typically relies on extra class label information to distinguish the heterogeneous agents, but such labels are costly to annotate and cannot be generalized to represent different behaviors within the same class of agents. In this work, we introduce behavioral pseudo-labels that effectively capture the behavior distributions of pedestrians and heterogeneous agents solely based on their motion features, significantly improving the accuracy of trajectory prediction. To implement the framework, we propose the behavioral pseudo-label informed sparse graph convolution network (BP-SGCN), which learns pseudo-labels and uses them to inform a trajectory predictor. For optimization, we propose a cascaded training scheme, in which we first learn the pseudo-labels in an unsupervised manner, and then perform end-to-end fine-tuning on the labels in the direction of increasing the trajectory prediction accuracy. Experiments show that our pseudo-labels effectively model different behavior clusters and improve trajectory prediction. Our proposed BP-SGCN outperforms existing methods using both pedestrian [ETH/UCY, pedestrian-only Stanford Drone Dataset (SDD)] and heterogeneous agent datasets (SDD and Argoverse 1).

PaperID: 1360,

Authors: Zijie Zhao, Yuqian Fu, Jiajun Chai, Yuanheng Zhu, Dongbin Zhao

Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China

Title: Meta Learning Task Representation in Multiagent Reinforcement Learning: From Global Inference to Local Inference

Abstract:
Multiagent meta reinforcement learning (MAMRL) enables multiagent systems (MASs) to adapt to multiple tasks. However, partial observability poses a significant challenge by hindering efficient task inference from agents’ limited local experiences. To address this, we propose MG2L, a novel algorithm featuring a global-to-local (G2L) training scheme based on mutual information optimization (MIO). We first extend the centralized training and decentralized execution (CTDE) framework to MAMRL, and introduce a multilevel task encoder for joint global and local task inference. Building on this encoder, the MG2L scheme employs tailored loss functions to optimize task representations. For global inference, the MAS learns a centralized global representation by maximizing the MI between the representation and the task context. For local inference, we formulate conditional MI reduction to quantify the G2L gap. Agents then learn the local representation by minimizing this reduction. The MG2L scheme effectively harmonizes centralized training with decentralized execution, offering a versatile solution for MAMRL challenges. Additionally, we integrate a permutation-invariant attention (PIA) module into the task encoder to reduce sensitivity to behavior policy variations. Extensive experiments—including comparative analyses, ablation studies, meta-test evaluations, and visualizations—demonstrate MG2L’s effectiveness. The implementation of MG2L is publicly available at https://github.com/zhaozijie2022/mg2l.

PaperID: 1361,

Authors: Pratap Anbalagan, Zhiguang Feng, Tingwen Huang, Yukang Cui

Affiliations: College of Mechatronics and Control Engineering and the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; College of Automation, Harbin Engineering University, Harbin, China; Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China; College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen, China

Title: Mean-Square Synchronization of Additive Time-Varying Delayed Markovian Jumping Neural Networks Under Multiple Stochastic Sampling

Abstract:
This study aims to solve the mean-square asymptotic synchronization problem of additive time-varying delayed Markovian jumping neural networks (ATVMJNNs) under the framework of multiple stochastic samplings and its direct application in secure image encryption (SIE). To do this, first, we assume the existence of multiple sampled data periods that satisfy a Bernoulli distribution and introduce random variables to represent the positions of input delays and sampling periods. Then, based on these assumptions, we develop a mode-dependent discontinuous Lyapunov-Krasovskii functional (DLKF) to reduce model conservatism. Next, we introduce a new auxiliary slack-matrix-based integral inequality (ASMBII) to approximate the integral quadratic terms arising from the derivative of the DLKFs. Furthermore, we develop a multiple stochastic sampling framework to achieve asymptotic synchronization between the primary and secondary systems, and less conservative criteria for asymptotic stability in the mean square sense of the error model are derived by solving a set of linear matrix inequalities (LMIs). Finally, we present the numerical validations and corresponding experimental results in a pragmatic application of image processing to demonstrate the benefits of the proposed algorithms and techniques. From both numerical and practical results, the proposed algorithms and techniques can yield superior performance compared to existing studies.

PaperID: 1362,

Authors: Jinglin Zhang, Zekai Zhang, Qinghui Chen, Gang Li, Weiyu Li, Shijiao Ding, Maomao Xiong, Wenhao Zhang, Shengyong Chen

Affiliations: School of Control Science and Engineering, Shandong University, Jinan, China; Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, China; School of Mathematics, Shandong University, Jinan, China; School of Computer Sciences and Engineering, Tianjin University of Technology, Tianjin, China

Title: Representation Learning Based on Co-Evolutionary Combined With Probability Distribution Optimization for Precise Defect Location

Abstract:
Visual defect detection methods based on representation learning play an important role in industrial scenarios and have made significant progress. However, existing defect detection methods still face three challenges. First, the extreme scarcity of industrial defect samples makes training difficult. Second, due to characteristics of industrial defects such as blur and background interference, it is challenging to obtain the separation edges and context information of fuzzy defects. Third, accurate positioning information for industrial defects is difficult to obtain. This article proposes a feature co-evolution interaction architecture (CIA) and a glass container defect dataset to address the above challenges. Specifically, the contributions of this article are as follows. First, this article designs a glass container image acquisition system that combines RGB and polarization information to create a glass container defect dataset containing more than 60000 samples to alleviate the sample scarcity problem in industrial scenarios. Second, this article designs the CIA, which optimizes the probability distribution of features through the co-evolution of edge and context features, thereby improving detection accuracy for blurred defects and in noisy environments. Finally, this article proposes a novel inforced IoU loss (IIoU loss), which can obtain more accurate position information by being aware of the scale changes of the predicted box. Defect detection experiments in three mainstream industrial manufacturing categories (Northeastern University (NEU)-Det, glass containers, wood) show that CIA uses only 22.5 GFLOPs and that its mean average precision (mAP) (NEU-Det: 88.74%, glass containers: 95.38%, wood: 68.42%) outperforms state-of-the-art methods.
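
As background for the IoU-based loss mentioned above, the sketch below computes the plain intersection over union (IoU) of two axis-aligned boxes and the basic loss `1 - IoU`. It is only the baseline quantity that IoU-style losses build on, not the IIoU loss proposed in the paper, whose scale-aware terms are not given in the abstract. The function names `box_iou` and `iou_loss` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) with x1 <= x2, y1 <= y2."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, target_box):
    """Baseline IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - box_iou(pred_box, target_box)
```

One known limitation of this baseline is that all non-overlapping predictions receive the same loss of 1, which is part of why refined IoU losses add further terms.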

PaperID: 1363,

Authors: Feng Yan, Cong Wang, Zichen Wang, Yuhao Shen, Chunjie Yang

Affiliations: State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, China

Title: SENGraph: A Self-Learning Evolutionary and Node-Aware Graph Network for Soft Sensing in Industrial Processes

Abstract:
The last decade has witnessed the growing prevalence of deep models for soft sensing in industrial processes. However, most of the existing soft sensing models are developed to learn from regular data in the Euclidean space, ignoring the complex coupling relations among process variables. On the other hand, graph networks are gaining traction in handling non-Euclidean relations in industrial data. However, the existing graph networks for soft sensing still suffer from two major issues: 1) how to capture the intervariable structural relations and intravariable temporal dependencies from dynamic and strongly coupled industrial data and 2) how to learn from nodes with distinctive importance for the soft sensing task. To address these problems, we propose a self-learning evolutionary and node-aware graph network (SENGraph) for industrial soft sensing. We first develop a self-learning graph generation (SLG) module to combine the coarse- and fine-grained graphs to capture the global trend and local dynamics from process data. Then, we build a self-evolutionary graph module (EGM) to obtain diversified node features from the entire graph using mutation and crossover strategies. Finally, we design a node-aware module (NAM) to highlight the informative nodes and suppress the less significant ones to further improve the discriminative ability of the downstream soft sensing. Extensive experimental results and analysis on four real-world industrial datasets demonstrate that our proposed SENGraph model outperforms the existing state-of-the-art (SOTA) soft sensing methods.

PaperID: 1364,

Authors: Shashank Kotyan, Tatsuya Ueda, Danilo Vasconcellos Vargas

Affiliations: Laboratory of Intelligent Systems, Kyushu University, Fukuoka, Japan; SoftBank Group Corporation, Tokyo, Japan

Title: k* Distribution: Evaluating the Latent Space of Deep Neural Networks Using Local Neighborhood Analysis

Abstract:
Most examinations of neural networks’ learned latent spaces employ dimensionality reduction techniques such as t-distributed stochastic neighbor embedding (t-SNE) or uniform manifold approximation and projection (UMAP). These methods distort the local neighborhood in the visualization, making it hard to distinguish the structure of a subset of samples in the latent space. In response to this challenge, we introduce the k* distribution and its corresponding visualization technique. This method uses local neighborhood analysis to guarantee the preservation of the structure of sample distributions for individual classes within the subset of the learned latent space. This facilitates easy comparison of different k* distributions, enabling analysis of how various classes are processed by the same neural network. Our study reveals three distinct distributions of samples within the learned latent space subset: 1) fractured; 2) overlapped; and 3) clustered, providing a more profound understanding of the existing contemporary visualizations. Experiments show that the distribution of samples within the network’s learned latent space varies significantly depending on the class. Furthermore, we illustrate that our analysis can be applied to explore the latent space of diverse neural network architectures, various layers within neural networks, transformations applied to input samples, and the distribution of training and testing data for neural networks. Thus, the k* distribution should aid in visualizing the structure inside neural networks and further foster their understanding.

PaperID: 1365,

Authors: Chen Li, Xianwei Zheng, Chuangquan Chen, Zicong Deng, Yiqing Shu

Affiliations: School of Mathematics, Foshan University, Foshan, China; School of Electronics and Information Engineering, Wuyi University, Jiangmen, China; Guangzhou Vocational College of Technology and Business, Guangzhou, China; School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China

Title: Tiny Data Is Sufficient: A Generalizable CNN Architecture for Temporal Domain Long Sequence Identification

Abstract:
Deep learning (DL) models have made remarkable progress in various sequence processing tasks. It is widely acknowledged that these models rely heavily on large amounts of training data and finely tuned parameters. Recent studies highlighted that conventional convolutions in deep networks may hamper feature processing efficacy, particularly in long temporal sequence analysis, due to their limited feature representation capabilities. To tackle these challenges, this article introduces a novel generalizable convolutional neural network (GeCNN) architecture tailored for temporal domain long sequence identification. Our framework incorporates three key components: the generic convolutional neural network (CNN), selective CNN, and multiple pooling layers. The generic CNN implements customizable hyper-convolutional operations through non-linear convolvers, thereby enhancing feature representation effectiveness and significantly improving accuracy. Subsequently, the selective CNN is designed to abate the demand for large training data by focusing on various subsequences. We propose the homogeneous striding principle and the partial homogeneous striding theorem to theoretically support the method. The multiple pooling combines eight distinct pooling operations to mitigate the statistical information loss problem typically associated with single pooling actions. Experimental results demonstrate that our GeCNN architecture achieves superior performance with shallow networks and tiny data compared to existing deep networks. The accuracy of the best-trained model surpasses the ResNet and self-attention-based models by 9.51% and 16.79%, respectively, while using only 0.18% of the data for training on the GTZAN dataset. Meanwhile, the accuracy of the optimal model exceeds that of the other two models by 5.35% and 10.16%, respectively, while using merely 1.56% of the data for training on the PLAID dataset.

PaperID: 1366,

Authors: Zhenling Mo, Zijun Zhang, Qiang Miao, Kwok-Leung Tsui

Affiliations: Department of Data Science, College of Computing, City University of Hong Kong, Hong Kong, SAR, China; College of Electrical Engineering, Sichuan University, Chengdu, Sichuan, China; Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, USA

Title: Extended Invariant Risk Minimization for Machine Fault Diagnosis With Label Noise and Data Shift

Abstract:
Incorrect labels as well as the discrepancy between training and test domain data distributions can significantly affect the effectiveness of supervised data-driven models in machine fault diagnosis applications. Such a challenge can be characterized as the noisy label-domain generalization (NL-DG) problem. In this article, the extended invariant risk minimization (EIRM) is developed, which incorporates flat minima seeking to address the NL-DG challenge. The ability to handle NL-DG is realized by shifting the gradient penalty base from the dummy classifier to the entire model. EIRM is shown to be closely related to locating a flat minimum, which is crucial for label noise (LN) robustness and model generalization. Explorations on function smoothness and algorithm convergence are offered to understand EIRM from the theoretical aspect. An efficient implementation of EIRM is also developed to construct the fault diagnosis model. The EIRM-based fault diagnosis method is compared with strong benchmarks on multiple NL-DG tasks using actuator and gearbox fault datasets. Results indicate that the EIRM-based method on average is more effective than the benchmarks. The code is available at https://github.com/mozhenling/doge-eirm.
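
For context on the "gradient penalty base" mentioned above, the sketch below shows the standard IRMv1-style penalty on a dummy scalar classifier, which is the baseline that EIRM is described as extending. It is not EIRM itself. The squared loss, the scalar outputs, and all function names are assumptions chosen to keep the example self-contained.

```python
def squared_risk(features, targets):
    """Mean squared error of scalar model outputs against targets."""
    return sum((f - y) ** 2 for f, y in zip(features, targets)) / len(features)


def irmv1_penalty(features, targets):
    """Squared gradient of the risk with respect to a dummy scale w at w = 1.

    With R(w) = mean((w * f - y)^2), dR/dw at w = 1 is mean(2 * (f - y) * f).
    """
    grad = sum(2.0 * (f - y) * f for f, y in zip(features, targets)) / len(features)
    return grad ** 2


def irmv1_objective(environments, penalty_weight):
    """Sum over environments of risk plus weighted dummy-classifier penalty."""
    return sum(
        squared_risk(f, y) + penalty_weight * irmv1_penalty(f, y)
        for f, y in environments
    )


# One environment where the outputs are far from the targets.
environment = ([1.0, 2.0], [0.0, 0.0])
penalty = irmv1_penalty(*environment)
objective = irmv1_objective([environment], penalty_weight=0.1)
```

According to the abstract, EIRM moves the base of this gradient penalty from the dummy classifier to the entire model; how that is done is not specified here and is not reproduced in the sketch.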

PaperID: 1367,

Authors: Jiacheng Wu, Bosen Lian, Hongye Su, Yang Zhu

Affiliations: State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China; Electrical and Computer Engineering Department, Auburn University, Auburn, AL, USA

Title: Reinforcement Learning-Based H∞ Control of 2-D Markov Jump Roesser Systems With Optimal Disturbance Attenuation

Abstract:
This article investigates the model-free reinforcement learning (RL)-based H∞ control problem for discrete-time 2-D Markov jump Roesser systems (2-D MJRSs) with an optimal disturbance attenuation level. This is in contrast to existing studies on H∞ control of 2-D MJRSs with optimal disturbance attenuation levels, which are offline and use full system dynamics. We design a comprehensive model-free RL algorithm to solve for the optimal H∞ control policy, optimize the disturbance attenuation level, and search for the initial stabilizing control policy, via online horizontal and vertical data along 2-D MJRS trajectories. The optimal disturbance attenuation level is obtained by solving a set of linear matrix inequalities based on online measurement data. The initial stabilizing control policy is obtained via a data-driven parallel value iteration (VI) algorithm. In addition, we certify the performance, including the convergence of the RL algorithm and the asymptotic mean-square stability of the closed-loop systems. Finally, simulation results and comparisons demonstrate the effectiveness of the proposed algorithms.

PaperID: 1368,

Authors: Siyuan Chen, Xin Du, Jiahai Wang

Affiliations: School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China; Civil Aviation Electronic Information Engineering College, Guangzhou Civil Aviation College, Guangzhou, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China

Title: A Hierarchical Framework With Spatio-Temporal Consistency Learning for Emergence Detection in Complex Adaptive Systems

Abstract:
Emergence, a global property of complex adaptive systems (CASs) constituted by interactive agents, is prevalent in real-world dynamic systems, e.g., network-level traffic congestions. Detecting its formation and evaporation helps to monitor the state of a system, allowing it to issue a warning signal for harmful emergent phenomena. Since there is no centralized controller of CAS, detecting emergence based on each agent’s local observation is desirable but challenging. Existing works are unable to capture emergence-related spatial patterns, and fail to model the nonlinear relationships among agents. This article proposes a hierarchical framework with spatio-temporal consistency learning (HSTCL) to solve these two problems by learning the system representation and agent representations, respectively. Spatio-temporal encoders (STEs) composed of spatial and temporal transformers are designed to capture agents’ nonlinear relationships and the system’s complex evolution. Agents’ and the system’s representations are learned to preserve the spatio-temporal consistency by minimizing the spatial and temporal dissimilarities in a self-supervised manner in the latent space. Our method achieves more accurate detection than traditional methods and deep learning methods on three datasets with well-known yet hard-to-detect emergent behaviors. Notably, our hierarchical framework is generic in incorporating other deep learning methods for agent-level and system-level detection.

PaperID: 1369,

Authors: Mouquan Shen, Zihao Wang, Song Zhu, Xudong Zhao, Guangdeng Zong, Qing-Guo Wang

Affiliations: College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing, China; School of Mechanical and Power Engineering, Nanjing Tech University, Nanjing, China; School of Mathematics, China University of Mining and Technology, Xuzhou, China; Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China; School of Control Science and Engineering, Tiangong University, Tianjin, China; Institute of Artificial Intelligence and Future Networks, Beijing Normal University, Zhuhai, China

Title: Neural Network Adaptive Iterative Learning Control for Strict-Feedback Unknown Delay Systems Against Input Saturation

Abstract:
Neural network adaptive iterative learning control (ILC) is developed in this article to treat strict-feedback nonlinear systems with unknown state delays and input saturation. These delays are treated by constructing the Lyapunov-Krasovskii (L-K) functions for each subsystem. A command filter is employed to avoid the derivative explosion caused by continuous differentiation of the virtual controller. Corresponding auxiliary systems are designed and integrated into the backstepping procedure to compensate for input saturation and the unimplemented part of the filter. Hyperbolic tangent functions and radial basis function neural networks (RBF NNs) are employed to treat singularity and related unknown terms, respectively. The convergence of the resultant strict-feedback systems is ensured in the framework of the composite energy function (CEF). Finally, a simulation example is adopted to substantiate the validity of the proposed algorithm.

PaperID: 1370,

Authors: Resmi Ramachandranpillai, Ricardo Baeza-Yates, Fredrik Heintz

Affiliations: Institute for Experiential AI, Northeastern University, Portland, ME, USA; Institute for Experiential AI, Northeastern University, Silicon Valley, San Jose, CA, USA; Department of Computer and Information Sciences, Linköping University, Linköping, Sweden

Title: FairXAI - A Taxonomy and Framework for Fairness and Explainability Synergy in Machine Learning

Abstract:
Explainable artificial intelligence (XAI) and fair learning have made significant strides in various application domains, including criminal recidivism predictions, healthcare settings, toxic comment detection, automatic speech detection, recommendation systems, and image segmentation. However, these two fields have largely evolved independently. Recent studies have demonstrated that incorporating explanations into decision-making processes enhances the transparency and trustworthiness of AI systems. In light of this, our objective is to conduct a systematic review of FairXAI, which explores the interplay between fairness and explainability frameworks. To commence, we propose a taxonomy of FairXAI that utilizes XAI to mitigate and evaluate bias. This taxonomy will be a base for machine learning researchers operating in diverse domains. Additionally, we will undertake an extensive review of existing articles, taking into account factors such as the purpose of the interaction, target audience, and domain and context. Moreover, we outline an interaction framework for FairXAI considering various fairness perceptions and propose a FairXAI wheel that encompasses four core properties that must be verified and evaluated. This will serve as a practical tool for researchers and practitioners, ensuring the fairness and transparency of their AI systems. Furthermore, we will identify challenges and conflicts in the interactions between fairness and explainability, which could potentially pave the way for enhancing the responsibility of AI systems. As the inaugural review of its kind, we hope that this survey will inspire scholars to address these challenges by scrutinizing current research in their respective domains.

PaperID: 1371,

Authors: Jipeng Guo, Yanfeng Sun, Xin Ma, Junbin Gao, Yongli Hu, Youqing Wang, Baocai Yin

Affiliations: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China; Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China; Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Camperdown, NSW, Australia

Title: Globality Meets Locality: An Anchor Graph Collaborative Learning Framework for Fast Multiview Subspace Clustering

Abstract:
Multiview subspace clustering (MSC) maximizes the utilization of complementary description information provided by multiview data and achieves impressive clustering performance. However, most MSC methods are inefficient or even invalid in large-scale scenarios due to high computational complexity. Recently, the anchor strategy has been developed to address this, which selects a few representative samples as anchor points for representation learning and anchor graph construction. However, most anchor-based methods only explore a single cross-view correlation, i.e., cross-view consistency from the global aspect or cross-view complementarity from the local aspect, which provides insufficient semantic correlation understanding and exploration for complex multiview data. To effectively address this issue, this study proposes a fast multiview subspace clustering (FMSC) with local-global anchor representation collaborative learning. FMSC integrates the discriminative anchor points learning and anchor graph construction with optimal structure into a joint framework. Furthermore, local (view-specific) and global (view-shared) anchor representations are learned collaboratively under two interaction strategies at different levels, providing beneficial guidance from global learning to local learning. Thus, the proposed FMSC can maximize the exploration of the complementarity-consistency among multiview data and capture a more comprehensive semantic correlation. More importantly, an effective algorithm with linear complexity is designed to solve the corresponding optimization problem of FMSC, making it more practical in large-scale clustering tasks. Extensive experimental results confirm the superiority of the proposed FMSC in both clustering performance and computational efficiency.

PaperID: 1372,

Authors: Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim

Affiliations: School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, USA; Korea Advanced Institute of Science and Technology (KAIST) AI, Seoul, South Korea; Computer Engineering, Seoul National University, Seoul, South Korea; Energy AI, Korea Institute of Energy Technology, Naju, South Korea

Title: Beyond Message-Passing: Generalization of Graph Neural Networks via Feature Perturbation for Semi-Supervised Node Classification

Abstract:
Graph neural networks (GNNs) that collect information from neighbors are commonly utilized in semi-supervised learning contexts. In particular, a significant body of research has been dedicated to developing effective graph filters and aggregation methods to filter the information from adjacent nodes. Despite their efficacy, these approaches may encounter challenges due to the sparsity of training nodes, especially when their features are represented as sparse vectors (e.g., bag-of-words). This condition can lead to the overfitting of certain dimensions within the first projection matrix (hyperplane), as the training samples may not adequately represent the full spectrum of learnable parameters. To address this limitation, we propose an innovative perturbation technique. Specifically, we introduce additional training variability by modifying both the initial features and the hyperplane, which contributes to the reduction of prediction variance by updating all dimensions. To the best of our knowledge, our approach is the first to address the overfitting issue in GNNs precipitated by sparse node features. Comprehensive experiments on real-world datasets and ablation studies affirm that our proposed method significantly enhances node classification performance, with improvements of up to 46.5% in GNN algorithms.

PaperID: 1373,

Authors: Boao Qin, Shou Feng, Chunhui Zhao, Bobo Xi, Wei Li, Ran Tao

Affiliations: College of Information and Communication Engineering and the Key Laboratory of Advanced Marine Communication and Information Technology, Ministry of Industry and Information Technology, Harbin Engineering University, Harbin, China; State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an, China; School of Information and Electronics, Beijing Institute of Technology, Beijing, China

Title: FDGNet: Frequency Disentanglement and Data Geometry for Domain Generalization in Cross-Scene Hyperspectral Image Classification

Abstract:
Cross-scene hyperspectral image classification (HSIC) poses a significant challenge in recognizing hyperspectral images (HSIs) from different domains. The current mainstream approaches based on domain adaptation (DA) methods need to access target data when aligning distributions between domains, limiting the applicability of the model. In contrast, recent domain generalization (DG) methods aim to directly generalize to unseen domains, eliminating the requirement for target data during training. Nonetheless, most DG-based methods overly focus on randomizing sample styles, leading to semantically compromised samples. In addition, broadening the source distribution without ensuring reasonable support may result in undesired extended distributions. To address these issues, we propose a novel DG network with frequency disentanglement and data geometry (FDGNet) for cross-scene HSIC. Specifically, we first develop a spectral-spatial encoder based on frequency disentanglement (FDSS encoder), which facilitates synthesized domains to preserve their semantic consistency while simulating interdomain gaps with the source domain. Second, to avoid the generation of unrealistic samples, we incorporate data geometry into adversarial training. This helps diversify new domains while keeping the data geometry of extended domains in an explainable support. To improve the learning of domain-invariant representation, we propose an intermediate domain sampling strategy based on the class-wise perceptual manifold. This strategy synthesizes reliable intermediate domains by sampling from class-wise manifold flows estimated over the source and extended domains. Extensive experiments and analysis on three public HSI datasets demonstrate the superiority of our proposed FDGNet. The codes will be available from the website: https://github.com/Qba-heu/FDGNet.

PaperID: 1374,

Authors: Andrea Ceni

Affiliations: Department of Computer Science, University of Pisa, Pisa, Italy

Title: Random Orthogonal Additive Filters: A Solution to the Vanishing/Exploding Gradient of Deep Neural Networks

Abstract:
Since the recognition in the early 1990s of the vanishing/exploding (V/E) gradient issue plaguing the training of neural networks (NNs), significant efforts have been exerted to overcome this obstacle. However, a clear solution to the V/E issue has remained elusive so far. The pursuit of approximate dynamical isometry, i.e., parameter configurations where the singular values of the input-output Jacobian (IOJ) are tightly distributed around 1, leads to the derivation of an NN architecture that shares common traits with the popular residual network (ResNet) model. Instead of skipping connections between layers, the idea is to filter the previous activations orthogonally and add them to the nonlinear activations of the next layer, realizing a convex combination between them. Remarkably, the impossibility of the gradient updates to either vanish or explode is demonstrated with analytical bounds that hold even in the infinite depth case. The effectiveness of this method is empirically demonstrated by training via backpropagation an extremely deep multilayer perceptron (MLP) of 50k layers, and an Elman NN to learn long-term dependencies in the input of 10k time steps in the past. Compared with other architectures specifically devised to deal with the V/E problem, e.g., LSTMs, the proposed model is much simpler yet more effective. Surprisingly, a single-layer vanilla recurrent NN (RNN) can be enhanced to reach state-of-the-art performance, while converging very quickly; for instance, on the psMNIST task, it is possible to get test accuracy of over 94% in the first epoch, and over 98% after just ten epochs.
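
The layer update described above (orthogonally filtered previous activations combined convexly with the next layer's nonlinear activations) can be sketched as follows. The exact parameterization is an assumption: the mixing weight `alpha`, the use of tanh, the absence of a bias term, and the fixed 2-D rotation standing in for a random orthogonal matrix are illustrative choices rather than details given in the abstract.

```python
import math


def rotation(theta):
    """2x2 rotation matrix, used here as a simple orthogonal filter."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def additive_filter_layer(x, q, w, alpha):
    """One layer: alpha * (Q x) + (1 - alpha) * tanh(W x)."""
    filtered = matvec(q, x)                           # orthogonal filter of previous activations
    activated = [math.tanh(z) for z in matvec(w, x)]  # nonlinear branch
    return [alpha * f + (1.0 - alpha) * a for f, a in zip(filtered, activated)]


def forward(x, q, w, alpha, depth):
    """Apply the same layer `depth` times (shared weights, for brevity)."""
    for _ in range(depth):
        x = additive_filter_layer(x, q, w, alpha)
    return x


x0 = [3.0, 4.0]
q = rotation(0.3)
w = [[0.5, -0.2], [0.1, 0.7]]
deep_out = forward(x0, q, w, alpha=1.0, depth=1000)  # pure orthogonal path
one_step = additive_filter_layer(x0, q, w, alpha=0.9)
```

With `alpha = 1.0` the signal passes only through the orthogonal filter, so its norm is preserved at any depth. For `alpha < 1` the bounded tanh branch is mixed in, which keeps the activations from blowing up in this toy setting.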

PaperID: 1375,

Authors: Rongqiang Tang, Xinsong Yang, Guanghui Wen, Jianquan Lu

Affiliations: College of Electronics and Information Engineering, Sichuan University, Chengdu, China; Department of Mathematics, Southeast University, Nanjing, China

Title: Finite-Time Synchronization of Fractional-Order Memristive Fuzzy Neural Networks: Event-Based Control With Linear Measurement Error

Abstract:
This article develops a novel event-triggered finite-time control strategy to investigate the finite-time synchronization (F-tS) of fractional-order memristive neural networks with state-based switching fuzzy terms. A key distinction of this approach, compared with existing event-based finite-time control schemes, is the linearity of the measurement error function in the event-triggering mechanism (ETM). The linear measurement error not only simplifies computational tasks but also aids in demonstrating the exclusion of Zeno behavior for fractional-order systems (FSs). Furthermore, to derive F-tS criteria in the form of linear matrix inequalities (LMIs), a novel finite-time analytical framework for FSs is proposed. This framework includes two original inequalities and a weighted-norm-based Lyapunov function. The effectiveness and superiority of the theoretical results are demonstrated through two examples. Both theoretical and experimental results suggest that the criteria obtained using the new analytical framework are less conservative than existing results.

PaperID: 1376,

Authors: Feng Yan, Chunjie Yang, Xinmin Zhang, Chong Yang, Zhiyong Ruan

Affiliations: State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, China; Guangxi Liuzhou Iron and Steel Group Company Ltd., Liuzhou, China

Title: BTPNet: A Probabilistic Spatial-Temporal Aware Network for Burn-Through Point Multistep Prediction in Sintering Process

Abstract:
Burn-through point (BTP) is a key factor in maintaining the normal operation of the sintering process, which guarantees the yield and quality of sinter ore. Due to the time-varying and multivariable coupling characteristics of the actual sintering process, it is difficult for traditional soft-sensor models to extract spatial-temporal features and reduce multistep prediction error accumulation. To address these issues, in this study, we propose a probabilistic spatial-temporal aware network, called BTPNet, which is used to extract spatial-temporal features for accurate BTP multistep prediction. The BTPNet model consists of two parts: an encoder network and a decoder network. In the encoder network, the multichannel temporal convolutional network (MTCN) is employed to extract the temporal features. Meanwhile, we also propose a novel architectural unit called variables interaction-aware module (VIAM) to extract the spatial features. In the decoder network, to reduce the accumulated errors of the last step prediction, a probabilistic estimation (PE) method is proposed to improve the performance of multistep prediction. Finally, the experimental results on a real sintering process demonstrate that the proposed BTPNet model outperforms state-of-the-art multistep prediction models.

PaperID: 1377,

Authors: Longzhen Yang, Lianghua He, Die Hu, Yihang Liu, Yitao Peng, Hongzhou Chen, MengChu Zhou

Affiliations: Department of Electronic and Information Engineering, Tongji University, Shanghai, China; Department of Communication Science and Engineering, Fudan University, Shanghai, China; School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou, China

Title: Variational Transformer: A Framework Beyond the Tradeoff Between Accuracy and Diversity for Image Captioning

Abstract:
Accuracy and diversity represent two critical quantifiable performance metrics in the generation of natural and semantically accurate captions. While efforts are made to enhance one of them, the other suffers due to the inherent conflicting and complex relationship between them. In this study, we demonstrate that the suboptimal accuracy levels derived from human annotations are unsuitable for machine-generated captions. To boost diversity while maintaining high accuracy, we propose an innovative variational transformer (VaT) framework. By integrating “invisible information prior (IIP)” and “auto-selectable Gaussian mixture model (AGMM),” we enable its encoder to learn precise linguistic information and object relationships in various scenes, thus ensuring high accuracy. By incorporating the “range-median reward (RMR)” baseline into it, we preserve a wider range of candidates with higher rewards during the reinforcement-learning-based training process, thereby guaranteeing outstanding diversity. Experimental results indicate that our method achieves simultaneous improvements in accuracy and diversity by up to 1.1% and 4.8%, respectively, over the state-of-the-art. Furthermore, our approach achieves the performance closest to human annotations in semantic retrieval, with a score of 50.3 versus the human score of 50.6. Thus, the method can be readily put into industrial use.

PaperID: 1378,

Authors: Kaiqun Zhu, Zidong Wang, Derui Ding, Hongli Dong, Cheng-Zhong Xu

Affiliations: Department of Control Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China; Department of Computer Science, Brunel University London, Uxbridge, Middlesex, U.K.; Artificial Intelligence Energy Research Institute, Northeast Petroleum University, Daqing, China; Department of Computer and Information Science, State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China

Title: Secure State Estimation for Artificial Neural Networks With Unknown-But-Bounded Noises: A Homomorphic Encryption Scheme

Abstract:
This article is concerned with the secure state estimation problem for artificial neural networks (ANNs) subject to unknown-but-bounded noises, where sensors and the remote estimator are connected via open and bandwidth-limited communication networks. Using the encoding-decoding mechanism (EDM) and the Paillier encryption technique, a novel homomorphic encryption scheme (HES) is introduced, which aims to ensure the secure transmission of measurement information within communication networks that are constrained by bandwidth. Under this encoding–decoding-based HES, the data being transmitted can be encrypted into ciphertexts comprising finite bits. The emphasis of this research is placed on the development of a secure set-membership state estimation algorithm, which allows for the computation of estimates using encrypted data without the need for decryption, thereby ensuring data security throughout the entire estimation process. Taking into account the unknown-but-bounded noises, the underlying ANN, and the adopted HES, sufficient conditions are determined for the existence of the desired ellipsoidal set. The related secure state estimator gains are then derived by addressing optimization problems using the Lagrange multiplier method. Lastly, an example is presented to verify the effectiveness of the proposed secure state estimation approach.
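
The property that lets an estimator compute on ciphertexts without decrypting them is the additive homomorphism of Paillier encryption. The toy sketch below illustrates only that property. The tiny primes, the fixed random values `r`, and the function names are assumptions for illustration and are not secure, and the article's encoding-decoding mechanism and set-membership estimator are not modeled.

```python
import math


def keygen(p, q):
    """Return the public modulus n and the private key (lam, mu).

    Uses the common simplification g = n + 1, for which mu is the
    modular inverse of lam = lcm(p - 1, q - 1) modulo n.
    """
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)
    return n, (lam, mu)


def encrypt(n, message, r):
    """Encrypt an integer message in [0, n) with randomness r coprime to n."""
    n2 = n * n
    return (pow(n + 1, message, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, ciphertext):
    """Recover the message via L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n


# Toy parameters only: real deployments use primes of 1024 bits or more.
n, private_key = keygen(61, 53)
c1 = encrypt(n, 12, r=17)
c2 = encrypt(n, 30, r=23)

# Multiplying ciphertexts adds the underlying plaintexts.
encrypted_sum = (c1 * c2) % (n * n)
# Raising a ciphertext to a constant multiplies the plaintext by it.
encrypted_scaled = pow(c1, 3, n * n)
```

Decrypting `encrypted_sum` yields 12 + 30 and decrypting `encrypted_scaled` yields 3 × 12, which is why linear operations on measurements can be carried out on encrypted data.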

PaperID: 1379,

Authors: Guanzhong Tian, Yiran Sun, Yuang Liu, Xianfang Zeng, Mengmeng Wang, Yong Liu, Jiangning Zhang, Jun Chen

Affiliations: Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China

Title: Adding Before Pruning: Sparse Filter Fusion for Deep Convolutional Neural Networks via Auxiliary Attention

Abstract:
Filter pruning is a significant feature selection technique to shrink the existing feature fusion schemes (especially on convolution calculation and model size), which helps to develop more efficient feature fusion models while maintaining state-of-the-art performance. In addition, it reduces the storage and computation requirements of deep neural networks (DNNs) and accelerates the inference process dramatically. Existing methods mainly rely on manual constraints such as normalization to select the filters. A typical pipeline comprises two stages: first pruning the original neural network and then fine-tuning the pruned model. However, choosing a manual criterion can be somewhat tricky and stochastic. Moreover, directly regularizing and modifying filters in the pipeline is sensitive to the choice of hyperparameters, thus making the pruning procedure less robust. To address these challenges, we propose to handle the filter pruning issue through one stage: using an attention-based architecture that adaptively fuses the filter selection with filter learning in a unified network. Specifically, we present a pruning method named adding before pruning (ABP) to make the model focus on the filters of higher significance by training instead of man-made criteria such as norm, rank, etc. First, we add an auxiliary attention layer into the original model and set the significance scores in this layer to be binary. Furthermore, to propagate the gradients in the auxiliary attention layer, we design a specific gradient estimator and prove its effectiveness for convergence in the graph flow through mathematical derivation. In the end, to relieve the dependence on the complicated prior knowledge for designing the thresholding criterion, we simultaneously prune and train the filters to automatically eliminate network redundancy with recoverability. Extensive experimental results on the two typical image classification benchmarks, CIFAR-10 and ILSVRC-2012, illustrate that the proposed approach performs favorably against previous state-of-the-art filter pruning algorithms.

PaperID: 1380,

Authors: Long Jin, Jinchuan Zhao, Liangming Chen, Shuai Li

Affiliations: School of Information Science and Engineering, Lanzhou University, Lanzhou, China

Title: Collective Neural Dynamics for Sparse Motion Planning of Redundant Manipulators Without Hessian Matrix Inversion

Abstract:
Redundant manipulators have been widely used in various industries whose applications not only improve production efficiency and reduce manual labor but also promote innovation in robotics and artificial intelligence. Kinematic control plays a fundamental and crucial role in robot control. Over the past few decades, numerous motion control schemes have been proposed and applied to trajectory tracking tasks. However, most of these schemes do not consider the introduction of sparsity into the motion control of redundant manipulators, resulting in excessive joint movements, which not only consume extra energy but also increase the risk of unexpected collisions in complex environments. To solve this problem, we transform the issue of increasing the sparsity into a nonconvex optimization problem. Furthermore, a collective neural dynamics for sparse motion planning (CNDSMP) scheme for motion planning of redundant manipulators is proposed. By incorporating sparsity into the control scheme, the excessive joint movements are minimized, leading to improved efficiency and reduced collision risks. Through simulations, comparisons, and physical experiments, the effectiveness and superiority of the proposed scheme are demonstrated.

PaperID: 1381,

Authors: Wei Ai, Yuntao Shou, Tao Meng, Keqin Li

Affiliations: School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan, China; Department of Computer Science, State University of New York, New Paltz, NY, USA

Title: DER-GCN: Dialog and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialog Emotion Recognition

Abstract:
With the continuous development of deep learning (DL), multimodal dialog emotion recognition (MDER), an essential branch of DL, has recently received extensive research attention. MDER aims to identify the emotional information contained in different modalities, e.g., text, video, and audio, and in different dialog scenes. However, existing research has focused on modeling contextual semantic information and dialog relations between speakers while ignoring the impact of event relations on emotion. To tackle this issue, we propose a novel dialog and event relation-aware graph convolutional neural network (DER-GCN) for multimodal emotion recognition. It models dialog relations between speakers and captures latent event relation information. Specifically, we construct a weighted multirelationship graph to simultaneously capture the dependencies between speakers and event relations in a dialog. Moreover, we introduce a self-supervised masked graph autoencoder (SMGAE) to improve the fusion representation ability of features and structures. Next, we design a new multiple information Transformer (MIT) to capture the correlation between different relations, which can better fuse the multivariate information between relations. Finally, we propose a loss optimization strategy based on contrastive learning to enhance the representation learning ability of minority class features. We conduct extensive experiments on the benchmark datasets, Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Multimodal EmotionLines Dataset (MELD), which verify the effectiveness of the DER-GCN model. The results demonstrate that our model significantly improves both the average accuracy and the F1 value of emotion recognition. Our code is publicly available at https://github.com/yuntaoshou/DER-GCN.

PaperID: 1382,

Authors: Yuxuan Shen, Zidong Wang, Hongli Dong, Hongjian Liu, Yun Chen

Affiliations: National Key Laboratory of Continental Shale Oil of China, Northeast Petroleum University, Daqing, China; Department of Computer Science, Brunel University London, Uxbridge, U.K.; Key Laboratory of Advanced Perception and Intelligent Control of High-End Equipment, Ministry of Education, Anhui Polytechnic University, Wuhu, China; School of Automation, Hangzhou Dianzi University, Hangzhou, China

Title: Set-Membership State Estimation for Multirate Nonlinear Complex Networks Under FlexRay Protocols: A Neural-Network-Based Approach

Abstract:
In this article, the set-membership state estimation problem is investigated for a class of nonlinear complex networks under the FlexRay protocols (FRPs). In order to address practical engineering requirements, the multirate sampling is taken into account which allows for different sampling periods of the system state and the measurement. On the other hand, the FRP is deployed in the communication network from sensors to estimators in order to alleviate the communication burden. The underlying nonlinearity studied in this article is of a general nature, and an approach based on neural networks is employed to handle the nonlinearity. By utilizing the convex optimization technique, sufficient conditions are established in order to restrain the estimation errors within certain ellipsoidal constraints. Then, the estimator gains and the tuning scalars of the neural network are derived by solving several optimization problems. Finally, a practical simulation is conducted to verify the validity of the developed set-membership estimation scheme.

PaperID: 1383,

Authors: Lisha Yao, Yingda Xia, Zhihong Chen, Suyun Li, Jiawen Yao, Dakai Jin, Yanting Liang, Jiatai Lin, Bingchao Zhao, Chu Han, Le Lu, Ling Zhang, Zaiyi Liu, Xin Chen

Affiliations: School of Medicine, South China University of Technology, Guangzhou, China; DAMO Academy, Alibaba Group, New York, NY, USA; Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China; Department of Radiology, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China; DAMO Academy, Alibaba Group, Hangzhou, China

Title: A Colorectal Coordinate-Driven Method for Colorectum and Colorectal Cancer Segmentation in Conventional CT Scans

Abstract:
Automated colorectal cancer (CRC) segmentation in medical imaging is the key to achieving automation of CRC detection, staging, and treatment response monitoring. Compared with magnetic resonance imaging (MRI) and computed tomography colonography (CTC), conventional computed tomography (CT) has enormous potential because of its broad implementation, superiority for the hollow viscera (colon), and convenience without needing bowel preparation. However, the segmentation of CRC in conventional CT is more challenging due to the difficulties presented by the unprepared bowel, such as distinguishing the colorectum from other structures with similar appearance and distinguishing the CRC from the contents of the colorectum. To tackle these challenges, we introduce DeepCRC-SL, the first automated segmentation algorithm for CRC and colorectum in conventional contrast-enhanced CT scans. We propose a topology-aware deep learning-based approach, which builds a novel 1-D colorectal coordinate system and encodes each voxel of the colorectum with a relative position along the coordinate system. We then introduce an auxiliary regression task to predict the colorectal coordinate value of each voxel, aiming to integrate global topology into the segmentation network and thus improve the colorectum’s continuity. Self-attention layers are utilized to capture global contexts for the coordinate regression task and enhance the ability to differentiate CRC and colorectum tissues. Moreover, a coordinate-driven self-learning (SL) strategy is introduced to leverage a large amount of unlabeled data to improve segmentation performance. We validate the proposed approach on a dataset including 227 labeled and 585 unlabeled CRC cases by fivefold cross-validation. Experimental results demonstrate that our method outperforms some recent related segmentation methods and achieves a segmentation accuracy in Dice similarity coefficient (DSC) of 0.669 for CRC and 0.892 for the colorectum, reaching the performance (0.639 and 0.890, respectively) of a medical resident with two years of specialized CRC imaging fellowship.
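
The DSC values quoted in this abstract can be read against the standard definition of the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), for a predicted mask A and a reference mask B. The following is a minimal sketch of that definition only; the function name, the flat 0/1 mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration and are not taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks.

    Both masks are flat sequences of 0/1 values of equal length
    (a hypothetical representation chosen for this sketch).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# One overlapping voxel out of two foreground voxels in each mask.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```

A score of 1.0 means the masks coincide and 0.0 means they share no foreground voxel.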

PaperID: 1384,

Authors: Shenglan Liu, Yu-Ning Ding, Jinrong Zhang, Kai-Yuan Liu, Si-Fan Zhang, Fei-Long Wang, Gao Huang

Affiliations: School of Computer Science and Technology, Dalian University of Technology, Dalian, China; Department of Automation, Tsinghua University, Beijing, China

Title: Multidimensional Refinement Graph Convolutional Network With Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition

Abstract:
Graph convolutional networks (GCNs) have been widely used in skeleton-based action recognition. However, existing approaches are limited in fine-grained action recognition due to the similarity of interclass data. Moreover, the noisy data from pose extraction increase the challenge of fine-grained recognition. In this work, we propose a flexible attention block called channel-variable spatial–temporal attention (CVSTA) to enhance the discriminative power of spatial–temporal joints and obtain a more compact intraclass feature distribution. Based on CVSTA, we construct a multidimensional refinement GCN (MDR-GCN) that can improve the discrimination among channel-, joint-, and frame-level features for fine-grained actions. Furthermore, we propose a robust decouple loss (RDL) that significantly boosts the effect of the CVSTA and reduces the impact of noise. The proposed method combining MDR-GCN with RDL outperforms the known state-of-the-art skeleton-based approaches on fine-grained datasets, FineGym99 and FSD-10, and also on the coarse NTU-RGB + D 120 dataset and NTU-RGB + D X-view version. Our code is publicly available at https://github.com/dingyn-Reno/MDR-GCN.

PaperID: 1385,

Authors: Mengxin Wang, Yuhu Wu, Sitian Qin

Affiliations: Department of Mathematics, Harbin Institute of Technology, Weihai, China; Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education and the School of Control Science and Engineering, Dalian University of Technology, Dalian, China

Title: Generalized Nash Equilibrium Seeking for Noncooperative Game With Different Monotonicities by Adaptive Neurodynamic Algorithm

Abstract:
This article proposes a novel adaptive neurodynamic algorithm (ANA) to seek the generalized Nash equilibrium (GNE) of a noncooperative constrained game under different monotonicity conditions. In the ANA, the adaptive penalty term, which acts as a trajectory-dependent penalty parameter, evolves based on the degree of constraint violation until the trajectory enters the action set of the noncooperative game. It is shown that, owing to the adaptive penalty term, the trajectory of the ANA enters the action set in finite time. Moreover, it is proven that the trajectory exponentially (or polynomially) converges to the unique GNE when the pseudo-gradient of the cost function in the noncooperative game satisfies strong (or “generalized” strong) monotonicity. To the best of our knowledge, this is the first study of the polynomial convergence of a GNE seeking algorithm. Furthermore, when the pseudo-gradient mentioned above is merely monotone, a new ANA based on the Tikhonov regularization method is proposed for finding an $\varepsilon$-generalized Nash equilibrium ($\varepsilon$-GNE), and the exponential convergence of the algorithm is established. Finally, the river basin pollution game and 5G base station location game are given as examples to showcase the algorithm’s effectiveness.

PaperID: 1386,

Authors: Xinhui Zhu, Jianwei Fan, Shuang Pan, Yanling Li, Jian Li, Mingliang Xu

Affiliations: School of Computer and Information Technology, Xinyang Normal University, Xinyang, China; School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China

Title: MNTZNN for Solving Hybrid Double-Deck Dynamic Nonlinear Equation System Applied to Robot Manipulator Control

Abstract:
Compared with conventional dynamic nonlinear equation systems, a hybrid double-deck dynamic nonlinear equation system (H3DNES) not only has multiple layers describing more distinct tasks in practice, but also has a hybrid nonlinear structure of the solution and its derivative describing their nonlinear constraints. These characteristics allow it to describe more complicated problems involving multiple constraints and strong nonlinear and dynamic features, such as robot manipulator tracking control. Besides, noise is inevitable in practice, and thus strong robustness of models solving H3DNES is also necessary. In this work, a multilayered noise-tolerant zeroing neural network (MNTZNN) model is proposed for solving H3DNES. The MNTZNN model has strong robustness and solves H3DNES successfully even when noise exists in both layers of H3DNES. In order to develop the MNTZNN model, a new zeroing neural network (ZNN) design formula is proposed. It not only enables equations with respect to solutions to become equations with respect to the second-order derivatives of solutions but also gives the corresponding model strong robustness. The robustness of the MNTZNN model is proved when the parameters in the model satisfy a loose constraint, and the error bounds are programmable by setting appropriate parameter values. Finally, the MNTZNN model is applied to the tracking control of a six-link planar robot manipulator and a PUMA560 robot manipulator with hybrid nonlinear constraints on joint angle and velocity.
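
For readers unfamiliar with ZNN design formulas, the sketch below illustrates the conventional (not the noise-tolerant, multilayered) formula on a scalar time-varying equation a(t)x(t) = b(t): the error e(t) = a(t)x(t) - b(t) is required to satisfy de/dt = -γe, which yields an explicit expression for dx/dt. This is not the MNTZNN model of the paper; the test problem, the gain `gamma`, the step size, and the forward Euler integration are all assumptions chosen for illustration.

```python
import math


def znn_solve(a, da, b, db, x0, gamma=10.0, dt=1e-3, t_end=5.0):
    """Track the solution of a(t) * x(t) = b(t) with the conventional ZNN formula.

    The error e(t) = a(t) * x(t) - b(t) is driven by de/dt = -gamma * e.
    Assumes a(t) is never zero on the integration interval.
    """
    x = x0
    steps = round(t_end / dt)
    for k in range(steps):
        t = k * dt
        e = a(t) * x - b(t)
        # Expanding de/dt gives da*x + a*dx - db = -gamma*e; solve for dx.
        dx = (db(t) - da(t) * x - gamma * e) / a(t)
        x += dt * dx  # forward Euler step
    return x


# Hypothetical test problem: (2 + sin t) * x(t) = cos t, started away from the solution.
x_final = znn_solve(
    a=lambda t: 2.0 + math.sin(t),
    da=lambda t: math.cos(t),
    b=lambda t: math.cos(t),
    db=lambda t: -math.sin(t),
    x0=0.0,
)
x_true = math.cos(5.0) / (2.0 + math.sin(5.0))
print(abs(x_final - x_true))  # tracking error at t = 5
```

The conventional formula makes the error decay exponentially in the noise-free case; the abstract's contribution concerns what happens when noise enters both layers, which this sketch does not model.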

PaperID: 1387,

Authors: Qiong Wu, Jiaer Xia, Pingyang Dai, Yiyi Zhou, Yongjian Wu, Rongrong Ji

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen, China; Youtu Laboratory, Tencent Company Ltd., Shanghai, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, and Fujian Engineering Research Center of Trusted Artificial Intelligence Analysis and Application, Institute of Artificial Intelligence, Xiamen University, Xiamen, China

Title: CycleTrans: Learning Neutral Yet Discriminative Features via Cycle Construction for Visible-Infrared Person Re-Identification

Abstract:
Visible-infrared person re-identification (VI-ReID) is the task of matching the same individuals across the visible and infrared modalities. Its main challenge lies in the modality gap caused by the cameras operating on different spectra. Existing VI-ReID methods mainly focus on learning general features across modalities, often at the expense of feature discriminability. To address this issue, we present a novel cycle-construction-based network for neutral yet discriminative feature learning, termed CycleTrans. Specifically, CycleTrans uses a lightweight knowledge capturing module (KCM) to capture rich semantics from the modality-relevant feature maps according to pseudo anchors. Afterward, a discrepancy modeling module (DMM) is deployed to transform these features into neutral ones according to the modality-irrelevant prototypes. To ensure feature discriminability, another two KCMs are further deployed for feature cycle constructions. With cycle construction, our method can learn effective neutral features for visible and infrared images while preserving their salient semantics. Extensive experiments on SYSU-MM01 and RegDB datasets validate the merits of CycleTrans against a flurry of state-of-the-art (SOTA) methods, +1.88% on rank-1 in SYSU-MM01 and +1.1% on rank-1 in RegDB. Our code is available at https://github.com/DoubtedSteam/CycleTrans.

PaperID: 1388,

Authors: Mingxing Cai, Yuan Yuan, Biao Luo, Fanbiao Li, Xiaodong Xu, Chunhua Yang, Weihua Gui

Affiliations: School of Automation, Central South University, Changsha, China; School of Electrical and Information Engineering, Changsha University of Science and Technology, Changsha, China

Title: Adaptive Neural Consensus Observer Networks Design for a Class of Semilinear Parabolic PDE Systems

Abstract:
This article investigates the consensus problem for the joint state-uncertainty estimation of a class of parabolic partial differential equation (PDE) systems with parametric and nonparametric uncertainties. We propose a two-layer network consisting of informed and uninformed boundary observers, where novel adaptation laws are developed for the identification of uncertainties. In particular, all observer agents in the network exchange their information with each other across the entire network. The proposed adaptation laws include a penalty term on the mismatch between the parameter estimates generated by the other observer agents. Moreover, for the nonparametric uncertainties, radial basis function (RBF) neural networks are employed for the universal approximation of unknown nonlinear functions. Given the persistently exciting condition, it is shown, based on the Lyapunov stability theory, that the proposed network of adaptive observers can achieve exponential joint state-uncertainty estimation in the presence of parametric uncertainties and ultimately bounded estimation in the presence of nonparametric uncertainties. The effectiveness of the proposed consensus method is demonstrated through numerical results on a typical reaction–diffusion system example.
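
As a reminder of what an RBF network computes, the sketch below evaluates a Gaussian RBF approximator of the form f(x) ≈ Σ_i w_i exp(-‖x - c_i‖² / (2σ²)). The Gaussian basis, the shared width `width`, and all names are assumptions for illustration; the abstract does not specify which basis functions or adaptation laws the paper uses.

```python
import math


def rbf_features(x, centers, width):
    """Gaussian RBF activations of input vector x for each center."""
    return [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2))
        for c in centers
    ]


def rbf_network(x, centers, weights, width):
    """Weighted sum of Gaussian RBF activations (a single linear output layer)."""
    return sum(w * phi for w, phi in zip(weights, rbf_features(x, centers, width)))


# Hypothetical 1-D example with two centers.
centers = [[0.0], [1.0]]
weights = [2.0, -1.0]
print(rbf_network([0.5], centers, weights, width=1.0))
```

In an adaptive observer, the weights would be updated online by an adaptation law rather than fixed as they are here.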

PaperID: 1389,

Authors: Li Dong, Feibo Jiang, Minjie Wang, Yubo Peng, Xiaolong Li

Affiliations: School of Computer Science, Hunan University of Technology and Business, Changsha, China; Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha, China; College of Information Science and Engineering, Hunan Normal University, Changsha, China

Title: Deep Progressive Reinforcement Learning-Based Flexible Resource Scheduling Framework for IRS and UAV-Assisted MEC System

Abstract:
The intelligent reflecting surface (IRS) and unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system is widely used in temporary and emergency scenarios. Our goal is to minimize the energy consumption of the MEC system by jointly optimizing UAV locations, IRS phase shift, task offloading, and resource allocation with a variable number of UAVs. To this end, we propose a flexible resource scheduling (FRES) framework by employing a novel deep progressive reinforcement learning that includes the following innovations. First, a novel multitask agent is presented to deal with the mixed integer nonlinear programming (MINLP) problem. The multitask agent has two output heads designed for different tasks, in which a classification head is employed to make offloading decisions with integer variables while a fitting head is applied to solve resource allocation with continuous variables. Second, a progressive scheduler is introduced to adapt the agent to the varying number of UAVs by progressively adjusting a subset of the neurons in the agent. This structure can naturally accumulate experiences and be immune to catastrophic forgetting. Finally, a light taboo search (LTS) is introduced to enhance the global search of the FRES. The numerical results demonstrate the superiority of the FRES framework, which can make real-time and optimal resource scheduling even in dynamic MEC systems.

PaperID: 1390,

Authors: Adeyemi D. Adeoye, Alberto Bemporad

Affiliations: IMT School for Advanced Studies Lucca, Lucca, Italy

Title: An Inexact Sequential Quadratic Programming Method for Learning and Control of Recurrent Neural Networks

Abstract:
This article considers the two-stage approach to solving a partially observable Markov decision process (POMDP): the identification stage and the (optimal) control stage. We present an inexact sequential quadratic programming framework for recurrent neural network learning (iSQPRL) for solving the identification stage of the POMDP, in which the true system is approximated by a recurrent neural network (RNN) with dynamically consistent overshooting (DCRNN). We formulate the learning problem as a constrained optimization problem and study the quadratic programming (QP) subproblem with a convergence analysis under a restarted Krylov-subspace iterative scheme that implicitly exploits the structure of the associated Karush–Kuhn–Tucker (KKT) subsystem. In the control stage, where a feedforward neural network (FNN) controller is designed on top of the RNN model, we adapt a generalized Gauss–Newton (GGN) algorithm that exploits useful approximations to the curvature terms of the training data and selects its mini-batch step size using a known property of some regularization function. Simulation results are provided to demonstrate the effectiveness of our approach.

PaperID: 1391,

Authors: Cheng-Hong Yang, Sin-Hua Moi, Li-Yeh Chuang, Yu-Da Lin

Affiliations: Department of Information Management, Tainan University of Technology, Tainan, Taiwan; Graduate Institute of Clinical Medicine, College of Medicine, and the Research Center for Precision Environmental Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan; Department of Chemical Engineering, Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan; Department of Computer Science and Information Engineering, National Penghu University of Science and Technology, Mogong City, Penghu, Taiwan

Title: An Information Fusion System-Driven Deep Neural Networks With Application to Cancer Mortality Risk Estimate

Abstract:
Next-generation sequencing (NGS) genomic data offer valuable high-throughput genomic information for computational applications in medicine. Using genomic data to identify disease-associated genes to estimate cancer mortality risk remains challenging with regard to computational efficiency and risk integration. For determining mortality-related genes, we propose an information fusion system based on a fuzzy system to fuse the numerous deep-learning-based risk scores, consider the significance of features related to time-varying effects and risk stratifications, and interpret the directional relationship and interaction between outcome and predictors. Fuzzy rules were implemented to integrate the considerations mentioned above by merging all the risk score models to achieve advanced risk estimation. The genomic data of head and neck squamous cell carcinoma (HNSCC) were used to evaluate the performance of the proposed computational approach. The results indicated that the proposed computational approach exhibited optimal ability to identify mortality risk-related genes in HNSCC patients. The results also suggest that HNSCC mortality is associated with cancer inflammatory response, the interleukin-17A signaling pathway, stellate cell activation, and the extracellular-regulated protein kinase five signaling pathway, which might offer new therapeutic targets for HNSCC through immunologic or antiangiogenic mechanisms. The proposed information fusion system can promote the determination of high-risk genes related to cancer mortality. This study contributes a valid cancer mortality risk estimate that can identify mortality-related genes.

PaperID: 1392,

Authors: Zhiwei Li, Cheng Wang

Affiliations: Department of Computer Science and Technology, Tongji University, Shanghai, China

Title: Achieving Sharp Upper Bounds on the Expressive Power of Neural Networks via Tropical Polynomials

Abstract:
The expressive power of neural networks describes the ability to represent or approximate complex functions. The number of linear regions is the standard and most natural measure of expressive power. However, a major challenge in utilizing the number of linear regions as a measure of expressive power is the exponential gap between the theoretical upper and lower bounds, which becomes more pronounced as the neural network capacity increases. In this article, we aim to derive a sharp upper bound on piecewise linear neural networks (PLNNs) to bridge this gap. Specifically, we first establish the relationship between tropical polynomials and PLNNs. In the unexpanded tropical polynomial form, we propose that the hyperplanes are not all in general position, thereby reducing the number of intersecting hyperplanes. We propose a rank-based approach and present an empirical analysis showing that this approach outperforms previous methods based on Zaslavsky’s theorem. In the expanded tropical polynomial form, accounting for limitations in weight initialization and model computational precision, we introduce the concept that the value range of each term is bounded. We propose a precision-based approach that transforms the approximately exponential growth of the number of linear regions into polynomial growth with width, which is effective at larger layer widths. Finally, we compare the number of linear regions that can be represented by each hidden layer in both forms and derive a sharp upper bound for PLNNs. Empirical analysis and experimental results provide compelling evidence for the efficacy and feasibility of this sharp upper bound on both simulated experiments and real datasets.
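
The baseline that the abstract compares against can be made concrete with the classical count from Zaslavsky's theorem: n hyperplanes in general position divide R^d into at most Σ_{i=0}^{d} C(n, i) regions. The sketch below computes only this classical general-position count; it is not the rank-based or precision-based bound proposed in the paper, and the function name is hypothetical.

```python
from math import comb


def max_regions_general_position(n_hyperplanes, dim):
    """Maximum number of regions cut out of R^dim by n hyperplanes in general position."""
    # math.comb(n, i) is 0 whenever i > n, so the sum is valid for any dim >= 0.
    return sum(comb(n_hyperplanes, i) for i in range(dim + 1))


# Three lines in general position split the plane into 7 regions.
print(max_regions_general_position(3, 2))  # 7
# Growth with the number of hyperplanes for a fixed input dimension of 2.
print([max_regions_general_position(n, 2) for n in range(1, 6)])
```

When the hyperplanes are not in general position, as the abstract argues is typical, the true number of regions is smaller than this count.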

PaperID: 1393,

Authors: Junwei Duan, Shiyi Yao, Jiantao Tan, Yang Liu, Long Chen, Zhen Zhang, C. L. Philip Chen

Affiliations: College of Information Science and Technology, Jinan University, Guangzhou, China; Jinan University–University of Birmingham Joint Institute, Jinan University, Guangzhou, China; Department of Computer and Information Science, University of Macau, Macau, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: Extreme Fuzzy Broad Learning System: Algorithm, Frequency Principle, and Applications in Classification and Regression

Abstract:
As an effective alternative to deep neural networks, the broad learning system (BLS) has attracted increasing attention due to its efficient and outstanding performance and shorter training process in classification and regression tasks. Nevertheless, the performance of BLS stops improving, and may even degrade, once the number of nodes passes the saturation point. In addition, previous research on neural networks has usually ignored the reason for the good generalization of neural networks. To solve these problems, this article first proposes the Extreme Fuzzy BLS (E-FBLS), a novel cascaded fuzzy BLS, in which multiple fuzzy BLS blocks are grouped or cascaded together. Moreover, the original data is input to each FBLS block rather than the previous blocks. In addition, we use residual learning to illustrate the effectiveness of E-FBLS. From the frequency domain perspective, we also discover the existence of the frequency principle in E-FBLS, which can provide good interpretability for the generalization of the neural network. Experimental results on classical classification and regression datasets show that the accuracy of the proposed E-FBLS is superior to traditional BLS in handling classification and regression tasks. The accuracy improves when the number of blocks increases to some extent. Moreover, we verify the frequency principle of E-FBLS: E-FBLS can obtain the low-frequency components quickly, while the high-frequency components are gradually adjusted as the number of FBLS blocks increases.

PaperID: 1394,

Authors: Qiang Guo, Lexin Fang, Ren Wang, Caiming Zhang

Affiliations: School of Computer Science and Technology and the Shandong Provincial Key Laboratory of Digital Media Technology, Shandong University of Finance and Economics, Jinan, China; School of Software, Shandong University, Jinan, China

Title: Multivariate Time Series Forecasting Using Multiscale Recurrent Networks With Scale Attention and Cross-Scale Guidance

Abstract:
Multivariate time series (MTS) forecasting is considered a challenging task due to complex and nonlinear interdependencies between time steps and series. With the advance of deep learning, significant efforts have been made to model long-term and short-term temporal patterns hidden in historical information by recurrent neural networks (RNNs) with a temporal attention mechanism. Although various forecasting models have been developed, most of them are single-scale oriented, resulting in scale information loss. In this article, we seamlessly integrate multiscale analysis into deep learning frameworks to build scale-aware recurrent networks and propose two multiscale recurrent network (MRN) models for MTS forecasting. The first model, called MRN-SA, adopts a scale attention mechanism to dynamically select the most relevant information from different scales and simultaneously employs input attention and temporal attention to make predictions. The second one, named MRN-CSG, introduces a novel cross-scale guidance mechanism to exploit the information from the coarse scale to guide the decoding process at the fine scale, which results in a lightweight and more easily trained model without obvious loss of accuracy. Extensive experimental results demonstrate that both MRN-SA and MRN-CSG can achieve state-of-the-art performance on five typical MTS datasets in different domains. The source codes will be publicly available at https://github.com/qguo2010/MRN.

PaperID: 1395,

Authors: Meng Xi, Jiachen Yang, Jiabao Wen, Zhengjian Li, Wen Lu, Xinbo Gao

Affiliations: School of Electrical and Information Engineering, Tianjin University, Tianjin, China; School of Electronic Engineering, Xidian University, Xi’an, China

Title: An Information-Assisted Deep Reinforcement Learning Path Planning Scheme for Dynamic and Unknown Underwater Environment

Abstract:
An autonomous underwater vehicle (AUV) has shown impressive potential and promising exploitation prospects in numerous marine missions. Among its various applications, the most essential prerequisite is path planning. Although considerable endeavors have been made, there are several limitations. A complete and realistic ocean simulation environment is critically needed. As most of the existing methods are based on mathematical models, they suffer from a large gap with reality. At the same time, the dynamic and unknown environment places high demands on robustness and generalization. In order to overcome these limitations, we propose an information-assisted reinforcement learning path planning scheme. First, it performs numerical modeling based on real ocean current observations to establish a complete simulation environment with the grid method, including 3-D terrain, dynamic currents, local information, and so on. Next, we propose an information compression (IC) scheme to trim the mutual information (MI) between reinforcement learning neural network layers to improve generalization. A proof based on information theory provides solid support for this. Moreover, for the dynamic characteristics of the marine environment, we elaborately design a confidence evaluator (CE), which evaluates the correlation between two adjacent frames of ocean currents to provide confidence for the action. The performance of our method has been evaluated and proven by numerical results, which demonstrate a fair sensitivity to ocean currents and high robustness and generalization to cope with the dynamic and unknown underwater environment.

PaperID: 1396,

Authors: Jie Li, Ryozo Nagamune, Yuhang Zhang, Shengbo Eben Li

Affiliations: School of Vehicle and Mobility, Tsinghua University, Beijing, China; Department of Mechanical Engineering, The University of British Columbia, Vancouver, BC, Canada

Title: Robust Approximate Dynamic Programming for Nonlinear Systems With Both Model Error and External Disturbance

Abstract:
Model error and external disturbance have been separately addressed by optimizing the definite $H_\infty$ performance in standard linear $H_\infty$ control problems. However, the concurrent handling of both introduces uncertainty and nonconvexity into the $H_\infty$ performance, posing a huge challenge for solving nonlinear problems. This article introduces an additional cost function in the augmented Hamilton–Jacobi–Isaacs (HJI) equation of zero-sum games to simultaneously manage the model error and external disturbance in nonlinear robust performance problems. For satisfying the Hamilton–Jacobi inequality in nonlinear robust control theory under all considered model errors, the relationship between the additional cost function and model uncertainty is revealed. A critic online learning algorithm, applying Lyapunov stabilizing terms and historical states to reinforce training stability and achieve persistent learning, is proposed to approximate the solution of the augmented HJI equation. By constructing a joint Lyapunov candidate about the critic weight and system state, both stability and convergence are proved by the second method of Lyapunov. Theoretical results also show that introducing historical data reduces the ultimate bounds of system state and critic error. Three numerical examples are conducted to demonstrate the effectiveness of the proposed method.

PaperID: 1397,

Authors: Shijie Song, Dawei Gong, Minglei Zhu, Yuyang Zhao, Cong Huang

Affiliations: School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China

Title: Data-Driven Optimal Tracking Control for Discrete-Time Nonlinear Systems With Unknown Dynamics Using Deterministic ADP

Abstract:
This article aims to solve the optimal tracking problem (OTP) for a class of discrete-time (DT) nonlinear systems with completely unknown dynamics. A novel data-driven deterministic approximate dynamic programming (ADP) algorithm is proposed to solve this kind of problem with only input–output (I/O) data. The proposed algorithm has two advantages compared to existing data-driven deterministic ADP algorithms for the OTP. First, our algorithm can guarantee optimality while achieving better performance in the aspects of time-saving and robustness to data. Second, the near-optimal control policy learned by our algorithm can be implemented without considering expected control and enable the system states to track the user-specified reference signals. Therefore, the tracking performance is guaranteed while simplifying the algorithm implementation. Furthermore, the convergence and stability of the proposed algorithm are strictly proved through theoretical analysis, in which the errors caused by neural networks (NNs) are considered. At the end of this article, the developed algorithm is compared with two representative deterministic ADP algorithms through a numerical example and applied to solve the tracking problem for a two-link robotic manipulator. The simulation results demonstrate the effectiveness and advantages of the developed algorithm.

PaperID: 1398,

Authors: Wenbin Qian, Yanqiang Tu, Jintao Huang, Wenhao Shu, Yiu-Ming Cheung

Affiliations: School of Software, Jiangxi Agricultural University, Nanchang, China; School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China; Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China; School of Information Engineering, East China Jiaotong University, Nanchang, China

Title: Partial Multilabel Learning Using Noise-Tolerant Broad Learning System With Label Enhancement and Dimensionality Reduction

Abstract:
Partial multilabel learning (PML) addresses the issue of noisy supervision, which contains an overcomplete set of candidate labels for each instance with only a valid subset of training data. Using label enhancement techniques, researchers have computed the probability of a label being ground truth. However, enhancing labels in the noisy label space makes it impossible for the existing partial multilabel label enhancement methods to achieve satisfactory results. Besides, few methods simultaneously address the ambiguity problem, the feature space’s redundancy, and the model’s efficiency in PML. To address these issues, this article presents a novel joint partial multilabel framework using broad learning systems (namely BLS-PML) with three innovative mechanisms: 1) a trustworthy label space is reconstructed through a novel label enhancement method to avoid the bias caused by noisy labels; 2) a low-dimensional feature space is obtained by a confidence-based dimensionality reduction method to reduce the effect of redundancy in the feature space; and 3) a noise-tolerant BLS is proposed by adding a dimensionality reduction layer and a trustworthy label layer to deal with the PML problem. We evaluated it on six real-world and seven synthetic datasets, using eight state-of-the-art partial multilabel algorithms as baselines and six evaluation metrics. Out of 144 experimental scenarios, our method significantly outperforms the baselines by about 80%, demonstrating its robustness and effectiveness in handling partial multilabel tasks.

PaperID: 1399,

Authors: Zhongyi Zhao, Zidong Wang, Lei Zou, Hongjian Liu, Weiguo Sheng

Affiliations: School of Mathematics, Southeast University, Nanjing, China; Department of Computer Science, Brunel University London, Uxbridge, U.K.; College of Information Science and Technology, Donghua University, Shanghai, China; Key Laboratory of Advanced Perception and Intelligent Control of High-end Equipment, Ministry of Education, Anhui Polytechnic University, Wuhu, China; School of Information Science and Engineering, Hangzhou Normal University, Hangzhou, China

Title: Zonotope-Based Distributed Set-Membership Fusion Estimation for Artificial Neural Networks Under the Dynamic Event-Triggered Mechanism

Abstract:
This article is concerned with the distributed set-membership fusion estimation problem for a class of artificial neural networks (ANNs), where the dynamic event-triggered mechanism (ETM) is utilized to schedule the signal transmission from sensors to local estimators to reduce resource consumption and avoid data congestion. The main purpose of this article is to design a distributed set-membership fusion estimation algorithm that ensures the global estimation error resides in a zonotope at each time instant and, meanwhile, the radius of the zonotope is ultimately bounded. By means of the zonotope properties and the linear matrix inequality (LMI) technique, the zonotope restraining the prediction error is first calculated to improve the prediction accuracy and, subsequently, the zonotope enclosing the local estimation error is derived to enhance the estimation performance. By taking into account the side effect of the order reduction technique (utilized in designing the local estimation algorithm) of the zonotope, a sufficient condition is derived to guarantee the ultimate boundedness of the radius of the zonotope that encompasses the local estimation error. Furthermore, parameters of the local estimators are obtained via solutions to certain bilinear matrix inequalities. Moreover, the zonotope-based distributed fusion estimator is obtained by minimizing a certain upper bound of the radius of the zonotope (that contains the global estimation error) according to the matrix-weighted fusion rule. Finally, the effectiveness of the proposed distributed fusion estimation method is illustrated via a numerical example.
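
As a rough illustration of the set representation mentioned in this abstract, the sketch below shows the standard generator form of a zonotope, {c + G ξ : ‖ξ‖∞ ≤ 1}, together with a one-step propagation of an error set through a linear map plus a bounded disturbance. It is not the estimator of the article: the class name `Zonotope`, the matrix `A`, and the noise bound are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Zonotope:
    """Zonotope {center + generators @ xi : ||xi||_inf <= 1}."""
    center: np.ndarray       # shape (n,)
    generators: np.ndarray   # shape (n, m)

    def linear_map(self, A):
        # The image of a zonotope under x -> A x is again a zonotope.
        return Zonotope(A @ self.center, A @ self.generators)

    def minkowski_sum(self, other):
        # Minkowski sum: add centers, concatenate generator columns.
        return Zonotope(self.center + other.center,
                        np.hstack([self.generators, other.generators]))

    def interval_radius(self):
        # Half-widths of the tightest axis-aligned box around the zonotope.
        return np.abs(self.generators).sum(axis=1)


# Hypothetical one-step error propagation: e_next in A * E (+) W.
A = np.array([[0.5, 0.1],
              [0.0, 0.8]])
error_set = Zonotope(np.zeros(2), np.eye(2))
noise_set = Zonotope(np.zeros(2), 0.1 * np.eye(2))
predicted = error_set.linear_map(A).minkowski_sum(noise_set)
print(predicted.interval_radius())
```

Each propagation step appends generator columns, which is why order reduction techniques of the kind the abstract refers to are needed in practice.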

PaperID: 1400,

Authors: Xinsong Yang, Xingxing Ju, Peng Shi, Guanghui Wen

Affiliations: College of Electronics and Information Engineering, Sichuan University, Chengdu, China; School of Electrical and Electronic Engineering, The University of Adelaide, Adelaide, SA, Australia; School of Mathematics, Southeast University, Nanjing, China

Title: Two Novel Noise-Suppression Projection Neural Networks With Fixed-Time Convergence for Variational Inequalities and Applications

Abstract:
This article proposes two novel projection neural networks (PNNs) with fixed-time (\mathbf{FIX}_t) convergence to deal with variational inequality problems (VIPs). The remarkable features of the proposed PNNs are \mathbf{FIX}_t convergence and more accurate upper bounds for arbitrary initial conditions. The robustness of the proposed PNNs under bounded noises is further studied. In addition, the proposed PNNs are applied to deal with absolute value equations (AVEs), noncooperative games, and sparse signal reconstruction problems (SSRPs). The upper bounds of the settling time for the proposed PNNs are tighter than the bounds in the existing neural networks. The effectiveness and advantages of the proposed PNNs are confirmed by numerical examples.
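
For readers unfamiliar with projection neural networks, the sketch below integrates the classical PNN dynamics dx/dt = -x + P_Ω(x - αF(x)) with a forward Euler step on a box-constrained variational inequality. This is the textbook asymptotically convergent model, not the fixed-time or noise-suppressing variants proposed in the article. The mapping F(x) = Mx + q, the box [0, 1]^2, and the parameters `alpha`, `step`, and `iters` are assumptions made for illustration.

```python
import numpy as np


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n."""
    return np.clip(x, lo, hi)


def pnn_solve(F, x0, lo, hi, alpha=0.5, step=0.1, iters=2000):
    """Forward Euler integration of dx/dt = -x + P(x - alpha * F(x))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x + step * (project_box(x - alpha * F(x), lo, hi) - x)
    return x


# Hypothetical strongly monotone affine mapping F(x) = M x + q.
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])
q = np.array([-3.0, 1.0])


def F(x):
    return M @ x + q


x_star = pnn_solve(F, x0=[0.5, 0.5], lo=0.0, hi=1.0)

# At a solution of the VI, x equals its own projected update (zero residual).
residual = np.linalg.norm(x_star - project_box(x_star - 0.5 * F(x_star), 0.0, 1.0))
print(x_star, residual)
```

An equilibrium of these dynamics is exactly a fixed point of the projected update, which is why the residual serves as a convergence check.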

PaperID: 1401,

Authors: Shuyin Xia, Cheng Wang, Guoyin Wang, Xinbo Gao, Weiping Ding, JianHang Yu, Yujia Zhai, Zizhong Chen

Affiliations: Chongqing Key Laboratory of Computational Intelligence, the Key Laboratory of Cyberspace Big Data Intelligent Security, Ministry of Education, and the Key Laboratory of Big Data Intelligent Computing, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Information Science and Technology, Nantong University, Nantong, China; Department of Computer Science and Engineering, University of California at Riverside, Riverside, CA, USA

Title: GBRS: A Unified Granular-Ball Learning Model of Pawlak Rough Set and Neighborhood Rough Set

Abstract:
Pawlak rough set (PRS) and neighborhood rough set (NRS) are the two most common rough set theoretical models. Although the PRS can use equivalence classes to represent knowledge, it is unable to process continuous data. On the other hand, NRSs, which can process continuous data, lose the ability to use equivalence classes to represent knowledge. To remedy this deficit, this article presents a granular-ball rough set (GBRS) based on granular-ball computing, combining the robustness and the adaptability of granular-ball computing. The GBRS can simultaneously represent both the PRS and the NRS, enabling it not only to deal with continuous data but also to use equivalence classes for knowledge representation. In addition, we propose an implementation algorithm of the GBRS by introducing the positive region of GBRS into the PRS framework. The experimental results on benchmark datasets demonstrate that the learning accuracy of the GBRS is significantly improved compared with the PRS and the traditional NRS. The GBRS also outperforms nine popular or state-of-the-art feature selection methods. We have open-sourced all the source code of this article at https://www.cquptshuyinxia.com/GBRS.html, https://github.com/syxiaa/GBRS.

PaperID: 1402,

Authors: Shou Feng, Tianyu Lan, Yuanze Fan, Mengmeng Zhang, Chunhui Zhao, Wei Li, Ran Tao

Affiliations: College of Information and Communication Engineering, Harbin Engineering University, Harbin, China; School of Information and Electronics, Beijing Institute of Technology, Beijing, China

Title: An Adaptive Weighted Metric Learning Network Based on Fractional Domain Decoupling for Hyperspectral Change Detection

Abstract:
Hyperspectral image change detection (HSI-CD) possesses strong capabilities in exploring subtle changes in land cover. Due to sensor noise and imaging conditions, different semantic land covers in the same spatial location may exhibit similar spectral characteristics, leading to pseudoinvariant phenomena (identification of changed areas as unchanged areas) and causing a higher rate of false negatives in the model. Existing methods primarily focus on obtaining auxiliary discriminative information from spatial correlations or temporal dependencies. However, the frequency domain, which possesses rich global gradient distribution information, is often overlooked. The fractional Fourier transform (FrFT) is an extension of the Fourier transform (FT), representing a temporal-frequency local transformation suitable for processing nonstationary signals. Furthermore, multiorder fractional Fourier domains provide more observable domains for change discrimination. In this work, the application of FrFT is extended to the field of HSI-CD, and an adaptive weighted metric learning network based on fractional domain decoupling (FrFTML) is proposed. Specifically, the fractional domain decoupling (FrDD) module transforms the original HSI into multiorder FrFT domains and extracts their rich spatial-frequency mixed information, effectively suppressing noise while enhancing the representation of subtle differences. In addition, an adaptive weighted metric learning (AWML) framework is designed to merge multiorder fractional Fourier domain information in an adaptively weighted fusion manner. It introduces deep metric learning to explore the distances between samples of different categories that have relatively high similarity, so as to guide the direction of adaptive weighted fusion. Finally, the differential mask attention (DMA) module is designed to explore global contextual differences between bitemporal HSIs, obtaining change features with well-represented differences. Experiments conducted on three public datasets indicate that FrFTML outperforms other state-of-the-art methods. Furthermore, the proposed method exhibits superiority in dealing with land cover that may lead to pseudoinvariant phenomena.

PaperID: 1403,

Authors: Zunxun Wang, Junqing Li, Xiaolong Chen, Peiyong Duan, Jiake Li

Affiliations: Yunnan Key Laboratory of Modern Analytical Mathematics and Applications, Yunnan Normal University, Kunming, China; Faculty of Electronics, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China

Title: Uncertain Interruptibility Multiobjective Flexible Job Shop via Deep Reinforcement Learning Based on Heterogeneous Graph Self-Attention

Abstract:
Although an increasing number of studies have focused on the flexible job shop problem, there has been insufficient consideration of realistic constraints, such as the working hours of employees and the noninterruptible nature of certain operations. To address this issue, here an improved deep reinforcement learning (DRL) approach is presented that utilizes end-to-end multidecision-intelligent body proximal policy optimization (m-PPO). In the proposed framework, a heterogeneous graph self-attention neural network (HGAN) model is embedded, which efficiently extracts valuable features from the original state in heterogeneous graphs to capture intricate relationships. Within this framework, agents are divided into five rule-driven job decision agents and data-driven operation-machine (\mathcal{O}\text{-}\mathcal{M}) pair decision agents, which incorporate problem-specific knowledge. To optimize the makespan, total costs, and total lateness concurrently, the weight parameters for the objectives are generated by the network and self-updated based on the current state. Numerical experiments demonstrate the effectiveness of the proposed method.

PaperID: 1404,

Authors: Zhangmin Huang, Pengcheng Wang, Shaojie Tang, Bo Lyu, Lingfang Zeng

Affiliations: Zhejiang Laboratory, Hangzhou, China; Department of Management Science and Systems, School of Management, Center for AI Business Innovation, University at Buffalo, Buffalo, NY, USA

Title: Decoupling Neural Networks to Leverage Uniform Representation and Balance Personalization and Collaboration in Federated Learning

Abstract:
Federated learning (FL), a distributed learning paradigm focused on preserving data privacy, faces challenges due to varying data distributions among clients, impacting global model performance. To mitigate data heterogeneity, we propose FedUB—a personalized FL framework leveraging uniform feature representation and balancing personalization and collaboration in the classifier. Specifically, the uniform representation (UR) in FedUB provides all clients with a shared feature extractor and a common representation centroid (RC). Achieving this uniformity involves incorporating a regularization term to reduce the gap between global and local RCs. Additionally, an importance estimation of the parameters in the classifier is provided to partition the parameters into two parts: the personalized component and the collaborated component. Specifically, the personalized component adapts to local data, while the collaborated component prevents the classifier from overfitting local data. Theoretically, we establish the existence of the UR, demonstrating its effectiveness in reducing the average generalization bound. Experiments on benchmark datasets consistently demonstrate the performance gains and improved generalization behavior of FedUB.

PaperID: 1405,

Authors: Qiang Lai, Yudi Xu, Luigi Fortuna

Affiliations: School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang, China; Dipartimento di Ingegneria Elettrica Elettronica e Informatica, University of Catania, Catania, Italy

Title: Generating Simple Cyclic Memristive Neural Network Circuit With Controllable Multiscroll Attractors and Multivariable Amplitude Control

Abstract:
Due to their synaptic-like characteristics and memory properties, memristors are often used in neuromorphic circuits, particularly neural network circuits. However, most of the existing neural network circuits that can generate complex dynamics have high dimensions and excessive connections, which is not conducive to implementation. This article introduces a memristor containing an arctangent function into a simple cyclic neural network (SCNN) circuit to design a simple cyclic memristive neural network (SCMNN) circuit capable of generating complex multiscroll chaotic attractors. The designed SCMNN contains an external stimulus current and generates multiscroll attractors, with the number of scrolls expanding as the switches in the memristor equivalent circuit are activated. By varying the parameters, the multiscroll attractors can be broken into different numbers of coexisting attractors, which also depends on the switch, and it can achieve multivariable amplitude control when there is only one scroll. The anti-interference ability of the circuit is tested. A low-cost circuit-based microcontroller suitable for engineering applications is designed for it, and multiscroll attractors are successfully captured in an oscilloscope. The National Institute of Standards and Technology (NIST) test is carried out to verify its application value.

PaperID: 1406,

Authors: Mincan Li, Zidong Wang, Simon J. E. Taylor, Kenli Li, Xiangke Liao, Xiaohui Liu

Affiliations: College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China; Department of Computer Science, Brunel University London, Middlesex, Uxbridge, U.K.; Collaborative Innovation Center of High Performance Computing, National University of Defense Technology, Changsha, China

Title: Multiple Influences Maximization Under Dynamic Link Strength in Multi-Agent Systems: The Competitive and Cooperative Cases

Abstract:
This article addresses the issue of multiple influences maximization under dynamic link strength (MIMDLS) in multi-agent systems (MASs). Initially, a novel model for dynamic link strength within MASs is suggested to facilitate the simulation of multiple influences diffusion. Subsequently, the MIMDLS problem is formulated with both competitive and cooperative scenarios being examined. In response, two diffusion models, specifically the competitive multiple influences independent cascade (Cp-MIIC) model and the cooperative multiple influences linear threshold (Cr-MILT) model, are designed for MASs. Furthermore, a distributed deep reinforcement learning (DRL) framework is established based on MASs by incorporating asynchronous training and updating processes for seed selection in the context of multiple influences. Moreover, the developed distributed DRL algorithm encompasses the estimation of Q value as well as the management of constraints within Cp-MIIC and Cr-MILT models. Finally, comprehensive experiments are conducted to: 1) validate the effectiveness and efficiency of the proposed models and algorithms in terms of multiple influence diffusion and 2) benchmark their performance against state-of-the-art methods.

PaperID: 1407,

Authors: Yuze Zhao, Zhenya Huang, Kai Zhang, Weibo Gao, Qi Liu, Xukai Liu, Fangzhou Yao, Enhong Chen

Affiliations: State Key Laboratory of Cognitive Intelligence, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China

Title: Semantic-Aligned Code Summarization: Bridging the Gap Between Code and Natural Language Through Data Flow Analysis

Abstract:
Code summarization is designed to generate descriptive natural language for code snippets, facilitating understanding and increasing productivity for developers. Previous research often overlooks the semantic connection between code and its natural language description, resulting in a noticeable gap and suboptimal solutions. To address this issue, we introduce a semantic-aligned code summarization framework that leverages crucial data flow information from code for semantic analysis, ensuring alignment between code and summaries. Specifically, we utilize a semantic extraction module (SEM) to decipher the meaning of code and align it with natural language through a semantic alignment module. In the SEM, we construct a code graph that includes data flow edges using static program analysis techniques. Then, on this well-constructed code graph, we adopt a walking algorithm guided by data flow to extract the semantics of the code. This walking algorithm understands code semantics by analyzing the information transfer between variables during the program execution process. In the semantic alignment module, we integrate a contrastive learning loss mechanism for semantic alignment, which cohesively maps the semantic domains of code and natural language into a unified vector space. We further show theoretically that the data-flow-guided walking algorithm can capture semantically highly related nodes in shorter paths. Extensive experiments on two benchmark datasets demonstrate the efficacy and broad applicability of the framework.

PaperID: 1408,

Authors: Wenju Cui, Yilin Leng, Yunsong Peng, Chen Bai, Lei Li, Xi Jiang, Gang Yuan, Jian Zheng

Affiliations: School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China; Department of Radiology, Guizhou Province International Science and Technology Cooperation Base for Precision Imaging Diagnosis and Treatment, Key Laboratory of Advanced Medical Imaging and Intelligent Computing of Guizhou Province, Guizhou Provincial People’s Hospital, Guiyang, China; School of Life Sciences and Technology, University of Electronic Science and Technology of China, Chengdu, China

Title: A Novel Dynamic Neural Network for Heterogeneity-Aware Structural Brain Network Exploration and Alzheimer's Disease Diagnosis

Abstract:
Heterogeneity is a fundamental characteristic of brain diseases, distinguished by variability not only in brain atrophy but also in the complexity of neural connectivity and brain networks. However, existing data-driven methods fail to provide a comprehensive analysis of brain heterogeneity. Recently, dynamic neural networks (DNNs) have shown significant advantages in capturing sample-wise heterogeneity. Therefore, in this article, we propose a novel dynamic heterogeneity-aware network (DHANet) to identify critical heterogeneous brain regions, explore heterogeneous connectivity between them, and construct a heterogeneity-aware structural brain network (HGA-SBN) using structural magnetic resonance imaging (sMRI). Specifically, we first develop a 3-D dynamic convmixer to extract abundant heterogeneous features from sMRI. Subsequently, the critical brain atrophy regions are identified by dynamic prototype learning that embeds the hierarchical brain semantic structure. Finally, we employ a joint dynamic edge-correlation (JDE) modeling approach to construct the heterogeneous connectivity between these regions and analyze the HGA-SBN. To evaluate the effectiveness of the DHANet, we conduct extensive experiments on three public datasets, and the method achieves state-of-the-art (SOTA) performance on two classification tasks.

PaperID: 1409,

Authors: Arthicha Srisuchinnawong, Poramate Manoonpong

Affiliations: Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand

Title: An Interpretable Neural Control Network With Adaptable Online Learning for Sample Efficient Robot Locomotion Learning

Abstract:
Robot locomotion learning using reinforcement learning suffers from training sample inefficiency and exhibits a non-interpretable, closed-box nature. Thus, this work presents a novel SME-Adaptable Gradient-weighting Online Learning (AGOL) to address such problems. First, the sequential motion executor (SME) is a three-layer interpretable neural network, where the first layer produces the sequentially propagating hidden states, the second constructs the corresponding triangular bases with minor non-neighbor interference, and the third maps the bases to the motor commands. Second, the AGOL algorithm prioritizes the update of the parameters with high relevance scores, allowing the learning to focus more on the highly relevant ones. Thus, these two components lead to an analyzable framework, where each sequential hidden state/basis represents the learned key poses/robot configuration. Compared to state-of-the-art methods, the SME-AGOL requires 40% fewer samples and receives 150% higher final reward/locomotion performance on a simulated hexapod robot, while taking merely 10 min of learning time from scratch on a physical hexapod robot. Taken together, this work not only proposes the SME-AGOL for sample-efficient and understandable locomotion learning but also emphasizes the potential exploitation of interpretability for improving sample efficiency and learning performance.

PaperID: 1410,

Authors: Hongbo Gao, Chengbo Wang, Runda Niu, Xiaozhao Fang, Jinpeng Chen, Yining Sun, Huiqing Jin, Danwei Wang

Affiliations: Department of Automation, School of Information Science and Technology, University of Science and Technology of China, Hefei, China; School of Electrical and Electronic Engineering, Nanyang Technological University, Jurong West, Singapore; School of Automation and the Key Laboratory of Intelligent Detection and The Internet of Things in Manufacturing, Ministry of Education, Guangdong University of Technology, Guangzhou, China; School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing, China; Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China; Institute of Advanced Technology, University of Science and Technology of China, Hefei, China

Title: Driving Risk Assessment for Intelligent Vehicles Based on Entropy-Informed Graph Neural Networks and Gaussian Distributions

Abstract:
This study proposes a novel framework based on an entropy-informed graph neural network (EIGNN) integrated with Gaussian distribution (GD) to assess the driving risk of intelligent vehicles in typical traffic scenarios. Existing research often overlooks comprehensive spatiotemporal modeling of vehicle interaction characteristics and the quantification of uncertainty in dynamic risk assessments. In this work, vehicle speed and acceleration are probabilistically modeled using GD, while entropy theory is introduced to quantify risk uncertainty. A risk assessment model based on graph neural networks (GNNs) is then designed to capture the spatiotemporal dynamics of multivehicle interactions and predict the potential risk levels of driving strategies. The results demonstrate that the framework accurately quantifies collision risks in multivehicle interactions in complex traffic scenarios, with high accuracy and robustness across typical situations such as cruising, cut-ins, lane changes, overtaking, and traffic of different densities. By thoroughly analyzing traffic risk characteristics and incorporating them into intelligent driving decision-making, this study provides significant technical insights and theoretical support for enhancing the safety and decision-making efficiency of autonomous driving systems.
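
To make the combination of Gaussian modeling and entropy concrete, the sketch below fits a Gaussian to a short speed trace and evaluates its differential entropy, h = 0.5 ln(2πeσ²), as one possible uncertainty score. It only illustrates that general idea; the function names and the sample speed traces are hypothetical, and the article's actual risk formulation and GNN model are not reproduced here.

```python
import math
import statistics


def fit_gaussian(samples):
    """Return (mean, sample standard deviation) of a 1-D trace."""
    return statistics.mean(samples), statistics.stdev(samples)


def gaussian_entropy(sigma):
    """Differential entropy in nats of N(mu, sigma^2): 0.5 * ln(2*pi*e*sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


# Hypothetical speed traces in m/s for two neighboring vehicles.
steady_speeds = [20.0, 20.2, 19.9, 20.1, 19.8]
erratic_speeds = [20.0, 24.5, 16.0, 22.0, 17.5]

_, sigma_steady = fit_gaussian(steady_speeds)
_, sigma_erratic = fit_gaussian(erratic_speeds)

h_steady = gaussian_entropy(sigma_steady)
h_erratic = gaussian_entropy(sigma_erratic)
print(h_steady, h_erratic)
```

Under this score the vehicle with the more variable speed has the larger entropy. Differential entropy can be negative for small σ, so it is best read as a relative rather than absolute measure.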

PaperID: 1411,

Authors: Yuan Li, Yiyan Han, Chongyang Chen, Zhigang Zeng, Jiankun Sun

Affiliations: School of Artificial Intelligence and Automation and the Institute of Artificial Intelligence, Huazhong University of Science and Technology, Wuhan, China; College of Electronic and Information Engineering, Southwest University, Chongqing, China; School of Computer and Information Technology, Xinyang Normal University, Xinyang, China

Title: Online Reinforcement Learning Control Designs With Acceleration Mechanism for Unknown Multiagent Systems Through Value Iteration

Abstract:
In this article, an online reinforcement learning (RL) control method through value iteration (VI) is developed to solve the optimal cooperative control problem for the unknown linear discrete-time multiagent systems (MASs). On the one hand, an online learning scheme with evolving policies is proposed in order to guarantee the stability of the MASs under immature policies generated by VI. Inspired by the event-triggered mechanism, the stability criterion is designed as a trigger to filter the admissible control policies, which eliminates the need to establish a monotonic value function sequence. On the other hand, an acceleration mechanism for the MASs is presented such that the convergence rate of VI can be accelerated. The relationship between the selection of the relaxation factor and the accelerated convergence process is elaborated. Simple backpropagation (BP) neural networks (NNs) are applied for the implementation. Two classical examples are introduced and simulation results are provided in order to substantiate the validity of the designed method.

PaperID: 1412,

Authors: Xiongtao Zou, Jianhua Dai

Affiliations: Key Laboratory of Computing and Stochastic Mathematics (Ministry of Education), School of Mathematics and Statistics, and Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, College of Information Science and Engineering, Hunan Normal University, Changsha, China

Title: Unified Feature Selection Approach for Complex Data Based on Fuzzy β-Covering Reduction via Information Granulation

Abstract:
Feature selection, as an important step of data analysis, is widely used in the fields of data mining, machine learning, and artificial intelligence. It can not only effectively alleviate the curse of dimensionality and improve model performance but also enhance model interpretability. In the real world, data are usually complex, involving different feature types, missing values, and so on. However, most existing feature selection approaches are only capable of handling data with a single feature type. To address the issue of feature selection for complex data, this article proposes a unified feature selection (UFS) approach for complex data based on fuzzy \beta-covering reduction via information granulation. To begin with, several monotonic uncertainty measures for fuzzy \beta-covering are constructed from the viewpoints of algebra and information theory. Based on the proposed measures, two forward heuristic algorithms are designed for fuzzy \beta-covering reduction. Meanwhile, the complex data with multiple features is represented by fuzzy \beta-covering via information granulation. On this basis, a UFS approach is put forward for complex data. Finally, the effectiveness and superiority of the proposed approach are verified through a series of experiments compared with 12 state-of-the-art feature selection approaches.

PaperID: 1413,

Authors: Ziqing Deng, Xiaofang Chen, Yongfang Xie, Hongliang Zhang, Weihua Gui

Affiliations: School of Automation, Central South University, Changsha, China; School of Metallurgy and Environment, Central South University, Changsha, China

Title: Generalized Cross-Domain Industrial Process Monitoring via Adaptive Discriminative Transfer Dictionary Pair Learning With Attribute Embedding

Abstract:
Real industrial process data from various domains often exhibit divergent distributions, may occupy distinct feature spaces, and are occasionally unlabeled, which limits the effectiveness of conventional process monitoring methods. To address these challenges, we propose an adaptive discriminative transfer dictionary pair learning (ADTDPL) method with attribute embedding for generalized cross-domain industrial process monitoring. Specifically, this method aligns the feature spaces of source and target domains by the aligned transfer reconstruction, enabling the transfer of knowledge through a common synthetical dictionary. Concurrently, semantic attributes relevant to process knowledge are seamlessly fused into data information via attribute embedding, enhancing the transferability and interpretability of dictionary pairs. Considering the relative significance of marginal and conditional distributions, an adaptive distribution consistency function is designed to better reduce the distributional discrepancies. In addition, a discriminative structure regularization is developed to ensure the discrimination of the dictionary pairs and their corresponding coding coefficients. Furthermore, in the absence of target domain labels, a novel selective pseudo-labeling strategy is advanced to adaptively update pseudo-labels. The superior performance of our method for cross-domain process monitoring is verified on the Tennessee Eastman platform and in practical aluminum electrolysis processes (AEPs).

PaperID: 1414,

Authors: Yuan Wang, Yaguo Lei, Naipeng Li, Xiang Li, Bin Yang

Affiliations: Key Laboratory of Education Ministry for Modern Design and Rotor-Bearing System, Xi’an Jiaotong University, Xi’an, China

Title: Multimodal Correlation-Aware Fusion Framework for Enhanced Machinery Health Prognosis With Unlabeled and Low-Quality Data Exploitation

Abstract:
Accurate machinery health prognosis, also known as remaining useful life (RUL) prediction, is critical for preventing catastrophic accidents and implementing predictive maintenance strategies. This makes it a highly attractive research area. Many existing studies have been developed on unimodal data, yet such data can only provide a restricted perspective and incomplete health state monitoring. Some researchers seek to address this issue from a multimodal standpoint. While promising, these methods still have certain shortcomings: 1) the imbalance between unlabeled and low-quality data and well-labeled data is not considered, leaving the potential of the former underexploited; 2) information richness during fusion is insufficient, discarding many valuable original and subtle health state cues, and they fail to tackle unexpected online anomalies in a timely manner; and 3) correlations and complementary information across modalities are neglected. To address these challenges, a multimodal correlation-aware fusion framework is proposed for machinery health prognosis. The framework adopts a pretrain-finetune paradigm with two parts. The first part achieves effective exploitation of the unlabeled and low-quality multimodal data pieces. The second part, through degradation pattern recognition, enables the framework to bridge the gap between scarce multimodal labeled data and accurate RUL prediction. A real industrial multimodal dataset of milling cutters is applied to demonstrate the proposed framework. Results from a series of ablation experiments and comparisons with state-of-the-art prediction methods indicate the effectiveness of each key component within the framework and its overall superiority. The framework shows promise in adapting to more downstream industrial tasks, providing accurate and reliable insights from limited data resources.

PaperID: 1415,

Authors: Jie Hu, Min Wu, Witold Pedrycz

Affiliations: School of Automation, China University of Geosciences, Wuhan, China; Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology (SUT), Gliwice, Poland

Title: Adaptive Weighted Broad Echo State Learning System-Based Dynamic Modeling of Carbon Consumption in Sintering Process

Abstract:
Dynamic modeling of carbon consumption is essential for energy saving, emission reduction, and green manufacturing in the iron ore sintering process. This article proposes a novel adaptive weighted broad echo state learning system (AWBESLS) for dynamic prediction of carbon consumption in the sintering process by integrating adaptive weights and a reservoir with echo state characteristics. Different from previous studies, the AWBESLS adaptively assigns a weight to each production data sample to overcome the effects of anomalous data and utilizes an echo state network (ESN) to capture the dynamic state of the sintering process. Carbon consumption experiments using actual production data demonstrate the effectiveness of the AWBESLS in comparison with several state-of-the-art methods. The results show that the AWBESLS outperforms the other methods, achieving the lowest prediction error. In summary, the AWBESLS is an effective technique for dynamic modeling of the sintering process and can be readily applied to the modeling of other manufacturing processes.

PaperID: 1416,

Authors: Xiuwei Chen, Li Lai, Maokang Luo

Affiliations: School of Mathematics, Sichuan University, Chengdu, China

Title: A Novel Fusion and Feature Selection Framework for Multisource Time-Series Data Based on Information Entropy

Abstract:
The growth of information technology has produced vast amounts of time-series data. Despite this richness, challenges such as redundancy highlight the need for research on time-series data fusion. Rough set theory, a valuable tool for dealing with uncertainty, can identify features and reduce dimensionality, thereby enhancing time-series data fusion. The contribution of this study lies in establishing a fusion and feature selection framework for multisource time-series data. This framework selects optimal information sources by minimizing entropy. In addition, the fusion process integrates a feature selection algorithm to eliminate redundant features, preventing a sequential increase in entropy. Experiments on multiple datasets demonstrate that the proposed approach outperforms several state-of-the-art algorithms in terms of enhancing the accuracy of common classifiers. This research advances time-series data fusion based on rough set theory, offering improved accuracy and efficiency in data processing and analysis.

PaperID: 1417,

Authors: Huiling Chen, Chunmei Zhang, Han Yang

Affiliations: School of Mathematics, Southwest Jiaotong University, Chengdu, China

Title: Topology Identification of Weighted Complex Networks Under Intermittent Control and Its Application in Neural Networks

Abstract:
Topology identification of stochastic complex networks is an important topic in network science. In modern identification techniques under a continuous framework, the controller has a negative dynamic gain (feedback gain), such that the stochastic LaSalle’s invariance principle (SLIP) is directly satisfied. In this article, the topology identification of stochastic complex networks is studied under aperiodic intermittent control (AIC). It is noteworthy that the AIC has a rest time, during which the SLIP is not valid since there is no negative feedback gain. This motivates us to find other methods to obtain identification criteria. In this study, the graph-theoretic method and the stochastic analysis technique are integrated to obtain the almost sure exponential synchronization of drive-response networks. Furthermore, this integration enables the topology identification criteria of the drive network to be derived, which differs from previous work that directly utilized the SLIP. It is worth mentioning that topology identification criteria under the stochastic framework are proposed for the first time based on the AIC in this work. The control strategy not only reduces the control cost but is also easier to operate. To enhance the application value of the network model, regime-switching diffusions, multiple weights, and nonlinear couplings are simultaneously considered. Finally, the proposed identification criteria are tested by using neural networks, and the validity of the theoretical results is further verified by numerical simulations.

PaperID: 1418,

Authors: Ruobin Gao, Minghui Hu, Ruilin Li, Xuewen Luo, Ponnuthurai Nagaratnam Suganthan, Muhammad Tanveer

Affiliations: School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an, China; School of Electrical and Electronics Engineering, Nanyang Technological University, Jurong West, Singapore; School of Computer Science and Engineering, Northeastern University, Shenyang, China; KINDI Computing Research Center, College of Engineering, Qatar University, Doha, Qatar; Department of Mathematics, Indian Institute of Technology Indore, Indore, India

Title: Stacked Ensemble Deep Random Vector Functional Link Network With Residual Learning for Medium-Scale Time-Series Forecasting

Abstract:
The deep random vector functional link (dRVFL) and ensemble dRVFL (edRVFL) succeed in various tasks and achieve state-of-the-art performance compared with other randomized neural networks (NNs). However, existing edRVFL structures lack sufficient diversity and error correction ability within an independent network. Our work fills the gap by combining stacked deep blocks and residual learning with the edRVFL. Specifically, we propose a novel dRVFL combined with residual learning, ResdRVFL, whose deep layers calibrate the wrong estimations from shallow layers. Additionally, we propose incorporating a scaling parameter to control the scaling of residuals from shallow layers, thus mitigating the risk of overfitting. Finally, we present an ensemble deep stacking network, SResdRVFL, based on ResdRVFL. SResdRVFL aggregates multiple blocks into a cohesive network, leveraging the benefits of deep learning and ensemble learning. We evaluate the proposed model on 28 datasets and compare it with state-of-the-art methods. The comparative study demonstrates that the SResdRVFL is the best-performing approach in terms of average ranking and errors across these datasets.

PaperID: 1419,

Authors: Chong Yu, Zhenyu Meng, Wenmiao Zhang, Lei Lei, Jianbing Ni, Kuan Zhang, Hai Zhao

Affiliations: Department of Computer Science, University of Cincinnati, Cincinnati, OH, USA; Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA; School of Engineering, University of Guelph, Guelph, ON, Canada; Department of Electrical and Computer Engineering, Queen’s University, Kingston, ON, Canada; School of Computer Science and Engineering, Northeastern University, Shenyang, China

Title: Secure and Efficient Federated Learning Against Model Poisoning Attacks in Horizontal and Vertical Data Partitioning

Abstract:
In distributed systems, data may partially overlap in sample and feature spaces, that is, horizontal and vertical data partitioning. By combining horizontal and vertical federated learning (FL), hybrid FL emerges as a promising solution to simultaneously deal with data overlapping in both sample and feature spaces. Due to its decentralized nature, hybrid FL is vulnerable to model poisoning attacks, where malicious devices corrupt the global model by sending crafted model updates to the server. Existing work usually analyzes the statistical characteristics of all updates to resist model poisoning attacks. However, training local models in hybrid FL requires additional communication and computation steps, increasing the detection cost. In addition, due to data diversity in hybrid FL, solutions based on the assumption that malicious models are distinct from honest models may incorrectly classify honest ones as malicious, resulting in low accuracy. To this end, we propose a secure and efficient hybrid FL scheme against model poisoning attacks. Specifically, we first identify two attacks to define how attackers manipulate local models in a harmful yet covert way. Then, we analyze the execution time and energy consumption in hybrid FL. Based on the analysis, we formulate an optimization problem to minimize training costs while guaranteeing accuracy considering the effect of attacks. To solve the formulated problem, we transform it into a Markov decision process and model it as a multiagent reinforcement learning (MARL) problem. Then, we propose a malicious device detection (MDD) method based on MARL to select honest devices to participate in training and improve efficiency. In addition, we propose an alternative poisoned model detection (PMD) method considering model change consistency. This method aims to prevent poisoned models from being used in the model aggregation. Experimental results validate that under the random local model poisoning attack, the proposed MDD method can save over 50% of training costs while guaranteeing accuracy. When facing the advanced adaptive local model poisoning (ALMP) attack, utilizing both the proposed MDD and PMD methods achieves the desired accuracy while reducing execution time and energy consumption.

PaperID: 1420,

Authors: Yibo Wang, Changchun Hua, PooGyeon Park, Shichao Liu

Affiliations: School of Electrical Engineering, Yanshan University, Qinhuangdao, China; Department of Electrical Engineering, Pohang University of Science and Technology, Gyeongbuk, Republic of Korea; Department of Electronics, Carleton University, Ottawa, ON, Canada

Title: Relaxed Stability Criteria for Delayed Memristor-Based Neural Network Systems via a Novel Matrix-Separation Legendre Inequality

Abstract:
This article studies the issue of stability in memristor-based neural network (MNN) systems with time-varying delays. First, a novel matrix-separation Legendre inequality is proposed to achieve a tight hierarchical bound on augmented-type integral terms. To derive implementable inequality conditions, several delay-dependent matrices are introduced to eliminate the reciprocal terms associated with time-varying delay. Furthermore, a new Lyapunov-Krasovskii (L-K) functional is proposed by incorporating augmented-type double integrals and delay-product terms. A series of free-weighting matrices are introduced into the L-K functional, leveraging the zero-sum equations and the S-procedure pertaining to both the delay and its derivative. Based on the proposed matrix-separation Legendre inequality and L-K functional, the derived stability conditions exhibit reduced conservatism, as validated by three numerical cases and simulation results.

PaperID: 1421,

Authors: Xiangyu Du, Min Xiao, Jianlong Qiu, Yunxiang Lu, Jinde Cao

Affiliations: College of Automation, Nanjing University of Posts and Telecommunications, Nanjing, China; School of Automation and Electrical Engineering, Linyi University, Linyi, China; School of Mathematics, Southeast University, Nanjing, China

Title: Stability and Dynamics Analysis of Time-Delay Fractional-Order Large-Scale Dual-Loop Neural Network Model With Cross-Coupling Structure

Abstract:
In recent years, the analysis of the dynamics of annular neural networks has received extensive attention and yielded a number of results. However, most current research focuses merely on single-ring or low-dimensional cases, or on two rings sharing one neuron, without considering the rich coupling modes between rings. In this article, a large-scale time-delay fractional-order dual-loop neural network model with cross-coupling structure is established, in which two rings complete information interaction through two shared neurons. Moreover, the Caputo fractional derivative is introduced to describe the neural network more accurately. First, the transmission time delay between neurons is selected as the key parameter leading to the bifurcation, and the characteristic equation of the network is derived using the Coates flow graph method. Subsequently, the analytical process is simplified through the holistic element method and the magnitude angle formula. Then, we obtain the stability and Hopf bifurcation criterion of the network. Finally, the conclusions of the theoretical analysis are verified by a series of numerical simulations. The results show that the stability region of the network is closely related to the fractional order, the number of neurons, the distribution of neurons, and the self-feedback coefficients. Moreover, the time delays have a significant effect on the amplitude and period of the Hopf bifurcation.

PaperID: 1422,

Authors: Yanfang Liu, Xu Wang, Yituo Song, Bo Wang, Desong Du

Affiliations: Department of Aerospace Engineering, Harbin Institute of Technology, Harbin, China; Department of Aerospace Engineering and the State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin, China; National Key Laboratory of Human Factors Engineering, Astronaut Research and Training Center of China, Beijing, China

Title: General Hamiltonian Neural Networks for Dynamic Modeling: Handling Sophisticated Constraints Automatically and Achieving Coordinates Free

Abstract:
Embedding Hamiltonian formalisms into neural networks (NNs) enhances the reliability and precision of data-driven models, and substantial research has been conducted in this area. However, these approaches require the system to be represented in canonical coordinates, i.e., observed states should be generalized position-momentum pairs, which are typically unknown. This poses limitations when the method is applied to real-world data. Existing methods tackle this challenge through coordinate transformation or by designing complex NNs to learn the symplectic phase flow of the state evolution. However, these approaches lack generality and are often difficult to train. This article proposes a versatile framework called general Hamiltonian NN (GHNN), which is coordinate-free and handles sophisticated constraints automatically in a concise form. GHNN employs two NNs, namely, an HNet to predict the Hamiltonian quantity and a JNet to predict the interconnection matrix. The gradients of the Hamiltonian quantity with respect to the input coordinates are calculated using automatic differentiation and are then multiplied by the interconnection matrix to obtain state differentials. Subsequently, ordinary differential equations (ODEs) are solved by numerical integration to provide state predictions. The accuracy and versatility of the GHNN are demonstrated through several challenging tasks, including the nonlinear simple and double pendulum, the coupled pendulum, and a real 3-D crane dynamic system.
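
The prediction pipeline described in this abstract (Hamiltonian, gradient, multiplication by the interconnection matrix, numerical integration) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `hamiltonian` is a closed-form unit-pendulum energy standing in for a trained HNet, `interconnection` returns the constant canonical skew-symmetric matrix standing in for a trained JNet, a central finite difference replaces automatic differentiation, and a classical fourth-order Runge-Kutta step is one possible choice of integrator.

```python
import math

def hamiltonian(x):
    """Hypothetical stand-in for HNet: energy of a unit pendulum, x = (q, p)."""
    q, p = x
    return 0.5 * p * p + (1.0 - math.cos(q))

def interconnection(x):
    """Hypothetical stand-in for JNet: constant canonical skew-symmetric matrix."""
    return [[0.0, 1.0], [-1.0, 0.0]]

def grad(f, x, eps=1e-6):
    """Central finite-difference gradient (used here instead of autodiff)."""
    g = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2.0 * eps))
    return g

def state_derivative(x):
    """State differential: dx/dt = J(x) * grad H(x)."""
    J = interconnection(x)
    g = grad(hamiltonian, x)
    n = len(x)
    return [sum(J[i][j] * g[j] for j in range(n)) for i in range(n)]

def rk4_step(x, dt):
    """One classical Runge-Kutta step of the ODE defined by state_derivative."""
    def shifted(base, direction, scale):
        return [b + scale * d for b, d in zip(base, direction)]
    k1 = state_derivative(x)
    k2 = state_derivative(shifted(x, k1, dt / 2.0))
    k3 = state_derivative(shifted(x, k2, dt / 2.0))
    k4 = state_derivative(shifted(x, k3, dt))
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def rollout(x0, dt, steps):
    """Integrate forward from x0 and return the list of predicted states."""
    traj = [list(x0)]
    for _ in range(steps):
        traj.append(rk4_step(traj[-1], dt))
    return traj
```

Calling `rollout([1.0, 0.0], 0.01, 1000)` integrates the pendulum for 10 time units. Because the stand-in matrix is skew-symmetric, the energy `hamiltonian(state)` should stay nearly constant along the trajectory, which is a convenient sanity check for this kind of model.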

PaperID: 1423,

Authors: Wei Xue, Hong He, Yanbing Wang, Ying Zhao

Affiliations: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China

Title: SAGN: Sparse Adaptive Gated Graph Neural Network With Graph Regularization for Identifying Dual-View Brain Networks

Abstract:
Due to the absence of a gold standard for threshold selection, brain networks constructed with inappropriate thresholds risk topological degradation or contain noisy connections. As a result, graph neural networks (GNNs) exhibit weak robustness and overfitting problems when identifying brain networks. Furthermore, existing studies have predominantly focused on strongly coupled connections, neglecting substantial evidence from other intricate systems that highlights the value of weakly coupled connections. Consequently, the potential of weakly coupled brain networks remains untapped. In this study, we pioneer the construction of weakly coupled brain networks and validate their value in emotion identification tasks. Subsequently, we propose a sparse adaptive gated GNN (SAGN) that can simultaneously perceive the valuable topology of dual-view (i.e., strongly coupled and weakly coupled) brain networks. The SAGN contains a sparse adaptive global receptive field. Moreover, SAGN employs a gated mechanism with feature enhancement and adaptive noise suppression capabilities. To address the lack of inductive bias and the large capacity of SAGN, a graph regularization term built with the prior topology of dual-view brain networks is introduced to enhance generalization. In addition to a public dataset (SEED), we built a custom dataset (MuSer) with 60 subjects to evaluate the value of weakly coupled brain networks and validate the SAGN’s performance. Experiments demonstrate that brain physiological patterns associated with different emotional states are separable and rooted in weakly coupled brain networks. In addition, SAGN exhibits excellent generalization and robustness in identifying brain networks.

PaperID: 1424,

Authors: Yuru Guo, Zidong Wang, Jun-Yi Li, Yong Xu

Affiliations: Guangdong-Hong Kong Joint Laboratory for Intelligent Decision and Cooperative Control, Guangdong Provincial Key Laboratory of Intelligent Decision and Cooperative Control, School of Automation, Guangdong University of Technology, Guangzhou, China; Department of Computer Science, Brunel University London, Uxbridge, U.K.

Title: State Estimation for Markovian Jump Neural Networks Under Probabilistic Bit Flips: Allocating Constrained Bit Rates

Abstract:
In this article, the state estimation problem is studied for Markovian jump neural networks (MJNNs) within a digital network framework. The wireless communication channel with limited bandwidth is characterized by a constrained bit rate, and the occurrence of bit flips during wireless transmission is mathematically modeled. A transmission mechanism, which includes coding-decoding under bit-rate constraints and considers probabilistic bit flips, is introduced, providing a thorough characterization of the digital transmission process. A mode-dependent remote estimator is designed, which is capable of effectively capturing the internal state of the neural network. Furthermore, a sufficient condition is proposed to ensure that the estimation error remains bounded under challenging network conditions. Within this theoretical framework, the relationship between the neural network’s estimation performance and the bit rate is explored. Finally, a simulation example is provided to validate the theoretical findings.
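
As a rough illustration of a coding-decoding channel with a constrained bit rate and probabilistic bit flips, the sketch below quantizes a scalar measurement to a fixed number of bits, flips each bit independently with a given probability, and decodes the result. The uniform quantizer, the function names, and the parameters `lo`, `hi`, `bits`, and `p` are all assumptions chosen for illustration; the abstract does not specify the coding scheme used in the article.

```python
import random

def encode(value, lo, hi, bits):
    """Uniformly quantize a value in [lo, hi] to an integer codeword of `bits` bits."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)

def flip_bits(code, bits, p, rng):
    """Flip each of the `bits` bits independently with probability p."""
    for k in range(bits):
        if rng.random() < p:
            code ^= 1 << k
    return code

def decode(code, lo, hi, bits):
    """Map an integer codeword back to a value in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)

def transmit(value, lo, hi, bits, p, rng):
    """Encode, pass through the bit-flipping channel, and decode."""
    sent = encode(value, lo, hi, bits)
    received = flip_bits(sent, bits, p, rng)
    return decode(received, lo, hi, bits)
```

With `p = 0` the only error is quantization error, which is at most half a quantization step and shrinks as `bits` grows. With `p > 0`, a flip in a high-order bit produces a large decoding error, which gives some intuition for why estimation performance depends on how a limited bit budget is allocated.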

PaperID: 1425,

Authors: Jiacheng Wang, Yaojia Chen, Quan Zou

Affiliations: Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People’s Hospital, Quzhou, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China

Title: GRACE: Unveiling Gene Regulatory Networks With Causal Mechanistic Graph Neural Networks in Single-Cell RNA-Sequencing Data

Abstract:
Reconstructing gene regulatory networks (GRNs) using single-cell RNA sequencing (scRNA-seq) data holds great promise for unraveling cellular fate development and heterogeneity. While numerous machine-learning methods have been proposed to infer GRNs from scRNA-seq gene expression data, many of them operate solely in a statistical or black box manner, limiting their capacity for making causal inferences between genes. In this study, we introduce GRN inference with Accuracy and Causal Explanation (GRACE), a novel graph-based causal autoencoder framework that combines a structural causal model (SCM) with graph neural networks (GNNs) to enable GRN inference and gene causal reasoning from scRNA-seq data. By explicitly modeling causal relationships between genes, GRACE facilitates the learning of regulatory context and gene embeddings. With the learned gene signals, our model successfully decodes the causal structures and facilitates the accurate determination of multiple attributes of gene regulation that are important for determining regulatory levels. Through extensive evaluations on seven benchmarks, we demonstrate that GRACE outperforms 14 state-of-the-art GRN inference methods, with the incorporation of causal mechanisms significantly enhancing the accuracy of GRN and gene causality inference. Furthermore, the application to human peripheral blood mononuclear cell (PBMC) samples reveals cell type-specific regulators in monocyte phagocytosis and immune regulation, validated through network analysis and functional enrichment analysis.

PaperID: 1426,

Authors: Arthur Mukhamedov, Grigory Bugriy, Irina Polikanova, Isaac Chairez, Alexander S. Poznyak, Viktor Chertopolokhov

Affiliations: Center “Supersonic,” Lomonosov Moscow State University, Moscow, Russia; Laboratory of Convergent Research on Cognitive Processes, Federal Scientific Center for Psychological and Interdisciplinary Research, Moscow, Russia; Institute of Advanced Materials for Sustainable Manufacturing, Tecnológico de Monterrey, Zapopan, Jalisco, Mexico; Center of Investigation and Advanced Researching (CINVESTAV-IPN), IPN, Mexico City, Mexico

Title: Adaptive Structure Strategy for Designing Nonparametric Models Based on Differential Neural Networks Using Functional Projection

Abstract:
Differential neural networks (DiNNs) encounter a trade-off between the approximation quality and structural complexity. One promising approach to address this trade-off is incorporating dynamic complexity adjustment as an integral part of the learning process. Taking inspiration from the Fourier approximation theory, this study introduces a novel method for adapting the architecture of DiNNs, when they serve as nonparametric identifiers for dynamic systems with uncertain mathematical models. The structural adaptation process is executed through a recursive algorithm based on a modification structure strategy, which dynamically adjusts the number of neurons within the network’s structure. By applying a projection operator to the set of neurons, this method identifies the most relevant sequence of sigmoidal functions, intending to minimize the mean square error in approximating the trajectories of uncertain systems. This simultaneous reduction in overall complexity enhances the quality of the approximations. Moreover, the proposed method can implement a coarse-to-fine approach, wherein selecting necessary neurons occurs in multiple steps. These steps are determined by an adaptive structure strategy that alters the topology of the DiNN. The resulting framework’s effectiveness is demonstrated by evaluating the proposed identifier’s performance in approximating the evolution of real-life data associated with the ocular response during controlled motions or virtual reality engagement. In both experimental cases, there was a noticeable improvement in the accuracy of eye motion approximation by the DiNN, thanks to the variable structure approximation basis determined by the adaptive structure strategy. Overall, this study presents a formal method to automatically determine a feasible DiNN topology.

PaperID: 1427,

Authors: Chengbao Liu, Jingwei Li, Yuan Li, Jie Tan

Affiliations: Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: Denoising Multiscale Spectral Graph Wavelet Neural Networks for Gas Utilization Ratio Prediction in Blast Furnace

Abstract:
Given the crucial role of the gas utilization ratio (GUR) in reflecting blast furnace operation and energy consumption, accurately predicting its development trend holds significant value for blast furnace operators. However, in the harsh ironmaking environment, GUR-affecting variables are prone to significant nonstationary noise. Moreover, these variables are coupled and correlated, meaning that improper regulation of one variable can destabilize the furnace and lead to substantial GUR fluctuations. This poses a major challenge for achieving accurate GUR prediction. To tackle this issue, this article proposes a denoising multiscale spectral graph wavelet neural network (DMSGWNN) for online dynamic forecasting of the GUR, which is an end-to-end learning method that removes variable noise and captures complex variable correlations simultaneously. First, a regularized self-representation (RSR) model is constructed to eliminate nonstationary noise in blast furnace process variables. Then, a novel multiscale spectral graph wavelet neural network (MSGWNN) is proposed to capture the complex correlations among input variables and extract their multiscale representations through spectral graph wavelet (SGW) transform with the heat kernel scaling function and Gaussian kernel wavelet functions. Finally, the effectiveness of the proposed DMSGWNN method is verified using actual blast furnace ironmaking process data from a blast furnace in China, achieving an average predictive hit rate (HR) as high as 98.06% for GUR prediction.

PaperID: 1428,

Authors: Junwei Sun, Yijin Shen, Yingcong Wang, Yanfeng Wang

Affiliations: School of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China

Title: A Memristor-Based Neural Network Circuit With Retrospective Revaluation Effect and Application in Intelligent Household Robots

Abstract:
Traditional association theory maintains that associations between cues can change only in trials where the cue is actually presented. However, retrospective revaluation (RR) refers to the phenomenon that responses to a cue can change even when the cue is not actually presented. A hardware memristor-based neural network circuit with an RR effect is proposed in this article. The neural network circuit successfully demonstrates various phenomena of RR, including the impact of deflation and inflation of companion cue associations on the target cue, higher order RR, and context dependence. The correctness of the circuit design is verified by PSpice simulation. The key feature of this design lies in its ability to learn cue associations even in training trials where the target cues are absent. This distinctive attribute offers a fresh perspective for the creation of more intricate, brain-inspired information processing systems with enhanced integration capabilities.

PaperID: 1429,

Authors: Qingxu Fu, Tenghai Qiu, Jianqiang Yi, Zhiqiang Pu, Xiaolin Ai, Wanmai Yuan

Affiliations: Institute of Automation, Chinese Academy of Sciences, Beijing, China

Title: A Policy Resonance Approach to Solve the Problem of Responsibility Diffusion in Multiagent Reinforcement Learning

Abstract:
State-of-the-art (SOTA) multiagent reinforcement learning algorithms distinguish themselves in many ways from their single-agent equivalents. However, most of them still fully inherit the single-agent exploration-exploitation strategy. Naively inheriting this strategy from single-agent algorithms causes potential collaboration failures, in which the agents blindly follow mainstream behaviors and refuse to take minority responsibility. We name this problem responsibility diffusion (RD), as it shares similarities with the social psychology effect of the same name. In this work, we start by theoretically analyzing the cause of this RD problem, which can be traced back to the exploration-exploitation dilemma of multiagent systems (especially large-scale multiagent systems). We address this RD problem by proposing a policy resonance (PR) approach, which modifies the collaborative exploration strategy of agents by refactoring the joint agent policy while keeping individual policies approximately invariant. Next, we show that SOTA algorithms can be equipped with this approach to promote the collaborative performance of agents in complex cooperative tasks. Experiments are performed on multiple benchmark tasks to illustrate the effectiveness of this approach.

PaperID: 1430,

Authors: Yinghui Wang, Xiaoxue Geng, Guanpu Chen, Wenxiao Zhao

Affiliations: Key Laboratory of Knowledge Automation for Industrial Processes of Ministry of Education, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China; School of Mathematics and Yunnan Key Laboratory of Modern Analytical Mathematics and Applications, Yunnan Normal University, Kunming, China; School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden; Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China

Title: Achieving the Social Optimum in a Nonconvex Cooperative Aggregative Game: A Distributed Stochastic Annealing Approach

Abstract:
This brief designs a distributed stochastic annealing algorithm for nonconvex cooperative aggregative games, in which each player’s cost function depends not only on the player’s own decision variable but also on the sum of all players’ decision variables. The algorithm seeks the social optimum of the game in a setting where the local cost functions are nonconvex and the communication topology between players is time-varying. The weak convergence of the algorithm to the social optimum is further analyzed. Finally, a numerical example is given to illustrate the effectiveness of the proposed algorithm.

PaperID: 1431,

Authors: Xue Geng, Zhe Wang, Chunyun Chen, Qing Xu, Kaixin Xu, Jin Chao, Manas Gupta, Xulei Yang, Zhenghua Chen, Mohamed M. Sabry Aly, Jie Lin, Min Wu, Xiaoli Li

Affiliations: Institute for Infocomm Research, Agency for Science, Technology, and Research (A*STAR), Fusionopolis, Singapore; College of Computing and Data Science (CCDS), Nanyang Technological University, Jurong West, Singapore; Institute for Infocomm Research (I2R), Agency for Science, Technology, and Research (A*STAR), Fusionopolis, Singapore

Title: From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks

Abstract:
Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) tasks. However, deploying them brings significant challenges due to the huge cost of memory, energy, and computation. To address these challenges, researchers have developed various model compression techniques such as model quantization and model pruning. Recently, there has been a surge in research on compression methods to achieve model efficiency while retaining performance. Furthermore, more and more works focus on customizing the DNN hardware accelerators to better leverage the model compression techniques. In addition to efficiency, preserving security and privacy is critical for deploying DNNs. However, the vast and diverse body of related works can be overwhelming. This inspires us to conduct a comprehensive survey on recent research toward the goal of high-performance, cost-efficient, and safe deployment of DNNs. Our survey first covers the mainstream model compression techniques, such as model quantization, model pruning, knowledge distillation, and optimizations of nonlinear operations. We then introduce recent advances in designing hardware accelerators that can adapt to efficient model compression approaches. In addition, we discuss how homomorphic encryption can be integrated to secure DNN deployment. Finally, we discuss several issues, such as hardware evaluation, generalization, and integration of various compression approaches. Overall, we aim to provide a big picture of efficient DNNs from algorithm to hardware accelerators and security perspectives.

PaperID: 1432,

Authors: Liping Yang, Hongbo Liu, Daoqiang Sun, Seán F. McLoone, Kai Liu, C. L. Philip Chen

Affiliations: College of Artificial Intelligence, Dalian Maritime University, Dalian, China; School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast, U.K.; School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: Robust Temporal Link Prediction in Dynamic Complex Networks via Stable Gated Models With Reinforcement Learning

Abstract:
Temporal link prediction is one of the most important tasks for predicting time-varying links by capturing dynamics within complex networks. However, it suffers from difficulties such as vulnerability to adversarial attacks and poor adaptation to distinct evolutionary patterns. In this article, we propose a robust temporal link prediction architecture via stable gated models with reinforcement learning (SAGE-RL) consisting of a state encoding network (SEN) and a self-adaptive policy network (SPN). The former is utilized to capture network dynamics, while the latter helps the former adapt to distinct evolutionary patterns across various time periods. Within the SEN, a novel stable gate is introduced to ensure multiple spatiotemporal dependency paths and defend against adversarial attacks. An SPN is proposed to select different SEN instances by approximating the optimal action function, thereby adapting to various evolutionary patterns to learn the robust temporal and structural features from dynamic complex networks. It is proven that SAGE-RL with integral Lipschitz graph convolution is stable to relative perturbations in dynamic complex networks. With the aid of extensive experiments on five real-world graph benchmarks, SAGE-RL is shown to substantially outperform current state-of-the-art approaches in terms of precision and stability of temporal link prediction and ability to successfully defend against various attacks. We also apply temporal link prediction to shipping transaction networks, where it effectively forecasts potential transaction risks.

PaperID: 1433,

Authors: Pei Huang, Zhaoming Kong, Limin Wang, Xuming Han, Xiaowei Yang

Affiliations: School of Information Science, Guangdong University of Finance and Economics, Guangzhou, China; School of Software Engineering, South China University of Technology, Guangzhou, China; College of Information Science and Technology, Jinan University, Guangzhou, China

Title: Efficient and Stable Unsupervised Feature Selection Based on Novel Structured Graph and Data Discrepancy Learning

Abstract:
Unsupervised feature selection is an important tool in data mining, machine learning, and pattern recognition. Although data labels are often missing, the number of data classes can be known and exploited in many scenarios. Therefore, a structured graph, whose number of connected components is identical to the number of data classes, has been proposed and is frequently applied in unsupervised feature selection. However, methods based on structured graph learning face two problems. First, with existing optimization algorithms, their structured graphs are not always guaranteed to have the same number of connected components as there are data classes. Second, they usually lack strategies for choosing moderate hyperparameters. To solve these problems, an efficient and stable unsupervised feature selection method based on a novel structured graph and data discrepancy learning (ESUFS) is proposed. Specifically, the novel structured graph, consisting of a pairwise data similarity matrix and an indicator matrix, can be efficiently learned by solving a discrete optimization problem. Data discrepancy learning focuses on features that maximize the difference among data and helps in selecting discriminative features. Extensive experiments conducted on various datasets show that ESUFS outperforms state-of-the-art methods not only in accuracy (ACC) but also in stability and speed.

PaperID: 1434,

Authors: Yiming Fei, Jiangang Li, Yanan Li

Affiliations: School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen, China; Department of Engineering and Design, University of Sussex, Brighton, U.K.

Title: Selective Memory Recursive Least Squares: Recast Forgetting Into Memory in RBF Neural Network-Based Real-Time Learning

Abstract:
In radial basis function neural network (RBFNN)-based real-time learning tasks, forgetting mechanisms are widely used such that the neural network can keep its sensitivity to new data. However, with forgetting mechanisms, some useful knowledge is lost simply because it was learned a long time ago, which we refer to as the passive knowledge forgetting phenomenon. To address this problem, this article proposes a real-time training method named selective memory recursive least squares (SMRLS) in which the classical forgetting mechanisms are recast into a memory mechanism. Different from the forgetting mechanism, which mainly evaluates the importance of samples according to the time when samples are collected, the memory mechanism evaluates the importance of samples through both temporal and spatial distribution of samples. With SMRLS, the input space of the RBFNN is evenly divided into a finite number of partitions, and a synthesized objective function is developed using synthesized samples from each partition. In addition to the current approximation error, the neural network also updates its weights according to the recorded data from the partition being visited. Compared with classical training methods including the forgetting factor recursive least squares (FFRLS) and stochastic gradient descent (SGD) methods, SMRLS achieves improved learning speed and generalization capability, which are demonstrated by corresponding simulation results.
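
For context, the classical FFRLS baseline named in the abstract can be sketched as follows. This is the textbook forgetting-factor recursion applied to the output weights of a model that is linear in its features, not the proposed SMRLS; the class name, the Gaussian feature helper, and the default values of `forgetting` and `p0` are assumptions made for illustration.

```python
import math


def rbf_features(x, centers, width):
    """Gaussian radial basis features of a scalar input (hypothetical helper)."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]


class FFRLS:
    """Textbook forgetting-factor recursive least squares for y ~ w . phi."""

    def __init__(self, n, forgetting=0.98, p0=1e3):
        self.w = [0.0] * n
        # P starts as a large multiple of the identity (weak prior on w).
        self.P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        self.lam = forgetting

    def update(self, phi, y):
        n = len(phi)
        p_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(phi[i] * p_phi[i] for i in range(n))
        gain = [v / denom for v in p_phi]
        err = y - sum(w * p for w, p in zip(self.w, phi))
        self.w = [w + g * err for w, g in zip(self.w, gain)]
        # P <- (P - gain * phi^T P) / lam; P is symmetric, so phi^T P equals p_phi.
        self.P = [[(self.P[i][j] - gain[i] * p_phi[j]) / self.lam
                   for j in range(n)] for i in range(n)]
        return err


# Usage: recover y = 3 + 2x from noise-free samples with features [1, x].
model = FFRLS(n=2)
for k in range(50):
    x = 0.1 * k
    model.update([1.0, x], 3.0 + 2.0 * x)
```

Because every past sample is discounted by a power of the forgetting factor, information about regions of the input space that have not been visited recently fades regardless of its usefulness, which is the passive knowledge forgetting the abstract describes.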

PaperID: 1435,

Authors: Zhijia Zhao, Jiale Wu, Chaoxu Mu, Yu Liu, Keum-Shik Hong

Affiliations: School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou, China; School of Electrical and Information Engineering, Tianjin University, Tianjin, China; School of Automation Science and Engineering, South China University of Technology, Guangzhou, China; Institute for Future, School of Automation, Qingdao University, Qingdao, China

Title: Neural-Network-Based Adaptive Fixed-Time Control for a 2-DOF Helicopter System With Input Quantization and Output Constraints

Abstract:
This study proposes a neural-network (NN)-based adaptive fixed-time control method for a two-degree-of-freedom (2-DOF) nonlinear helicopter system with input quantization and output constraints. First, a hysteresis quantizer is employed to mitigate chattering during signal quantization, and adaptive variables are utilized to eliminate errors in the quantization process. Subsequently, the system uncertainties are approximated using a radial basis function NN. Simultaneously, a logarithmic barrier Lyapunov function (BLF) is constructed to prevent the system outputs from violating the constraint boundaries. Based on a rigorous Lyapunov stability analysis and the fixed-time stability criterion, the signals of the closed-loop system are proven to be bounded within a fixed time. Finally, numerical simulations and experiments verified the feasibility of the proposed method.
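
For readers unfamiliar with the construction, a commonly used logarithmic BLF for a tracking error e with constraint bound k_b is shown below. The symbols e and k_b are assumptions introduced here, and the exact function used in the paper may differ.

```latex
V_b = \frac{1}{2}\ln\frac{k_b^{2}}{k_b^{2} - e^{2}}, \qquad |e| < k_b
```

Since V_b grows without bound as |e| approaches k_b, keeping V_b bounded along closed-loop trajectories keeps the error strictly inside the constraint.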

PaperID: 1436,

Authors: Shan Gao, Guangqian Guo, Hanqiao Huang, C. L. Philip Chen

Affiliations: Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China; College of Computer Science and Engineering, South China University of Technology, Guangzhou, China

Title: Go Deep or Broad? Exploit Hybrid Network Architecture for Weakly Supervised Object Classification and Localization

Abstract:
Weakly supervised object classification and localization learn object classes and locations using only image-level labels, as opposed to bounding box annotations. Conventional deep convolutional neural network (CNN)-based methods activate the most discriminative part of an object in feature maps and then attempt to expand feature activation to the whole object, which degrades classification performance. In addition, those methods only use the most semantic information in the last feature map, while ignoring the role of shallow features. So, it remains a challenge to enhance classification and localization performance with a single frame. In this article, we propose a novel hybrid network, namely deep and broad hybrid network (DB-HybridNet), which combines deep CNNs with a broad learning network to learn discriminative and complementary features from different layers, and then integrates multilevel features (i.e., high-level semantic features and low-level edge features) in a global feature augmentation module. Importantly, we exploit different combinations of deep features and broad learning layers in DB-HybridNet and design an iterative training algorithm based on gradient descent to ensure the hybrid network works in an end-to-end framework. Through extensive experiments on the Caltech-UCSD Birds (CUB)-200 and ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2016 datasets, we achieve state-of-the-art classification and localization performance.

PaperID: 1437,

Authors: Vasyl P. Martsenyuk, Marcin Bernas, Aleksandra Klos-Witkowska

Affiliations: Department of Computer Science and Automatics, University of Bielsko-Biala, Bielsko-Biała, Poland

Title: On Model of Recurrent Neural Network on a Time Scale: Exponential Convergence and Stability Research

Abstract:
The majority of the results on modeling recurrent neural networks (RNNs) are obtained using delayed differential equations, which imply continuous time representation. On the other hand, these models must be discrete in time, given their practical implementation in computer systems, requiring their versatile utilization across arbitrary time scales. Hence, the goal of this research is to model and investigate the architecture design of a delayed RNN using delayed differential equations on a time scale. Internal memory can be utilized to describe the calculation of the future states using discrete and distributed delays, which is a representation of the deep learning architecture for artificial RNNs. We focus on qualitative behavior and stability study of the system. Special attention is paid to taking into account the effect of the time-scale parameters on neural network dynamics. Here, we delve into the exploration of exponential stability in RNN models on a time scale that incorporates multiple discrete and distributed delays. Two approaches for constructing exponential estimates, including the Hilger and the usual exponential functions, are considered and compared. The Lyapunov–Krasovskii (L–K) functional method is employed to study stability on a time scale in both cases. The established stability criteria, resulting in an exponential-like estimate, utilize a tuple of positive definite matrices, decay rate, and graininess of the time scale. The models of RNNs for the two-neuron network with four discrete and distributed delays, as well as the ring lattice delayed network of seven identical neurons, are numerically investigated. The results indicate how the time scale (graininess) and model characteristics (weights) influence the qualitative behavior, leading to a transition from stable focus to quasiperiodic limit cycles.

PaperID: 1438,

Authors: Jingyang Huo, Jiali Yu, Min Wang, Zhang Yi, Jinsong Leng, Yong Liao

Affiliations: School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Mathematics and Statistics, Ludong University, Yantai, China; College of Computer Science, Sichuan University, Chengdu, China; School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China

Title: Coexistence of Cyclic Sequential Pattern Recognition and Associative Memory in Neural Networks by Attractor Mechanisms

Abstract:
Neural networks are developed to model the behavior of the brain. One crucial question in this field pertains to when and how a neural network can memorize a given set of patterns. There are two mechanisms to store information: associative memory and sequential pattern recognition. In the case of associative memory, the neural network operates with dynamical attractors that are point attractors, each corresponding to one of the patterns to be stored within the network. In contrast, sequential pattern recognition involves the network memorizing a set of patterns and subsequently retrieving them in a specific order over time. From a dynamical perspective, this corresponds to the presence of a continuous attractor or a cyclic attractor composed of the sequence of patterns stored within the network in a given order. Evidence suggests that the brain is capable of simultaneously performing both associative memory and sequential pattern recognition. Therefore, these types of attractors coexist within the neural network, signifying that some patterns are stored as point attractors, while others are stored as continuous or cyclic attractors. This article investigates the coexistence of cyclic attractors and continuous or point attractors in certain nonlinear neural networks, enabling the simultaneous emergence of various memory mechanisms. By selectively grouping neurons, conditions are established for the existence of cyclic attractors, continuous attractors, and point attractors, respectively. Furthermore, each attractor is explicitly represented, and a competitive dynamic emerges among these coexisting attractors, primarily regulated by adjustments to external inputs.

PaperID: 1439,

Authors: Kehan Li, Jihua Zhu, Zhiming Cui, Xinning Chen, Yang Liu, Fan Wang, Yue Zhao

Affiliations: Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China; School of Software Engineering, Xi’an Jiaotong University, Xi’an, China; School of Biomedical Engineering, ShanghaiTech University, Shanghai, China; School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China; Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing, China

Title: A Novel Hierarchical Cross-Stream Aggregation Neural Network for Semantic Segmentation of 3-D Dental Surface Models

Abstract:
Accurate teeth delineation on 3-D dental models is essential for individualized orthodontic treatment planning. Pioneering works like PointNet suggest a promising direction to conduct efficient and accurate 3-D dental model analyses in end-to-end learnable fashions. Recent studies further imply that multistream architectures to concurrently learn geometric representations from different inputs/views (e.g., coordinates and normals) are beneficial for segmenting teeth with varying conditions. However, such multistream networks typically adopt simple late-fusion strategies to combine features captured from raw inputs that encode complementary but fundamentally different geometric information, potentially hampering their accuracy in end-to-end semantic segmentation. This article presents a hierarchical cross-stream aggregation (HiCA) network to learn more discriminative point/cell-wise representations from multiview inputs for fine-grained 3-D semantic segmentation. Specifically, based upon our multistream backbone with input-tailored feature extractors, we first design a contextual cross-stream aggregation (CA) module conditioned on interstream consistency to boost each view’s contextual representation learning jointly. Then, before the late fusion of different streams’ outputs for segmentation, we further deploy a discriminative cross-stream aggregation (DA) module to concurrently update all views’ discriminative representation learning by leveraging a specific graph attention strategy induced by multiview prototype learning. On both public and in-house datasets of real-patient dental models, our method significantly outperformed state-of-the-art (SOTA) deep learning methods for teeth semantic segmentation. In addition, extended experimental results suggest the applicability of HiCA to other general 3-D shape segmentation tasks. The code is available at https://github.com/ladderlab-xjtu/HiCA.

PaperID: 1440,

Authors: Kexin Liu, Yinyan Zhang

Affiliations: College of Cyber Security, Jinan University, Guangzhou, China

Title: Distributed Dynamic Task Allocation for Moving Target Tracking of Networked Mobile Robots Using k-WTA Network

Abstract:
Task allocation plays a pivotal role in cooperative robotics. This study proposes a novel fully distributed task allocation method for target tracking, by which mobile robots only need to share state information with communication neighbors. The proposed method adopts a distributed k winners-take-all (k-WTA) network to select the k mobile robots closest to the moving target to perform the target tracking task. In addition, an innovative robot control law is designed, incorporating speed feedback and nonlinear activation functions to achieve finite-time error convergence. Unlike previous approaches, our distributed task allocation method yields finite-time error convergence, does not rely on consensus filters, and eliminates the need for a central computing unit to get the k-WTA result during the control process. We demonstrate the effectiveness of the proposed method through theoretical analysis and simulations. Compared to traditional methods, our method leads to smaller total moving distances and speed norms, which underscores the significance of our method in enhancing the efficiency and performance of mobile robots in dynamic task allocation.
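
The input-output behavior that a k-WTA network is meant to produce can be sketched with a centralized reference implementation. This is only the selection rule, computed by sorting on a single machine; the distributed network dynamics and the control law of the paper are not reproduced, and the function names and index-based tie-breaking are assumptions.

```python
import math


def k_wta(scores, k):
    """Return 0/1 outputs with ones at the k largest scores (ties broken by lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    winners = set(order[:k])
    return [1 if i in winners else 0 for i in range(len(scores))]


def allocate(robot_positions, target, k):
    """Select the k robots closest to the target by scoring each with its negative distance."""
    scores = [-math.dist(p, target) for p in robot_positions]
    return k_wta(scores, k)


robots = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (9.0, 0.0)]
selection = allocate(robots, target=(1.0, 0.0), k=2)
```

Here the first and third robots are each at distance 1 from the target, so they are the two winners. A distributed scheme has to reach the same selection using only information exchanged between communication neighbors.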

PaperID: 1441,

Authors: Qiwei Liu, Huaicheng Yan, Hao Zhang, Lu Zeng, Chaoyang Chen

Affiliations: Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China; Department of Control Science and Engineering, Tongji University, Shanghai, China; Academy for Engineering and Technology, Fudan University, Shanghai, China; School of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan, China

Title: Adaptive Intermittent Pinning Control for Synchronization of Delayed Nonlinear Memristive Neural Networks With Reaction-Diffusion Items

Abstract:
In this article, the global exponential synchronization problem is investigated for a class of delayed nonlinear memristive neural networks (MNNs) with reaction–diffusion items. First, by using the Green formula and Lyapunov theory and by proposing a new fuzzy adaptive pinning control scheme, some novel algebraic criteria are obtained to ensure the exponential synchronization of the concerned networks. Furthermore, the corresponding control gains can be promptly adjusted based on the current states of partial nodes of the networks. Besides, a fuzzy adaptive aperiodically intermittent pinning control law is also designed to synchronize the fuzzy MNNs (FMNNs). The controller with the intermittent mechanism allows appropriate rest time and reduces energy consumption. Finally, some numerical examples are provided to confirm the effectiveness of the results in this article.

PaperID: 1442,

Authors: Zhihao Hao, Guancheng Wang, Bob Zhang, Zhuowen Feng, Hai-Sheng Li, Fahui Chong, Yan Pan, Wei Li

Affiliations: Department of Computer and Information Science, PAMI Research Group, University of Macau, Macau, China; College of Literature and Journalism, Guangdong Ocean University, Zhanjiang, China; Beijing Key Laboratory of Big Data Technology for Food Safety and the School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, China; China Industrial Control Systems Cyber Emergency Response Team, Beijing, China

Title: A Novel Public Sentiment Analysis Method Based on an Isomerism Learning Model via Multiphase Processing

Abstract:
The dissemination of public opinion in the social media network is driven by public sentiment, which can be used to promote the effective resolution of social incidents. However, public sentiments for incidents are often affected by environmental factors such as geography, politics, and ideology, which increases the complexity of the sentiment acquisition task. Therefore, a hierarchical mechanism is designed to reduce complexity and utilize processing at multiple phases to improve practicality. Through serial processing between different phases, the task of public sentiment acquisition can be decomposed into two subtasks: the classification of report text to locate incidents and the sentiment analysis of individuals’ reviews. Performance has been improved through improvements to the model structure, such as embedding tables and gating mechanisms. That being said, the traditional centralized structure model not only tends to form model silos while performing tasks but also faces security risks. In this article, a novel distributed deep learning model called isomerism learning based on blockchain is proposed to address these challenges; trusted collaboration between models can be realized through parallel training. In addition, for the problem of text heterogeneity, we also design a method to measure the objectivity of events to dynamically assign the weights of models to improve aggregation efficiency. Extensive experiments demonstrate that the proposed method can effectively improve performance and outperform the state-of-the-art methods significantly.

PaperID: 1443,

Authors: Chenxi Song, Sitian Qin, Zhigang Zeng

Affiliations: School of Artificial Intelligence and Automation and the Key Laboratory of Image Information Processing and Intelligent Control, Ministry of Education of China, Huazhong University of Science and Technology, Wuhan, China; Department of Mathematics, Harbin Institute of Technology, Weihai, China

Title: Multiple Mittag-Leffler Stability of Almost Periodic Solutions for Fractional-Order Delayed Neural Networks: Distributed Optimization Approach

Abstract:
This article proposes new theoretical results on the multiple Mittag–Leffler stability of almost periodic solutions (APOs) for fractional-order delayed neural networks (FDNNs) with nonlinear and nonmonotonic activation functions. Benefiting from the superior geometrical construction of the activation function, the considered FDNNs have multiple APOs with local Mittag–Leffler stability under given algebraic inequality conditions. To solve the algebraic inequality conditions, especially in high-dimensional cases, a distributed optimization (DOP) model and a corresponding neurodynamic solving approach are employed. The conclusions in this article generalize the multiple stability of integer- or fractional-order NNs. Besides, the consideration of the DOP approach can ameliorate the excessive consumption of computational resources when utilizing the LMI toolbox to deal with high-dimensional complex NNs. Finally, a simulation example is presented to confirm the accuracy of the theoretical conclusions obtained, and an experimental example of associative memories is shown.

PaperID: 1444,

Authors: Stephan Naunheim, Yannick Kuhl, David Schug, Volkmar Schulz, Florian Mueller

Affiliations: Department of Physics of Molecular Imaging Systems, Medical Faculty, Institute for Experimental Molecular Imaging, RWTH Aachen University, Aachen, Germany

Title: Improving the Timing Resolution of Positron Emission Tomography Detectors Using Boosted Learning-A Residual Physics Approach

Abstract:
Artificial intelligence (AI) is entering medical imaging, mainly enhancing image reconstruction. Nevertheless, improvements throughout the entire processing, from signal detection to computation, potentially offer significant benefits. This work presents a novel and versatile approach to detector optimization using machine learning (ML) and residual physics. We apply the concept to positron emission tomography (PET), intending to improve the coincidence time resolution (CTR). PET visualizes metabolic processes in the body by detecting photons with scintillation detectors. Improved CTR performance offers the advantage of reducing radioactive dose exposure for patients. Modern PET detectors with sophisticated concepts and read-out topologies represent complex physical and electronic systems requiring dedicated calibration techniques. Traditional methods primarily depend on analytical formulations successfully describing the main detector characteristics. However, when accounting for higher-order effects, additional complexities arise in matching theoretical models to experimental reality. Our work addresses this challenge by combining traditional calibration with AI and residual physics, presenting a highly promising approach. We present a residual physics-based strategy using gradient tree boosting and physics-guided data generation. The explainable AI framework SHapley Additive exPlanations (SHAP) was used to identify known physical effects with learned patterns. In addition, the models were tested against basic physical laws. We were able to improve the CTR significantly (more than 20%) for clinically relevant detectors of 19 mm height, reaching CTRs of 185 ps (450–550 keV).

PaperID: 1445,

Authors: Jingxin Zhang, James Xiao, Maoyin Chen, Xia Hong

Affiliations: Department of Automation and the Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing, China; School of Information Science and Engineering/School of Artificial Intelligence, China University of Petroleum (Beijing), Beijing, China; Department of Computer Science, School of Mathematical, Physical and Computational Sciences, University of Reading, Reading, U.K.

Title: Multimodal Continual Learning for Process Monitoring: A Novel Weighted Canonical Correlation Analysis With Attention Mechanism

Abstract:
Aimed at sequential dynamic modes, a novel multimodal weighted canonical correlation analysis using an attention (MWCCA-A) mechanism is introduced to derive a single model for process monitoring, by integrating two ideas of replay and regularization in continual learning. Under the assumption that data are received sequentially, subsets of data from past modes with dynamic features are selected and stored as replay data, which are utilized together with the current mode data for continual model parameter estimation. The weighted canonical correlation analysis (WCCA) is introduced to achieve appropriate weightings of past modes’ replay data so that the latent variables are extracted by maximizing the weighted correlation with its prediction via the attention mechanism. Specifically, replay data weightings are obtained via the probability density estimation from each mode. This is also beneficial in overcoming data imbalance among multiple modes and consolidating the significant features of past modes further. Alternatively, the proposed model also regularizes parameters based on its previous modes’ importance, which is measured by synaptic intelligence (SI). Meanwhile, the objective is decoupled into a regularization-related part and a replay-related part, to overcome the potentially unstable optimization trajectory of SI-based continual learning. In comparison with several multimode monitoring methods, the effectiveness of the proposed MWCCA-A approach is demonstrated by a continuous stirred tank heater (CSTH), Tennessee Eastman process (TEP), and a practical coal pulverizing system.

PaperID: 1446,

Authors: Antoine Ledent, Petr Kasalický, Rodrigo Alves, Hady W. Lauw

Affiliations: School of Computing and Information Sciences (SCIS), Singapore Management University, Singapore; Department of Applied Mathematics, Faculty of Information Technology, Czech Technical University in Prague, Prague, Czech Republic

Title: Conv4Rec: A 1-by-1 Convolutional Autoencoder for User Profiling Through Joint Analysis of Implicit and Explicit Feedback

Abstract:
We introduce a new convolutional autoencoder architecture for user modeling and recommendation tasks with several improvements over the state of the art. First, our model has the flexibility to learn a set of associations and combinations between different interaction types in a way that carries over to each user and item. Second, our model is able to learn jointly from both the explicit ratings and the implicit information in the sampling pattern (which we refer to as “implicit feedback”). It can also make separate predictions for the probability of consuming content and the likelihood of granting it a high rating if observed. This not only allows the model to make predictions for both the implicit and explicit feedback, but also increases the informativeness of the predictions: in particular, our model can identify items that users would not have been likely to consume naturally, but would be likely to enjoy if exposed to them. Finally, we provide several generalization bounds for our model, which, to the best of our knowledge, are among the first generalization bounds for autoencoders in a recommender systems setting; we also show that optimizing our loss function guarantees the recovery of the exact sampling distribution over interactions up to a small error in total variation. In experiments on several real-life datasets, we achieve state-of-the-art performance on both the implicit and explicit feedback prediction tasks despite relying on a single model for both, and benefiting from additional interpretability in the form of individual predictions for the probabilities of each possible rating.

PaperID: 1447,

Authors: Xiaoqiang Liao, Dong Wang, Xinguo Ming, Min Xia

Affiliations: School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China; School of Engineering, Lancaster University, Lancaster, U.K.

Title: DLCNN: A Deep Logic Convolutional Network for Interpretable Fault Diagnosis of Hoist Mechanism on Ship-to-Shore Cranes

Abstract:
The fault diagnosis of hoist mechanisms in ship-to-shore cranes (STSCs) is paramount for maintaining shipping schedules and ensuring personnel safety at ports. Although deep networks have achieved some success in diagnosing faults in hoist mechanisms, their opaque nature often precludes them from providing trustworthy explanations for their decisions. To address this problem, this article introduces a deep logic convolutional neural network (DLCNN), which incorporates two symbolic languages (confidence and classification rules) to visualize how convolutional neural networks (CNNs) work. Confidence rules are extracted from logic convolutions (LCs). In the LC, confidence rules are designed from three perspectives (information loss, the tradeoff between soundness and interpretability, and quantitative reasoning) to provide a comprehensive understanding of the feature learning and reasoning of stacked convolutions. Besides, classification rules are extracted from the CNN’s fully connected layers to elucidate implicit relationships between fault features and labels. Our experimental investigations on an STSC testbed demonstrate that DLCNNs have powerful performance in fault recognition, interpretability, and potential engineering value.

PaperID: 1448,

Authors: Kun Li, Guangtao Ran, Yanning Guo, Ju H. Park, Yao Zhang

Affiliations: Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, China; Department of Electrical Engineering, Yeungnam University, Gyeongsan, South Korea; Department of Mechanical Engineering, University College London, London, U.K.

Title: Joint Trajectory Replanning for Mars Ascent Vehicle Under Propulsion System Faults: A Suboptimal Learning-Based Warm-Start Approach

Abstract:
This article presents a suboptimal joint trajectory replanning (SJTR) method for Mars ascent vehicle (MAV) launch missions under propulsion system faults. Conventional step-by-step trajectory replanning may fail to make timely decisions, risking mission failure. The SJTR method formulates a joint convex optimization problem of target orbit and flight trajectory after a fault. By applying penalty coefficients for terminal constraints, it adheres to the orbit redecision principles, enabling a concise and rapid solution. To further enhance the convergence and the accuracy of orbit-type determination, a learning-based warm-start scheme is proposed. Offline, a deep neural network (DNN) is trained with data generated by various trajectory replanning methods following the redecision principles. Online, the DNN provides initial guesses for the time optimization variables based on the fault scenario. Numerical simulations on mass flow rate and specific impulse drops validate the reliability of the proposed method, demonstrating at least 49.5% higher computational efficiency compared with the upgrading and downgrading replanning methods.

PaperID: 1449,

Authors: Jorge Paz-Ruza, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas, Brais Cancela, Carlos Eiras-Franco

Affiliations: LIDIA Group-CITIC, Universidade da Coruña, A Coruña, Spain

Title: Beyond RMSE and MAE: Introducing EAUC to Unmask Hidden Bias and Unfairness in Dyadic Regression Models

Abstract:
Dyadic regression models, which output real-valued predictions for pairs of entities, are fundamental in many domains [e.g., obtaining user-product ratings in recommender systems (RSs)] and promising and under exploration in others (e.g., tuning patient–drug dosages in precision pharmacology). In this work, we prove that nonuniform observed value distributions of individual entities lead to severe biases in state-of-the-art models, skewing predictions toward the average of observed past values for the entity and providing worse-than-random predictive power in eccentric yet crucial cases; we name this phenomenon eccentricity bias. We show that global error metrics like root-mean-squared error (RMSE) are insufficient to capture this bias, and we introduce eccentricity area under the curve (EAUC) as a novel metric that can quantify it in all studied domains and models. We prove the intuitive interpretation of EAUC by experimenting with naive post-training bias corrections and theorize other options to use EAUC to guide the construction of fair models. This work contributes a bias-aware evaluation of dyadic regression to prevent unfairness in critical real-world applications of such systems.

PaperID: 1450,

Authors: Zeyu Zhou, Yuhui Wang, Qingxian Wu

Affiliations: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangning, China

Title: An Advanced Optimal Tracking Control for Nonlinear Discrete-Time Systems Based on (N + 1)-Step Gradient Learning

Abstract:
In this article, to accelerate convergence and eliminate the tracking error, an advanced optimal control method for nonlinear discrete-time systems is investigated based on an improved N-step [(N + 1)-step] gradient learning algorithm. Independent of the discount factor, this article introduces a novel tracking error index without quadratic input terms for the steady-state and convergence performances, which obtains the optimal control policy without calculating the reference control input. Compared with classic N-step gradient learning algorithms with the infinite future reward assumption, the proposed algorithm investigates the (N + 1)-step return with a fixed N and a step forward for finite tracking problems based on a long-term weighting parameter. Based on the above theory, value iteration (VI) and policy iteration (PI) methods are utilized to derive the convergence, monotonicity, optimality, and stability properties of the proposed algorithm, which can be conducted without the traditional assumption of zero initial functions. In the implementation of the algorithms, the actor–critic structure, constructed by four neural networks, is established to approximate the states, the value functions, and the control policy, respectively. Three simulation experiments on a helicopter system validate the efficacy and practicality of the control methods in addressing nonlinear optimal tracking challenges.

PaperID: 1451,

Authors: Wuzhida Bao, Yuting Cao, Yin Yang, Shiping Wen

Title: Long Short-Term Financial Time Series Forecasting Based on Residual Multiscale TCN Sparse Expert Network and Informer

Abstract:
Due to the inherent high volatility and complexity of financial markets, traditional time series forecasting models face numerous challenges in handling both short- and long-term predictions in the stock market. Most traditional neural network-based financial prediction models are limited to short-term forecasting and struggle to capture long-term trends and global dependencies in the market fully. To address this, we propose a novel network architecture called ResMMoT-Informer. This model combines the strengths of the residual multiscale temporal convolutional network (TCN) sparse expert network (ResMMoT) and the Informer, enabling it to effectively capture multiscale local features and global dependencies in the stock market. ResMMoT achieves stable training through a residual structure and a sparse multiscale TCN expert network, allowing it to flexibly model complex temporal features and learn trends across different time-step scales. Meanwhile, the Informer optimizes long-sequence forecasting performance through an improved self-attention mechanism. Additionally, we introduce the wavelet noise reduction (WNR) method, further enhancing the model’s robustness and prediction accuracy. In the experimental section, ablation experiments first validate the effectiveness and necessity of the proposed strategies and network structure. Subsequent comparison experiments on the NASDAQ100 dataset demonstrate that ResMMoT-Informer excels in both long- and short-term time series forecasting tasks in the stock market, with significantly better prediction accuracy and generalization ability than existing models. Compared to other popular neural network-based financial forecasting models, ResMMoT-Informer leads in prediction accuracy, time robustness, and interpretability, showcasing its cutting-edge advantage in contemporary research.

PaperID: 1452,

Authors: Qian Kang, Dengxiu Yu, Bowen Xu, Zhen Wang

Affiliations: School of the Cybersecurity, Northwestern Polytechnical University, Xi’an, China; School of Artificial Intelligence, Optics and Electronics, Northwestern Polytechnical University, Xi’an, China

Title: Deterministic Convergence Analysis and Application of Elman Neural Network via Sparse Mechanism and Entropy Error Function

Abstract:
In this study, we employ the batch gradient method to investigate the monotonicity and convergence of the Elman neural network (ENN) based on the entropy error function (EEF) and regularization methods. This approach enhances network stability and sparsity while also boosting its ability to generalize. Traditional mean square error (MSE) functions in complex networks often result in slower convergence during training, a tendency to become trapped in local minima, and even incorrect saturation issues. To address this drawback, we propose a novel EEF for training ENN, effectively avoiding the problem of learning speed degradation. Furthermore, by leveraging smoothing group $L_{1/2}$ regularization ($\text{SGL}_{1/2}$) in studying ENN based on EEF, we effectively overcome the error function oscillations caused by traditional group $L_{1/2}$ regularization ($\text{GL}_{1/2}$). In addition, we optimize the network architecture in two key ways: reducing redundant nodes to near 0 and driving redundant weights toward 0 for the remaining nodes, further boosting network sparsity. This article rigorously proves the monotonicity of the error function, alongside presenting strong and weak convergence outcomes for the novel method. The effectiveness and correctness of our approach are clearly illustrated through experimental results. The simulation results align with the theoretical findings.

PaperID: 1453,

Authors: Dingsen Zhang, Yingwei Zhang, Kaicheng Shang, Xianwen Gao

Affiliations: College of Information Science and Engineering, Northeastern University, Shenyang, China

Title: Multivariable Collaborative Modeling With Knowledge Transfer and Its Application in Soft Sensing of Iron Flotation Grade

Abstract:
In the iron flotation production process, production stages often undergo updates due to equipment upgrades, changes in raw materials, and other reasons. The operating condition prediction model established based on data from previous production stages may not meet the requirements of the new stage, resulting in a significant waste of collected datasets. Data-driven models established using small samples collected during the current stage may lack accuracy due to the limited sample size. This study proposes a method based on knowledge transfer to effectively leverage a large amount of outdated data. It allows for the rapid establishment of a new model that aligns with production requirements while minimizing the need for additional data collection. In previous tailings grade soft sensors, more emphasis was placed on quality parameters such as flotation froth features, often overlooking production process parameters. To enhance model accuracy, we introduce a multivariate collaborative modeling approach. The experimental results and industrial applications validate the effectiveness of this method.

PaperID: 1454,

Authors: Jules Rostand, Chen-Chien James Hsu, Cheng-Kai Lu

Affiliations: Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan

Title: Adaptive Locality Guidance: Using Locality Guidance to Initialize the Learning of Vision Transformers on Tiny Datasets

Abstract:
While we keep working toward leveraging the benefits of vision transformers (VTs) on small datasets, convolutional neural networks (CNNs) remain the preferred choice when extensive training data is unavailable. As studies show that lack of sufficient data leads VTs to mainly learn global information from the input, the recently proposed locality guidance (LG) approach uses a lightweight CNN pretrained on the same dataset to guide the VT into learning local features as well. Under a dual learning framework, the use of the LG significantly boosts the accuracy of different VTs on multiple tiny datasets, at the mere cost of a slight increase in training time. However, we also find that the use of the LG prevents the models from learning global aspects to their full ability, sometimes leading to worse performance than the original baselines. To overcome this limitation, we propose the adaptive LG (ALG), an improved version that uses the LG as an initialization tool and, after a certain number of epochs, lets the VT learn by itself in a supervised fashion. Specifically, we estimate the needed duration for the LG based on a threshold set on the evolution of the distance separating the features of the VT from those of the lightweight CNN used for guidance. Since our improved method can be used in a plug-and-play fashion, we successfully apply it across ten different VTs and five different datasets. Experimental results show that the proposed ALG significantly reduces the computational cost added in training by the LG (by 37%–64%) and further increases the validation accuracy by up to 6.71%.
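The stopping idea in this abstract (switching off guidance once the VT-to-CNN feature distance stops evolving) can be illustrated with a small sketch. This is only one possible reading, not the authors' criterion: the function name `guidance_epochs`, the use of a relative per-epoch drop, the 5% threshold, and the example distance history are all hypothetical assumptions made for illustration.

```python
def guidance_epochs(distances, threshold=0.05):
    """Return how many epochs to keep locality guidance switched on.

    `distances` holds one feature-distance value per epoch (assumed to be
    measured between VT features and guide-CNN features). Guidance is kept
    until the relative drop between consecutive epochs falls below
    `threshold` (an assumed rule; the abstract does not give the exact one).
    """
    for epoch in range(1, len(distances)):
        previous, current = distances[epoch - 1], distances[epoch]
        relative_drop = (previous - current) / previous if previous > 0 else 0.0
        if relative_drop < threshold:
            return epoch
    # The distance never plateaued, so guidance stays on for every epoch.
    return len(distances)


# Hypothetical distance history: fast decrease at first, then a plateau.
history = [10.0, 6.0, 4.0, 3.5, 3.4, 3.39]
stop_epoch = guidance_epochs(history, threshold=0.05)
```

With this made-up history, the relative drop first falls under 5% between the fourth and fifth values, so guidance would be dropped after four epochs and the remaining epochs would use plain supervised training.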

PaperID: 1455,

Authors: Zhongli Wang, Guohui Tian, Shijie Guo

Affiliations: School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin, China; School of Control Science and Engineering, Shandong University, Jinan, China; School of Mechanical Engineering, Hebei University of Technology, Tianjin, China

Title: GMM Enabled by Multimodal Information Fusion Network for Detection and Motion Planning of Robotic Liquid Pouring

Abstract:
When humans perform pouring tasks, they exhibit consistent accuracy, regardless of the liquid type, container, or environmental conditions. This proficiency stems from their ability to effectively utilize both vision and hearing while also considering various factors. However, in the domain of robotic liquid pouring, multimodal information is rarely leveraged effectively to accomplish automatic control. To address this limitation, a multimodal information fusion network (MMFNet) is designed for estimating liquid height and pouring state. The MMFNet employs cross-attention networks and motion features to enhance visual features (VFs). Subsequently, multimodal transformers are utilized to fuse audio features with the enhanced VFs, enabling the MMFNet to estimate both liquid height and pouring state accurately. Finally, the detection results are combined with demonstration learning so that robots learn the pouring motion trajectory encoded by the Gaussian mixture model (GMM). The experimental results demonstrate the effectiveness of MMFNet in significantly improving the detection accuracy of liquid height and pouring state. Furthermore, by employing the GMM enabled by MMFNet, robots can acquire robust pouring motion planning, enhancing their capabilities in performing pouring tasks.

PaperID: 1456,

Authors: Qingyang Dai, Chunhui Zhao, Biao Huang

Affiliations: State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, China; Department of Chemical and Materials Engineering, University of Alberta, Edmonton, AB, Canada

Title: M2D-VAE: Self-Supervised Probabilistic Temporal-Spatial Latent Representation Learning for Unsupervised Industrial Operational Applications Under Missing Value Interference

Abstract:
Due to sensor malfunctions and data transmission corruptions, the industrial process data collected commonly contain missing values. It poses a significant challenge for data-driven approaches in aggregating temporal-spatial correlations that reflect dependencies across both variables and times, which makes it difficult to directly carry out downstream industrial operational applications. In this study, a self-supervised representation learning model is proposed to extract probabilistic temporal-spatial latent variables (LVs) from sequential data under missing value interference. The extracted LVs can be utilized for typical industrial operational applications through a unified framework. First, a novel deep dynamic probabilistic latent variable model, named Markov dynamic variational autoencoder (MD-VAE), is proposed to explicitly model the temporal-spatial dependencies between LVs. The latent posteriors are Bayesian smoothed by global sequence information for effective variational inference (VI). Second, a self-supervised learning approach, termed masked MD-VAE (M2D-VAE), is proposed to address the challenge of directly extracting temporal-spatial LVs under missing value interference. Controllable constraints with practical interpretations are introduced to balance the latent bottleneck capacity with reconstruction accuracy during model optimization. A unified framework is proposed to utilize the latent representations for typical industrial downstream tasks. Case studies conducted on a real-world multiphase flow process demonstrate the superiority of M2D-VAE in unsupervised industrial operational applications including missing value imputation and dynamic process monitoring under missing value interference.

PaperID: 1457,

Authors: Dongdong Chen, Mengjun Liu, Zhenrong Shen, Linlin Yao, Xiangyu Zhao, Zhiyun Song, Haolei Yuan, Qian Wang, Lichi Zhang

Affiliations: School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; GeneScience Pharmaceutical Company Ltd., Shanghai, China; School of Biomedical Engineering, ShanghaiTech University, Shanghai, China

Title: Exploring Multiconnectivity and Subdivision Functions of Brain Network via Heterogeneous Graph Network for Cognitive Disorder Identification

Abstract:
The brain serves as a critical cornerstone of human intelligence, which involves a series of complex neuropsychological activities that lead to the coordination of various functions in the brain network. In recent years, brain network analysis methods based on graph neural networks (GNNs) have attracted increasing attention for the identification of brain disorders. However, these methods generally assume that the brain network is a homogeneous graph while ignoring its heterogeneity among human brain activities, which is reflected in both the complex connectivity of the brain network and distinctive brain functions. To overcome this problem, we propose a heterogeneous subdivision GNN (HSGNN), which captures the heterogeneous connections and functions of the brain network simultaneously. Specifically, we first employ two fundamental brain connectivity patterns to capture both statistical dependency and directional information flow among different brain regions and construct a heterogeneous brain connectivity network for each subject. Then, we develop a functional subdivision method that encodes brain networks into multiple latent feature subspaces corresponding to heterogeneous brain functions and extracts features of brain networks accordingly. Considering the intricate interactions of brain functions to facilitate cognitive activities within the brain network, we further employ the self-attention mechanism to obtain comprehensive representations of brain networks in a joint latent space. Finally, we propose a composite loss function to train the model for obtaining the heterogeneous brain network representation, which can be utilized for disease classification. The experimental results on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and Autism Brain Imaging Data Exchange (ABIDE) datasets demonstrate that our method outperforms several state-of-the-art (SOTA) methods in identifying different types of brain cognitive-related disorders.

PaperID: 1458,

Authors: Weihua Xu, Yang Zhang, Yuhua Qian

Affiliations: College of Artificial Intelligence, Southwest University, Chongqing, China; Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China

Title: A Novel Unsupervised Feature Selection for High-Dimensional Data Based on FCM and k-Nearest Neighbor Rough Sets

Abstract:
Large amounts of high-dimensional unlabeled data typically contain only a small portion of truly effective information. Consequently, unsupervised feature selection has gained significant research attention. However, current unsupervised feature selection approaches face limitations when dealing with datasets that exhibit uneven density, and they also require substantial computational time. To address this problem, this research article proposes a feature selection technique that combines Fuzzy C-Means (FCM) and k-nearest neighbor rough sets. FCM is a clustering algorithm grounded in fuzzy theory, which takes into account the inherent data structure and the correlations between different features. Consequently, FCM is particularly well-suited for datasets with uneven density. Our proposed method consists of three steps. First, the FCM algorithm is used to cluster the unlabeled data. Second, a measure that evaluates the importance of features is defined, and the features are sorted based on the clustering results. Finally, redundant features are filtered using k-nearest neighbor rough sets while retaining important features, significantly reducing the running time. In addition, we designed the feature selection algorithm (KND-UFS) and conducted experiments on 12 public datasets. We compared KND-UFS with eight existing algorithms in terms of running time, classification accuracy, and the number of selected features. The experimental results provided strong evidence supporting the superior performance of the KND-UFS algorithm.
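The first step of the method above, clustering unlabeled data with FCM, can be sketched in plain Python. This is the textbook Fuzzy C-Means update rule, not the authors' KND-UFS code. The feature-importance measure and the k-nearest neighbor rough-set filtering are not reproduced because the abstract does not specify them. The function name, the default fuzzifier of 2, the iteration count, and the toy data are assumptions for illustration.

```python
import math
import random


def fuzzy_c_means(points, num_clusters, fuzzifier=2.0, iterations=50, seed=0):
    """Textbook Fuzzy C-Means: returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    memberships = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(num_clusters)]
        total = sum(row)
        memberships.append([value / total for value in row])
    exponent = 2.0 / (fuzzifier - 1.0)
    centers = []
    for _ in range(iterations):
        # Center update: mean of the points weighted by membership ** fuzzifier.
        centers = []
        for c in range(num_clusters):
            weights = [memberships[k][c] ** fuzzifier for k in range(n)]
            weight_sum = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / weight_sum
                 for d in range(dim)]
            )
        # Membership update: inverse relative distance to every center.
        for k, point in enumerate(points):
            dists = [max(math.dist(point, center), 1e-12) for center in centers]
            memberships[k] = [
                1.0 / sum((dists[c] / dists[j]) ** exponent
                          for j in range(num_clusters))
                for c in range(num_clusters)
            ]
    return centers, memberships


# Toy data (made up): two well-separated groups of three 2-D points each.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, memberships = fuzzy_c_means(data, num_clusters=2)
labels = [row.index(max(row)) for row in memberships]
```

Each membership row sums to 1, so a point can belong partly to several clusters. Taking the largest membership per row gives a hard label where one is needed.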

PaperID: 1459,

Authors: Hakan Cevikalp, Hasan Saribas, Bedirhan Uzun

Affiliations: Department of Electrical and Electronics Engineering, Eskişehir Osmangazi University, Eskişehir, Türkiye; AIE Department, Huawei Türkiye Research and Development Center, Istanbul, Türkiye

Title: Reaching Nirvana: Maximizing the Margin in Both Euclidean and Angular Spaces for Deep Neural Network Classification

Abstract:
The classification loss functions used in deep neural network classifiers can be split into two categories based on maximizing the margin in either Euclidean or angular spaces. Euclidean distances between sample vectors are used during classification for the methods maximizing the margin in Euclidean spaces, whereas the cosine similarity is used during the testing stage for the methods maximizing the margin in the angular spaces. This article introduces a novel classification loss that maximizes the margin in both the Euclidean and angular spaces at the same time. This way, the Euclidean and cosine distances will produce similar and consistent results and complement each other, which will in turn improve the accuracies. The proposed loss function enforces the samples of classes to cluster around the centers that represent them. The centers approximating classes are chosen from the boundary of a hypersphere, and the pairwise distances between class centers are always equal. This restriction corresponds to choosing centers from the vertices of a regular simplex inscribed in a hypersphere. The proposed loss function can be effortlessly applied to classical classification problems as there is a single hyperparameter that must be set by the user, and setting this parameter is straightforward. Additionally, the proposed method can effectively reject test samples from unfamiliar classes by measuring their distances from the known class centers, around which the samples of the known classes are compactly clustered. Therefore, the proposed technique is especially suitable for open set recognition problems. Despite its simplicity, experimental studies have demonstrated that the proposed method outperforms other techniques in both open set recognition and classical classification problems. The source code for the proposed approach is available at https://github.com/Cevikalp/dsc.
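The geometric constraint in this abstract (class centers on a hypersphere with equal pairwise distances, i.e., vertices of a regular simplex) can be checked numerically. The following is a minimal sketch of one standard construction, not the authors' implementation, which is in the repository linked above. The function names and the `radius` parameter are hypothetical, and the construction places C centers in C-dimensional space by centering the standard basis vectors and rescaling them.

```python
import math


def simplex_centers(num_classes, radius=1.0):
    """Return num_classes points with equal norm and equal pairwise distances."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    shift = 1.0 / num_classes
    # Norm of a centered basis vector e_i - mean(e) is sqrt((C - 1) / C).
    norm = math.sqrt((num_classes - 1) / num_classes)
    centers = []
    for i in range(num_classes):
        point = [((1.0 if j == i else 0.0) - shift) * radius / norm
                 for j in range(num_classes)]
        centers.append(point)
    return centers


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


centers = simplex_centers(4, radius=2.0)
norms = [math.sqrt(sum(x * x for x in c)) for c in centers]
pairwise = [distance(centers[i], centers[j])
            for i in range(4) for j in range(i + 1, 4)]
```

Every entry of `norms` equals the chosen radius and every entry of `pairwise` has the same value. This is the property that lets Euclidean and angular margins between class centers agree.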

PaperID: 1460,

Authors: Junwei Sun, Jinjiang Wang, Shiping Wen, Yingcong Wang, Yanfeng Wang

Affiliations: School of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China; Australian AI Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia

Title: Neural Network Circuits for Bionic Associative Memory and Temporal Order Memory Based on DNA Strand Displacement

Abstract:
Pavlovian associative memory plays an important role in our daily life and work. The realization of Pavlovian associative memory at the deoxyribonucleic acid (DNA) molecular level will promote the development of biological computing and broaden the application scenarios of neural networks. In this article, bionic associative memory and temporal order memory circuits are constructed by DNA strand displacement (DSD) reactions. First, a temporal logic gate is constructed on the basis of DSD circuit and extended to a three-input temporal logic gate. The output of temporal logic gate is used for the weight species of associative memory. Second, the forgetting module and output module based on the DSD circuit are constructed to realize some functions of associative memory, including associative memory with simultaneous stimulus, associative memory with interstimulus interval effect, and the facilitation by intermittent stimulus. In addition, the coding, storage, and retrieval modules are designed based on the analysis and memory capabilities of temporal logic gate for temporal information. The temporal order memory circuit is constructed, demonstrating the temporal order memory ability of DNA circuit. Finally, the reliability of the circuit is verified through Visual DSD software simulation. Our work provides ideas and inspiration to construct more complex DNA bionic circuits and intelligent circuits by using DSD technology.

PaperID: 1461,

Authors: Meng Lou, Shu Zhang, Hong-Yu Zhou, Sibei Yang, Chuan Wu, Yizhou Yu

Affiliations: School of Computing and Data Science, The University of Hong Kong, Hong Kong, SAR, China; Artificial Intelligence Laboratory, Deepwise Healthcare, Beijing, China; Department of Biomedical Informatics, Harvard Medical School, Boston, CA, USA; School of Information Science and Technology, ShanghaiTech University, Shanghai, China

Title: TransXNet: Learning Both Global and Local Dynamics With a Dual Dynamic Token Mixer for Visual Recognition

Abstract:
Recent studies have integrated convolutions into transformers to introduce inductive bias and improve generalization performance. However, the static nature of conventional convolution prevents it from dynamically adapting to input variations, resulting in a representation discrepancy between convolution and self-attention as self-attention calculates attention matrices dynamically. Furthermore, when stacking token mixers that consist of convolution and self-attention to form a deep network, the static nature of convolution hinders the fusion of features previously generated by self-attention into convolution kernels. These two limitations result in a suboptimal representation capacity of the constructed networks. To find a solution, we propose a lightweight dual dynamic token mixer (D-Mixer) to simultaneously learn global and local dynamics, that is, mechanisms that compute weights for aggregating global contexts and local details in an input-dependent manner. D-Mixer works by applying an efficient global attention module and an input-dependent depthwise convolution separately on evenly split feature segments, endowing the network with strong inductive bias and an enlarged effective receptive field. We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN–transformer vision backbone network that delivers compelling performance. In the ImageNet-1K image classification task, TransXNet-T surpasses Swin-T by 0.3% in top-1 accuracy while requiring less than half of the computational cost. Furthermore, TransXNet-S and TransXNet-B exhibit excellent model scalability, achieving top-1 accuracy of 83.8% and 84.6%, respectively, with reasonable computational costs. In addition, our proposed network architecture demonstrates strong generalization capabilities in various dense prediction tasks, outperforming other state-of-the-art networks while having lower computational costs. Code is publicly available at https://github.com/LMMMEng/TransXNet.

PaperID: 1462,

Authors: Quan Qian, Jun Luo, Yi Qin

Affiliations: State Key Laboratory of Mechanical Transmission for Advanced Equipment, College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, China

Title: Adaptive Intermediate Class-Wise Distribution Alignment: A Universal Domain Adaptation and Generalization Method for Machine Fault Diagnosis

Abstract:
Many transfer learning methods have been proposed to implement fault transfer diagnosis, and their loss functions are usually composed of task-related losses, distribution distance losses, and correlation regularization losses. The intrinsic parameters and trade-off parameters between losses, however, need to be tuned according to the specific diagnosis tasks; thus, the generalization abilities of these methods in multiple tasks are limited. In addition, the alignment goal of most domain adaptation (DA) mechanisms dynamically changes during the training process, which results in loss oscillation, slow convergence, and poor robustness. To overcome the above-mentioned issues, a novel and simple transfer learning diagnosis method named the adaptive intermediate class-wise distribution alignment (AICDA) model is proposed, and it is established via the proposed AICDA mechanism, dynamic intermediate alignment (DIA) adaptive layer, and AdaSoftmax loss. The AICDA mechanism develops an adaptive intermediate distribution as the alignment goal of multiple source domains and target domains, and it can simultaneously align the global and class-wise distributions of these domains. The DIA layer is designed to adaptively achieve domain confusion without the distribution distance loss and the correlation regularization loss. Meanwhile, to ensure the classification performance of the AICDA mechanism, AdaSoftmax loss is proposed for boosting the separability of Softmax loss. Finally, to evaluate the effectiveness and universality of the AICDA diagnosis model as thoroughly as possible, various multisource mixed fault transfer diagnosis tasks of wind turbine planetary gearboxes, including DA and domain generalization (DG), are implemented, and the experimental results indicate that our proposed AICDA model has a higher diagnosis accuracy and a stronger generalization ability than other state-of-the-art transfer learning methods.

PaperID: 1463,

Authors: Haoen Huang, Zhigang Zeng

Affiliations: School of Artificial Intelligence and Automation and the Key Laboratory of Image Processing and Intelligent Control of Education Ministry of China, Huazhong University of Science and Technology, Wuhan, China

Title: An Accelerated Approach on Adaptive Gradient Neural Network for Solving Time-Dependent Linear Equations: A State-Triggered Perspective

Abstract:
To improve the acceleration performance, a hybrid state-triggered discretization (HSTD) is proposed for the adaptive gradient neural network (AGNN) for solving time-dependent linear equations (TDLEs). Unlike the existing approaches that use an activation function or a time-varying coefficient for acceleration, the proposed HSTD is uniquely designed from a control theory perspective. It comprises two essential components: adaptive sampling interval state-triggered discretization (ASISTD) and adaptive coefficient state-triggered discretization (ACSTD). The former addresses the gap in acceleration methods related to the variable sampling period, while the latter considers the underlying evolutionary dynamics of the Lyapunov function to determine coefficients greedily. Finally, compared with commonly used discretization methods, the acceleration performance and computational advantages of the proposed HSTD are substantiated by the numerical simulations and applications to robotics.

PaperID: 1464,

Authors: David Mulvey, Chuan Heng Foh, Muhammad Ali Imran, Rahim Tafazolli

Affiliations: 5G/6G Innovation Centre, Institute for Communications Systems, University of Surrey, Guildford, U.K.; School of Engineering, University of Glasgow, Glasgow, U.K.

Title: Use of Parallel Explanatory Models to Enhance Transparency of Neural Network Configurations for Cell Degradation Detection

Abstract:
In a previous paper, we have shown that a recurrent neural network (RNN) can be used to detect cellular network radio signal degradations accurately. We unexpectedly found, though, that accuracy gains diminished as we added layers to the RNN. To investigate this, in this article, we build a parallel model to illuminate and understand the internal operation of neural networks (NNs), such as the RNN, which store their internal state to process sequential inputs. This model is widely applicable in that it can be used with any input domain where the inputs can be represented by a Gaussian mixture. By looking at RNN processing from a probability density function (pdf) perspective, we are able to show how each layer of the RNN transforms the input distributions to increase detection accuracy. At the same time we also discover a side effect acting to limit the improvement in accuracy. To demonstrate the fidelity of the model, we validate it against each stage of RNN processing and output predictions. As a result, we have been able to explain the reasons for RNN performance limits with useful insights for future designs for RNNs and similar types of NN.

PaperID: 1465,

Authors: Jiaping Xiao, Phumrapee Pisutsin, Mir Feroskhan

Affiliations: School of Mechanical and Aerospace Engineering, Nanyang Technological University, Jurong West, Singapore

Title: Collaborative Target Search With a Visual Drone Swarm: An Adaptive Curriculum Embedded Multistage Reinforcement Learning Approach

Abstract:
Equipping drones with target search capabilities is highly desirable for applications in disaster rescue and smart warehouse delivery systems. Multiple intelligent drones that can collaborate with each other and maneuver among obstacles show more effectiveness in accomplishing tasks in a shorter amount of time. However, carrying out collaborative target search (CTS) without prior target information is extremely challenging, especially with a visual drone swarm. In this work, we propose a novel data-efficient deep reinforcement learning (DRL) approach called adaptive curriculum embedded multistage learning (ACEMSL) to address these challenges, mainly 3-D sparse reward space exploration with limited visual perception and collaborative behavior requirements. Specifically, we decompose the CTS task into several subtasks including individual obstacle avoidance, target search, and inter-agent collaboration, and progressively train the agents with multistage learning. Meanwhile, an adaptive embedded curriculum (AEC) is designed, where the task difficulty level (TDL) can be adaptively adjusted based on the success rate (SR) achieved in training. ACEMSL allows data-efficient training and individual-team reward allocation for the visual drone swarm. Furthermore, we deploy the trained model over a real visual drone swarm and perform CTS operations without fine-tuning. Extensive simulations and real-world flight tests validate the effectiveness and generalizability of ACEMSL. The project is available at https://github.com/NTU-UAVG/CTS-visual-drone-swarm.git.
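The adaptive curriculum idea in this abstract, where the task difficulty level (TDL) follows the success rate (SR) achieved in training, can be sketched as a small scheduler. This is an illustrative assumption, not the authors' AEC rule, which is in the repository linked above. The class name, the sliding window, the promotion and demotion thresholds, and the number of levels are all hypothetical.

```python
from collections import deque


class AdaptiveCurriculum:
    """Raise or lower a difficulty level based on a windowed success rate."""

    def __init__(self, max_level=5, window=20, promote_at=0.8, demote_at=0.3):
        self.level = 1
        self.max_level = max_level
        self.promote_at = promote_at  # assumed SR needed to raise the TDL
        self.demote_at = demote_at    # assumed SR at which the TDL is lowered
        self.outcomes = deque(maxlen=window)

    def success_rate(self):
        """Fraction of successful episodes in the current window."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def record(self, success):
        """Log one episode outcome and return the (possibly updated) level."""
        self.outcomes.append(1 if success else 0)
        if len(self.outcomes) < self.outcomes.maxlen:
            return self.level  # not enough episodes yet to judge
        rate = self.success_rate()
        if rate >= self.promote_at and self.level < self.max_level:
            self.level += 1
            self.outcomes.clear()  # start a fresh window at the new level
        elif rate <= self.demote_at and self.level > 1:
            self.level -= 1
            self.outcomes.clear()
        return self.level


curriculum = AdaptiveCurriculum(window=10)
for _ in range(10):
    curriculum.record(True)
level_after_successes = curriculum.level
for _ in range(10):
    curriculum.record(False)
level_after_failures = curriculum.level
```

In this toy run, ten consecutive successes raise the level from 1 to 2 and ten consecutive failures bring it back to 1. Clearing the window after each change keeps outcomes from the old difficulty from influencing the next decision.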

PaperID: 1466,

Authors: Xiaomeng Gao, Yajing Li, Xinyi Liu, Yinlin Ye, Hongtao Fan

Affiliations: College of Science, Northwest A&F University, Yangling, Shaanxi, China

Title: Stability Analysis of Fractional Bidirectional Associative Memory Neural Networks With Multiple Proportional Delays and Distributed Delays

Abstract:
This article investigates the finite-time stability of a class of fractional-order bidirectional associative memory neural networks (FOBAMNNs) with multiple proportional and distributed delays. Unlike the existing Gronwall integral inequality with a single proportional delay (N = 1), we establish, for the first time, a Gronwall integral inequality with multiple proportional delays (N \geq 2). Since the existing fractional-order single-constant-delay Gronwall inequality with two different orders cannot be directly applied to the stability analysis of the aforementioned system, we first develop a novel generalized fractional Gronwall inequality with multiple proportional delays and different orders. Furthermore, using the newly constructed generalized inequality, stability criteria for FOBAMNNs with fractional orders 0 < \alpha < 1 and 1 < \alpha < 2 are given, respectively, under weaker conditions, i.e., at most linear growth and linear growth conditions rather than the global Lipschitz condition. Finally, numerical experiments verify the effectiveness of the proposed method.

PaperID: 1467,

Authors: Mengran Li, Yong Zhang, Shaofan Wang, Yongli Hu, Baocai Yin

Affiliations: Department of Information Science, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China

Title: Redundancy is Not What You Need: An Embedding Fusion Graph Auto-Encoder for Self-Supervised Graph Representation Learning

Abstract:
Attribute graphs are a crucial data structure for graph communities. However, the presence of redundancy and noise in the attribute graph can impair the aggregation effect of integrating two different heterogeneous distributions of attribute and structural features, resulting in inconsistent and distorted data that ultimately compromises the accuracy and reliability of attribute graph learning. For instance, redundant or irrelevant attributes can result in overfitting, while noisy attributes can lead to underfitting. Similarly, redundant or noisy structural features can affect the accuracy of graph representations, making it challenging to distinguish between different nodes or communities. To address these issues, we propose the embedding fusion graph auto-encoder (EFGAE) framework for self-supervised learning (SSL), which leverages multitask learning to fuse node features across different tasks to reduce redundancy. The EFGAE framework comprises two phases: pretraining (PT) and downstream task learning (DTL). During the PT phase, EFGAE uses a graph auto-encoder (GAE) based on adversarial contrastive learning to learn structural and attribute embeddings separately and then fuses these embeddings to obtain a representation of the entire graph. During the DTL phase, we introduce an adaptive graph convolutional network (AGCN), which is applied to graph neural network (GNN) classifiers to enhance recognition for downstream tasks. The experimental results demonstrate that our approach outperforms state-of-the-art (SOTA) techniques in terms of accuracy, generalization ability, and robustness.

PaperID: 1468,

Authors: Yuchen He, Xueqin Yang, Lijuan Qian, Le Yao, Lingjian Ye, Ping Wu, Gangyue Ye, Weirong Ye, Yafang Shen

Affiliations: Key Laboratory of Intelligent Manufacturing Quality Big Data Tracing and Analysis of Zhejiang Province, China Jiliang University, Hangzhou, China; School of Mathematics, Hangzhou Normal University, Hangzhou, China; School of Engineering, Huzhou University, Huzhou, China; School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou, China; Zhejiang Public Information Industry Company Ltd., Hangzhou, China

Title: Soft Sensing for Time Series With Irregular Sampling Internals Based on a Denoising Interval Attention LSTM Network

Abstract:
The prediction of key quality variables plays an important role in industrial status identification and monitoring. Due to process disturbance and hard device limitation, data collection in modern industries often exhibits high noise and irregular data sampling. To solve these problems, this article proposes a stacked supervised and reconstructed input denoising autoencoder integrated with an interval attention long short-term memory (SSRDAE-IALSTM) network for soft sensing modeling. First, a stacked supervised and reconstructed input denoising autoencoder (SSRDAE) is designed. Compared with the original DAE, each supervised and reconstructed input DAE (SRDAE) can simultaneously reconstruct the process data and quality data at the output layer, aiming to reduce information loss and extract quality-related features. Second, the denoised features are fed into the interval attention LSTM (IALSTM), which adjusts the influence of different historical samples on the current sample in irregularly sampled data to capture long-term temporal features. Finally, performance validations are carried out on an industrial debutanizer column and a penicillin fermentation process. The experimental results show that the proposed model can enhance the learning ability of process features and obtain better prediction performance than other comparison methods.

PaperID: 1469,

Authors: Xia-An Bi, Wenzhuo Shen, Yinglu Shan, Dayou Chen, Luyun Xu, Ke Chen, Zhonghua Liu

Affiliations: College of Information Science and Engineering and the Key Laboratory of Computing and Stochastic Mathematics (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha, China; College of Business, Hunan Normal University, Changsha, China; College of Information Science and Engineering, Hunan Normal University, Changsha, China

Title: MSAFF: Multi-Way Soft Attention Fusion Framework With the Large Foundation Models for the Diagnosis of Alzheimer's Disease

Abstract:
Complementary information in multi-omics data is crucial for understanding the pathogenesis of Alzheimer’s disease (AD). However, existing studies face challenges in addressing the high-level noise and heterogeneity in multi-omics data. This article presents a novel approach that combines large foundation models (LFMs) with soft attention mechanisms to enhance, select, and fuse multi-omics features, thereby improving the performance of disease classification. Specifically, we first propose a mathematical model based on soft attention mechanisms. This model employs multi-head attention (MHA) and self-attention (SA) for feature selection, and uses cross-attention (CA) for feature fusion. Then, a multi-way soft attention fusion framework (MSAFF) with LFMs is proposed. In this approach, biomedical LFMs are used to construct low-noise biomedical features. The multi-way soft attention algorithm implements the feature selection and fusion described in the mathematical model. Experimental results on public imaging genetics datasets demonstrate the superior performance of MSAFF in both disease classification and AD-related pathogeny discrimination. This article provides intelligent support for the diagnosis and pathogenesis research of AD. Our code can be accessed at github.com/fmri123456/MSAFF.
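
As a rough illustration of how cross-attention can fuse features from two sources, the sketch below implements generic scaled dot-product attention in plain Python. It is not the MSAFF model: there are no learned projections or multiple heads, and the function names and the idea of treating one modality as queries and another as keys and values are assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all key/value pairs.

    queries: list of d-dimensional vectors (e.g., features of one modality)
    keys, values: lists of equal length (e.g., features of another modality)
    Returns one fused vector per query.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Attention-weighted average of the value vectors.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused
```

A query that is equally similar to every key receives the plain average of the values; a query aligned with one key pulls the fused vector toward that key's value.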

PaperID: 1470,

Authors: Wenjie Tian, Xu Guo, Min Xu, Xiangpeng Zhang

Affiliations: School of Marine Science and Technology, Tianjin University, Tianjin, China; School of Mechanical Engineering, Tianjin University, Tianjin, China

Title: Pose Error Prediction, Compensation Method, and Applicable Condition Determination of Parallel Motion Platform Based on Transfer Learning

Abstract:
Collecting a large amount of measured configuration data for robots is costly and time-consuming, which restricts the widespread use of neural networks for robot error prediction and compensation. In this study, a “transfer network” is established by taking the motion transmission characteristics inherent in the ideal kinematic model as prior knowledge and transferring them to a network trained on the actual poses. Compared with the traditional back propagation (BP) neural network trained on actual poses alone, the transfer network shows significant performance advantages, effectively solving the problems of low prediction accuracy and weak generalization ability when only small-sample measured data are available. On this basis, a method for determining the applicability of transfer learning is proposed. The method evaluates the similarity between the learning tasks and then reveals the coupling effect of task similarity and the number of actual-pose samples on the performance of the transfer network. Experiments are conducted on a six degrees of freedom (6-DOF) parallel robot. The results verify the superiority of transfer learning applied to robot precision compensation and the effectiveness of the proposed determination method.

PaperID: 1471,

Authors: You Zhao, Xing He, Mingliang Zhou, Junzhi Yu, Tingwen Huang

Affiliations: Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing, China; School of Computer Science, Chongqing University, Chongqing, China; Department of Advanced Manufacturing and Robotics, College of Engineering, State Key Laboratory for Turbulence and Complex Systems, Peking University, Beijing, China; Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China

Title: Distributed Projection Neurodynamic Approaches in Continuous and Discrete Time for BP With Block Decomposition of Measurement Matrix

Abstract:
Aiming at the situation where the measurement matrix B has a flexible block decomposition, this article designs two novel distributed continuous- and discrete-time projection neurodynamic approaches to solve the basis pursuit (BP) problem for sparse recovery. These approaches only require information from each flexible block of the measurement matrix B, rather than from each row, column, or the entire matrix. First, with the aid of the primal–dual dynamical approach, projection operator, and second-order multiagent consensus condition, a novel distributed projection neurodynamic approach in continuous time (DPNA-CT-B) is proposed, and its optimality and global asymptotic stability are rigorously proved. Moreover, based on the forward and backward Euler methods and variable substitution methods, a corresponding distributed projection neurodynamic approach in discrete time (DPNA-DT-B) is designed. Finally, through sparse signal and image reconstruction experiments, the effectiveness and superiority of the proposed neurodynamic approaches are verified.
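
For readers unfamiliar with it, the basis pursuit problem named above is usually stated as the following \ell_1 minimization. The symbol b for the measurement vector and the dimension n are notational assumptions, since the abstract only names the matrix B.

```latex
\min_{x \in \mathbb{R}^{n}} \; \|x\|_{1}
\quad \text{subject to} \quad B x = b
```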

PaperID: 1472,

Authors: Junwei Sun, Haojie Wang, Yi Yue, Dan Ling, Yanfeng Wang

Affiliations: School of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China

Title: Design of Hopfield Neural Network Based on DNA Strand Displacement Circuits and Its Application in Sudoku Conjecture

Abstract:
In recent years, biological neural networks have developed rapidly due to their advantages of fast parallel processing and strong fault tolerance. This article explores innovation in this field and constructs a Hopfield neural network model based on DNA strand displacement (DSD) circuits. First, this article constructs four core functional modules based on DSD: an encoder module, a weighted sum module, a comparator module, and a decoder module. These functional modules together form the design foundation of the DSD circuit, achieving effective circuit construction. Second, the Hopfield neural network is constructed from DSD circuits, integrating DSD technology with neural networks. Finally, the Sudoku conjecture problem is solved through the neural network. This article conducts a simulation in Visual DSD, which verifies the feasibility of the Sudoku conjecture. Our work integrates DSD technology with neural networks and uses them to solve practical problems. This fusion broadens the research field of neural networks and demonstrates the potential of biotechnology in practical applications.
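
The weighted sum and comparator roles mentioned above have direct counterparts in a conventional software Hopfield network. The sketch below is a textbook bipolar Hopfield network with Hebbian weights, written only to show those two steps; it makes no attempt to model DSD reactions, and the function names are hypothetical.

```python
def train_hebbian(patterns):
    """Build a symmetric weight matrix from bipolar (+1/-1) patterns (Hebbian rule)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:                      # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, state, sweeps=5):
    """Asynchronously update neurons for a fixed number of sweeps."""
    s = list(state)
    for _ in range(sweeps):
        for i in range(len(s)):
            # Weighted sum step: combine the other neurons' states.
            total = sum(w[i][j] * s[j] for j in range(len(s)))
            # Comparator step: threshold the sum at zero.
            s[i] = 1 if total >= 0 else -1
    return s
```

Storing a single pattern and presenting a copy with one flipped element lets `recall` restore the original, which is the associative-memory behavior the network is known for.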

PaperID: 1473,

Authors: Hanguang Su, Yi Cui, Huaguang Zhang, Xiangpeng Xie, Xiaodong Liang, Jiawei Wang

Affiliations: School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China; School of Information Science and Engineering and the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China; School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, China; Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, SK, Canada; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang, China

Title: Adaptive Secure Finite-Time Optimal Control of Unknown Nonlinear Systems With State Constraints via Generalized Fuzzy Hyperbolic Models

Abstract:
In this article, a novel adaptive critic learning (ACL) framework is constructed for a class of nonzero-sum (NZS) differential game problems of unknown continuous-time (CT) nonlinear systems with state constraints. First, generalized fuzzy hyperbolic model (GFHM)-based identifiers are established to reconstruct the unknown system dynamics. Then, under the ACL framework, a critic network with a secure finite-time experience replay tuning law is developed for each player to acquire the Nash equilibrium solution in finite time, while finite-time stability is guaranteed via Lyapunov analysis. Meanwhile, by introducing an easy-to-check rank condition, the persistence of excitation (PE) condition is no longer needed in this work. Furthermore, by incorporating the immediate cost function associated with each player and the control barrier function (CBF), the algorithm ensures that the system states evolve in a secure environment. Finally, two numerical examples are presented to demonstrate the validity of the developed scheme.

PaperID: 1474,

Authors: Lei Wang, Huaguang Zhang, Jinhai Liu, Fengyuan Zuo

Affiliations: College of Information Science and Engineering, Northeastern University, Shenyang, China

Title: Knowledge Transfer and Reinforcement Based on Biunbiased Neural Network: A Novel Solution for Open-Set Fault Transfer Diagnosis

Abstract:
Fault transfer diagnosis is a key technology for ensuring the reliability and safety of industrial systems; its core is to identify the health status of equipment across different working conditions with multiclassification methods. However, most such methods rely on a closed-set assumption that the label space is consistent across working conditions, which is hard to satisfy in a practical industrial environment because unknown faults inevitably occur during operation, giving rise to the open-set fault transfer diagnosis (OSFTD) problem. Moreover, during the transfer process, unnecessary source-specific knowledge tends to be adapted, which leads to diagnostics that are biased with respect to both domain and category. To address this issue, an OSFTD framework, coined knowledge transfer and reinforcement based on biunbiased neural network (KTR-BUNN), is proposed. First, a domain-unbiased knowledge transfer subnet is proposed, including an uncertainty-aware fault transferability evaluator (FTE) that estimates the transferability of target-domain samples unbiasedly to guide distribution alignment of known faults and a triple-tier unknown fault separator (UFS) that takes transferability as the criterion to extrapolate unknown faults. Second, a class-unbiased knowledge reinforcement subnet is designed to promote the recognition of fault semantic features in the embedding space, where fault knowledge graphs (FKGs) are constructed to describe the relationships between fault types and are optimized by a contrastive fault correlation loss, so that fine-grained class-level fault features can be further aligned. The knowledge transfer and knowledge reinforcement mechanisms work jointly to improve the performance of OSFTD. Finally, extensive experimental results on diverse diagnostic tasks illustrate the superiority of the proposed KTR-BUNN.

PaperID: 1475,

Authors: John Sum, Chi-Sing Leung, Janet C. C. Chang

Affiliations: Institute of Technology Management, National Chung Hsing University, Taichung, Taiwan; Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China

Title: Analysis and Design of a Distributed kWTA With Application in Sealed-Bid Auctions With Bidding Price Privacy Protection

Abstract:
This article presents a distributed k-winner-take-all (kWTA) with application in sealed-bid auctions with bidding price privacy protection. The proposed kWTA is in essence a distributed network of n agents that are arbitrarily connected. Let \aleph_i be the set of neighbor agents of the ith agent, and let u_i, x_i, and z_i be, respectively, its input, state variable, and output. The dynamics of the ith agent are given by dx_i(t)/dt = \tau \{ z_i(x_i(t)) - k/n - \beta \sum_{j \in \aleph_i} (x_i(t) - x_j(t)) \}, z_i(x_i(t)) = h(u_i - x_i(t)), for i = 1, \ldots, n, where \beta > 0, k is the number of winners, and h(\cdot) is the Heaviside function. By the theory of discontinuous dynamic systems, it is shown that the state equation for d\mathbf{x}(t)/dt can be formulated as a gradient differential inclusion that minimizes the nonsmooth convex function V(\mathbf{x}) = \sum_{i=1}^{n} \max\{0, u_i - x_i\} + (k/n) \sum_{i=1}^{n} x_i + (\beta/2) \mathbf{x}^T \mathbf{L} \mathbf{x}, where \mathbf{x} = (x_1, \ldots, x_n)^T and \mathbf{L} \in R^{n \times n} is the graph Laplacian matrix. A sufficient condition on \beta is derived under which the kWTA gives the correct output, and the condition is then used to show that \mathbf{z}(t) converges to the correct output in finite time. If \beta \rightarrow \infty and x_1(0) = \cdots = x_n(0), we further show that x_1(t) = \cdots = x_n(t) for t \geq 0, and both \mathbf{z}(t) and \mathbf{x}(t) converge in finite time. Besides, x_i converges to u_{\pi_{n-k+1}} (resp. u_{\pi_{n-k}}) if x_i(0) \gg 1 (resp. x_i(0) = 0) for i = 1, \ldots, n. If the input u_i is set to be the bid price of the ith bidder and k = 1, the proposed kWTA is able to determine both the winners and the clearing price for a sealed-bid first (resp. second) price auction in a distributed manner. Once \mathbf{z}(t) and \mathbf{x}(t) converge, each bidder can learn 1) from z_i whether he/she is a winner and 2) from x_i the clearing price. As bidders do not have to disclose their bidding prices during the winner (resp. the clearing price) determination process, the losing (resp. winning) bidding price privacy can be protected in a sealed-bid first (resp. second) price auction. To date, this is the first application of a kWTA beyond winner determination.
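
The agent dynamics described above can be explored numerically with a small forward-Euler simulation. This is only a sketch under stated assumptions: the agents form a complete graph, \beta = 10 and \tau = 1 are chosen for illustration rather than from the derived sufficient condition, all states start at zero, and the step size and horizon are arbitrary. Because \beta is finite and the right-hand side is discontinuous, the discrete states agree only approximately and may chatter before settling.

```python
def heaviside(s):
    """h(s) = 1 for s > 0, else 0 (the value at s = 0 is a convention chosen here)."""
    return 1 if s > 0 else 0


def simulate_kwta(u, k, beta=10.0, tau=1.0, dt=1e-3, steps=20000):
    """Forward-Euler simulation of the kWTA agents on a complete graph.

    u: list of inputs (e.g., bid prices); k: number of winners.
    Returns (z, x): final outputs and final states.
    """
    n = len(u)
    x = [0.0] * n                       # assumed initial condition x_i(0) = 0
    for _ in range(steps):
        z = [heaviside(ui - xi) for ui, xi in zip(u, x)]
        total = sum(x)
        # On a complete graph, sum_j (x_i - x_j) = n * x_i - sum(x).
        x = [
            xi + dt * tau * (zi - k / n - beta * (n * xi - total))
            for xi, zi in zip(x, z)
        ]
    z = [heaviside(ui - xi) for ui, xi in zip(u, x)]
    return z, x
```

With bids `[0.9, 0.3, 0.7, 0.1, 0.5]` and `k = 1`, the output should single out the highest bidder while the states settle near the second-highest bid, in line with the second-price case for zero initial states.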

PaperID: 1476,

Authors: Wenquan Zhang, Fei Zhao, Bo Feng, Xuesong Mei

Affiliations: State Key Laboratory for Manufacturing System Engineering, Xi’an Jiaotong University, Xi’an, China

Title: A Reinforcement Learning Control Framework Based on Scalable Graph Transformer for Large-Scale Fuzzy Job Shop Scheduling Problems

Abstract:
The job shop scheduling problem (JSSP) is a classic NP-hard problem. This article focuses on a realistic variant of the JSSP incorporating fuzzy processing times, with the objective of minimizing the maximum completion time. We propose a proximal policy optimization with graph transformer (GT-PPO) algorithm, which leverages proximal policy optimization (PPO) as the foundational framework, to address this problem for the first time. First, the intricate variability in states and actions often leads to suboptimal scheduling outcomes. To address this, we refine the representation of states and actions for improved performance. Second, to overcome inherent limitations of conventional graph neural networks (GNNs), including difficulty in handling heterogeneity, over-squashing, and limited ability to capture long-range dependencies, we employ a graph transformer (GT) architecture for the first time in this setting. The GT effectively captures both the topological relationships in fuzzy disjunctive graph models and the long-range dependencies in large-scale JSSP instances. Additionally, we reduce the computational complexity of the GT to O(n), enabling the agent to derive optimal scheduling solutions for large disjunctive graphs more efficiently, with reduced memory usage. Finally, the testing results demonstrate the strong robustness of our model across various scales of generated instances and public datasets after a single training session. Notably, on the large-scale DMU and Taillard public datasets, the model exhibited exceptional robustness, further validating its effectiveness in addressing large-scale fuzzy JSSP.
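
Since PPO is the foundation of the method, it may help to recall its clipped surrogate objective for a single sample. The sketch below shows only that standard term; the graph transformer, the state and action representation, and the fuzzy scheduling environment are not modeled, and the function name and default clipping range are illustrative choices.

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s) for the sampled action
    advantage: estimated advantage of that action
    epsilon: clipping range
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes the incentive to move the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

For a positive advantage, the objective stops growing once the ratio exceeds 1 + epsilon; for a negative advantage, it stops improving once the ratio falls below 1 - epsilon.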

PaperID: 1477,

Authors: Jinping Liu, Sheng Chen, Meiling Cai, Haidong Shao, Weihua Gui

Affiliations: College of Information Science and Engineering, Hunan Normal University, Changsha, China; College of Mechanical and Vehicle Engineering, Hunan University, Changsha, China; School of Automation, Central South University, Changsha, China

Title: Semi-Heterogeneous Graph-Perception Network With Gradient-Weighted Class Activation Mapping for Class-Incremental Industrial Fault Recognition and Root Cause Diagnosis

Abstract:
Modern industrial systems often operate under complex dynamics and strict reliability constraints, demanding a timely and precise fault diagnosis with efficient root cause analysis to ensure operational safety and minimize downtime. However, the inherent uncertainties and complexities of industrial processes present significant challenges for conventional diagnostic approaches. Specifically, even minor anomalies can escalate into critical incidents, while process uncertainties frequently induce distribution shifts, leading to novel fault types that complicate fault detection and diagnosis. To address these challenges, this article proposes a novel industrial flow topology-induced semi-heterogeneous graph perception network (IFT-SHGPN) model for class-incremental fault diagnosis of complex industrial processes. By embedding the physical topology of industrial processes into a semi-heterogeneous graph perception network (SHGPN) and incorporating gradient-weighted class activation mapping (Grad-CAM), the proposed approach demonstrates strong capability in effective class-incremental fault recognition and interpretable root cause analysis. Rigorous experiments on the Tennessee Eastman process (TEP) and a multiphase flow facility process under various operation conditions showcase the superiority of IFT-SHGPN over existing methods. The proposed approach achieves high diagnostic accuracy for both historical and emerging fault categories while enabling efficient root cause identification with low computational overhead, making it particularly suitable for deployment in resource-constrained industrial environments.

PaperID: 1478,

Authors: Yingjie Tang, Shou Feng, Chunhui Zhao, Yongqi Chen, Zhiyong Lv, Weiwei Sun

Affiliations: College of Information and Communication Engineering, Harbin Engineering University, Harbin, China; School of Computer Science and Engineering, Shaanxi Key Laboratory of Network Computing and Security Technology, Xi’an University of Technology, Xi’an, China; Department of Geography and Spatial Information Techniques, Ningbo University, Ningbo, Zhejiang, China

Title: A Semantic Change Detection Network Based on Boundary Detection and Task Interaction for High-Resolution Remote Sensing Images

Abstract:
Semantic change detection (CD) not only helps pinpoint the locations where changes occur, but also identifies the specific types of changes in land cover and land use. Currently, the mainstream approach for semantic CD (SCD) decomposes the task into semantic segmentation (SS) and CD tasks. Although these methods have achieved good results, they do not consider the incentive effect of task correlation on the entire model. Given this issue, this article further elucidates the SCD task through the lens of multitask learning theory and proposes a semantic change detection network based on boundary detection and task interaction (BT-SCD). In BT-SCD, the boundary detection (BD) task is introduced to enhance the correlation between the SS task and the CD task in SCD, thereby promoting positive reinforcement between SS and CD tasks. Furthermore, to enhance the communication of information between the SS and CD tasks, the pixel-level interaction strategy and the logit-level interaction strategy are proposed. Finally, to fully capture the temporal change information of the bitemporal features and eliminate their temporal dependency, a bidirectional change feature extraction module is proposed. Extensive experimental results on three commonly used datasets and a nonagriculturalization dataset (NAFZ) show that our BT-SCD achieves state-of-the-art performance. The code is available at https://github.com/TangYJ1229/BT-SCD

PaperID: 1479,

Authors: Zhuorui Li, Jun Ma, Jiande Wu, Pak Kin Wong, Xiaodong Wang, Xiang Li

Affiliations: Faculty of Information Engineering and Automation and the Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, China; School of Engineering and the Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming, China; Department of Electromechanical Engineering, University of Macau, Macau, China

Title: A Gated Recurrent Generative Transfer Learning Network for Fault Diagnostics Considering Imbalanced Data and Variable Working Conditions

Abstract:
Transfer learning (TL) and generative adversarial networks (GANs) have been widely applied to intelligent fault diagnosis under imbalanced data and different working conditions. However, existing data synthesis methods focus on the overall distribution alignment between the generated data and real data and ignore the fault-sensitive features in the time domain, so the generated signals lose convincing temporal information. For this reason, a novel gated recurrent generative TL network (GRGTLN) is proposed. First, a smooth conditional matrix-based gated recurrent generator is proposed to extend the imbalanced dataset. It can adaptively increase the attention paid to fault-sensitive features in the generated sequence. The Wasserstein distance (WD) is introduced to enhance the construction of mapping relationships and thereby promote the data generation ability and transfer performance of the fault diagnosis model. Then, an iterative “generation-transfer” co-training strategy is developed for continuous parallel training of the model and parameter optimization. Finally, comprehensive case studies demonstrate that GRGTLN can generate high-quality data and achieve satisfactory cross-domain diagnosis accuracy.
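
To give a feel for what the Wasserstein distance measures, the sketch below computes the 1-D Wasserstein-1 distance between two equal-size empirical samples, where it reduces to the mean absolute difference of the sorted values. This is a simplification for illustration: how GRGTLN estimates or uses the distance during training is not described here, and the function name is hypothetical.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal sample sizes, the optimal transport plan matches sorted values
    pairwise, so the distance is the mean absolute difference after sorting.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)
```

Shifting every value of a sample by a constant c yields a distance of |c|, while reordering a sample leaves the distance unchanged.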

PaperID: 1480,

Authors: Jianghong Zhou, Yi Qin

Affiliations: State Key Laboratory of Mechanical Transmission for Advanced Equipment, Chongqing University, Chongqing, China

Title: A Continuous Remaining Useful Life Prediction Method With Multistage Attention Convolutional Neural Network and Knowledge Weight Constraint

Abstract:
Rotating machinery is continuously monitored in practical applications. However, the historical life-cycle data cannot always be preserved due to limited storage resources; meanwhile, the on-site computing platform cannot process a large number of monitoring samples. This poses a great challenge for remaining useful life (RUL) prediction. Thus, continuous learning (CL) is introduced into the RUL prediction model to achieve knowledge accumulation and dynamic updating. To improve the performance of continuous RUL prediction, this article presents a new RUL prediction methodology with a multistage attention convolutional neural network (MSACNN) and knowledge weight constraint (KWC). First, an improved multihead full-channel sight self-attention (MFCSSA) mechanism is proposed to capture the global degradation information across all channels. MSACNN is then constructed by embedding MFCSSA, the squeeze-and-excitation (SE) mechanism, and the convolutional block attention module (CBAM) into different stages of feature extraction, which enables it to capture the global degradation information and refine the feature representations progressively. The KWC mechanism, based on the importance of weight parameters and gradient information, is proposed and integrated into MSACNN to achieve the continuous RUL prediction task. The proposed KWC can effectively alleviate catastrophic forgetting in CL. Finally, the experimental results on the life-cycle bearing and gear datasets demonstrate that MSACNN has a higher accuracy than the existing prediction methods. Moreover, the KWC mechanism performs better than typical CL methods in retaining the previously learned knowledge while acquiring the new task knowledge. Therefore, the proposed methodology is better suited to continuous RUL prediction tasks than advanced methods of the same kind.
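
One common way to constrain weights by their importance in continual learning is a quadratic penalty that anchors important parameters to their previously learned values. The sketch below shows that generic idea only; it is not the KWC formulation, whose exact use of importance and gradient information is not given in the abstract, and the function and argument names are hypothetical.

```python
def importance_weighted_penalty(weights, anchor_weights, importance, strength=1.0):
    """Generic importance-weighted quadratic penalty (not the authors' KWC).

    weights: current parameter values
    anchor_weights: parameter values learned on earlier tasks
    importance: non-negative per-parameter importance scores
    strength: overall regularization coefficient (assumed hyperparameter)
    """
    return 0.5 * strength * sum(
        o * (w - a) ** 2
        for w, a, o in zip(weights, anchor_weights, importance)
    )
```

Added to the loss of a new task, such a term makes changes to high-importance parameters expensive while leaving low-importance parameters comparatively free to adapt.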

PaperID: 1481,

Authors: He Wang, Yang Xu, Zebin Wu, Zhihui Wei

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology (NJUST), Nanjing, China; School of Computer Science and Engineering, NJUST, Nanjing, China

Title: Unsupervised Hyperspectral and Multispectral Image Blind Fusion Based on Deep Tucker Decomposition Network With Spatial-Spectral Manifold Learning

Abstract:
Hyperspectral image (HSI) and multispectral image (MSI) fusion aims to generate a high spectral and spatial resolution hyperspectral image (HR-HSI) by fusing a high-resolution multispectral image (HR-MSI) and a low-resolution hyperspectral image (LR-HSI). However, existing fusion methods encounter challenges such as unknown degradation parameters and incomplete exploitation of the correlation between high-dimensional structures and deep image features. To overcome these issues, in this article, an unsupervised blind fusion method for LR-HSI and HR-MSI based on Tucker decomposition and spatial–spectral manifold learning (DTDNML) is proposed. We design a novel deep Tucker decomposition network that maps LR-HSI and HR-MSI into a consistent feature space, achieving reconstruction through decoders with shared parameters. To better exploit and fuse spatial–spectral features in the data, we design a core tensor fusion network (CTFN) that incorporates a spatial–spectral attention mechanism for aligning and fusing features at different scales. Furthermore, to enhance the capacity to capture global information, a Laplacian-based spatial–spectral manifold constraint is introduced in the shared decoders. Extensive experiments validate that this method enhances the accuracy and efficiency of hyperspectral and multispectral fusion on different remote sensing datasets. The source code is available at https://github.com/Shawn-H-Wang/DTDNML.

PaperID: 1482,

Authors: Kenji Kashima, Ryota Yoshiuchi, Ran Wang, Yu Kawano

Affiliations: Graduate School of Informatics, Kyoto University, Kyoto, Japan; Graduate School of Advanced Science and Engineering, Hiroshima University, Higashihiroshima, Japan

Title: A Unified Framework for Dynamics Modeling and Control Design Using Deep Learning With Side Information on Stabilizability

Abstract:
This article presents a unified framework for dynamics modeling and control design using deep learning, focusing on incorporating prior side information on stabilizability. Control theory provides systematic techniques for designing feedback systems while ensuring fundamental properties such as stabilizability, which are crucial for practical control applications. However, conventional data-driven approaches often overlook or struggle to explicitly incorporate such control properties into learned models. To address this, we introduce a novel neural network (NN)-based approach that concurrently learns the system dynamics, a stabilizing feedback controller, and a Lyapunov function for the closed-loop system, thus explicitly guaranteeing stabilizability in the learned model. Our proposed deep learning framework is versatile and applicable across a wide range of control problems, including safety control, L_2 -gain control, passivation, and solutions to Hamilton-Jacobi inequalities. By embedding stabilizability as a core property within the learning process, our method allows for the development of learned models that are not only data-driven but also grounded in control-theoretic guarantees, greatly enhancing their utility in real-world control applications. This article includes examples that demonstrate the effectiveness of this approach, showcasing the stability and control performance improvements achieved in various control scenarios. The methods proposed in this article can be easily applied to modeling without control design. The code has been open-sourced and is available at https://github.com/kashctrl/Deep_Stabilizable_Models.

PaperID: 1483,

Authors: Yuru Guo, Zidong Wang, Jun-Yi Li, Yong Xu

Title: Nonfragile Impulsive State Estimation for Complex Networks With Markovian Switching Topologies Subject to Limited Bit Rate Constraints

Abstract:
In this article, we consider the impulsive estimation problem for a specific category of discrete-time complex networks (CNs) characterized by Markovian switching topologies. The measurement outputs of the underlying CNs, transmitted to the observer over wireless networks, are subject to bit rate constraints. To effectively reduce the estimation error and enhance estimation performance, a mode-dependent impulsive observer is proposed that employs the impulse mechanism. The application of stochastic analysis techniques leads to the derivation of a sufficient condition for ensuring the mean-square boundedness of the estimation error dynamics. The upper bound of the error is then analyzed by iteratively exploring the Lyapunov relation at both impulsive and non-impulsive instants. Moreover, an optimization algorithm is presented for handling the bit rate allocation, which is coupled with the design of desired observer gains using the linear matrix inequality (LMI) approach. Within this theoretical framework, the relationship between the mean-square estimation performance and the bit rate allocation protocol is further elucidated. Finally, a simulation example is provided to demonstrate the validity and effectiveness of the proposed impulsive estimation approach.

PaperID: 1484,

Authors: Jian Huang, Xiaoyang Sun, Steven X. Ding, Xu Yang, Okan K. Ersoy

Affiliations: Key Laboratory of Knowledge Automation for Industrial Processes of Ministry of Education, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China; Institute of Automatic Control and Complex Systems (AKS), University of Duisburg-Essen, Duisburg, Germany; Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA

Title: Variational Discriminative Stacked Auto-Encoder: Feature Representation Using a Prelearned Discriminator, and Its Application to Industrial Process Monitoring

Abstract:
In deep-learning-based process monitoring, obtaining an effective feature representation is a critical step in constructing a reliable deep-learning monitoring model. Conventional deep-learning methods like stacked auto-encoders (SAEs) capture feature representation by minimizing the data reconstruction errors, which lack the expression of essential information and ultimately lead to degradation of the monitoring performance. To solve this problem, variational discriminative SAE (VDSAE) is proposed in this article. First, a variational generative discriminative structure is designed to obtain a reliable prelearned discriminator. Based on this new variational discriminator, the authenticity of the reconstructed data is evaluated as an important criterion for feature learning. Then, an SAE incorporating the prelearned discriminator is trained by both minimizing the reconstruction error and maximizing the data authenticity. In this way, the prelearned discriminator makes the network effectively capture the essential expression of the reconstructed data. The proposed approach enables SAE to learn a better feature representation owing to the excellent reconstruction performance. Finally, the feature representation and fault detection performance of VDSAE are verified in two cases. The results show that the average fault detection rates (FDRs) of the multiphase flow facility and the waste-water treatment process (WWTP) can be improved to 72% and 97%, respectively, compared with the other fault detection methods.

PaperID: 1485,

Authors: Wensheng Tang, Hang Cai, Lin Xiao, Yongjun He, Linju Li, Qiuyue Zuo, Jichun Li

Affiliations: Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing and MOE-LCSM, Hunan Normal University, Changsha, China; School of Computing, Newcastle University, Newcastle upon Tyne, U.K.

Title: A Predefined-Time Adaptive Zeroing Neural Network for Solving Time-Varying Linear Equations and Its Application to UR5 Robot

Abstract:
Time-varying linear equations (TVLEs) play a fundamental role in the engineering field and are of great practical value. Existing methods for the TVLE still have issues with long computation time and insufficient noise resistance. Zeroing neural networks (ZNNs), with their parallel distribution and interference tolerance traits, can mitigate these deficiencies and thus are good candidates for the TVLE. Therefore, a new predefined-time adaptive ZNN (PTAZNN) model is proposed for addressing the TVLE in this article. Unlike previous ZNN models with time-varying parameters, the PTAZNN model adopts a novel error-based adaptive parameter, which makes the convergence process more rapid and avoids unnecessary waste of computational resources caused by large parameters. Moreover, the stability, convergence, and robustness of the PTAZNN model are rigorously analyzed. Two numerical examples show that the PTAZNN model possesses shorter convergence time and better robustness compared with several variable-parameter ZNN models. In addition, the PTAZNN model is applied to solve the inverse kinematics of the UR5 robot on the simulation platform CoppeliaSim, and the results further indicate the feasibility of this model.
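
For orientation, the following is a minimal sketch of a conventional fixed-gain ZNN applied to a TVLE A(t)x(t) = b(t). It is not the PTAZNN model of the article, whose error-based adaptive parameter is not specified in the abstract. The test system `A`, `b`, the constant gain `gamma`, and the forward-Euler step `dt` are all illustrative assumptions. The sketch defines the error e(t) = A(t)x(t) - b(t), imposes the standard design formula de/dt = -gamma * e, and integrates the resulting dynamics.

```python
import numpy as np

# Hypothetical 2x2 time-varying system, chosen so that A(t) stays invertible.
def A(t):
    return np.array([[2.0 + np.sin(t), 0.5], [0.5, 2.0 + np.cos(t)]])

def dA(t):  # time derivative of A(t)
    return np.array([[np.cos(t), 0.0], [0.0, -np.sin(t)]])

def b(t):
    return np.array([np.sin(t), np.cos(t)])

def db(t):  # time derivative of b(t)
    return np.array([np.cos(t), -np.sin(t)])

def znn_solve(gamma=10.0, dt=1e-3, t_end=5.0):
    """Fixed-gain ZNN integrated with forward Euler (assumed discretization)."""
    x = np.zeros(2)  # arbitrary initial state
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        e = A(t) @ x - b(t)
        # de/dt = -gamma * e  =>  A x' = -A' x + b' - gamma * e
        x_dot = np.linalg.solve(A(t), -dA(t) @ x + db(t) - gamma * e)
        x = x + dt * x_dot
        t += dt
    return t, x

t_final, x_final = znn_solve()
residual = np.linalg.norm(A(t_final) @ x_final - b(t_final))
print(f"residual at t = {t_final:.2f}: {residual:.2e}")
```

A larger constant `gamma` speeds up convergence but increases the magnitude of the state derivative; the abstract says the PTAZNN model replaces the parameter with an error-based adaptive one to avoid unnecessarily large values.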

PaperID: 1486,

Authors: Yifei Wang, Pengju Ding, Congjing Wang, Shiyue He, Xin Gao, Bin Yu

Affiliations: School of Data Science, Qingdao University of Science and Technology of China, Qingdao, China; Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; School of Data Science, Qingdao University of Science and Technology, Qingdao, China

Title: RPI-GGCN: Prediction of RNA-Protein Interaction Based on Interpretability Gated Graph Convolution Neural Network and Co-Regularized Variational Autoencoders

Abstract:
RNA-protein interactions (RPIs) play an important role in several fundamental cellular physiological processes, including cell motility, chromosome replication, transcription and translation, and signaling. Predicting RPI can guide the exploration of cellular biological functions, intervening in diseases, and designing drugs. Given this, this study proposes the RPI-gated graph convolutional network (RPI-GGCN) method for predicting RPI based on the gated graph convolutional neural network (GGCN) and co-regularized variational autoencoder (Co-VAE). First, different types of feature information were extracted from RNA and protein sequences by nine feature extraction methods. Second, Co-VAEs are used to eliminate the redundancy of fused features and generate optimal features. Finally, this study introduces gated cyclic units into graph convolutional networks (GCNs) to construct a model for RPI prediction, which efficiently extracts topological information and improves the model’s interpretable feature learning and expression capabilities. In the fivefold cross-validation test, the RPI-GGCN method achieved prediction accuracies of 97.27%, 97.32%, 96.54%, 95.76%, and 94.98% on the RPI369, RPI488, RPI1446, RPI1807, and RPI2241 datasets. To test the generalization performance of the model, we used the model trained on RPI369 to predict the independent NPInter v3.0 dataset and achieved excellent performance in all six independent validation sets. By visualizing the RPI network graph based on the prediction results, we aim to provide a new perspective and reference for studying RPI mechanisms and exploring new RPIs. Extensive experimental results demonstrate that RPI-GGCN can provide an efficient, accurate, and stable RPI prediction method.

PaperID: 1487,

Authors: Xueqian Fu, Chunyu Zhang, Yan Xu, Youmin Zhang, Hongbin Sun

Affiliations: College of Information and Electrical Engineering, China Agricultural University, Beijing, China; School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang, Singapore; Department of Mechanical, Industrial and Aerospace Engineering, Concordia University, Montreal, QC, Canada; Department of Electrical Engineering, State Key Laboratory of Power Systems, Tsinghua University, Beijing, China

Title: Statistical Machine Learning for Power Flow Analysis Considering the Influence of Weather Factors on Photovoltaic Power Generation

Abstract:
It is generally accepted that the impact of weather variation is gradually increasing in modern distribution networks with the integration of high-proportion photovoltaic (PV) power generation and weather-sensitive loads. This article analyzes power flow using a novel stochastic weather generator (SWG) based on statistical machine learning (SML). The proposed SML model, which incorporates generative adversarial networks (GANs), probability theory, and information theory, enables the generation and evaluation of simulated hourly weather data throughout the year. The GAN model captures various weather variation characteristics, including weather uncertainties, diurnal variations, and seasonal patterns. Compared to shallow learning models, the proposed deep learning model exhibits significant advantages in stochastic weather simulation. The simulated data generated by the proposed model closely resemble real data in terms of time-series regularity, integrity, and stochasticity. The SWG is applied to model PV power generation and weather-sensitive loads. Then, we actively conduct a power flow analysis (PFA) on a real distribution network in Guangdong, China, using simulated data for an entire year. The results provide evidence that the GAN-based SWG surpasses the shallow machine learning approach in terms of accuracy. The proposed model ensures accurate analysis of weather-related power flow and provides valuable insights for the analysis, planning, and design of distribution networks.

PaperID: 1488,

Authors: Quan Liu, Mincheng Cai, Kun Chen, Qingsong Ai, Li Ma

Affiliations: School of Information Engineering, Wuhan University of Technology, Wuhan, China

Title: Reconstruction of Adaptive Leaky Integrate-and-Fire Neuron to Enhance the Spiking Neural Networks Performance by Establishing Complex Dynamics

Abstract:
Since digital spiking signals can carry rich information and propagate with low computational consumption, spiking neural networks (SNNs) have received great attention from neuroscientists and are regarded as a future direction for neural networks. However, generating the appropriate spiking signals remains challenging, which is related to the dynamic properties of neurons. Most existing studies imitate the biological neurons based on the correlation of synaptic input and output, but these models have only one time constant, thus ignoring the structural differentiation and versatility in biological neurons. In this article, we propose the reconstruction of adaptive leaky integrate-and-fire (R-ALIF) neuron to perform complex behaviors similar to real neurons. First, a synaptic cleft time constant is introduced into the membrane voltage charging equation to distinguish the leakage degree between the neuron membrane and the synaptic cleft, which expands the representation space of spiking neurons and helps SNNs express information more effectively. Second, R-ALIF constructs a voltage threshold adjustment equation to balance the firing rate of output signals. Third, three time constants are transformed into learnable parameters, enabling the adaptive adjustment of the dynamics equation and enhancing the information expression ability of SNNs. Fourth, the computational graph of R-ALIF is optimized to improve the performance of SNNs. Moreover, we adopt a temporal dropout (TemDrop) method to solve the overfitting problem in SNNs and propose a data augmentation method for neuromorphic datasets. Finally, we evaluate our method on CIFAR10-DVS, ASL-DVS, and CIFAR-100, and achieve top-1 accuracy of 81.0%, 99.8%, and 67.83%, respectively, with few time steps. We believe that our method will further promote the development of SNNs trained by spatiotemporal backpropagation (STBP).

PaperID: 1489,

Authors: Guoqiang Tan, Zhanshan Wang

Affiliations: State Key Laboratory of Synthetical Automation for Process Industries (SAPI) and the College of Information Science and Engineering, Northeastern University, Shenyang, China

Title: Stability Analysis of Recurrent Neural Networks With Time-Varying Delay Based on a Flexible Negative-Determination Quadratic Function Method

Abstract:
This brief investigates the stability problem of recurrent neural networks (RNNs) with time-varying delay. First, by introducing some flexibility factors, a flexible negative-determination quadratic function method is proposed, which contains some existing methods and has less conservatism. Second, some integral inequalities and the flexible negative-determination quadratic function method are used to give an accurate upper bound of the Lyapunov–Krasovskii functional (LKF) derivative. As a result, a less conservative stability criterion of delayed RNNs is derived, whose effectiveness and superiority are finally illustrated through two numerical examples.

PaperID: 1490,

Authors: Pinzhuo Tian, Qiubo Ma, Hang Yu, Jie Lu

Affiliations: School of Computer Engineering and Science, Shanghai University, Shanghai, China; Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, NSW, Australia

Title: ReCL: A Plug-and-Play Module for Enhancing Generalized Category Discovery Using Transport-Based Method to Uncover the Relationship in Samples

Abstract:
Deep learning systems excel in closed-set environments but face challenges in open-set settings due to mismatched label spaces between training and test data. Generalized category discovery (GCD) is one such real-world open-set learning problem. In GCD, given a dataset, only a subset of samples is labeled. The model is expected to simultaneously classify samples from labeled and unlabeled classes. Contrastive learning plays a critical role in solving the GCD problem, where it is used to learn discriminative features for samples. However, in contrast to labeled data, due to the absence of label information, unlabeled samples rely solely on unsupervised contrastive loss to learn discriminative features by keeping different views of the same data consistent. Unfortunately, this approach often overlooks the relationships within unlabeled samples. In this article, we propose a relationship-based contrastive learning (ReCL) module. In ReCL, we use a transport-based assignment method to find appropriate samples for each unlabeled data point. Then, a prototype-based fusion method is applied to merge these selected samples, creating a positive anchor in contrastive learning that helps pull the unlabeled samples closer to the corresponding positive anchor. Extensive experimental evaluation across different domains demonstrates that our method can be seamlessly integrated with various existing GCD models and further improve them to achieve state-of-the-art performance across different benchmarks. Notably, we also compare the sample selection process of our transport-based method with that of the cosine similarity-based method. The results show that our method provides samples that contain semantic similarity while offering greater diversity.

PaperID: 1491,

Authors: Wei Qian, Yanmin Wu, Zidong Wang

Affiliations: School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China; Department of Computer Science, Brunel University London, Uxbridge, U.K.

Title: New WTOD Protocol-Based Fault Detection Filter Design for Interval Type-2 Fuzzy Systems via an Adaptive Differential Evolution Algorithm

Abstract:
This article is concerned with the design problem of an H_\infty optimal fault detection (FD) filter for networked interval type-2 (IT2) fuzzy systems that are subjected to stochastic cyberattacks. To effectively reduce the utilization of constrained network resources, a new dynamically adjusted event-triggered weighted try-once-discard (DAET-WTOD) protocol is developed, in which two adaptive rules are constructed based on the measured output and the probability of denial-of-service (DoS) attacks. Furthermore, a fuzzy switched-like FD filter is designed with the purpose of detecting system fault signals, while simultaneously considering the DAET-WTOD protocol and stochastic cyberattacks. Subsequently, by utilizing an imperfect premise matching (IPM) scheme, an opposition-based learning adaptive differential evolution algorithm is proposed to deal with the networked IT2 fuzzy systems. This algorithm is capable of iteratively searching the membership function values of the fuzzy filter in real time, thereby achieving improved H_\infty performance. Finally, some simulation results are provided to verify the feasibility and advantages of the proposed H_\infty optimal FD technique.

PaperID: 1492,

Authors: Xiaofeng Yuan, Zhenzhen Jia, Zijian Xu, Nuo Xu, Lingjian Ye, Kai Wang, Yalin Wang, Chunhua Yang, Weihua Gui, Feifan Shen

Affiliations: School of Automation, Central South University, Changsha, China; School of Engineering, Huzhou University, Huzhou, China; School of Information Science and Engineering, Ningbo Institute of Technology, Zhejiang University, Ningbo, China

Title: Hierarchical Self-Attention Network for Industrial Data Series Modeling With Different Sampling Rates Between the Input and Output Sequences

Abstract:
For industrial processes, it is important to carry out the dynamic modeling of data series for quality prediction. However, there are often different sampling rates between the input and output sequences. Most traditional data series models have to carefully select the labeled sample sequence to build the dynamic prediction model, while the massive unlabeled input sequences between labeled samples are directly discarded. Moreover, the interactions of the variables and samples are usually not fully considered for quality prediction at each labeled step. To handle these problems, a hierarchical self-attention network (HSAN) is designed for adaptive dynamic modeling. In HSAN, a dynamic data augmentation is first designed for each labeled step to include the unlabeled input sequences. Then, a self-attention layer of variable level is proposed to learn the variable interactions and short-interval temporal dependencies. After that, a self-attention layer of sample level is further developed to model the long-interval temporal dependencies. Finally, a long short-term memory (LSTM) network is constructed to model the new sequence that contains abundant interactions for quality prediction. The experiment on an industrial hydrocracking process shows the effectiveness of HSAN.

PaperID: 1493,

Authors: Juncai He

Affiliations: Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia

Title: On the Optimal Expressive Power of ReLU DNNs and Its Application in Approximation With the Kolmogorov Superposition Theorem

Abstract:
This article is devoted to studying the optimal expressive power of rectified linear unit (ReLU) deep neural networks (DNNs) and its application in approximation via the Kolmogorov superposition theorem (KST). We first constructively prove that any continuous piecewise linear (CPwL) function on [0,1], comprising \mathcal{O}(N^2 L) segments, can be represented by ReLU DNNs with L hidden layers and N neurons per layer. Subsequently, we demonstrate that this construction is optimal regarding the parameter count of DNNs, achieved through investigating the shattering capacity of ReLU DNNs. Moreover, by invoking the KST, we achieve an enhanced approximation rate for ReLU DNNs of arbitrary width and depth when dealing with continuous functions in high-dimensional spaces.
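
As background for the representation claim, the sketch below shows the familiar shallow baseline: a CPwL function on [0,1] written exactly as a one-hidden-layer ReLU network with one neuron per segment. This is not the article's deep construction, which per the abstract represents \mathcal{O}(N^2 L) segments with L hidden layers of width N; it only illustrates what "represented by a ReLU network" means. The function names and the example breakpoints are assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def cpwl_to_shallow_relu(breaks, values):
    """Convert a CPwL function given by breakpoints and values into
    f(x) = values[0] + sum_i weights[i] * relu(x - knots[i]) on [breaks[0], breaks[-1]]."""
    breaks = np.asarray(breaks, dtype=float)
    values = np.asarray(values, dtype=float)
    slopes = np.diff(values) / np.diff(breaks)          # slope on each segment
    # First weight is the initial slope; later weights are the slope jumps at kinks.
    weights = np.concatenate(([slopes[0]], np.diff(slopes)))
    knots = breaks[:-1]                                 # one hidden neuron per segment
    return values[0], knots, weights

def shallow_relu_net(x, bias, knots, weights):
    x = np.asarray(x, dtype=float)
    return bias + relu(x[..., None] - knots) @ weights

# Hypothetical CPwL function with three segments on [0, 1].
breaks = [0.0, 0.25, 0.5, 1.0]
values = [0.0, 1.0, 0.5, 2.0]
bias, knots, weights = cpwl_to_shallow_relu(breaks, values)

grid = np.linspace(0.0, 1.0, 201)
net_out = shallow_relu_net(grid, bias, knots, weights)
max_err = np.max(np.abs(net_out - np.interp(grid, breaks, values)))
print("max deviation from the CPwL function:", max_err)
```

In this shallow form the number of neurons grows linearly with the number of segments, which is the cost that a deeper construction such as the one described in the abstract is designed to reduce.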

PaperID: 1494,

Authors: Zhehua Zhou, Xuan Xie, Jiayang Song, Zhan Shu, Lei Ma

Affiliations: Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada

Title: GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

Abstract:
Safe reinforcement learning (SRL) aims to realize a safe learning process for deep reinforcement learning (DRL) algorithms by incorporating safety constraints. However, the efficacy of SRL approaches often relies on accurate function approximations, which are notably challenging to achieve in the early learning stages due to data insufficiency. To address this issue, we introduce, in this work, a novel generalizable safety enhancer (GenSafe) that can overcome the challenge of data insufficiency and enhance the performance of SRL approaches. Leveraging model order reduction techniques, we first propose an innovative method to construct a reduced order Markov decision process (ROMDP) as a low-dimensional approximator of the original safety constraints. Then, by solving the reformulated ROMDP-based constraints, GenSafe refines the actions of the agent to increase the possibility of constraint satisfaction. Essentially, GenSafe acts as an additional safety layer for SRL algorithms. We evaluate GenSafe on multiple SRL approaches and benchmark problems. The results demonstrate its capability to improve safety performance, especially in the early learning phases, while maintaining satisfactory task performance. Our proposed GenSafe not only offers a novel measure to augment existing SRL methods but also shows broad compatibility with various SRL algorithms, making it applicable to a wide range of systems and SRL problems.

PaperID: 1495,

Authors: Cunbo Li, Tian Tang, Yue Pan, Lei Yang, Shuhan Zhang, Zhaojin Chen, Peiyang Li, Dongrui Gao, Huafu Chen, Fali Li, Dezhong Yao, Zehong Cao, Peng Xu

Affiliations: Clinical Hospital of Chengdu Brain Science Institute, the Ministry of Education (MOE) Key Laboratory for Neuroinformation, and the School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; School of Computer Science, Chengdu University of Information Technology, Chengdu, China; School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, China; STEM, University of South Australia, Adelaide, SA, Australia

Title: An Efficient Graph Learning System for Emotion Recognition Inspired by the Cognitive Prior Graph of EEG Brain Network

Abstract:
Benefiting from the high temporal resolution of electroencephalogram (EEG), EEG-based emotion recognition has become one of the hotspots of affective computing. For EEG-based emotion recognition systems, it is crucial to utilize state-of-the-art learning strategies to automatically learn emotion-related brain cognitive patterns from emotional EEG signals, and the learned stable cognitive patterns effectively ensure the robustness of the emotion recognition system. In this work, to realize the efficient decoding of emotional EEG, we propose a graph learning system [Graph Convolutional Network framework with Brain network initial inspiration and Fused attention mechanism (BF-GCN)] inspired by the brain cognitive mechanism to automatically learn graph patterns from emotional EEG and improve the performance of EEG emotion recognition. In the proposed BF-GCN, three graph branches, i.e., cognition-inspired functional graph branch, data-driven graph branch, and fused common graph branch, are first elaborately designed to automatically learn emotional cognitive graph patterns from emotional EEG signals. Then, the attention mechanism is adopted to further capture the brain activation graph patterns that are related to emotion cognition to achieve an efficient representation of emotional EEG signals. Essentially, the proposed BF-GCN model is a cognition-inspired graph learning neural network model, which utilizes the spectral graph filtering theory in the automatic learning and extracting of emotional EEG graph patterns. To evaluate the performance of the BF-GCN graph learning system, we conducted subject-dependent and subject-independent experiments on two public datasets, i.e., SEED and SEED-IV. The proposed BF-GCN graph learning system has achieved 97.44% (SEED) and 89.55% (SEED-IV) in subject-dependent experiments, and 92.72% (SEED) and 82.03% (SEED-IV) in subject-independent experiments. The state-of-the-art performance indicates that the proposed BF-GCN graph learning system has a robust performance in EEG-based emotion recognition, which provides a promising direction for affective computing.

PaperID: 1496,

Authors: Boqiang Cao, Xiaobing Nie, Wei Xing Zheng, Jinde Cao

Affiliations: School of Mathematics, Southeast University, Nanjing, China; School of Computer, Data and Mathematical Sciences, Western Sydney University, Sydney, NSW, Australia

Title: Multistability of State-Dependent Switched Fractional-Order Hopfield Neural Networks With Mexican-Hat Activation Function and Its Application in Associative Memories

Abstract:
The multistability and its application in associative memories are investigated in this article for state-dependent switched fractional-order Hopfield neural networks (FOHNNs) with Mexican-hat activation function (AF). Based on Brouwer's fixed point theorem, the contraction mapping principle, and the theory of fractional-order differential equations, some sufficient conditions are established to ensure the existence, exact existence, and local stability of multiple equilibrium points (EPs) in the sense of Filippov, in which the positively invariant sets are also estimated. In particular, the analysis concerning the existence and stability of EPs is quite different from those in the literature because the considered system involves both fractional-order derivative and state-dependent switching. It should be pointed out that, compared with the results in the literature, the total number of EPs and stable EPs increases from 5^{\ell_1} 3^{\ell_2} and 3^{\ell_1} 2^{\ell_2} to 7^{\ell_1} 5^{\ell_2} and 4^{\ell_1} 3^{\ell_2}, respectively, where 0 \leq \ell_1 + \ell_2 \leq n with n being the system dimension. Besides, a new method is designed to realize associative memories for grayscale and color images by introducing a deviation vector, which, in comparison with the existing works, not only improves the utilization efficiency of EPs, but also reduces the system dimension and computational burden. Finally, the effectiveness of the theoretical results is illustrated by four numerical simulations.
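
To make the stated counts concrete, here is a worked instance of the formulas in the abstract. The choice n = 2 with \ell_1 = \ell_2 = 1 is an illustrative assumption, not a case taken from the article.

```latex
% Assumed example: n = 2, \ell_1 = \ell_2 = 1, so 0 \le \ell_1 + \ell_2 \le n holds.
\begin{align*}
\text{EPs, earlier results:}        &\quad 5^{\ell_1} 3^{\ell_2} = 5 \cdot 3 = 15, \\
\text{EPs, this article:}           &\quad 7^{\ell_1} 5^{\ell_2} = 7 \cdot 5 = 35, \\
\text{stable EPs, earlier results:} &\quad 3^{\ell_1} 2^{\ell_2} = 3 \cdot 2 = 6, \\
\text{stable EPs, this article:}    &\quad 4^{\ell_1} 3^{\ell_2} = 4 \cdot 3 = 12.
\end{align*}
```

In this assumed case the number of stable EPs doubles at the same system dimension, which is consistent with the abstract's point that a given memory capacity can be reached with a smaller system.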

PaperID: 1497,

Authors: Zanyu Tang, Yunong Zhang, Liangjie Ming

Affiliations: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China

Title: Novel Snap-Layer MMPC Scheme via Neural Dynamics Equivalency and Solver for Redundant Robot Arms With Five-Layer Physical Limits

Abstract:
To obtain smoother kinematic control of minimum motion, a novel snap-layer minimum motion scheme, otherwise known as the minimum motion planning and control (MMPC) scheme for redundant robot arms, is proposed for the first time in this study. With the primary task of tracking planned paths and the consideration of satisfying five-layer physical limits, the snap-layer MMPC problem is transformed into a quadratic programming (QP) problem. Five-layer physical limits include angle-layer, velocity-layer, acceleration-layer, jerk-layer, and snap-layer limits, which are all considered and then transformed into a unified-layer bounded constraint through Zhang neural dynamics (ZND) equivalency. Furthermore, the snap-layer performance index and equation constraint are derived by utilizing the ZND formula. Therefore, the proposed snap-layer MMPC scheme is formulated as a standard QP that can avoid the potential physical damage of redundant robot arms. The snap-layer projection neural dynamics (PND) solver is presented and used to acquire the neural solution of the QP. Simulation results on a 6-degrees-of-freedom (DOF) planar redundant robot arm are presented to substantiate the effectiveness and superiority of the proposed snap-layer MMPC scheme by comparing it with the jerk-layer MMPC scheme and the minimum snap norm (MSN) scheme.

PaperID: 1498,

Authors: Lei Ren, Haiteng Wang, Tingyu Mo, Laurence T. Yang

Affiliations: School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada

Title: A Lightweight Group Transformer-Based Time Series Reduction Network for Edge Intelligence and Its Application in Industrial RUL Prediction

Abstract:
Recently, deep learning-based models such as the transformer have achieved significant performance for industrial remaining useful life (RUL) prediction due to their strong representation ability. In many industrial practices, RUL prediction algorithms are deployed on edge devices for real-time response. However, the high computational cost of deep learning models makes it difficult to meet the requirements of edge intelligence. In this article, a lightweight group transformer with multihierarchy time-series reduction (GT-MRNet) is proposed to alleviate this problem. Different from most existing RUL methods, which compute over all time steps, GT-MRNet can adaptively select necessary time steps to compute the RUL. First, a lightweight group transformer is constructed to extract features by employing group linear transformation with significantly fewer parameters. Then, a time-series reduction strategy is proposed to adaptively filter out unimportant time steps at each layer. Finally, a multihierarchy learning mechanism is developed to further stabilize the performance of time-series reduction. Extensive experimental results on the real-world condition datasets demonstrate that the proposed method can reduce parameters by up to 74.7% and computation cost by up to 91.8% without sacrificing accuracy.
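
The parameter saving from a group linear transformation can be illustrated with a generic sketch. This is the textbook grouped (block-diagonal) linear map, not the GT-MRNet layer itself, whose details are not given in the abstract. The dimensions, the number of groups, and the helper names are assumptions; biases are omitted for simplicity.

```python
import numpy as np

def make_group_weights(d_in, d_out, groups, seed=0):
    """One independent (d_in/groups x d_out/groups) weight matrix per group."""
    if d_in % groups or d_out % groups:
        raise ValueError("d_in and d_out must be divisible by groups")
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((d_in // groups, d_out // groups))
            for _ in range(groups)]

def group_linear(x, weights):
    """Split features into groups, transform each group separately, concatenate."""
    chunks = np.split(x, len(weights), axis=-1)
    return np.concatenate([c @ w for c, w in zip(chunks, weights)], axis=-1)

# Hypothetical sizes: 64 input features, 64 output features, 4 groups.
d_in, d_out, groups = 64, 64, 4
weights = make_group_weights(d_in, d_out, groups)
y = group_linear(np.ones((8, d_in)), weights)

dense_count = d_in * d_out                    # weights in an ordinary linear layer
group_count = sum(w.size for w in weights)    # weights in the grouped layer
print(y.shape, dense_count, group_count)
```

With g groups the weight count drops by a factor of g relative to a dense layer of the same input and output size, at the cost of removing direct interactions between features in different groups.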

PaperID: 1499,

Authors: Jinli Zhang, Junzhe Jiang, Fenglong Ma, Zongli Jiang, Qi Tan, Yongcheng Zhou

Affiliations: College of Computer Science, Beijing University of Technology, Beijing, China; The Pennsylvania State University, Pennsylvania, PA, USA; Department of Neonatology, Children’s Hospital, Chongqing Medical University, Chongqing, China; The School of Automation, Chongqing University, Chongqing, China

Title: A Novel Approach for Perceptions of Physician Decision-Making and Latent Topic Refinement in Large Language Model-Enhanced Medical Dialog Generation

Abstract:
The rapid advancement of medical dialog generation (MDG) techniques has enabled medical dialog systems (MDSs) to generate high-quality responses rich in medical expertise by integrating diverse medical information. However, they still encounter several challenges, including generic response generation, lack of semantic precision, and imprecise dialog topic extraction. This study aims to design a novel model to address these challenges simultaneously. Correspondingly, we propose the TRL-HMIE model, which represents transformer reinforcement learning (RL) for heterogeneous medical information extraction. In particular, we incorporate GPT-3 from transformer-related models as the reference language model. Our enhancements focused on three key aspects. First, we developed a conversation-topic classifier to precisely categorize conversation topics, supporting the conversation-topic locator module in generating reliable conversation topics. Second, the model employs a multihead attention mechanism to capture crucial information from the dialog context, facilitating the extraction of key dialog information and enhancing the accuracy of heterogeneous information extraction. Finally, the model integrates RL and a reward fusion mechanism, which, combined with its ability to handle multisource information and long dialog contexts, generates optimized rewards for the TRL-HMIE model, encouraging the production of doctor responses with precise semantics and dialog topics. The experimental results demonstrate that the proposed method achieves a 6.07% improvement over the benchmark model on the MedDG and MedDialog datasets.

PaperID: 1500,

Authors: Sang-Hoon Lee, Ha-Yeong Choi, Seung-Bin Kim, Seong-Whan Lee

Affiliations: Department of Software and Computer Engineering, Department of Artificial Intelligence, Ajou University, Suwon-si, South Korea; Gen AI Lab, KT Corporation, Seoul, South Korea; Department of Artificial Intelligence, Korea University, Seoul, South Korea

Title: HierSpeech++: Bridging the Gap Between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-Shot Speech Synthesis

Abstract:
Large language model (LLM)-based speech synthesis has been widely adopted in zero-shot speech synthesis. However, such models require large-scale data and possess the same limitations as previous autoregressive speech models, including slow inference speed and lack of robustness. This article proposes HierSpeech++, a fast and strong zero-shot speech synthesizer for text-to-speech (TTS) and voice conversion (VC). We verified that hierarchical speech synthesis frameworks could significantly improve the robustness and expressiveness of the synthetic speech. Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios. For TTS, we adopt the text-to-vec (TTV) framework, which generates a self-supervised speech representation and an F0 representation based on text representations and prosody prompts. Then, HierSpeech++ generates speech from the generated vector, F0, and voice prompt. We further introduce a highly efficient speech super-resolution (SpeechSR) framework from 16 to 48 kHz. The experimental results demonstrated that the hierarchical variational autoencoder could be a strong zero-shot speech synthesizer given that it outperforms LLM-based and diffusion-based models. Moreover, we achieved the first human-level quality zero-shot speech synthesis. Audio samples and source code are available at https://github.com/hierspeechpp/code

PaperID: 1501,

Authors: Chenglang Yuan, Jianpeng Li, Bin Huang, Mingyu Wang, Kangyang Cao, Yanji Luo, Yujian Zou, Shi-Ting Feng, Bingsheng Huang

Affiliations: Medical AI Laboratory, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China; Department of Radiology, The Tenth Affiliated Hospital of Southern Medical University (Dongguan People’s Hospital), Dongguan, China; Department of Radiology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China

Title: An Efficient Domain Knowledge-Guided Semantic Prediction Framework for Pathological Subtypes on the Basis of Radiological Images With Limited Annotations

Abstract:
Accurate prediction of pathological subtypes on radiological images is one of the most important deep learning (DL) tasks for the appropriate selection of clinical treatment. It is challenging for conventional DL models to obtain sufficient pathological labels for training because of the heavy workload, invasive surgery, and knowledge requirements in pathological analysis. However, existing methods based on limited annotations, such as active learning (AL) and semi-supervised learning (SSL), have difficulty capturing effective lesion features because of the complicated semantic information of radiologic images. In this article, we introduce an efficient domain knowledge-guided semantic prediction framework that integrates domain knowledge-guided AL and SSL methods. This framework can effectively predict pathological subtypes on the basis of radiologic images with limited pathological annotations via three key modules: 1) the discriminative spatial-semantic feature extraction module captures the spatial-semantic features of lesions as semantic information that can better reflect the semantic relationship and effectively mitigate overfitting risk; 2) the explicit sign-guided anchor attention module measures the multimodal semantic distribution of samples under the guidance of clinical domain knowledge, thus selecting the most representative AL samples for pathological labeling; and 3) the implicit radiomics-guided dual-task entanglement module exploits the inherent constraint relationships between implicit radiomics features (IRFs) and pathological subtypes, facilitating the aggregation of unlabeled data. Experiments have been extensively conducted to evaluate our method in two clinical tasks: the pathological grading prediction in pancreatic neuroendocrine neoplasms (pNENs) and muscular invasiveness prediction in bladder cancer (BCa). The experimental results on both tasks demonstrate that the proposed method consistently outperforms the state-of-the-art approaches by a large margin.

PaperID: 1502,

Authors: Ziwen Wei, Xiaolong Wu, Qi Wang, Yunbiao Zhou, Shaozhuang Zhai, Tao Jiang, Zhihua Liu, Yang Zhang, Hongcang Gu, Shuanghu Yuan, Junchao Qian

Affiliations: Department of Radiation Oncology, First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China; Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China; School of Biological Science and Medical Engineering, Southeast University, Nanjing, China; Department of Radiation Oncology, Anhui Provincial Cancer Hospital, Hefei, China

Title: RSME: Respiration-Driven Synchronized Motion Estimator for Real-Time Thoracic 3-D CT Reconstruction Using Low-Rank Motion Fields and Dose-Free Surface Imaging

Abstract:
Real-time 3-D CT reconstruction during radiotherapy, based on planning 4-D CT, has emerged as a central area of interest in the field of image-guided radiotherapy (IGRT). However, current methodologies still rely on time-consuming patient-specific training, real-time registration, or additional radiation doses. Our contribution, respiration-driven synchronized motion estimator (RSME), represents a novel and efficient forward propagation network, leveraging an explicit spatial-temporal low-rank decomposition of displacement fields and surface imaging to enable real-time 3-D CT reconstruction during respiration. RSME is derived from a well-designed, interpretable inverse optimization problem, mapping static spatial components and dynamic skin depth images to dynamic temporal components for reconstruction. In RSME, we customize two core transformer-based encoders, the motion former (M-Former) and the motion pattern former (P-Former), and incorporate a customized cross-attention mechanism to effectively gauge the interdependencies between the two encoders. Strategically, we input skin depth images into M-Former to query motion information, thus circumventing the requirement for real-time registration. In P-Former, we introduce static spatial components that integrate explicit respiratory pattern details to distinguish between patients without the necessity for patient-specific training, and harness their static nature to avoid additional inference latency. Extensive experimental results demonstrate that RSME achieves accuracy comparable or even superior to that of state-of-the-art methods, with a cumulative latency of 96 ms. Notably, RSME accomplishes this without necessitating additional radiation dosage, time-consuming patient-specific training, or real-time registration.

PaperID: 1503,

Authors: Jiangyu Wang, Ding Wang, Jin Ren, Derong Liu, Junfei Qiao

Title: Parallel Multistep Evaluation With Efficient Data Utilization for Safe Neural Critic Control and Its Application to Orbital Maneuver Systems

Abstract:
Data-driven methods have significantly advanced optimal learning control, but some approaches overlook systematic considerations of data utilization, including safety, efficiency, and error accumulation. To address these oversights in safe neural critic control, this article introduces a parallel multistep evaluation mechanism that combines data from the system interaction with data generated by data-driven models. Based on this evaluation mechanism, we propose a novel parallel multistep Q-learning algorithm that enhances data utilization efficiency and mitigates error accumulation. Furthermore, we formulate a novel control barrier function (CBF) to ensure safety during learning and control processes, which is capable of dealing with asymmetric constraints and adjusting the constraint strength. In addition, the analysis reveals that multistep information introduced by data-driven models influences the learning performance of actor–critic neural networks (NNs). Finally, parallel multistep Q-learning, which makes use of data in aspects of safety, efficiency, and error bounds, is validated within an orbital maneuver system.

PaperID: 1504,

Authors: Mengrui Cao, Lin Xiao, Qiuyue Zuo, Linju Li, Xieping Gao

Affiliations: Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing and MOE-LCSM, Hunan Normal University, Changsha, China

Title: Data-Based Model-Free Predictive Control System Under the Design Philosophy of MPC and Zeroing Neurodynamics for Robotic Arm Pose Tracking

Abstract:
Involving both position and orientation tracking, pose tracking control for the end-effector of a redundant manipulator is a critical problem in robotic motion control. However, existing methods often suffer from dependency on model parameters and lack joint constraints. To remedy these weaknesses, this article proposes a data-based predictive tracking control of position and orientation (DBPTCPO) for redundant manipulators with undetermined parameters. Specifically, in addition to minimizing tracking error, the DBPTCPO scheme can also minimize joint velocity and acceleration to optimize energy efficiency. Furthermore, it directly handles three-level joint constraints, effectively preventing a reduction in the feasible domain of decision variables. As for the uncertain parameters of redundant manipulators, a method based on zeroing neurodynamics (ZNs) is developed to estimate the Jacobian matrix, requiring only the sensory output and control signals. Ultimately, a ZN-based solver is designed to solve the quadratic programming (QP) problem with inequality constraints derived from the DBPTCPO scheme. Necessary theoretical analyses for the control process are provided, and the higher tracking accuracy of the proposed method is numerically validated when compared with other control schemes.

PaperID: 1505,

Authors: Yuxuan Shen, Zidong Wang, Hongli Dong, Hongjian Liu, Xiaohui Liu

Affiliations: Artificial Intelligence Energy Research Institute, Heilongjiang Provincial Key Laboratory of Networking and Intelligent Control, National Key Laboratory of Continental Shale Oil of China, Northeast Petroleum University, Daqing, China; Department of Computer Science, Brunel University London, Uxbridge, Middlesex, U.K.; Key Laboratory of Advanced Perception and Intelligent Control of High-End Equipment, Ministry of Education, and the School of Mathematics and Physics, Anhui Polytechnic University, Wuhu, China

Title: Joint State and Unknown Input Estimation for a Class of Artificial Neural Networks With Sensor Resolution: An Encoding-Decoding Mechanism

Abstract:
This article is concerned with the joint state and unknown input (SUI) estimation for a class of artificial neural networks (ANNs) with sensor resolution (SR) under the encoding–decoding mechanisms. The consideration of SR, which is an important specification of sensors in the real world, caters to engineering practice. Furthermore, the implementation of the encoding–decoding mechanism in the communication network aims to accommodate the limited bandwidth. The objective of this study is to propose a set-membership estimation algorithm that accurately estimates the state of the ANN without being influenced by the unknown input while accounting for the SR and the encoding–decoding mechanism. First, a sufficient condition is derived to ensure an ellipsoidal constraint on the estimation error. Then, by addressing an optimization problem, the design of the estimator gains is accomplished, and the minimal ellipsoidal constraint on the state estimation error is obtained. Finally, an example is provided to confirm the validity of the proposed joint SUI estimation scheme.

PaperID: 1506,

Authors: Lin Xiao, Xiangru Yan, Yongjun He, Biao Luo, Qiya Song

Affiliations: Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing and MOE-LCSM, Hunan Normal University, Changsha, China; School of Automation, Central South University, Changsha, China

Title: A Nonlinear Noise-Resistant Zeroing Neural Network Model for Solving Time-Varying Quaternion Generalized Lyapunov Equation and Applications to Color Image Processing

Abstract:
The time-varying Lyapunov equation (TVLE) plays a crucial role in control design and system stability. However, there has been limited research conducted on the time-varying generalized Lyapunov equation in the quaternion field. To tackle the time-varying quaternion generalized Lyapunov equation, a nonlinear noise-resistant zeroing neural network (NNR-ZNN) model with a novel power activation function (NPAF) is devised. The noncommutativity of quaternion multiplication is circumvented by utilizing the real representation. The theoretical analyses establish the global stability, fixed-time convergence, and robustness of the NNR-ZNN model. Under several different kinds of noise, the exceptional robustness of the NNR-ZNN model is highlighted by comparison with other existing models. In the end, the successful applications of the NNR-ZNN model to color image fusion and color image denoising confirm its practical value.
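
Illustrative sketch (not the authors' NNR-ZNN model): the basic zeroing-neurodynamics design can be shown on a scalar time-varying equation a(t) x(t) = b(t). One defines the error e(t) = a(t) x(t) - b(t) and imposes de/dt = -gamma * e, which gives an explicit update for x. The sketch below assumes a linear activation, no noise, real scalars instead of quaternion matrices, and forward-Euler integration. The coefficient functions, the gain gamma, and the step size are hypothetical values chosen only for illustration.

```python
import math


def znn_track(gamma=50.0, dt=1e-3, t_end=2.0, x0=0.0):
    """Euler-integrate a scalar zeroing neural network for a(t) * x(t) = b(t).

    Returns the final time, the final state x, and the final residual |a*x - b|.
    """
    # Hypothetical time-varying coefficients; a(t) stays away from zero.
    a = lambda t: 2.0 + math.sin(t)
    da = lambda t: math.cos(t)
    b = lambda t: math.cos(t)
    db = lambda t: -math.sin(t)

    x, t = x0, 0.0
    for _ in range(int(round(t_end / dt))):
        e = a(t) * x - b(t)
        # Design formula de/dt = -gamma * e, i.e. a*x' + a'*x - b' = -gamma*e.
        dx = (-gamma * e - da(t) * x + db(t)) / a(t)
        x += dt * dx
        t += dt
    return t, x, abs(a(t) * x - b(t))
```

Calling znn_track() starts from x = 0 (initial residual 1) and drives the residual toward zero while the coefficients keep changing. A nonlinear activation or a noise-rejecting integral term, as studied in the paper, would replace the plain -gamma * e term.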

PaperID: 1507,

Authors: Jing Xu, Chuandong Li, Xing He, Hongsong Wen, Xingxing Ju

Title: A Fixed-Time Proximal Gradient Neurodynamic Network With Time-Varying Coefficients for Composite Optimization Problems and Sparse Optimization Problems With Log-Sum Function

Abstract:
This article presents a novel proximal gradient neurodynamic network (PGNN) for solving composite optimization problems (COPs). The time-varying coefficients of the proposed PGNN can be flexibly chosen to accelerate network convergence. Based on PGNN and the sliding mode control technique, the proposed time-varying fixed-time proximal gradient neurodynamic network (TVFxPGNN) has fixed-time stability and a settling time independent of the initial value. It is further shown that fixed-time convergence can be achieved by relaxing the strict convexity condition via the Polyak-Lojasiewicz condition. In addition, the proposed TVFxPGNN is applied to solve sparse optimization problems with the log-sum function. Furthermore, the field-programmable gate array (FPGA) circuit framework for time-varying fixed-time PGNN is implemented, and the practicality of the proposed FPGA circuit is verified through an example simulation in Vivado 2019.1. Simulation and signal recovery experimental results demonstrate the effectiveness and superiority of the proposed PGNN.
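
Illustrative sketch (not the authors' TVFxPGNN): the proximal gradient idea for composite problems f(x) + g(x) alternates a gradient step on the smooth part f with the proximal operator of the nonsmooth part g. The sketch below is a plain discrete-time iteration (ISTA), not a continuous-time neurodynamic network, and it substitutes the L1 norm for the log-sum penalty so that the proximal operator is simple soft-thresholding. The function names, the penalty weight lam, and the step-size rule are assumptions made for illustration.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |v| (soft-thresholding)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def proximal_gradient(A, y, lam=0.1, step=None, iters=500):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by proximal gradient steps.

    A is a list of rows, y a list of floats. L1 is a stand-in for log-sum.
    """
    m, n = len(A), len(A[0])
    if step is None:
        # Conservative step: the squared Frobenius norm upper-bounds the
        # Lipschitz constant of the gradient of the smooth term.
        step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - y and gradient g = A^T r of the smooth term.
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by the proximal (shrinkage) step.
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x
```

With A set to the identity, the minimizer is the soft-thresholded observation, so small entries of y are set exactly to zero. This is the sparsity-promoting behavior that the log-sum penalty in the paper is designed to strengthen.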

PaperID: 1508,

Authors: Vazim Ibrahim, Faouzi Alaya Cheikh, Vijayan K. Asari, Joseph Suresh Paul

Affiliations: CUSAT Research Center at the Indian Institute of Information Technology and Management—Kerala (IIITM-K), Thiruvananthapuram, Kerala, India; Department of Computer Science, Norwegian University of Science and Technology, Gjovik, Norway; Department of Electrical and Computer Engineering, School of Engineering, University of Dayton, Dayton, OH, USA; Medical Image Computing and Pattern Discovery Lab, Indian Institute of Information Technology and Management—Kerala (IIITM-K), Thiruvananthapuram, Kerala, India

Title: Extrapolation Convolution for Data Prediction on a 2-D Grid: Bridging Spatial and Frequency Domains With Applications in Image Outpainting and Compressed Sensing

Abstract:
Extrapolation plays a critical role in machine/deep learning (ML/DL), enabling models to predict data points beyond their training constraints, particularly useful in scenarios deviating significantly from training conditions. This article addresses the limitations of current convolutional neural networks (CNNs) in extrapolation tasks within image restoration and compressed sensing (CS). While CNNs show potential in tasks such as image outpainting and CS, traditional convolutions are limited by their reliance on interpolation, failing to fully capture the dependencies needed for predicting values outside the known data. This work proposes an extrapolation convolution (EC) framework that models missing data prediction as an extrapolation problem using linear prediction within DL architectures. The approach is applied in two domains: first, image outpainting, where EC in encoder–decoder (EnDec) networks replaces conventional interpolation methods to reduce artifacts and enhance fine detail representation; second, Fourier-based CS-magnetic resonance imaging (CS-MRI), where it predicts high-frequency signal values from undersampled measurements in the frequency domain, improving reconstruction quality and preserving subtle structural details at high acceleration factors. Comparative experiments demonstrate that the proposed EC-DecNet and FDRN outperform traditional CNN-based models, achieving high-quality image reconstruction with finer details, as shown by improved peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and kernel inception distance (KID)/Frechet inception distance (FID) scores. Ablation studies and analysis highlight the effectiveness of larger kernel sizes and multilevel semi-supervised learning in FDRN for enhancing extrapolation accuracy in the frequency domain.
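
Illustrative sketch (not the authors' EC layer): the linear-prediction idea underlying extrapolation can be shown in one dimension. Coefficients are fitted so that each known sample is predicted from its predecessors, and the same recurrence is then run past the end of the data. The sketch below assumes a fixed order-2 predictor fitted by least squares on a 1-D signal. The paper's 2-D grid formulation, learned kernels, and network integration are not reproduced, and the function names are hypothetical.

```python
import math


def fit_lp2(x):
    """Least-squares fit of c1, c2 in x[n] ~ c1 * x[n-1] + c2 * x[n-2]."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for n in range(2, len(x)):
        u, v, target = x[n - 1], x[n - 2], x[n]
        s11 += u * u
        s12 += u * v
        s22 += v * v
        r1 += u * target
        r2 += v * target
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    c1 = (r1 * s22 - r2 * s12) / det
    c2 = (s11 * r2 - s12 * r1) / det
    return c1, c2


def extrapolate(x, k):
    """Predict k samples beyond the end of x using the fitted recurrence."""
    c1, c2 = fit_lp2(x)
    out = list(x)
    for _ in range(k):
        out.append(c1 * out[-1] + c2 * out[-2])
    return out[len(x):]
```

A pure sinusoid satisfies an exact order-2 recurrence, so extrapolate([math.cos(0.3 * n) for n in range(20)], 5) continues the cosine closely. Real image or k-space data would need higher orders and 2-D neighborhoods, which is the setting the abstract addresses.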