- ICMLSmart Initial Basis Selection for Linear ProgramsZhenan * Fan, Xinglu * Wang, Oleksandr * Yakovenko, and 4 more authorsIn International Conference on Machine Learning (ICML), 2023
The simplex method, introduced by Dantzig more than half a century ago, is still to date one of the most efficient methods for solving large-scale linear programming (LP) problems. While the simplex method is known to have the finite termination property under mild assumptions, the number of iterations until optimality largely depends on the choice of initial basis. Existing strategies for selecting an advanced initial basis are mostly rule-based. These rules usually require extensive expert knowledge and empirical study to develop. Yet, many of them fail to exhibit consistent improvement, even for LP problems that arise in a single application scenario. In this paper, we propose a learning-based approach for initial basis selection. We employ graph neural networks as a building block and develop a model that attempts to capture the relationship between LP problems and their optimal bases. In addition, during the inference phase, we supplement the learning-based prediction with linear algebra tricks to ensure the validity of the generated initial basis. We validate the effectiveness of our proposed strategy by extensively testing it with state-of-the-art simplex solvers, including the open-source solver HiGHS and the commercial solver OptVerse. Through these rigorous experiments, we demonstrate that our strategy achieves substantial speedup and consistently outperforms existing rule-based methods. Furthermore, we extend the proposed approach to generating restricted master problems for column generation methods and present encouraging numerical results.
- IEEE TSPPolar Deconvolution of Mixed SignalsZhenan Fan, Halyun Jeong, Babhru Joshi, and 1 more authorIEEE Transactions on Signal Processing, 2022
The signal demixing problem seeks to separate a superposition of multiple signals into its constituent components. This paper describes a two-stage approach that first decompresses and subsequently deconvolves the noisy and undersampled observations of the superposition. Probabilistic error bounds are given on the accuracy with which this process approximates the individual signals. The theory of polar convolution of convex sets and gauge functions plays a central role in the analysis and solution process. If the measurements are random and the noise is bounded, this approach stably recovers low-complexity and mutually incoherent signals, with high probability and with optimal sample complexity. We develop an efficient algorithm, based on level-set and conditional-gradient methods, that solves the convex optimization problems with sublinear iteration complexity and linear space requirements. Numerical experiments on both real and synthetic data confirm the theory and the efficiency of the approach.
- ICDEImproving Fairness for Data Valuation in Horizontal Federated LearningZhenan Fan, Huang Fang, Zirui Zhou, and 4 more authorsIn International Conference on Data Engineering (ICDE), 2022
Federated learning is an emerging decentralized machine learning scheme that allows multiple data owners to work collaboratively while ensuring data privacy. The success of federated learning depends largely on the participation of data owners. To sustain and encourage data owners’ participation, it is crucial to fairly evaluate the quality of the data provided by the data owners and reward them correspondingly. Federated Shapley value, recently proposed by Wang et al. [Federated Learning, 2020], is a measure for data value under the framework of federated learning that satisfies many desired properties for data valuation. However, there are still factors of potential unfairness in the design of federated Shapley value because two data owners with the same local data may not receive the same evaluation. We propose a new measure called completed federated Shapley value to improve the fairness of federated Shapley value. The design depends on completing a matrix consisting of all the possible contributions by different subsets of the data owners. It is shown under mild conditions that this matrix is approximately low-rank by leveraging concepts and tools from optimization. Both theoretical analysis and empirical evaluation verify that the proposed measure does improve fairness in many circumstances.
- SubmittedFair and efficient contribution valuation for vertical federated learningZhenan Fan, Huang Fang, Zirui Zhou, and 3 more authorsSubmitted, 2022
Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data. Vertical federated learning or feature-based federated learning applies to the cases that different data sources share the same sample ID space but differ in feature space. To ensure the data owners’ long-term engagement, it is critical to objectively assess the contribution from each data source and recompense them accordingly. The Shapley value (SV) is a provably fair contribution valuation metric originated from cooperative game theory. However, computing the SV requires extensively retraining the model on each subset of data sources, which causes prohibitively high communication costs in federated learning. We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV. We show that VerFedSV not only satisfies many desirable properties for fairness but is also efficient to compute, and can be adapted to both synchronous and asynchronous vertical federated learning algorithms. Both theoretical analysis and extensive experimental results verify the fairness, efficiency, and adaptability of VerFedSV.
- SubmittedA dual approach for federated learningZhenan * Fan, Huang * Fang, and Michael P FriedlanderSubmitted, 2022
We study the federated optimization problem from a dual perspective and propose a new algorithm termed federated dual coordinate descent (FedDCD), which is based on a type of coordinate descent method developed by Necora et al. [Journal of Optimization Theory and Applications, 2017]. Additionally, we enhance the FedDCD method with inexact gradient oracles and Nesterov’s acceleration. We demonstrate theoretically that our proposed approach achieves better convergence rates than the state-of-the-art primal federated optimization algorithms under mild conditions. Numerical experiments on real-world datasets support our analysis.
- SubmittedKnowledge-Injected Federated LearningZhenan Fan, Zirui Zhou, Jian Pei, and 4 more authorsSubmitted, 2022
Federated learning is an emerging technique for training models from decentralized data sets. In many applications, data owners participating in the federated learning system hold not only the data but also a set of domain knowledge. Such knowledge includes human know-how and craftsmanship that can be extremely helpful to the federated learning task. In this work, we propose a federated learning framework that allows the injection of participants’ domain knowledge, where the key idea is to refine the global model with knowledge locally. The scenario we consider is motivated by a real industry-level application, and we demonstrate the effectiveness of our approach to this application.
- SubmittedAchieving Model Fairness in Vertical Federated LearningChangxin * Liu, Zhenan * Fan, Zirui Zhou, and 4 more authorsSubmitted, 2022
Vertical federated learning (VFL) has attracted greater and greater interest since it enables multiple parties possessing non-overlapping features to strengthen their machine learning models without disclosing their private data and model parameters. Similar to other machine learning algorithms, VFL faces demands and challenges of fairness, i.e., the learned model may be unfairly discriminatory over some groups with sensitive attributes. To tackle this problem, we propose a fair VFL framework in this work. First, we systematically formulate the problem of training fair models in VFL, where the learning task is modelled as a constrained optimization problem. To solve it in a federated and privacy-preserving manner, we consider the equivalent dual form of the problem and develop an asynchronous gradient coordinate-descent ascent algorithm, where some active data parties perform multiple parallelized local updates per communication round to effectively reduce the number of communication rounds. The messages that the server sends to passive parties are deliberately designed such that the information necessary for local updates is released without intruding on the privacy of data and sensitive attributes. We rigorously study the convergence of the algorithm when applied to general nonconvex-concave min-max problems. We prove that the algorithm finds a δ-stationary point of the dual objective in O(δ^-4) communication rounds under mild conditions. Finally, the extensive experiments on three benchmark datasets demonstrate the superior performance of our method in training fair models.
- SubmittedCardinality-constrained structured data-fitting problemsZhenan * Fan, Huang * Fang, and Michael P FriedlanderSubmitted to Open Journal of Mathematical Optimization, 2021
A memory-efficient framework is described for the cardinality-constrained structured data-fitting problem. Dual-based atom-identification rules are proposed that reveal the structure of the optimal primal solution from near-optimal dual solutions. These rules allow for a simple and computationally cheap algorithm for translating any feasible dual solution to a primal solution that satisfies the cardinality constraint. Rigorous guarantees are provided for obtaining a near-optimal primal solution given any dual-based method that generates dual iterates converging to an optimal dual solution. Numerical experiments on real-world datasets support confirm the analysis and demonstrate the efficiency of the proposed approach.
- AISTATSGreed Meets Sparsity: Understanding and Improving Greedy Coordinate Descent for Sparse OptimizationHuang Fang, Zhenan Fan, Yifan Sun, and 1 more authorIn International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
We consider greedy coordinate descent (GCD) for composite problems with sparsity inducing regularizers, including 1-norm regularization and non-negative constraints. Empirical evidence strongly suggests that GCD, when initialized with the zero vector, has an implicit screening ability that usually selects at each iteration coordinates that at are nonzero at the solution. Thus, for problems with sparse solutions, GCD can converge significantly faster than randomized coordinate descent. We present an improved convergence analysis of GCD for sparse optimization, and a formal analysis of its screening properties. We also propose and analyze an improved selection rule with stronger ability to produce sparse iterates. Numerical experiments on both synthetic and real-world data support our analysis and the effectiveness of the proposed selection rule.
- FnT OPTAtomic Decomposition via Polar Alignment: The Geometry of Structured OptimizationZhenan Fan, Halyun Jeong, Yifan Sun, and 1 more authorFoundations and Trends in Optimization, 2020
Structured optimization uses a prescribed set of atoms to assemble a solution that fits a model to data. Polarity, which extends the familiar notion of orthogonality from linear sets to general convex sets, plays a special role in a simple and geometric form of convex duality. This duality correspondence yields a general notion of alignment that leads to an intuitive and complete description of how atoms participate in the final decomposition of the solution. The resulting geometric perspective leads to variations of existing algorithms effective for large-scale problems. We illustrate these ideas with many examples, including applications in matrix completion and morphological component analysis for the separation of mixtures of signals.
- ICLRFast convergence of stochastic subgradient method under interpolationHuang Fang, Zhenan Fan, and Michael FriedlanderIn International Conference on Learning Representations (ICLR), 2020
This paper studies the behaviour of the stochastic subgradient descent (SSGD) method applied to over-parameterized nonsmooth optimization problems that satisfy an interpolation condition. By leveraging the composite structure of the empirical risk minimization problems, we prove that when interpolation holds, SSGD converges at the same rate as the stochastic gradient descent (SGD) method applied to smooth problems. Our analysis provides a partial explanation for the empirical observation that sometimes SGD and SSGD behave similarly for training smooth and nonsmooth machine learning models. We also prove that the rates we derive are optimal for the subgradient method in the convex and interpolation setting.
- ACSSCBundle methods for dual atomic pursuitZhenan Fan, Yifan Sun, and Michael P FriedlanderIn 53rd Asilomar Conference on Signals, Systems, and Computers (ACSSC), 2019
The aim of structured optimization is to assemble a solution, using a given set of (possibly uncountably infinite) atoms, to fit a model to data. A two-stage algorithm based on gauge duality and bundle method is proposed. The first stage discovers the optimal atomic support for the primal problem by solving a sequence of approximations of the dual problem using a bundle-type method. The second stage recovers the approximate primal solution using the atoms discovered in the first stage. The overall approach leads to implementable and efficient algorithms for large problems.