{ "id": "2307.15154", "version": "v1", "published": "2023-07-27T19:03:36.000Z", "updated": "2023-07-27T19:03:36.000Z", "title": "A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity", "authors": [ "Zhihan Xiong", "Romain Camilleri", "Maryam Fazel", "Lalit Jain", "Kevin Jamieson" ], "comment": "25 pages, 6 figures", "categories": [ "cs.LG", "stat.ML" ], "abstract": "We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\\mathcal{X}\\subset\\mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\\left\\lbrace\\theta_t\\right\\rbrace_{t=1}^{T}$, an algorithm will aim to correctly identify the best arm $x^* := \\arg\\max_{x\\in\\mathcal{X}}x^\\top\\sum_{t=1}^{T}\\theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\\theta_t = \\theta_1$ for all $t$ and demonstrated that the error probability decreases as $\\exp(-T /\\rho^*)$ for a problem-dependent constant $\\rho^*$. But in many real-world $A/B/n$ multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well-known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\\mathcal{X}$ at each time then the error probability decreases as $\\exp(-T\\Delta^2_{(1)}/d)$, where $\\Delta_{(1)} = \\min_{x \\neq x^*} (x^* - x)^\\top \\frac{1}{T}\\sum_{t=1}^T \\theta_t$. As there exist environments where $\\Delta_{(1)}^2/ d \\ll 1/ \\rho^*$, we are motivated to propose a novel algorithm $\\mathsf{P1}$-$\\mathsf{RAGE}$ that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of $\\mathsf{P1}$-$\\mathsf{RAGE}$ and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.", "revisions": [ { "version": "v1", "updated": "2023-07-27T19:03:36.000Z" } ], "analyses": { "keywords": [ "linear bandits", "error probability decreases", "a/b testing", "robustness", "non-stationarity" ], "note": { "typesetting": "TeX", "pages": 25, "language": "en", "license": "arXiv", "status": "editable" } } }