Quantifying the Complexity and Learnability of Strategic Classification Problems

Image generated by the author using DALL-E 3.

In the first article in this series, we formally defined the strategic classification problem, denoted Sᴛʀᴀᴄ⟨H, R, c⟩, as a generalization of canonical binary classification. We did so based on the paper PAC-Learning for Strategic Classification (Sundaram, Vullikanti, Xu, & Yao, 2021). Along the way, we explored why we should care about considering the various preferences of rational agents during classification and how we can do so (subject to certain assumptions). We will rely heavily on the concepts introduced in the previous article, so I encourage you to read it if you haven't already.

Extending PAC Learning to a Strategic Classification Setting

We'll pick up where we left off, using the definition of Sᴛʀᴀᴄ⟨H, R, c⟩ as a jumping-off point for the useful concept of strategic VC dimension (SVC). Once we make sense of SVC, what I call the Fundamental Theorem of Strategic Learning will follow rather naturally.

While helpful, prior familiarity with shattering coefficients, the canonical VC dimension, and the Fundamental Theorem of Statistical Learning will not be necessary for you to follow along. However, there's far more depth to each of them than I could ever hope to cover as part of this series, let alone in a single article. The curious reader is referred to Andrew Rothman's wonderful and very thorough articles on the (canonical) shattering coefficient and VC dimension.

Statistical Learning Theory Part 5: Shattering Coefficient

Statistical Learning Theory Part 6: Vapnik–Chervonenkis (VC) Dimension

As we'll see, strategic shattering coefficients and SVC are fairly natural generalizations of their canonical (i.e., non-strategic) counterparts. We will therefore begin with a brief rundown of each of those counterparts before explaining how they can be modified to work in a strategic setting.


Counting Achievable Labelings: Canonical Shattering Coefficients

Verbally defining shattering coefficients seems straightforward at first glance:

Given a hypothesis class H, **its nᵗʰ shattering coefficient, denoted Sₙ(H), represents the largest number of labelings achievable by classifiers in H on a sample of n feature vectors.**

But what is a "labeling"? And what makes it "achievable"? Answering those questions will help us lay some groundwork in pursuit of a more formal definition.

In the context of binary classification, a labeling of a sample of feature vectors is simply any one of the ways we can assign values from the set { -1, 1 } to those vectors. As a very simple example, consider two one-dimensional feature vectors (i.e., points on a number line), _x_₁ = 1 and _x_₂ = 2.

A visualization of the four possible labelings of the sample _x_₁ = 1, _x_₂ = 2. Red points are negatively classified, blue ones are positively classified. Image by the author.

The possible labelings are any combination of the classification values we can assign to the individual feature vectors independently of one another. We can represent each labeling as a vector, where the first and second coordinates represent the values assigned to _x_₁ and _x_₂, respectively. The set of possible labelings is thus { (-1, -1), (-1, 1), (1, -1), (1, 1) }. Note that a sample of size 2 yields 2² = 4 possible labelings – we'll see how this generalizes to arbitrarily-sized samples soon.
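
If it helps to see the counting in code, here's a minimal Python sketch (the helper name `all_labelings` is my own, not from the paper) that enumerates every possible labeling of a sample of size n:

```python
from itertools import product

def all_labelings(n):
    """Every way of assigning a value from {-1, 1} to each of n feature vectors."""
    return list(product((-1, 1), repeat=n))

print(all_labelings(2))
# [(-1, -1), (-1, 1), (1, -1), (1, 1)] -- 2**2 = 4 labelings, as above
```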

We say that a labeling is achievable by a hypothesis class H if there exists a classifier h ∈ H from which that labeling can result. Continuing with our simple example, suppose we are limited to classifiers of the form x ≥ k, k ∈ ℝ, that is, one-dimensional thresholds such that anything to the right of the threshold is positively classified. The labeling (1, -1) is not achievable by this hypothesis class: since x₂ is greater than x₁, any threshold that classifies x₁ positively must do the same for x₂. The set of achievable labelings is therefore { (-1, -1), (-1, 1), (1, 1) }.
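
We can verify this mechanically. The sketch below (my own illustration; `achievable_labelings` is not from the paper or any library) checks which labelings the threshold classifiers x ≥ k can produce on a given one-dimensional sample. Since all thresholds falling in the same gap between consecutive sample points induce the same labeling, it's enough to test one candidate threshold per gap, plus one below and one above the sample:

```python
def achievable_labelings(sample):
    """Labelings of `sample` achievable by threshold classifiers x >= k, k in R."""
    xs = sorted(sample)
    # one candidate threshold per gap, plus one below and one above the sample
    candidates = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(1 if x >= k else -1 for x in sample) for k in candidates}

print(achievable_labelings([1, 2]))
# three labelings: (-1, -1), (-1, 1) and (1, 1); (1, -1) is missing, as argued above
```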

Examples of one-dimensional threshold classifiers that can be used to achieve all but one of the possible labelings of the sample _x_₁ = 1, _x_₂ = 2. Image by the author.

Having understood the basic terminology, we can start to develop some notation to formally express elements of the verbal definition with which we started.

We stick to representing labelings as vectors as we did in our simple example, with each coordinate representing the classification value assigned to the corresponding feature vector. There are 2ⁿ possible labelings in total: there are two possible choices for each feature vector, and we can think of a labeling as a collection of n such choices, each made independently of the rest. If a hypothesis class H can achieve all possible labelings of a sample, we say that H shatters that sample.
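
Reusing the `achievable_labelings` sketch from above, we can brute-force this comparison for the one-dimensional threshold class (a rough illustration of my own, not the paper's notation):

```python
for n in range(1, 6):
    sample = list(range(1, n + 1))               # n distinct points on the line
    achieved = len(achievable_labelings(sample))  # labelings thresholds can produce
    print(n, achieved, 2 ** n)
# thresholds achieve only n + 1 labelings out of the 2**n possible ones,
# so they can shatter a sample of size 1 but nothing larger
```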
