Why Strategic Classification Is Useful: Motivation
Binary classification is a cornerstone of machine learning. It was the first topic I was taught when I took an introductory course on the subject; the real-world example we examined back then was the problem of classifying emails as either spam or not spam. Other common examples include diagnosing a disease and screening resumes for a job posting.
The basic binary classification setup is intuitive and easily applicable to our day-to-day lives, and it can serve as a helpful demonstration of the ways we can leverage machine learning to solve human problems. But how often do we stop to consider the fact that people usually have a vested interest in the classification outcome of such problems? Spammers want their emails to make it through spam filters, not everyone wants their COVID test to come back positive, and job seekers may be willing to stretch the truth to score an interview. The data points aren’t just data points — they’re active participants in the classification process, often aiming to game the system to their own benefit.
In light of this, the canonical binary classification setup seems a bit simplistic. At the same time, reexamining binary classification without the implicit assumption that the objects we wish to classify are uninfluenced by external stakes sounds unmanageable. The preferences that could affect the classification process come in so many different forms — how could we possibly take all of them into account?
It turns out that, under certain assumptions, we can. Through a clever generalization of the canonical binary classification model, the paper's authors demonstrate the feasibility of designing computationally tractable, gaming-resistant classification algorithms.
From Data Points to Rational Agents: Preference Classes
First, if we want to be as realistic as possible, we have to properly consider the wide breadth of forms that real-world preferences can take among rational agents. The paper mentions five increasingly general categories of preferences (which I’ll call preference classes). The names I’ll use for them are my own, but are based on the terminology used in the paper.
- Impartial: No preferences, just like in canonical binary classification.
- Homogeneous: Identical preferences across all the agents involved. For example, within the set of people who are willing to fill out the paperwork necessary to apply for a tax refund, we can reasonably expect that everyone is equally motivated to get their money back (i.e., to be classified positively).
- Adversarial: Equally-motivated agents aim to induce the opposite of their true labels. Think of bluffing in poker — a player with a weak hand (negatively classified) wants their opponents to think they have a strong hand (positively classified), and vice versa. For the “equally-motivated” part, imagine all players bet the same amount.
- Generalized Adversarial: Unequally-motivated agents aim to induce the opposite of their true labels. This isn’t too different from the plain Adversarial case. Still, it should be easy to understand how a player with $100 on the line would be willing to go to greater lengths to deceive their opponents than a player betting $1.
- General Strategic: “Anything goes.” This preference class aims to encompass any set of preferences imaginable. All four of the previously mentioned preference classes are strict subsets of this one. Naturally, this class is the main focus of the paper, and most of the results demonstrated in the paper apply to it. The authors give the wonderful example of college applications, where “students (who) have heterogeneous preferences over universities (…) may manipulate their application materials during the admission process.”
How can the canonical classification setup be modified to account for such rich agent preferences? The answer is astoundingly simple. Instead of limiting our scope to (x, y) ∈ 𝒳 × { -1, 1 }, we consider data points of the form (x, y, r) ∈ 𝒳 × { -1, 1 } × R. A point’s r value represents its preference, which we can break down into two equally important components (illustrated in a short code sketch right after this list):
- The sign of r indicates whether the data point wants to be positively or negatively classified (r > 0 or r < 0, respectively).
- The absolute value of r specifies how strong the data point’s preference is. For example, a data point with r = 10 would be much more strongly motivated to manipulate its feature vector x to ensure it ends up being positively classified than a data point with r = 1.
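Here's that sketch (my own illustration, not the paper's; the class and field names are hypothetical):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class StrategicPoint:
    """A data point (x, y, r) in the strategic setup."""
    x: Tuple[float, ...]  # feature vector
    y: int                # true label, -1 or 1
    r: float              # preference over the classification outcome

# A truly negative point that is strongly motivated to be classified positively:
p = StrategicPoint(x=(0.5, -1.2), y=-1, r=10.0)
desired_label = 1 if p.r > 0 else -1  # the sign of r encodes the desired outcome
motivation = abs(p.r)                 # the magnitude of r encodes how strong the preference is
```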
What determines the preference class we operate within is the set R. We can formally define each of the aforementioned preference classes in terms of R and see how the formal definitions align with their intuitive descriptions and examples:
- Impartial: R = { 0 }. (This makes it abundantly clear that the strategic setup is just a generalization of the canonical setup.)
- Homogeneous: R = { 1 }.
- Adversarial: R = { -1, 1 }, with the added requirement that all data points prefer to be classified as the opposite of their true labels.
- Generalized Adversarial: R ⊆ ℝ (and all data points prefer to be classified as the opposite of their true labels).
- General Strategic: R ⊆ ℝ.
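As a quick illustration (my own, not from the paper), the simpler classes can be written down directly, with the adversarial classes adding a per-point sign constraint:

```python
# Illustrative encodings of the set R for the simpler preference classes.
IMPARTIAL = {0.0}            # R = {0}: no preferences
HOMOGENEOUS = {1.0}          # R = {1}: everyone equally wants a positive label
ADVERSARIAL = {-1.0, 1.0}    # R = {-1, 1}, plus the sign constraint below

def prefers_opposite_label(y: int, r: float) -> bool:
    """The extra requirement shared by both adversarial classes:
    each point wants the opposite of its true label."""
    return r * y < 0

print(prefers_opposite_label(y=-1, r=2.0))  # True: a negative point that wants to be positive

# Generalized adversarial and general strategic both allow R to be any subset of
# the reals; only the former keeps the prefers_opposite_label constraint.
```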
Giving Preference Magnitude Meaning: Cost Functions
Clearly, though, R on its own isn’t enough to construct an entire general strategic framework. The very idea of a data point’s preference having a certain magnitude is meaningless without tying it to the cost the data point incurs in manipulating its feature vector. Otherwise, any data point with a positive r, no matter how small, would have no reason not to manipulate its feature vector ad infinitum. This is where the concept of cost functions comes into play.
Let c: 𝒳 × 𝒳 → ℝ⁺. For simplicity, we will assume (as the paper’s authors do) that c is induced by seminorms. We say that a test data point (x, y, r) may transform its feature vector x into z ∈ 𝒳 with cost c(z; x). It’s important to note in this context that the paper assumes that the training data is unmanipulated.
We can divide cost functions into two categories, with the first being a special case of the second. An instance-invariant cost function is the same across all data points. To put it more formally:
∃ℓ: 𝒳 → ℝ⁺ . ∀(x, y, r) ∈ 𝒳 × { -1, 1 } × R . ∀z ∈ 𝒳 . c(z; x) = ℓ(z − x)
I.e., there exists a single function ℓ such that for every data point and every potential manipulated feature vector z, c(z; x) simply takes the value of ℓ(z − x).
An instance-wise cost function may vary between data points. Formally:
∀(x, y, r) ∈ 𝒳 × { -1, 1 } × R . ∃ℓₓ: 𝒳 → ℝ⁺ . ∀z ∈ 𝒳 . c(z; x) = ℓₓ(z − x)
I.e., each data point can have its own function, ℓₓ, and c(z; x) takes the value of ℓₓ(z – x) for each individual data point.
As we will see in the final article in this series, while the difference between the two types of cost functions may seem subtle, instance-wise cost functions are significantly more expressive and harder to learn.
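To make the distinction concrete, here's a small sketch (my own, assuming a Euclidean seminorm for the invariant case and per-point weighted L1 seminorms for the instance-wise case):

```python
import numpy as np

def instance_invariant_cost(z: np.ndarray, x: np.ndarray) -> float:
    """The same seminorm l is applied to (z - x) for every data point."""
    return float(np.linalg.norm(z - x, ord=2))  # l(d) = ||d||_2

def instance_wise_cost(z: np.ndarray, x: np.ndarray, weights: np.ndarray) -> float:
    """Each data point carries its own seminorm l_x; here, a weighted L1 norm
    whose weights differ from point to point."""
    return float(np.sum(weights * np.abs(z - x)))  # l_x(d) = sum_i w_i * |d_i|

x, z = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(instance_invariant_cost(z, x))                           # identical for all points
print(instance_wise_cost(z, x, weights=np.array([2.0, 0.5])))  # depends on this point's weights
```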
Preference Classes and Cost Functions in Action: An Example
Let’s take a look at an example given in the paper to help hammer home the aspects of the setup we’ve covered so far.
In this example, we have a decision boundary induced by a linear binary classifier and four data points with individual preferences. General strategic is the only applicable preference class in this case.
The dotted perimeter around each xᵢ shows the manipulated feature vectors z to which it would cost the point exactly 1 to move. Since we assume the cost function is induced by seminorms, everything inside a perimeter has a cost of less than 1 for the corresponding data point to move to. We can easily tell that the cost function in this example varies from data point to data point, which means it is instance-wise.
As we can see, the leftmost data point (x₁, -1, -1) has no incentive to cross the decision boundary: it is already on the negative side and has a negative preference. (x₄, -1, 2), however, wants to be positively classified, and since the reward for manipulating x₄ to cross the boundary (which is 2) outweighs the cost of doing so (which is less than 1), it makes sense to go through with the manipulation. (x₃, 1, -2) is symmetric to (x₄, -1, 2), also deciding to manipulate its feature vector to achieve its desired classification outcome. Lastly, (x₂, -1, 1), whose cost function we can see is based on taxicab (L1) distance, opts to stay put despite its preference to be positively classified. This is because the cost of manipulating x₂ to cross the decision boundary would be greater than 1, surpassing the reward the data point would stand to gain by doing so.
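Since the exact coordinates of the example aren't reproduced here, the following toy check (my own, with made-up numbers chosen to match the outcomes described above) shows the reward-versus-cost comparison each point makes:

```python
# Made-up numbers consistent with the described outcomes (the paper's figure
# specifies the actual geometry; these values are only illustrative).
# Each entry: (name, currently on positive side?, preference r, cost to cross the boundary)
points = [
    ("x1", False, -1.0, 0.8),  # negative side, wants negative: no reason to move
    ("x2", False,  1.0, 1.4),  # wants positive, but crossing costs more than 1
    ("x3", True,  -2.0, 0.9),  # wants negative, and crossing costs less than 2
    ("x4", False,  2.0, 0.9),  # wants positive, and crossing costs less than 2
]

for name, on_positive_side, r, crossing_cost in points:
    wants_positive = r > 0
    # Manipulate only if the desired side differs and the reward outweighs the cost.
    manipulates = (on_positive_side != wants_positive) and abs(r) > crossing_cost
    print(name, "manipulates" if manipulates else "stays put")
```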
Assuming the agents our data points represent are rational, we can very easily tell when a data point should manipulate its feature vector (benefits outweigh costs) and when it shouldn’t (costs outweigh benefits). The next step is to turn our intuitive understanding into something more formal.
Balancing Costs & Benefits: Defining Data Point Best Response
This leads us to define the data point best response as the set of feature vectors

argmax_{z ∈ 𝒳} ( 𝕀(h(z) = 1) ⋅ r − c(z; x) )

So we’re looking for the feature vector(s) z ∈ 𝒳 that maximize… what exactly? Let’s break down the expression we’re aiming to maximize into more manageable parts.
- h: A given binary classifier (h: 𝒳 → { -1, 1 }).
- c(z; x): As stated above, this expresses the cost of modifying the feature vector x to be z.
- 𝕀(h(z) = 1): Here, 𝕀(p) is the indicator function, returning 1 if the predicate p is upheld or 0 if it isn’t. The predicate h(z) = 1 is true if the vector z under consideration is positively classified by h. Putting that together, we find that 𝕀(h(z) = 1) evaluates to 1 for any z that is positively classified. If r is positive, that’s good. If it’s negative, that’s bad.
The bottom line is that we want to find the vector(s) z for which the realized reward, 𝕀(h(z) = 1) ⋅ r, exceeds the cost of manipulating the original x into z by as much as possible. To put it in game-theoretic terms, the data point best response maximizes the utility of its corresponding agent in the context of the binary classification under consideration.
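To ground the definition, here's a minimal sketch (my own, not from the paper) of the best response against a linear classifier under a Euclidean cost, where the only candidates worth comparing are staying put and taking the cheapest path across the decision boundary:

```python
import numpy as np

def h(z: np.ndarray, w: np.ndarray, b: float) -> int:
    """Linear classifier: +1 iff w.z + b >= 0, else -1."""
    return 1 if float(np.dot(w, z)) + b >= 0 else -1

def best_response(x: np.ndarray, r: float, w: np.ndarray, b: float,
                  eps: float = 1e-6) -> np.ndarray:
    """Best response under the cost c(z; x) = ||z - x||_2: compare staying put
    with the cheapest feature change that flips the predicted label."""
    desired = 1 if r > 0 else -1
    if r == 0 or h(x, w, b) == desired:
        return x  # already getting the preferred outcome (or indifferent), so pay nothing

    # Cheapest label-flipping move: step perpendicularly to just past the boundary.
    unit_normal = w / np.linalg.norm(w)
    signed_dist = (float(np.dot(w, x)) + b) / float(np.linalg.norm(w))
    z = x - (signed_dist + (eps if desired == -1 else -eps)) * unit_normal

    cost = float(np.linalg.norm(z - x))
    realized_reward = r if h(z, w, b) == 1 else 0.0  # I(h(z) = 1) * r
    utility_stay = r if h(x, w, b) == 1 else 0.0
    return z if realized_reward - cost > utility_stay else x

w, b = np.array([1.0, 0.0]), 0.0                        # boundary: first coordinate = 0
print(best_response(np.array([-0.5, 0.0]), 2.0, w, b))  # crosses (cost 0.5 < reward 2)
print(best_response(np.array([-3.0, 0.0]), 1.0, w, b))  # stays put (cost 3 > reward 1)
```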
Putting It All Together: A Formal Definition of the Strategic Classification Problem
Finally, we’ve laid all the necessary groundwork to formally define the strategic classification problem.
Given a hypothesis class H, a preference class R, a cost function c, and a set of n data points drawn from a distribution D, we want to find a binary classifier h′ ∈ H that minimizes the expected strategic loss

𝔼_{(x, y, r) ~ D} [ 𝕀( h′(z*) ≠ y ) ],

where z* denotes the data point best response of (x, y, r) to h′. Note that this loss is simply a modification of the canonical zero-one loss, evaluating the classifier at the data point best response instead of at the original x.
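Reusing the h and best_response helpers from the sketch above (same hypothetical linear setup and Euclidean cost), the empirical version of this loss could be computed as follows:

```python
import numpy as np

def strategic_zero_one_loss(points, w: np.ndarray, b: float) -> float:
    """Empirical strategic loss: the zero-one loss evaluated at each point's
    best response to the classifier rather than at its original features."""
    mistakes = 0
    for x, y, r in points:
        z = best_response(np.array(x, dtype=float), r, w, b)  # defined in the sketch above
        mistakes += int(h(z, w, b) != y)
    return mistakes / len(points)

# Three (x, y, r) points against the boundary "first coordinate = 0":
sample = [((-0.5, 0.0), -1, 2.0),   # crosses the boundary and gets misclassified
          ((-3.0, 0.0), -1, 1.0),   # stays put and is classified correctly
          (( 1.0, 0.0),  1, 1.0)]   # already positively classified, as it prefers
print(strategic_zero_one_loss(sample, np.array([1.0, 0.0]), 0.0))  # 0.333...
```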
Conclusion
Starting from the canonical binary classification setup, we introduced the notion of preference classes. Next, we saw how to formalize that notion using an r value for each data point. We then saw how cost functions complement data point preferences. After that, we broke down an example before defining the key concept of data point best response based on the ideas we explored beforehand. Lastly, we used the data point best response to define the modified zero-one loss used in the definition of the strategic classification problem.
Join me next time as I define and explain the strategic VC dimension, which is the natural next step from where we left off this time.
References
(1) R. Sundaram, A. Vullikanti, H. Xu, F. Yao. PAC-Learning for Strategic Classification (2021), International Conference on Machine Learning.