When LLMs give us results that reveal flaws in human society, can we choose to listen to what they tell us?
By now, I'm sure most of you have heard the news about Google's new LLM*, Gemini, generating images of racially diverse people in Nazi uniforms. This bit of news reminded me of something I've been wanting to discuss: what happens when models have blind spots, so we apply expert rules to the predictions they generate to avoid returning something wildly outlandish to the user.
In my experience, this sort of thing is not that uncommon in machine learning, especially when you have bad or limited training data. A good example that I remember from my own work was predicting when a package would be delivered to a business office. Mathematically, our model would be very good at estimating exactly when the package would get physically close to the office, but sometimes truckers arrive at their destination late at night and then rest in their truck or at a hotel until the morning. Why? Because there is no one in the office to receive or sign for the package outside of business hours.
Teaching a model the idea of “business hours” can be very difficult, and the much easier solution was simply to say: “If the model says the delivery will arrive outside of business hours, add enough time to the prediction to push it to the next hour the office appears to be open.” Simple! It solves the problem and reflects the real circumstances on the ground. We're just giving the model a little nudge to help its results work better.
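To make that concrete, here's a minimal sketch of what such a post-prediction rule could look like, assuming hypothetical 9-to-5 office hours (the real rule would of course need per-facility hours, weekends, and holidays):

```python
from datetime import datetime, timedelta

# Hypothetical office hours; a real rule would use per-facility data
# and would also need to handle weekends and holidays.
OPEN_HOUR, CLOSE_HOUR = 9, 17

def adjust_to_business_hours(predicted_eta: datetime) -> datetime:
    """If the raw prediction falls outside business hours, push it forward
    to the next hour the office appears to be open."""
    eta = predicted_eta
    if eta.hour >= CLOSE_HOUR:
        # Late-evening arrival: the driver rests, delivery happens at opening the next day.
        eta = (eta + timedelta(days=1)).replace(hour=OPEN_HOUR, minute=0, second=0, microsecond=0)
    elif eta.hour < OPEN_HOUR:
        # Early-morning arrival: wait until the office opens later that day.
        eta = eta.replace(hour=OPEN_HOUR, minute=0, second=0, microsecond=0)
    return eta

print(adjust_to_business_hours(datetime(2024, 2, 22, 23, 40)))  # -> 2024-02-23 09:00:00
```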
However, this causes some problems. For one thing, we now have two different model predictions to manage. We can't just throw away the original model prediction, because that's what we use for monitoring and model performance metrics; you can't evaluate a model on predictions after humans have put their paws on them, that's not mathematically sound. But to get a clear idea of the model's impact in the real world, you also want to look at the post-rule prediction, because that's what the customer actually experienced/saw in their application. In ML we're used to a very simple framework, where every time you run a model you get a result or a set of results and that's it, but once you start modifying the results before you let them go, you need to think about them differently.
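One way to handle this, sketched below with hypothetical names and reusing the adjust_to_business_hours rule from the sketch above, is to carry both values through the pipeline: the raw prediction for monitoring and metrics, and the post-rule prediction for what the customer actually sees.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DeliveryPrediction:
    raw_eta: datetime       # untouched model output: used for monitoring and model metrics
    adjusted_eta: datetime  # post-rule value: what the customer sees in the application

def package_prediction(raw_eta: datetime) -> DeliveryPrediction:
    """Keep the raw model output and the post-rule value side by side,
    so neither gets thrown away downstream."""
    return DeliveryPrediction(
        raw_eta=raw_eta,
        adjusted_eta=adjust_to_business_hours(raw_eta),  # rule from the sketch above
    )
```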
I suspect a form of this is what's happening with LLMs like Gemini. However, instead of a post-prediction rule, the smart money says that Gemini and other models are applying “secret” prompt augmentations to try to change the results the LLMs produce.
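Nobody outside these companies knows exactly what those augmentations say, but conceptually it could be as simple as this hypothetical sketch, which quietly appends a modifier before the prompt ever reaches the image model:

```python
import random

# Hypothetical modifiers; the actual augmentations (if any) are not public.
DIVERSITY_MODIFIERS = [
    "as a Black person",
    "as an Indigenous person",
    "as a woman",
]

def augment_prompt(user_prompt: str) -> str:
    """Silently append a demographic modifier the user never sees."""
    return f"{user_prompt}, {random.choice(DIVERSITY_MODIFIERS)}"

print(augment_prompt("a portrait of a doctor in a lab coat"))
```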
In essence, without this nudge, the model will produce results that reflect the content it has been trained on. That is, content created by real people: our social media posts, our history books, our museum paintings, our popular songs, our Hollywood movies, and so on. The model takes all of that in and learns the underlying patterns, whether they're things we're proud of or not. A model fed all the media available in our contemporary society will be heavily exposed to racism, sexism, and many other forms of discrimination and inequality, not to mention violence, war, and other horrors. While the model is learning what people look like, how they sound, what they say, and how they move, it is learning the warts-and-all version.
This means that if you ask the underlying model to show you a doctor, it will probably show you a white man in a lab coat. This isn't simply random; it's because in our modern society white men have disproportionate access to high-status professions such as medicine, because on average they have access to more and better education, financial resources, mentoring, social privilege, and so on. The model is reflecting an image back at us that can make us uncomfortable, because we don't like to think about that reality.
The obvious argument is, “Well, we don't want the model to reinforce the prejudices our society already has, we want it to improve representation of underrepresented populations.” I am quite sympathetic to this argument and I care about representation in our media. However, there is a problem.
Applying these adjustments is highly unlikely to be a sustainable solution. Remember the story I opened with about Gemini: it's like playing whack-a-mole, because the work never stops. Now we have people of color appearing in Nazi uniforms, and this, understandably, is deeply offensive to many people. So where we started by randomly appending “as a Black person” or “as an Indigenous person” to our prompts, we now have to add something else to exclude the cases where that is inappropriate, but how do you express that in a way an LLM can understand? We'll probably have to go back to the beginning, rethink how the original solution works, and revise the whole approach. At best, applying a tweak like this fixes one particular problem with your results while creating more.
Let's consider another very real example. What if we added to the prompt, “Never use explicit or profane language in your answers, including (list of bad words here)”? Maybe that works in many cases, and the model refuses to say the bad words a 13-year-old asks it for to be funny. But sooner or later this has unexpected side effects. What if someone is searching for the history of Sussex, England? Alternatively, someone will come up with a bad word you left off the list, so it will be a constant job to maintain. What about bad words in other languages? Who decides what goes on the list? My head hurts just thinking about it.
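To see why this gets messy, here's a deliberately naive, hypothetical blocklist filter; it flags a perfectly innocent question about Sussex while letting anything not on the list sail right through:

```python
# Tiny hypothetical blocklist; real deployments use much longer, curated lists.
BAD_WORDS = ["sex", "damn"]

def is_blocked(text: str) -> bool:
    """Flag any text containing a listed word as a substring."""
    lowered = text.lower()
    return any(word in lowered for word in BAD_WORDS)

print(is_blocked("Tell me about the history of Sussex, England"))  # True: a false positive
print(is_blocked("some fresh slang the list doesn't cover yet"))   # False: slips through untouched
```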
These are just two examples and I'm sure you can think of more similar scenarios. It's like putting Band-Aids on a leaky pipe, and every time a spot is repaired, another leak appears.
So what do we really want from LLMs? Do we want them to generate a highly realistic mirror image of what human beings are really like and what our human society really looks like from the perspective of our media? Or do we want a sanitized version that smooths off the rough edges?
Honestly, I think we probably need something in between, and we'll have to keep renegotiating the boundaries, even if it's hard. We do not want LLMs to reflect back the very real horrors and cesspools of violence, hatred, and more that human society contains; that is a part of our world that should not be amplified even slightly. Zero content moderation is not the answer. Fortunately, this motivation aligns with the desire of the large corporate entities running these models to be popular with the public and make lots of money.
However, I want to keep gently making the case that we can also learn something from this dilemma in the world of LLMs. Instead of simply being offended and blaming the technology when a model generates a bunch of photos of white male doctors, we should pause to understand why that's what we got back from the model. And then we should carefully debate whether that response should be allowed, make a decision grounded in our values and principles, and try to carry it out as best we can.
As I said before, an LLM is not an alien from another universe; it is us. It is trained on things we wrote, said, filmed, recorded, and did. If we want our model to show us doctors of diverse sexes, genders, races, and so on, we need to create a society that allows all of those different kinds of people to have access to that profession and the education it requires. If we worry about how the model reflects us, but don't take seriously the fact that it's we who need to be better, not just the model, then we're missing the point.