In Olga Tokarczuk’s novel Flights, a man sits on a train, wondering why he takes so many photos of his trips if the pictures just end up unprocessed or stored in a box. “The things I’ve seen are mine now,” the man reflects.
Images carry meaning, convey stories, and preserve memories. These narratives and sentiments are uniquely human and are the result of rapid cognitive processing by the human brain. As computer vision expert Fei-Fei Li states in her 2015 TED Talk, this phenomenon is evident even in small children, who are capable not just of seeing the world around them but of recognizing and processing what they see. Li goes on to say,
“…to take pictures is not the same as to see, and by seeing, we really mean understanding.”
Our ability to see individual images, recognize their context, and process them into a legible and recognizable narrative is an innate and tacit skill. We can barely comprehend how we do it ourselves, so how do you teach a computer? Very slowly, according to Li. Using a massive image database, Li and her team succeeded in teaching a computer to recognize a “cat” in an image, with the ultimate goal of having the computer also recognize a “bed”, a “lamp”, and other components within the picture. Another example Li referenced was a picture of a happy boy looking at a cake. The computer was able to recognize “boy” and “cake” but failed to recognize the true context of the photo: the boy is actually Li’s son, he’s wearing his favorite t-shirt, which was a gift from his father, and the cake is a special holiday cake.
This calls to mind a question that continually arises during our quick studies into CUIs and chatbots: “What is the computer (or virtual agent) doing that a human cannot?” But Li’s research compels me to reverse that question: “What are humans doing that a computer cannot?” Often during our quick studies, the line between human and computer was blurred, or we spent significant time trying to make our virtual agents just like a human. Despite the advancements since Li’s talk, image recognition technology is still in its infancy. Now is the time for us, as humans, to take a moment, take a breath, and remember what it is that makes us us. Which traits do we want to preserve as purely human, and which are we comfortable transferring to computers and virtual agents? The things we’ve seen are ours…for now at least.