Does a digital divide in machine learning exist?
Let’s start with a brief aside. You may have noticed that the podcast audio from last week’s episode sounded a little different. I accidentally pressed the pattern button on the back of my Blue Yeti X microphone and switched from stereo mode to omni mode. The change did not ruin the listenability of the episode, but it does alter the tone and color of the recording. Personally, I strongly prefer the results of stereo mode, which has a slight natural reverb and more depth. You may recall that I also tried cardioid mode earlier, since Blue recommends it for podcasts, but accidental button presses aside, I’m going to stick with stereo mode for recording going forward. I’ll take a moment now to kindly remember the iconic words of science fiction writer Douglas Adams, “Buttons aren't toys!” That is good advice to keep in mind the next time I move the Yeti X around my desk.
We can now return to the question posed in the title of this essay: “Does a digital divide in machine learning exist?” Yes, a digital divide exists. A world of online content is out there, but it has a barrier to entry. You need a smartphone, computer, or tablet with internet access to participate in the digital world, which makes that world distinct and separate from everyday life. To that end, a divide exists between those who use the technology and those who do not or cannot. Within the machine learning landscape, I would argue that a digital divide exists and can be categorized in multiple ways. First, there is the layer of the divide that separates those who have access to the technology and elect to use it from those who do not. Beyond that first layer, you have to consider the complexity of machine learning models and the underlying data. At that second layer, the digital divide creates an inherent bias: the people on the wrong side of it are underrepresented, or even completely excluded, in the data being used to train and deploy machine learning models. That is a structural data inequality that is not easily corrected, because the bias stems from a lack of inclusion in the first place.
I did read a 2020 article titled “Exploring the Intersection of the Digital Divide and Artificial Intelligence: A Hermeneutic Literature Review.” The paper is free to download and surveys a lot of literature. You can jump straight to “Appendix A: Digital Divide Research” for extra content on the digital divide if that is an area you want to understand better. The authors had a focus I found interesting on visible versus invisible AI, meaning AI that a user can see at work versus elements that remain hidden. A lot of the machine learning models at work today are not visible to the users they are impacting.
The metaverse is not visible to most people. For most of us, the metaverse is an example where a very real digital divide will exist between two distinct worlds of interaction. I don’t participate in the metaverse. A more pressing example than the metaverse is access to care, divided between those who can use digital channels and those who are unable to use technology for something as basic as scheduling. Within that frame of reference, I read an article from Anita Ramsetty and Cristin Adams, “Impact of the digital divide in the age of COVID-19,” published in the Journal of the American Medical Informatics Association. It was only two pages long, but it looked directly at the topic I wanted to read about. The authors very carefully argued how a digital divide could leave communities underserved from a healthcare perspective.
Machine learning itself also creates problems based on the effort required to make it work. A lot of gig workers help train models and label the data necessary to make machine learning work. An article from the MIT Technology Review addressed exactly what its title indicated: “AI needs to face up to its invisible-worker problem.” Within the article, a NeurIPS talk by Saiph Savage was referenced. That talk is over an hour long and will make you really think about how the largest datasets got labeled and who did that work. It really is something to consider when thinking about how the foundations of the largest language models were built.
Links and thoughts:
1. I listened to a New York Times podcast from Kara Swisher this week called Sway that covered how Elon Musk might shape the future of Twitter. Casey Newton showed up and they dug into the potential changes at Twitter now that Elon Musk is the largest single shareholder.
2. This week you are getting a second podcast link. This one was titled “Is streaming just becoming cable again? Julia Alexander thinks so” from the Decoder podcast with Nilay Patel. It was an interesting conversation with two people who have obviously spent a lot of time talking.
Top 5 Tweets of the week:
Carter, L., Liu, D., & Cantrell, C. (2020). Exploring the Intersection of the Digital Divide and Artificial Intelligence: A Hermeneutic Literature Review. AIS Transactions on Human-Computer Interaction, 12(4), 253–275. https://doi.org/10.17705/1thci.00138
Ramsetty, A., & Adams, C. (2020). Impact of the digital divide in the age of COVID-19. Journal of the American Medical Informatics Association, 27(7), 1147–1148. https://doi.org/10.1093/jamia/ocaa078
What’s next for The Lindahl Letter?
Week 67: My thoughts on NFTs
Week 68: Publishing a model or selling the API?
Week 69: A machine learning cookbook?
Week 70: ML and Web3 (decentralized internet)
Week 71: What are the best ML newsletters?
I’ll try to keep the what’s next list for The Lindahl Letter forward-looking, with at least five weeks of posts in planning or review. If you enjoyed this content, then please take a moment and share it with a friend. Thank you and enjoy the week ahead.