Over the years the team at Google has built a very large knowledge graph that you can access via an API and that they use as an informational backbone [1]. In some ways it is the best of what the old web had to offer. They describe it as a database of billions of facts. At the same time, we are starting to see the creation of a ton of middling, mediocre, or otherwise terribly written content online [2][3]. Now imagine you had built a knowledge graph of billions of facts. You cannot stop updating a knowledge graph that large. It would grow stale quickly given how fast technology and modern life keep changing. Let me say that another way: you now face a situation where a great flood of bad content is going to overwhelm your knowledge graph. Yeah, a tsunami of imagined information and otherwise hallucinated content is going to destabilize the integrity of that knowledge graph. Even the notion that it is built on facts begins to fade as a sea of LLMs spits out confusion in the form of very confidently written fabrication.
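For readers who have not poked at that informational backbone directly, the graph mentioned above is reachable through the public Knowledge Graph Search API. The following is a minimal sketch, assuming you have a Google Cloud API key with that service enabled; the endpoint and parameters follow the public documentation, but treat the field handling as illustrative rather than exhaustive.

```python
# Minimal sketch: querying the Google Knowledge Graph Search API.
# Assumes an API key with the Knowledge Graph Search API enabled.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; supply your own key
ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def search_knowledge_graph(query: str, limit: int = 3) -> list[dict]:
    """Return the top entity matches for a free-text query."""
    params = {"query": query, "key": API_KEY, "limit": limit, "indent": True}
    response = requests.get(ENDPOINT, params=params, timeout=10)
    response.raise_for_status()
    results = []
    for element in response.json().get("itemListElement", []):
        result = element.get("result", {})
        results.append({
            "name": result.get("name"),
            "types": result.get("@type", []),
            "description": result.get("description"),
            "score": element.get("resultScore"),
        })
    return results


if __name__ == "__main__":
    for entity in search_knowledge_graph("knowledge graph"):
        print(entity)
```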
I’m now going to dig into the body of work on combining, or using in concert, LLMs and knowledge graphs. Probably the most interesting breakdown for the future of knowledge graphs will be proprietary, locked-in ones vs. decentralized knowledge graphs that could even be powered by a blockchain [4]. We are going to see a huge battle between decentralized knowledge graphs, which may even use a federated approach to stay fresh, and the near-monolithic large knowledge graphs that individual corporations are trying to keep perpetually fresh. One paper dealing with the combination of LLMs and knowledge graphs is “Large Language Models and Knowledge Graphs: Opportunities and Challenges,” published back in 2023 [5]. Another paper from 2023, “Connecting AI: Merging Large Language Models and Knowledge Graph,” covers generally the same conceptual landscape [6].
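One way to picture the combination those papers discuss, and the freshness problem described above, is an LLM proposing candidate (subject, predicate, object) triples from new text while the graph only ingests triples that pass a verification gate and carry provenance. The sketch below is illustrative only: propose_triples() is a stub standing in for a real LLM extraction call, the verification rule is a toy placeholder, and nothing here reflects a specific vendor API or either paper's exact architecture.

```python
# Sketch: LLM-proposed triples gated before entering a knowledge graph.
from dataclasses import dataclass

import networkx as nx


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: where the claim came from


def propose_triples(text: str, source: str) -> list[Triple]:
    """Stand-in for an LLM extraction call that returns candidate triples."""
    # A real implementation would prompt an LLM to emit structured triples.
    return [Triple("Google Knowledge Graph", "contains", "billions of facts", source)]


def verify(triple: Triple, trusted_sources: set[str]) -> bool:
    """Toy verification gate: only accept triples from trusted sources."""
    return triple.source in trusted_sources


def ingest(graph: nx.MultiDiGraph, text: str, source: str, trusted: set[str]) -> None:
    """Extract candidate triples and add only the verified ones to the graph."""
    for t in propose_triples(text, source):
        if verify(t, trusted):
            graph.add_edge(t.subject, t.obj, predicate=t.predicate, source=t.source)


if __name__ == "__main__":
    kg = nx.MultiDiGraph()
    ingest(kg, "Some new article text...", source="support.google.com",
           trusted={"support.google.com"})
    ingest(kg, "A confidently written fabrication...", source="unknown-blog",
           trusted={"support.google.com"})
    print(kg.edges(data=True))  # only the verified triple survives
```

The design choice the gate represents is the crux of the staleness-versus-pollution tension: without it the graph stays fresh but fills with hallucinated content, and with it the graph stays clean but updates more slowly.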
Some people are starting to argue that the internet is breaking. The internet, once hailed as a beacon of boundless opportunity, now finds itself at a precarious crossroads clouded by mounting concerns. As behemoth tech entities tighten their grip, questions of control and influence darken the digital horizon. Privacy breaches and the insidious spread of misinformation cast a long shadow over its once-promising landscape. Algorithms meant to connect have inadvertently fueled division, while the exploitation of personal data raises profound ethical quandaries. Meanwhile, the internet's infrastructure strains under the weight of cyber threats and a persistent digital divide, where access remains unequal and opportunities unevenly distributed. Yet amid these challenges, a sense of loss pervades: the fading promise of an open, inclusive digital future. Navigating this uncertain terrain demands a collective effort to reclaim the internet's original ideals of empowerment and connectivity, ensuring it remains a force for good.
Footnotes:
[1] https://support.google.com/knowledgepanel/answer/9787176?hl=en
[2] https://www.niemanlab.org/2022/12/the-ai-content-flood/
[3] https://www.thealgorithmicbridge.com/p/how-the-great-ai-flood-could-kill
[4] https://medicpro.london/decentralised-knowledge-graphs/
[5] https://arxiv.org/pdf/2308.06374
[6] https://www.computer.org/csdl/magazine/co/2023/11/10286238/1RimWA0RzFK
What’s next for The Lindahl Letter?
Week 161: Structuring really large knowledge graphs
Week 162: Indexing facts vs. graphing knowledge
Week 163: Self-Supervised Learning
Week 164: Graph-Based Feature Engineering
Week 165: Federated Feature Engineering
If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. Stay curious, stay informed, and enjoy the week ahead!