Disclosure: The author is the managing director of Connected Data World.
AI is on a roll. Adoption is increasing across the board, and organizations are already seeing tangible benefits. However, the definition of what AI is and what it can do is up for grabs, and the investment required to make it work isn’t always easy to justify. Despite AI’s newfound practicality, there’s still a long way to go.
Let’s take a tour through the past, present, and future of AI, and learn from leaders and innovators from LinkedIn, Intel Labs, and cutting-edge research institutes.
Connecting data with duct tape at LinkedIn
Mike Dillinger is the technical lead for Taxonomies and Ontologies at LinkedIn’s AI Division. He has a diverse background, ranging from academic research to consulting on translation technologies for Fortune 500 companies. For the last several years, he has been working with taxonomies at LinkedIn.
LinkedIn relies heavily on taxonomies. As the de facto social network for professionals, LinkedIn has made launching a skill-building platform a central piece of its strategy. Following a statement by CEO Ryan Roslansky, LinkedIn Learning Hub was recently announced, powered by the LinkedIn Skills Graph, dubbed “the world’s most comprehensive skills taxonomy.”
The Skills Graph includes more than 36,000 skills, more than 14 million job postings, and the largest professional network with more than 740 million members. It empowers LinkedIn users with richer skill development insights, personalized content, and community-based learning.
For Dillinger, however, taxonomies may be overrated. In his upcoming keynote at Connected Data World 2021, Dillinger is expected to refer to taxonomies as the duct tape of connecting data. This alludes to Perl, the programming language that was often referred to as the duct tape of the internet.
“Duct tape is good because it’s flexible and easy to use, but it tends to hide problems rather than fix them,” Dillinger said.
A lot of effort goes into building taxonomies, making them correct and coherent, then getting sign-off from key stakeholders. But this is when problems start appearing.
Key stakeholders such as product managers, taxonomists, users, and managers take turns punching holes in what was carefully constructed. They point out issues of coverage, accuracy, scalability, and communication. And they’re all right from their own point of view, Dillinger concedes. So the question is — what gives?
Dillinger’s key thesis is that taxonomies are simply not very good as a tool for knowledge organization. That may sound surprising at first, but coming from someone like Dillinger, it carries significant weight.
Dillinger goes a long way to elaborate on the issues with taxonomies, but perhaps more interestingly, he also provides hints for a way to alleviate those issues:
“The good news is that we can do much better than taxonomies. In fact, we have to do much better. We’re building the foundations for a new generation of semantic technologies and artificial intelligence. We have to get it right,” says Dillinger.
Dillinger goes on to talk about more reliable building blocks than taxonomies for AI. He cites concept catalogs, concept models, explicit relation concepts, more realistic epistemological assumptions, and next-generation knowledge graphs.
It’s the next generation, Dillinger says, because today’s knowledge graphs do not always use concepts with explicit, human-readable semantics. These have many advantages over taxonomies, and we need to work at the level of people, processes, and tools to get there.
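To make the contrast concrete, here is a minimal sketch (invented data, not LinkedIn’s implementation) of the same domain modeled first as a plain taxonomy and then as a small knowledge graph in which every relation is an explicit, named concept:

```python
# A taxonomy encodes exactly one relation, implicit in the nesting: "is-a".
taxonomy = {
    "Skill": ["Programming", "Data Analysis"],
    "Programming": ["Python", "SQL"],
}

# A knowledge graph makes each relation an explicit, human-readable concept,
# so facts other than "is-a" can be represented and queried directly.
knowledge_graph = [
    ("Python", "subclass_of", "Programming"),
    ("Programming", "subclass_of", "Skill"),
    ("Python", "taught_in", "Intro to Data Science"),
    ("Data Analysis", "requires", "SQL"),
]

def query(graph, relation):
    """Return all (subject, object) pairs for a given relation concept."""
    return [(s, o) for s, r, o in graph if r == relation]

# The taxonomy cannot express "what does Data Analysis require?" at all;
# the graph can, because 'requires' is a first-class relation concept.
print(query(knowledge_graph, "requires"))  # [('Data Analysis', 'SQL')]
```

The point of the sketch is the structural difference: adding a new kind of fact to the taxonomy means bending the hierarchy, while the graph simply gains another named relation.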
Thrill-K: Rethinking higher machine cognition
The issue of knowledge organization is a central one for Gadi Singer as well. Singer is VP and director of Emergent AI at Intel Labs. With one technology after another, he has been pushing the leading edge of computing for the past four decades and has made key contributions to Intel’s computer architectures, hardware and software development, AI technologies, and more.
Singer said he believes that the last decade has been phenomenal for AI, mostly because of deep learning, but there’s a next wave that is coming: a “third wave” of AI that is more cognitive, has a better understanding of the world, and higher intelligence. This is going to come about through a combination of components:
“It’s going to have neural networks in it. It’s going to have symbolic representation and symbolic reasoning in it. And, of course, it’s going to be based on deep knowledge. And when we have it, the value that is provided to individuals and businesses will be redefined and much enhanced compared to even the great things that we can do today,” Singer says.
In his upcoming keynote for Connected Data World 2021, Singer will elaborate on Thrill-K, his architecture for rethinking knowledge layering and construction for higher machine cognition.
Singer distinguishes recognition, as in the type of pattern-matching operation using shallow data and deep compute at which neural networks excel, from cognition. Cognition, Singer argues, requires understanding the very deep structure of knowledge.
To be able to process even seemingly simple questions requires organizing an internal view of the world, comprehending the meaning of words in context, and reasoning on knowledge. And that’s precisely why even the more elaborate deep learning models we have currently, namely language models, are not a good match for deep knowledge.
Language models contain statistical information, factual knowledge, and even some common sense knowledge. However, they were never designed to serve as a tool for knowledge organization. Singer believes there are some basic limitations in language models that make them good, but not great for the task.
Singer said that what makes for a great knowledge model is the ability to do well across five areas: scalability, fidelity, adaptability, richness, and explainability. He adds that language models sometimes contain so much learned information that we can extract it to enhance dedicated knowledge models.
To translate the principles of having a great knowledge model to an actual architecture that can support the next wave of AI, Singer proposes an architecture for knowledge and information organized at three levels, which he calls Thrill-K.
The first level is for the most immediate knowledge, which Singer calls the Giga scale, and believes should sit in a neural network.
The next level of knowledge is the deep knowledge base, such as a knowledge graph. This is where intelligible, structured, explicit knowledge is stored at the Terascale, available on demand for the neural network.
And, finally, there’s the world information and the world knowledge level, where data is stored at the Zetta scale.
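The layering idea behind the three levels can be sketched in a few lines of Python. This is an illustration of the fall-through pattern only (the stores and example facts are invented, and this is not Intel’s code): a query is answered from the cheapest, most immediate level that covers it.

```python
# Three knowledge levels, cheapest and most immediate first:
instant_knowledge = {"capital of France": "Paris"}       # Giga scale, in the neural network
structured_kb = {"capital of Mongolia": "Ulaanbaatar"}   # Tera scale, knowledge graph, on demand
world_information = {"capital of Palau": "Ngerulmud"}    # Zetta scale, external world information

def answer(query):
    """Fall through the three knowledge levels until one covers the query."""
    levels = [
        ("instant", instant_knowledge),
        ("knowledge-base", structured_kb),
        ("retrieval", world_information),
    ]
    for name, store in levels:
        if query in store:
            return store[query], name
    return None, "unknown"

print(answer("capital of Mongolia"))  # ('Ulaanbaatar', 'knowledge-base')
```

The design choice the sketch highlights is that only a small core of knowledge needs to live inside the network itself; everything else stays structured and explicit, retrieved on demand.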
Knowledge, Singer argues, is the basis for making reasoned, intelligent decisions. It can adapt to new circumstances and new tasks because the data and the knowledge are not structured for a particular task; they are there in all their richness and expressivity.
It will take concerted effort to get there, and Intel Labs on its part is looking into aspects of NLP, multi-modality, common sense reasoning, and neuromorphic computing.
Systems that learn and reason
If knowledge organization is something that both Dillinger and Singer value as a key component in an overarching framework for AI, for Frank van Harmelen it’s the centerfold in his entire career. Van Harmelen leads the Knowledge Representation & Reasoning Group in the Computer Science Department of the VU University Amsterdam.
He is also principal investigator of the Hybrid Intelligence Centre, a $22.7 million (€20 million), ten-year collaboration between researchers at six Dutch universities into AI that collaborates with people instead of replacing them.
Van Harmelen notes that after the breakthroughs of machine learning (deep learning or otherwise) in the past decade, the shortcomings of machine learning are also becoming increasingly clear: unexplainable results, data hunger, and limited generalizability are all becoming bottlenecks.
In his upcoming keynote at Connected Data World 2021, Van Harmelen will look at how the combination with symbolic AI, in the form of very large knowledge graphs, can give us a way forward: towards machine learning systems that can explain their results, that need less data, and that generalize better outside their training set.
The emphasis in modern AI is less on replacing people with AI systems, and more on AI systems that collaborate with people and support them. For Van Harmelen, however, it’s clear that current AI systems lack background knowledge, contextual knowledge, and the capability to explain themselves, which makes them not very human-centered:
“They can’t support people and they can’t be competent partners. So what’s holding AI back? Why are we in this situation? For a long time, AI researchers have locked themselves into one of two towers. In the case of AI, we could call these the symbolic AI tower and the statistical AI tower.”
If you’re in the statistical AI camp, you build your neural networks and machine learning programs. If you’re in the symbolic AI camp, you build knowledge bases and knowledge graphs and you do inference over them. Either way, you don’t need to talk to people in the other camp, because they’re wrong anyway.
What’s actually wrong, argues Van Harmelen, is this division. Our brains work in both ways, so there’s no reason why approximating them with AI should rely exclusively on either approach. In fact, those approaches complement each other very well in terms of strengths and weaknesses.
Symbolic AI, most famously knowledge graphs, is expensive to build and maintain as it requires manual effort. Statistical AI, most famously deep learning, requires lots of data, plus oftentimes also lots of effort. Both suffer from the “performance cliff” issue (i.e., their performance drops under certain circumstances, but the circumstances and the manner of failure differ between the two).
Van Harmelen provides many examples of practical ways in which symbolic and statistical AI can complement each other. Machine learning can help build and maintain knowledge graphs, and knowledge graphs can provide context to improve machine learning:
“It is no longer true that symbolic knowledge is expensive and we cannot obtain it all. Very large knowledge graphs are witness to the fact that this symbolic knowledge is very well available, so it is no longer necessary to learn what we already know.
We can inject what we already know into our machine learning systems, and by combining these two types of systems produce more robust, more efficient, and more explainable systems,” says Van Harmelen.
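A toy sketch can show what “injecting what we already know” means in practice. Everything here is invented for illustration (it is not Van Harmelen’s system): a knowledge graph supplies a symbolic type feature that a learner would otherwise need many training examples to infer from text alone.

```python
# Symbolic knowledge: entity types we already know, so we need not learn them.
knowledge_graph = {
    "aspirin": "drug",
    "ibuprofen": "drug",
    "python": "programming_language",
}

def features(sentence):
    """Bag-of-words features plus an injected symbolic type feature."""
    words = sentence.lower().split()
    feats = set(words)
    for w in words:
        if w in knowledge_graph:
            feats.add("TYPE=" + knowledge_graph[w])  # knowledge injection
    return feats

# A trivial overlap "classifier": pick the label of the training example
# that shares the most features with the input.
labeled = [("take aspirin daily", "medical"),
           ("learn python basics", "tech")]

def classify(sentence):
    return max(labeled, key=lambda ex: len(features(sentence) & features(ex[0])))[1]

# "ibuprofen" never appears in the training text, but the graph links it to
# the same TYPE=drug feature as "aspirin", so the sentence still classifies.
print(classify("try ibuprofen today"))  # medical
```

The hybrid gain is exactly the one Van Harmelen describes: the statistical side needs less data because the symbolic side already generalizes across entities of the same type.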
The pendulum has been swinging back and forth between symbolic and statistical AI for decades now. Perhaps it’s a good time for the two camps to reconcile and start a conversation. To build AI for the real world, we’ll have to connect more than data. We’ll also have to connect people and ideas.