Combined with the semantic web, the future of LLMs is brighter

Emre Barack Sokullu
Oct 12, 2023

This article was originally published at: https://emresokullu.blog/2023/10/11/combined-with-the-semantic-web-the-future-of-llms-is-brighter/

Today, LLMs resemble a compressed web. The technology is remarkable: a model can fit on your mobile phone, yet most of the time it responds more effectively than a global search engine like Google. The latter crawls and parses content hosted on the clouds of Amazon, Google, Microsoft, Alibaba, and Tencent, as well as an equally large number of websites that reside outside these silos. An open-source ChatGPT clone such as Llama 2, by contrast, can run on your mobile phone entirely offline. That’s why the invention of LLMs is as monumental as the creation of the internet.

Now, consider this: so far, ChatGPT has parsed only the open web and perhaps some private data silos whose owners negotiated access. Imagine if it could also navigate through social networks.

Well, it can’t, because all social networks are closed. Even the most open one, X (formerly known as Twitter), closed itself off after Elon Musk took the reins.

However, if social data were published in certain semantic web formats, this could become possible. You could then pose deeply personal questions to ChatGPT, such as:

  • “Would my friends Laura and Alev get along?”
  • “Is there anyone in my second-degree network who has a deep understanding of vector databases and substantial expertise in real-time PHP development?”

Both questions assume that Laura, Alev, and I all have publicly accessible personal websites marked up with microformats (my own website, emresokullu.com, uses them, for instance). The rest hinges on the magical capabilities of LLMs, empowered by the semantic web.
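To make the second question more concrete, here is a minimal Python sketch of the structured part of that lookup. It assumes the profile data has already been collected from each person's public website. The `PROFILES` dictionary, the `knows` and `skills` keys, and the people other than Laura and Alev are all hypothetical and made up for illustration; they are not part of any microformats vocabulary. The LLM would still handle the fuzzy part, such as judging what counts as "deep understanding".

```python
# Hypothetical profiles, as they might look after extracting structured data
# from each person's public website. All links and skills here are invented.
PROFILES = {
    "me":    {"knows": ["laura", "alev"], "skills": []},
    "laura": {"knows": ["me", "deniz"], "skills": ["photography"]},
    "alev":  {"knows": ["me", "kerem"], "skills": ["php"]},
    "deniz": {"knows": ["laura"], "skills": ["vector databases", "real-time php"]},
    "kerem": {"knows": ["alev"], "skills": ["vector databases"]},
}


def second_degree(profiles, person):
    """Return friends of friends, excluding the person and their direct contacts."""
    direct = set(profiles[person]["knows"])
    reachable = set()
    for friend in direct:
        reachable.update(profiles.get(friend, {}).get("knows", []))
    return reachable - direct - {person}


def find_experts(profiles, person, required_skills):
    """Return second-degree contacts who list every required skill, sorted by name."""
    required = set(required_skills)
    return sorted(
        name
        for name in second_degree(profiles, person)
        if required <= set(profiles.get(name, {}).get("skills", []))
    )


if __name__ == "__main__":
    print(find_experts(PROFILES, "me", ["vector databases", "real-time php"]))
```

With this made-up data, only one second-degree contact lists both skills, so the search returns a single name. Exact string matching on skills is the weakest part of the sketch, and it is exactly where an LLM could add value.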

You might now wonder, doesn’t Google already do that with Bard?

Firstly, with all due respect, Bard doesn’t perform as well as ChatGPT with GPT-4.

Secondly, Bard’s promise to integrate with your Gmail and Google Drive is impressive, and it is something all LLMs will need to catch up on (along with integrations for Dropbox, Slack, Outlook, etc.). The semantic web, however, is a different discipline.

While Dropbox and Gmail do provide access to your personal data, which is indeed quite enriching, the combination of the semantic web and LLMs operates in a gray area between public data and individual personal data. The potential here, as demonstrated in the previous examples, is immense.

Therefore, multimodality (for instance, communicating with LLMs through not only text but also audio and visuals), personal data silos, plus the critical addition of the semantic web, will render LLMs truly amazing!

The semantic web, and microformats in particular, lays the foundation for decentralized identities. Combined with fediverse feed mechanisms such as Mastodon, and enhanced by truly decentralized data stores like IPFS (as detailed in my whitepaper), it is poised to be revolutionary. However, this presumes we can overcome the private data silos of Facebook. As I anticipated, Instagram’s commitment to turning Threads into a fediverse product has not yet been realized, but competition will likely make it a reality one way or another.

In conclusion, although the semantic web did not advance and become Web 3.0 as we had hoped during the Web 2.0 era, I believe it may gain increased importance in this new era.

P.S.: You might be wondering what exactly the semantic web is, or how emresokullu.com becomes part of it. For clarification, I encourage you to visit microformats.org. What integrates my site into the semantic web is a handful of straightforward HTML class names. For instance, when I mention my company, Grou.ps, I wrap it in a ‘p’ tag with a class of ‘p-experience’, all under a ‘body’ tag with a class of ‘h-resume’. This turns my personal page into a sort of independent LinkedIn node. ChatGPT could theoretically parse my website as plain text and extract all this information without any markup, but that has two drawbacks: (1) it would be unnecessarily taxing computationally, and (2) it wouldn’t let me define hidden information, such as a hidden block of:

<div class="d-none">
<p>More info...</p>
<p class="p-education">Galatasaray Lisesi</p>
<p class="p-education">Bogazici University</p>
<time class="dt-bday">1983-01-15</time>
<p class="p-sex">m</p>
</div>

which enriches the website with information I may not necessarily want to display. You could even expand this concept to create a mini Match.com node using invented properties (microformats2 reserves an ‘x-’ segment after the prefix, as in ‘p-x-’, for experimental names) like

<p class="p-x-dating">f</p>
<p class="p-x-status">single</p>

🙃 why not?
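To show why this markup is cheap for a machine to read, here is a rough Python sketch that pulls the prefixed properties out of the hidden block above using only the standard library. It is a simplification, not the real microformats2 parsing algorithm: it ignores root `h-*` objects, nesting, and attributes such as `datetime`, and the `PropertyCollector` and `extract_properties` names are my own for this illustration. A dedicated parser library would be the better choice in practice.

```python
from html.parser import HTMLParser

# Property prefixes used by microformats2: p- (text), u- (URL),
# dt- (date/time), e- (embedded markup).
PREFIXES = ("p-", "u-", "dt-", "e-")


class PropertyCollector(HTMLParser):
    """Collects the text content of elements whose class has a property prefix."""

    def __init__(self):
        super().__init__()
        self.properties = {}  # property name -> list of text values
        self._open = []       # open elements as (tag, property names, text parts)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        names = [c for c in classes if c.startswith(PREFIXES)]
        self._open.append((tag, names, []))

    def handle_data(self, data):
        # Text belongs to every open element that carries a property.
        for _tag, names, parts in self._open:
            if names:
                parts.append(data)

    def handle_endtag(self, tag):
        # Find the nearest open element with this tag name.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                break
        else:
            return  # stray closing tag, ignore it
        # Close it together with anything left unclosed above it (e.g. <br>).
        closed = self._open[i:]
        del self._open[i:]
        for _tag, names, parts in closed:
            text = "".join(parts).strip()
            for name in names:
                self.properties.setdefault(name, []).append(text)


def extract_properties(html):
    """Return a dict mapping each prefixed class name to its text values."""
    collector = PropertyCollector()
    collector.feed(html)
    collector.close()
    return collector.properties


HIDDEN_BLOCK = """
<div class="d-none">
<p>More info...</p>
<p class="p-education">Galatasaray Lisesi</p>
<p class="p-education">Bogazici University</p>
<time class="dt-bday">1983-01-15</time>
<p class="p-sex">m</p>
</div>
"""

if __name__ == "__main__":
    for name, values in extract_properties(HIDDEN_BLOCK).items():
        print(name, values)
```

Note that the `d-none` class on the wrapping `div` is skipped because `d-` is not a property prefix; it only hides the block visually.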
