With the rapid adoption of large language models (LLMs), it became clear early on that real-world enterprise applications require something more than general AI knowledge. To be genuinely useful, especially within companies, models need to work not just with their own training data, but with internal company data—data that lives behind firewalls,...
I specialize in integrating AI systems into products and businesses.
Latest Articles
If you've already set up a vector database (I've previously covered the basics on this blog), you know how powerful it can be for semantic search, document retrieval, clustering related content, and much more. But trust me, the real excitement kicks in when you combine this database with a Large Language Model (LLM).
One particularly useful technique when working with vectorized documents is similarity detection: in simpler terms, identifying when an author may have copied or heavily borrowed from another source. Whether it's a literal copy/paste from another document or paraphrased content, semantic embeddings place texts with similar meaning close together in vector space, which lets us flag likely matches for review.
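To make the idea concrete, here is a minimal sketch of similarity detection over embeddings. It assumes you already have one vector per document; the three toy vectors, the document names, the `find_similar_pairs` helper, and the `0.9` threshold are all made up for illustration and are not taken from any specific model or vector database. Cosine similarity is one common way to compare embeddings, not the only one.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def find_similar_pairs(embeddings, threshold=0.9):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold.

    `embeddings` maps a document id to its vector. The threshold is a
    hypothetical starting point and would need tuning on real data.
    """
    flagged = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            flagged.append((id_a, id_b, score))
    # Most similar pairs first
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings (assumption:
# real embeddings typically have hundreds or thousands of dimensions).
toy_embeddings = {
    "original": [0.9, 0.1, 0.3],
    "paraphrase": [0.85, 0.15, 0.35],
    "unrelated": [0.1, 0.9, 0.2],
}

suspects = find_similar_pairs(toy_embeddings)
for id_a, id_b, score in suspects:
    print(f"{id_a} <-> {id_b}: {score:.3f}")
```

With these toy vectors, only the "original" and "paraphrase" pair clears the threshold. Comparing every pair is quadratic in the number of documents, so for a larger corpus you would more likely lean on the vector database's own nearest-neighbor search and treat a high score as a prompt for human review rather than proof of copying.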