ChatGPT and Bard (from Google) are currently hot topics. After initial enthusiastic responses to natural language and deep learning, attention switched to the unreliability of the information they provide. However, attention to these shortcomings rather misses the point. “Rubbish in, rubbish out” is a commonly accepted observation. Self-evidently, the reliance of these demonstrations on public domain sources will severely undermine the reliability of some of the answers.
Don’t shoot the messenger
The developers wanted to expose AI to diverse knowledge and perspectives, enabling ChatGPT to generate helpful, informative, and relevant responses on a broad range of subjects. As an AI language model, they trained ChatGPT using a wide variety of sources, with a significant portion coming from the public domain, including sources such as Wikipedia. Training data also includes books, articles, websites, and other sources. Internet sources are limited to information available up to and including September 2021.
An optimist will believe that the balance between wildly inaccurate Internet content and authoritative content will favour reliability. And I think there is evidence that this is true when looking at broad topics and well-documented subjects found in textbooks. But, unfortunately, this isn’t true for events handled almost exclusively in blogs, press releases, daily news and social media. Therefore, it’s hardly surprising that it’s straightforward to persuade ChatGPT to deliver wildly speculative or incorrect responses. It’s a direct result of the content it has consumed.
A finger on the Pulse
As some readers might know, BSL has developed a software suite (Pulse) for a global business consultancy. Currently running on an Azure/Cognitive Search platform, our Pulse software has access to a knowledge base containing millions of documents, most of which are provided by certified news agencies, newspapers and commercial data sources. Unlike most Internet content, these documents have authority, delivering reliable, verifiable information.
We process tens of thousands of new documents daily, classifying and selecting content to brief the clients’ employees via real-time news alerts. These briefings are curated by analysts and automatically distributed by our software. We use Cognitive Search to select appropriate content using a rich and flexible set of APIs.
ChatGPT and Pulse
We’ve been experimenting with ChatGPT using the recently announced Azure OpenAI Service. The applications of this technology are almost endless, starting with the introduction of natural language queries such as:
“What’s the current EV charging infrastructure in European countries?”
This example replaces the use of complex, structured queries containing keywords, such as:
“electric car” AND Europe AND headline_lead_metadata: “charging station”
We can combine Azure OpenAI and Azure Cognitive Search to use conversational language to select content, so this works more or less “out of the box”. And, of course, by using verified and reliable sources, our ChatGPT, trained on Pulse data rather than the Internet, will not provide incorrect or unqualified responses.
By training ChatGPT with Pulse content, we have many more quick wins. Low-hanging fruit includes creating alert summaries or comparisons between the articles we include in an alert. We also want to train ChatGPT to select the appropriate sources for specific subject areas. After all, we have several thousand sources in Pulse (7000+), and it’s helpful to narrow down sources to include only the most reliable and relevant information.
ChatGPT and Cognitive Search – A marriage made in heaven
The combination of Azure Cognitive Search and Azure OpenAI Service yields an effective solution for our scenario. It integrates Azure’s enterprise-grade characteristics, Cognitive Search’s ability to index, understand and retrieve the right pieces of data across large knowledge bases, and ChatGPT’s impressive capability for interacting in natural language to answer questions or take turns in a conversation.
This powerful combination releases end-users’ potential to intuitively drill down and interrogate their data. In addition, unlike any other tool we’ve used, ChatGPT understands the context of a question, making it easy to ask for new insights. Using the above example, after receiving the answer to “What’s the current EV charging infrastructure in European countries?” users could simply ask, “What about in France, specifically?” to receive a new answer.
Contact the Bright Side of Life
We have over 30 years of experience with content management, contextual search and knowledge databases. We’ve worked with many software tools for renowned Dutch organisations, including the ANP, the national library, and several national and regional news providers. If you’d like to talk with us about your data and to help unlock the knowledge in your data sources, please get in touch.
The image for this blog was created using the AI service from Midjourney.com.
A new blog about the Midjourney AI service will be published soon.