data.world August Product Launch

The August release of data.world brings a number of new and improved product capabilities, including an improved user interface for resource creation, real-time metadata sync with Databricks, a new metadata field that clarifies where catalog resources come from, and enhancements to the Microsoft, Salesforce, and Databricks collectors.

Also available now is an exciting improvement to our AI Context Engine™ that helps provide explainable answers from your structured data.

Read on to learn about these new features!


Active Directory authentication for Microsoft Collectors

The SQL Server, SQL Server Reporting Services, Power BI Report Server, and SQL Server Integration Services Collectors now support Active Directory domain credentials with the NTLM authentication type, allowing each collector to connect securely using Active Directory-managed authentication.


Salesforce Collector

The all-new Salesforce Collector catalogs rich metadata from Salesforce, helping you maintain a comprehensive inventory of Salesforce assets and supporting better governance, discovery, and utilization of data across your organization.

The new version of the collector harvests metadata for objects, fields, dashboards, and reports directly via Salesforce APIs.

Figure: An example collection from Salesforce


Databricks Collector harvests lineage to Amazon S3 and ADLS Gen2

The Databricks Collector now harvests External Locations, allowing users to understand cross-system lineage between Databricks, Amazon S3, and Azure Data Lake Storage Gen2.


AI Context Engine - new "detailed answer" endpoint

The new "detailed answer" endpoint in AI Context Engine works similarly to the existing "Answer Tool" endpoint, but it returns much more information, including:

  • answer - Textual response, the same as the "Answer Tool" endpoint returns
  • result - Raw data and schema (Frictionless Data format)
  • sparql - The SPARQL query
  • sql - The SQL query
  • targetSql - SQL queries executed against target systems
  • terms - Business terms that were used to generate the query
  • ontologyUsed - The parts of the ontology that were used to generate this response
  • evidence - The "thoughts" that were generated during the run (the same as what you might see in the debug chat tool, Archimedes)

In contrast, "tool" endpoints are simpler, returning only the response in order to integrate seamlessly with other LLMs (e.g., OpenAI).
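As a rough illustration of how a client might consume the fields listed above, here is a minimal Python sketch. Only the field names come from this announcement. The value types, the helper names (DetailedAnswer, parse_detailed_answer, present_fields), and the sample payload are assumptions made up for the example, and no request is sent to data.world. Consult the API reference for the actual request and response shapes.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional

# Field names as listed in this announcement; value types are assumptions.
FIELDS = (
    "answer", "result", "sparql", "sql",
    "targetSql", "terms", "ontologyUsed", "evidence",
)


@dataclass
class DetailedAnswer:
    """Hypothetical container for a "detailed answer" response body."""
    answer: Optional[str] = None
    result: Any = None          # raw data and schema
    sparql: Optional[str] = None
    sql: Optional[str] = None
    targetSql: Any = None       # SQL executed against target systems
    terms: Any = None           # business terms used to generate the query
    ontologyUsed: Any = None
    evidence: Any = None


def parse_detailed_answer(payload: dict) -> DetailedAnswer:
    """Keep only the listed fields; anything missing becomes None."""
    return DetailedAnswer(**{name: payload.get(name) for name in FIELDS})


def present_fields(detail: DetailedAnswer) -> list:
    """Return the names of the listed fields that carry a value."""
    return [name for name in FIELDS if getattr(detail, name) is not None]


# Invented sample payload, for illustration only (not real API output).
SAMPLE_RESPONSE = json.dumps({
    "answer": "There were 42 orders in July.",
    "sql": "SELECT COUNT(*) FROM orders",
    "terms": ["Order"],
    "unexpected": True,  # keys outside the listed fields are ignored
})

if __name__ == "__main__":
    detail = parse_detailed_answer(json.loads(SAMPLE_RESPONSE))
    print(detail.answer)
    print(detail.sql)
    print(present_fields(detail))
```

Treating every field as optional keeps the sketch tolerant of responses that omit some of the explanatory fields, which seems prudent given that the endpoint is expected to return additional information over time.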

In the future, this and other "detailed answer" endpoints in AI Context Engine will return additional information and evidence as we further deliver on accurate, explainable and governed answers from your structured data.

Link: https://developer.data.world/reference/callanswer


Source System metadata field

Source System is a new default field that consistently describes the system from which the catalog record metadata was sourced (e.g., Tableau). This field will be used to improve discovery by allowing types to be organized by source system, and it helps differentiate between ambiguous resource type names (e.g., Dataset).

To learn how to extend and configure this field for your custom types and collectors, see our product documentation.


Databricks Publisher real-time updates

The Databricks Publisher automation is now triggered automatically whenever a Databricks column or table description is updated in the data.world catalog via the UI or public API. This keeps metadata synchronized in real time between data.world and Databricks, eliminating the need for manual updates.

Databricks Publisher (announced last month) is currently in Beta. To get access, reach out to your Customer Success Manager.


Improved UX for resource creation

We’ve enhanced the UI for creating new resources in data.world. When users create a resource, a new multi-step wizard now replaces the old, small pop-up modal.

This new approach makes resource creation much easier for a wider variety of users. Thanks for your feedback on this important aspect of the catalog user experience!