data.world May Product Launch

The May release of data.world has something for everyone – user experience improvements for navigation and understanding, a new source of technical lineage, big performance improvements across multiple collectors, and a set of powerful and time-saving capabilities for the admins and program teams building and managing their catalog experience.

Read on to learn about these exciting new features!

New relationships summary and browsing

In our continued effort to streamline the user experience, we've rolled out a few changes to our catalog metadata resource details pages. These adjustments have been carefully designed to save valuable time and provide the information needed to understand and navigate related resources. Stay tuned for more updates to these pages in the coming months!

Collector performance improvements and Oracle lineage

This month, we’re excited to announce improvements to the overall runtime for a number of our collectors and an update to the Oracle collector for harvesting lineage metadata.

We are also excited to announce that both the Amazon DynamoDB Collector and Azure Data Factory Collector are now generally available.

Performance improvements across collectors

A number of performance improvements were made to collectors to improve their overall runtime and reduce memory usage. Customers may see up to 90% improvement depending on the collector and shape/weight of their database/data warehouse.

These include the Snowflake, Redshift, Databricks, Denodo, Oracle, PostgreSQL, Teradata, MySQL, Db2, Netezza, SQL Server, dbt Core, and dbt Cloud collectors.

Oracle collector harvests lineage metadata

The Oracle collector now harvests lineage relationships from Oracle views, stored procedures, and functions. With this new metadata, users can visualize and query how data moves within Oracle and other technologies.
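
As a rough illustration of what querying lineage can look like, the sketch below models lineage relationships as a simple directed graph and walks upstream from a view to everything that feeds it. The object names (such as `SALES.REVENUE_V`) and the `upstream_sources` helper are hypothetical; they are not part of the data.world or Oracle APIs, and in practice lineage is explored through the catalog itself.

```python
from collections import deque

# Hypothetical lineage edges: each key is derived from the objects in its list.
LINEAGE = {
    "SALES.ORDERS_V": ["SALES.ORDERS", "SALES.CUSTOMERS"],
    "SALES.REVENUE_V": ["SALES.ORDERS_V"],
}


def upstream_sources(graph, target):
    """Return every object that feeds `target`, directly or indirectly."""
    seen = set()
    queue = deque(graph.get(target, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


print(sorted(upstream_sources(LINEAGE, "SALES.REVENUE_V")))
```

A breadth-first walk is used here only because it is easy to read; any graph traversal would give the same set of upstream objects.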

Start exploring today

These new collector updates help users catalog their sources faster, facilitate troubleshooting for analysts, and increase trust for business end users. Learn more about what is supported in our documentation:

Oracle Collector documentation

Organization Details Public API

We’ve added a utility endpoint to our public API to surface organization details such as extended description and avatar for use in integration development. Visit the developer portal to learn more.

Catalog Resource Public API

We're delighted to announce an updated suite of API endpoints focused on flexible catalog management.

We’ve seen wide adoption of advanced catalog features like custom resources, relationships and integrations. Our public API now has full support for our “catalog anything” mission and brings the flexibility of the knowledge graph to the initiatives you are working on, both on and off the data.world platform.

Visit our interactive developer portal to learn more and try out the new functionality.

In-App Technical Reference

As a companion to the Catalog Management APIs, we’ve added a new in-app reference to each resource page that provides in-depth information about the ontology and configuration details for the resource. The Technical Reference page can be found by navigating to the “Settings” tab of any resource and clicking “Technical Reference” in the left navigation menu.

Use this reference as a starting point for taking full advantage of the power of the knowledge graph through SPARQL queries and our Public API.

The reference provides details about the supported relationships, metadata fields, selection values, asset statuses and type inheritance for the resource.

For a comprehensive introduction to the Technical Reference, visit our documentation portal.

User Management Utilities

Administrators with the Instance Admin role now have expanded capabilities for managing active users of the platform. Administrators can now self-serve on deactivating users when someone leaves the company or should no longer have access to the system.

We’ve also enabled instance admins to promote other users to the role without needing to do so through data.world support.

Access Audit Utility

Administrators are often asked to troubleshoot access issues and confirm that access has been appropriately revoked when users change roles or need help. The user management portal includes a quick reference utility to check the access level a user has to any resource on the platform.

You can learn more about both of these features in our documentation.

data.world April Product Launch

April brings multiple new metadata collection capabilities to data.world, including Collector enhancements for Snowflake and Databricks, and a new Collector for Amazon Managed Streaming for Kafka.

Read on to learn about these exciting new features!

Catalog Snowflake Streamlit Apps, Databricks Tags, and Amazon Managed Streaming for Kafka (MSK) assets

We’re excited to announce updates to the Snowflake and Databricks collectors that harvest more metadata, along with new collector support for Amazon Managed Streaming for Kafka. These updates gather more metadata from these systems and seamlessly bring it into our data.world platform. This metadata helps both technical and non-technical users discover and understand their data quickly, govern their data with greater context, and increase trust in data by providing information about data health and transformations.

All new features are generally available.

The Snowflake Collector harvests metadata from Streamlit in Snowflake

The Snowflake Collector now catalogs metadata from Streamlit in Snowflake, facilitating better governance, discovery, and utilization of Streamlit apps across your organization.

The metadata harvested for Streamlit apps includes comments, owners, creation date, and root location. From data.world, users can discover apps and navigate directly to the app in Snowflake.

An example of a Streamlit app

Databricks Collector harvests Databricks tags

The Databricks Collector now catalogs tags from Databricks catalogs, schemas, tables, and columns. Tags are used in Databricks to simplify the search and discovery of data assets. With these tags now in data.world, users can quickly discover data assets in Databricks. For instance, product teams can now build their data products in Databricks and identify them in data.world.

An example of Databricks Tags

Amazon Managed Streaming for Kafka (MSK) Collector

The new Amazon Managed Streaming for Kafka (MSK) Collector catalogs metadata from Amazon MSK, helping maintain a comprehensive inventory of MSK assets, facilitating better governance, discovery, and utilization of data across your organization.

This collector harvests metadata for clusters, brokers, topics, consumers, and producers.

An example collection from Amazon MSK

Start exploring today

These new collector updates help users understand where their data comes from, facilitating troubleshooting for analysts and increasing trust for business end users. Learn more in our product documentation.

data.world March Product Launch

March brings a host of new capabilities to data.world, including a new Snowflake integration for Tag Syncing, two new collectors (Power BI Report Server, Amazon QuickSight), a highly-requested interface improvement to better understand relationships, and a Chrome Extension for Hoots.

Read on to learn about these exciting new features!

Snowflake Tag Sync Automation [beta]

This automation allows users to edit and create new Snowflake tags within the data.world platform and then sync those Snowflake tags between Snowflake and data.world.

Key Features:

Easily edit and create new Snowflake tags using data.world’s simple user interface
Sync edited/new Snowflake tags back to Snowflake with the push of a button
Display Snowflake tags in a new section titled "Snowflake Tags" on resource pages
data.world becomes the source of truth for Snowflake tags when this automation is enabled

Why integrate your Snowflake tags in data.world?

Creating and editing tags is a breeze within data.world’s UI. Snowflake tags are powerful governance tools that allow users to apply policies, control access, and discover resources. An easier method for users to create and edit tags via data.world means it’s easier to govern Snowflake resources.

Inside the data.world platform, you can view Snowflake Tags on Snowflake resource pages, like this Column page. You can also Sync the tags back to Snowflake with a simple push of a button.

The Snowflake Tag Sync Automation is currently in beta and is available as part of the Data Governance Premium offering. If you are interested in this feature, please reach out to your Customer Success Director and they will help enable the feature for you. You can read our product documentation here for full details.

Relationships as Fields

You asked, we listened. Our latest improvement is crafted from the desire to streamline the enrichment experience, making it easier to build, manage, and see important relationships. This capability allows metadata fields to be built using custom relationships between resource types, providing a more intelligent way to manage metadata and inspire users to build relationships. For example, if you cataloged your Teams and Data Products, you might want to create a relationship to show which teams govern which products (screenshot below). Your users might see and navigate to Team resources via Data Products, or vice-versa.

These new types of fields help users see, understand, and navigate relationships and show the knowledge graph at work. This enhancement complements our flexible architecture that allows you to build custom Types, Fields and Relationships, deciding what to catalog and how best your users might want to navigate the related resources. This feature focuses on the philosophy that there are many kinds of relationships, some of which have an "attribute-like" utility, rather than just a related object.

You can read more about how you might use this feature in our documentation. This is now available to all enterprise customers using either MDP or Catalog Toolkit for configuration.

We hope you enjoy the new opportunities this enhancement brings to your catalog.

New collectors for Power BI Report Server and Amazon QuickSight

We’re excited to announce two new collectors: Power BI Report Server Collector and Amazon QuickSight Collector, which gather metadata from these systems and seamlessly bring it into the data.world platform. This metadata helps both technical and non-technical users discover and understand their data quickly, govern their data with greater context, and increase trust in data by providing information about data health and transformations.

Both new collectors are available in Private Preview. Please contact your Customer Success Director if you are interested in participating in a Private Preview program. More information is available in our product documentation: Power BI Report Server collector, Amazon QuickSight collector.

Hoots Browser Extension for Google Chrome

The data.world extension for Google Chrome is now available in the Chrome Web Store. It automatically displays Hoots badges on the data products where your organization’s users are using data and making decisions.

Now the valuable data trust signals, related glossary terms and additional context from your data.world catalog Hoots configuration are more easily displayed on your BI & analytics applications. With the Hoots Browser Extension capability, Hoots can be shown on Tableau, Power BI, Looker, and any other web-based application, and there’s no integration required to embed the Hoot display.

More information about the Hoots Browser Extension is available in our product documentation: Using Google Chrome Extension for Hoots.

data.world January Product Launch

We are excited to announce the launch of new features and our latest improvements:

Cloud Collectors - configure and run collectors hosted by data.world NEW
Support for Snowflake Data Quality - collect and catalog Snowflake Data Metric Functions (DMFs) NEW
Bulk operations UX improvements - streamlined bulk enrichment workflow IMPROVED
Enrichment and discovery UX improvements - more context and default sorting IMPROVED

Read the sections below for full details on each new feature!

NEW Introducing: Cloud Collectors!

We are excited to announce the launch of Cloud Collectors, the newest way to collect metadata on data.world!

Now, you can configure and run collectors that are hosted by data.world with just a few clicks! This feature not only provides a no-code way to start bringing metadata into your catalogs faster, it also has robust functionality around scheduling and monitoring to make setup more transparent and seamless. If you have cloud-accessible data sources that you're ready to bring into your catalog, this feature is for you!

👩‍💼 How can I use Cloud Collectors?

Users with Admin access will see a new option in the collector setup wizard that says "Cloud."

Once you enter your source information, you will be able to set a custom name for your collector configuration, and set a schedule for how frequently the collector should run.

After a collector completes, you will see the metadata and resource types that were collected, as well as the source information you entered while setting up the collector. Here you will also find what might have gone wrong if the collector run failed, and you'll have the ability to cancel the run as well.

You can view all of the collectors you have set up, whether they are from collectors that you host or Cloud Collectors, on the Metadata Collection tab. From here, you can view, edit, and delete collector configurations. And if you're setting up multiple collectors for one source with the same credentials, try the "Duplicate Configuration" button to quickly set all of them up.

For a full list of supported sources and more details on the feature, please refer to the documentation here.

NEW Announcing support for Snowflake Data Quality

We are thrilled to introduce an exciting addition to our existing Snowflake collector – support for Snowflake’s brand-new Data Quality feature, currently available in private preview. This enhancement empowers users to elevate their data quality assessment to new levels.

Key Features:

📊 Collect and catalog Snowflake Data Metric Functions (DMFs): Users can now measure the quality of their data using Snowflake’s powerful "data metric functions" (DMFs) and catalog this context with data.world. Example DMFs include Null Count, Unique Count, and Freshness – providing comprehensive insights into the health of your data.

🔍 Find and understand data quality metrics: The DMFs and observations (recorded metrics) are seamlessly integrated into resource pages on your data.world platform and are also presented as Hoots associated with Snowflake tables and views. This user-friendly interface makes it easy for individuals across your organization to discover and understand data quality metrics effortlessly.
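
To make the example metrics concrete, the sketch below computes a null count, a unique count, and a freshness value over a small in-memory table. It only illustrates the idea behind each metric: it does not call Snowflake or data.world, and the row data, column names, and helper functions are hypothetical. Snowflake's own DMF definitions may differ in detail, for example in how nulls are treated.

```python
from datetime import datetime, timezone

# Hypothetical sample rows standing in for a table.
ROWS = [
    {"email": "a@example.com", "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"email": None, "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
    {"email": "a@example.com", "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
]


def null_count(rows, column):
    """Number of rows where `column` is missing."""
    return sum(1 for row in rows if row[column] is None)


def unique_count(rows, column):
    """Number of distinct non-null values in `column` (an assumed definition)."""
    return len({row[column] for row in rows if row[column] is not None})


def freshness_seconds(rows, column, now):
    """Seconds elapsed between the newest timestamp in `column` and `now`."""
    latest = max(row[column] for row in rows)
    return (now - latest).total_seconds()


as_of = datetime(2024, 1, 4, tzinfo=timezone.utc)
print(null_count(ROWS, "email"))
print(unique_count(ROWS, "email"))
print(freshness_seconds(ROWS, "updated_at", as_of))
```

In the catalog, recorded values like these appear as observations on the resource page rather than being computed by the user.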

Why Snowflake Data Quality & data.world?

🌐 Compliance & Consistency: In today's data-driven landscape, ensuring compliance and consistency is paramount. This Data Quality feature & integration help you meet these standards by offering real-time insights into critical data metrics.

🔒 Build Trust: Trust is the foundation of effective data utilization. This Data Quality integration helps users to trust their data by bringing metrics related to freshness, blank values, and inaccuracies to the catalog and everyday tools, such as Tableau and Power BI, via Hoots.

Who Benefits?

👩‍💼 Data Stewards, Engineers, Admins: Empower your data stewards and technical teams by providing them with a tool that gives immediate insights into the current state of their data based on specific metrics.

🚨 Data Consumers: With Hoots, you can identify and take swift action on tables and views that require attention, ensuring data quality monitoring is seamlessly integrated with considerations for cost, consistency, and performance.

Experience a new era of data quality and reliability with data.world’s support for Snowflake's Data Quality today!

Note: Snowflake Data Quality is an enhancement within the existing Snowflake collector, and is currently available to Snowflake Private Preview customers. ❄️🚀

Using Hoots, users can quickly see data quality issues, like duplicate data, and easily fix the errors.

Improvements and Enhancements

IMPROVED Improvements to Bulk Operations UX

Bulk operations are a crucial part of keeping a catalog updated and accurate. We're excited to announce some improvements that will streamline and accelerate bulk operations such as bulk editing tags and attributes and bulk moving resources between collections.

First, we have consolidated these operations into a single menu for each place you can initiate a bulk operation (the Glossary tab, the Resources tab, and the Collection Contains tab). Now you can Quick edit, Add resources to collections, and Export/Import resources from all three locations.

Next, we've added the granular selection experience that previously existed only in Quick edit to the Export/Import spreadsheet flow as well. This is available on all three entry points (Glossary, Resources, Collections), which should significantly reduce the time it takes to make changes via the spreadsheet option.

Finally, we've simplified and clarified the experience around moving resources between collections. Previously this experience only existed within the Quick edit flow, but now you can select 'Add to Collections' or 'Move or Add Collections' to access this functionality. From the Glossary and Resources tabs, you'll be able to add resources to one or multiple collections, and from the Collection tab (example below), you'll be able to add resources to one or multiple collections, or move resources from one or all collections to one or multiple collections.

With these improvements, administrators and curators will be able to perform bulk operations on resources much more quickly. For more information, please refer to the documentation for bulk editing resources here and for bulk editing glossary here.

IMPROVED Added context in various search experiences

The suggested search dropdown now has more context, including the list of collections, owning Organization or User profile, and more. We’ve also added more context to the search experience when a user is relating one resource to another. This added context makes it easier to see and understand what has already been added.

IMPROVED Default sorting improvement + column index sort

We’ve provided a default sort experience that makes scanning the related, contained, and column resources faster. We also added column index as a sort option so users can understand the original column order from the database.
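
The difference between an alphabetical sort and a column index sort can be sketched as follows. The column records and the `index` key are hypothetical stand-ins for the ordinal position a database reports for each column; they are not the catalog's actual field names.

```python
# Hypothetical column metadata; `index` is the column's position in the source table.
COLUMNS = [
    {"name": "order_id", "index": 1},
    {"name": "customer_id", "index": 2},
    {"name": "amount", "index": 3},
]

by_name = [col["name"] for col in sorted(COLUMNS, key=lambda col: col["name"])]
by_index = [col["name"] for col in sorted(COLUMNS, key=lambda col: col["index"])]

print(by_name)   # alphabetical order
print(by_index)  # original order from the database
```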

IMPROVED Expansion of the Summary field

The Summary field is now available on all resource types out of the box, with no configuration required.

IMPROVED Rich Text Editing without Markdown

Multi-line fields on catalog resources support Rich Text for more engaging and understandable content, and now these fields can be edited in a What-You-See-Is-What-You-Get (WYSIWYG) user experience rather than users having to create and edit content using Markdown.

Markdown editing is still available for users that prefer it, but now more data owners and users can create compelling rich text content.

Introducing Three New Collectors: Azure Data Factory, DynamoDB, and Teradata

We’re excited to announce three new collectors: Azure Data Factory (ADF), DynamoDB, and Teradata. These collectors gather metadata from these systems and seamlessly bring it into our data.world platform. This metadata helps both technical and non-technical users with discovering and understanding their data quickly, governing their data with greater context, and increasing trust in data by providing information about data health and transformations.

Azure Data Factory Collector: Detailed Data Tracking

The ADF Collector allows users to understand how their data was moved or transformed, the format changes it underwent, and its migration journey to build a foundation of trust. This collector fetches metadata for Factory, Pipeline, Activity, Linked Service, Dataset, Dataflow, Trigger, Integration Runtime, and Global Parameter within Azure Data Factory. It also provides column-level Lineage, showing how data moves between ADF Datasets and connected sources like Snowflake, Databricks, S3, and ADLS. This helps users understand data movements and transformations, increasing trust. It also allows monitoring of pipelines for health checks, boosting confidence in data integrity and reliability.

DynamoDB Collector: Simplified Discovery

Our DynamoDB Collector helps users discover and understand DynamoDB resources. This collector captures deep metadata for Tables and Streams. It’s useful for both technical users managing DynamoDB resources and non-technical users exploring metadata and understanding how they can use DynamoDB resources through an intuitive interface. Technical users will appreciate getting insight into DynamoDB resources, including Tables, Keys, Indexes, and more.

Teradata Collector: Comprehensive Data Insight

The Teradata Collector allows users to see a holistic view of all their Teradata assets to help them manage and discover their data. This collector covers metadata for Database, Table, SQL Procedures, User Defined Functions, View, External Procedures, Triggers, User Defined Methods, and User Defined Types. It also offers Profiling and Lineage, showcasing column-level lineage between views and sourced columns, plus lineage for stored procedures. Users can track ownership and freshness of Databases and Tables, which helps understand data quality. Users can also see metadata about how the data was queried via SQL procedures, user defined functions and methods, and triggers, boosting trust in data products.

Start Exploring Today

These collectors enhance how data users explore, discover, understand, and trust their data. Whether you're a tech pro or not, these tools make navigating metadata easier and help teams become more data-driven. Dive into your data world with these new collectors, and embark on a journey of empowered decision-making.

Happy exploring, The data.world Team

SQL Server Reporting Services (SSRS) support for metadata collection is now Live!

Announcing our newest metadata collector - SQL Server Reporting Services (SSRS)! This collector is designed to provide you with an effective solution for extracting metadata from your SSRS environment into your data.world catalog. Our integration facilitates the automated extraction, organization, and presentation of specific metadata elements from your SSRS system. You'll gain valuable insights into your datasets, data sources, folders, KPIs, reports, and linked reports – all within your easily navigable catalog.

With the SSRS collector, you can:

Learn more about your reports and data, including who created a report or dataset and when they were last updated, helping you understand and trust your data
See the lineage of which datasets were used in a report, allowing you a comprehensive view of the data flowing into a report
Keep track of KPIs from SSRS and integrate them with business metrics from other source systems, all within one easy-to-use catalog, leading to better data-informed decisions

Are you ready to unlock the potential of your SQL Server Reporting Services? You can read more about how this collector works and all it harvests in the documentation. This collector is Tier 2 for Enterprise customers, and is available in dwcc version 2.151 and later.

An example of metadata from an SSRS Report, including Lineage:

Announcing Enhanced Email Notification Options

Visit your notifications settings page to customize the transactional emails you receive from data.world.

You can choose to:

Turn off all non-essential email communications
Unsubscribe from a category of email notifications
Customize which digests you receive
Customize dataset and project activity notifications

Learn more

Announcing support for Confluent Kafka metadata

Announcing our newest metadata collector - Confluent Kafka! We know how important it is to have the most up-to-date streaming data, so we’ve created this collector to allow you to easily monitor and collect Kafka metadata from your Confluent streaming platform.

With Kafka metadata in data.world, you and your teams can:

Easily discover and monitor streaming metadata for real-time applications
Understand what is being streamed from on-prem and cloud Confluent
Have a single source of truth for your Confluent schemas for better discovery and governance

The data.world Confluent Collector is actually two collectors, one for Confluent Platform (on-prem) and one for Confluent Cloud. With these collectors, you can capture, store, and analyze metadata including Cluster, Consumer, Producer, Broker, Partition, Schema, Consumer Group, Topic, and Environment (for Cloud). The collectors can optionally harvest metadata from Avro, JSON-schema, and Protobuf schemas stored in Confluent Schema Registry.

These Collectors are Tier 2 for Enterprise Customers. You can read the full documentation for Confluent Platform here and for Confluent Cloud here.

An example of metadata for an Avro Schema in the data.world platform

Announcing Azure Data Lake Storage Gen 2 Collector and Databricks Collector Lineage and Jobs

We’re excited to announce new enhancements to data.world’s Databricks Collector and a brand new Collector for Azure Data Lake Storage Gen 2! With the help of these additional metadata harvesting and lineage capabilities, you can now get more detailed insights into your data than ever before.

Our Databricks Collector allows you to quickly and easily collect metadata from your Databricks environment into data.world. Now, with the addition of Jobs harvesting and lineage capabilities, you can get a deeper understanding of where your data is coming from, how it’s being used, and what insights you can discover.

Our new Jobs harvesting feature allows you to collect additional information about your workflows, such as creator, description, success, schedule, and more. This lets you better understand how and why your data was transformed.

The new lineage capabilities let you track your data’s journey, from its source all the way through its transformations. This means you can easily trace your data’s history, identify potential bottlenecks or sources of errors, and quickly gain an understanding of how your data has changed over time.

Our Azure Data Lake Storage Gen 2 Collector allows you to bring insights about your data storage layer into data.world. With this Collector, you can efficiently harvest metadata about Blobs and Containers, including the owner, last modified, path, and more. This information is vital for understanding your underlying data, leading to more trust and confidence in your data-driven decision-making.

You can learn more about these Features in our Databricks documentation and our Azure Data Lake Storage documentation. Both these Collectors are Tier 2 for Enterprise Customers.

An example of ADLS Blob metadata in the data.world platform

data.world Usage and Audit Events now available as a Snowflake Marketplace Private Listing

As a Snowflake Powered By partner, data.world is proud to announce that usage and audit event data, previously only available in a data.world dataset, is now available as a Snowflake Marketplace Private Listing. This gives data.world customers access to their full history of data.world events data via the Snowflake Data Cloud, enabling high performance and advanced analytic functions on this data. It also makes data.world events and logging data available via Snowflake with no ETL required for integration in a wide variety of use cases. To read more about this capability and how to request access to a Private Listing, please see our documentation here.
