Launching Sigma and InfluxDB Collectors


🚀 Exciting News: Launching Two New Metadata Collectors on data.world! 🚀

Today, we are excited to announce the release of two new metadata collectors on data.world: the Sigma Collector and the InfluxDB Collector. These tools are designed to simplify and supercharge your data integration and management capabilities.

🔍 Why you'll love the Sigma Collector:

  • For Sigma users, governance can sometimes be a challenge. We’re here to help! Now, with data.world, you can gain clear visibility into your Sigma workbooks, files, datasets, and more, ensuring data protection, control, and traceability.
  • Business decision-makers can now effortlessly visualize and integrate Sigma's KPIs and metrics with data from other sources, leading to informed decisions.
  • Data analysts will have improved data trust levels. How? Metadata for Sigma workbooks and elements (e.g., titles, descriptions, authors, last updated info) is now easily accessible within data.world, giving quick and deep insight.
  • Gain valuable insight into data quality and health via a Sigma Hoot, our latest DataOps feature. Read more about Hoots in this What's New post from August.

🌟 Sigma Collector Features:

  • Metadata Harvested: Workspace, Workbook, Folder, Dataset, Connection, Tag, Grant, Member, Team, and Data Element.
  • Lineage: Capture inter-system lineage within Sigma, connecting to tables, datasets, and other data elements in the workbook.

An example of Sigma Workbook metadata inside the data.world platform, including who created and last updated the workbook, its permissions, and its lineage.

🔍 How the InfluxDB Collector helps you:

  • Enhance your visualization dashboards! With InfluxDB being a pivotal data source for Grafana, it’s essential to understand the intricate Influx-Grafana relationship. We make that possible.
  • Dive deep into metadata about buckets, tasks, and measurement columns, empowering users to easily locate and monitor time series data, coupled with insights on its processing.

🌟 InfluxDB Collector Features:

  • Metadata Harvested: Bucket, Measurement Schema, Measurement Column, Task, Label, Organization, and Telegraf Configuration.

An example of metadata from an InfluxDB Task inside the data.world platform, including the expression, last run status, and task status.

In a world where data is continuously evolving, it’s crucial to have the right tools for discoverability, integration, and governance. With our new collectors, we aim to make your journey smoother, more insightful, and more powerful.

Sigma, a Tier 1 collector, and InfluxDB, a Tier 2 collector, are both available immediately for Enterprise Customers. Read the documentation for full details.

Hoots and BB Bots available for enterprise customers

Note – in January 2024, BB Bots were renamed Sentry Bots

Release Notes – August 17, 2023

Announcing the launch of Hoots and BB Bots, the latest in our set of DataOps application features, free to all tiers of our enterprise catalog customers.

What problems do Hoots and BB Bots solve? Hoots bring the relevant information from the catalog to your data-consuming teams (analysts, scientists, executives, etc.) and provide simple communication and timely updates about data quality and freshness via BB Bots. Together, these features increase communication and trust, and they save your data engineering team the valuable time otherwise spent re-answering the same data questions across your data-consuming teams.

What is a Hoot? A Hoot surfaces important context about your data – including data quality and usage information – directly to the applications being used to make data-driven decisions. This saves data producers time that would otherwise be spent answering questions about the state of the data and ensures that data consumers have the context they need to use data confidently.


How do Hoots work? Hoots are simple trust badges that turn green, red, or yellow depending on the health status of your data pipeline. Hoots are configured from the catalog and added to your web-based data product, where they show users the health status along with further information fed automatically from the catalog and from automated monitors called BB Bots.


What is a BB Bot? BB Bots are automated monitors that change the status color of the Hoots, providing a trust signal to end-users and allowing data engineers more time to investigate issues and less time answering and re-answering questions.

How do BB Bots work? BB Bots monitor the data.world Data Catalog Platform and other orchestration and observability tools, like Airflow, Monte Carlo, dbt and Matillion. BB Bots automate the communication of data quality and health status and surface this information to the Hoot where it can provide important context alongside other information from the catalog, like definitions, lineage, owner, and policies. All of this information is surfaced in the Hoot that lives on the applications that data consumers are using, like Looker, PowerBI, and Tableau.
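To make the relationship between monitors and the badge concrete, here is a minimal sketch of how several check results could roll up into one Hoot color. It is an illustration only, not data.world's implementation: the `HootStatus` enum, the `hoot_status` function, the "pass"/"warn"/"fail" result labels, and the "worst result wins" rule are all assumptions made for this example.

```python
from enum import Enum


class HootStatus(Enum):
    """Badge colors named in the post; the meaning of each is assumed here."""
    GREEN = "green"    # assumed: all checks healthy
    YELLOW = "yellow"  # assumed: at least one warning
    RED = "red"        # assumed: at least one failure


# Hypothetical severity ranking: a higher number is a worse result.
_SEVERITY = {"pass": 0, "warn": 1, "fail": 2}
_STATUS_BY_SEVERITY = [HootStatus.GREEN, HootStatus.YELLOW, HootStatus.RED]


def hoot_status(check_results):
    """Return a badge color for a list of "pass" / "warn" / "fail" results."""
    if not check_results:
        # Assumption: with no monitor results yet, show a cautious yellow.
        return HootStatus.YELLOW
    worst = max(_SEVERITY[result] for result in check_results)
    return _STATUS_BY_SEVERITY[worst]


if __name__ == "__main__":
    # Example: one warning among otherwise passing checks.
    print(hoot_status(["pass", "warn", "pass"]).value)
```

Letting the worst result decide the color keeps the badge conservative: a single failing monitor is enough to warn data consumers before they rely on a dashboard.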


To find out how to configure a Hoot, you can read more about these features in our product documentation and enroll in the DataOps and BB Bots course available at data.world University.

SQL Server Reporting Services (SSRS) support for metadata collection is now Live!

Announcing our newest metadata collector - SQL Server Reporting Services (SSRS)! This collector is designed to provide you with an effective solution for extracting metadata from your SSRS environment into your data.world catalog. Our integration facilitates the automated extraction, organization, and presentation of specific metadata elements from your SSRS system. You'll gain valuable insights into your datasets, data sources, folders, KPIs, reports, and linked reports – all within your easily navigable catalog. 

With the SSRS collector, you can:

  • Learn more about your reports and data, including who created a report or dataset and when they were last updated, helping you understand and trust your data
  • See the lineage of which datasets were used in a report, allowing you a comprehensive view of the data flowing into a report
  • Keep track of KPIs from SSRS and integrate them with business metrics from other source systems, all within one easy-to-use catalog, leading to better data-informed decisions

Are you ready to unlock the potential of your SQL Server Reporting Services? You can read more about how this collector works and all it harvests in the documentation. This collector is Tier 2 for Enterprise customers and is available in dwcc version 2.151 and later.

An example of metadata from an SSRS Report, including Lineage:


Announcing Enhanced Email Notification Options

Visit your notifications settings page to customize the transactional emails you receive from data.world.

You can choose to:

  • Turn off all non-essential email communications
  • Unsubscribe from a category of email notifications
  • Customize which digests you receive
  • Customize dataset and project activity notifications

Learn more

🚀 Introducing data.world's New Governance Application - What's New! 🚀

We are thrilled to announce the launch of data.world's powerful new Governance Application, designed to automate time-consuming tasks, streamline workflows, and enable you to begin to treat data like a product.  With this latest release, we are bringing you a suite of exciting features to help ensure data integrity, compliance, and security while empowering you to make data-driven decisions with confidence.

🎥 Watch our Digital Event on Best Practices using our new Governance Application Features here: https://data.world/resources/webinar/how-automation-unlocks-productive-engaging-data-governance/ 

🎯 Core Automations - Standard for All Customers 🎯

1️⃣ Default Value Assignment: Set default values for specific metadata fields, ensuring consistent and accurate data entry. Learn more in our Product Docs [link].

2️⃣ Inherited Assignment: Easily inherit metadata from related resources, saving time and reducing redundancy. Product Docs [link] have all the details.

3️⃣ Metadata Completeness: Get automated insights into the completeness of metadata across your datasets, enabling better data tracking and quality control. Explore how it works in our Product Docs [link].

4️⃣ Metadata Freshness Review: Want to set a cadence by which your stewards must review metadata and ensure it’s still fresh? Learn how to set it up in our Product Docs [link].

5️⃣ Metadata Freshness Refresh: Ensure your metadata remains up-to-date with automated refreshes. Check out our Product Docs [link] for step-by-step instructions.

6️⃣ Sensitive Data Discovery: Safeguard your sensitive information by automatically identifying and managing sensitive data elements. Get started with our Product Docs [link].
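As a rough illustration of what two of these automations compute, the sketch below fills in default values and scores metadata completeness for a set of resources. It is a simplified model, not the Governance Application's actual logic: the function names `apply_defaults` and `completeness`, the field names, and the rule that empty strings, empty lists, and `None` count as "missing" are all assumptions for this example.

```python
def _is_missing(value):
    """Assumed rule: None, empty strings, and empty lists count as unset."""
    return value is None or value == "" or value == []


def apply_defaults(metadata, defaults):
    """Return a copy of `metadata` with defaults filled in for unset fields."""
    populated = {k: v for k, v in metadata.items() if not _is_missing(v)}
    # Explicitly populated values take precedence over defaults.
    return {**defaults, **populated}


def completeness(resources, required_fields):
    """Return, per resource, the fraction of required fields that are set."""
    if not required_fields:
        raise ValueError("required_fields must not be empty")
    scores = {}
    for name, metadata in resources.items():
        filled = sum(
            1 for field in required_fields
            if not _is_missing(metadata.get(field))
        )
        scores[name] = filled / len(required_fields)
    return scores


if __name__ == "__main__":
    # Hypothetical catalog resources and required fields.
    catalog = {
        "orders": {"description": "Order facts", "owner": "", "status": None},
        "customers": {"description": "Customer dimension", "owner": "data-team"},
    }
    print(completeness(catalog, ["description", "owner", "status"]))
```

Running the completeness score before and after `apply_defaults` is one simple way to see how much a default-value rule would close the gap.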

💡 Query Based Actions App - Unlock Advanced Capabilities 💡

For our advanced users, we are introducing the Query Based Actions App - a game-changer! Utilize this app to perform custom actions based on scheduled queries, providing unparalleled flexibility. To gain access, complete our upcoming Training Course [coming soon] or reach out to your Designated Customer Success Director.

🏆 Premium Automation - Exclusive Benefits for Paid Customers 🏆

🌟 Access Request Approval: Take control of your data access with this premium automation. Easily manage access requests and ensure data privacy and security. Set up sophisticated, multi-step workflows to manage your end users' data requests more effectively, and enable your business users to request or grant access to data with one click. Integrate with ServiceNow to extend the automation to your ticketing system. Contact your Customer Success Director to activate this powerful application feature.

See it in action!

🌟 Task Management: Native task management provides one central place in data.world where any user can view all the tasks they need to complete and act on them immediately.

A suite of additional premium automations is coming in upcoming releases.


🏃‍♂️ Early Adopter Program - Our customers helping us mold our product 🏃‍♀️

We are excited to offer an Early Adopter Program to the first round of our customers. Early Adopter Customers receive dedicated support and guidance throughout the adoption process and provide us with valuable feedback to shape future improvements.

🔄 Continuous Improvement and Iteration 🔄

At data.world, we are committed to your success. We will continuously improve and iterate on our Governance Product based on your feedback and needs. Expect exciting enhancements and new features over the next quarters.

🔧 Easy Configuration for Administrators 🔧

Setting up these automations is a breeze for any admin! No technical knowledge is required for any core automation except the Query Based Actions App, ensuring that you can start leveraging the benefits right away.

Don't miss out on the opportunity to revolutionize your data governance practices. Embrace the power of data.world's new Governance Product today!


Improvements to the Metadata Collectors Page and CLI Command Builder

We are thrilled to announce the General Availability of the Metadata Collectors page and CLI Command Builder tool! In addition, we've introduced the ability for users to create, manage, and delete Service Account tokens. These three features empower catalog administrators to set up on-premises collectors more quickly, so catalog users can start discovering and understanding your data faster. In addition, seeing all the collectors (on-premises or cloud) that bring metadata into your catalog allows you to maintain and govern it more effectively.

For more information on these features, continue reading below.


Metadata Collectors Page: found in the Settings tab of an Organization, this page shows all of the collectors that are currently appearing in your catalog and other important information, such as the last time the collector ran. This page also includes cloud collectors set up via Connection Manager. For more information, refer to the documentation.

The CLI Command Builder allows users to step through a wizard to set up on-premises collectors. The wizard generates either a CLI command or a YAML file, so users can more quickly set up collectors during implementation. Since the BETA release, we've streamlined the form fields to more clearly differentiate required fields from optional fields. For more information, refer to the documentation (available sources are denoted as "collector wizard available").

Service Accounts: administrators can now create, refresh (edit the expiration date), and delete service accounts from the UI. From the wizard, there is a "Create a service account" link that will take you to the "Service accounts" tab in the Settings page, and clicking on the "Add service account" button will generate an API token. We recommend using service accounts when setting up a collector, so the configurations aren't tied to user accounts. For more information, refer to the documentation.


Announcing support for Confluent Kafka metadata

Announcing our newest metadata collector - Confluent Kafka! We know how important it is to have the most up-to-date streaming data, so we’ve created this collector to allow you to easily monitor and collect Kafka metadata from your Confluent streaming platform. 

With Kafka metadata in data.world, you and your teams can: 

  • Easily discover and monitor streaming metadata for real-time applications
  • Understand what is being streamed from on-prem and cloud Confluent
  • Have a single source of truth for your Confluent schemas for better discovery and governance

The data.world Confluent Collector is actually two collectors, one for Confluent Platform (on-prem) and one for Confluent Cloud. With these collectors, you can capture, store, and analyze metadata including Cluster, Consumer, Producer, Broker, Partition, Schema, Consumer Group, Topic, and Environment (for Cloud). The collectors can optionally harvest metadata from Avro, JSON-schema, and Protobuf schemas stored in Confluent Schema Registry.
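For readers unfamiliar with what schema metadata looks like, the sketch below pulls basic metadata out of an Avro record schema using only Python's standard `json` module. It is an illustration of the kind of information a schema carries, not the collector's implementation: the sample `Order` schema, the `summarize_avro_record` function, and the shape of the returned summary are all made up for this example.

```python
import json

# Hypothetical Avro record schema, as it might be stored in a schema registry.
AVRO_SCHEMA = """
{
  "type": "record",
  "name": "Order",
  "namespace": "com.example.sales",
  "doc": "A customer order event.",
  "fields": [
    {"name": "order_id", "type": "string", "doc": "Unique order identifier."},
    {"name": "amount", "type": "double"},
    {"name": "coupon", "type": ["null", "string"], "default": null}
  ]
}
"""


def summarize_avro_record(schema_text):
    """Return the name, documentation, and fields of an Avro record schema."""
    schema = json.loads(schema_text)
    fields = []
    for field in schema.get("fields", []):
        field_type = field["type"]
        # Avro writes a union as a JSON array; a union that includes
        # "null" is the usual way to mark a field as optional.
        nullable = isinstance(field_type, list) and "null" in field_type
        fields.append({
            "name": field["name"],
            "type": field_type,
            "nullable": nullable,
            "doc": field.get("doc", ""),
        })
    name_parts = [schema.get("namespace"), schema["name"]]
    return {
        "full_name": ".".join(part for part in name_parts if part),
        "doc": schema.get("doc", ""),
        "fields": fields,
    }


if __name__ == "__main__":
    summary = summarize_avro_record(AVRO_SCHEMA)
    print(summary["full_name"])
    for field in summary["fields"]:
        print(field["name"], field["type"], field["nullable"])
```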

These Collectors are Tier 2 for Enterprise Customers. You can read the full documentation for Confluent Platform here and for Confluent Cloud here.

Avro Schema example metadata

An example of metadata for an Avro Schema in the data.world platform

Archie Bots - AI features available for enterprise customers

Release Notes – June 5, 2023

Announcing the beta rollout of catalog platform features that utilize OpenAI and your knowledge graph to accelerate the performance and productivity of your data team.

1. Data asset enrichment: Automatically generates descriptions of data assets based on metadata and related objects. Also provides summaries of SQL queries in certain types of objects, improving the explainability and comprehension of data policies and ETL resources.


2. Data Exploration: Generate questions that the data could answer, promoting deeper data understanding and discovery of untapped insights. Collection editors can generate questions and quickly explore collections with tables to better understand the types of business questions the data could answer.

3. SQL Generation and Code Summaries: Enables users to create SQL queries using natural language, making data querying more accessible to a wider audience. Also enables workspace users to get a summary of SQL code to get quick comprehension of data queries.

4. AI-Assisted Search: Enhances search capabilities by parsing search terms into keywords and advanced search syntax and by providing suggested filters. Users can search beyond just keywords.


These new features bolster the performance of data teams and democratize data access across the enterprise, reinforcing our commitment to empowering organizations to make data-driven decisions effectively. If you are interested in these opt-in beta features, please reach out to your customer service manager.

Announcing Azure Data Lake Storage Gen 2 Collector and Databricks Collector Lineage and Jobs

We’re excited to announce new enhancements to data.world’s Databricks Collector and a brand new Collector for Azure Data Lake Storage Gen 2! With the help of these additional metadata harvesting and lineage capabilities, you can now get more detailed insights into your data than ever before.

Our Databricks Collector allows you to quickly and easily collect metadata from your Databricks environment into data.world. Now, with the addition of Jobs harvesting and lineage capabilities, you can get a deeper understanding of where your data is coming from, how it’s being used, and what insights you can discover.

Our new Jobs harvesting feature allows you to collect additional information about your workflows, such as creator, description, success, schedule, and more. This lets you better understand how and why your data was transformed.

The new lineage capabilities let you track your data’s journey, from its source all the way through its transformations. This means you can easily trace your data’s history, identify potential bottlenecks or sources of errors, and quickly gain an understanding of how your data has changed over time.
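Conceptually, tracing data back to its source is a graph traversal. The following sketch shows the idea with a plain dictionary standing in for harvested lineage. It is not how the Databricks Collector or data.world stores lineage; the `upstream` function, the asset names, and the "asset maps to its direct sources" structure are assumptions for illustration.

```python
from collections import deque


def upstream(lineage, asset):
    """Return every asset that `asset` depends on, directly or indirectly.

    `lineage` maps an asset name to the list of assets it reads from.
    """
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for source in lineage.get(current, []):
            if source not in seen:  # skip sources that were already visited
                seen.add(source)
                queue.append(source)
    return seen


if __name__ == "__main__":
    # Hypothetical lineage: a report built from a cleaned table,
    # which is itself built from two raw tables.
    lineage = {
        "sales_report": ["clean_orders"],
        "clean_orders": ["raw_orders", "raw_customers"],
    }
    print(sorted(upstream(lineage, "sales_report")))
```

Walking the graph this way is what makes it possible to ask which raw sources could explain an error in a downstream table.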

Our Azure Data Lake Storage Gen 2 Collector allows you to bring insights about your data storage layer into data.world. With this Collector, you can efficiently harvest metadata about Blobs and Containers, including the owner, last modified, path, and more. This information is vital for understanding your underlying data, leading to more trust and confidence in your data-driven decision-making.

You can learn more about these Features in our Databricks documentation and our Azure Data Lake Storage documentation. Both these Collectors are Tier 2 for Enterprise Customers.

An image showing a Blob from ADLS in the data.world platform

An example of ADLS Blob metadata in the data.world platform


data.world Usage and Audit Events now available as a Snowflake Marketplace Private Listing

As a Snowflake Powered By partner, data.world is proud to announce that usage and audit event data, previously only available in a data.world dataset, is now available as a Snowflake Marketplace Private Listing. This gives data.world customers access to their full history of data.world events data via the Snowflake Data Cloud, enabling high performance and advanced analytic functions on this data. It also makes data.world events and logging data available via Snowflake with no ETL required for integration in a wide variety of use cases. To read more about this capability and how to request access to a Private Listing, please see our documentation here.
