What is the Premium Metadata Completeness automation? The premium version of our metadata completeness automation checks resources for missing metadata, identifies incomplete information, and lets you assign tasks to your team to update and complete this information efficiently. This gives you automated insights into the completeness of metadata across your resources, enabling better data tracking and quality control.
How does this automation work? If the automation identifies any resources that lack essential metadata, it generates a task for each of these resources. The user group associated with the automation receives in-app notifications regarding these tasks, and can then act upon the resources to provide the missing metadata.
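The flow described above can be pictured as a small sketch. This is an illustration only, not the product's implementation: the required field names, the `Task` shape, and the `data-stewards` group are hypothetical stand-ins for whatever your automation is configured with.

```python
from dataclasses import dataclass

# Hypothetical list of essential fields; the real list comes from the automation's configuration.
REQUIRED_FIELDS = ("title", "description", "owner")


@dataclass
class Task:
    resource_id: str
    missing_fields: list
    assigned_group: str


def find_missing(resource: dict) -> list:
    """Return the required fields that are absent or blank on a resource."""
    return [f for f in REQUIRED_FIELDS if not str(resource.get(f) or "").strip()]


def generate_tasks(resources: list, group: str) -> list:
    """Create one task per resource that lacks at least one required field."""
    tasks = []
    for resource in resources:
        missing = find_missing(resource)
        if missing:
            tasks.append(Task(resource["id"], missing, group))
    return tasks


resources = [
    {"id": "orders", "title": "Orders", "description": "", "owner": "sales"},
    {"id": "customers", "title": "Customers", "description": "CRM export", "owner": "crm"},
]
tasks = generate_tasks(resources, group="data-stewards")
```

Here only the `orders` resource produces a task, because its description is blank. The complete `customers` resource is skipped.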
How is this automation different from the Metadata Completeness Automation? The Metadata Completeness Automation solely generates a report listing incomplete resources. In contrast, the Premium Metadata Completeness Automation not only produces the report but also creates a list of actionable tasks for the identified incomplete resources that can be assigned to user groups. It then alerts the authorized users to navigate to the resource pages, prompting them to fill in the missing information.
How will this automation help me?
As an admin, I can now delegate tasks effortlessly from within the Reports, monitor progress over time, and quickly spot issues of incomplete resources.
As a data owner, I can now take action promptly because I can see task-relevant guidance to help me understand what metadata is missing or incomplete and how to fix it. Also, all my tasks are found in one place so I can quickly manage the next steps.
As a data consumer, I have increased trust in my data because now I have instant visibility into completeness. Incomplete resources are automatically flagged for me, increasing my understanding and trust of the resource.
The Premium Metadata Completeness Automation is available only to customers that have purchased the Data Governance Premium tier. You can read more about the Premium Metadata Completeness Automation in the documentation here: https://docs.data.world/en/229853-premium-metadata-completeness-automation.html
An example of a Metadata Completeness Report.
We have an exciting update to our previous Netezza collector. Now this collector harvests much more metadata, including Lineage.
You can use this collector to harvest metadata from Netezza Performance Server. It collects metadata for database information, schemas, tables, views, materialized views, functions, stored procedures, and columns. It also supports Lineage for views, materialized views, and procedures. You can read more about the new features here: https://docs.data.world/en/197948-about-the-netezza-collector.html
Netezza is a Tier 3 collector and is available to Enterprise Customers.
An example of Lineage from the Netezza Collector.
Enterprise catalog customers can now configure custom metadata fields and out-of-the-box fields including Title, Description, Summary, Tags, and Status as read-only. This configuration option prevents contributors from editing fields through the UI or through catalog APIs and should be used to identify fields that are updated through automations and collectors.
Read-only fields can be configured through Catalog Toolkit or custom profiles.
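As a rough mental model of the behavior described above, the sketch below rejects contributor edits to read-only fields while still letting automated sources write them. The field set, the `source` labels, and the function name are assumptions made for illustration. They are not Catalog Toolkit settings or data.world API calls.

```python
class ReadOnlyFieldError(Exception):
    """Raised when a contributor tries to edit a read-only field."""


# Hypothetical configuration: fields maintained by automations or collectors.
READ_ONLY_FIELDS = {"status", "tags"}

# Edit sources treated as contributor edits (assumed labels).
CONTRIBUTOR_SOURCES = {"ui", "api"}


def apply_edit(resource: dict, field: str, value, source: str = "ui") -> dict:
    """Return an updated copy of the resource, rejecting contributor edits to read-only fields."""
    if field in READ_ONLY_FIELDS and source in CONTRIBUTOR_SOURCES:
        raise ReadOnlyFieldError(f"'{field}' is read-only for {source} edits")
    return {**resource, field: value}
```

Under these assumptions, an edit to `status` through the UI or an API raises an error, while the same edit with `source="collector"` goes through.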
An example of Read Only fields in the UI.
Read the sections below for full details on each new feature!
We are excited to announce the launch of Cloud Collectors, the newest way to collect metadata on data.world!
Now, you can configure and run collectors that are hosted by data.world with just a few clicks! This feature not only provides a no-code way to start bringing metadata into your catalogs faster, it also has robust functionality around scheduling and monitoring to make setup more transparent and seamless. If you have cloud-accessible data sources that you're ready to bring into your catalog, this feature is for you!
👩‍💼 How can I use Cloud Collectors?
Users with Admin access will see a new option in the collector setup wizard that says "Cloud."
Once you enter your source information, you will be able to set a custom name for your collector configuration, and set a schedule for how frequently the collector should run.
After a collector completes, you will see the metadata and resource types that were collected, as well as the source information you entered while setting up the collector. Here you will also find what might have gone wrong if the collector run failed, and you'll have the ability to cancel the run as well.
You can view all of the collectors you have set up, whether they are collectors that you host or Cloud Collectors, on the Metadata Collection tab. From here, you can view, edit, and delete collector configurations. And if you're setting up multiple collectors for one source with the same credentials, try the "Duplicate Configuration" button to quickly set all of them up.
For a full list of supported sources and more details on the feature, please refer to the documentation here.
We are thrilled to introduce an exciting addition to our existing Snowflake collector – support for Snowflake’s brand-new Data Quality feature, currently available in private preview. This enhancement empowers users to elevate their data quality assessment to new levels.
Key Features:
📊 Collect and catalog Snowflake Data Metric Functions (DMFs): Users can now measure the quality of their data using Snowflake’s powerful "data metric functions" (DMFs) and catalog this context with data.world. Example DMFs include Null Count, Unique Count, and Freshness – providing comprehensive insights into the health of your data.
🔍 Find and understand data quality metrics: The DMFs and observations (recorded metrics) are seamlessly integrated into resource pages on your data.world platform and are also presented as Hoots associated with Snowflake tables and views. This user-friendly interface makes it easy for individuals across your organization to discover and understand data quality metrics effortlessly.
Why Snowflake Data Quality & data.world?
🌐 Compliance & Consistency: In today's data-driven landscape, ensuring compliance and consistency is paramount. This Data Quality feature & integration help you meet these standards by offering real-time insights into critical data metrics.
🔒 Build Trust: Trust is the foundation of effective data utilization. This Data Quality integration helps users to trust their data by bringing metrics related to freshness, blank values, and inaccuracies to the catalog and everyday tools, such as Tableau and Power BI, via Hoots.
Who Benefits?
👩‍💼 Data Stewards, Engineers, Admins: Empower your data stewards and technical teams by providing them with a tool that gives immediate insights into the current state of their data based on specific metrics.
🚨 Data Consumers: With Hoots, you can identify and take swift action on tables and views that require attention, ensuring data quality monitoring is seamlessly integrated with considerations for cost, consistency, and performance.
Experience a new era of data quality and reliability with data.world’s support for Snowflake's Data Quality today!
Note: Snowflake Data Quality is an enhancement within the existing Snowflake collector, and is currently available to Snowflake Private Preview customers. ❄️🚀
Using Hoots, users can quickly see data quality issues, like duplicate data, and easily fix the errors.
Bulk operations are a crucial part of keeping a catalog updated and accurate. We're excited to announce some improvements that will streamline and accelerate bulk operations such as bulk editing tags and attributes and bulk moving resources between collections.
First, we have consolidated these operations into a single menu for each place you can initiate a bulk operation (the Glossary tab, the Resources tab, and the Collection Contains tab). Now you can Quick edit, Add resources to collections, and Export/Import resources from all three locations.
Next, we've added the granular selection experience that previously existed only in Quick edit to the Export/Import spreadsheet flow as well. This is available on all three entry points (Glossary, Resources, Collections), which should significantly reduce the time it takes to make changes via the spreadsheet option.
Finally, we've simplified and clarified the experience around moving resources between collections. Previously this experience only existed within the Quick edit flow, but now you can select 'Add to Collections' or 'Move or Add Collections' to access this functionality. From the Glossary and Resources tabs, you'll be able to add resources to one or multiple collections, and from the Collection tab (example below), you'll be able to add resources to one or multiple collections, or move resources from one or all collections to one or multiple collections.
With these improvements, administrators and curators will be able to perform bulk operations on resources much more quickly. For more information, please refer to the documentation for bulk editing resources here and for bulk editing glossary here.
The suggested search dropdown now has more context, including the list of collections, owning Organization or User profile, and more. We’ve also added more context to the search experience when a user is relating one resource to another. This added context makes it easier to see and understand what has already been added.
We’ve provided a default sort experience that makes scanning the related, contained, and column resources faster. We also added column index as a sort option so users can understand the original column order from the database.
The Summary field is now available on all resource types out of the box, with no configuration required.
Multi-line fields on catalog resources support Rich Text for more engaging and understandable content, and now these fields can be edited in a What-You-See-Is-What-You-Get (WYSIWYG) user experience rather than users having to create and edit content using Markdown.
Markdown editing is still available for users that prefer it, but now more data owners and users can create compelling rich text content.
Azure Data Factory Collector: Detailed Data Tracking
The ADF Collector allows users to understand how their data was moved or transformed, the format changes it underwent, and its migration journey to build a foundation of trust. This collector fetches metadata for Factory, Pipeline, Activity, Linked Service, Dataset, Dataflow, Trigger, Integration Runtime, and Global Parameter within Azure Data Factory. It also provides column-level Lineage, showing how data moves between ADF Datasets and connected sources like Snowflake, Databricks, S3, and ADLS. This helps users understand data movements and transformations, increasing trust. It also allows monitoring of pipelines for health checks, boosting confidence in data integrity and reliability.
DynamoDB Collector: Simplified Discovery
Our DynamoDB Collector helps users discover and understand DynamoDB resources. This collector captures deep metadata for Tables and Streams. It’s useful for both technical users managing DynamoDB resources and non-technical users exploring metadata and understanding how they can use DynamoDB resources through an intuitive interface. Technical users will appreciate getting insight into DynamoDB resources, including Tables, Keys, Indexes, and more.
Teradata Collector: Comprehensive Data Insight
The Teradata Collector allows users to see a holistic view of all their Teradata assets to help them manage and discover their data. This collector covers metadata for Database, Table, SQL Procedures, User Defined Functions, View, External Procedures, Triggers, User Defined Methods, and User Defined Types. It also offers Profiling and Lineage, showcasing column-level lineage between views and sourced columns, plus lineage for stored procedures. Users can track ownership and freshness of Databases and Tables, which helps understand data quality. Users can also see metadata about how the data was queried via SQL procedures, user defined functions and methods, and triggers, boosting trust in data products.
Start Exploring Today
These collectors enhance how data users explore, discover, understand, and trust their data. Whether you're a tech pro or not, these tools make navigating metadata easier and help teams become more data-driven. Dive into your data world with these new collectors, and embark on a journey of empowered decision-making.
Happy exploring, The data.world Team
We're super excited to roll out some cool new updates to our Governance product. This December, we're not just adding fancy features – we're transforming your daily governance tasks to make them smoother, easier, and yes, even a bit more enjoyable. Let's dive into the cool new changes that await you!
We know managing a bunch of automations can be overwhelming. So, we've introduced neat new tabs: Active, Pending Activation, and Archived. This means no more sifting through inactive automations to find what you need.
Let these tabs keep your screen tidy and your mind clear. 🧠✨
✅ Active: View and manage automations that are currently in use.
⏳ Pending Activation: Keep track of automations that are configured but not yet activated. When you create a new automation and don't turn it on, it lands here!
❌ Archived: Access your inactive automations without cluttering your main screen.
2. No Guessing in Automation Setup 🛠️
Setting up automations should be clear and straightforward. To ensure this, we've added clear visual cues: No more wondering if your automation will start immediately after saving. We've made it crystal clear that 'save and continue' means just that.
Forget about forgetting! Our upcoming summary page improvements mean you won't have to scratch your head trying to recall the details of your automations. This is a lifesaver for our governance admins who set and forget their automations. This enhancement will soon allow you to:
✅ See all key details of your automations at a glance.
✅ Better manage and remember your automation settings.
✅ Enjoy a more intuitive interface, particularly beneficial for Core and Premium admins.
🗣️ Your Voice Matters!
Your feedback is what shapes our future updates. We love hearing from you, so let us know what you think about these new features, or any ideas you have for what comes next.
🎁 Cheers to Easier Governing and Happy Holidays! 🎁
The data.world Governance Team
Release notes: Enjoy an even more nimble and intuitive catalog with our latest UX enhancements. Our aim? To streamline your workflow and cut down on unnecessary clicks, duplicates, and possible errors. Here is a list of some of our latest updates:
These updates aim to enhance your experience and productivity. For more details and the full list of release notes, please visit docs.data.world.
Catalog administrators and stewards can now more easily operate on Column resources in bulk with the spreadsheet export/import flow. Previously, Columns could only be selected for bulk operations at the Collection level. Now, users can export all Columns of a parent Table, edit attributes in the spreadsheet, and upload changes.
The new functionality is accessible for users with correct access via the 'Columns' tab on a parent Table's resource page (example shown below).
For more information, please refer to the documentation.
Previously, Quick Edit users could only filter a set of resources by Resource Type. Now users can leverage the search facets, advanced filtering, and text search capabilities available in other parts of the data.world platform. Users can also perform multiple searches and apply multiple filters to continually add resources to the selection without restarting each time. This will streamline bulk operations by allowing users to more seamlessly select the exact set of resources intended for bulk enrichment and editing.
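To make the "keep adding to the selection" idea concrete, here is a minimal sketch that models the selection as a set which grows across searches. The `search` helper and the sample catalog are hypothetical and only illustrate the accumulation pattern, not how the platform implements search.

```python
def search(resources, text="", resource_type=None):
    """Return IDs of resources matching a text query and an optional type filter."""
    return {
        r["id"]
        for r in resources
        if text.lower() in r["name"].lower()
        and (resource_type is None or r["type"] == resource_type)
    }


# Hypothetical sample catalog.
catalog = [
    {"id": 1, "name": "Orders table", "type": "Table"},
    {"id": 2, "name": "Orders dashboard", "type": "Dashboard"},
    {"id": 3, "name": "Customer table", "type": "Table"},
]

# Each search adds to the running selection instead of replacing it.
selection = set()
selection |= search(catalog, text="orders", resource_type="Table")
selection |= search(catalog, text="customer")
```

After both searches the selection holds resources 1 and 3. The first search's result is kept when the second search runs.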
These capabilities are now available wherever Quick Edit lives: Glossary, Resources, and Collections. They will appear once you select either "Quick Edit" for Glossary, or the "Edit Multiple Resources" entry point for Resources and Collections, shown below:
In a future release, we will enable these capabilities for the Bulk Upload/Edit feature as well, making offline editing more targeted and effective.
For more information, please refer to the documentation for Glossary Quick Edit, Resources Quick Edit, and Collections Quick Edit.
1) Archie Bots - description generator enhancement
Archie Bots can now effortlessly describe all types of catalog resources, including custom resources. This improvement saves you time enriching your catalog, improving discoverability and understandability. You can read more about Archie Bots here.
2) Improvements to UX and increased max character count of descriptions
Enjoy getting wordy! We've increased the maximum character count of the Description field to 5000, allowing for more comprehensive and detailed information. We've also included markdown support in the hover-over view of descriptions and increased the view window size in search results.
3) Improvements to the search and navigation of Glossary terms
Users can now quickly filter by the first letter, making it easier to locate and manage terms. We've also made improvements to how special characters are sorted in the glossary, ensuring a more intuitive and organized experience.
4) Now you can query the catalog layers
Customers can now query the layers of the graph using a named graph called :current. This feature federates your source data and catalog enrichments into one queryable graph, simplifying data exploration across catalog layers and allowing for easier exploration and analysis of your data assets. You can read more about the catalog layers and how to query them here.
We hope these enhancements empower you to make the most of your enterprise data catalog. Stay tuned for more exciting updates in the future!
Some of the new functionality in this release includes:
For more information, refer to the documentation here.
🚀 Exciting News: Launching Two New Metadata Collectors on data.world! 🚀
Today, we are excited to announce the release of two new metadata collectors on data.world: the Sigma Collector and the InfluxDB Collector. These tools are designed to simplify and supercharge your data integration and management capabilities.
🔍 Why you'll love the Sigma Collector:
🌟 Sigma Collector Features:
An example of Sigma Workbook metadata inside the data.world platform, including who created and last updated the workbook, permissions, and Lineage.
🔍 How the InfluxDB Collector helps you:
🌟 InfluxDB Collector Features:
An example of metadata from an InfluxDB Task inside the data.world platform, including the expression, last run status, and task status.
In a world where data is continuously evolving, it’s crucial to have the right tools for discoverability, integration, and governance. With our new collectors, we aim to make your journey smoother, more insightful, and more powerful.
Sigma, a Tier 1 collector, and InfluxDB, a Tier 2 collector, are both available immediately for Enterprise Customers. Read the documentation for full details:
Announcing the launch of Hoots and BB Bots, the latest in our set of DataOps application features, free to all tiers of our enterprise catalog customers.
What problems do Hoots and BB Bots solve? Hoots bring the relevant information from the catalog to your data-consuming teams (analysts, scientists, executives, etc.) and provide simple communication and timely updates about data quality and freshness via BB Bots. Together, these features increase communication and trust and save your data engineering team valuable time otherwise spent re-answering the same data questions across your data-consuming teams.
What is a Hoot? A Hoot surfaces important context about your data – including data quality and usage information – directly to the applications being used to make data-driven decisions. This saves data producers time that would otherwise be spent answering questions about the state of the data and ensures that data consumers have the context they need to use data confidently.
How do Hoots work? Hoots are simple trust badges that turn green, red, or yellow depending on the health status of your data pipeline. Hoots are configured from the catalog and added to your web-based data product to inform users of health status and more information that is fed automatically from the catalog and automated monitors called BB Bots.
What is a BB Bot? BB Bots are automated monitors that change the status color of the Hoots, providing a trust signal to end-users and allowing data engineers more time to investigate issues and less time answering and re-answering questions.
How do BB Bots work? BB Bots monitor the data.world Data Catalog Platform and other orchestration and observability tools, like Airflow, Monte Carlo, dbt, and Matillion. BB Bots automate the communication of data quality and health status and surface this information to the Hoot, where it can provide important context alongside other information from the catalog, like definitions, lineage, owner, and policies. All of this information is surfaced in the Hoot that lives on the applications that data consumers are using, like Looker, Power BI, and Tableau.
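One way to picture how several monitors could feed a single badge is sketched below. The "worst status wins" rule, the green default when nothing has reported, and the monitor names are assumptions for illustration. The text above does not specify how BB Bots actually combine results.

```python
# Assumed severity order: the worst reported status determines the badge color.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def hoot_color(monitor_results: dict) -> str:
    """Pick a badge color from per-monitor statuses (assumed rule: worst status wins)."""
    if not monitor_results:
        # Assumption: with no monitor results, show a healthy badge.
        return "green"
    return max(monitor_results.values(), key=SEVERITY.__getitem__)


# Hypothetical monitor names and statuses.
status = hoot_color({"airflow": "green", "dbt": "yellow"})
```

With these inputs the badge would be yellow, since the dbt monitor reports a warning while Airflow is healthy.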
To find out how to configure a Hoot, you can read more about these features in our product documentation and enroll in the DataOps and BB Bots course available at data.world University.
With the SSRS collector, you can:
Are you ready to unlock the potential of your SQL Server Reporting Services? You can read more about how this collector works and all it harvests in the documentation. This collector is Tier 2 for Enterprise customers, and is available in dwcc version 2.151 and later.
An example of metadata from an SSRS Report, including Lineage:
You can choose to:
🎯 Core Automations - Standard for All Customers 🎯
1️⃣ Default Value Assignment: Set default values for specific metadata fields, ensuring consistent and accurate data entry. Learn more in our Product Docs [link].
2️⃣ Inherited Assignment: Easily inherit metadata from related resources, saving time and reducing redundancy. Product Docs [link] have all the details.
3️⃣ Metadata Completeness: Get automated insights into the completeness of metadata across your datasets, enabling better data tracking and quality control. Explore how it works in our Product Docs [link].
4️⃣ Metadata Freshness Review: Want to set a cadence by which your stewards must review metadata and ensure it’s still fresh? Learn how to set it up in our Product Docs [link].
5️⃣ Metadata Freshness Refresh: Ensure your metadata remains up-to-date with automated refreshes. Check out our Product Docs [link] for step-by-step instructions.
6️⃣ Sensitive Data Discovery: Safeguard your sensitive information by automatically identifying and managing sensitive data elements. Get started with our Product Docs [link].
💡 Query Based Actions App - Unlock Advanced Capabilities 💡
For our advanced users, we are introducing the Query Based Actions App - a game-changer! Utilize this app to perform custom actions based on scheduled queries, providing unparalleled flexibility. To gain access, complete our upcoming Training Course [coming soon] or reach out to your Designated Customer Success Director.
🏆 Premium Automation - Exclusive Benefits for Paid Customers 🏆
🌟 Access Request Approval: Take control of your data access with this premium automation. Easily manage access requests and ensure data privacy and security. Set up sophisticated, multi-step workflows to manage your end users' data access requests more effectively, and enable your business users to request and grant access to data with one click. Integrate with ServiceNow to extend the automation to your ticketing system. Contact your Customer Success Director to activate this powerful application feature.
See it in action!
🌟 Task Management: Native Task management provides one central place in data.world where any user can view all the tasks they need to complete and act on them immediately.
A suite of additional premium automations is coming in upcoming releases.
🏃‍♂️ Early Adopter Program - Our customers helping us mold our product 🏃‍♀️
We are excited to offer an Early Adopter Program to the first round of our customers. Early Adopter Customers receive dedicated support and guidance throughout the adoption process and provide us with valuable feedback to shape future improvements.
🔄 Continuous Improvement and Iteration 🔄
At data.world, we are committed to your success. We will continuously improve and iterate on our Governance Product based on your feedback and needs. Expect exciting enhancements and new features over the next quarters.
🔧 Easy Configuration for Administrators 🔧
Setting up these automations is a breeze for any admin! No technical knowledge is required for any core automation except the Query Based Actions App, ensuring that you can start leveraging the benefits right away.
Don't miss out on the opportunity to revolutionize your data governance practices. Embrace the power of data.world's new Governance Product today!
For more information on these features, continue reading below.
Metadata Collectors Page: found in the Settings tab of an Organization, this page shows all of the collectors that are currently appearing in your catalog and other important information, such as the last time the collector ran. This page also includes cloud collectors set up via Connection Manager. For more information, refer to the documentation.
The CLI Command Builder allows users to step through a wizard to set up on-premises collectors. The wizard generates either a CLI command or a YAML file, so users can more quickly set up collectors during implementation. Since the BETA release, we've streamlined the form fields to more clearly differentiate required fields from optional fields. For more information, refer to the documentation (available sources are denoted as "collector wizard available").
Service Accounts: administrators can now create, refresh (edit the expiration date), and delete service accounts from the UI. From the wizard, there is a "Create a service account" link that will take you to the "Service accounts" tab in the Settings page, and clicking on the "Add service account" button will generate an API token. We recommend using service accounts when setting up a collector, so the configurations aren't tied to user accounts. For more information, refer to the documentation.
With Kafka metadata in data.world, you and your teams can:
The data.world Confluent Collector is actually two collectors, one for Confluent Platform (on-prem) and one for Confluent Cloud. With these collectors, you can capture, store, and analyze metadata including Cluster, Consumer, Producer, Broker, Partition, Schema, Consumer Group, Topic, and Environment (for Cloud). The collectors can optionally harvest metadata from Avro, JSON-schema, and Protobuf schemas stored in Confluent Schema Registry.
These Collectors are Tier 2 for Enterprise Customers. You can read the full documentation for Confluent Platform here and for Confluent Cloud here.
An example of metadata for an Avro Schema in the data.world platform
Announcing the beta rollout of catalog platform features that utilize OpenAI and your knowledge graph to accelerate the performance and productivity of your data team.
1. Data asset enrichment: Automatically generates descriptions of data assets based on metadata and related objects. Also provides summaries of SQL queries in certain types of objects, improving the explainability and comprehension of data policies and ETL resources.
2. Data Exploration: Generate questions that the data could answer, promoting deeper data understanding and discovery of untapped insights. Collection editors can generate questions and quickly explore collections with tables to better understand the types of business questions the data could answer.
3. SQL Generation and Code Summaries: Enables users to create SQL queries using natural language, making data querying more accessible to a wider audience. Also enables workspace users to get a summary of SQL code to get quick comprehension of data queries.
4. AI-Assisted Search: Enhances search capabilities by parsing search terms into keywords and advanced search syntax and by providing suggested filters. Users can search beyond just keywords.
These new features bolster the performance of data teams and democratize data access across the enterprise, reinforcing our commitment to empowering organizations to make data-driven decisions effectively. If you are interested in these opt-in beta features, please reach out to your customer service manager.
Our Databricks Collector allows you to quickly and easily collect metadata from your Databricks environment into data.world. Now, with the addition of Jobs harvesting and lineage capabilities, you can get a deeper understanding of where your data is coming from, how it’s being used, and what insights you can discover.
Our new Jobs harvesting feature allows you to collect additional information about your workflows, such as creator, description, success, schedule, and more. This lets you better understand how and why your data was transformed.
The new lineage capabilities let you track your data’s journey, from its source all the way through its transformations. This means you can easily trace your data’s history, identify potential bottlenecks or sources of errors, and quickly gain an understanding of how your data has changed over time.
Our Azure Data Lake Storage Gen 2 Collector allows you to bring insights about your data storage layer into data.world. With this Collector, you can efficiently harvest metadata about Blobs and Containers, including the owner, last modified, path, and more. This information is vital for understanding your underlying data, leading to more trust and confidence in your data-driven decision-making.
You can learn more about these features in our Databricks documentation and our Azure Data Lake Storage documentation. Both of these Collectors are Tier 2 for Enterprise Customers.
An example of ADLS Blob metadata in the data.world platform
Audit events allow administrators to monitor the actions performed by all users in the data.world application, whether through the UI or the API. The audit log reporting functionality enhances the accountability of actions in the application. Administrators can track the actions taken by users in the application and find the root cause of issues by identifying the resources on which the action was performed and who performed the action.
Please see the Audit Events documentation for more details including full table descriptions and sample queries.
With this feature, catalog administrators can more easily set up on-premises collectors during implementation. Once you've run a collector, you can see all the collectors (on-premises or cloud) that are bringing metadata into your catalog on the Metadata Collectors page.
For more information on the CLI Command Builder, refer to the documentation for each collector ("Metadata collector" column).
We've also separated the overview and the details of the different types of related resources by giving the related resources their own tabs. The related resources are now highly scannable, sortable, expandable, and searchable. These views offer an organized and condensed presentation of the metadata, making it easier to quickly access and understand the information.
As always, you can read more about these changes in our documentation portal. If you have feedback, please leave a suggestion in our support portal. For those interested in learning more about our data discovery solutions, please visit our website and book a demo or reach out to your customer service representative today. We look forward to helping you and your teams discover your data!
In addition to adding Advanced Search to our global search bar, we've improved it by including the ability to pre-scope your search to a certain Organization. If you are an Enterprise customer with multiple organizations, you can now pre-select the organization to which you want to limit your search.
But wait, there's more!
For customers who have hierarchical collections, you'll want to check out our beta release of the collection picker in the Advanced Search modal. This gives users the ability to quickly scope their search to a branch of collections in the domain hierarchy. Be sure to turn on the beta feature flag in the advanced settings to see this feature.
The new advanced search features in the global search bar are designed to provide a more intuitive and powerful data discovery experience from login. With these new capabilities, you'll be able to find the data and insights you need, faster than ever before.
Refer to the advanced search documentation to learn more.
We're committed to continually improving our platform and are eager to hear your feedback. Please feel free to reach out via our support portal to share your thoughts, experiences, and suggestions regarding our new Advanced Search features. Visit our website to learn more about Data Discovery or book a demo. Together, let's unlock the true potential of data-driven decision-making.
Users could previously bulk load and edit business glossary terms, but now this is possible for any type of resource. Simply download a spreadsheet of all the resources in a Collection, enrich them by editing fields or add new resources, and upload the changes to the platform.
On the Settings page of a Collection, users with edit access to the Collection will see the following modal:
Clicking "Download resources" will generate a spreadsheet that contains all resources, sorted by resource type, in that collection. This spreadsheet contains helpful instructions on how to use the spreadsheet on the "Overview" sheet.
You will see two sections: 1) organization-wide group access provides the group access to the entire catalog or all of the workspaces beyond the member default and 2) the direct access control section provides a view of the access the Group has to individual collections or workspaces along with the level of access to each. Users can manage direct access from this view without having to visit each collection or workspace.
This new view makes Group access management much easier by providing a one-stop summary of Group access. You can read more about managing Groups in our documentation portal.
At data.world, our goal is to help organizations unlock the full potential of their data. If you're interested in learning more about agile data governance, please visit our website and book a demo or reach out to your customer service representative. We look forward to helping you manage your data and transform the way your users discover it!
With these features, catalog administrators can more easily set up on-premises collectors and see all the collectors (on-premises or cloud) that are bringing metadata into their catalogs.
Currently in BETA, this feature has two components:
When paired with data.world, dbt users can more easily discover transformations and related assets, collaborate with colleagues, and govern data and analytics projects.
Customers can now use the dbt Cloud Collector to connect to their dbt Cloud project and harvest dbt assets as well as lineage relationships from dbt transformations. In the data.world platform, users will see metadata about models, snapshots, projects, seeds, sources, and tests, as well as relationships between views and referenced database tables or columns and between dbt resources and upstream or downstream resources. In the near future, this Collector will also support metrics and multiple database sources.
The dbt Cloud Collector is a Tier 1 Collector for Enterprise Customers.
For both SAP Hana and Netezza Collectors, users will now be able to take advantage of these systems' advanced analytics to gain deeper visibility into their data. Inside data.world, users will see metadata for columns, tables, and views, such as name, description, and data type.
The SAP Hana and Netezza Collectors are both Tier 3 Collectors for Enterprise Customers.
If you are interested in using these new Collectors, please contact us for support for your initial run. Please see the documentation for each collector: dbt Cloud; SAP Hana; Netezza
These views offer a highly organized and condensed presentation of the metadata, making it easier to quickly access and understand the information.
We'd love your feedback and thoughts before we roll them into the main UI. To see the views, please click "Turn on preview" in the banner at the top of any collection, resource, or glossary term page. To leave your feedback, please visit the help section (question mark in the lower left of the global navigation) and leave a suggestion via the support link.
You can read more about these changes in our documentation portal. If you're interested in learning more about our data discovery solutions, please visit our website and book a demo or reach out to your customer service representative today. We look forward to helping you and your teams discover your data!
Collection hierarchy is a tree-like view of your hierarchical collection relationships viewable on the collection overview tab. It organizes data resources and semantic concepts into increasingly more granular or specific groups based on their common characteristics - like domains, categories, markets, etc. It allows you to express your data taxonomy in a way that makes sense to your users:
For example, a sales steward might organize resources and terms into different Sales subcollections, such as Sales by Region, Sales by Product, and Sales by Customer. With this structure, your sales analysts can easily find the data they need to make more informed decisions.
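As a rough illustration of the tree structure described above, the sketch below models a collection and its subcollections and lists every collection in a branch. The `Collection` class and `branch_names` helper are hypothetical names chosen for this example; they are not part of the data.world platform or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Collection:
    """Hypothetical in-memory model of a collection with subcollections."""
    name: str
    children: List["Collection"] = field(default_factory=list)


def branch_names(root: Collection) -> List[str]:
    """Return the names of a collection and all its descendants, parent first."""
    names = [root.name]
    for child in root.children:
        names.extend(branch_names(child))
    return names


# The Sales example from the text, expressed as a small hierarchy.
sales = Collection("Sales", [
    Collection("Sales by Region"),
    Collection("Sales by Product"),
    Collection("Sales by Customer"),
])
```

Calling `branch_names(sales)` walks the branch depth-first, which is one simple way to think about scoping a search to a branch of the hierarchy.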
Collection hierarchy, combined with other recently released features like Groups, Collection Access Control, and the ability to create new Collection types, provides organizations with the building blocks to build powerful data products and organize their domain-driven data catalog, simplifying data management, and improving decision-making. We look forward to seeing the positive impact it will have on our customers.
At data.world, our goal is to help organizations unlock the full potential of their data. If you're interested in learning more about data mesh and our solutions, please visit our website and book a demo. You can also read more about our collection features on the documentation portal. We look forward to helping you manage your data and transform the way your users discover it!
These enhancements will enrich your data discovery experience, helping you understand your BigQuery data better. For instance, now you can use the Explorer Lineage interface to view lineage relationships between tables and views to track data flows. New metadata highlights include Dataset Labels, Last Modified, Date Modified, Created Date, Table Partitions, and View SQL.
You can see the full list of harvested metadata in the documentation. As always, please reach out if you have questions!
The Amazon S3 Collector catalogs buckets and objects, allowing you to quickly search and discover your data. This new collector harvests metadata about buckets and objects, including the Region, Version State, Size, Last Modified Date, ACL Owner, Grantee, and Grant Permission, among others (see the full list in the documentation).
Inside the data.world platform, users will be able to view the relationships between S3 buckets and Objects, enhancing data discoverability. Using our configurable UI, you can display which pieces of metadata are most important to you, such as ACL Permission or S3 Metadata Keys and Values.
Learn how to use the new Amazon S3 Collector in our documentation, or please reach out if you have questions!
At data.world, our goal is to help organizations unlock the full potential of their data. We're constantly improving search in order to better serve our customers looking to take data management and discovery to the next level.
If you're interested in learning more about our data discovery solutions, please visit our website and book a demo. You can also read more about our search features on the docs portal. We look forward to helping you manage your data and transform the way your users discover it!
These summary statistics will help you understand and trust your data by providing a quick look at the data. For instance, viewing stats like the minimum and maximum values shows the shape of the data, allowing you to know quickly if your data is as expected.
How can you create profiling metadata? This feature is currently available via the Snowflake, SQL Server, PostgreSQL, and Redshift collectors with more collectors on the near horizon. There are three optional commands that can be used during the collector run to generate the profiling metadata:
--enable-column-statistics
description: enables harvesting of column statistics
--sample-string-values
description: enables harvesting of histograms for columns containing string data
--target-sample-size
description: controls the number of rows sampled for computation of column statistics and string-value histograms
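A minimal sketch of how these three options might be assembled before a collector run is shown below. The flag names come from the list above; the `profiling_flags` helper is hypothetical, and the `--target-sample-size=<n>` value syntax is an assumption, so check the collector documentation for the exact form.

```python
from typing import List, Optional


def profiling_flags(column_statistics: bool = True,
                    sample_string_values: bool = False,
                    target_sample_size: Optional[int] = None) -> List[str]:
    """Build the optional profiling arguments for a collector command line."""
    flags: List[str] = []
    if column_statistics:
        flags.append("--enable-column-statistics")
    if sample_string_values:
        flags.append("--sample-string-values")
    if target_sample_size is not None:
        if target_sample_size <= 0:
            raise ValueError("target_sample_size must be a positive row count")
        # Assumption: the option accepts its value in --name=value form.
        flags.append(f"--target-sample-size={target_sample_size}")
    return flags
```

The returned list can be appended to whatever command you already use to run the collector.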
You can read more about these commands on the following collector documentation pages:
COMING SOON
To find out more about these new navigation features, please visit our documentation portal.
The Snowflake collector can now harvest Snowflake object tags, Snowflake tag-based masking policies, and Snowflake row access policies. This new feature will enhance your data governance experience by allowing you to see if a tag or policy is applied to a table or column coming from Snowflake.
For instance, here is a screenshot from the data.world catalog showing how a Snowflake Tag-Based Masking Policy has been applied to sensitive data columns: routing numbers, bank name, and bank account number. In this view, you can also see the associated Tag (Classification:confidential), and technical details about the Policy, like the Policy Body which explains how the Policy works.
How can you use this new feature? There are 2 optional commands for harvesting this information within the Snowflake collector run. Read more about it in the Snowflake Collector documentation.
Stay tuned for more exciting governance features in the coming months!
Collection Access Control provides role-based control to your metadata resources by collection, helping you target who can see and edit the resources in your catalog.
Read more about how to manage collection access on our documentation portal.
After designating which resources should have this feature enabled in your metadata profile, users will be able to access these resources in the "Other resources" section in the "New resource" dropdown (in the example below, the custom resources are "Bank account" and "Credit card").
For more information, refer to the documentation.
The Monte Carlo collector harvests both Incidents and Monitors, which inform users about the issue, when it happened, and where it happened.
For example, users can view relevant information about Incidents and Monitors, like Status, Count, the Date Created, as well as Owner and Severity. You can also open the specific Incident in Monte Carlo directly from the data.world platform.
You can read more about the newest Monte Carlo collector in our documentation.
Existing customers, please reach out to your data.world representative to learn more about becoming a beta user.
Read our Implementation Guide to get started.
Catalog administrators can now bulk upload glossary terms for faster onboarding and enrichment, and make bulk edits to ensure your catalog is always accurate.
This feature enables the following workflows:
For more information, refer to the full documentation here.
Both Tableau Online and Tableau Server are supported, as well as the following connection types: Direct connection (inbound), SSH tunnel (inbound, preferred).
For more information, refer to the documentation here.
Understand your data landscape in a powerful way with an enhanced graph visualization that enriches your understanding and discovery. Query this data for more insights. Find answers in a matter of clicks, not hours or days.
Read more about the features of Explorer in our documentation portal and contact your Sales or Customer Success specialist to find out how to get Explorer lineage added to your catalog.
The complete list of supported integrations is:
For developers: Any additional apps not mentioned above that authenticate via OAuth can be pointed at auth.data.world. Please refer to our developer documentation to guide you through the required configuration changes.
Developer documentation: link
Metadata collected includes data source, destination, column, table, schema, and our newly announced Lineage functionality as well. You can read more about how to use the Fivetran collector in the documentation.
The latest change in a series of updates to access control now provides a more intuitive default experience for the All members group when building a new Organization. The default experience now restricts users from viewing catalog resources or datasets and projects - allowing only view access of datasets or projects shared with the organization or set as Discoverable. This change allows admins to set more granular access through user groups with different tiers of access, leaving the default All members group as the most restricted.
To give members more access, admin users can change the access configuration for the All members group or add members to groups with more access.
You can read more about our "out-of-the-box" user groups on the Docs Portal.
Members is the special out-of-the-box group that automatically includes all members in an organization and determines the minimum level of access for all members in the organization. Please read more about this setting in our Docs Portal.
Visit our Docs Portal for more details about bookmarks.
We’ve enriched our suite of metrics with the ability to monitor and govern all change requests submitted and approved/denied to your catalog resources via the UI.
With suggest changes metrics, you can monitor the following:
PRO TIP: Query these tables in our built in query workbench for powerful insights, such as filtering by a specific type of resource, the responder or requestor, or by a particular date range. This short video highlights some key use cases:
This enhancement to our metrics suite allows governance teams to quickly find a list of all change submissions which haven’t been acted upon, or monitor the changes to your catalog resources made via the UI.
Tables added/updated:
View complete documentation here: Docs Portal
With groups, you can:
This feature has changed how new users are added to organizations on the platform. This short video highlights the changes.
With the site improvements, some of your bookmarks or saved links may no longer work. We have diligently mapped deprecated URLs to the new pages to keep the impact on our users as low as possible.
If you encounter a link that no longer works, the easiest way to find what you need is to go to the docs portal landing page and search for the information. Please contact support with any questions or issues.
The Discover experience transforms the empty search page into an actionable entry point for all of your resources—whether organization-owned or in the open community.
Learn more about the updated navigation in the preview announcement or read the updated documentation.
This functionality is only around editing, with support for suggestions coming soon.
The table now returns up to 100 items per page, and is scrollable to enable users to quickly browse the resources. Pagination is supported up to 10 pages, for a max of 1000 resources. The filter box will remain present if there are more than 10 items.
For resources with more than 1000 related items, we recommend taking advantage of the filter to help narrow down the result set.
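The limits above (100 items per page, at most 10 pages, and a filter box once there are more than 10 items) can be worked through with a short calculation. This is only an illustrative sketch; the `table_view` function and its field names are hypothetical and do not correspond to any data.world API.

```python
import math

PAGE_SIZE = 100        # items returned per page
MAX_PAGES = 10         # pagination is supported up to this many pages
FILTER_THRESHOLD = 10  # the filter box is shown above this many items


def table_view(total_related: int) -> dict:
    """Summarize what the related-resources table can show for a given item count."""
    if total_related < 0:
        raise ValueError("total_related must be non-negative")
    max_reachable = PAGE_SIZE * MAX_PAGES
    return {
        "pages": min(math.ceil(total_related / PAGE_SIZE), MAX_PAGES),
        "reachable": min(total_related, max_reachable),
        "filter_box": total_related > FILTER_THRESHOLD,
        "filter_recommended": total_related > max_reachable,
    }
```

For example, a resource with 2,500 related items would span the full 10 pages, with only the first 1,000 items reachable by paging, which is the case where filtering is recommended.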
The data dictionary has been updated to reflect the latest updates as well.
Updated Tables:
Tops - Most Searched Terms: Added new column "search_type" (single-tenant and multi-tenant).
Events - Queries: limited to past 30 days (multi-tenant).
The Discover experience transforms the empty search page into an actionable entry point for all of your resources—whether organization-owned or in the open community.
Switch to the All tab to reference up to 25 of your recently viewed resources and jump back in where you left off.
Look for the Preview banner to try out the new navigation and Discover experience today. Review the updated documentation or share your feedback.
Eureka Automations make it faster and easier to deploy and manage your data catalog.
Eureka Action Center is a dynamic dashboard homepage that helps answer the question, “What do I need to take action on now?”
Eureka Answers surfaces the most relevant concepts from the knowledge graph to the top of search.
Eureka Explorer is a visual map of your data ecosystem powered by the knowledge graph.
As always, please reach out if you have any questions or feedback!
The data dictionary has been updated to reflect the latest updates as well.
Events - Catalog Resources Pages Activity By Day
A fact table containing catalog pages activities such as views, edits and suggestions submitted by agents aggregated by UTC-based calendar day
Columns:
Note: This metric is meant to replace the Events - Metadata Asset Activity By Day table, which provided metrics on metadata pages in data.world and was sunset in September 2021. We are leaving Events - Metadata Asset Activity By Day in the metrics dataset rather than removing it completely, in case you are still using it, but you will not see any events in it beginning September 2021 (or earlier, depending on your activity on those pages).
What can be done with this new metric?
Below are some examples of analysis you can do with this metric:
SELECT *
FROM catalog-resources-pages-activity-by-day
WHERE resourcetype = 'Business term'
ORDER BY date DESC;

SELECT date, SUM(views), SUM(edits), SUM(suggestions_submitted)
FROM catalog-resources-pages-activity-by-day
WHERE RESOURCENAME = '<insert the resourceid column value here>'
GROUP BY date
ORDER BY date DESC;

SELECT date, org, SUM(views), SUM(edits), SUM(suggestions_submitted)
FROM catalog-resources-pages-activity-by-day
GROUP BY date, org
ORDER BY date DESC;
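To see what the per-day, per-organization aggregation produces, the sketch below runs an equivalent query against a small in-memory SQLite table. This is a stand-in for experimentation only: the table name `activity_by_day`, the sample rows, and the use of SQLite are all assumptions, and the real metrics table lives in your data.world metrics dataset.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[str, str, int, int, int]


def daily_activity_by_org(rows: List[Row]) -> List[Row]:
    """Sum views, edits, and suggestions per (date, org), newest day first."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical local copy of the metric's columns used in the queries above.
        conn.execute(
            "CREATE TABLE activity_by_day ("
            "date TEXT, org TEXT, views INTEGER, edits INTEGER, "
            "suggestions_submitted INTEGER)"
        )
        conn.executemany("INSERT INTO activity_by_day VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(
            "SELECT date, org, SUM(views), SUM(edits), SUM(suggestions_submitted) "
            "FROM activity_by_day "
            "GROUP BY date, org "
            "ORDER BY date DESC, org"
        ).fetchall()
    finally:
        conn.close()


# Made-up sample rows, purely for illustration.
sample = [
    ("2022-01-01", "acme", 3, 1, 0),
    ("2022-01-01", "acme", 2, 0, 1),
    ("2022-01-02", "acme", 5, 2, 0),
]
```

Running `daily_activity_by_org(sample)` collapses the two January 1 rows into a single row per day and organization.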
As always, let us know if you have any questions or feedback!
To get started with this feature, please reach out to customer success!
The new home page will feature personalized views to help you quickly access resources and view alerts. It will also give you a reliable home base to explore and come back to.
To preview the new home page experience, look for the coming soon banner when you log into data.world.
Updated metrics tables/reports arrived on January 28, 2022! Some reports may take 24-48 hours to reflect the new data after deployment due to sync timing.
The data dictionary has been updated to reflect the latest updates as well.
Various performance improvements and optimizations have also been implemented.
Learn about the Core Navigation changes that redirect all organization-specific links to this page, or watch the walk-through videos to learn about the updated functionality you'll find here.
Create and manage organization-level resources and connections in a consolidated experience—tailored to your level of access. Discover data faster with custom filters and advanced search syntax for all catalog resources and glossary terms.
Share information about your organization with the data.world community, curate datasets and projects, and manage memberships—all from the Organization Profile Page.
Create and manage database connections for your organization as an admin—whether syncing data to your Community datasets or collecting catalogs as an Enterprise organization.
Organization members and admins will be able to search, create, and manage resources, collections, members, connections, and more in one place.
Landing pages that will be replaced with the new Organization Profile Page will feature a banner with a link to preview the new experience.
The organization landing page will redirect to the new Organization Profile Page.
Organization-specific library and list views will redirect to the Resources tab on the new Organization Profile page, with more advanced filtering options.
For enterprise organizations, the Glossary landing page will redirect to a new Glossary tab on the Organization page, also with improved filtering options.
Updated metrics tables/reports arrived on December 17, 2021! Some reports may take 24-48 hours to reflect the new data after deployment due to sync timing.
The data dictionary has been updated to reflect the latest updates as well.
Members of an organization can look up collections based on title, description, creation date, and more. Admins can also create new collections directly from the Collections tab.
Community organizations support editing multiple datasets or projects that match the filters in the Resources tab.
Enterprise organizations also support editing multiple analyses, business terms, or tables.
Members of an organization can filter and browse collections by name on the Overview tab. Admins can also create new collections directly from the organization profile.
Data artifact lineage improvements continue as we introduce the ability to reset the lineage component viewport. For customers with large, complex data artifact lineage, we've heard that it can be difficult to reorient once you start exploring the visualization. With this new "Reset view" button, the view and zoom are immediately reset to the starting orientation.
Expand and collapse controls are now available for the search filters sidebar to help you quickly locate available filtering options.
Browse the Overview tab for quick links to different resource categories in your catalog. The quick link tiles will take you to a filtered presentation of the new Resources tab. This view operates much like the main search page with support for facets and advanced search syntax, all scoped to your organization's resources.
Searching for open data? Community organizations will now also have these search and filter options available on the new Resources tab.
Data dictionary has been updated to reflect the latest updates as well.
Relationship arrows can now be repositioned with a drag and drop action.
Download a snapshot image of a concept and its immediate context.
The link to concept feature now includes rich image previews.
Link multiple Gra.fo documents together into one workspace. Separate complex models into subgraphs or extend reference documents that are used in multiple projects.
Most analysts trying to find answers to business questions aren’t searching for tables and columns directly. What they are actually looking for is contextual information that accelerates time to business impact for data. data.world Concept Cards will change the way data consumers access data by providing a unique search experience no other catalog provider does or can do without the backing of a knowledge graph.
Concept Cards are a feature on data.world’s near-term roadmap to help users discover related people, resources, and other supporting information we can obtain from the knowledge graph about a given search topic. If there are suggested actions that can be taken for the topic itself or for related resources, access to those actions is surfaced directly in the search results.
These cards become a jumping off point to browse and discover new things on the platform that share something in common with the search topic of interest. We see these Concept Cards as the first of many intelligent recommendations we can make by harnessing the power of the knowledge graph.
Querying data in its current state is the most common data catalog use case, but there are times when it is necessary to compare previous versions of datasets, metadata, and lineage. data.world SQL and SPARQL Time Travel allows customers to view changes across metadata and data and even query historical data sources.
The new feature provides granular insight into audit trails and analysis of data that is snapshotted across time. You can search both ingested data sources and Snowflake virtual tables for previous states of data. Being able to analyze previous versions of a dataset, even simultaneously with the current version of a dataset, enables flexible analysis across various time scales – review data month-over-month, year-over-year, etc.
In data.world, your metadata is also data and therefore fully queryable and reportable. You can compare previous versions of your metadata with current versions in order to understand how your systems and schemas are changing. See new columns, new column names, sensitive data that recently appeared in a field that wasn't there previously, and much more.
Supported operations include previous version, number of versions back (tip-N), specific timestamp, and offset.
Example: SQL Time Travel Query
Example: SPARQL Time Travel Query
A key aspect of data compliance is knowing where sensitive data lives and applying classifications that relate to policies that inform business processes for proper tracking and management. Identifying sensitive data, applying these policies, and reporting on this information can be an extremely time consuming and error-prone task if attempted manually.
data.world’s Sensitive Data Discovery automates discovery and classification, making it easier for enterprise customers to identify sensitive data and take action on it within the catalog.
Scan – Use advanced machine learning to identify sensitive data types like email addresses, names, ID numbers, locations, protected health information, and 40+ additional data types identifiable out of the box.
Classify – Apply policy classifications, tags, and statuses such as Restricted, Personal Information, US Only, etc. These classifications help maintain the integrity and confidentiality of your data. They are driven by your scan results and other metadata, as dictated by your unique business logic and terminology.
Take Action – Report and audit sensitive data types and policy classifications across your data landscape, understand how it changes over time, and drive better compliance and governance in your organization.
Integrate – Leverage Sensitive Data Discovery metadata as part of your broader metadata orchestration strategy with APIs and bulk export. Our open and extensible platform makes it easy to plug in your broader ecosystem of additional Sensitive Data Discovery tools and platforms for even greater governance capabilities.
Resource page example
Search results example
If you are an existing data.world customer and would like to be included in the private beta, reach out to your Client Success Director for more information.
New Tables (multi tenant & single tenant)
Updated Tables (multi tenant & single tenant)
Base platform data updates (single tenant only):
PATCH is a method for making partial updates to individual records, such as adding tags, changing a description, or modifying a title. This change affects how PATCH endpoints modify list values. We outline these changes below.
Current behavior (list values are appended):
- A dataset has tags [tag A, tag B]. A PATCH request is sent to /datasets/democorp/my-example-dataset with body: { "tags": [ "tag C", "tag D" ] }. The resulting tags are [tag A, tag B, tag C, tag D].
- A PATCH request is sent to /datasets/democorp/my-example-dataset with payload: { "tags": [] }. The tags remain [tag A, tag B, tag C, tag D].
New behavior (list values are overwritten):
- A dataset has tags [tag A, tag B]. A PATCH request is sent to /datasets/democorp/my-example-dataset with body: { "tags": [ "tag C", "tag D" ] }. The resulting tags are [tag C, tag D]. tag A and tag B have been removed.
- A PATCH request to /datasets/democorp/my-example-dataset with body: { "tags": [] }
Today, PATCH can be used to add, modify, or remove fields for all non-list values. With the current merge logic, items can only be appended to list values using PATCH. As a consequence, if you want to remove or reorder the items in a list, you must use the PUT method, which does not support partial updates and requires a full overwrite of the existing record. The new logic to overwrite list values will allow users to make partial updates to records that remove or modify the order of items in the list without needing to modify the entire record.
This new logic primarily impacts tags, file labels, collections, and multi-select custom metadata fields.
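The difference between the two merge strategies can be sketched locally with plain dictionaries. This is not the data.world API implementation: `patch_append` and `patch_overwrite` are hypothetical helpers that only mimic the behavior described above, and details such as duplicate handling under the append logic are assumptions.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def patch_append(record: Record, patch: Record) -> Record:
    """Earlier merge logic: list values are appended, other fields are replaced."""
    updated = dict(record)
    for key, value in patch.items():
        if isinstance(value, list) and isinstance(updated.get(key), list):
            # Assumption: items are simply concatenated, with no de-duplication.
            updated[key] = updated[key] + value
        else:
            updated[key] = value
    return updated


def patch_overwrite(record: Record, patch: Record) -> Record:
    """New merge logic: every supplied field, including lists, replaces the old value."""
    updated = dict(record)
    updated.update(patch)
    return updated


dataset = {"title": "My example dataset", "tags": ["tag A", "tag B"]}
```

With the overwrite strategy, sending a reordered or shortened list is enough to reorder or remove items, while fields left out of the patch, such as the title here, are untouched.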
Explore data.world's rich advanced search syntax with the Search Builder tool on our main search page. This friendly form helps you construct more complex searches with multiple filters, logical operators, categories, and custom metadata fields. The Search Builder can be accessed by selecting the "Advanced" option above the filters list on the main search page.
This release also includes changes to our main search page. You'll notice a new layout on the All Results tab that shows the top 3 search hits by type for your term. This tab now shows more results per page and gives users a high level overview of the types of resources they can find on the platform. Hover over the circular "i" icon for more details about the result. More targeted results can be viewed on the Resources tab. You'll also notice changes to the category tabs at the top of the page. Resources, Organizations & People, Comments, and Columns each have their own tailored search experience.
The new search experience is available today for select users and will be available for all users early next week.
data.world provides a federated engine to query data from multiple data systems simultaneously, at the source. By using Postgres Proxy, it’s now easier than ever to extend these capabilities to your favorite analysis tools for quickly accessing and creating value from data.
To connect to data.world using the proxy, simply create a new PostgreSQL connection, configured as follows:
host: postgres.data.world
port: 5432
user: {your data.world user id}
pass: {read/write token}
db: agentid/datasetid
You can find your read/write token in the user settings. If you have any issues or questions, don't hesitate to reach out.
Note: for single tenant customers, set host to postgres.{site}.data.world.
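The settings above can be collected into a single set of connection parameters, as in the sketch below. The `proxy_connection_params` helper is hypothetical; the key names follow the standard PostgreSQL connection keywords (host, port, user, password, dbname), and whether your client library accepts them in this form is an assumption to verify against its documentation.

```python
from typing import Dict, Optional, Union


def proxy_connection_params(user: str, token: str, agent_id: str, dataset_id: str,
                            site: Optional[str] = None) -> Dict[str, Union[str, int]]:
    """Build PostgreSQL connection parameters for the data.world Postgres Proxy."""
    # Single-tenant customers use postgres.{site}.data.world as the host.
    host = f"postgres.{site}.data.world" if site else "postgres.data.world"
    return {
        "host": host,
        "port": 5432,
        "user": user,                          # your data.world user id
        "password": token,                     # read/write token from user settings
        "dbname": f"{agent_id}/{dataset_id}",  # db is agentid/datasetid
    }
```

Keep the token out of source control, for example by reading it from an environment variable before calling the helper.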
(ddw-metrics-*) dataset to address some minor bugs and performance improvements.
Potential observable changes:
If you have any questions or concerns, please let us know at support@data.world or via your customer success representative.
Recently announced in a coming soon post, the new resource type search filter is now available on the search page for all users of the platform. This filter allows users to drill down to any supported resource type, including custom configured catalog types.
These are some of the improvements and new navigation experiences you'll see in our new portal:
- Home page menus
Use the top menus to navigate the documentation for specific product versions: Community docs for data.world community members and Enterprise docs, specifically written for our enterprise customers who have needs and use cases outside of the scope of our community users.
- Search improvements
Search is one of the most important ways to find the information you need. So we’ve introduced a new search experience. When you enter a query, you’ll immediately see some suggestions.
- Easier access to everything you need
Easy access to integrations gallery, API docs, and Grafo documentation.
Check it out at https://docs.data.world/.
This change will also update the presentation order of the search filters as:
1. Resource Type
2. Owner
3. Status
4. Tag
5. Collection
6... Custom configured facets
These changes have focused on performance optimizations, increased consistency, some additional columns, and some new tables/reports. Some columns have been renamed in order to achieve consistent use and definitions.
If you have any questions or concerns, please let us know at support@data.world or via your customer success representative.
Multi-tenant change log
Single-tenant change log (private sites & private installs)
docker pull datadotworld/dwcc:x.xx
where x.xx is your desired version, and you're in business. It's that easy.
Other enhancements to the metadata collector:
--config-file option for metadata collector (Beta): We've heard your feedback on wanting a simpler way to manage the configurations for your metadata collectors. In the near future, the config file will become the default way to set your parameters. Lots more info on this coming soon!
New metadata includes expanded information from datasources, databases, fields, metrics, and many more inter-object relationships.
Improved help text for tags, including on pressing “Enter” to add tags
Improved empty state messaging for adding contributors to a dataset
Consistent use of timestamps in alerts and notifications
Navigation tabs on various pages are now keyboard-navigable (left and right arrow keys) for ease of browsing and improved accessibility
“Share” button directly opens “Grant access” modal
Consistent use of display name in emails
Text truncation fixed for filter bars and the project workbench
Various layout, text, and navigational misalignments or inconsistencies
]]>Head over to our API documentation to learn more about configuring virtual connections and adding virtual tables to your datasets with the public API.
You can now share your Gra.fo model documents without requiring your audience to have a Gra.fo account or an individual invitation. Use the "Get link" option of the share menu to grant read-only access to your document to anyone you've shared the link with.
We are pleased to announce that we've rolled out the first of several improvements to support search matches on custom metadata fields for your catalog resources.
This improvement expands the fields we match against for free text searches to include any custom metadata fields you have configured in your catalog as text or selection fields. This feature empowers your end users to search for resources by the terminology and categorizations that mean the most to your business.
In the example above, verified by and data steward are custom metadata fields defined in our catalog for tables. A search for sarah smart now yields matches where she is listed as the data steward or the person who has verified the data, in addition to any existing matching fields like owner.
Tip: You can perform more precise searches against custom metadata fields with our advanced search syntax. In the example above, a search for metadata:"data steward:sarah smart" will return filtered results where Sarah Smart is listed as the Data Steward.
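If you build these search terms programmatically, the pattern from the tip can be sketched as a small helper. The function name metadata_query is hypothetical, and the sketch assumes only the metadata:"field:value" form shown above; it does not handle values that themselves contain quotes or colons, since escaping rules are not described here.

```python
def metadata_query(field, value):
    """Build a search term of the form metadata:"<field>:<value>".

    Hypothetical helper for illustration; no escaping is applied.
    """
    return f'metadata:"{field}:{value}"'


print(metadata_query("data steward", "sarah smart"))
# prints: metadata:"data steward:sarah smart"
```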
Look for upcoming releases to further support boolean and IRI-based metadata searches.
]]>