Coming Soon: Organization Profile Updates

The Organization Profile Page is getting a facelift this week to offer more intuitive navigation, stronger support for custom catalog types, and richer discovery features. 

Browse the Overview tab for quick links to different resource categories in your catalog. The quick link tiles will take you to a filtered presentation of the new Resources tab. This view operates much like the main search page with support for facets and advanced search syntax, all scoped to your organization's resources. 

Searching for open data? Community organizations will now also have these search and filter options available on the new Resources tab.


Interactive Lineage hover elements

Another neat feature for our customers leveraging data artifact lineage! In addition to being interactive, the resource items now display a summary card when you hover over them. You can preview metadata at a glance and click through to the collection, individual tags, or the resource itself.


Metrics update: October 18, 2021

Updated metrics tables and reports arrived on October 18, 2021! Some reports may take 24-48 hours after the deploy to reflect the new data because of sync timing.

The data dictionary has also been updated to reflect these changes.

Updated Tables - For multi-tenant

  1. Events - Dataset or Project Views By Org - Name changed (from “Events - Views by Org”) and column name “dataset_views” changed to “views”
  2. Events - Searches - Last 90 Days - Fixed a bug that sometimes caused duplicate rows
  3. Membership - Daily Counts - By Org - Name changed (from “Membership - Daily - By Org”)
  4. Resources - Org Owned Database connections - Name changed (from “Resources - Database connections”) and added column “owner”
  5. Tops - Bookmarks - Extended range to all users (it previously was limited to the top 10 users) and added column “displayname”
  6. Tops - Dataset Creation - Extended range to all users (it previously was limited to the top 10 users) and added column “displayname”
  7. Tops - Most Bookmarked Resources - Extended range to all resources (it previously was limited to the top 10 resources)
  8. Tops - Most Comments - All Time - Extended range to all resources (it previously was limited to the top 10 resources)
  9. Tops - Most Searched Terms - Fixed a bug that sometimes caused duplicate rows
  10. Tops - Most Viewed Resources - Added “catalog” type category to the resource_type variable
  11. Tops - Pageviews By Resource and Agentid - Added “catalog” type category to the resource_type variable

Updated Tables - For single-tenant

  1. Events - Dataset or Project Views By Org - Name changed (from “Events - Views by Org”) and column name “dataset_views” changed to “views”
  2. Resources - Org Owned Database connections - Added column “owner”
  3. Tops - Bookmarks - Extended range to all users (it previously was limited to the top 10 users) and added column “displayname”
  4. Tops - Dataset Creation - Extended range to all users (it previously was limited to the top 10 users) and added column “displayname”
  5. Tops - Most Viewed Resources - Added “catalog” type category to the resource_type variable
  6. Tops - Pageviews By Resource and Agentid - Added “catalog” type category to the resource_type variable



Coming Soon: Concept Cards

Business context

Most analysts trying to find answers to business questions aren’t searching for tables and columns directly. What they are actually looking for is contextual information that accelerates time to business impact for data. data.world Concept Cards will change the way data consumers access data by providing a unique search experience no other catalog provider does or can do without the backing of a knowledge graph.

Capabilities

Concept Cards are a feature on data.world’s near-term roadmap to help users discover related people, resources, and other supporting information we can obtain from the knowledge graph about a given search topic. If there are suggested actions that can be taken for the topic itself or for related resources, access to those actions is surfaced directly in the search results.

These cards become a jumping-off point to browse and discover new things on the platform that share something in common with the search topic of interest. We see these Concept Cards as the first of many intelligent recommendations we can make by harnessing the power of the knowledge graph.

SQL and SPARQL Time Travel

Business context

Querying data in its current state is the most common data catalog use case, but there are times when it is necessary to compare previous versions of datasets, metadata, and lineage. data.world SQL and SPARQL Time Travel allows customers to view changes across metadata and data and even query historical data sources. 

Capabilities

The new feature provides granular insight into audit trails and analysis of data that is snapshotted across time. You can search both ingested data sources and Snowflake virtual tables for previous states of data. Being able to analyze previous versions of a dataset, even simultaneously with the current version of a dataset, enables flexible analysis across various time scales – review data month-over-month, year-over-year, etc.

In data.world, your metadata is also data and therefore fully queryable and reportable. You can compare previous versions of your metadata with current versions to understand how your systems and schemas are changing. See new columns, renamed columns, sensitive data that recently appeared in a field where it wasn't present before, and much more.

Supported operations include previous version, number of versions back (tip-N), specific timestamp, and offset.

Example: SQL Time Travel Query

Example: SPARQL Time Travel Query


Beta: Sensitive Data Discovery

Business context

A key aspect of data compliance is knowing where sensitive data lives and applying classifications that relate to policies that inform business processes for proper tracking and management. Identifying sensitive data, applying these policies, and reporting on this information can be extremely time-consuming and error-prone if attempted manually.

data.world’s Sensitive Data Discovery automates discovery and classification, making it easier for enterprise customers to identify sensitive data and take action on it within the catalog.

Capabilities

Scan – Use advanced machine learning to identify sensitive data types like email addresses, names, ID numbers, locations, protected health information, and 40+ additional data types identifiable out of the box.

Classify – Apply policy classifications, tags, and statuses such as Restricted, Personal Information, US Only, etc. These classifications help maintain the integrity and confidentiality of your data. They are driven by your scan results and other metadata, as dictated by your unique business logic and terminology.

Take Action – Report and audit sensitive data types and policy classifications across your data landscape, understand how it changes over time, and drive better compliance and governance in your organization.

Integrate – Leverage Sensitive Data Discovery metadata as part of your broader metadata orchestration strategy with APIs and bulk export. Our open and extensible platform makes it easy to plug in your broader ecosystem of additional Sensitive Data Discovery tools and platforms for even greater governance capabilities.

Screenshots

Resource page example

Search results example

If you are an existing data.world customer and would like to be included in the private beta, reach out to your Client Success Director for more information.

Interactive lineage items

For customers leveraging data artifact lineage, the resource items are now interactive. You can click through the icons to view the respective resource pages.

Metrics update: September 16, 2021

Updated metrics tables and reports arrived on September 16, 2021! Some reports may take 24-48 hours after the deploy to reflect the new data because of sync timing.

New Tables (multi-tenant & single-tenant)

  • Resources - Dataset Files - A detailed listing of all (currently existing) files residing in datasets.
  • Tops - Engagement - A list of users ranked by key engagement metrics.
  • Resources - Live Metadata Assets Created - By Day - A long form series of counts of metadata assets created by date.
  • Events - Dataset Activity - By Day - A fact table containing dataset activity measurements, aggregated by UTC-based calendar day.
  • Events - Metadata Assets Activity - By Day - A fact table containing metadata assets activity measurements aggregated by UTC-based calendar day.

Updated Tables (multi-tenant & single-tenant)

  • Membership - Current - By Org - Added new columns for email address, user display name, org-level authorization settings, org-level visibility settings and date of most recent update to authorization settings.
  • Events - Searches - Last 90 Days - Fixed a bug that caused the counts of search results to be capped at 10.
  • Events - Downloads (previously Events - Downloads - Last 90 Days) - Extended the timeframe to all-time; added new columns for file labels and user displayname. 

Base platform data updates (single-tenant only):

  • DOWNLOADS - new columns (type, filename, filelabels) providing information about file downloads.
  • DAILY_DWEC_ASSET_FACTS - new fact table providing measurements of metadata asset activity by date.
  • FILES_DATASET_DIM - new dimension table providing information about files residing in Datasets.

🚨 Default Behavior Change: PATCH API endpoints 🚨

The data.world public API supports several options for programmatically making updates to resources on the platform. PATCH is a method for making partial updates to individual records, such as adding tags, changing a description, or modifying a title.

In the next two weeks, we will be making a change to the way PATCH endpoints modify list values. We outline these changes below.


Existing Merge Behavior

Lists are merged with existing values on PATCH requests

  1. A dataset has tags: [ tag A, tag B ]
  2. A PATCH request is sent to /datasets/democorp/my-example-dataset with body: { "tags": [ "tag C", "tag D" ] }
  3. The dataset is updated to reflect tags: [ tag A, tag B, tag C, tag D ]
  4. A PATCH request is sent to /datasets/democorp/my-example-dataset with body: { "tags": [] }
  5. No change is applied and the tags remain: [ tag A, tag B, tag C, tag D ]


New Replace Behavior

Lists replace existing values on PATCH requests

  1. A dataset has tags: [ tag A, tag B ]
  2. A PATCH request is sent to /datasets/democorp/my-example-dataset with body: { "tags": [ "tag C", "tag D" ] }
  3. The dataset is updated to have tags: [ tag C, tag D ]. tag A and tag B have been removed.
  4. A PATCH request is sent to /datasets/democorp/my-example-dataset with body: { "tags": [] }
  5. The dataset is updated to remove all tags.
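
The difference between the two behaviors can be sketched locally in Python. This is only an illustrative model, not the platform's implementation: the function `apply_patch`, its `replace_lists` flag, and the sample record are hypothetical, and the sketch assumes that the existing merge behavior skips values that are already in the list.

```python
def apply_patch(record, body, replace_lists):
    """Return a copy of `record` with the PATCH `body` applied.

    Hypothetical model of the two behaviors described above:
    - replace_lists=False: list values are merged into the existing list
      (assumption: values already present are not added again).
    - replace_lists=True: list values overwrite the existing list.
    Non-list fields are overwritten in both modes.
    """
    updated = dict(record)
    for field, value in body.items():
        current = updated.get(field)
        if isinstance(value, list) and isinstance(current, list) and not replace_lists:
            # Merge: keep existing items, append only the new ones.
            updated[field] = current + [v for v in value if v not in current]
        else:
            # Replace: the incoming value wins, even if it is an empty list.
            updated[field] = value
    return updated


dataset = {"title": "My example dataset", "tags": ["tag A", "tag B"]}

# Existing merge behavior
merged = apply_patch(dataset, {"tags": ["tag C", "tag D"]}, replace_lists=False)
unchanged = apply_patch(merged, {"tags": []}, replace_lists=False)

# New replace behavior
replaced = apply_patch(dataset, {"tags": ["tag C", "tag D"]}, replace_lists=True)
cleared = apply_patch(replaced, {"tags": []}, replace_lists=True)

print(merged["tags"], unchanged["tags"], replaced["tags"], cleared["tags"])
```

In both modes the `title` field is left alone, because the PATCH body does not mention it.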


Why we are making this change

Today, PATCH can be used to add, modify, or remove fields for all non-list values. With the current merge logic, items can only be appended to list values using PATCH. As a consequence, if you want to remove or reorder the items in a list, you must use the PUT method, which does not support partial updates and requires a full overwrite of the existing record. The new logic to overwrite list values will allow users to make partial updates to records that remove or modify the order of items in the list without needing to modify the entire record.
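
If an integration currently relies on PATCH to append items, one way to keep that result under the replace logic is to read the current list first and send the complete list you want. The sketch below only builds the request bodies. The helper names are hypothetical, and how you fetch the current tags and send the request depends on your own client code.

```python
import json


def build_append_body(current_tags, new_tags):
    """Body that keeps the current tags and adds any new ones (hypothetical helper)."""
    combined = list(current_tags)
    for tag in new_tags:
        if tag not in combined:
            combined.append(tag)
    return {"tags": combined}


def build_remove_body(current_tags, tags_to_remove):
    """Body that drops selected tags; this only takes effect with replace behavior."""
    return {"tags": [tag for tag in current_tags if tag not in tags_to_remove]}


current = ["tag A", "tag B"]  # assumed to have been read from the dataset beforehand

# Append under replace semantics: send the full desired list.
print(json.dumps(build_append_body(current, ["tag C", "tag D"])))

# Remove a single tag without resending the rest of the record.
print(json.dumps(build_remove_body(current, ["tag A"])))
```

Either body would be sent as the PATCH payload to the dataset endpoint, for example /datasets/democorp/my-example-dataset.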

This new logic primarily impacts tags, file labels, collections, and multi-select custom metadata fields.
