Eureka: Automations and answers powered by the knowledge graph

data.world is happy to announce the release of Eureka, a suite of knowledge-graph-powered data catalog capabilities designed to simplify the development, discovery, understanding, and use of trusted data products.

Eureka Automations (new)

Eureka Automations make it faster and easier to deploy and manage your data catalog. 

Eureka Action Center (new)

Eureka Action Center is a dynamic dashboard homepage that helps answer the question, “What do I need to take action on now?” 

Eureka Answers (new)

Eureka Answers surfaces the most relevant concepts from the knowledge graph to the top of search.

Eureka Explorer (coming fall 2022)

Eureka Explorer is a visual map of your data ecosystem powered by the knowledge graph.

As always, please reach out if you have any questions or feedback!

Metrics update: December 17, 2021

Updated metrics tables and reports arrived on December 17, 2021! Because of sync timing, some reports may take 24-48 hours after deployment to reflect the new data.

The data dictionary has also been updated to reflect these changes.

Updated Tables - For both multi-tenant and single-tenant

  1. Events - Metadata Assets Activity - By Day: Column name changed from “resourceid” to “resource” - this change was applied in order to bring this table into conformity with the dimension naming convention used elsewhere in the metrics dataset.
  2. Membership - All Time List: Added “current_member” column (boolean; TRUE: account is currently provisioned; FALSE: account is currently de-provisioned). Added “last_date_active” column (the date of the user’s most recent activity in data.world).
  3. Tops - Requests: Name of table/report changed to “Tops - Most Requested Resources.” Added “resourcetype” column (dimension; indicates whether the requested resource was a dataset, group, etc.).
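If a downstream script reads rows from these tables by column name, a rename such as “resourceid” to “resource” can be absorbed with a small compatibility shim. The sketch below is only an illustration: the helper name normalize_columns, the sample rows, and the "date" and "events" columns are hypothetical; only the names "resourceid" and "resource" come from the notes above.

```python
# Hypothetical helper: map legacy metrics column names to their new names.
# Only "resourceid" -> "resource" is taken from the release notes.
RENAMED_COLUMNS = {"resourceid": "resource"}


def normalize_columns(row, renames=RENAMED_COLUMNS):
    """Return a copy of `row` with legacy column names replaced by new ones."""
    normalized = {}
    for name, value in row.items():
        normalized[renames.get(name, name)] = value
    return normalized


# Made-up rows as they might look before and after the rename.
old_row = {"date": "2021-12-16", "resourceid": "my-example-dataset", "events": 3}
new_row = {"date": "2021-12-17", "resource": "my-example-dataset", "events": 5}

print(normalize_columns(old_row))
print(normalize_columns(new_row))
```

Rows that already use the new name pass through unchanged, so the same code can read data exported before or after the update.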

Metrics update: October 18, 2021

Updated metrics tables and reports arrived on October 18, 2021! Because of sync timing, some reports may take 24-48 hours after deployment to reflect the new data.

The data dictionary has also been updated to reflect these changes.

Updated Tables - For multi-tenant

  1. Events - Dataset or Project Views By Org - Name changed (from “Events - Views by Org”) and column name “dataset_views” changed to “views”
  2. Events - Searches - Last 90 Days - Fixed a bug that sometimes caused duplicate rows
  3. Membership - Daily Counts - By Org - Name changed (from “Membership - Daily - By Org”)
  4. Resources - Org Owned Database connections - Name changed (from “Resources - Database connections”) and added column “owner”
  5. Tops - Bookmarks - Extended range to all users (it previously was limited to the top 10 users) and added column “displayname”
  6. Tops - Dataset Creation - Extended range to all users (it previously was limited to the top 10 users) and added column “displayname”
  7. Tops - Most Bookmarked Resources - Extended range to all resources (it previously was limited to the top 10 resources)
  8. Tops - Most Comments - All Time - Extended range to all resources (it previously was limited to the top 10 resources)
  9. Tops - Most Searched Terms - Fixed a bug that sometimes caused duplicate rows
  10. Tops - Most Viewed Resources - Added “catalog” type category to the resource_type variable
  11. Tops - Pageviews By Resource and Agentid - Added “catalog” type category to the resource_type variable

Updated Tables - For single-tenant

  1. Events - Dataset or Project Views By Org - Name changed (from “Events - Views by Org”) and column name “dataset_views” changed to “views”
  2. Resources - Org Owned Database connections - Added column “owner”
  3. Tops - Bookmarks - Extended range to all users (it previously was limited to the top 10 users) and added column “displayname”
  4. Tops - Dataset Creation - Extended range to all users (it previously was limited to the top 10 users) and added column “displayname”
  5. Tops - Most Viewed Resources - Added “catalog” type category to the resource_type variable
  6. Tops - Pageviews By Resource and Agentid - Added “catalog” type category to the resource_type variable



🚨 Default Behavior Change: PATCH API endpoints 🚨

The data.world public API supports several options for programmatically making updates to resources on the platform. PATCH is a method for making partial updates to individual records, such as adding tags, changing a description, or modifying a title.

Within the next two weeks, we will change the way PATCH endpoints modify list values. The change is outlined below.


Existing Merge Behavior

Lists are merged with existing values on PATCH requests.

  1. A dataset has tags: [ tag A, tag B ]
  2. A PATCH request is sent to /datasets/democorp/my-example-dataset with body: { "tags": [ "tag C", "tag D" ] }
  3. The dataset is updated to reflect tags: [ tag A, tag B, tag C, tag D ]
  4. A PATCH request is sent to /datasets/democorp/my-example-dataset with body: { "tags": [] }
  5. No change is applied and the tags remain: [ tag A, tag B, tag C, tag D ]


New Replace Behavior

Lists replace existing values on PATCH requests.

  1. A dataset has tags: [ tag A, tag B ]
  2. A PATCH request is sent to /datasets/democorp/my-example-dataset with body: { "tags": [ "tag C", "tag D" ] }
  3. The dataset is now updated to have tags: [ tag C, tag D ]. tag A and tag B have been removed.
  4. A PATCH request is sent to /datasets/democorp/my-example-dataset with body: { "tags": [] }
  5. The dataset has been updated to remove all tags.
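The two walkthroughs above can be modeled in a few lines of Python. This is a simplified sketch of the semantics only, not the platform's implementation: the function names are hypothetical, and skipping tags that are already present during a merge is an assumption the notes above do not confirm.

```python
def patch_tags_merge(existing, incoming):
    """Existing merge behavior: incoming items are appended to the stored list."""
    merged = list(existing)
    for tag in incoming:
        if tag not in merged:  # assumption: already-present tags are not repeated
            merged.append(tag)
    return merged


def patch_tags_replace(existing, incoming):
    """New replace behavior: the incoming list becomes the stored list."""
    return list(incoming)


start = ["tag A", "tag B"]

# Merge: new tags are appended, and an empty list changes nothing.
merged = patch_tags_merge(start, ["tag C", "tag D"])
merged_after_empty = patch_tags_merge(merged, [])

# Replace: the stored list is overwritten, and an empty list clears it.
replaced = patch_tags_replace(start, ["tag C", "tag D"])
replaced_after_empty = patch_tags_replace(replaced, [])

print(merged, merged_after_empty)
print(replaced, replaced_after_empty)
```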


Why we are making this change

Today, PATCH can be used to add, modify, or remove fields for all non-list values. With the current merge logic, items can only be appended to list values using PATCH. As a consequence, if you want to remove or reorder the items in a list, you must use the PUT method, which does not support partial updates and requires a full overwrite of the existing record. The new logic to overwrite list values will allow users to make partial updates to records that remove or modify the order of items in the list without needing to modify the entire record.

This new logic primarily impacts tags, file labels, collections, and multi-select custom metadata fields.
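Under the replace behavior, the list sent in a PATCH body is the list that ends up stored, so a caller who wants to add or remove individual tags would compute the full desired list first. A minimal sketch, assuming the current tags have already been fetched; build_tags_patch_body is a hypothetical helper, not part of any data.world client, and no request is sent here.

```python
import json


def build_tags_patch_body(current_tags, add=(), remove=()):
    """Build a PATCH body containing the full desired tag list.

    The final list is the current tags, minus removals, plus additions.
    """
    removals = set(remove)
    desired = [tag for tag in current_tags if tag not in removals]
    for tag in add:
        if tag not in desired:  # avoid sending duplicates
            desired.append(tag)
    return json.dumps({"tags": desired})


# Keep "tag A", drop "tag B", and append "tag C".
body = build_tags_patch_body(["tag A", "tag B"], add=["tag C"], remove=["tag B"])
print(body)
```

Scripts that relied on the merge behavior to append tags can use the same approach with only the add argument, so existing tags are carried along in the request.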

Metrics update: July 28, 2021

For enterprise customers, updated metrics have been released today to your Usage and Governance Reporting (ddw-metrics-*) dataset to fix some minor bugs and improve performance.

Potential observable changes:

  • For both single-tenant and multi-tenant customers, some reports were not reflecting de-provisioned user accounts. With this fix, these reports now include de-provisioned accounts in addition to active ones. Multi-tenant customers may find a slight increase in counts in Visits - Return Visitors - Daily, Visits - Return Visitor Days, Visits - Unique Visitor Days, and Visits - Unique Visitors - Monthly. Single-tenant customers may find a slight increase in Visits - Unique Visitors Daily and Visits - Unique Visitor Days.
  • For multi-tenant customers, under certain circumstances some reports could double-count users who were members of multiple sub-organizations. With this fix, you may find a slight decrease in counts in Membership - By Date, Visits - Adoption Daily, and Visits - Avg Visits Weekly.
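To see why fixing the sub-organization issue lowers counts, compare adding up per-sub-organization totals with counting distinct users. The data below is made up, and this is not how the reports are computed internally; it only illustrates the general effect.

```python
# Made-up membership data: "dana" belongs to two sub-organizations.
members_by_suborg = {
    "suborg-east": {"alex", "dana"},
    "suborg-west": {"dana", "sam"},
}

# Adding up per-sub-organization counts counts "dana" twice.
summed_count = sum(len(users) for users in members_by_suborg.values())

# Taking the union of all members counts each user once.
distinct_count = len(set().union(*members_by_suborg.values()))

print(summed_count, distinct_count)
```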

If you have any questions or concerns, please let us know at support@data.world or via your customer success representative.

DockerHub and metadata collector enhancement roundup

Our metadata collector (dwcc, aka the data.world Catalog Collector) is now available on DockerHub! Simply run docker pull datadotworld/dwcc:x.xx where x.xx is your desired version, and you're in business. It's that easy.

Other enhancements to the metadata collector:

  • Updated Domo collector to improve relationship modeling
  • Various Tableau & Manta collector fixes & enhancements
  • Denodo metadata collector support shifted to Denodo 8
  • --config-file option for metadata collector (Beta): We've heard your feedback asking for a simpler way to manage the configurations for your metadata collectors. In the near future, the config file will become the default way to set your parameters. Lots more info on this coming soon!

Bug roundup 🐞

In the last few weeks, we have made several minor bug fixes and enhancements. Here are some notable ones:

  • Improved help text for tags, including on pressing “Enter” to add tags
  • Improved empty state messaging for adding contributors to a dataset
  • Consistent use of timestamps in alerts and notifications
  • Navigation tabs on various pages are now keyboard-navigable (left and right arrow keys) for ease of browsing and improved accessibility
  • “Share” button directly opens “Grant access” modal
  • Consistent use of display name in emails
  • Text truncation fixed for filter bars and the project workbench
  • Fixed various layout, text, and navigational misalignments and inconsistencies

Coming Soon: Addressing timezone inconsistency

🚨 Default behavior change coming next week 🚨

We have recently discovered that when executing queries, there are some cases where our DATETIME columns contain timezone information, and other cases where they do not. This is primarily an issue that arises with columns containing date/time information in uploaded files (we do not see this with live tables). We have decided to address this inconsistency. Starting next week, query result columns of type DATETIME will no longer contain timezone information, while columns of type DATETIMESTAMP will always contain timezone information.

The impact of this change shouldn’t be significant, and most users will see no change. However, if you have queries across ingested data that aggregate on DATETIME columns, or that do DATE_ADD()-style calculations, you may notice differences in your results depending on your current timezone.
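The effect on aggregation can be illustrated outside of SQL. The plain-Python sketch below uses made-up event times and a fixed -06:00 offset as assumptions; it does not reproduce the data.world query engine, only the general reason why grouping by date depends on whether timezone information is kept.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Made-up event times stored with an explicit UTC offset of -06:00.
local_zone = timezone(timedelta(hours=-6))
events = [
    datetime(2021, 3, 1, 9, 0, tzinfo=local_zone),
    datetime(2021, 3, 1, 23, 30, tzinfo=local_zone),  # already March 2 in UTC
]

# Grouping by the date as written, with timezone information stripped.
by_written_date = Counter(e.replace(tzinfo=None).date() for e in events)

# Grouping by the date after converting each value to UTC.
by_utc_date = Counter(e.astimezone(timezone.utc).date() for e in events)

print(dict(by_written_date))
print(dict(by_utc_date))
```

Both events fall on March 1 when the timezone is stripped, but they split across March 1 and March 2 once converted to UTC, which is the kind of difference a daily aggregate could show.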

If you are impacted by this change, here are some ways to make your timezone intent explicit:

  1. CAST the resulting column to DATETIMESTAMP to force timezones, or to DATETIME to strip timezones (documentation)
  2. Use AT_TIME_ZONE() to explicitly state your timezone (documentation)
  3. Ensure that the table column type is set to DATETIMESTAMP or DATETIME (documentation)

Note: If timezone information is desired, but not defined, UTC is assumed. 

Please contact support@data.world with any questions or concerns. As always, we’re happy to help.

New: Groundbreaking "deep brain" integration

data.world is very excited to announce our new deep brain integration.

Now data consumers simply need to think about what data they want, and data.world will return governed, curated data. It also supports cataloging of business terminology straight from subject matter experts.

When we originally envisioned the feature, our design inspiration was to provide an "easy button." However, Jon Loyens, co-founder and CPO, then famously said, "what if there was no button at all?"

A future release will support agile data governance workflows, such as data access approvals. Integration is quick and relatively painless, though upgrades require a bit of effort and minor outpatient surgery.

Featured Search Results for Data Partners

Relevant data partners now appear at the top of some search results as featured results, making it easier to find high-quality, reliable datasets in search. Featured results are disabled for private installs.

If you’re interested in learning more about how we help teams find the right data vendors for their projects, drop us a line at concierge@data.world.

