Profiling: a new kind of metadata - data.world Product

With the new year come new features! We are pleased to launch our newest metadata capability: data profiling. This feature creates metadata describing summary statistics for columns whenever a collector is run.

These summary statistics help you understand and trust your data by giving you a quick look at it. For instance, stats like the minimum and maximum values show the range of a column, so you can quickly tell whether your data is as expected.
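As a rough illustration of that kind of check, here is a small Python sketch. It is not data.world's implementation; the function names, the "age" column, and the expected bounds are all hypothetical and chosen only to show how a minimum and maximum can flag a column that is not as expected.

```python
def min_max(values):
    """Return (minimum, maximum) of the non-null values, or None if there are none."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return min(present), max(present)


def looks_as_expected(values, expected_low, expected_high):
    """True when every non-null value falls inside the expected bounds."""
    bounds = min_max(values)
    if bounds is None:
        return False
    low, high = bounds
    return expected_low <= low and high <= expected_high


# Hypothetical "age" column with one obviously bad entry.
ages = [34, 27, None, 51, 999]
print(min_max(ages))                    # (27, 999)
print(looks_as_expected(ages, 0, 120))  # False
```

Here the maximum of 999 stands out immediately, which is the kind of signal a profiled column can surface without anyone having to query the table.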

How can you create profiling metadata? This feature is currently available via the Snowflake, SQL Server, PostgreSQL, and Redshift collectors, with more collectors on the near horizon. Three optional command-line options can be passed during the collector run to generate the profiling metadata:

--enable-column-statistics: enables harvesting of column statistics

--sample-string-values: enables harvesting of histograms for columns containing string data

--target-sample-size: controls the number of rows sampled for computation of column statistics and string-value histograms
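To make the three settings concrete, the following Python sketch shows one way sampled column statistics and a string-value histogram could be computed. This is an assumption-laden illustration, not the collector's actual logic: the function profile_column is hypothetical, and its parameter names simply echo the option names above for readability.

```python
import random
from collections import Counter


def profile_column(values, target_sample_size=1000, sample_string_values=False, seed=0):
    """Compute simple summary statistics for one column from a row sample.

    Parameter names mirror the collector options for readability only;
    the logic is an assumption, not the collector's actual behavior.
    """
    rng = random.Random(seed)
    # Sample at most target_sample_size rows; use every row if the column is smaller.
    if len(values) > target_sample_size:
        sample = rng.sample(values, target_sample_size)
    else:
        sample = list(values)

    present = [v for v in sample if v is not None]
    stats = {
        "sampled_rows": len(sample),
        "null_count": len(sample) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }

    # Histogram of string values, only when requested and the column holds strings.
    if sample_string_values and present and all(isinstance(v, str) for v in present):
        stats["histogram"] = dict(Counter(present))
    return stats


# Hypothetical string column.
status = ["open", "closed", "open", None, "pending", "open"]
print(profile_column(status, target_sample_size=10, sample_string_values=True))
```

A smaller sample size makes profiling cheaper on large tables at the cost of less precise statistics, which is presumably why the sample size is exposed as a setting.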

You can read more about these options on the following collector documentation pages: