Training Provider Data Packages
At its core, Training Provider Outcomes Data is driven by the Training Provider Data Package specification, a simple format for packaging training provider data so that it can be shared between tools and people. It is based on the Open Knowledge Foundation’s Data Package specification. Associated specifications include Tabular Data Package, a format for packaging tabular data; JSON Table Schema, a specification for defining a schema for tabular data; and CSV Dialect Description Format (CSV-DDF), a specification for defining a dialect for CSV data.
How do these specifications relate?
A Data Package can “contain” any type of file. A Tabular Data Package is a special type of Data Package that “contains” one or more CSV files. In a Tabular Data Package, each CSV must have a schema defined using JSON Table Schema and, optionally, a dialect defined using CSV-DDF. An application or library that consumes Tabular Data Packages must therefore understand not only the full Data Package specification, but also JSON Table Schema and CSV-DDF.
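To make the relationship concrete, here is a minimal sketch in Python (standard library only) of how a consumer might combine the three specifications. The package name, file name, field names, and sample data are invented for illustration, and the descriptor layout (a `resources` list whose entries carry `path`, `schema`, and `dialect`) is an assumption about the shape of `datapackage.json`; check the specifications themselves for the authoritative structure.

```python
import csv
import io
import json

# Illustrative descriptor: the package name, path, and field names are made up.
descriptor = {
    "name": "example-training-provider",
    "resources": [
        {
            "path": "programs.csv",
            # JSON Table Schema: one entry per CSV column.
            "schema": {
                "fields": [
                    {"name": "program_id", "type": "string"},
                    {"name": "completers", "type": "integer"},
                ]
            },
            # CSV-DDF-style dialect (optional): how the file is delimited and quoted.
            "dialect": {"delimiter": ";", "quotechar": '"'},
        }
    ],
}

# Stand-in for the contents of programs.csv.
csv_text = 'program_id;completers\n"WELD-101";42\n'

resource = descriptor["resources"][0]
dialect = resource.get("dialect", {})

# The dialect tells the CSV parser how to split each line.
reader = csv.DictReader(
    io.StringIO(csv_text),
    delimiter=dialect.get("delimiter", ","),
    quotechar=dialect.get("quotechar", '"'),
)

# The schema tells the consumer how to interpret each value
# (only two field types are handled in this sketch).
casts = {"integer": int, "string": str}
rows = [
    {f["name"]: casts[f["type"]](row[f["name"]]) for f in resource["schema"]["fields"]}
    for row in reader
]

print(json.dumps(descriptor, indent=2))
print(rows)
```

The descriptor locates the file, the dialect drives parsing, and the schema drives type casting, which is why a consumer needs all three specifications.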
For more information on each specification, see below:
Tabular Data Package
JSON Table Schema
CSV Dialect Description Format
Tools for working with Data Packages
Quick Start Tools for Training Provider Data Packages
Online Data Package viewer app – provides a nice human-friendly view of a Data Package in seconds.
Data Package your data by creating a datapackage.json – the online datapackage.json maker creates the datapackage.json file needed to turn data into a Data Package.
Online validator that checks that your datapackage.json and Training Data Package are good to go.
- Creating and using Training Data Packages in Python coming soon
- Creating and using Skills Data Packages in Python coming soon
- Creating and using Training Data Packages in R coming soon
- data package manager (dpm) - overall library and command line
- datapackage-init - create Data Packages by creating
- datapackage-read - load and access Data Packages (
- datapackage-validate - validate Data Packages (
- datapackage-render - render Data Packages and their views to embeddable HTML, images (png) and more
A comprehensive Python library is available:
Two libraries are available:
- https://github.com/textkit/datapak - work with tabular data packages (lets you download, load or query datasets using SQL via ActiveRecord - thus, works with any SQL database; defaults to an in-memory SQLite database).
- https://github.com/theodi/datapackage.rb – parse and validate both data packages and tabular data packages. (May be obsolete, as it has not been updated since Feb 2014.)
A validator and storage library for working with JSON Table Schema is available here:
https://github.com/the42/datapackage - provides struct specifications for Data Package as well as a command line tool to create Data Packages.
- R Data Package Library - by rOpenSci
- R Data Package Manager - by Christopher Gandrud
- R Open Data Protocols Library - by QRBC
A function to read data from a Tabular Data Package is available for download from MATLAB Central’s File Exchange.
To contribute to the library, see the project’s GitHub repository.
Data Package Manager (dpm) – https://github.com/okfn/dpm. Comprehensive command line tool.
Use Data Packages with …
These “Using with” examples usually require Tabular Data Packages where the data in the Data Package is stored in CSV.
- https://github.com/frictionlessdata/jsontableschema-sql-py - generic JSON Table Schema to SQL library in Python
- https://github.com/frictionlessdata/datapackage-py - general Python library that can be used to automate import of Tabular Data Packages into SQL
- You can also use the Ruby datapak library (see Ruby library section)
In addition to the generic option, there is a simple Python script (no dependencies) to load a Tabular Data Package into SQLite.
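As a rough idea of what such a dependency-free loader can look like, here is a minimal sketch using only the Python standard library. It is not the script referred to above. The function name `load_into_sqlite`, the type mapping, and the sample package are all hypothetical, and the sketch assumes each CSV has a header row and that its columns appear in the same order as the schema fields.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical mapping from JSON Table Schema types to SQLite column types;
# anything not listed is stored as TEXT.
TYPE_MAP = {"integer": "INTEGER", "number": "REAL", "boolean": "INTEGER"}


def load_into_sqlite(package_dir, conn):
    """Create one table per resource and insert its CSV rows."""
    package_dir = Path(package_dir)
    descriptor = json.loads(
        (package_dir / "datapackage.json").read_text(encoding="utf-8")
    )
    for resource in descriptor["resources"]:
        fields = resource["schema"]["fields"]
        # Use the resource name if present, otherwise the file name without extension.
        table = resource.get("name") or Path(resource["path"]).stem
        columns = ", ".join(
            '"{}" {}'.format(f["name"], TYPE_MAP.get(f.get("type", "string"), "TEXT"))
            for f in fields
        )
        conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
        placeholders = ", ".join("?" for _ in fields)
        with open(package_dir / resource["path"], newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader)  # skip the header row
            conn.executemany(
                'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
            )
    conn.commit()


# Demo with a made-up package written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    sample = {
        "name": "example-training-provider",
        "resources": [
            {
                "path": "programs.csv",
                "schema": {
                    "fields": [
                        {"name": "program_id", "type": "string"},
                        {"name": "completers", "type": "integer"},
                    ]
                },
            }
        ],
    }
    (tmp_path / "datapackage.json").write_text(json.dumps(sample), encoding="utf-8")
    (tmp_path / "programs.csv").write_text(
        "program_id,completers\nWELD-101,42\nHVAC-200,17\n", encoding="utf-8"
    )
    conn = sqlite3.connect(":memory:")
    load_into_sqlite(tmp_path, conn)
    rows = conn.execute(
        "SELECT program_id, completers FROM programs ORDER BY program_id"
    ).fetchall()

print(rows)
```

Table and column names are taken directly from the descriptor and only wrapped in double quotes, so this sketch is suitable for trusted packages only; a production loader would validate identifiers and honour any declared dialect.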
In addition to the generic options, there is a Python script (with no dependencies) to load a Tabular Data Package into PostgreSQL.
There is a BIML project that uses datapackage.json to generate SSIS packages that can load the contents of a Tabular Data Package into a SQL Server database. Find out more about SQL Server Integration Services (SSIS).
In progress: fully automated Data Package support (see this issue for updates).
In the meantime you can just open the CSV file by hand!
In progress: Fully automated Data Package support (see this issue for updates).
In the meantime you can just import the CSV files in the Data Package directly.
- https://github.com/frictionlessdata/jsontableschema-bigquery-py - generic JSON Table Schema to BigQuery library in Python
- https://github.com/frictionlessdata/datapackage-py - general Python library that can be used to automate import of Tabular Data Packages into BigQuery