Open Skills Project

The Open Skills Project is a public-private partnership led by the University of Chicago. It is focused on providing a dynamic, up-to-date, locally relevant, and normalized taxonomy of skills and jobs that builds on and expands the Department of Labor’s O*NET data resources. Its aim is to improve our understanding of the labor market and to reduce friction in the workforce data ecosystem. It does this by enabling a more granular common language of skills among industry, academia, government, and nonprofit organizations.

Open Skills API

The Open Skills API is an interface for developers to build applications using data produced by the Open Skills Project.

Open Skills Research Hub

The Open Skills Research Hub is a collection of public datasets produced by the Open Skills Project for the purpose of collaborative research.


Representativeness Analysis

Understanding how our data relates to the labor market is important to us at Data@Work.


Funded Research Opportunity

Data@Work has selected recipients of the Research Hub Funded Research Opportunity. Stay tuned for more information!

Technical Details

For researchers interested in our work, here is a technical overview. We also post updates about our work on our blog.

What Has Been Done

We currently have a machine learning pipeline, available both in Airflow and as Python scripts. It provides classes for several stages:

- preprocessing data with NLP transforms
- ingesting known skills and job titles
- labeling skills and job titles within corpora
- generating specialized representations of skills and job titles
- saving version-controlled instances of the data and outputs that flow through each pipeline stage
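The sequence of pipeline stages described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the project's actual code. The names `preprocess`, `KnownTerms`, `label`, `represent`, `save_versioned`, and `run_pipeline` are invented for this example, and Airflow orchestration is left out. The representation stage is reduced to token counts instead of learned vectors. "Version control" is approximated by storing each output under a hash of its content.

```python
import hashlib
import json
import re
from collections import Counter
from dataclasses import dataclass, field


def preprocess(text: str) -> str:
    """NLP transform: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9+#\s]", " ", text.lower())
    return " ".join(text.split())


@dataclass
class KnownTerms:
    """Ingested vocabulary of known skills and job titles (assumed already preprocessed)."""
    skills: set = field(default_factory=set)
    job_titles: set = field(default_factory=set)


def label(doc: str, terms: KnownTerms) -> dict:
    """Label known skills and job titles that appear as whole phrases in a document."""
    padded = f" {doc} "
    return {
        "skills": sorted(s for s in terms.skills if f" {s} " in padded),
        "job_titles": sorted(t for t in terms.job_titles if f" {t} " in padded),
    }


def represent(doc: str) -> dict:
    """Toy representation: token counts, standing in for learned embeddings."""
    return dict(Counter(doc.split()))


def save_versioned(outputs: list, store: dict) -> str:
    """Save outputs under a content hash so every version can be retrieved later."""
    payload = json.dumps(outputs, sort_keys=True).encode("utf-8")
    version = hashlib.sha256(payload).hexdigest()[:12]
    store[version] = outputs
    return version


def run_pipeline(corpus: list, terms: KnownTerms, store: dict) -> str:
    """Chain the stages: preprocess -> label -> represent -> save."""
    outputs = []
    for raw in corpus:
        doc = preprocess(raw)
        outputs.append({**label(doc, terms), "counts": represent(doc)})
    return save_versioned(outputs, store)


if __name__ == "__main__":
    terms = KnownTerms(skills={"python", "data analysis"}, job_titles={"data scientist"})
    store = {}
    version = run_pipeline(["Data Scientist: Python, SQL and data analysis."], terms, store)
    print(version, store[version][0])
```

The same input always produces the same version key, so a rerun with unchanged data and vocabulary can be detected without storing duplicates.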

What We Are Working on Now

We are working on skill tagging, using NLP and other strategies, to identify known and unknown skills present in our partner data and other open sources. We use neural embeddings supplemented with more traditional NLP techniques for additional flexibility, which is especially important as new skills emerge over time.
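The following rough sketch shows how exact dictionary matching and embedding similarity can work together in skill tagging. A token counts as a known skill when it appears in the skill list. It counts as a candidate new skill when its vector lies close to the vector of a known skill. The word vectors, the `tag_skills` function, and the 0.9 cosine threshold are all hypothetical. A real pipeline would learn embeddings from a large corpus and would also handle multi-word skills.

```python
import math
import re

# Hypothetical toy word vectors; a real system would learn these from job postings.
WORD_VECTORS = {
    "python": [0.90, 0.10, 0.00],
    "rust": [0.85, 0.15, 0.05],
    "driving": [0.00, 0.10, 0.95],
    "license": [0.10, 0.60, 0.40],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tag_skills(text, known_skills, vectors, threshold=0.9):
    """Return (known skills found, candidate new skills) for single-word skills."""
    known, candidates = set(), set()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in known_skills:
            known.add(token)  # traditional exact dictionary match
        elif token in vectors:
            # Embedding match: a token close to a known skill may be an emerging skill.
            best = max(
                (cosine(vectors[token], vectors[s]) for s in known_skills if s in vectors),
                default=0.0,
            )
            if best >= threshold:
                candidates.add(token)
    return sorted(known), sorted(candidates)


if __name__ == "__main__":
    print(tag_skills("Experience with Python and Rust; driving license a plus.",
                     {"python", "driving"}, WORD_VECTORS))
```

With `known_skills={"python", "driving"}`, this text yields `["driving", "python"]` as known skills and `["rust"]` as a candidate, because the toy vector for "rust" sits close to the one for "python". The dictionary step is precise but cannot find new skills. The embedding step is what allows an unseen term to surface.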

We are advancing research on job title and skill normalization by making it easier to automatically extract skills and job titles from unlabeled text and map them to their canonical names. We are particularly focused on generating suitable neural embeddings for these canonical normalization tasks and on building skill and job title classifiers. Skill tagging, the initial identification of skills in unannotated text, also falls into this area of work. For skill and job title labeling, we are exploring active learning techniques. The goal is to help the community generate high-quality labeled data at scale and make it available to all.
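Canonical normalization and active learning can be illustrated together. First, map each raw title to its most similar canonical name. Then send the titles with the smallest gap between the best and second-best match to human annotators; this is margin-based uncertainty sampling. This sketch substitutes character-trigram count vectors for the neural embeddings mentioned above. The function names and example titles are hypothetical.

```python
import math
from collections import Counter


def trigrams(text):
    """Character trigram counts; a simple stand-in for a learned embedding."""
    padded = f"  {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def normalize(raw, canonical):
    """Return (best canonical name, margin between the top two similarity scores)."""
    scores = sorted(((cosine(trigrams(raw), trigrams(c)), c) for c in canonical), reverse=True)
    best_score, best_name = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    return best_name, best_score - runner_up


def select_for_labeling(raw_titles, canonical, k):
    """Margin-based uncertainty sampling: the k titles with the least confident match."""
    margins = [(normalize(t, canonical)[1], t) for t in raw_titles]
    return [t for _, t in sorted(margins)[:k]]


if __name__ == "__main__":
    canonical = ["software developer", "registered nurse", "data scientist"]
    print(normalize("Sr. Software Dev", canonical))
    print(select_for_labeling(["Sr. Software Dev", "Data Scientist", "Nurse"], canonical, 1))
```

Here is an example with canonical titles `["software developer", "registered nurse", "data scientist"]`. The raw title "Sr. Software Dev" normalizes to "software developer". Among "Sr. Software Dev", "Data Scientist", and "Nurse", the bare "Nurse" has the smallest margin, so it is selected first for labeling. Spending annotation effort on ambiguous cases like this is the idea behind active learning.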

For Developers

Open Skills Documentation Site

Research Papers Used