Open Skills Project
The Open Skills Project is a public-private partnership led by the University of Chicago focused on providing a dynamic, up-to-date, locally relevant, and normalized taxonomy of skills and jobs that builds on and expands the Department of Labor’s O*NET data resources. Its aim is to improve our understanding of the labor market and reduce frictions in the workforce data ecosystem by enabling a more granular common language of skills among industry, academia, government, and nonprofit organizations.
Open Skills API
The Open Skills API is an interface for developers to build applications using data produced by the Open Skills Project.
Open Skills Research Hub
The Open Skills Research Hub is a collection of public datasets produced by the Open Skills Project for the purpose of collaborative research.
Understanding how our data relates to the labor market is important to us at Data@Work.
Funded Research Opportunity
Data@Work has selected recipients of the Research Hub Funded Research Opportunity. Stay tuned for more information!
For researchers interested in the work we’ve been doing, here is an overview. We also post updates to our blog about our work.
What Has Been Done
We currently have a machine learning pipeline, built with Airflow and Python scripts, that provides classes to preprocess data with NLP transforms, ingest known skills and job titles, label skills and job titles within corpora, generate special representations of skills and job titles, and, finally, save version-controlled instances of the data and outputs flowing through the pipeline stages.
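The stages described above can be illustrated with a minimal sketch. It is not the project's actual code: the function names (preprocess, label_known, versioned_record, run_pipeline) are hypothetical, exact phrase matching stands in for the real labeling step, and a content hash stands in for whatever version control the pipeline really uses. Airflow orchestration is left out so the example runs with the standard library alone.

```python
import hashlib
import json
import re


def preprocess(text):
    """Lowercase, replace punctuation (except + and #) with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9+#\s]", " ", text.lower())
    return " ".join(text.split())


def label_known(text, known_terms):
    """Return the known terms that occur as whole phrases in the cleaned text."""
    padded = f" {text} "
    return sorted(term for term in known_terms if f" {term} " in padded)


def versioned_record(stage, payload):
    """Attach a content hash so identical outputs always get the same version id."""
    body = json.dumps(payload, sort_keys=True)
    version = hashlib.sha256(body.encode("utf-8")).hexdigest()[:12]
    return {"stage": stage, "version": version, "payload": payload}


def run_pipeline(postings, known_skills, known_titles):
    """Run preprocess -> label and return one versioned record per stage."""
    cleaned = [preprocess(p) for p in postings]
    labeled = [
        {
            "text": text,
            "skills": label_known(text, known_skills),
            "titles": label_known(text, known_titles),
        }
        for text in cleaned
    ]
    return [
        versioned_record("preprocess", cleaned),
        versioned_record("label", labeled),
    ]
```

Because the version id is derived from the stage output itself, rerunning the sketch on unchanged input yields the same id, which makes it easy to tell whether a downstream stage needs to be recomputed.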
What We Are Working on Now
We are working on skill tagging, using NLP and other strategies, to identify known and unknown skills present in our partner data and other open sources. We use neural embeddings supplemented with more traditional NLP techniques for additional flexibility, which is especially important as new skills emerge over time.
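As a rough illustration of tagging both known and unknown skills, the sketch below combines two traditional techniques: greedy longest-match lookup against a list of known skills, and a cue-phrase heuristic that proposes the token following phrases such as "experience with" as a candidate new skill. The cue list, the function names, and the heuristic itself are assumptions made for this example; the neural embedding component mentioned above is not modeled here.

```python
import re

# Hypothetical cue phrases that often precede a skill mention.
CUES = ("experience with", "proficient in", "knowledge of")


def tokenize(text):
    """Lowercase and split into alphanumeric tokens, keeping + and #."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tag_known(tokens, known_skills, max_len=4):
    """Greedy longest-match tagging; returns (start, end, skill) token spans."""
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in known_skills:
                spans.append((i, i + n, phrase))
                i += n
                break
        else:
            # No known skill starts at this token; move on.
            i += 1
    return spans


def candidate_unknown(tokens, known_spans, cues=CUES):
    """Propose the token right after a cue phrase if no known skill covers it."""
    covered = {k for start, end, _ in known_spans for k in range(start, end)}
    candidates = []
    for cue in cues:
        cue_tokens = cue.split()
        n = len(cue_tokens)
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == cue_tokens and (i + n) not in covered:
                candidates.append(tokens[i + n])
    return candidates
```

A typical call tokenizes a posting, runs tag_known against the known skill list, and then passes the resulting spans to candidate_unknown so that only uncovered tokens are proposed for human review.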
We are improving our job title and skill normalization research by making it easier to automatically extract skills and job titles, under their canonical names, from unlabeled text. We’re particularly focused on generating suitable neural embeddings for these canonical normalization tasks and building skill and job title classifiers. Skill tagging, the initial identification of skills from unannotated text, also falls into this area of work. For skill and job title labeling, we are exploring active learning techniques to help the community generate quality labeled data at scale that is available to all.
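To make the normalization task concrete, here is a minimal sketch that maps a raw job title to its nearest canonical name. It deliberately swaps in character trigram count vectors with cosine similarity as a simple stand-in for neural embeddings, so it shows the shape of the task, not the project's method. The function names and the 0.3 threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(raw, canonical_names, threshold=0.3):
    """Return (best canonical name, score), or (None, score) below the threshold."""
    query = char_ngrams(raw)
    best = max(canonical_names, key=lambda name: cosine(query, char_ngrams(name)))
    score = cosine(query, char_ngrams(best))
    return (best if score >= threshold else None), score
```

Returning None below the threshold keeps weak matches out of the normalized data; in an active learning setup, those low-scoring titles would be natural candidates to send to human labelers first.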
Research Papers Used
- Concept-Based Information Retrieval using Explicit Semantic Analysis
- Bringing Order to the Job Market: Efficient Job Offer Categorization in E-Recruitment
- Semantic Similarity Strategies for Job Title Classification
- An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
- Distributed Representations of Words and Phrases and their Compositionality