Nice to meet you,
I'm Calvin.

🗽Living and working in NYC

💻 Software engineer at CLEAR building out backend systems

Searching for the best coffee in the city (or skiing if I'm lucky)

Time Anomalies | Workday

Tools & frameworks:

Jupyter, Xpresso (Java-based OOP language)

Overview

Millions of users enter hourly time during their shift every week using time management software. While most of the data entered reflects the actual worked time, users can make mistakes. Once time is submitted, managers and timekeepers need to spend considerable time verifying user input before the time is processed for payroll.1

As part of the Time and Scheduling Hub for managers, Workday Time Anomalies automatically reveals possible time-entry errors and alerts managers about unusual time entries. The goal is to save time and improve payroll and labor cost accuracy.2

At Workday, I built out components of the cross-team Time Anomalies project, which allows managers to detect these unusual time entries using machine learning. I also worked to enhance the delivery of this information to users with the Time Approval Summary filters, which let managers and supervisors easily surface and sort through discrepancies in their workers' time.

I deployed the feature through an early adopter phase of 10 customer opt-ins that ultimately scaled to general availability for all Time Tracking customers in Workday 2021.

Information & details
1. Workday Time Management
2. Workday Payroll & Time Blog Post

Job Matcher | Project

Tools & frameworks:

Python, NLP, spaCy, Scrapy, AWS EC2

Overview

Created a natural language processing model that matches user resumes with jobs and delivers a daily ranked list of prospective job postings.

Lessons Learned / Challenges
  • Balancing performance and functionality: Scraping and matching jobs for users involved compute-intensive operations due to the ML model. Using serverless asynchronous data processing could not only alleviate server load but also improve the user experience.
  • Optimizing the database: A NoSQL database would allow us to quickly update new jobs with flexibility to change our schema as we prototyped.
  • Tuning the algorithm: We could stack programmatic rules with the machine learning algorithm to deliver more accurate predictions.
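The "stacking programmatic rules" idea above could look something like the following sketch. The function name, weights, and rules are illustrative assumptions, not the project's actual code:

```python
# Hypothetical sketch: blending an ML similarity score with simple
# programmatic rules. The 0.05 boost and the hard filter are
# illustrative assumptions.

def adjusted_score(model_score, job_skills, resume_skills, required=None):
    """Combine a model's similarity score with rule-based adjustments."""
    score = model_score
    # Rule 1: boost jobs that share explicit skills with the resume.
    overlap = set(job_skills) & set(resume_skills)
    score += 0.05 * len(overlap)
    # Rule 2: hard-filter jobs missing a must-have skill.
    if required and required not in job_skills:
        return 0.0
    return min(score, 1.0)
```

Rules like these are cheap to evaluate and easy to audit, which makes them a useful complement to an opaque model score.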
Details

This project was inspired by a friend who came to me with a problem. Recently laid off from a large Seattle-area tech company, he was looking for a new job. While there were many services to help find jobs, finding a good match for his unique skills and experience required many hours of manually parsing job descriptions.

What if there was a service that could process and match your resume with related jobs and deliver a ranked list of results daily to aid your job search?

With that idea in mind, we worked together to create Job Matcher, a service that scrapes job posting websites and uses natural language processing to compute similarities between the postings and user resumes. It would prioritize skills (for example, Ruby or AWS SageMaker) and return a daily set of matched jobs ranked by the machine learning model's scoring criteria. The job scraping and matching would be automated via a daily asynchronous API call to an EC2 instance or a Lambda function, so by the time the user logged in, the results would already be persisted and presented on the client side.
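The ranking step can be sketched in a few lines. The real service used spaCy's semantic similarity over trained word vectors; this stand-in uses a bag-of-words cosine similarity so the idea is runnable without a downloaded model:

```python
# Minimal sketch of ranking postings by resume similarity. A simple
# bag-of-words cosine similarity stands in for spaCy's vector-based
# semantic similarity; tokenization is naive by design.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_jobs(resume_text, postings):
    """Return postings sorted by similarity to the resume, best first."""
    resume = Counter(resume_text.lower().split())
    scored = [(cosine(resume, Counter(p.lower().split())), p) for p in postings]
    return [p for _, p in sorted(scored, reverse=True)]
```

Swapping in spaCy would mean replacing `cosine` with `nlp(resume).similarity(nlp(posting))`, at the cost of loading a language model per process.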

Architecture

The application consists of several main components. A Flask server handles client interaction, including user identification and metadata persistence (resume, email, etc.). To communicate with the MongoDB cluster, we exposed pathways via FastAPI for new user creation, updating user details, and fetching the latest matched jobs. Depending on which API pathway is called, we use the spaCy NLP library to look for skills in the jobs and calculate semantic similarity with the user's information.

One particularly difficult aspect of the project was determining when the job scraping and matching should run. Because the application needed to both scrape jobs and match them against the user's data, the user might have to wait a long time to see results if we performed these actions synchronously.

For future improvements, we could utilize a serverless service like AWS Lambda, triggered daily to run the scraper and matcher. Additionally, Lambda Layers let multiple dependencies like spaCy and Scrapy be packaged easily into the deployed Lambda function. Consequently, dependencies can be managed without rewriting the Docker script, and if we wanted to improve the algorithm with new functionality, new packages could be added.
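The proposed daily Lambda trigger can be sketched as a handler function. The scrape and match steps are stubbed here as placeholders; in practice they would invoke the Scrapy spider and spaCy matcher packaged via Lambda Layers, and the schedule would come from something like an EventBridge rule:

```python
# Hypothetical AWS Lambda handler for the proposed daily job run.
# scrape_jobs/match_jobs are illustrative stubs, not real project code.

def scrape_jobs():
    # Placeholder for the Scrapy spider run.
    return [{"title": "Backend Engineer", "skills": ["python", "aws"]}]

def match_jobs(jobs, resume_skills):
    # Placeholder for the spaCy-based matcher.
    return [j for j in jobs if set(j["skills"]) & set(resume_skills)]

def handler(event, context):
    """Entry point invoked once per day by a scheduled trigger."""
    jobs = scrape_jobs()
    matches = match_jobs(jobs, event.get("resume_skills", []))
    # Persist matches here so they are ready when the user next logs in.
    return {"matched": len(matches)}
```

Because the handler runs ahead of any user request, login latency no longer depends on scraping or model inference.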