Introduction
I am Nhat Pham, an aspiring computer scientist, a machine learning enthusiast, and a functional programmer. Currently, I am a Computer Science and Data Science student at the University of Maryland, College Park. Previously, I was a Data Science and Statistics student at the University of Washington, Seattle, and a Software Engineering Intern at Verta.
My main interests are building scalable distributed systems and data-driven machine learning applications. You can find my CV here, my UW transcript here, and my UMD transcript here.
Below is a summary of some of my most recent work and projects.
Work Experience
Software Engineering Intern at Verta (June-November 2020)
At Verta, I was fortunate to work on many parts of the company’s tech stack:
- I implemented the dataset versioning interface for Verta’s Scala client, inspired by GitHub’s design, with immutable datatypes (Repository, Commit) and operations (update, revert, merge).
- I worked on the new machine learning model registry and deployment interface for Verta’s Python client, wrote a command-line tool that helps users automate their deployment workflows, and maintained the deployment backend, built with Go and MySQL. I also wrote end-to-end deployment tests to prevent regressions.
- I set up the Jenkins test pipeline for the Scala client and configured Kubernetes pods for the pipeline to run on.
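To illustrate the idea of immutable versioning datatypes mentioned in the first item, here is a minimal sketch in Python. It is not Verta's actual API: the `Commit` fields, the path-to-hash mapping, and the naive merge rule are all assumptions made for illustration, and the `Repository` type is omitted. The point is only that `update`, `revert`, and `merge` each return a new value instead of mutating an existing one.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical immutable snapshot: dataset path -> content hash."""
    blobs: Mapping[str, str]
    parent: Optional["Commit"] = None

    def update(self, path: str, content_hash: str) -> "Commit":
        # Build a new commit on top of this one; the receiver is unchanged.
        new_blobs = MappingProxyType({**self.blobs, path: content_hash})
        return Commit(new_blobs, parent=self)

    def revert(self) -> "Commit":
        # Record a new commit that restores the parent's contents.
        if self.parent is None:
            raise ValueError("cannot revert the root commit")
        return Commit(self.parent.blobs, parent=self)

    def merge(self, other: "Commit") -> "Commit":
        # Naive merge (an assumption): take the union of both snapshots
        # and refuse to merge when the same path has different hashes.
        shared = self.blobs.keys() & other.blobs.keys()
        conflicts = sorted(p for p in shared if self.blobs[p] != other.blobs[p])
        if conflicts:
            raise ValueError(f"conflicting paths: {conflicts}")
        merged = MappingProxyType({**self.blobs, **other.blobs})
        return Commit(merged, parent=self)


root = Commit(MappingProxyType({}))
first = root.update("train.csv", "abc")
second = first.update("train.csv", "def")
restored = second.revert()  # same contents as `first`, with history preserved
```

Because every operation returns a fresh `Commit`, earlier snapshots such as `root` and `first` remain valid and can be shared safely.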
Awards and Competitions
AIVIVN Sentiment Analysis Competition
My team won a Vietnamese sentiment analysis competition using an ensemble of a 1D convolutional neural network, a hierarchical recurrent neural network, and a self-attention recurrent neural network.
Our solution (in Vietnamese) can be found here.
GitHub Repository: AIVIVN_1
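As a rough illustration of how an ensemble can combine several models, the sketch below averages the positive-class probabilities produced by each model and thresholds the result. This page does not describe how the competition ensemble was actually combined, so simple averaging, the function names, and the example probabilities are all assumptions.

```python
from typing import List, Sequence


def average_ensemble(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-example positive-class probabilities across models."""
    if not model_probs:
        raise ValueError("need at least one model's predictions")
    n_examples = len(model_probs[0])
    if any(len(probs) != n_examples for probs in model_probs):
        raise ValueError("all models must score the same number of examples")
    n_models = len(model_probs)
    return [
        sum(probs[i] for probs in model_probs) / n_models
        for i in range(n_examples)
    ]


def predict_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn averaged probabilities into 0/1 sentiment labels."""
    return [1 if p >= threshold else 0 for p in probs]


# Made-up probabilities for two comments from three hypothetical models.
cnn_probs = [0.9, 0.2]
hierarchical_rnn_probs = [0.8, 0.4]
attention_rnn_probs = [0.7, 0.6]

averaged = average_ensemble([cnn_probs, hierarchical_rnn_probs, attention_rnn_probs])
labels = predict_labels(averaged)
```

Averaging tends to smooth out the individual models' errors when those errors are not strongly correlated, which is one common reason to ensemble architectures as different as these three.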
Emotion Recognition Competition
My team placed 8th in the public round of an audio-based emotion recognition competition, using a convolutional neural network on top of an MFCC representation.
GitHub Repository: erc
Personal Projects:
Machine Learning:
Neural Network Toolbox
Whenever I started a deep learning project, I had to re-implement everything from scratch, from the neural network components to the training procedure. At first, this was helpful because it forced me to really understand deep learning concepts and procedures. However, these chores quickly became repetitive. So I developed this small framework, which lets me focus on the high-level design and prototyping of models and ideas without having to reinvent the building blocks every time.
GitHub Repository: nn-toolbox
Arbitrary Style Transfer
I implemented a neural network that can transfer the style of an arbitrary drawing to an arbitrary photo.
GitHub Repository: torch-styletransfer
Denoising “Dirty” Documents
This project is based on a Kaggle competition. We applied a Convolutional Neural Network to “clean up” images of documents affected by noise (e.g., coffee stains).
GitHub Repository: DenoisingDirtyDocuments
Detecting Insults in Social Commentary
This project is based on a Kaggle competition. I implemented a Recurrent Neural Network using Keras to detect whether a comment is an insult.
GitHub Repository: DetectingInsults
Web Programming and Visualization:
PumpItUp Visualization
Based on data from a DrivenData competition, I created a simple interactive map visualization of wells (functional and otherwise) in Tanzania, using the shiny and leaflet packages in R. You can see it here.
Other Work:
fp-course
GitHub Repository: fp-course
My solutions to the exercises of fp-course, an introductory course on functional programming concepts in Haskell.
lets-lens
GitHub Repository: lets-lens
My solutions to the exercises of lets-lens, a course on optics (functional data accessors and modifiers) in Haskell.
zippers-course
GitHub Repository: zippers-course
My solutions to the exercises of zippers-course, a course on zippers (functional data structure traversal and modification) in Haskell.
Data Structures and Algorithms Practice:
GitHub Repository: AlgorithmPractice
This is where I store my data structures and algorithms practice, most of which consists of solutions to LeetCode problems. The main languages are Java, Python, and C++.
Database Practice:
GitHub Repository: DatabasePractice
This is where I store my SQL query practice.