I am a graduate student at UC Berkeley, pursuing a Master's in Operations Research (Data Science and Machine Learning). I have more than 4 years of work experience in Analytics and Data Science.
In my free time, I enjoy hanging out with friends, singing & playing piano/guitar, reading books (and Quora), playing tennis (since 2011) and travelling.
Feel free to contact me about opportunities in Data Science & Analytics at anh.nnguyen@berkeley.edu.
Python, SQL (MySQL, PostgreSQL), R, STATA, SPSS
Linear and Logistic Regression, Classification and Regression Trees, Random Forest, XGBoost, K-means Clustering, Time Series Models (ARIMA), NLP, Bootstrapping, K-fold Cross Validation
Tableau, Python (Matplotlib, Seaborn, Altair, Plotly, Bokeh)
Hypothesis Testing, A/B Testing, Design of Experiments, MLE, Probability Distributions, Bayes' Theorem
A comprehensive understanding of business goals and the industry, used to identify the problems that must be solved for the business to sustain and grow, and to explore new business opportunities
A storyteller: presents insights and interesting patterns clearly and concisely to business executives
Project Management, Problem-solving Skills, Detail-oriented, Creativity, Team Player, Curiosity, Critical Thinking, Open-minded
Forecast the prices of metals listed on the commodity market from their historical prices, using several time-series prediction models.
Tools: Python, Time Series Models (ARIMA), Linear Regression Models (Ridge, Lasso), XGBoost, GRU
Trained machine learning models and compared their performance under two missing-data imputation methods: mean imputation and a guess matrix.
Tools: Python, Scikit-Learn, Logistic Regression, Random Forest Classifier, AdaBoost, Perceptron.
Performed sentiment analysis on IMDB movie reviews using unigram and bigram settings, and compared model performance with and without stemming and lemmatization.
Tools: Python, Scikit-Learn, Random Forest Classifier, Stemming, Lemmatizing.
Predicted building heating load with machine learning techniques including Linear Regression, a Logistic Classification model, and Feature Scaling (Unity-Based Normalization).
Tools: Python, Scikit-Learn, Feature Scaling, Linear Regression, Logistic Classification Model.
Built a machine learning model that recognizes handwritten digits, using the MNIST handwritten digit database. The model was trained with TensorFlow by having it "look" at thousands of examples, and its accuracy was then checked against the test data.
Tools: Python, Scikit-Learn, TensorFlow, Vanilla Dense Neural Network (Vanilla DNN)
Used web scraping and text manipulation to analyze the presidential debates from 1960 to 2012.
Tools: Python, Web Scraping with BeautifulSoup
Optimized different product types to maximize the company’s net profits using a linear programming (LP) model, performed sensitivity analysis on the constraints and variables, and provided business plans and recommendations.
Tools: AMPL (A Mathematical Programming Language), Linear Programming Model, Mixed Integer Linear Programming Model, LaTeX