
Understanding Deep Residual Networks
Deep residual networks (ResNet) took the deep learning world by storm when Microsoft Research released Deep Residual Learning for Image Recognition. These networks led to 1st-place winning entries ...

GCP Products Overview
Introduction, Cloud Compute Engine, gcloud, GCP Storage Solutions, IAM, Cloud BigQuery, Cloud Datalab, Cloud Datastudio, Cloud Pub/Sub, Cloud Dataproc, Cloud Dataflow, Stackdriver, Machine Learni...

Understanding Ensemble machine learning methods
Ensemble methods in machine learning are a class of methods that combine multiple machine learning models into one predictive model in order to boost prediction accuracy over a single...

Understanding Word2Vec and Doc2Vec
Word embeddings are a type of word representation that stores contextual information in a low-dimensional vector. This approach gained extreme popularity with the introduction of Word2Vec in 2...

Understanding the mathematics behind Linear Discriminant Analysis (LDA)
We are already familiar with the Logistic Regression classification algorithm. It works well for two-class classification problems. However, if there are more than two classes, Logistic Regression will...

Understanding the mathematics behind linear regression
Today we are going to talk about linear regression, one of the most well-known and well-understood algorithms in machine learning. We are going to focus on simple linear regression, which conta...

Understanding the mathematics behind Naive Bayes
Naive Bayes, also called the Naive Bayes classifier, is a classifier based on Bayes' theorem with the naive assumption that the features are conditionally independent of each other given the class. Without further ado, let’s get straight ...

Landsat 8 data access from Google Cloud Storage
Landsat 8 is one of NASA’s EOS (Earth Observing System) satellites. It was launched in February 2013 and was intended to replace its predecessor, Landsat 7. The satellite collects images...

Understanding the mathematics behind Support Vector Machines
Support Vector Machine (SVM) is one of the most powerful out-of-the-box supervised machine learning algorithms. Unlike many other machine learning algorithms such as neural networks, you don’t have...

A data visualization of the impact of the tariff spat on the midterm election
The tariff spat between the US and China is on the brink of escalating into a full-blown trade war, which would be a disaster for the global economy. On Tuesday, April 3rd, the Trump administration propo...

Geotagged tweet collection using the Twitter Streaming API and a database
One research project I’m working on uses Twitter data to predict crime patterns. So the first thing I need to do is collect Twitter data. Specifically, since I’m interested in discovering the spat...

Machine Learning Classification Model Evaluation Metrics
After training a machine learning classification model, we should always evaluate it to determine whether it does a good job of predicting the target value on new, unseen data. Among the various...

Running Jupyter Notebook with Apache Spark on Google Cloud Compute Engine
Apache Spark is a powerful open-source cluster-computing framework. Compared to Apache Hadoop, especially Hadoop MapReduce, Spark has advantages such as speed, generality, ease of use, and interact...

How to install and set up MySQL on Mac
MySQL is probably the most popular open-source SQL relational database. Unfortunately, macOS doesn’t ship with MySQL. I still remember when I took my first database class years ago, the professor h...

Using Python subprocess for parallel processing
Unlike JavaScript, which is naturally asynchronous, the Python interpreter executes code sequentially: subsequent jobs have to wait until the previous ones complete. This behav...

My first blog
This is my first blog post, EVER! I’ve long been thinking about writing something about the work I do, sharing the knowledge I have, and of course, learning new stuff in turn. Now I’ve finally made my decis...