What I’m working on this week

9/10/2021: This will be a quick and informal update on some of the things I’m working on right now. I started the first online course in Google Cloud’s Data Scientist / Machine Learning Engineer learning path, a self-paced course offered by Coursera called Google Cloud Big Data and Machine Learning…

Project: Using Deep Learning for Automatic Building Footprint Extraction from Satellite Imagery (Part 1)

This is a computer vision and deep learning project that I did for my Machine Vision course as part of my Master’s degree in Data Science. I was particularly excited by both the advances in deep learning and the increased accessibility of large-scale, high-resolution satellite image datasets. The practical applications of successful building…

Project: AI algorithm using Alpha-Beta Pruning playing Tic-Tac-Toe

This is one of those personal projects that I worked on late at night. The code might not be my best work, but the application runs well, it’s fun to play against, and I learned a new algorithm from it. TL;DR: I built an AI that plays a perfect Tic-Tac-Toe strategy without machine…
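
The excerpt doesn't include the post's code, so here is a minimal, hypothetical Python sketch of minimax search with alpha-beta pruning for Tic-Tac-Toe; it is not the author's implementation. It assumes the board is a list of nine cells holding "X", "O", or " ", that X is the maximizing player, and that outcomes are scored +1 (X wins), -1 (O wins), and 0 (draw). The names `winner`, `alphabeta`, and `best_move` are illustrative.

```python
import math

# Every row, column, and diagonal that counts as three in a row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def alphabeta(board, player, alpha=-math.inf, beta=math.inf):
    """Score the position from X's point of view, with `player` to move.

    X maximizes and O minimizes. Once alpha >= beta, the remaining moves
    at this node cannot affect the final decision, so they are pruned.
    """
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    if " " not in board:
        return 0  # board full with no winner: draw

    if player == "X":
        best = -math.inf
        for i in range(9):
            if board[i] == " ":
                board[i] = "X"
                best = max(best, alphabeta(board, "O", alpha, beta))
                board[i] = " "  # undo the move
                alpha = max(alpha, best)
                if alpha >= beta:
                    break  # O already has a better option elsewhere
        return best

    best = math.inf
    for i in range(9):
        if board[i] == " ":
            board[i] = "O"
            best = min(best, alphabeta(board, "X", alpha, beta))
            board[i] = " "  # undo the move
            beta = min(beta, best)
            if alpha >= beta:
                break  # X already has a better option elsewhere
    return best


def best_move(board, player):
    """Return the index (0-8) of an optimal move for `player`."""
    sign = 1 if player == "X" else -1  # flip O's scores so bigger is better
    opponent = "O" if player == "X" else "X"
    best_idx, best_score = None, -math.inf
    for i in range(9):
        if board[i] == " ":
            board[i] = player
            score = sign * alphabeta(board, opponent)
            board[i] = " "
            if score > best_score:
                best_idx, best_score = i, score
    return best_idx


board = list("XX OO    ")  # X on cells 0 and 1, O on cells 3 and 4
print(best_move(board, "X"))  # → 2 (completes the top row)
```

Pruning only skips branches that cannot change the result, so the search picks the same outcome as plain minimax while visiting fewer positions.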

Project: Measuring Public Sentiment Towards Nuclear Energy Using Twitter Data

When we began this class project, new deep learning tools such as BERT, the open-source transformer model, were all the rage in the natural language processing (NLP) world. The goal of NLP is to automatically “read”, or summarize, large amounts of unstructured text, and transformers are a newer deep learning architecture that can…

Pipeline Orchestration with Apache Airflow (Part 5):

I recently worked through Udacity’s Data Engineering Nanodegree program, which consisted of four lessons: Data Modeling (PostgreSQL and Cassandra), Data Warehousing (Redshift), Data Lakes (Spark), and Pipeline Orchestration (Airflow). In this post, I’ll share some of my notes from the fourth and final lesson: Apache Airflow, which is an open-source pipeline orchestration software package and…
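
As a rough illustration of what a pipeline orchestrator does, this hypothetical sketch runs a small set of tasks in dependency order, the directed acyclic graph (DAG) idea that Airflow is built around. It uses Python’s standard-library `graphlib.TopologicalSorter` (Python 3.9+) rather than Airflow’s own API, and the task names are made up for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(dependencies, tasks):
    """Run each task only after all of its upstream tasks have finished.

    dependencies maps a task name to the set of task names it depends on;
    tasks maps a task name to a zero-argument callable.
    Returns the order in which the tasks were executed.
    """
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical extract -> transform -> load pipeline with a final check.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load": {"transform"},
    "quality_check": {"load"},
}
log = []
# Each placeholder task just records its own name when it runs.
tasks = {name: (lambda n=name: log.append(n)) for name in dependencies}

print(run_pipeline(dependencies, tasks))
```

The two extract tasks have no dependency on each other, so their relative order is not fixed; an orchestrator like Airflow can run such tasks in parallel and adds scheduling, retries, and monitoring on top of this ordering.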

Data Lakes with Spark (Part 4):

I recently worked through Udacity’s Data Engineering Nanodegree program, which consisted of four lessons: Data Modeling (PostgreSQL and Cassandra), Data Warehousing (Redshift), Data Lakes (Spark), and Pipeline Orchestration (Airflow). In this post, I’ll share some of my notes from the third lesson: Data Lakes. This lesson focused on using HDFS with Spark, which are…

Data Warehousing (Part 3):

I recently worked through Udacity’s Data Engineering Nanodegree program, which consisted of four lessons: Data Modeling (PostgreSQL and Cassandra), Data Warehousing (Redshift), Data Lakes (Spark), and Pipeline Orchestration (Airflow). In this post, I’ll share some of my notes from the second lesson: Data Warehousing. This lesson focused on deploying data warehouses on AWS…

Data Modeling (Part 2):

I recently worked through Udacity’s Data Engineering Nanodegree program, which consisted of four lessons: Data Modeling (PostgreSQL and Cassandra), Data Warehousing (Redshift), Data Lakes (Spark), and Pipeline Orchestration (Airflow). In this post, I’ll share some of my notes from the first lesson: Data Modeling. The lesson focused on two database management systems in particular:…

Introduction to Data Pipelines (Part 1):

I recently worked through Udacity’s Data Engineering Nanodegree program, which consisted of four lessons (~5 hrs of material each): Data Modeling (PostgreSQL and Cassandra), Data Warehousing (Redshift), Data Lakes (Spark), and Pipeline Orchestration (Airflow). This program focused on deploying Big Data pipelines on AWS (using Redshift, S3, EMR, Glue, Athena, Lambda, etc.), but…