Blog posts

Articles on stuff that interests me

How can we make linear regression better? Regularization.

If you haven't read my post on linear regression I invite you to do so here, but basically it is a method for modelling the relationship between variables $X_i$ and a target feature $y$ in a linear model. This modelling is done through learning weights $\theta_i$ for each $X_i$ supposing that our model looks something like this:

$$y = \sum_{i=1}^n \theta_i \cdot X_i$$
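As a minimal sketch of the formula above, the prediction is just a weighted sum of the features. The function name `predict` and the example numbers are made up for illustration.

```python
def predict(theta, x):
    """Return the linear model output: the sum of theta_i * x_i."""
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same length")
    return sum(t * xi for t, xi in zip(theta, x))


# Hypothetical weights and a single example with three features.
theta = [2.0, -1.0, 0.5]
x = [1.0, 3.0, 4.0]
y = predict(theta, x)  # 2*1 - 1*3 + 0.5*4 = 1.0
```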

How does pruning work in CART?

OK, so as we saw in previous parts, the CART algorithm allows us to build decision trees. Up till now we have built these trees until all leaves are pure, meaning they contain only one class of examples (for classification trees). However, this can lead to overfitting the training data, which decreases the generalizability of our model and therefore its usefulness. This is where cost-complexity pruning comes into play.
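To give a rough idea of the trade-off involved, here is a small sketch of the standard cost-complexity measure, which adds a penalty of $\alpha$ per leaf to the training error of a tree. The function names and the numbers are illustrative assumptions, not code from the post.

```python
def cost_complexity(error, n_leaves, alpha):
    """Penalized cost of a (sub)tree: training error plus alpha per leaf."""
    return error + alpha * n_leaves


def effective_alpha(node_error, subtree_error, subtree_leaves):
    """Alpha at which collapsing a subtree into a single leaf breaks even."""
    return (node_error - subtree_error) / (subtree_leaves - 1)


# Hypothetical subtree: 4 leaves with error 0.10, versus error 0.25 if collapsed.
alpha_star = effective_alpha(0.25, 0.10, 4)  # (0.25 - 0.10) / 3

# For an alpha above alpha_star, the single leaf is cheaper than the subtree.
alpha = 0.1
should_prune = cost_complexity(0.25, 1, alpha) < cost_complexity(0.10, 4, alpha)
```

A larger $\alpha$ favors smaller trees; with $\alpha = 0$ the fully grown tree always wins on training error.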

emoj.yt (emoji URL-shortener)

August 19, 2020

This is a little write-up of a very small project I did, inspired by Coding Garden with CJ on YouTube & Twitch (specifically this video), and Net Ninja Express tutorials:
A URL-shortener that uses a sequence of emojis to encode each URL.
The code is available on GitHub, and you can try it out at emoj.yt.
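One way such an encoding could work, purely as a sketch: treat a small set of emojis as the digits of a number system and write a numeric ID in that base. The emoji alphabet, the function names and the idea of using a numeric ID are all assumptions for illustration; the actual project may do this differently.

```python
# Hypothetical emoji alphabet; each emoji is a single code point and acts as one "digit".
EMOJIS = ["😀", "🚀", "🍕", "🐱", "🌈", "🎉", "🔥", "🌵"]
BASE = len(EMOJIS)


def encode(n):
    """Encode a non-negative integer as a string of emojis."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return EMOJIS[0]
    digits = []
    while n > 0:
        n, r = divmod(n, BASE)
        digits.append(EMOJIS[r])
    return "".join(reversed(digits))


def decode(s):
    """Decode a string of emojis back into the integer it represents."""
    n = 0
    for ch in s:
        n = n * BASE + EMOJIS.index(ch)
    return n


slug = encode(12345)   # short emoji string for a hypothetical URL ID
url_id = decode(slug)  # round-trips back to 12345
```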

Implementing linear regression, math and Python!

January 9, 2020

Today I want to explain linear regression. It is one of the simplest statistical learning models and can be implemented efficiently in only a couple of lines of Python code. Being so simple, however, does not mean it is not useful; in fact, it can be very practical for exploring relationships between features in a dataset and making predictions on a target value. Therefore I think it's important to understand how the method works and how the different parameters affect the outcome.

KD-trees: Classification of n points in d-dimensional Euclidean space

April 11, 2019

This is a little write-up of a project I did in collaboration with a classmate while taking an algorithmic complexity class. We implemented a faster, but still exact, $k$ nearest neighbors classifier based on k-d trees. I learned a lot and hope this can be interesting to some of you.
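To make the idea concrete, here is a minimal sketch of an exact $k$ nearest neighbors classifier on top of a k-d tree. It is not the code from the project: the function names, the tuple-based tree layout and the toy data are assumptions, and labels are assumed to be strings.

```python
import heapq


def build(points, depth=0):
    """Build a k-d tree from (coords, label) pairs, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(node, query, k, heap=None):
    """Collect the k nearest points in a heap of (-squared_distance, label)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    (coords, label), axis, left, right = node
    d = sq_dist(coords, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, label))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, label))
    diff = query[axis] - coords[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn(near, query, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th nearest neighbor, or if we still have fewer than k points.
    if len(heap) < k or diff ** 2 < -heap[0][0]:
        knn(far, query, k, heap)
    return heap


def classify(tree, query, k):
    """Majority vote among the k nearest neighbors."""
    labels = [label for _, label in knn(tree, query, k)]
    return max(set(labels), key=labels.count)


# Toy data: two well-separated clusters in 2D.
train = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.0), "a"),
         ((6.0, 6.0), "b"), ((7.0, 5.5), "b"), ((6.5, 7.0), "b")]
tree = build(train)
predicted = classify(tree, (1.2, 1.4), k=3)
```

The speed-up comes from the pruning test: a whole subtree is skipped when the splitting plane is already farther away than the current $k$-th best candidate, yet the result stays exact.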

AdaNet - adaptive network structures

March 18, 2019

Introduction

The goal of this project is to reproduce the methods and experiments of the following paper:
C. Cortes, X. Gonzalvo, V. Kuznetsov, M. Mohri, S. Yang, AdaNet: Adaptive Structural Learning of Artificial Neural Networks. We will try to reproduce their method, which consists of building neural networks whose structure is learned and optimized at the same time as its weights. This method will be applied to a binary classification task on images from the CIFAR-10 dataset.

Let's implement the CART Algorithm

March 2, 2019

This is Part 3 of my decision tree series. This time around we are going to code a decision tree in Python. I'm going to try to make this code as understandable as possible, but if you are not familiar with Object Oriented Programming (OOP) or recursion you might have a tougher time.

The CART Algorithm

February 27, 2019

This is Part 2 of my decision tree series. Here we will see how we can build a decision tree algorithmically using the CART algorithm by Leo Breiman (one of the big, big names in decision trees).
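As a small taste of what the algorithm does at each node, here is a sketch of its core step for classification: trying candidate thresholds on a feature and keeping the one with the lowest weighted Gini impurity. The function names and the toy data are assumptions, and using the observed values themselves as thresholds is a simplification.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split on one numeric feature."""
    n = len(labels)
    best = (None, gini(labels))  # no split at all is the baseline
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


# Toy feature where the two classes separate cleanly.
values = [1.0, 2.0, 3.0, 8.0, 9.0]
labels = ["a", "a", "a", "b", "b"]
threshold, score = best_split(values, labels)  # splits at 3.0, both sides pure
```

Building the full tree then amounts to applying this step recursively to the left and right subsets.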

What are decision trees?

February 26, 2019

The first subject I want to tackle on this page is decision trees. What are they? How do they work? How can I make one?
I am planning to make a small series, ranging from explaining the concept, to implementing a decision tree inference algorithm and hopefully all the way up to implementing Random Forests.
All right, let's get started.