The Data Den

Training Text to Image Diffusion Models in Keras

Posted on October 6, 2022

Text to Image Model in Keras: This is a blog post accompanying the GitHub repo github.com/apapiu/guided-diffusion-keras. Over time I will most likely add more posts related to the repo on in/outpainting, latent diffusion, Stable Diffusion, etc., so make sure to come back :) Below I will talk about... [Read More]

Checking if a Die is Fair - A Simple Bayesian Approach

Posted on August 6, 2022

If I gave you a die and asked you if it’s a fair die, what would you do? Well, most likely you would start rolling it a few times and look at the data. Yet even with data it’s hard to know for sure! Even a fair die can lead... [Read More]

Tf-Idf Ridge Model Selection using Pipelines in Sklearn

Posted on August 4, 2016

Creating a pipeline to tune tf-idf + ridge regularization parameters and select the best model for text-based predictions. I am going to dabble a bit in text mining in this post. The idea is very simple: we have a collection of documents (these could be emails, books or... [Read More]

Gender Neutral Baby Names

Posted on March 12, 2016

Gender neutral names are names that tend to be given to both girls and boys. I will be looking at this interesting subset of names and trying to answer some questions: What have historically been the most gender neutral baby names? How do these names behave over time? ... [Read More]

A Shiny App honoring Pi

Posted on March 3, 2016

With March 14 looming on the horizon, I decided to make a little interactive visualization of how to approximate Pi by randomly generating points in a square. This is certainly not a novel idea, but I think it touches on two really important topics: the nature of volume/area... [Read More]

Patterns in the Republican Primaries

Posted on February 25, 2016

I am going to focus on the Republican Primaries in early states, plot some maps and graphs and see if I can figure out any patterns in the ways people voted. Hopefully I will update this as more results start pouring in. I also try to build some prediction models... [Read More]

Interactive Maps of NYC: Biking, Ancestry and Dating

Posted on February 15, 2016

I’ve been playing with the ACS census data in New York City and made some maps that will hopefully reveal some interesting facets of the city. Instead of looking at the usual suspects like median income or density, I’m going to try to show New York from slightly different... [Read More]

Polynomial Overfitting

Posted on January 17, 2016

The bias-variance tradeoff is one of the main buzzwords people hear when starting out with machine learning. Basically, we are often faced with the choice between a flexible model that is prone to overfitting (high variance) and a simpler model that might not capture the entire signal... [Read More]

MNIST Digit Recognition: Exploratory Data Analysis and Prediction

Posted on January 2, 2016

We will be looking at the MNIST data set on Kaggle. The goal in this competition is to take an image of a handwritten single digit and determine what that digit is. We’ll start with some exploratory data analysis and then try to build some predictive models to predict the... [Read More]

Strategies for the Board Game Risk

Posted on November 7, 2015

The game of Risk is a turn-based strategy game where players battle each other to take over the world. Your aim is to control as many of the forty-two territories as possible with the armies at your disposal. The way you gain ground is by attacking enemy territories via rolling dice. Here... [Read More]