The Data Den


Projects on Math + Visualization + Machine Learning
by Alexandru Papiu

Training Text to Image Diffusion Models in Keras

Text to Image Model in Keras: This is a blog post accompanying the GitHub repo here: github.com/apapiu/guided-diffusion-keras. Over time I will most likely add more posts related to the repo around in/outpainting, latent diffusion, stable diffusion etc. so make sure to come back :) Below I will talk about... [Read More]

Exploring Google Data

Visualize Google Location History: Here is an easy way to analyze your Google location history data. To get this data, go to: https://takeout.google.com/settings/takeou and only tick “location history” for download. It might take a few hours or days before Google sends you an email but once that is... [Read More]

Tf-Idf Ridge Model Selection using Pipelines in Sklearn

Creating a pipeline to tune tf-idf + ridge regularization parameters and select the best model for text-based predictions. I am going to dabble a bit in text mining in this post. The idea is very simple: we have a collection of documents (these could be emails, books or... [Read More]
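To give a flavor of the idea, here is a minimal sketch of such a pipeline in scikit-learn. The toy documents, the numeric targets and the parameter grid are all made up for illustration and are not taken from the post; the point is only how the tf-idf and ridge parameters can be tuned together in one search.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data: short "reviews" with a made-up numeric score.
docs = [
    "great movie loved it", "terrible plot and bad acting",
    "loved the acting", "bad movie", "great plot great cast",
    "terrible and boring", "loved it great fun", "boring bad plot",
    "fun movie with a great cast",
]
scores = [9.0, 2.0, 8.0, 3.0, 9.5, 1.5, 9.0, 2.5, 8.5]

# Chain the vectorizer and the regressor so they are tuned as one model.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("ridge", Ridge()),
])

# Step names prefix the parameter names: "<step>__<parameter>".
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__sublinear_tf": [False, True],
    "ridge__alpha": [0.1, 1.0, 10.0],
}

# Cross-validated grid search over both steps at once.
search = GridSearchCV(pipeline, param_grid, cv=3,
                      scoring="neg_mean_squared_error")
search.fit(docs, scores)

print(search.best_params_)
print(search.predict(["great acting but boring plot"]))
```

Because the vectorizer lives inside the pipeline, it is refit on each training fold, so the cross-validation scores are not contaminated by vocabulary from the held-out documents.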

Gender Neutral Baby Names

Gender neutral names are names that tend to be given to both girls and boys. I will look at this interesting subset of names and try to answer some questions: What have historically been the most gender neutral baby names? How do these names behave over time? ... [Read More]

A Shiny App honoring Pi

With March 14 looming on the horizon I decided to make a little interactive visualization on how to approximate Pi by randomly generating points in a square. This is certainly not a novel idea but I think it touches on two really important topics: the nature of volume/area... [Read More]
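The approximation itself fits in a few lines. The sketch below is a plain Python stand-in, not the Shiny app from the post: it draws points uniformly in the square [-1, 1] x [-1, 1] and uses the fact that the fraction landing inside the unit circle is roughly Pi/4. The function name and the seed argument are my own choices for illustration.

```python
import random


def estimate_pi(n_points: int, seed: int = 0) -> float:
    """Estimate Pi from the share of random points that fall in the unit circle."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # point lies inside the inscribed circle
            inside += 1
    # area(circle) / area(square) = Pi / 4, so scale the fraction by 4
    return 4.0 * inside / n_points


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_pi(n))
```

The estimate improves slowly: the error shrinks roughly like one over the square root of the number of points, so each extra digit of accuracy costs about a hundred times more samples.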

Patterns in the Republican Primaries

I am going to focus on the Republican Primaries in early states, plot some maps and graphs and see if I can figure out any patterns in the ways people voted. Hopefully I will update this as more results start pouring in. I also try to build some prediction models... [Read More]

Interactive Maps of NYC: Biking, Ancestry and Dating

I’ve been playing with the ACS census data in New York City and made some maps that will hopefully reveal some interesting facets of the city. Instead of looking at the usual suspects like median income or density I’m going to try to show New York from slightly different... [Read More]

Polynomial Overfitting

The bias-variance tradeoff is one of the main buzzwords people hear when starting out with machine learning. Basically, we are often faced with the choice between a flexible model that is prone to overfitting (high variance) and a simpler model that might not capture the entire signal... [Read More]
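A small experiment makes the tradeoff concrete. This is a sketch under my own assumptions rather than the code from the post: the data is a noisy sine curve, and the polynomial degrees (1, 3 and 12), sample sizes and noise level are picked arbitrarily for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def make_data(n):
    """Noisy samples from sin(2*pi*x) on [0, 1] (an assumed toy signal)."""
    x = np.sort(rng.uniform(0.0, 1.0, n))
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y


x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

train_mse, test_mse = {}, {}
for degree in (1, 3, 12):
    # Least-squares polynomial fit; Polynomial.fit rescales x for stability.
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse[degree] = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse[degree] = float(np.mean((model(x_test) - y_test) ** 2))
    print(degree, train_mse[degree], test_mse[degree])
```

The training error can only go down as the degree grows, since each higher-degree fit contains the lower-degree ones as special cases. The error on the fresh test points is the number to watch: a very flexible fit can chase the noise in the 20 training points and do worse on new data than a moderate one.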