We use the word information a lot in daily life. After reading a good book, we say that it was informative, and after we watch boring commercials we complain that we wasted our time and got no useful information. Because informative things tend to be useful, people have tried to come up with a more rigorous definition of exactly what information means.

The most popular definition of information is that it measures **how surprising something is**. The more surprising an event is, the more information it contains. For example, consider two possible events: tomorrow there will be a hurricane…
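This "surprise" measure has a standard formula, the self-information −log₂ p: the rarer an event, the more bits of information it carries. A minimal sketch:

```python
import math

def self_information(p):
    """Bits of 'surprise' for an event with probability p (0 < p <= 1)."""
    return -math.log2(p)

# A near-certain event carries almost no information,
# while a rare event carries a lot.
boring = self_information(0.99)  # close to 0 bits
rare = self_information(0.01)    # over 6 bits
```

A fair coin flip works out to exactly 1 bit, which is where the unit "bit" comes from.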

In any machine learning problem, having good data is just as important as having a good model. Or, as the famous saying about bad data goes: garbage in, garbage out. In this article we explore some common yet often overlooked sources of bad data.

Topcoding and bottomcoding occur when a dataset replaces very high or very low numbers with the same value. This is sometimes used to protect the identity of people in the dataset. For example, consider a publicly available dataset with one part of the data being income. There aren’t many people who have incomes above…
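Topcoding amounts to capping values at a threshold. As a quick sketch (the income cap below is made up for illustration):

```python
def topcode(values, cap):
    """Replace every value above the cap with the cap itself (topcoding).
    Bottomcoding would be the mirror image with max() and a floor."""
    return [min(v, cap) for v in values]

incomes = [40_000, 85_000, 2_500_000]
topcode(incomes, 500_000)  # -> [40000, 85000, 500000]
```

Note that after topcoding, the capped values are indistinguishable from one another, which is exactly why it protects privacy and also why it distorts statistics like the mean.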

As deep learning techniques continue to advance, image recognition systems are becoming more and more powerful. With this power comes great reward — helping diagnose disease from x-rays and self-driving cars are just two examples. But there is also potential for harm, particularly concerning facial recognition. In the future, it’s possible that surveillance cameras with state-of-the-art facial recognition technology could pop up on every street corner, effectively eliminating any privacy we still have. Fortunately, some researchers are already coming up with ways to counteract deep learning based facial recognition. …

In this article, we’ll take a look at two binary search problems and understand the thought process behind their solutions. Rather than simply stating the solutions, I’ll walk through how you arrive at them, which I think leads to better understanding.

First, we should review the basic binary search algorithm. We have a list sorted in ascending order, and we are searching for a number in it. We look at the middle element and check whether the number we’re looking for is bigger or smaller. If it is bigger, we can eliminate the first half…
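The halving procedure described above can be written as a short function, assuming a list sorted in ascending order:

```python
def binary_search(nums, target):
    """Return the index of target in the sorted list nums, or -1 if absent."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid
        if nums[mid] < target:
            lo = mid + 1   # target is bigger: discard the first half
        else:
            hi = mid - 1   # target is smaller: discard the second half
    return -1

binary_search([1, 3, 5, 7, 9], 7)  # -> 3
```

Each iteration halves the search range, so the whole thing runs in O(log n) comparisons.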

In classification problems, we often assume that every misclassification is equally bad. Sometimes, however, this isn’t true. Consider the example of trying to classify whether or not there is a terrorist threat. There are two types of misclassifications: either we predict there is a threat but there is actually no threat (false positive), or we predict there is no threat but there actually is a threat (false negative). Clearly the false negative is much more dangerous than the false positive — we might end up wasting time and money in the false positive case, but people might die in the…
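One common way to encode this asymmetry is to assign different costs to the two error types and predict whichever label has the lower expected cost. A minimal sketch, with made-up illustrative cost numbers:

```python
def predict_threat(p_threat, cost_fn=1000.0, cost_fp=1.0):
    """Flag a threat when the expected cost of missing one (false negative)
    exceeds the expected cost of a false alarm (false positive).
    The costs here are hypothetical, chosen only to illustrate the idea."""
    return p_threat * cost_fn > (1 - p_threat) * cost_fp

predict_threat(0.01)    # flagged, even at 1% probability
predict_threat(0.0001)  # not flagged
```

With these costs, the decision threshold drops from 50% to roughly 0.1%, which is exactly the effect you want when false negatives are catastrophic.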

Online advertising is ubiquitous these days. You can’t go many places online without being offered to buy something — perhaps a shirt, or maybe a pair of headphones. What’s more, these ads have uncanny levels of accuracy. I’m often surprised at how many things I actually would buy from the ads that are shown to me. I’ve been equal parts creeped out and impressed by these ads and decided to figure out how they worked. It turns out that **recommender systems**, as these ads are called, are actually quite intuitive. …

In machine learning algorithms, there are two kinds of parameters: model parameters and hyperparameters. Model parameters are learned through the training process, for example the weights of a neural network. Hyperparameters control the training process; consequently, they must be set before training begins. Some examples of hyperparameters in deep learning are the learning rate and the batch size. One problem overlooked by many machine learning practitioners is exactly how to set these hyperparameters. Failure to do good hyperparameter tuning might nullify all your hard work building the model. Fortunately, there is a general heuristic approach for picking hyperparameters…
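The simplest baseline for tuning is an exhaustive grid search over candidate values. In the sketch below, `validation_score` is a hypothetical stand-in for the expensive step of actually training a model and measuring its validation accuracy:

```python
import itertools

def validation_score(lr, batch_size):
    """Hypothetical stand-in for 'train the model with these
    hyperparameters and return its validation accuracy'."""
    return -abs(lr - 0.01) - abs(batch_size - 64) / 1000

# Candidate values for each hyperparameter.
grid = {"lr": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}

# Try every combination and keep the best-scoring one.
best = max(itertools.product(grid["lr"], grid["batch_size"]),
           key=lambda cfg: validation_score(*cfg))
# best is the (lr, batch_size) pair with the highest validation score
```

Grid search is easy to reason about but scales exponentially with the number of hyperparameters, which is why random search and smarter heuristics are often preferred in practice.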

I’ve been playing chess since first grade. At first I was terrible, but I kept at it and managed to get to tournament level. After all this time at the board, I’ve realized that the skills I learned aren’t just useful for chess alone. They can be applied to life as well. Here are some of the most useful ones.

In chess, tactics are short (2–3 move) sequences of moves that net you an immediate advantage. For example, your opponent could have a knight defended by their queen. You might see the tactic of distracting their queen somewhere else with…

There’s a lot of hype around reinforcement learning (RL) these days, and rightfully so. Ever since DeepMind published its paper “Playing Atari with Deep Reinforcement Learning”, many promising results have come out, with perhaps the most famous one being AlphaGo. Before we can understand how these models work, however, we need to understand some basic principles of reinforcement learning. I think the best introduction to these concepts is through a much simpler problem — the so-called “k-armed bandit problem”.

First, let’s define what the k-armed bandit problem is. The phrase “k-armed bandit” conjures up images of Boba Fett from Star…
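To make the setup concrete, here is a minimal epsilon-greedy agent for a k-armed bandit, a standard baseline in introductory RL texts. The arm reward means and the parameter values below are made up for illustration:

```python
import random

def epsilon_greedy(k, true_means, steps=10_000, epsilon=0.1, seed=0):
    """Play a k-armed bandit with Gaussian rewards using epsilon-greedy:
    with probability epsilon pull a random arm (explore), otherwise pull
    the arm with the highest estimated value (exploit). Estimates are
    incremental sample averages. Returns the final value estimates."""
    rng = random.Random(seed)
    counts = [0] * k
    estimates = [0.0] * k
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates
```

After enough pulls, the estimates converge toward the true arm means, and the agent spends most of its time on the best arm while still occasionally exploring the others.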

One roadblock in using neural networks is the power, memory, and time needed to run the network. This is problematic for mobile and internet-of-things devices that don’t have powerful CPUs or much memory. Binarized neural networks are a solution to this problem. By using binary values instead of floating point values, the network can be computed faster, and with less memory and power.

Conceptually, binarized neural networks (BNN) are similar to regular feedforward neural networks (NN). One difference is that the weights and activations in a BNN are constrained to one of two values, +1 and -1, hence the name…
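The constraint is typically enforced with a sign function. A rough sketch of deterministic binarization (mapping zero to +1 is a common convention, not something specified here):

```python
def binarize(weights):
    """Deterministic binarization: map each real-valued weight to +1 or -1
    by its sign, with zero conventionally mapped to +1."""
    return [1 if w >= 0 else -1 for w in weights]

binarize([0.3, -1.2, 0.0, -0.01])  # -> [1, -1, 1, -1]
```

Because every weight is ±1, multiplications in the forward pass reduce to sign flips and additions, which is where the speed and memory savings come from.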

Amazon Engineer. I was into data before it was big.