Interest in differential privacy is growing rapidly. As evidence of this, here’s the result of a Google Ngram search [1] on “differential privacy.” When I first mentioned differential privacy to consulting leads a few years ago, not many had heard of it. Now most are familiar with the term, though …

I had a recent discussion with someone concerning a 50 sigma event, and that conversation prompted this post. When people count “sigmas” they usually have normal distributions in mind. And six-sigma events are so rare for normal random variables that it’s more likely the phenomenon under consideration doesn’t have a normal …
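To put rough numbers on how rare k-sigma events are under a normal model, here is a minimal sketch. It is not taken from the post; the function name `upper_tail` is hypothetical, and it assumes a standard normal variable and a one-sided tail.

```python
from math import erfc, sqrt

def upper_tail(k: float) -> float:
    """One-sided tail P(Z > k) for a standard normal Z (illustrative helper)."""
    # The normal survival function written with the complementary error function.
    return 0.5 * erfc(k / sqrt(2))

# Print the one-sided tail probability for 1 through 6 sigma.
for k in range(1, 7):
    print(f"{k} sigma: P(Z > {k}) = {upper_tail(k):.3e}")
```

At 50 sigma the tail probability is too small to represent in double precision, so this helper would report 0.0; working with the logarithm of the probability would be needed there.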

Much of what you’ll read about power laws in popular literature is not mathematically accurate, but still useful. A lot of probability distributions besides power laws look approximately linear on a log-log plot, particularly over part of their range. The usual conclusion from this observation is that much of the …

Companies collect and aggregate location data from millions of people’s phones. Then that…Tags: location, privacy, The Markup

Derek Thompson for The Atlantic highlights recent research comparing mortality in America against…Tags: Atlantic, Derek Thompson, mortality

Sept. 23, 2021, 7:02 a.m.

How Humans Judge Machines is an academic publication covering the results of experiments…Tags: book, interaction, machines

For their 5 Levels series, Wired brought in Hilary Mason to explain machine…Tags: Hilary Mason, machine learning, Wired

Missing data throws a monkey wrench into otherwise elegant plans. Yesterday’s post on genetic sequence data illustrates this point. DNA sequences consist of four bases, but we need to make provision for storing a fifth value for unknowns. If you know there’s a base in a particular position, but you …

I needed to know the frequencies of letters at the beginning of words for a project. The overall frequency of letters, wherever they appear in a word, is well known. Initial frequencies are not so common, so I did a little experiment. I downloaded the Canterbury Corpus and looked at …
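A count like this could be sketched as follows. This is an assumption about the approach, not the code used in the post: the function name `initial_letter_frequencies` is hypothetical, words are taken to be runs of ASCII letters, and the sample string stands in for the corpus.

```python
import re
from collections import Counter

def initial_letter_frequencies(text: str) -> dict[str, float]:
    """Relative frequency of each letter as the first letter of a word."""
    # Treat every maximal run of ASCII letters as a word (a simplifying assumption).
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(word[0].lower() for word in words)
    total = sum(counts.values())
    # most_common() orders letters from most to least frequent.
    return {letter: n / total for letter, n in counts.most_common()}

# Stand-in sample; a real run would read the corpus files instead.
sample = "The quick brown fox thinks twice"
print(initial_letter_frequencies(sample))
```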

The Wall Street Journal tested out the TikTok algorithm with bots to see…Tags: algorithm, TikTok, Wall Street Journal

Background: The COVID-19 pandemic has had huge impacts on the economy of the U.S., and the restaurant industry has been among the hardest hit. To adapt to the pandemic, restaurants turned to technology. 2020 brought about contactless ordering on tablets, QR code menus, and an explosion in the usage of …

Joshua Barbeau fed an AI chatbot with old texts from his fiancee who…Tags: AI, chatbot, death, fiancee

July 29, 2021, 10:03 a.m.

Sebastian Raschka made 170 videos on deep learning, and you can watch all…Tags: deep learning, Python, Sebastian Raschka

July 21, 2021, 11:44 a.m.

Introduction to Modern Statistics by Mine Cetinkaya-Rundel and Johanna Hardin is a free-to-download…Tags: book, introduction

Suppose in a company of N employees, m are chosen randomly for drug screening. In two independent screenings, what is the probability that someone will be picked both times? It may be unlikely that any given individual will be picked twice, while being very likely that someone will be picked …
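The two probabilities can be compared with a short calculation. This is a sketch under stated assumptions, not necessarily the derivation in the post: each screening is taken to be a uniformly random subset of m of the N employees, drawn independently, and the function names are hypothetical.

```python
from math import comb

def prob_individual_twice(N: int, m: int) -> float:
    """Probability that one particular employee is picked in both screenings."""
    # Each screening picks that employee with probability m/N, independently.
    return (m / N) ** 2

def prob_someone_twice(N: int, m: int) -> float:
    """Probability that at least one employee is picked in both screenings."""
    if 2 * m > N:
        # Two groups of size m cannot be disjoint, so an overlap is certain.
        return 1.0
    # The second screening avoids all m earlier picks with probability
    # C(N - m, m) / C(N, m); the complement is the chance of an overlap.
    return 1 - comb(N - m, m) / comb(N, m)

# Example values chosen for illustration only.
N, m = 100, 10
print(prob_individual_twice(N, m))
print(prob_someone_twice(N, m))
```

With these illustrative values the chance for any given individual is 1%, while the chance that someone is picked twice is roughly two in three.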

ProPublica anonymously obtained billionaires’ tax returns. Combining the data with Forbes’ billionaire wealth…Tags: billionaires, money, ProPublica, taxes

In the sixty years since Arthur Samuel first published his seminal machine learning work, artificial intelligence has advanced from being not as smart as a flatworm to having less common sense than a house cat.

Here’s a way to find a 95% confidence interval for any parameter θ. With probability 0.95, return the real line. With probability 0.05, return the empty set. Clearly 95% of the time this procedure will return an interval that contains θ. This example shows the difference between a confidence interval …
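The coverage claim is easy to check by simulation. The sketch below is illustrative only: the function names are hypothetical, the empty set is represented as `None`, and the seed and trial count are arbitrary choices.

```python
import math
import random

def silly_confidence_interval(rng: random.Random):
    """Return the whole real line with probability 0.95, else the empty set."""
    if rng.random() < 0.95:
        return (-math.inf, math.inf)
    return None  # stands in for the empty set

def coverage(theta: float, trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of trials in which the returned set contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        interval = silly_confidence_interval(rng)
        if interval is not None and interval[0] < theta < interval[1]:
            hits += 1
    return hits / trials

# The procedure never looks at data, so any value of theta behaves the same.
print(coverage(theta=3.7))
```

The coverage comes out near 0.95 whatever θ is, even though no single returned set says anything useful about θ.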