Linear Digressions

  • Author: Various
  • Narrator: Various
  • Publisher: Podcast
  • Duration: 98:40:43

Synopsis

Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.

Episodes

  • Deep Blue

    07/11/2016 Duration: 20min

    In 1997, Deep Blue, the IBM algorithm/computer, did what no one at the time thought possible: it beat the world's best chess player. It turns out, though, that one of the most important moves in the matchup, where Deep Blue psyched out its opponent with a weird move, might not have been so inspired after all. It might have been nothing more than a bug in the program, and it changed computer science history. Relevant links: https://www.wired.com/2012/09/deep-blue-computer-bug/

  • Organizing Google's Datasets

    31/10/2016 Duration: 15min

    If you're a data scientist, there's a good chance you're used to working with a lot of data. But there's a lot of data, and then there's Google-scale amounts of data. Keeping all that data organized is a Google-sized task, and as it happens, they've built a system for that organizational challenge. This episode is all about that system, called Goods, and in particular we'll dig into some of the details of what makes this so tough. Relevant links: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45390.pdf

  • Fighting Cancer with Data Science: Followup

    24/10/2016 Duration: 25min

    A few months ago, Katie started on a project for the Vice President's Cancer Moonshot on how data can be used to better fight cancer. The project is all wrapped up now, so we wanted to tell you how that work went and what changes to cancer data policy were suggested to the Vice President. See lineardigressions.com for links to the reports discussed in this episode.

  • The 19-year-old determining the US election

    17/10/2016 Duration: 12min

    Sick of the presidential election yet? We are too, but there's still almost a month to go, so let's just embrace it together. This week, we'll talk about one of the presidential polls, which has been kind of an outlier for quite a while. The NY Times recently took a closer look at this poll and was able to figure out why it's such an outlier. It all goes back to a 19-year-old African American man, living in Illinois, who really likes Donald Trump... Relevant links: http://www.nytimes.com/2016/10/13/upshot/how-one-19-year-old-illinois-man-is-distorting-national-polling-averages.html Follow-up article from the LA Times, released after recording: http://www.latimes.com/politics/la-na-pol-daybreak-poll-questions-20161013-snap-story.html

  • How to Steal a Model

    09/10/2016 Duration: 13min

    What does it mean to steal a model? It means someone (the thief, presumably) can re-create the predictions of the model without having access to the algorithm itself, or the training data. Sound far-fetched? It isn't. If that person can ask for predictions from the model, and he (or she) asks just the right questions, the model can be reverse-engineered right out from under you. Relevant links: https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf

  • Regularization

    03/10/2016 Duration: 17min

    Lots of data is usually seen as a good thing. And it is a good thing, except when it's not. In a lot of fields, a problem arises when you have many, many features, especially if there's a somewhat smaller number of cases to learn from: supervised machine learning algorithms break, or learn spurious or uninterpretable patterns. What to do? Regularization can be one of your best friends here. It penalizes overly complex models, which keeps your model's complexity under control.

  • The Cold Start Problem

    26/09/2016 Duration: 15min

    You might sometimes find that it's hard to get started doing something, but once you're going, it gets easier. It turns out machine learning algorithms, and especially recommendation engines, feel the same way. The more they "know" about a user, like what movies they watch and how they rate them, the better they do at suggesting new movies, which is great until you realize that you have to start somewhere. The "cold start" problem is our focus in this episode: both the heuristic solutions that help deal with it and some realism about why you should stay skeptical when someone claims a great solution to cold starts. Relevant links: http://repository.upenn.edu/cgi/viewcontent.cgi?article=1141&context=cis_papers

  • Open Source Software for Data Science

    19/09/2016 Duration: 20min

    If you work in tech, software or data science, there's an excellent chance you use tools that are built upon open source software. This is software that's built and distributed not for profit, but because everyone benefits when we work together and share tools. Tim Head of scikit-optimize chats with us further about what it's like to maintain an open source library, how to get involved in open source, and why people like him need people like you to make it all work.

  • Scikit + Optimization = Scikit-Optimize

    12/09/2016 Duration: 15min

    We're excited to welcome a guest, Tim Head, who is one of the maintainers of the scikit-optimize package. With all the talk about optimization lately, it felt appropriate to get in a few words with someone who's out there making it happen for Python. Relevant links: https://scikit-optimize.github.io/ http://www.wildtreetech.com/

  • Two Cultures: Machine Learning and Statistics

    05/09/2016 Duration: 17min

    It's a funny thing to realize, but data science modeling is usually about either explainability, interpretation, and understanding, or about predictive accuracy. It's rarely about both, since optimizing for one tends to compromise the other. Leo Breiman was one of the titans of both kinds of modeling, a statistician who helped bring machine learning into statistics and vice versa. In this episode, we unpack one of his seminal papers from 2001, when machine learning was just beginning to take root, and talk about how he made clear what machine learning could do for statistics and why it's so important. Relevant links: http://www.math.snu.ac.kr/~hichoi/machinelearning/(Breiman)%20Statistical%20Modeling--The%20Two%20Cultures.pdf

  • Optimization Solutions

    29/08/2016 Duration: 20min

    You've got an optimization problem to solve, and a less-than-forever amount of time in which to solve it. What to do? Use a heuristic optimization algorithm, like a hill climber or simulated annealing; we cover both in this episode! Relevant link: http://www.lizsander.com/programming/2015/08/04/Heuristic-Search-Algorithms.html

  • Optimization Problems

    22/08/2016 Duration: 17min

    If modeling is about predicting the unknown, optimization tries to answer the question of what to do, what decision to make, to get the best results out of a given situation. Sometimes that's straightforward, but sometimes... not so much. What makes an optimization problem easy or hard, and what are some of the methods for finding optimal solutions to problems? Glad you asked! May we recommend our latest podcast episode to you?

  • Multi-level modeling for understanding DEADLY RADIOACTIVE GAS

    15/08/2016 Duration: 23min

    Ok, this episode is only sort of about DEADLY RADIOACTIVE GAS. It's mostly about multilevel modeling, which is a way of building models with data that has distinct, related subgroups within it. What are multilevel models used for? Elections (we can't get enough of 'em these days), understanding the effect that a good teacher can have on their students, and DEADLY RADIOACTIVE GAS. Relevant links: http://www.stat.columbia.edu/~gelman/research/published/multi2.pdf

  • How Polls Got Brexit "Wrong"

    08/08/2016 Duration: 15min

    Continuing the discussion of how polls do (and sometimes don't) tell us what to expect in upcoming elections--let's take a concrete example from the recent past, shall we? The Brexit referendum was, by and large, expected to shake out for "remain", but when the votes were counted, "leave" came out ahead. Everyone was shocked (SHOCKED!) but maybe the polls weren't as wrong as the pundits like to claim. Relevant links: http://www.slate.com/articles/news_and_politics/moneybox/2016/07/why_political_betting_markets_are_failing.html http://andrewgelman.com/2016/06/24/brexit-polling-what-went-wrong/

  • Election Forecasting

    01/08/2016 Duration: 28min

    Not sure if you heard, but there's an election going on right now. Polls, surveys, and projections abound, as far as the eye can see. How to make sense of it all? How are the projections made? Which are some good ones to follow? We'll be your trusty guides through a crash course in election forecasting. Relevant links: http://www.wired.com/2016/06/civis-election-polling-clinton-sanders-trump/ http://election.princeton.edu/ http://projects.fivethirtyeight.com/2016-election-forecast/ http://www.nytimes.com/interactive/2016/upshot/presidential-polls-forecast.html?rref=collection%2Fsectioncollection%2Fupshot&action=click&contentCollection=upshot&region=rank&module=package&version=highlights&contentPlacement=5&pgtype=sectionfront

  • Machine Learning for Genomics

    25/07/2016 Duration: 20min

    Genomics data is some of the biggest #bigdata, and doing machine learning on it is unlocking new ways of thinking about evolution, genomic diseases like cancer, and what really makes each of us different from everyone else. This episode touches on some of the things that make machine learning on genomics data so challenging, and the algorithms designed to do it anyway.

  • Climate Modeling

    18/07/2016 Duration: 19min

    Hot enough for you? Climate models suggest that it's only going to get warmer in the coming years. This episode unpacks those models, so you understand how they work. A lot of the episodes we do are about fun studies we hear about, like "if you're interested, this is kinda cool"--this episode is much more important than that. Understanding these models, and taking action on them where appropriate, will have huge implications in the years to come. Relevant links: https://climatesight.org/

  • Reinforcement Learning Gone Wrong

    11/07/2016 Duration: 28min

    Last week’s episode on artificial intelligence gets a huge payoff this week: we’ll explore a wonderful couple of papers about all the ways that artificial intelligence can go wrong. Malevolent actors? You bet. Collateral damage? Of course. Reward hacking? Naturally! It’s fun to think about, and the discussion starting now will have reverberations for decades to come. Relevant links: https://www.technologyreview.com/s/601519/how-to-create-a-malevolent-artificial-intelligence/ http://arxiv.org/abs/1605.02817 https://arxiv.org/abs/1606.06565

  • Reinforcement Learning for Artificial Intelligence

    03/07/2016 Duration: 18min

    There’s a ton of excitement about reinforcement learning, a form of machine learning in which an agent learns by trial and error from rewards rather than from labeled examples, and which underpins a lot of today’s cutting-edge artificial intelligence algorithms. Here’s a crash course in the algorithmic machinery behind AlphaGo, self-driving cars, and major logistical optimization projects, and behind the robots that, tomorrow, will clean our houses and (hopefully) not take over the world…

  • Differential Privacy: how to study people without being weird and gross

    27/06/2016 Duration: 18min

    Apple wants to study iPhone users' activities and use that data to improve performance. Google collects data on what people are doing online to try to improve its Chrome browser. Do you like the idea of this data being collected? Maybe not, if it's being collected on you, but you probably also realize that there is some benefit to be had from improved iPhones and web browsers. Differential privacy is a mathematical framework for collecting and analyzing data that walks the line between individual privacy and better data, and it draws on some old-school tricks that scientists use to get people to answer embarrassing questions honestly. Relevant links: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42852.pdf
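
The How to Steal a Model episode links to Tramèr et al., whose attacks include solving equations against prediction APIs that return confidence scores. Here is a minimal sketch of that idea for the simplest case, assuming a logistic regression whose API returns P(y=1 | x). Because the log-odds are linear in the hidden weights, d + 1 queries pin down all d weights plus the intercept. The black box, its secret parameters, and both function names are hypothetical, and the code uses NumPy.

```python
from typing import Callable

import numpy as np


def make_black_box(w: np.ndarray, b: float) -> Callable[[np.ndarray], float]:
    """Hypothetical prediction API: returns only P(y=1 | x), never w or b."""
    def predict_proba(x: np.ndarray) -> float:
        return float(1.0 / (1.0 + np.exp(-(x @ w + b))))
    return predict_proba


def steal_logistic_regression(
    predict_proba: Callable[[np.ndarray], float],
    n_features: int,
    rng: np.random.Generator,
) -> tuple[np.ndarray, float]:
    """Equation-solving extraction of a logistic regression model.

    logit(p) = log(p / (1 - p)) = w . x + b is linear in the unknowns, so
    n_features + 1 queries at random points give a square linear system.
    """
    queries = rng.normal(size=(n_features + 1, n_features))
    probs = np.array([predict_proba(x) for x in queries])
    logits = np.log(probs / (1.0 - probs))
    # Append a column of ones so the last unknown is the intercept b.
    design = np.hstack([queries, np.ones((n_features + 1, 1))])
    solution = np.linalg.solve(design, logits)
    return solution[:-1], float(solution[-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    secret_w, secret_b = rng.normal(size=5), 0.7  # hidden from the "thief"
    api = make_black_box(secret_w, secret_b)
    stolen_w, stolen_b = steal_logistic_regression(api, 5, rng)
    print("weights recovered:", np.allclose(stolen_w, secret_w))
    print("intercept recovered:", np.isclose(stolen_b, secret_b))
```

Real APIs may round scores or return only labels, which this sketch does not handle; it only shows why exposing exact confidence scores leaks so much.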
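
The Regularization episode describes penalizing overly complex models when features outnumber cases. One common form is ridge (L2) regularization, sketched below with NumPy on made-up data. The data sizes, penalty values, and the `ridge_fit` name are illustrative assumptions, and the episode may also cover other penalties such as the lasso (L1).

```python
import numpy as np


def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge regression: minimize ||y - Xw||^2 + lam * ||w||^2.

    The lam * I term keeps X^T X + lam * I invertible even when there are
    more features than samples, where ordinary least squares has no unique
    solution.
    """
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_features = 20, 50  # many features, few cases
    X = rng.normal(size=(n_samples, n_features))
    true_w = np.zeros(n_features)
    true_w[:3] = [2.0, -1.0, 0.5]  # only a few features actually matter
    y = X @ true_w + rng.normal(scale=0.1, size=n_samples)

    for lam in (0.01, 1.0, 100.0):
        w = ridge_fit(X, y, lam)
        print(f"lam={lam:>6}: ||w|| = {np.linalg.norm(w):.3f}")
```

Larger `lam` pulls the weights toward zero. A lasso penalty would instead drive some weights exactly to zero, which is closer to literally reducing the number of features the model uses.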
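
The Optimization Solutions episode names two heuristic optimizers, hill climbing and simulated annealing. The sketch below contrasts them on a hypothetical one-dimensional double-well function; the random-neighbor rule, the linear cooling schedule, and all parameter values are illustrative assumptions rather than anything taken from the episode.

```python
import math
import random
from typing import Callable


def hill_climb(f: Callable[[float], float], x: float, step: float,
               iters: int, rng: random.Random) -> float:
    """Greedy local search: accept a random neighbor only if it lowers f."""
    for _ in range(iters):
        candidate = x + rng.uniform(-step, step)
        if f(candidate) < f(x):
            x = candidate
    return x


def simulated_annealing(f: Callable[[float], float], x: float, step: float,
                        iters: int, t0: float, rng: random.Random) -> float:
    """Like hill climbing, but sometimes accepts uphill moves.

    An uphill move of size delta is accepted with probability exp(-delta / T).
    T cools linearly from t0 toward zero (an assumed schedule), so the search
    explores early and behaves like a hill climber late. Returns the best
    point seen.
    """
    best = x
    for i in range(iters):
        temperature = t0 * (1 - i / iters)  # stays > 0 because i < iters
        candidate = x + rng.uniform(-step, step)
        delta = f(candidate) - f(x)
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            x = candidate
            if f(x) < f(best):
                best = x
    return best


def double_well(x: float) -> float:
    """Hypothetical objective: local minimum near x = 2, global minimum near x = -2."""
    return (x ** 2 - 4) ** 2 + x


if __name__ == "__main__":
    start = 2.0  # begin in the worse (local) well
    hc = hill_climb(double_well, start, step=0.1, iters=5_000, rng=random.Random(0))
    sa = simulated_annealing(double_well, start, step=0.5, iters=20_000,
                             t0=20.0, rng=random.Random(0))
    print(f"hill climbing:       x = {hc:.2f}, f(x) = {double_well(hc):.2f}")
    print(f"simulated annealing: x = {sa:.2f}, f(x) = {double_well(sa):.2f}")
```

The hill climber can never cross the barrier at x = 0 (where f = 16) because every accepted move must lower f. Tracking the best point seen matters for annealing, which may wander uphill away from a good solution before it cools.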
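
Among the old-school tricks mentioned in the Differential Privacy episode, a classic one is randomized response. The sketch below assumes the coin-flip variant (answer truthfully on heads; on tails, answer with a second coin flip), which may not be the exact scheme discussed on the show. The population, the 30% rate, and the function names are hypothetical.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Report an answer using the coin-flip variant of randomized response.

    With probability 1/2 the respondent answers truthfully; otherwise they
    report the outcome of a second fair coin flip. No single answer reveals
    the truth with certainty.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports: list[bool]) -> float:
    """Invert the noise: P(yes) = 0.5 * p + 0.25, so p = 2 * P(yes) - 0.5."""
    observed_yes = sum(reports) / len(reports)
    return 2 * observed_yes - 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    true_rate = 0.3  # hypothetical fraction of people with the sensitive trait
    population = [rng.random() < true_rate for _ in range(100_000)]
    reports = [randomized_response(answer, rng) for answer in population]
    print(f"estimated rate: {estimate_true_rate(reports):.3f}")
```

In this variant a "yes" is three times as likely from someone with the trait as from someone without it (0.75 vs. 0.25), which corresponds to ε = ln 3 in differential-privacy terms, yet the aggregate rate is still recoverable from many responses.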
