For a long time, we have trained machine learning models by optimizing surrogate functions or log-likelihood rather than the evaluation functions themselves. For example, while the evaluation is the 0-1 loss, the

**Link:** https://arxiv.org/pdf/1602.04938v1.pdf

**Authors:** Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin

I have been busy with finals for the past few days so this post is to make up for

**Link:** https://www.cs.cmu.edu/~yww/papers/naacl2016.pdf

**Authors:** William Yang Wang, Yashar Mehdad, Dragomir R. Radev, Amanda Stent

**Summary** The paper tackles the problem of timeline summarization - extracting milestones of a news story

**Link:** http://www.ccs.neu.edu/home/luwang/papers/NAACL2016.pdf

**Authors:** Lu Wang, Wang Ling

**Summary** Generate an opinion summary using an encoder-decoder neural model. The problem can be formulated as given an input, which contains a set

I came across a very interesting paper by Fraenkel and Schul on how negative phrases are constructed. It was a delightful moment when I realized that "these things are

Suppose we are evaluating a class of predictors on a data distribution $D$. This distribution generates data pairs $(x, y)$ where $y = f(x) + \epsilon$ and $\epsilon$ is a zero-mean

This post relates to "Gibbs sampling for the uninitiated" (Resnik and Hardisty, 2010), a very helpful resource for those who have just begun to learn about Bayesian inference