Document filtering is increasingly deployed in Web environments to reduce information overload of users. We formulate online information filtering as a reinforcement learning problem, i.e., TD(0). The goal is to learn user profiles that best represent information needs and thus maximize the expected value of user relevance feedback. A method is then presented that acquires reinforcement signals automatically by estimating user's implicit feedback from direct observations of browsing behaviors. This "learning by observation" approach is contrasted with conventional relevance feedback methods which require explicit user feedbacks.
展开▼