Distributed Statistics for Peer-to-Peer Web Search

Gerhard Weikum
Max-Planck Institute for Informatics, Saarbruecken, Germany

Abstract: The peer-to-peer (P2P) computing paradigm is an intriguing alternative to Google-style search engines for querying and ranking Web content. In a network with many thousands or millions of peers, the storage and access load requirements per peer are much lighter than for a centralized Google-like server farm. Each peer's local search engine can therefore employ more powerful techniques from information retrieval, statistical learning, computational linguistics, and ontological reasoning to boost the quality of search results. In addition, peers can dynamically collaborate on advanced and particularly difficult queries. Moreover, a peer-to-peer setting is ideally suited to capture local user behavior, such as query logs and click streams, and to disseminate and aggregate this information in the network, at the discretion of the corresponding user, in order to incorporate richer cognitive models.

On the other hand, P2P Web search also poses major challenges. One of them is the computation, dissemination, and efficient management of the statistical measures that are crucial for good search strategies and ranking algorithms. Statistics (e.g., local and global document frequencies, overlap among peers' contents, PageRank-style authority) must be acquired and maintained in a decentralized manner for scalability, must be compact for efficient communication, and must provide sufficiently accurate estimators of the various measures of interest.

This talk will give an overview of our recent and ongoing research on distributed statistics management for P2P Web search. The developed methods have been implemented in the Minerva prototype system, an experimental testbed for P2P research.
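To illustrate what a statistic that is both compact and an accurate estimator can look like, the following minimal sketch estimates the overlap among two peers' contents with min-wise hashing, a well-known synopsis technique. It is offered only as an example under stated assumptions and is not necessarily the method used in Minerva. The function names `minhash_signature` and `estimate_resemblance`, the signature length `k = 128`, the seeded BLAKE2b hash family, and the toy URL collections are all hypothetical choices made for this illustration. Each peer would ship only `k` integers instead of its full document list, and the fraction of matching positions estimates the Jaccard resemblance |A ∩ B| / |A ∪ B|.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    # Seeded 64-bit hash built from BLAKE2b; each seed plays the role of one
    # (approximately) independent random hash function. Assumed hash family.
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(doc_ids, k: int = 128) -> list[int]:
    """Hypothetical helper: a compact synopsis of k integers, whatever the collection size."""
    doc_ids = list(doc_ids)
    if not doc_ids:
        raise ValueError("cannot summarize an empty collection")
    # For each hash function, keep only the minimum hash value over the collection.
    return [min(_hash(seed, d) for d in doc_ids) for seed in range(k)]


def estimate_resemblance(sig_a: list[int], sig_b: list[int]) -> float:
    """Hypothetical helper: estimate |A & B| / |A | B| from two signatures.

    For an ideal random hash function, the minima of A and B coincide with
    probability equal to their Jaccard resemblance, so the fraction of
    agreeing positions is an unbiased estimator of it.
    """
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must be built with the same k")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Toy data (assumed): two peers whose collections share 500 documents.
peer_a = {f"http://example.org/doc{i}" for i in range(0, 1000)}
peer_b = {f"http://example.org/doc{i}" for i in range(500, 1500)}

sa = minhash_signature(peer_a)
sb = minhash_signature(peer_b)
est = estimate_resemblance(sa, sb)
exact = len(peer_a & peer_b) / len(peer_a | peer_b)  # 500 / 1500
print(f"estimated resemblance {est:.2f}, exact {exact:.2f}")
```

The design trade-off is the one the abstract names: a longer signature (larger `k`) costs more bandwidth per peer but reduces the variance of the estimate, roughly in proportion to 1/k.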