Distributional measures of lexical similarity and kernel methods for classification are well-known tools in Natural Language Processing. We bring these two methods together by introducing distributional kernels that compare co-occurrence probability distributions. We demonstrate the effectiveness of these kernels by presenting state-of-the-art results on datasets for three semantic classification tasks: compound noun interpretation, identification of semantic relations between nominals and semantic classification of verbs. Finally, we consider explanations for the impressive performance of distributional kernels and sketch some promising generalisations.