This paper proposes an evaluation scheme to measure the performance of a system that detects hierarchical event structure for event coreference resolution. We show that each system output can be represented as a forest of unordered trees, and introduce the notion of conceptual event hierarchy to simplify the evaluation process. We enumerate the desiderata for a similarity metric to measure system performance. We examine three metrics against these desiderata, and show that metrics extended from MUC and BLANC are more adequate than a metric based on Simple Tree Matching.