Jonathan, Siddharth and Paepcke, Andreas (2007) SpotSigs: Near Duplicate Detection in Web Page Collections. Masters thesis, Stanford University.