Monday, May 14, 2007

CANTINA: A Content-Based Approach to Detecting Phishing Web Sites

Our paper, "CANTINA: A Content-Based Approach to Detecting Phishing Web Sites," was presented at WWW 2007.


Phishing is a significant problem involving fraudulent email and web sites that trick unsuspecting users into revealing private information. In this paper, we present the design, implementation, and evaluation of CANTINA, a novel, content-based approach to detecting phishing web sites, based on the TF-IDF information retrieval algorithm. We also discuss the design and evaluation of several heuristics we developed to reduce false positives. Our experiments show that CANTINA is good at detecting phishing sites, correctly labeling approximately 95% of phishing sites.
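To make the TF-IDF idea concrete, here is a minimal sketch in Python. It assumes that the highest-scoring TF-IDF terms on a page are used as a search query and that the page is treated as legitimate when its own domain shows up among the results. The function names, the smoothed IDF formula, the reference corpus, and the `k` and `top_n` defaults are all hypothetical choices for illustration, not the paper's exact implementation, and the search results are passed in by the caller rather than fetched from a real search engine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_signature(page_text, corpus, k=5):
    """Return the k terms of page_text with the highest TF-IDF scores.

    corpus is a list of reference documents (an assumption of this sketch)
    used only to estimate how common each term is in general.
    """
    page_terms = Counter(tokenize(page_text))
    total = sum(page_terms.values())
    doc_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_sets)

    scores = {}
    for term, count in page_terms.items():
        df = sum(1 for terms in doc_sets if term in terms)
        # Smoothed IDF so that terms absent from the corpus do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / total) * idf

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def looks_legitimate(page_domain, result_domains, top_n=30):
    """True if the page's domain appears in the first top_n search results.

    result_domains is the ordered list of domains returned by whatever
    search engine the caller queried with the lexical signature.
    """
    return page_domain in result_domains[:top_n]


corpus = [
    "the bank of the river",
    "the weather today is sunny",
    "login to the site",
]
signature = lexical_signature("paypal paypal login the the account", corpus, k=2)
print(signature)
print(looks_legitimate("evil.example", ["paypal.com", "example.org"]))
```

In this sketch, a page whose distinctive terms lead a search engine back to some other domain is the suspicious case; a page that the search engine has never indexed would look suspicious too.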


Paper: PDF
Presentation: PPT

2 comments:

jas0nh0ng said...

We know of one failure mode for CANTINA so far: it doesn't work when the phishing content is hosted on the very site it spoofs, for example a fake MySpace login page hosted on MySpace itself.

A Stanfurd student also suggested that CANTINA might not work as well for long-tail sites.

Anonymous said...

It sounds like CANTINA might have some problems with intranets too, since intranet content isn't indexed by Google.