CANTINA: A Content-Based Approach to Detecting Phishing Web Sites

Our paper, "CANTINA: A Content-Based Approach to Detecting Phishing Web Sites," was presented at WWW2007.


Phishing is a significant problem involving fraudulent email and web sites that trick unsuspecting users into revealing private information. In this paper, we present the design, implementation, and evaluation of CANTINA, a novel, content-based approach to detecting phishing web sites, based on the TF-IDF information retrieval algorithm. We also discuss the design and evaluation of several heuristics we developed to reduce false positives. Our experiments show that CANTINA is good at detecting phishing sites, correctly labeling approximately 95% of phishing sites.
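For readers who haven't used TF-IDF before, here's a rough Python sketch of the idea. It is a minimal illustration, not the CANTINA implementation. It assumes, as we describe the approach in the paper, that the top few TF-IDF terms on a page form a "lexical signature" that is sent to a search engine, and that the page is flagged when its own domain does not show up in the results. The document-frequency table, the function names (`tf_idf`, `lexical_signature`, `looks_like_phishing`), and the default of five terms are assumptions made for this sketch. No real search API is called: the search results are passed in as a plain list of domains.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer: lowercase runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(page_text, doc_freq, num_docs):
    """Score every term on the page by TF-IDF.

    doc_freq (term -> number of corpus documents containing it) and
    num_docs (corpus size) are hypothetical stand-ins for a web-scale corpus.
    """
    counts = Counter(tokenize(page_text))
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        tf = count / total
        # Add 1 to the document frequency so unseen terms don't divide by zero.
        # Terms found in (nearly) every document get an IDF near or below zero.
        idf = math.log(num_docs / (1 + doc_freq.get(term, 0)))
        scores[term] = tf * idf
    return scores


def lexical_signature(page_text, doc_freq, num_docs, k=5):
    """Return the k highest-scoring terms (assumed k=5 by default)."""
    scores = tf_idf(page_text, doc_freq, num_docs)
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


def looks_like_phishing(page_domain, result_domains):
    # Hypothetical check: a page whose domain is missing from the search
    # results for its own lexical signature is treated as suspicious.
    return page_domain not in result_domains


if __name__ == "__main__":
    # Toy document frequencies over a pretend corpus of 1000 pages.
    df = {"to": 950, "your": 900, "in": 1000, "sign": 300,
          "account": 400, "password": 150, "paypal": 20, "security": 200}
    page = "Sign in to your PayPal account. PayPal password. PayPal security."
    print(lexical_signature(page, df, 1000, k=3))
    # prints ['paypal', 'password', 'security']
    print(looks_like_phishing("paypal.com", ["paypal.com", "en.wikipedia.org"]))
    # prints False
```

Rare, repeated words like "paypal" dominate the signature, while common words like "to" and "in" score near zero. This domain check also shows why a fake login page hosted on the very site it imitates can slip through: its domain legitimately appears in the results.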


Paper: PDF
Presentation: PPT

Comments

jas0nh0ng said…
We know of one failure mode for CANTINA so far: it doesn't work when the phishing content is hosted on the very site it spoofs, for example, a fake MySpace login page hosted on MySpace itself.

A Stanford student also suggested that CANTINA might not work as well for long-tail sites.
Anonymous said…
It sounds like CANTINA might have some problems with intranets too, since intranet content isn't indexed by Google.
