Almost every website on the internet, including phishing sites, has at least one favicon associated with it. In this blog, we will look at how we can use favicons to discover phishing and brand impersonation websites.
A favicon (short for favorite icon) is the icon associated with a website or web page. Most desktop browsers show the favicon next to the page title in the tab bar, as well as in shortcuts, bookmarks, and history. Favicons give users a quick visual way to spot a website by its brand icon when scanning browser history, bookmarks, or a row of open tabs.
For most websites, the favicon is served at the /favicon.ico path (for example, example.com/favicon.ico). Other pages declare the favicon URL in the head of the HTML document, in a link element whose rel attribute is icon or shortcut icon. Favicon files are typically .ico, .png, or .svg.
The PayPal website’s favicon can be found at https://www.paypal.com/favicon.ico
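The two places a favicon can live, as described above, can be sketched in Python. This is a minimal illustration using only the standard library. The names FaviconLinkParser and find_favicon_urls are hypothetical, and the sketch assumes the page HTML has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FaviconLinkParser(HTMLParser):
    """Collects href values of <link> tags whose rel attribute contains the token 'icon'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.icon_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel="icon" and rel="shortcut icon" both contain the token "icon".
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "icon" in rel_tokens and attr_map.get("href"):
            # Resolve relative hrefs against the page URL.
            self.icon_urls.append(urljoin(self.base_url, attr_map["href"]))


def find_favicon_urls(html, base_url):
    """Return declared favicon URLs, or the conventional /favicon.ico path if none are declared."""
    parser = FaviconLinkParser(base_url)
    parser.feed(html)
    return parser.icon_urls or [urljoin(base_url, "/favicon.ico")]
```

Calling find_favicon_urls(page_html, "https://example.com/login") returns absolute URLs, so the result can be passed straight to an HTTP client.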
Favicons in Phishing Kits
Instead of creating an entire scam/phishing webpage from scratch, some phishing kit developers clone the brand’s official website and use it as a base template. In the cloned page, input fields and submit form functionality are modified to send data to an endpoint that the malicious actors control.
Sometimes this laziness on the attacker’s part leaves a set of phishing sites and phishing kits with exactly the same favicon as the original brand’s website. If we can compare favicons across websites, we should be able to identify the ones that reuse a brand’s favicon unchanged.
Favicons found in popular phishing kits can also be monitored. The same phishing kits get traded and sold on the underground markets multiple times.
Instead of comparing the favicon image with other favicon images, a better and more efficient approach is to calculate the hash of the favicon and compare it with the favicon hashes of other websites.
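As a rough sketch of the hash-comparison idea, the snippet below hashes each favicon once and compares the digests. MD5 is used here only because it ships with the standard library; the function names and the observed mapping (hostname to favicon bytes) are assumptions made for illustration.

```python
import hashlib


def favicon_digest(icon_bytes):
    """Return a hex MD5 digest of the raw favicon bytes (any stable hash would work)."""
    return hashlib.md5(icon_bytes).hexdigest()


def sites_sharing_favicon(brand_icon, observed):
    """Return the hostnames in `observed` whose favicon hashes to the brand's digest.

    `observed` is a hypothetical mapping of hostname -> favicon bytes.
    """
    target = favicon_digest(brand_icon)
    return sorted(
        host for host, icon in observed.items() if favicon_digest(icon) == target
    )
```

An exact hash only matches byte-identical files, so a favicon that has been resized or re-encoded will not match with this approach.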
Internet-Connected Device Search Engines
Search engines for internet-connected devices, such as Shodan, ZoomEye, and Censys, scan the internet regularly and save data about each IP address, including its open ports and the services running on them.
These platforms also take a snapshot of responses, headers, page DOM, favicons, and much more for every result that uses HTTP.
Shodan indexes favicons by their MurmurHash3 (mmh3) hash. ZoomEye supports both mmh3 and MD5 favicon hashes.
A favicon hash can be queried on these search engines to find servers hosting websites with the same favicon.
Shodan
The mmh3 hash of any brand’s favicon can be used to query Shodan’s collected data.
The following Python 3 snippet generates the mmh3 hash of a favicon.
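import base64
import mmh3
import requests

res = requests.get('https://www.paypal.com/favicon.ico', timeout=10)
res.raise_for_status()  # stop early on HTTP errors instead of hashing an error page
# Hash the base64-encoded bytes (encodebytes inserts line breaks), not the raw file.
favicon = base64.encodebytes(res.content)
mmh3_hash = mmh3.hash(favicon)
print(mmh3_hash)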
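To see what that hash value is made of, here is a dependency-free sketch of 32-bit MurmurHash3 applied to the base64-encoded favicon, mirroring the snippet above. The function names are hypothetical, and it follows the pipeline shown in this post rather than any documented Shodan internals. In practice the mmh3 package is the simpler choice.

```python
import base64


def _rotl32(value, shift):
    """Rotate a 32-bit integer left by `shift` bits."""
    return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF


def murmur3_32(data, seed=0):
    """MurmurHash3 (x86, 32-bit), returned as a signed integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    full = len(data) - (len(data) % 4)
    # Mix in each complete 4-byte little-endian block.
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Mix in the remaining 1 to 3 bytes, if any.
    tail = data[full:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    # Convert to a signed 32-bit value.
    return h - 0x100000000 if h & 0x80000000 else h


def shodan_style_favicon_hash(icon_bytes):
    """Hash the base64 text of the icon, including the line breaks encodebytes adds."""
    return murmur3_32(base64.encodebytes(icon_bytes))
```

The line breaks matter: base64.encodebytes inserts a newline after every 76 output characters, so hashing the output of base64.b64encode instead would produce a different value.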
The Shodan query below finds servers hosting websites that use PayPal’s favicon.
http.favicon.hash:309020573
Searching for PayPal’s favicon hash returns 194 servers hosting websites that use the PayPal favicon. Not all of these are phishing or scam pages; some are legitimate PayPal-owned sites.
We can narrow the results by excluding servers that present a paypal.com SSL certificate and servers that are part of PayPal’s network.
http.favicon.hash:309020573 -ssl:*paypal.com -org:"PayPal Inc."
Now we get only 76 results, and some of these are PayPal phishing pages.
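When monitoring several brands, it can help to assemble these queries programmatically. The helper below is a small sketch that only builds the query string in the form shown above. The function name and its parameters are hypothetical, and it does not call the Shodan API.

```python
def build_shodan_favicon_query(favicon_hash, exclude_ssl=(), exclude_orgs=()):
    """Build a Shodan favicon query, excluding known-legitimate certificates and orgs."""
    parts = [f"http.favicon.hash:{favicon_hash}"]
    # Each exclusion is a negated filter, e.g. -ssl:*paypal.com
    parts += [f"-ssl:{pattern}" for pattern in exclude_ssl]
    parts += [f'-org:"{org}"' for org in exclude_orgs]
    return " ".join(parts)


query = build_shodan_favicon_query(
    309020573,
    exclude_ssl=["*paypal.com"],
    exclude_orgs=["PayPal Inc."],
)
print(query)
```

With the PayPal values from this post, the helper reproduces the filtered query shown earlier.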
Historical Trend for Favicon Usage
Shodan offers a historical visualization of the number of results for any given query. It can be used to look for trends for a favicon over a time range. This might be helpful in uncovering past phishing campaigns.
ZoomEye
Like Shodan, ZoomEye is a search engine for internet-connected devices. For favicon searches, ZoomEye tends to produce more interesting results than Shodan.
Searching ZoomEye for the same PayPal favicon hash returns over 1311 servers. These results can be further analyzed and filtered to identify phishing and scam pages.
Query : iconhash:"309020573"
We can also take favicons from ongoing phishing and scam pages to find other web servers hosting the same phishing kits, possibly operated by the same actors.
Limitations
- Favicon search results include both legitimate brand websites and possible phishing and scam websites. Separating legitimate websites from phishing ones is where Bolster’s automated phishing detection AI excels. [Check Out Our Free Community Scanner CheckPhish]
- Diversity of the favicon data set. The favicon on a brand’s website today may differ from the one used a few months ago if the design has changed, and not all phishing pages use the official or up-to-date brand favicon. Building a historical collection of a brand’s favicons using the Internet Archive’s Wayback Machine, and collecting the favicons used in popular phishing kits, helps discover more such websites.
- Phishing pages on shared or free hosting platforms such as 000webhost and Firebase Hosting can’t be detected, since those sites can’t be accessed directly by IP address and are therefore not indexed by these search engines.
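For the historical collection mentioned in the second limitation, one possible starting point is the Wayback Machine's CDX listing of archived captures. The sketch below only builds the request URLs. The function names are hypothetical, and the parameter choices (for example collapsing on digest to keep one capture per distinct file) are assumptions to check against the Internet Archive's own documentation.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def wayback_cdx_url(favicon_url):
    """Build a CDX query URL listing archived captures of a favicon."""
    params = {
        "url": favicon_url,
        "output": "json",
        "fl": "timestamp,digest",    # only the fields needed to fetch and dedupe
        "filter": "statuscode:200",  # skip redirects and errors
        "collapse": "digest",        # one row per distinct file content
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def wayback_raw_snapshot_url(timestamp, favicon_url):
    """Build the URL of the archived file itself (the id_ flag requests unmodified content)."""
    return f"https://web.archive.org/web/{timestamp}id_/{favicon_url}"
```

Each archived favicon fetched this way can then be hashed and searched in the same way as the current one.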
About Us
This blog is published by Bolster Research Labs. We are also the creators of https://checkphish.ai, a free URL scanner that detects phishing and scam sites in real time.
If you are interested in advanced research, uncovering new scams, or working with cutting-edge AI, come work with us at Bolster Research Labs. Check out open positions here.