Google the Security Incident Detector

Wednesday, July 6, 2011 Posted by Corey Harrell
Search engines are not only great tools for locating information across the Internet; they can also alert organizations to potential security incidents. Others have already published methods for using search engines to locate web pages infected with SPAM links and sites with common vulnerabilities. Search engines can also help determine whether a company's data has been stolen. Google queries and alerts can be leveraged to help organizations notice security issues such as data leakage, website vulnerabilities, and stolen information. This post discusses an approach to using Google to search and monitor portions of the Internet for specific security issues.

Search Company’s Website for Security Issues

The term Google hacking refers to using a search engine, such as Google, to locate weaknesses on the Internet. Queries are built in a specific way to find sites containing software vulnerabilities, misconfigurations, or sensitive information. Organizations can use the same technique to identify security issues on their own websites. The specific issues to look for depend on the organization, but two possibilities are exposed sensitive information and infected web pages.

     Sensitive Information

The business dictionary defines sensitive information as any information that, if compromised, “could cause serious harm to the organization owning it”. Numerous types of data fit this definition, but three examples are personally identifiable information (PII), credit card information, and network information.

PII can uniquely identify or locate a single person and includes social security numbers, dates of birth, and addresses. A data breach from a few months ago illustrates the risk of PII being compromised. The personal information (names and social security numbers) of 300,000 people who applied for California workers' compensation benefits was mistakenly exposed online. As reported, the exposed PII was discovered last month after a data security company located the data through automated Google searching. Given the breaches reported in the media and the various data breach notification laws, it stands to reason that organizations should monitor their Internet-facing sites for exposed PII. The Google queries below may locate social security numbers, dates of birth, or contact information on a specific website.

ssn | “social security number” site:domain-name-here
dob | “date of birth” site:domain-name-here
“phone * * *” | “address *” | “e-mail” site:domain-name-here

The above queries contain a few symbols that need explanation. The pipe symbol ( | ) means “or”, so the query returns hits if either term is present. The quotes ( “” ) mean the string of words has to match exactly, while the asterisk ( * ) is a wildcard that can represent any unknown term. The site: operator restricts Google to searching only websites in the specified domain (the query would contain the organization’s domain instead of “domain-name-here”). For additional information on Google query syntax, check out Basic Search Help and More Search Help.
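If an organization keeps its watch terms in a list, the operators just described can be assembled programmatically instead of typed by hand. The Python sketch below is one way to do that; `build_query` is a hypothetical helper (not part of any Google API), and it assumes straight double quotes for phrases. It only joins terms with the pipe and appends site:, mirroring the first query above.

```python
def build_query(terms: list[str], domain: str) -> str:
    """Build a site-restricted Google query that ORs the given terms.

    build_query is a hypothetical helper for illustration only.
    """
    parts = []
    for term in terms:
        # Quote multi-word terms so Google matches the exact phrase;
        # a * inside the quotes still stands in for an unknown word.
        if " " in term:
            parts.append(f'"{term}"')
        else:
            parts.append(term)
    # The pipe means "or"; site: limits results to the organization's domain.
    return " | ".join(parts) + f" site:{domain}"


# Reproduces the first PII query for a placeholder domain.
print(build_query(["ssn", "social security number"], "example.com"))
# prints: ssn | "social security number" site:example.com
```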

The company Blippy exposed data containing credit card numbers to the Internet. A few months later, a company discovered that the credit card numbers of four Blippy users were in Google's index. In addition to PII, organizations could monitor their Internet-facing websites for data related to credit cards. The Google queries below may locate credit card information, and that information could include card numbers.

expiration | expdate | expire site:domain-name-here
CVV2 site:domain-name-here
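Pages these queries turn up still need triage, because long digit strings are often order or phone numbers rather than card numbers. Payment card numbers end in a Luhn check digit, so one way to flag the likely card numbers in a page's text is to run the Luhn check over each candidate. The sketch below assumes card numbers are 13 to 19 digits, optionally separated by spaces or hyphens; `luhn_valid` and `card_like_numbers` are hypothetical names, and passing the check only means a number is well formed, not that it belongs to a real, active card.

```python
import re

# Assumed card format: 13-19 digits, optionally split by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost (check) digit, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def card_like_numbers(text: str) -> list[str]:
    """Return candidate digit runs in `text` that pass the Luhn check."""
    return [m.group(0) for m in CANDIDATE.finditer(text) if luhn_valid(m.group(0))]


print(card_like_numbers("Card: 4111 1111 1111 1111, exp 12/13, order 1234567890123"))
# prints: ['4111 1111 1111 1111']
```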

Sosasta.com (a Groupon subsidiary) accidentally published a database containing the email addresses and plain-text passwords of 300,000 users, which was then indexed by Google. The accident was discovered after a security consultant located the exposed information through Google. Network information such as passwords, usernames, login pages, and error messages can help outside parties attack an organization. Companies can monitor their websites for leaked network information that may put their network security at risk. The Google queries below may locate login pages, usernames, passwords, and errors.

login | logon site:domain-name-here
username | userid | employee.ID | “your username is” site:domain-name-here
password | passcode | “your password is” site:domain-name-here
intitle:error site:domain-name-here
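Running every query in this section by hand for each domain gets tedious. A minimal sketch, assuming Python's standard `urllib.parse.urlencode` and Google's ordinary `https://www.google.com/search?q=` link format, turns the queries above into a checklist of search links for one domain. The query strings are copied from this post with straight quotes in place of typographic ones; `QUERIES` and `search_urls` are hypothetical names. The sketch only builds links and does not fetch any results.

```python
from urllib.parse import urlencode

# Queries from this post; "domain-name-here" is replaced for each organization.
QUERIES = {
    "pii": [
        'ssn | "social security number" site:domain-name-here',
        'dob | "date of birth" site:domain-name-here',
        '"phone * * *" | "address *" | "e-mail" site:domain-name-here',
    ],
    "credit_card": [
        "expiration | expdate | expire site:domain-name-here",
        "CVV2 site:domain-name-here",
    ],
    "network": [
        "login | logon site:domain-name-here",
        'username | userid | employee.ID | "your username is" site:domain-name-here',
        'password | passcode | "your password is" site:domain-name-here',
        "intitle:error site:domain-name-here",
    ],
}


def search_urls(domain: str) -> list[tuple[str, str]]:
    """Return (category, Google search link) pairs for every query."""
    urls = []
    for category, queries in QUERIES.items():
        for template in queries:
            query = template.replace("domain-name-here", domain)
            # urlencode percent-encodes the quotes, pipes, and colons.
            urls.append((category, "https://www.google.com/search?" + urlencode({"q": query})))
    return urls


for category, url in search_urls("example.com"):
    print(category, url)
```

Opening or bookmarking these links is the manual version of the monitoring that Google Alerts can automate.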

     Infected Web Pages

The University of Calgary’s website was compromised, and the attackers used it to help sell pharmacy products. The Sucuri Research blog performed a Google search against the university’s website and identified more than two thousand infected web pages. The compromise illustrates a point made by Unmask Parasites: “to make their doorway pages rank better in search engines, spammers search for compromised web sites and use various security holes to insert hundreds of hidden spam links into trusted web pages”. Companies should add infected web pages to the list of things to monitor on their websites.

Google queries can identify infected web pages. The Unmask Parasites blog has a list of queries that can serve as a starting point for searching for SPAM links. Further terms can be identified by using the blog’s Find Infected Pages with Google to locate infected web pages on the Internet; the portion of an infected web page displayed by Google can reveal other terms to use in a SPAM link query. The picture below shows an infected web page with the search terms used highlighted in bold.
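As new SPAM terms turn up in Google's result snippets, they can be folded into the existing query. The sketch below merges a seed list with newly spotted terms, drops case-insensitive duplicates, and rebuilds a site-restricted OR query. The terms shown are hypothetical placeholders, not the Unmask Parasites list, and `spam_query` is a hypothetical name.

```python
def spam_query(seed_terms: list[str], new_terms: list[str], domain: str) -> str:
    """Merge SPAM terms (first occurrence wins, case-insensitive) into one site: query."""
    seen = set()
    parts = []
    for term in seed_terms + new_terms:
        key = term.lower()
        if key in seen:
            continue  # already covered by an earlier term
        seen.add(key)
        # Quote phrases so Google matches them exactly.
        parts.append(f'"{term}"' if " " in term else term)
    return " | ".join(parts) + f" site:{domain}"


# Hypothetical terms: a seed list plus words noticed in search-result snippets.
print(spam_query(["viagra", "cialis"], ["pharmacy", "Viagra", "online pharmacy"], "example.edu"))
# prints: viagra | cialis | pharmacy | "online pharmacy" site:example.edu
```

Google limits how long a single query can be, so a long term list may need to be split across several queries.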

Search Specific Websites for Stolen Information

The previous Google queries can help organizations identify sensitive information and infected web pages on their own websites. However, the queries won’t alert an organization to a compromise that results in company information being stolen. A Naked Security article reported how the Atlanta Infragard chapter was compromised and the attackers “published 180 usernames, hashed passwords, plain text passwords, real names and email addresses”. How can a company be confident that none of its employees’ information was compromised? Asking the same question about each of the publicized data breaches over the past year makes it even harder for a company to know whether it is at risk. Google searches can help by querying the websites where stolen information is published.

One website with stolen information is Pastebin.com. Lenny Zeltser wrote a great article, The Use of Pastebin for Sharing Stolen Data, explaining what Pastebin is and why hackers use the site to share stolen information such as network configuration details and authentication records. A brief review of Pastebin’s Trending Pages web page shows a range of information, from compromised credentials to identified vulnerabilities in websites. Organizations can search Pastebin.com to determine whether their network is at risk because of stolen information. The Google query to accomplish this is:

site:pastebin.com +domain-name-here

The plus symbol ( + ) attached to the domain name makes Google match the domain exactly as it is typed. Pastebin is one example of a website to search, but other sites, such as forums, should be queried as well. A few other potential websites to search are mentioned in Lenny’s post Using Pastebin Sites for Pen Testing Reconnaissance.
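Once a search turns up a paste that mentions the organization, the next question is whose information it contains. A minimal sketch, assuming the paste's text has already been saved locally, pulls out every address at the organization's domain so the affected accounts can be reviewed. The pattern is deliberately simple (it ignores subdomains and unusual address formats), and `employee_emails` is a hypothetical name.

```python
import re


def employee_emails(paste_text: str, domain: str) -> list[str]:
    """Return unique email addresses at `domain` found in a paste, lowercased and sorted."""
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@"
        + re.escape(domain)
        # Reject look-alikes such as user@example.com.evil.net or example.com-mail.
        + r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",
        re.IGNORECASE,
    )
    return sorted({match.group(0).lower() for match in pattern.finditer(paste_text)})


# Hypothetical paste contents in a username:password style dump.
sample = """jdoe@example.com:hunter2
asmith@Example.com,Summer2011
x@example.com.evil.net
bob@other.org"""
print(employee_emails(sample, "example.com"))
# prints: ['asmith@example.com', 'jdoe@example.com']
```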

Automate Searching with Google Alerts

The previous Google queries will identify sensitive information, infected web pages, and stolen information currently in Google’s index or cache. To continuously monitor the Internet for this type of information, an organization would need to rerun the queries periodically to see whether new information was added to Google’s index. Google Alerts sends email updates of the latest Google results for a specified query, and the alerts can help organizations with continuous monitoring. All of the previous queries can be configured as alerts, and the setup process is fairly simple, as can be seen in the screenshot below.

There are five required fields in setting up an alert.

* Search term: is where the query is placed
* Type: specify everything, news, blogs, realtime, video, or discussions websites
* How often: indicates the frequency of the email updates and can be set to as it happens, once a day, or once a week
* Volume: will show only the best results or all results
* Your email: the email address where the latest relevant Google results are sent
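An alert does the “what's new since last time” comparison automatically. For queries rerun by hand, the same idea can be sketched by remembering which result URLs have already been seen for each query and reporting only the new ones. This is an illustrative sketch, not how Google Alerts works internally; `new_results`, `load_state`, `save_state`, and the JSON state file are hypothetical, and how the current URLs are collected is left open.

```python
import json
from pathlib import Path


def new_results(state: dict[str, list[str]], query: str, current_urls: list[str]) -> list[str]:
    """Return URLs not seen before for `query` and record them in `state`."""
    seen = set(state.get(query, []))
    # dict.fromkeys drops duplicates in today's list while keeping their order.
    fresh = [url for url in dict.fromkeys(current_urls) if url not in seen]
    state[query] = sorted(seen | set(current_urls))
    return fresh


def load_state(path: Path) -> dict[str, list[str]]:
    """Load previously seen URLs; start empty on the first run."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_state(path: Path, state: dict[str, list[str]]) -> None:
    """Persist seen URLs so the next run only reports new results."""
    path.write_text(json.dumps(state, indent=2))
```

A typical run would load the state, call `new_results` once per query with that day's result URLs, report anything returned, and save the state again.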

Summary

Google queries show the information currently in Google’s index and cache, while Google Alerts sends email notifications when Google returns new information. Together, queries and alerts can be leveraged by organizations to identify security issues such as data leakage, website vulnerabilities, and stolen information. Most of the data breaches referenced had two things in common. First, sensitive company information was exposed to the Internet. Second, the companies were notified about the data leakage after a third party located the information through Google searches. Using Google to search and monitor portions of the Internet won’t prevent security issues from occurring in the first place. However, it may reduce the time that elapses before an organization learns about a security issue.

I hope at least a few people or organizations find this post helpful. I hadn’t planned to write about the leakage of sensitive information (I was actually working on my next post, Examination of a Phishing Email), but I wanted to inform others about the risk of leaked information.


References

Some of the queries I mentioned were obtained from the book Google Hacking for Penetration Testers and the Google Hacking Database.
  1. Nice job. Reminds me of the time I did a search about a company I had applied at and I googled them. I was reading all kinds of info about them, finding it very interesting, and suddenly realized it was confidential and proprietary. I looked at the pages and found I was on their Intranet. OOPS. I got out of there fast.

    Somehow, I found a link that bypassed authentication, and once I was in, I was in. I'll bet I jumped around and read for 15 minutes before I realized where I was. Crazy.

    Hey, I've never seen any numbers or capital letters in your comment captcha. Is that on purpose?
