We may think everything we want to know is at our fingertips, but you might be surprised to learn that the 3.2 trillion search queries Google handled in 2016 reached only a fraction of what is actually available online. There is a great deal of online information that search engines can’t reach. These hidden pages sit in an area of the internet called the deep web, and finding them takes special tools. The deep web may hold up to 5,000 times as much information as a typical Google search can surface.
Interestingly, even when you are on a particular website, some of its pages may still be invisible to you, because they don’t appear in the menu or anywhere in the navigation.
Security aside, you might wonder why some information on the internet isn’t readily available. If these pages aren’t meant to be found by searching, why do they exist? Why would a website have hidden pages at all?
Types of Hidden Content You Would Want to Find
Some hidden content is known as dynamic content: it appears only when you submit a specific request to a database-driven website. Search engines like Google aren’t designed to crawl or store the information held in these databases. To find these pages, you have to go to the website and search for the specific information you want, or use Bright Planet, a database-oriented search service.
Other pages have no links connecting them to searchable sources, so they remain hidden. Sites under development, with lots of temporary resources, can fall into this category as well.
Another category of hidden pages requires log-in credentials to view. For example, a web designer may place individual pages, or even whole sections of a site, behind a log-in so that search engines can’t reach them. Usually, you will have to create an account to access those pages.
Ways to Find Hidden Pages on a Website
Many hidden pages contain information worth seeing. Here are a few ways to find hidden pages on a website.
Use Robots.txt Files
To understand how robots.txt files can help you find hidden pages, you need to know how search engines find pages in the first place.
Search engines crawl the pages on a site and index them so they can be shown in response to search queries. When a website’s designer or owner wants to keep pages out of this process, they list the paths of those pages in a plain-text file named robots.txt, which is stored at the root of the site.
To find the hidden pages on a site:
- Type [domain name]/robots.txt into your browser’s address bar, replacing [domain name] with the site’s address.
- Press enter.
Lines that begin with ‘Disallow:’ list the parts of the site that search engines have been asked not to crawl, so those pages won’t turn up in search results.
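If you check robots.txt files often, the steps above can be scripted. The Python sketch below is one way to do it, not a standard tool: `fetch_robots_txt` and `disallowed_paths` are hypothetical helper names, and the parser simply collects every Disallow path regardless of which user-agent group it belongs to. A Disallow line is only a request to crawlers, so it doesn’t guarantee that a page actually exists at that path.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the path on every 'Disallow:' line, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Strip comments ('#' to end of line) and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # a bare "Disallow:" blocks nothing, so skip it
                paths.append(value)
    return paths


def fetch_robots_txt(site: str) -> str:
    """Download robots.txt from the root of `site`, e.g. 'https://example.com'."""
    with urlopen(urljoin(site, "/robots.txt"), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /private/\nDisallow: /drafts/  # unfinished\nAllow: /public/\nDisallow:\n"
    print(disallowed_paths(sample))  # ['/private/', '/drafts/']
```

For a live site you would call `disallowed_paths(fetch_robots_txt("https://example.com"))`. Some paths contain `*` or `$` wildcards, which you still need to interpret by hand.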
Hack the Website
You’ve probably seen a ‘tech genius’ on television hack a website quickly and easily. Fortunately, you don’t have to be a tech genius to ‘hack’ a website for its hidden pages.
Type the web addresses of specific pages and folders directly into your browser. If you don’t know an address, look for predictable patterns in the addresses of the site’s other pages. You can use this method to find entire folders.
For example, if a page lives at example.com/content/page1.html, ‘content’ is the folder name, so you may be able to view the whole folder by typing example.com/content. This works only if access to the folder hasn’t been disabled.
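The guessing itself can be automated. This sketch assumes a simple folder-and-numbered-page scheme like the example above and generates candidate addresses: the parent folders of a known page and the next few numbered pages. The function names are hypothetical, and the code only builds URLs; you still have to open each one to see whether it exists.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def parent_folders(url: str) -> list[str]:
    """List each folder above a page URL, deepest first (site root excluded)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    folders = []
    # Skip the last segment (the page itself) and walk up one level at a time.
    for depth in range(len(segments) - 1, 0, -1):
        path = "/" + "/".join(segments[:depth]) + "/"
        folders.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return folders


def numbered_siblings(url: str, count: int = 3) -> list[str]:
    """Guess the next `count` pages by incrementing the last number in the path."""
    parts = urlsplit(url)
    # The final run of digits, optionally followed by one file extension.
    match = re.search(r"(\d+)(\.\w+)?$", parts.path)
    if not match:
        return []
    digits, suffix = match.group(1), match.group(2) or ""
    prefix = parts.path[: match.start(1)]
    siblings = []
    for n in range(int(digits) + 1, int(digits) + 1 + count):
        # zfill keeps zero-padding, so page01 becomes page02 rather than page2.
        path = f"{prefix}{str(n).zfill(len(digits))}{suffix}"
        siblings.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return siblings


if __name__ == "__main__":
    page = "https://example.com/content/page1.html"
    print(parent_folders(page))     # ['https://example.com/content/']
    print(numbered_siblings(page))  # page2.html, page3.html and page4.html in the same folder
```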
Find Them Manually
If you’re the website owner, you can check whether a page is accessible by copying another of your URLs into your browser and editing it to point at that page. If the page you’re looking for doesn’t show up, it is hidden.
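To check many addresses at once instead of pasting them one by one, a short script can request each URL and report its HTTP status code. This is a sketch using Python’s standard library; `check_url` and `describe_status` are hypothetical names, and the mapping from codes to descriptions is a simplification: a 404 usually means the page doesn’t exist, while a 401 or 403 usually means it exists but needs a log-in or permission.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def describe_status(code: int) -> str:
    """Translate an HTTP status code into what a visitor would experience."""
    if 200 <= code < 300:
        return "accessible"
    if code in (401, 403):
        return "restricted (log-in or permission required)"
    if code in (404, 410):
        return "not found"
    return f"other (HTTP {code})"


def check_url(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Request `url` and return its status code plus a short description.

    urlopen follows redirects, so the code is the final page's status.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            code = response.status
    except HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        code = err.code
    except URLError:  # DNS failure, refused connection and similar
        return 0, "unreachable"
    return code, describe_status(code)
```

For example, `check_url("https://example.com/members/")` returns a pair such as a status code and "restricted (log-in or permission required)", depending on how the server responds.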
If you don’t know which pages may be hidden, organize your site into directories. You can then type domain-name/folder-name into your browser to reach the pages and sub-directories in each folder. Once you find hidden pages, add them to your sitemap and request a crawl.
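If you maintain your sitemap by hand, adding the pages you found can look like the sketch below. It writes the minimal XML format from the sitemaps.org protocol (a `urlset` of `url`/`loc` entries); `build_sitemap` is a hypothetical helper, and optional fields such as `lastmod` are left out. How you then request a crawl depends on each search engine’s own webmaster tools.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemap.xml document listing each URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(urls):  # keeps first-seen order, drops duplicates
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and <
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(["https://example.com/", "https://example.com/hidden-page.html"]))
```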
Finding Hidden Pages on Your Own Website
As the site owner, you need to be able to locate all of your site’s pages. Here are a few ways for you to do it.
Using a Log
You can refer to a log to see the pages on your site. Your server logs every visit: which pages were requested, from which IP address, and when, which lets you estimate how long visitors stay on each page. With this log, which you can get from your hosting provider or from the ‘raw log files’ when you log into cPanel, you can track your site’s activity. Pages that are never visited, or that have the highest drop-off rates, may be hidden or dead-end pages.
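As a sketch of how a raw log can be put to work, the script below counts successful page requests per path and lists pages from your own list that never show up. It assumes the log uses the Common or Combined Log Format, the usual format for Apache-style access logs, and that you supply the list of known pages yourself (for example, from your sitemap), since a page nobody visits never appears in the log at all. The helper names are hypothetical, and the sketch doesn’t attempt drop-off rates, which need per-visitor session tracking.

```python
import re
from collections import Counter

# Start of a Common/Combined Log Format line, for example:
# 203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /about.html HTTP/1.1" 200 2326
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def page_hits(log_lines) -> Counter:
    """Count successful GET requests per page path (query strings removed)."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match["method"] == "GET" and match["status"].startswith("2"):
            hits[match["path"].split("?", 1)[0]] += 1
    return hits


def never_visited(known_pages, hits: Counter) -> list[str]:
    """Pages from your own list that never appear in the log."""
    return sorted(page for page in known_pages if hits[page] == 0)


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /about.html HTTP/1.1" 200 2326',
        '203.0.113.8 - - [10/Oct/2023:13:56:01 +0000] "GET /?ref=ad HTTP/1.1" 200 5120',
        '203.0.113.9 - - [10/Oct/2023:13:57:12 +0000] "GET /old.html HTTP/1.1" 404 512',
    ]
    hits = page_hits(sample)
    print(never_visited(["/", "/about.html", "/contact.html"], hits))  # ['/contact.html']
```

With a real log, you would open the downloaded file and pass the file object to `page_hits`, since it reads any iterable of lines.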
If your site runs on a content management system (CMS) and your sitemap doesn’t include all of your links, your CMS can generate a complete list for you, often with the help of a plugin.
Using a Sitemap File
Whether you already have a sitemap or create one with a sitemap generator, you can use it to find your site’s pages. To use a generator, just enter your domain name and it will create the sitemap for you.
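Once you have a sitemap, a few lines of code can pull out every URL in it, ready to compare with other lists such as the pages that appear in your logs. This sketch assumes a standard sitemaps.org XML file; `sitemap_urls` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, in ElementTree's {uri} form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value in a sitemap, in document order.

    For a sitemap index file, these are the URLs of the child sitemaps,
    which you would fetch and pass to this function in turn.
    """
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about.html</loc></url>
</urlset>"""
    print(sitemap_urls(sample))  # ['https://example.com/', 'https://example.com/about.html']
```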
Using Google Analytics
Follow these steps:
- Log in to Google Analytics.
- Click ‘Behavior’ and then ‘Site Content.’
- Go to ‘All Pages.’
- Scroll to the bottom and choose ‘Show rows’ on the right.
- Choose 500 or 1000, depending on how many pages you think your site has.
- At the top right, choose ‘Export.’
- Choose to export as .xlsx (Excel).
- Once the file is exported, open ‘Dataset 1.’
- Sort by ‘Unique Pageviews.’
- Delete every row and column except the one with your page URLs.
- In the second column (cell B1), enter =CONCATENATE("http://domain.com",A1).
- Replace domain.com with your domain, then drag the formula down to the other cells. This gives you all of your full URLs. Follow the next step to turn them into hyperlinks, so they’re easier to open.
- In the third column (cell C1), enter =HYPERLINK(B1) and drag it down to the other cells.
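If you’d rather skip the spreadsheet formulas, the concatenate and hyperlink steps can be scripted instead. This sketch assumes you exported the report as CSV rather than .xlsx and trimmed it to a header row plus data, with page paths in a column named "Page"; that column name and the helper names are assumptions, so adjust them to match your file.

```python
import csv
import io
from html import escape
from urllib.parse import urljoin


def full_urls(export_csv: str, domain: str, page_column: str = "Page") -> list[str]:
    """Prefix each page path with your domain (the CONCATENATE step)."""
    urls = []
    for row in csv.DictReader(io.StringIO(export_csv)):
        path = (row.get(page_column) or "").strip()
        if path.startswith("/"):  # skip blank cells and rows that aren't paths
            urls.append(urljoin(domain, path))
    return urls


def as_html_links(urls: list[str]) -> str:
    """Render the URLs as a clickable HTML list (the HYPERLINK step)."""
    items = "\n".join(f'  <li><a href="{escape(u)}">{escape(u)}</a></li>' for u in urls)
    return f"<ul>\n{items}\n</ul>"


if __name__ == "__main__":
    export = "Page,Unique Pageviews\n/,120\n/about.html,45\n/old-promo.html,1\n"
    urls = full_urls(export, "http://domain.com")
    print(urls)  # ['http://domain.com/', 'http://domain.com/about.html', 'http://domain.com/old-promo.html']
    print(as_html_links(urls))
```

Saving the HTML output to a file and opening it in a browser gives you the same clickable list as the HYPERLINK column.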
Finding Hidden Pages is Essential to Eliminating Orphan Pages and Dead Ends
Orphan pages are pages that no other page on your site links to, and dead ends are pages that don’t lead anywhere. Finding all the hidden pages on your website is vital to eliminating both, and the methods above will help you track them down. Left in place, these pages can cost you vital business.
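Once you know which pages exist and which internal links each one contains, spotting both problems is a simple check on the site’s link graph. This sketch assumes you already have that map, whether from a crawler, your CMS, or by hand; `find_orphans_and_dead_ends` is a hypothetical name. It never counts the home page as an orphan, since visitors arrive there directly, and treats any page with no outgoing links to other pages as a dead end.

```python
def find_orphans_and_dead_ends(
    links: dict[str, set[str]], home: str = "/"
) -> tuple[list[str], list[str]]:
    """Split a site's pages into orphans and dead ends.

    `links` maps each page to the set of internal pages it links to.
    Pages that only appear as link targets count as having no outgoing links.
    """
    pages = set(links)
    for targets in links.values():
        pages |= targets

    # Pages that some *other* page links to (self-links don't count).
    linked_to = set()
    for source, targets in links.items():
        linked_to |= targets - {source}

    orphans = sorted(p for p in pages if p not in linked_to and p != home)
    dead_ends = sorted(p for p in pages if not (links.get(p, set()) - {p}))
    return orphans, dead_ends


if __name__ == "__main__":
    site = {
        "/": {"/about", "/blog"},
        "/about": {"/"},
        "/blog": {"/blog/post-1"},
        "/blog/post-1": set(),
        "/old-promo": {"/"},
    }
    orphans, dead_ends = find_orphans_and_dead_ends(site)
    print(orphans)    # ['/old-promo']
    print(dead_ends)  # ['/blog/post-1']
```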