
The Invisible Web: An Overview

What is the invisible web?

Web search has become a part of daily life. Because they search so often, users come to expect that they can locate anything on the web with common search tools such as Google or Bing.

But consider: the web is massive. Nearly everything ever uploaded to it still resides there, and its size is hard to comprehend. On its surface we find the indexed web: the sites that, because they are linked and in demand, have been indexed by search engines.

Dig deeper, and with good research skills we will encounter the other 90%: the invisible, the cloaked, or the deep web. BrightPlanet, a company that specializes in searching the deep web, estimates that of the nearly 550 billion individual documents on the web, only about one billion reside on the surface where standard search engines can find them.

Search engines like Google and Bing use "spiders" or "robots" to crawl the open web. These programs are limited in what they can see and, therefore, index. The results returned to us are largely dictated by the tags and other "metadata" supplied by a page's creator, which allow search engines to find, rank, and push information to the top of the results.
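To picture how limited that process is, here is a minimal sketch of what a crawler does (in Python; the starting URL is hypothetical, and the third-party requests and beautifulsoup4 packages are assumed). It only ever discovers pages that some already-known page links to, and the "index" it builds comes from whatever titles and metadata the pages themselves declare.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def crawl(start_url, max_pages=50):
    to_visit = [start_url]   # the frontier: pages we already know about
    seen = set()
    index = {}               # url -> (title, description) pulled from the page itself

    while to_visit and len(seen) < max_pages:
        url = to_visit.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            page = requests.get(url, timeout=5)
        except requests.RequestException:
            continue          # unreachable pages are simply skipped
        soup = BeautifulSoup(page.text, "html.parser")

        # The "index" comes from what the page declares about itself:
        # its <title> and <meta name="description"> tags.
        title = soup.title.string if soup.title else url
        desc = soup.find("meta", attrs={"name": "description"})
        index[url] = (title, desc.get("content", "") if desc else "")

        # New pages are discovered only by following <a href> links, so
        # unlinked, blocked, or script-generated content never shows up.
        for link in soup.find_all("a", href=True):
            to_visit.append(urljoin(url, link["href"]))

    return index

# Example (hypothetical site): results = crawl("https://example.com/")
```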

But how do we access items archived in databases or other less accessible places, the content often referred to as the invisible web or deep web? By some estimates, Google and similar search engines have indexed just a little over 8% of the web. They can easily locate another 25% of what's published because it is presented in a static format (i.e., the content of the page doesn't change).

...something's missing.

  • content is unlinked. Files get uploaded for quick sharing or storage, and the access links are emailed rather than added to a web page. Web crawlers (robots, spiders) can't find them because crawlers move from page link to page link.
  • content is not optimized. Inexperienced or small-volume designers publish pages without adding the tags and metadata that make their sites more visible to search engines.

...something's hidden.

  • site settings (such as a robots.txt file) block search engines from crawling the linked content. The robot gets stopped at the front door (see the sketch after this list).
  • content is password protected.  Search engines will not find information hidden behind secure portals such as Moodle, Blackboard, etc.  
  • content is archived in paid subscription databases. Search engines may surface the title of the content, but the site will refuse you access without a subscription.
  • information stored on intranets is protected from retrieval by those outside the network on which it was created.  
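
As one concrete illustration of being "stopped at the front door," the short sketch below (Python standard library only; the example.com URLs are placeholders) shows how a well-behaved crawler reads a site's robots.txt file and skips anything the site has disallowed.

```python
from urllib import robotparser

# Read the site's robots.txt rules before crawling anything.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

for url in ("https://example.com/public/page.html",
            "https://example.com/private/records.html"):
    if rp.can_fetch("*", url):            # "*" = any crawler / user agent
        print("allowed, would crawl:", url)
    else:
        print("blocked by robots.txt, skipped:", url)
```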

...something's unrecognized.

  • information presented through JavaScript or Flash is activated by a script rather than a link, so the search engine doesn't recognize it as new content (see the sketch after this list).
  • files encoded in non-HTML formats are not recognized as content containers by search engines. Image, audio, animation, and even some PDF files, especially if the format is new, are often skipped.
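
To see why script-driven pages go unrecognized, here is a small sketch (the page content is made up for illustration): a crawler only parses the raw HTML it downloads, and if the links are created by JavaScript at view time, there is nothing in that HTML for it to follow.

```python
from bs4 import BeautifulSoup

# Raw HTML as a crawler would receive it: the visible link is created
# only when a browser runs the script, which a crawler never does.
raw_html = """
<html><body>
  <div id="results"></div>
  <script>
    document.getElementById('results').innerHTML =
      '<a href="/article/123">Deep-web article</a>';
  </script>
</body></html>
"""

soup = BeautifulSoup(raw_html, "html.parser")
print(soup.find_all("a", href=True))   # [] -- nothing for the crawler to follow
```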

...something's buried.

  • information in content-rich websites can get overlooked by search engines. Standard search engines only partially index large websites, such as the Library of Congress; the rest of the pages are intentionally skipped (to keep the search engine fast), and you're left to find them through site navigation. This content is sometimes referred to as the opaque web.

...something's unpredictable.

  • information generated through real-time interaction between a user and a website is personalized (think Amazon) and often gated (through privacy features). It is often impossible to replicate.
  • information posted on social media is generated and fed so rapidly that it easily gets buried on the web within days.

...something's lost.

  • some content is simply lost. Residual content, once linked, gets left behind on servers when a website is taken down. Sometimes the files become corrupted or obsolete. There may be fingerprints that let us know the content is there, but the information becomes inaccessible by any current means. This is often referred to as the dark web.

Sample Tools for Searching the Invisible Web

Google and other search engines CAN locate some information tucked deep into the web. Try searching for your topic along with the keyword "database." Use the tools below to search deeper and reach the valuable resources not visible to Google and other standard search engines.

  • Directory of Open Access Journals
    A searchable database of full-text, open access journals.
  • FindArticles
    Indexes articles from a variety of different publications.
  • Find Law
    Indexes information on legal issues organized by category.
  • Google Scholar
    Google's engine that specifically searches academic content on the web.
  • HighWire
    Provides access to the Stanford University databases of free, full-text, scholarly content.
  • MagPortal
    Search for free online magazine articles by topic.
  • OAIster (WorldCat)
    This online search engine, powered by WorldCat, searches libraries worldwide. Many of the items are archival copies of books and primary source documents, downloadable or viewable online.

Learn more with The Ultimate Guide to the Invisible Web from the Open Education Database.
 

What about Google Scholar?

Google has its own tool for helping you mine the invisible web. Google Scholar is a great tool and always worth your time. Still, its searches are limited: not all results will be available in full text, and not everything on the invisible web will surface. Use Google Scholar in tandem with other deep search tools to harvest more!



Databases

 


Badgerlink, a project of the Wisconsin Department of Public Instruction (DPI), seeks to promote information literacy and access for all Wisconsin residents in cooperation with the state's public, school, academic, and special libraries. You can search the databases individually, or you can search them all at one time with the Badgerlink SuperSearch.

CompletePlanet, provided as a public service by BrightPlanet, is a listing of more than 70,000 databases grouped by category. "By tracing through CompletePlanet's subject structure or searching Deep Web sites, you can go to various topic areas, such as energy or agriculture or food or medicine, and find rich content sites not accessible using conventional search engines."

Learn More

The staff writers at the Open Education Database have written an Ultimate Guide to the Invisible Web.  In it, you will learn:

  • Background of the Invisible Web
  • 9 Reasons a Web Page is Invisible
  • 10 Ways to Make Invisible Content Visible
  • How to Access and Search for Invisible Content
  • 15 Invisible Web Search Tools
