A Semi-Automated Semantic Web?

I read a paper today that discussed a step covered in class today on document engineering.  We learned how the Berkeley Calendar Network team manually harvested and consolidated tables of terms in a huge Excel spreadsheet.  The IEEE paper I read argues for semi-automating this process (deriving homonyms in IS-A relationships, for instance) and integrating the terms with the WordNet ontology (http://wordnet.princeton.edu).

The buzzword-laden paper goes on to argue for creating a working Semantic Web by harnessing the large quantity of structured “Deep Web” data. The “Deep Web” (unindexed by conventional search engines) contains over 4 orders of magnitude more data than the “Surface Web,” and some of that data is structured in databases.  The Semantic Web, they claim, has been hampered by the difficulty of manually creating large OWL and RDF ontologies, and harvesting the richer potential of the Deep Web points to a possible solution:
Semantic Web + Deep Web-Ontology-aware browser.

Ironically, the paper itself is buried in the Deep Web:
http://ieeexplore.ieee.org/iel5/2/4623205/04623231.pdf?tp=&arnumber=4623231&isnumber=4623205


New Research Engine Searches “Deep Web”

How much of the World Wide Web is actually indexed… 27.65 billion pages? Maybe about 0.2% of the total content? The “Deep Web” (web documents not immediately accessible by direct hyperlink from public pages) may contain something like 91,000 terabytes of data… as compared to an estimated 167 terabytes of Surface Web data.
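
As a rough sanity check on the "about 0.2%" figure, the two size estimates quoted above can be compared directly. This is only back-of-the-envelope arithmetic on those two numbers; the variable names are mine, and treating "total content" as simply Deep Web plus Surface Web is an assumption.

```python
# Size estimates quoted in the post, in terabytes.
deep_web_tb = 91_000
surface_web_tb = 167

# Assumption: "total content" means Deep Web plus Surface Web data.
total_tb = deep_web_tb + surface_web_tb

# Share of the total that sits on the Surface Web, as a percentage.
surface_share_pct = 100 * surface_web_tb / total_tb

# How many times larger the Deep Web estimate is than the Surface Web estimate.
deep_to_surface_ratio = deep_web_tb / surface_web_tb

print(f"Surface Web share of total: {surface_share_pct:.2f}%")
print(f"Deep Web is roughly {deep_to_surface_ratio:.0f}x the Surface Web")
```

By these estimates the Surface Web comes out at a little under 0.2% of the total, which is consistent with the figure above.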

A new service called Infovell hopes to help users find more of this “Deep Web” data… yet unlike Google and other Surface Web engines, it won’t be ad-supported. Instead, the service will be subscription-based. Read more at the ReadWriteWeb blog. Here is an excerpt from the article:

The engine scours through open-access repositories of information like PubMed Central and the U.S. Patent and Trademark Office Claims, but it also allows access to scholarly journals such as those from Oxford University Press, SAGE, Taylor & Francis, Annual Reviews, Mary Ann Liebert Publications, and more. The culmination of these billions of pages currently unindexed by other engines, gives you access to content in the areas of Life Sciences, Medicines, Patents, Industry News, and other reference content from expert sources.
