Pittsburgh Supercomputing Center
Finding Without Searching with David Woolls
Wednesday, September 04, 2013, 02:00pm - 03:00pm

We invite you to join us for this seminar given by David Woolls, CEO, CFL Software Limited. 

Please note that the room has been changed. The seminar will now be in room 103, 300 S Craig St.

 Watch the broadcast here.

Abstract: When searching large collections of long, unstructured text documents, knowing what to ask a conventional search engine is not always, or even usually, possible. This project explores the potential of using entire documents as a starting point for finding the most relevant companion documents in a collection. We will describe how linguistic principles and the Semantic Web RDF format can serve as a common framework for heterogeneous text types: contracts, patents, e-mails, academic articles, etc. This methodology generates a very large number of queries per document, so finding the most relevant documents becomes a graph analytics problem because of the number of potential links. The project seeks to determine the time and resource parameters required to deliver accurate and timely results using YarcData's Urika, known as Sherlock at PSC, which is specifically designed to tackle such complex graph analytics using shared-memory, multithreaded technology. The aim is to uncover commercial, government, and academic applications, using large data collections such as Wikipedia and US patents as a starting point.


Location: Room 103, Pittsburgh Supercomputing Center, 300 S Craig St
Contact: (412) 268-4960