Vendor Column: Unlocking Organizational Knowledge with Scientific Search

  • Published: Oct 4, 2012
  • Author: Chris Stumpf
  • Channels: Laboratory Informatics / Chemometrics & Informatics

Over the past several columns, I have discussed how scientists and analysts can make the laboratory more efficient by standardizing the laboratory informatics solutions they employ. I also highlighted the benefits of system metrics in the form of business intelligence capabilities. Implementing these types of strategies can help organizations increase laboratory operational efficiency. However, there are routine information management activities that can also affect productivity significantly. One that we have not addressed in previous columns is searching for information. In this month’s column, I am joined by my colleague, Paul van Eikeren, an expert in the field of scientific search.

Let’s take a look at a hypothetical informatics environment that you may encounter. We’ll assume you have standardized your laboratory informatics structure such that CDS, LIMS, ELN, and SDMS are each provided by a single vendor. In addition, your informatics infrastructure would most likely include document management (e.g., Microsoft SharePoint, EMC Documentum), email, and Enterprise Resource Planning (ERP). Finding a piece of information could therefore mean searching 7 or 8 information/data repositories. Conducting an individual search within each repository is one way to accomplish this task, but is there a more effective and efficient approach?

First, let’s consider the simplest case: a text-based search, i.e., one in which the query doesn’t consist of chromatograms/spectra, chemical structures, or images. For text searches, there is a technique called federated search. So what is federated search? It is basically the ability to search multiple information repositories simultaneously. Libraries use this capability to search across collections held by other libraries, while government agencies use it to search simultaneously across a variety of information repositories, including those populated by the FDA, EPA, and NIH. Two examples of government-focused federated search utilities are worldwidescience.org and www.science.gov. Click on the advanced search option of either one and you will see that both cover a wide variety of sites, many of which also use federated search. Federated search has three main components:

  1. Single user interface to search across numerous information repositories.
  2. The search utility has designated interfacing connectors (direct interfaces into the information repositories) that communicate search requests directly with the repository sites.
  3. The query results from each repository are returned to the search user interface and these results are de-duplicated and presented as a single result list.
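The three components above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the `Hit` record, the `make_connector` helper, and the in-memory "repositories" are all hypothetical, and de-duplication assumes that the same document carries the same identifier in every repository.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hit:
    doc_id: str   # assumed to be shared across repositories (hypothetical)
    title: str
    source: str   # name of the repository that returned the hit


# A connector is any callable that takes a query string and returns hits.
Connector = Callable[[str], List[Hit]]


def make_connector(name: str, records: Dict[str, str]) -> Connector:
    """Build a toy connector over an in-memory {doc_id: title} mapping."""
    def search(query: str) -> List[Hit]:
        q = query.lower()
        return [Hit(doc_id, title, name)
                for doc_id, title in records.items()
                if q in title.lower()]
    return search


def federated_search(query: str, connectors: List[Connector]) -> List[Hit]:
    """Send one query to every connector, then merge and de-duplicate."""
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        per_repo = list(pool.map(lambda connector: connector(query), connectors))
    merged: Dict[str, Hit] = {}
    for hits in per_repo:
        for hit in hits:
            merged.setdefault(hit.doc_id, hit)  # keep the first copy seen
    return sorted(merged.values(), key=lambda h: h.doc_id)


# Single entry point over two toy repositories.
eln = make_connector("ELN", {"doc-1": "Cholesterol assay notes",
                             "doc-2": "Buffer preparation"})
lims = make_connector("LIMS", {"doc-1": "Cholesterol assay notes",
                               "doc-3": "Cholesterol reference standard"})
for hit in federated_search("cholesterol", [eln, lims]):
    print(hit.doc_id, hit.source, hit.title)
```

Note that nothing is stored centrally in this sketch: every query goes out live to each repository, which is why federated search can return up-to-the-moment results.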

This approach works quite well for keyword text searches when you’re looking for scientific literature, books, patents, regulations, etc., and I use it myself for literature searches. However, scientists and analysts also like to search using science objects such as chemical structures, chromatograms/spectra, and images/pictures, which are the language of science. For this type of search, scientists and analysts must conduct separate searches within the individual repositories (e.g., SDMS, CDS, LIMS, ELN). Performing separate searches is inefficient from a time perspective; it often requires specialized knowledge to operate and search each repository; and the user has to know about, and have access to, the individual repositories. It should also be pointed out that many repositories don’t support searching by science objects at all, only by text or keywords. Given these limitations, is it possible to perform federated searches across these data repositories as well?

Newly developed software that makes such a search possible, called Scientific Search, is becoming available in the laboratory informatics marketplace, but there are three challenges in delivering such a solution:

  1. Developing interfacing connectors into the laboratory informatics repositories
  2. Putting the science objects (chromatograms/spectra, chemical structures, etc.) into perspective by providing meta-data (descriptive data about the data)
  3. Adding structure to unstructured information, e.g., giving structure to free-form text documents so that computer algorithms can search the text efficiently.

These challenges are actually not new; they arise any time someone wants to connect two laboratory informatics systems, e.g., a CDS to a LIMS. For the first challenge, most laboratory informatics vendors provide an interfacing utility for bi-directional communication, typically referred to as a software development kit (SDK), and many now use web services so that communication can take place over internet protocols (although not for real-time searches). There is still room for improvement, but these interfaces are making it possible to search across multiple laboratory informatics repositories. Challenge #2, however, likely represents the biggest obstacle, as many laboratory informatics solutions may not associate the appropriate meta-data descriptors with science objects. Hence, scientific search solutions will have to assign descriptive meta-data to science objects by using algorithms that extract the information from the individual repositories, organize it into a centralized index, attempt to recognize the science objects by comparing them with internal libraries, and then normalize the results to reduce redundant nomenclature. An example would be extracting a chemical structure from a LIMS record and assigning a chemical name to that science object. Because this meta-data index must be developed, updated, and maintained centrally, real-time searches of the kind done with text-based federated search are not possible, and this is probably the key differentiator between federated search (distributed search) and Scientific Search (central index search). The third and final challenge is simply indexing the content of text documents so that they can be easily searched, i.e., putting the content into a computerized data structure that facilitates searching, similar to the searchable tables found in databases.
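To make the central-index idea more concrete, here is a minimal Python sketch, again under loudly stated assumptions. The `Record` and `CentralIndex` classes, the tiny structure library, and the synonym table are all hypothetical, and a molecular formula stands in for a real structure representation purely to keep the example short. The sketch touches challenges 2 and 3 only: it assigns a name to a science object by looking it up in an internal library, normalizes synonyms, and stores everything in a simple inverted index.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical internal library: structure key -> preferred chemical name.
# A molecular formula is used as the key only for brevity.
STRUCTURE_LIBRARY: Dict[str, str] = {"C27H46O": "cholesterol"}

# Hypothetical synonym table used to reduce redundant nomenclature.
SYNONYMS: Dict[str, str] = {"cholesterin": "cholesterol"}


@dataclass
class Record:
    record_id: str
    repository: str
    text: str = ""
    structure_key: Optional[str] = None  # set when the record holds a structure


def normalize(term: str) -> str:
    """Lower-case a term and map known synonyms to the preferred name."""
    t = term.lower()
    return SYNONYMS.get(t, t)


class CentralIndex:
    """Inverted index: normalized term -> set of 'repository:record_id'."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, record: Record) -> None:
        terms = [normalize(word) for word in record.text.split()]
        # Assign descriptive meta-data to a recognized science object.
        name = STRUCTURE_LIBRARY.get(record.structure_key or "")
        if name:
            terms.append(name)
        for term in terms:
            self._postings[term].add(f"{record.repository}:{record.record_id}")

    def search(self, term: str) -> List[str]:
        return sorted(self._postings.get(normalize(term), set()))


index = CentralIndex()
index.add(Record("r1", "LIMS", structure_key="C27H46O"))      # structure only
index.add(Record("r2", "ELN", text="Cholesterin extraction notes"))
index.add(Record("r3", "SDMS", text="Buffer preparation"))
print(index.search("cholesterol"))
```

In this sketch the LIMS record contains no text at all, yet it is found by name because the structure was recognized at indexing time. The trade-off is the one described above: the index only knows what it has already extracted, so results can lag behind the repositories.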

So how might you use Scientific Search in your organization? Let’s assume that you are participating in a project to develop a new clinical biomarker test (having your cholesterol checked is an example of a clinical biomarker test). You join the project years after its initiation and would like to get up to speed on what has already been done, primarily to avoid duplicating effort, so you search the 7-8 laboratory informatics repositories discussed earlier. If you were simply using text-based federated search, you would use the biomarker name in your search and would retrieve only information in which the biomarker name (or a synonym) appears; you would not receive any chromatograms, spectra, etc. that lacked an associated name. However, since you are using Scientific Search, you can search by text, chemical structures, spectra/chromatograms, etc. This would allow you to find data based on keywords and science objects, starting with the initial metabolomics exploration (NMR, LC/MS, GC/MS data), continuing through the development of the quantitative assays (LC/MS, LC/UV), research on the biological pathways, and potential patent applications, and ending with the most recent work. Having access to all of the laboratory-based information would tend to minimize duplicated effort, improve collaboration because you would know the appropriate subject matter experts to contact, and give you a better idea of the general scope of the project (e.g., is this a new project within the organization or a well-established one?).

Without Scientific Search, a scientist or analyst would find it difficult and time-consuming to access all the available laboratory-based information because they might not know where to look, might not have access to the appropriate laboratory informatics systems, or might not know whom to ask (e.g., you don’t know that a colleague in the organization is an expert on working with a particular sample matrix, so you spend months developing your own sample preparation methodology). You can see from this example that Scientific Search offers a tremendous opportunity to unlock organizational knowledge that was previously hidden from scientists and analysts, and it is a development to watch over the coming years.

Article by Chris Stumpf and Paul van Eikeren, Waters Corporation


Paul van Eikeren is a data scientist at Waters Corporation focused on applications of informatics, computer algorithms, and machine learning to scientific search. He has a Ph.D. in physical organic chemistry and a 40-year career spanning academic and commercial R&D related to pharmaceutical development. He is also the co-founder of IntelliChem, a leading provider of Electronic Laboratory Notebooks to the pharmaceutical industry, and the co-founder of Blue Reference, a company focused on software solutions for scientific search.


The views represented in this article are solely those of the author and do not necessarily represent those of John Wiley and Sons, Ltd.

Copyright © 2013 John Wiley & Sons, Inc. All Rights Reserved