ACL Anthology Searchbench Help

  1. General Overview
  2. User Interface
    1. Filters View
    2. Results View
    3. Document View
  3. Semantic Search
    1. Statements
    2. Searching
  4. Frequently Asked Questions (FAQ)

General Overview

The ACL Anthology Searchbench lets you semantically search and browse the papers of the ACL Anthology, the main collection of scientific papers on the study of computational linguistics. You can search both the bibliographic metadata of these papers and their full text.

Searching is done by setting filters of different kinds, such as filters on author names or on publication year. Only documents that match all of the set filters appear in the results list.

Compared to traditional search applications such as Google Web Search, a distinctive feature of the searchbench is the possibility to search semantically in the content of the scientific papers, i.e., you can search for actual meaning instead of just plain words. See the section on semantic search below for more details on this innovative way of searching.

User Interface

The following schematic overview shows the searchbench’s user interface; its views and controls are described below:

Figure: User interface overview (numbered regions)

  1. Filters View
  2. Results View
  3. Document View
  4. Metadata of the scientific paper in the Document View
  5. Subviews of the scientific paper in the Document View

Filters View

In the Filters View on the left of the searchbench user interface you set filters that constrain the list of found documents in the Results View.

The following kinds of filters are supported:

Statements
filter by semantic statements, i.e., by the actual content of sentences, see the section on semantic search below
Plain Text
filter by words with a full text search on the plain text of the papers (comparable to a traditional search engine)
Extracted Topics
filter by automatically extracted topics of the papers
Publication
filter by publication title/event
Authors
filter by author names
Year
filter by publication years; you can also enter year ranges (e.g., 2006-2009) or collections of years and year ranges here, e.g., 1999,2002-2006,2008,2010
Title
filter by paper title or parts of titles
Affiliations
filter by affiliation organizations of the authors of a paper
Affiliation Sites
filter by author affiliation cities and countries
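The year syntax accepted by the Year filter can be illustrated with a small Python sketch that expands an expression into the set of years it denotes. It assumes that commas separate items and that a hyphen joins the two ends of an inclusive range; parse_year_filter is a hypothetical name, not part of the searchbench.

```python
def parse_year_filter(expr: str) -> set[int]:
    """Expand a year filter such as "1999,2002-2006,2008,2010" into a set of years."""
    years: set[int] = set()
    for item in expr.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas such as "1999,,2001"
        if "-" in item:
            # A range includes both of its end points.
            start_text, end_text = item.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid year range: {item!r}")
            years.update(range(start, end + 1))
        else:
            years.add(int(item))
    return years


print(sorted(parse_year_filter("1999,2002-2006,2008,2010")))
# [1999, 2002, 2003, 2004, 2005, 2006, 2008, 2010]
```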

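Changes to the set filters automatically update the Results View. Filters can be set in three ways:

  1. by entering them in the Filters View,
  2. by clicking metadata marked with a “+” symbol in the Results View or in the header of the Document View, and
  3. by clicking a sentence in the Document Content View and choosing some of its statements as new statement filters.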

Click a filter item in the Filters View to remove it. You can also remove all currently set filters at once by clicking the link at the top of the Filters View. All filter changes are recorded in your web browser’s history, so you can easily return to a previous filter setting (or bookmark an interesting filter setting or share it with colleagues).
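The help does not say how a filter setting is stored for bookmarking; one common way to make such state bookmarkable and shareable is to encode it in the page URL. The Python sketch below assumes a query string of (filter type, value) pairs; the keys author and year and both function names are hypothetical and do not describe the searchbench’s actual URL format.

```python
from urllib.parse import parse_qsl, urlencode


def encode_filters(filters: list[tuple[str, str]]) -> str:
    """Serialize (filter type, value) pairs into a URL query string."""
    # A list of pairs (rather than a dict) keeps repeated filter types,
    # e.g. two author filter items.
    return urlencode(filters)


def decode_filters(query: str) -> list[tuple[str, str]]:
    """Recover the (filter type, value) pairs from a query string."""
    return parse_qsl(query)


state = [("author", "Jane Doe"), ("author", "John Roe"), ("year", "2002-2006")]
query = encode_filters(state)
print(query)  # author=Jane+Doe&author=John+Roe&year=2002-2006
print(decode_filters(query) == state)  # True
```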

Found papers always match all currently set filters. Multiple filter items can be set for each filter type; for example, to search for papers on a certain topic written jointly by people from different research institutes, you could set multiple affiliation filters and a topic filter.
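The “match all filters” rule can be sketched in a few lines of Python: each filter item is a predicate on a paper, and a paper is kept only if every item accepts it. The Paper class, the filter constructors and the sample papers are hypothetical; the sketch models only the conjunctive matching, not the relevance ranking of the Results View.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Paper:
    title: str
    affiliations: set[str] = field(default_factory=set)
    topics: set[str] = field(default_factory=set)


Filter = Callable[[Paper], bool]


def affiliation_filter(name: str) -> Filter:
    return lambda paper: name in paper.affiliations


def topic_filter(topic: str) -> Filter:
    return lambda paper: topic in paper.topics


def search(papers: list[Paper], filters: list[Filter]) -> list[Paper]:
    # A paper is kept only if every set filter item accepts it (logical AND).
    return [paper for paper in papers if all(f(paper) for f in filters)]


papers = [
    Paper("Paper A", {"Institute X", "Institute Y"}, {"parsing"}),
    Paper("Paper B", {"Institute X"}, {"parsing"}),
    Paper("Paper C", {"Institute X", "Institute Y"}, {"machine translation"}),
]
# Two affiliation filter items plus one topic filter item.
filters = [
    affiliation_filter("Institute X"),
    affiliation_filter("Institute Y"),
    topic_filter("parsing"),
]
print([paper.title for paper in search(papers, filters)])  # ['Paper A']
```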

Set filter items can not only be removed but also edited: simply click the small pencil button at the end of the item.

Matches of the statements filter and the plain text filter are highlighted in document snippets for each paper in the Results View and in the currently selected paper of the Document View.

Results View

The Results View in the upper right part of the searchbench user interface shows a list of found papers that match the filters currently set in the Filters View. Found papers are sorted by relevance, i.e., by how well they match your filters. For each paper in the Results View, the title, the publication year and the list of authors are shown. If you filter by statements or plain text, you also get snippets for the matching papers in which the matching statements/words are highlighted.

At the top of the Results View you find the number of papers matching the currently set filters. By default, the view shows only a limited number of papers at a time; at the end of the currently shown results list you can request further results (if any).

By clicking the title of a paper in the results list, you can show the paper with further information in the Document View. Clicking a snippet of a paper in the results list not only opens the paper in the Document View but also highlights the clicked sentence in this view for a few seconds.

As already noted in the Filters View section, you can click metadata marked with a “+” symbol in the Results View in order to use it as a new search filter.

Document View

The Document View in the lower right part of the searchbench user interface provides a detailed view of the scientific paper that was selected in the Results View. At its top, the Document View has a header displaying metadata of the current paper, including the automatically extracted topics on the right. Below this header, the Document View provides three subviews of the selected paper:

  1. The Document Content View is a list of the sentences of the paper and provides different kinds of interaction with these sentences. Currently, headings, captions and footnotes are also treated as sentences; these different kinds of sentences are displayed with a different styling for a better overview.
  2. The PDF View shows the original PDF version of the paper.
  3. The Citations View provides views of the paper in its scientific context.

As already noted in the Filters View section, you can click metadata marked with a “+” symbol in the header of the Document View in order to use it as a new search filter.

The Document Content View provides different kinds of interaction with its sentences. Among these is the possibility to highlight a sentence of the paper in the original PDF document shown in the PDF View. Note, however, that this particular function is only available in web browsers with the Adobe Reader plugin installed; to the best of our knowledge, no other PDF viewer supports the protocol required for such highlighting.

Semantic Search

The main feature which distinguishes the ACL Anthology Searchbench from other search applications for scientific papers is the semantic search in paper content, i.e., the search for meaning. This enables the search for (semantic) statements in the paper content as opposed to searching for words in the plain text, i.e., you find sentences containing the exact statements you are interested in, not just documents containing search words in arbitrary order and distance to one another.

Statements

By a (semantic) statement we mean an elementary fact. Thus, very simple sentences often contain only a single statement, while more complex sentences (especially those with multiple clauses) contain multiple statements. Take the sentence “Peter is running fast and carrying milk” as an example. This sentence contains two statements: (1) Peter is running fast; (2) Peter is carrying milk.

Each semantic statement consists of one or more semantic parts: semantic subject, semantic predicate, semantic objects and semantic adjuncts. The semantic predicate is mandatory and corresponds to the main verb behind the ‘action’ described by the statement (“run” and “carry” in the example statements above). The semantic subject can be seen as the ‘actor’ (“Peter” in both example statements above). As the distinction between semantic objects and adjuncts is not trivial for non-linguists, we simply assume that, besides subject and predicate, a statement can contain further information in some ‘rest’ parts (both “fast” and “milk” in our example statements belong to these rest parts). The semantic parts of a statement are highlighted in three different colors, depending on whether a part is the semantic subject, the semantic predicate or anything else.
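The structure of a statement can be pictured as a small record with a mandatory predicate, an optional subject and any number of rest parts. The Python sketch below is hypothetical: the Statement class and highlight_roles are illustrative names, the decomposition of the example sentence is written by hand rather than extracted automatically, and the three role labels simply mirror the three highlight colors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    predicate: str                  # mandatory, e.g. "run"
    subject: Optional[str] = None   # the 'actor'; may be absent
    rest: tuple[str, ...] = ()      # objects and adjuncts, not distinguished


def highlight_roles(statement: Statement) -> list[tuple[str, str]]:
    """Label each part with one of the three highlight categories."""
    roles = []
    if statement.subject is not None:
        roles.append((statement.subject, "subject"))
    roles.append((statement.predicate, "predicate"))
    roles.extend((part, "rest") for part in statement.rest)
    return roles


# Hand-written decomposition of "Peter is running fast and carrying milk".
statements = [
    Statement(predicate="run", subject="Peter", rest=("fast",)),
    Statement(predicate="carry", subject="Peter", rest=("milk",)),
]
for statement in statements:
    print(highlight_roles(statement))
```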

Searching

The statements filter in the Filters View is responsible for the semantic search. There are two ways in which a new statement filter can be set: (1) entering a statement manually in the Filters View; (2) clicking a sentence in the Document Content View and choosing the statements of this sentence that shall be set as new statement filters (i.e., it is possible to formulate and refine queries ‘by example’).

Option (1) for adding statement filters deserves a closer look. In a manually entered statement filter item you essentially specify all statement parts that the statements you are looking for shall contain. While the semantic predicate is mandatory in actual statements, you can underspecify the filter item; for example, you may search for statements with a certain semantic subject but with arbitrary semantic predicates and other statement parts. Statement filter items are strings of statement parts. To underspecify a statement part, you simply omit it from the filter item. Every statement part starts with a specific prefix and extends either until the start of the next part or until the end of the filter item. The prefixes s:, p: and r: start the subject, predicate and rest parts, respectively. Example: p:improve r:parsing accuracy looks for statements with an arbitrary semantic subject, “improve” as the semantic predicate and “parsing accuracy” as some object/adjunct.

For your convenience there is a special case: when no prefixes are specified at all, the first token is taken as the predicate and all following tokens as the rest part. For example, in improve parsing accuracy the token “improve” is taken as the predicate and “parsing accuracy” as the rest part, while the subject part is underspecified.
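The filter syntax can be sketched as a small Python parser. It follows the rules described above (prefixes s:, p:, r:; each part runs until the next prefix or the end; no prefixes means first token as predicate, remaining tokens as rest). Two details are assumptions not covered by this help: text before the first prefix is rejected, and a repeated prefix overwrites the earlier part. StatementFilter and parse_statement_filter are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# A prefix counts only at the start of the input or after whitespace,
# so a word that merely ends in "s:" (e.g. "as:") is not split.
_PREFIX = re.compile(r"(?:^|\s)([spr]):")
_PART_NAMES = {"s": "subject", "p": "predicate", "r": "rest"}


@dataclass
class StatementFilter:
    subject: Optional[str] = None    # None means the part is underspecified
    predicate: Optional[str] = None
    rest: Optional[str] = None


def parse_statement_filter(text: str) -> StatementFilter:
    text = text.strip()
    matches = list(_PREFIX.finditer(text))
    if not matches:
        # Shortcut: the first token is the predicate, the other tokens the rest part.
        tokens = text.split()
        if not tokens:
            return StatementFilter()
        return StatementFilter(predicate=tokens[0], rest=" ".join(tokens[1:]) or None)
    if text[: matches[0].start()].strip():
        raise ValueError("text before the first prefix is not assigned to any part")
    parts = {}
    for i, match in enumerate(matches):
        # Each part runs until the next prefix or the end of the filter item.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parts[_PART_NAMES[match.group(1)]] = text[match.end():end].strip() or None
    return StatementFilter(**parts)


print(parse_statement_filter("p:improve r:parsing accuracy"))
print(parse_statement_filter("improve parsing accuracy"))
```

Both calls yield the same filter: an underspecified subject, “improve” as the predicate and “parsing accuracy” as the rest part.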

For every statement filter you create, you may also specify how strictly statements shall be matched. The default will usually be just what you want, so changing this setting is rarely necessary. Here is what the different options do:

strict
This option will make the statement filter find only strictly affirmative statements whose predicate matches exactly the entered one.
default
This option will make the statement filter find generally affirmative or neutral statements with a predicate matching either the entered one or a synonym of it.
lax
This option is just like the default option but additionally will make the statement filter find statements with negated or neutral predicates matching antonyms of the entered predicate.
maximal
This option will make the statement filter find statements with the entered predicate or a synonym/antonym thereof, irrespective of whether the predicate is negated or not.

The stemming of entered statements is not affected by any of these options.
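The four options can be read as a predicate-matching rule over polarity and lexical relations. The Python sketch below encodes the descriptions above literally; the Polarity and Strictness enums, predicate_matches and the toy synonym/antonym lexicon are hypothetical, the meaning of “neutral” polarity is not defined here and is treated as an opaque label, and predicates are assumed to be stemmed already, since stemming is independent of the option.

```python
from enum import Enum


class Polarity(Enum):
    AFFIRMATIVE = "affirmative"
    NEUTRAL = "neutral"
    NEGATED = "negated"


class Strictness(Enum):
    STRICT = "strict"
    DEFAULT = "default"
    LAX = "lax"
    MAXIMAL = "maximal"


def predicate_matches(mode: Strictness, query: str, candidate: str, polarity: Polarity,
                      synonyms: dict[str, set[str]], antonyms: dict[str, set[str]]) -> bool:
    """Decide whether a statement's (stemmed) predicate satisfies the filter predicate."""
    exact = candidate == query
    similar = exact or candidate in synonyms.get(query, set())
    opposite = candidate in antonyms.get(query, set())
    positive = polarity in (Polarity.AFFIRMATIVE, Polarity.NEUTRAL)
    if mode is Strictness.STRICT:
        return exact and polarity is Polarity.AFFIRMATIVE
    if mode is Strictness.DEFAULT:
        return similar and positive
    if mode is Strictness.LAX:
        # Default behaviour plus negated or neutral antonyms.
        return (similar and positive) or (
            opposite and polarity in (Polarity.NEGATED, Polarity.NEUTRAL))
    return similar or opposite  # MAXIMAL ignores negation entirely


# Toy lexicon; the searchbench's actual lexical resource is not described here.
SYNONYMS = {"improve": {"enhance"}}
ANTONYMS = {"improve": {"degrade"}}

print(predicate_matches(Strictness.LAX, "improve", "degrade",
                        Polarity.NEGATED, SYNONYMS, ANTONYMS))  # True
```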

Frequently Asked Questions (FAQ)

I have installed the Adobe Reader plugin for my web browser but still the highlighting feature does not work.
In some (possibly newer) versions of Adobe Reader the highlighting feature is not fully enabled by default. You can enable it by going to “Edit” → “Preferences…” → “Search”. There, make sure that the “Enable search highlights from external highlight server” check box is checked.
I have installed the Adobe Reader plugin for my web browser. I also have fully enabled the highlighting feature as described in the previous question. The highlighting feature still doesn’t work, though.
For reasons we don’t know, Safari and Opera do not show the highlighting in the PDF View. Opera at least shows the highlighting when you open the PDF file in its own tab (using the searchbench button “Download this PDF document or open it in a new window” to the left of the embedded PDF file); Safari does not show the highlighting at all. If you know how we could solve this problem, please let us know.
I have installed Adobe Reader 10 but the highlighting feature does not work.
Unfortunately, Adobe has removed the highlighting feature in Adobe Reader 10. You can either install and use an older version of Adobe Reader or try the free dtSearch PDF Search Highlighter plugin for Adobe Reader 10 (see also the question below). Alternatively, you can push Adobe to bring the feature back; this page lists some websites that you can use to get in touch with Adobe.
I have installed the dtSearch PDF Search Highlighter plugin but the highlighting feature does not work.
After installing the plugin, go to “Start → All Programs → dtSearch → dtSearch PDF Search Highlighter → dtSearch PDF Search Highlighter Options” and set the following options; note that this configuration tool is not accessible from within Adobe Reader’s preferences. Choose “Allow web sites to highlight hits on PDF files from the same domain (recommended)” and enter the Searchbench’s host name aclasb.dfki.de in the text field below it (“Also allow these trusted web sites to highlight hits on PDF files on any web site”). Also make sure that “Verify that highlighting requests are only sent to compatible servers (recommended)” is not set.
Why do I have to use the proprietary Adobe Reader for PDF highlighting? There are plenty of free alternatives.
To the best of our knowledge, only Adobe Reader currently supports PDF highlighting via URL. If you know of any other PDF viewer that also supports such highlighting, then please contact us. For the highlighting we currently use the PDF Highlight File Format protocol.
With Internet Explorer 8 I can’t view any PDF documents in the PDF View.
For some reason, Internet Explorer 8 considers opening a PDF file from a server other than the web site’s origin server to be a security risk. The default IE security settings prevent you from seeing such PDF documents. You can change the IE security settings, though: go to “Tools” → “Internet Options” → “Security” → “Internet” → “Custom Level…”. Under “Miscellaneous”, set “Access data sources across domains” either to “Enable” or to “Prompt”; in the latter case you have to accept a security warning the first time you open a new browser window/tab for the searchbench (see next question).
With Internet Explorer 8 I get the warning: “This page is accessing information that is not under its control. This poses a security risk. Do you want to continue?”
For some reason, Internet Explorer 8 considers opening a PDF file from a server other than the web site’s origin server to be a security risk. You can get rid of this message by changing the IE security settings: go to “Tools” → “Internet Options” → “Security” → “Internet” → “Custom Level…”. Under “Miscellaneous”, set “Access data sources across domains” to “Enable”.