Approaches to Understanding the Searching ... - ACS Publications

Due to the high costs of purchasing, supporting and training users of desktop chemical information systems, it is important to understand users' behav...
5 downloads 12 Views 540KB Size
1016

J. Chem. Inf. Comput. Sci. 2002, 42, 1016-1027

Approaches to Understanding the Searching Behavior of CrossFire Users Frank Cooke and Nikolai Kopelev GlaxoSmithKline, 709 Swedeland Road, P.O. Box 1539, King of Prussia, Pennsylvania 19406-0939

Helen Schofield* Chemistry Department, UMIST, P.O. Box 88, Sackville Street, Manchester M60 1QD, UK

Geraldine Boyce GlaxoSmithKline, New Frontiers Science Park, Harlow, Essex, CM19 5AW, UK

Sean Dunne MIMAS, Manchester Computing, University of Manchester, Oxford Road, Manchester M13 9PL, UK Received February 28, 2002

Due to the high costs of purchasing, supporting and training users of desktop chemical information systems, it is important to understand users’ behavior in order that deficiencies in their search efficiency and effectiveness can be identified and addressed. CrossFire generates comprehensive log files that can be examined to determine the nature of search activity. At GSK (GlaxoSmithKline) a Log File Parser, CrossParse, has been developed in Visual Basic that enables analysis by individual user name, groups of users or the whole user population. Log files can be analyzed for occasions when specific structural features are built, specific types of search are done and how the results are manipulated. CrossParse produces output that can be saved and analyzed within Microsoft Excel. It also allows determination of numbers of active concurrent users on the CrossFire system. CrossParse has been used at GSK (ex-SmithKline Beecham sites) to examine the search behavior of medicinal and synthetic chemists. Additionally, it has been used by MIMAS (Manchester Information and Associated Services) to compare the search behavior of trained and untrained users in the higher education community and to identify any areas where improvements to training can be made. Use of CrossParse in both organizations has allowed identification of areas where users may have difficulties using CrossFire. This will provide valuable feedback to MDL Inc., the authors of the CrossFire application, and guide them in enhancing CrossFire. INTRODUCTION

Desktop chemical information systems targeted toward end-users are generally quite expensive. This expense is derived from several sources including the original purchase of the application, the rollout to the end-users’ desktops, training, day to day support, and the often forgotten soft costs associated with the purchase of internal servers, technical support staff for the servers and scheduled updates of the application database. It is important that the benefit/cost ratio be maximized for such investments. One component, necessary to achieve this, is a good understanding of how effectively the user population utilizes the application. There have been a number of studies performed with the expectation of gaining a better understanding of end-users’ searching behavior. Many of these have been associated with Online Public Access Catalog systems (OPACs). Often, these studies have been performed via two distinct mechanisms: (1) subjectivessurveys that are completed by the end-users that describe their perception of their searching behavior and (2) objectivestransaction log analysis. * Corresponding author [email protected].

phone:

+44-161-200-4468;

e-mail:

Most modern day information access applications produce transaction logs. These are collections of ‘activities’ performed by the end-user and often include data such as the following: date, time, search string, navigation operation, answer display and an end-user identifier. Frequently, these transaction logs are analyzed manually by first printing them and then by visual inspection. This is the more subjective but insightful approach compared to relying on programmatic analysis only. However, this can be very time-consuming and does not lend itself well to the analysis of large transaction logs or repeated analysis on a periodic basis. By combining the analysis of the transaction logs with end-user surveys often a more complete picture of searching behavior can be achieved. Some transaction-log studies have been performed on OPAC systems to gather information on which type of information is being accessed so that more targeted collection development might be achieved.1 Other studies have investigated the effect of additional searching aids, e.g. flip charts next to the terminal or additional online help.2 This help might be in the form of a separate Web page or built-in application help in the typical Windows/Help menu format. Similar studies have looked at user behavior as it relates to application design in order to optimize ease of use, remove

10.1021/ci0202638 CCC: $22.00 © 2002 American Chemical Society Published on Web 08/16/2002

SEARCHING BEHAVIOR

OF

CROSSFIRE USERS

points of confusion and help guide the user to desired result.3-6 Numerous papers have been written about the search behavior of sets of users and investigations around the types of searches being performed. Most of these are aimed at optimizing or customizing the user interface and the methodology behind a particular search system to enhance the user interaction and allow better retrieval and precision.7-14 Recent work has also focused on users’ behavior when searching the Web.15 Often, in the area of chemistry, the most important search criteria are buried within the chemist’s language of communication, i.e., the chemical structure. The CrossFire application offers very rich transaction logs that incorporate complete specifications of the search in the language of the chemist. These logs can be extremely large and do not lend themselves to visual inspection. For this reason a parser application, we are calling CrossParse, was written in Visual Basic to attempt to understand how effectively end-users are using CrossFire. Programmatic analysis of transaction logs is not new. One example is the analysis of a library transaction log via a Fortran program to analyze for search types and correlating this with particular times within a semester.16 Some useful background on the challenges and opportunities for transaction logging and its analysis can be found in the work by Jones, Gatford and Walker.17 However, to our knowledge this type of analysis has never been undertaken for a chemistry application using chemistry search features. MIMAS (Manchester Information and Associated Services) provides many training classes throughout the year to the UK, Scandinavian and Irish higher-education communities, and CrossParse offers a means to view a ‘before and after’ picture of how users are employing the application and specifically those features or functionalities that have been addressed during a prior training event. Periodic use of CrossParse allows the determination of whether the ‘training effect’ has a long lasting effect versus an immediate but unsustainable effect. Comparisons have also been made by other authors between trained and untrained user search behavior,18-21 but these have usually been undertaken by use of questionnaires or direct observations of a relatively small number of searchers and therefore tend to be rather subjective. CrossParse allows a more quantitative approach to this type of analysis. One of the political and social issues not often addressed in works of this type is the ‘big brother’ syndrome, i.e., a higher authority is watching over you ‘for your own good’. It should be made clear to users before embarking on such an endeavor that the desired outcome of such a work is to optimize training effectiveness and their productivity and not to point at ‘troublesome’ individuals for specific remedial action! It should be noted that at no stage during this research were individuals identified. MIMAS CROSSFIRE SERVICE

Background. CrossFire (Beilstein and Gmelin)22 are provided to the UK higher education community via MIMAS23,24 at the University of Manchester, a JISC25 (Joint Information Systems Committee) supported national data

J. Chem. Inf. Comput. Sci., Vol. 42, No. 5, 2002 1017 Table 1. MIMAS Questionnaire Distributed at Face-to-face Training Coursesa

a Total number of completed questionnaires is 54, including 11 who have not received the follow-up survey. Numbers in parentheses are for the 9 respondents to the follow-up questionnaire.

center, which provides the UK higher education, further education and research communities with networked access to key data and information resources to support teaching, learning and research across a wide range of disciplines. Agreements have been made with consortia of universities in Sweden, Denmark and Finland, bringing the total number of sites which access the CrossFire service via MIMAS to 85. At present there are about 4500 active users of the system. The user community is diverse, and, in addition to chemists, it also includes a number of chemical engineers, biochemists, materials scientists, physicists, even electrical engineers, business studies students and librarians (who usually do not have a background in chemistry). Additionally, users are at different stages of their studies, ranging from first-year undergraduates through postgraduates and other researchers to academic staff. CrossFire Training. The specialized nature of CrossFire ensures that some form of training is required in order to use the system to a professional level. Unlike purely textbased systems, there are a number of different aspects of CrossFire whose interfaces are not immediately intuitive to the new user. There is a continual demand for MIMAS CrossFire training courses as the MIMAS user community is transient: each year a new cohort of undergraduate and postgraduate students arrives at universities. The main attendees of MIMAS CrossFire training courses are postgraduate students. Around 1500 CrossFire users have been trained by MIMAS during the five years of the service.

1018 J. Chem. Inf. Comput. Sci., Vol. 42, No. 5, 2002

Assessment of Training. Traditionally, MIMAS training courses have been evaluated by use of questionnaires completed by attendees after course attendance. This assessment evaluates the course itself, its content, delivery and the course materials, and is necessarily qualitative in nature. The questionnaire does not measure any increase in efficiency in searching resulting from the training or identify whether course attendees subsequently make use of any search features covered in the course which were not used prior to attending the course. More thorough investigation by questionnaire is possible, and this has recently been introduced by MIMAS for its CrossFire training. This takes the form of an in-depth questionnaire distributed at the course. Attendees are invited to provide additional background information and answer a selection of questions, which are shown in Table 1 along with a summary of responses provided at training courses delivered at three universities in 2001. Approximately one month after the course a further questionnaire is distributed by e-mail to attendees who completed the initial questionnaire and who have given their agreement to taking part in the follow-up study. The followup questions asked at this stage and answers provided by trained users at the three sample universities are given in Table 2. The log-file parser developed at ex-SB (SmithKline Beecham) provides the opportunity for a more objective, semiquantitative assessment of the effectiveness of the MIMAS CrossFire training courses. This can be achieved by running the program for specific groups of users or even individual users before and after they have attended training courses. Additionally it facilitates comparison of actual user behavior with users’ perceptions of the benefit provided by training.

COOKE

ET AL.

Table 2. Follow-up Questionnaire Distributed by E-mail to Course Attendees Who Completed the Initial Training Questionnaire (See Table 1)a

a Nine responses were received (out of 45 sent). Numbers of responses are given in square brackets.

CROSSFIRE AT EX-SB

CrossFire was released to over 400 end-users at SB in October 1997. On-site hands-on training was arranged for those users able to attend the scheduled training classes. For those unable to attend separate one-on-one training or group training without hands-on experience was offered. Following the launch of CrossFire, any additional users requesting access to the system were only offered one-on-one training as the number requesting training on a periodic basis was insufficient to warrant organized classroom training on a regular basis. A general questionnaire was prepared and used following the mass training for the initial launch. This provided feedback on the Beilstein professional training staff and the material covered in much the same way as the traditional evaluation done by MIMAS. There has been no systematic effort employed to capture the effectiveness of the subsequent one-on-one training. SB’s primary audiences for the CrossFire application are Medicinal Chemistry and Chemical Development. Chemical Development can be broadly split into two sectionss Synthetic Chemistry and Analytical Chemistry. Usage by Medicinal Chemistry and Synthetic Chemistry represents greater than 95% of the usage of the system. The other ∼5% is primarily made up by the Information Management department with little use being made by the Analytical

Chemistry department. Usage by the Information Management department is on behalf of those chemists that either do not feel comfortable using the application or need a complex or comprehensive search done. Within the new GSK organization a number of new and more effective training approaches are being developed. It is anticipated that the work described here will provide useful metrics for the effectiveness of the new training programs, currently under development, for our scientific desktop tools. CROSSPARSE DESIGN AND FUNCTIONALITY

Most of the actions taken by CrossFire end-users can be captured to a transaction log. If the transaction log functionality has been licensed, then by default, it is named ‘xfacct.log’ and is stored in the XFIRE\SYS directory on the CrossFire server. When the option for logging is turned on, there are several options that determine the level of logging to be performed. In the XFIRE.INI file, located in the XFIRE\BIN directory, there should be an entry for ACCOUNTING that is set to ‘ON’:

SEARCHING BEHAVIOR

OF

CROSSFIRE USERS

J. Chem. Inf. Comput. Sci., Vol. 42, No. 5, 2002 1019 Table 4. Examples of Representations of Structural Features Which Can Be Automatically Recognized by CrossParse

Figure 1. A snapshot of the ROSDAL code for a CrossFire reaction search, with some of the structure and reaction details indicated. Table 3. CrossFire Log File Format, Indicating the Structure of the Log Files and Type of Data Recorded column

contents

1 2

time/date using format: YYYYMMDD:HHMMSS user name + group name + TCP/IP address (separated by commas) process number CPU time of the command elapsed time of the command CrossFire database being used database context of the command (S ) substance, R ) reactions, C ) citation, N ) no context found) reference character for the command type (S ) structure search for substance or reaction, F ) fact search, ) not a searching command) command or search statement

3 4 5 6 7 8 9

The details of the accounting are found in the XFIRE.CFG file which is located in the SYS directory. An example is shown here:

The accounting function can collect specific commands or all commands that the CrossFire user is sending to the CrossFire server. If the command list is empty, then all commands are logged. Using the “_” after a specific command signifies that the command should be logged but not the complete command string associated with it. Logging all commands with all details is the appropriate option for CrossParse.

commander window query options

relevant string representation

implicit free sites implicit ring closures no isotopes multifragment charges radicals tautomers separate fragments absolute stereochemistry

/F+ /R+ /N1 /I2 (#+