Ind. Eng. Chem. Res. 2000, 39, 2954-2969
Process Design Decision Support System for Developing Process Chemistry
David C. Miller† and James F. Davis*
Department of Chemical Engineering, 140 West 19th Avenue, The Ohio State University, Columbus, Ohio 43210
We present a methodology and computer-based process design decision support system (PDDSS) that critiques laboratory-scale process chemistry based on an engineering analysis of a process topology developed from what is currently known about the process chemistry. The process topology is a functionally generated “flow sheet” for a feasible process composed of a generic equipment configuration. Equipment in the topology consists of models and knowledge that allow for early analysis of cost, environmental, regulatory, and safety issues that may become important as the project moves from experimental process research to engineering design. By coupling experimental chemistry development with interactive, engineering-based evaluation, PDDSS can accelerate the overall development process and enable the underlying chemistry to better meet financial and processing goals. PDDSS is unique in integrating downstream engineering considerations with laboratory-scale chemical synthesis. This paper discusses the knowledge framework and operation of PDDSS using a case study.
Introduction
Decreasing time to market is an important issue in virtually all industrial sectors. The first product to enter a new market often gains a competitive advantage. For example, in the pharmaceutical industry, physicians often do not want to prescribe a newer drug if the current one has a proven history of safety and efficacy. Thus, the dominant market share will belong to the first entry to a particular market. Additionally, since patents only provide protection for 17 years, it is especially important to ensure that product development and subsequent clinical trials occur as quickly as possible. Finally, and most obviously, until a product can be sold, it cannot generate revenue for the company. By more quickly getting a product into the consumer’s hands, a company can more quickly and more effectively begin to realize a return on its R&D investment.
With such a huge incentive to decrease time to market, companies have embraced the ideas of life cycle design and concurrent engineering. For example, Rutter1 describes the application and success of such efforts at G.D. Searle. The fundamental theme is to improve development and manufacturing processes by integrating the development process across traditional functional distinctions.2,3 One way this is manifested is by concurrently performing various activities that would typically be segmented into different departments. For example, while process development continues, a new chemical may be in the midst of field trials and regulatory approval. Another way is to more fully consider the downstream ramifications of early design decisions. Such an approach helps to maximize the effective patent life by making development of the manufacturing process faster, thus reducing the likelihood that market entry will be delayed because of process design considerations.
One of the greatest challenges in applying these concepts is the lack of tools and techniques for integrating the decision-making process. This is especially true in the agrochemical, pharmaceutical, and specialty chemical industries because of the inherently experimental nature of process chemistry development. Unlike traditional, continuous chemical processes, which focus to a large degree on petroleum-based products, batch production of specialty chemicals makes use of a variety of structurally complex molecules. Whereas the chemical properties and reactivity of relatively simple petrochemicals have been well studied and can be reasonably well predicted, those of complex specialty chemicals and their intermediates cannot. Typically, only limited physical property data are available for these molecules because they are unique to a specific process, never having been synthesized before. In a similar manner, the reaction chemistry and separation methodologies have significant uncertainty associated with them. Considerable experimental work is required to find suitable reaction conditions, separation techniques, and so forth. Thus, designing processes for these chemicals relies on the experimental development of reaction pathways and separation methods because not enough is known about the systems to use traditional sophisticated simulation techniques. Therefore, chemistry information that has experimental validation has considerable value. Because of the complexity of the chemicals and their chemistry, significant knowledge gaps must be bridged to successfully integrate engineering design with early process chemistry development.
* To whom correspondence should be addressed. Phone: 614-292-0090. Fax: 614-292-3769. E-mail: [email protected].
† Current address: Department of Chemical Engineering, Michigan Technological University, Houghton, MI 49931. Phone: 906-487-1956. Fax: 906-487-3213. E-mail: millerd@mtu.edu.
We have previously reported an overview of our approach to the problem.4 Here, we describe in detail our methodology and computer-based process design decision support system (PDDSS) for integrating process chemistry development with chemical engineering design. PDDSS is unique in providing engineering analysis based directly on experimentally developed process chemistry. Using actual
10.1021/ie990713u CCC: $19.00 © 2000 American Chemical Society Published on Web 06/20/2000
Ind. Eng. Chem. Res., Vol. 39, No. 8, 2000 2955
Figure 1. Schematic of the development process from discovery to process design.
experimental laboratory procedures, PDDSS develops an initial process topology to provide a basis for various types of engineering critique. In this way, the process chemistry can be developed with the ultimate process plant in mind. Issues that could cause delays during detailed design can be addressed before being locked in by chemistry considerations.
Development of Complex Synthetic Molecules
Focusing on products that result from multistep syntheses is especially important because the chemical industry increasingly relies on these types of products. As such, there is significant benefit to designing and commercializing these processes in the most efficient way possible. To understand how we can integrate downstream engineering considerations into the early chemistry development process, we examine the typical development cycle for these structurally complex, site-selective molecules. Ordinarily, chemical products in the pharmaceutical, agricultural, and specialty chemical industries begin in a discovery laboratory and proceed through process chemistry development, which includes route selection and scale-up, before reaching engineering design. Figure 1 shows a schematic overview of these early stages of the development process. PDDSS provides support to chemists engaged in process chemistry development. The goal is to speed development by bringing process chemistry closer together with engineering design. During the discovery process, many molecules are being synthesized and screened for potential activity. The focus at this stage is to rapidly synthesize and screen as many compounds as possible. Thus, the methods and reagents are chosen to facilitate rapid laboratory synthesis and purification. Little consideration is given to the cost of the reagents, since only milligram quantities are typically being synthesized. Scalability of the process is not a consideration at this point.
Although this will require the synthesis pathway to be redeveloped before designing a production facility, it is appropriate, since less than one tenth of one percent of the molecules screened will have any commercial potential. Rapid screening is essential to find likely candidate molecules. Once a potentially viable compound is identified, development moves to an exploratory process chemistry group. Concurrently, the molecule will be undergoing further testing (e.g., toxicology, activity, environmental) that could cause the project to be terminated. During the initial process research stage, a group of chemists begins looking at various synthetic pathways that can be used to synthesize the desired molecule. They begin
by brainstorming various possible routes based on their general knowledge of chemical reactivity. The most promising of these routes are then selected for experimental work, where the chemists determine whether the chemistry will actually work and what sorts of yields can be achieved. They are laying the groundwork for what will eventually be used in the production facility. After sufficient exploratory process research has been pursued, a team meets for route selection. All the data for the various routes under consideration are brought together for detailed analysis. One route is selected for more detailed research and will eventually be passed off to engineering. The route selected is the one that appears to have a high certainty of success and a low estimated cost of production. Further experimentation then takes place to attempt to reduce the cost associated with the route, most often by attempting to increase product yields. The responsibility for the new product then passes to engineering, where plant design becomes the focus of attention. Engineers typically become involved during plant design, selecting and sizing equipment, determining energy and material requirements, and optimizing recycle and energy streams using a variety of traditional design techniques. By the time plant design begins, the synthetic chemical pathway (i.e., the sequence of reactions and intermediates) has been chosen, developed, and essentially locked in. As engineers begin plant design and optimization, they are constrained by the decisions that have occurred in previous stages of the development process. The raw materials are, to a large degree, set; most reaction conditions are set; and most separation methodologies are set. Since raw materials account for more than half of the total production costs, a large degree of leverage to affect the total cost of production is already fixed.
Thus, because the design problem is constrained by decisions made in earlier stages of the development process, a great deal of flexibility has been lost. Revisiting previous decisions at this point will be extremely costly because significant backtracking in the development cycle would be required. Other Analysis Approaches. Companies often make use of various simple financial heuristics and empirical correlations to provide an estimate of the costs associated with a particular chemical pathway. To make use of these, a feasible sketch of a process flow sheet based on generic operations is typically developed. This sketch identifies major unit operations (i.e., reactors, filters, distillation columns, etc.), but it does not fully specify the equipment or the material streams. This preliminary flow sheet is used to estimate the processing costs associated with a synthesis route. Application of the procedure is often limited to a few individuals who know how to develop such a sketch from developmental chemistry descriptions such that it is in an appropriate form to be used by the financial correlations. Although these company experts can provide reasonable estimates, they are limited in several ways. First, the number of such estimates they can provide is limited by the large amount of time that must be invested in developing each flow sheet before applying the heuristics. Second, since a large portion of the analysis is performed manually, it is subject to errors and inconsistencies. Third, since the procedure is limited to only a few company experts, the procedure is not applied as often as it could be during ongoing development. Thus,
potentially better synthetic pathways could be missed during route selection because the chemists did not have sufficient guidance while performing their experiments. Fourth, the company could face a severe loss of expertise if the expert were to leave the company. With these limitations, the advantages of a more automated and rigorous method become apparent. The most direct solution to this problem is to apply existing engineering design methodologies and software tools to analyze the developing process chemistry. Unfortunately, insufficient information is available to develop the detailed mathematical models that these techniques require. Typically, the only information available is what the chemist has done in the laboratory to synthesize the molecule of interest. Kinetic studies have not generally been performed; complete physical property data for any new chemicals has not been determined; and knowledge of other reaction products is most likely unavailable (i.e., it is just called “waste”). In this information vacuum, traditional methods of modeling are not viable without artificially redefining the problem to fill in the gaps. Thus, Miller and Davis4,5 developed PDDSS to use existing information to provide a preliminary process analysis of process chemistry. Batch Design Kit6-8 (BDK) and Batch Plus9 are recently available commercial software packages that provide decision support and design tools that are closely related to the functionality of PDDSS but in fact address quite different aspects of the overall design problem.10 The different functionality provided by PDDSS is best characterized by considering the progression of a design from preliminary considerations to detailed design. BDK features the most preliminary design consideration by providing capability for scoping a design at the level of reaction pathways. 
This feature allows the user to explore multiple feasible reaction pathways to a product set and then narrow the set of pathways for further design consideration. For a particular reaction pathway, BDK then jumps ahead to a detailed design mode in which equipment and process connectivity are specified. At this level, all products of a reaction need to be identified and the designer needs to specify the details of the process flow sheet. Batch Plus also emphasizes this detailed design mode. Thus, although seeking to integrate engineering design with process chemistry development, they focus on the engineer. PDDSS provides design functionality that fits between the scoping capability at the reaction pathway level and the detailed design capability at the process flow sheet specification level. For a given reaction pathway, PDDSS establishes a feasible process and generic equipment configuration based on actual experimental chemistry laboratory procedures and data. With this chemistry information, PDDSS maps from the laboratory procedures into simple but feasible process configurations allowing for scoping consideration of toxicity and environmental issues, solvent selection, and cost. Thus, chemists can use this tool to gain engineering insight and feedback. In addition to a distinct design niche, PDDSS also can be distinguished from BDK and Batch Plus with respect to architecture. With a unique emphasis on mapping from chemistry procedures to process, PDDSS is structured as a database, allowing search between the function and the process device. The ability to map between the function and the process device allows the
system to investigate multiple devices or process options to accomplish various functions.
Integrating Process Chemistry with Engineering Design
Our approach is to identify the type of information that is immediately available and to make use of that information for process analysis. Engineering analysis in the process domain usually centers on a process flow sheet, because it is a convenient vehicle for identifying major unit operations, process streams, and the interrelations between them. The form of such process flow sheets can vary greatly. Douglas11 describes several types of process flow sheets ranging in detail from an input/output structure (level 0) to abstract reactor and separation systems (level 2) to the traditional process flow sheet with fully specified equipment and process streams (level 4). His approach recognizes the value of performing limited analysis early in the consideration of a new process before expending the resources required to gather all the information needed for developing the more detailed (level 4) analysis. What this method assumes, however, is that the information required for the more advanced levels of analysis is available from some source. The situation we are addressing is the one in which the information required for design beyond level 0 is not immediately available for creating more complete flow sheets (i.e., flow sheets that have equipment identified even if not fully specified). The PDDSS approach makes use of a generalized flow sheet representation that we refer to as a process topology in recognition that it does not contain the detail typically found in a process flow sheet. A process topology includes information on the material streams (i.e., approximate stream constituents and mass) and the equipment (i.e., what it is intended to do and what components are involved in the operation).
It forms the foundation for an engineering analysis that provides direction to the process chemist by assisting with decisions about solvents, environmental conditions, and raw materials. Detailed process design considerations are left for later design phases. Because the information which will be used to develop our process topology is necessarily and primarily coming from synthetic organic chemists, the first step of our approach is to represent that chemistry information in a standardized manner. Our computer-based PDDSS then uses that information to develop the process topology. From there, the system provides a scoping estimate of capital, operating, and raw material costs in addition to analyses by various process critics. The critics provide economic, environmental, and safety evaluations of the process and chemicals involved. For example, the system can (1) flag solvents and reactants that are difficult to use on an industrial scale, (2) identify separation methods that may result in large environmental impacts because of large volumes of wastewater or air emissions, and (3) identify unit operations with especially high cost (economic or environmental) associated with using them on an industrial scale. Using the information provided by PDDSS, the chemists have greater information available to determine the most important issues to work on during the ongoing development of the process chemistry. The purpose to which the chemistry information is put is a key difference between PDDSS and BDK.
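The idea of a process topology that critics can inspect can be illustrated with a minimal Python sketch. It assumes a topology holds only what the text names: streams with approximate constituents and masses, and equipment described by its intended function. The class names, the two critic functions, the watch list, and the water threshold are all hypothetical and introduced here for illustration; they are not taken from PDDSS, which was implemented in C++.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Stream:
    name: str
    components: Dict[str, float]  # chemical name -> approximate mass (g)


@dataclass
class Unit:
    function: str        # intended function, e.g. "react" or "filter"
    inlets: List[str]    # names of streams entering the unit
    outlets: List[str]   # names of streams leaving the unit


@dataclass
class Topology:
    streams: Dict[str, Stream] = field(default_factory=dict)
    units: List[Unit] = field(default_factory=list)


def solvent_critic(topology: Topology, watch_list: Set[str]) -> List[str]:
    """Flag every stream carrying a chemical on the caller-supplied watch list."""
    flags = []
    for stream in topology.streams.values():
        for chemical in sorted(stream.components):
            if chemical in watch_list:
                flags.append(f"{stream.name}: {chemical} is on the watch list")
    return flags


def wastewater_critic(topology: Topology, limit_g: float) -> List[str]:
    """Flag units whose outlet streams carry more water than limit_g (hypothetical rule)."""
    flags = []
    for unit in topology.units:
        water = sum(
            topology.streams[name].components.get("water", 0.0)
            for name in unit.outlets
        )
        if water > limit_g:
            flags.append(f"{unit.function}: {water:.0f} g water leaving the unit")
    return flags
```

Keeping each critic as a separate function over the same topology mirrors the description of independent economic, environmental, and safety critics; the actual evaluation rules would come from engineering knowledge rather than the fixed thresholds assumed here.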
Figure 2. Reaction pathway developed by Schumacher et al.16 (1990).
Figure 3. Reaction pathway developed by Wu et al.17 (1997).
PDDSS Implementation
PDDSS has been implemented in C++ using Borland’s Object Windows Library (OWL) and Paradox database tables manipulated using standard SQL. The entire PDDSS runs under the Microsoft Windows environment, which enabled industrial process chemists to evaluate it at various stages during implementation. Consistent with our long-term vision to see the system used by chemists at their desks, it is compatible with standard desktop configurations and utilities. The object-oriented nature of C++ facilitated the implementation by allowing encapsulation of the interface code separate from the core knowledge representation and inference mechanism code. Within the core knowledge representation code, class structures were used to represent the required knowledge of process device operation, chemical reaction recipes, reaction pathways, and chemical properties, such as physical properties, quantities, costs, and so forth. These classes encapsulate and isolate specific processing tasks required for each knowledge type. This early selection of implementation environment allowed prototype development to proceed incrementally and made it possible to add functionality without requiring major revisions of existing code. The structure of PDDSS will ultimately require interaction with several databases. The key database contains knowledge of the structure, function, and behavior of engineering process devices in a manner consistent with that described by Miller, Davis, and coworkers.5,12-14 This particular database provides the fundamental knowledge structures that are used to map between chemistry information and generic process operations to form a process topology that can be critiqued. Other databases house information about the chemicals (e.g., physical properties, price, hazards, and regulatory issues). The chemical database houses information on pure components, mixtures, and various grades of a chemical.
PDDSS recognizes new chemicals or mixtures that are not in the database and prompts the user to enter the compound’s specifications. Although implemented in Paradox tables, PDDSS accesses these databases using standard SQL. Thus, the information could just as easily be stored in any database system, which would allow the system to take advantage
of commercial or in-house chemical information databases. PDDSS interacts with the user through projects. Each project consists of a single synthesis pathway that is input to the system. A project may be a pathway to a chemical intermediate or to a final product. Projects can be established to compare synthesis pathways or can be assembled to form overall product pathways (e.g., for convergent syntheses). The latter also makes it possible to evaluate alternative synthesis segments within an existing process. To assist in delving into the details of PDDSS, we will consider a case study comparing two recent reaction pathways for the synthesis of florfenicol (1). Although many different synthetic systems have been examined using PDDSS, florfenicol provides a nonproprietary vehicle for discussing the operation of the prototype system. Florfenicol is a commercially important, broad spectrum veterinary antibiotic developed and manufactured by Schering-Plough under the brand name Nuflor. A considerable number of syntheses have been reported in the literature since the compound was discovered by Nagabhushan.15 Figure 2 shows the synthesis pathway developed by Schumacher et al.16 in 1990, which will be the primary case study for describing PDDSS. Figure 3 shows an improved synthesis pathway developed by Wu et al.17 in 1997, which will be used as an additional case study with which to compare the results of the first. In these figures, the single-digit bold numbers are used to refer to the molecules in this document. Within the following reaction descriptions, they are referred to as Case Study X, where X is the number from the appropriate figure. The names of the reactions are used in this document to track the reactions when they are entered into PDDSS.
Representation of Chemistry Knowledge
Looking at virtually any issue of the Journal of Organic Chemistry will reveal a general format for reaction information.
This format is representative of the information gathered when the primary interest is synthesizing and isolating a new compound. This information is provided in a recipe format: ingredients,
times, temperatures, procedural directions (e.g., additions, mixing, stirring, separations, etc.), and yield. These chemical recipes must form the basis for any analysis performed at this stage of the development process. Developing a representation structure for these recipes allows them to be used as a means of inputting chemistry laboratory information into PDDSS using semantics and lab operations familiar to the chemist and normally reported. Unlike the fully specified representation formalisms Language for Chemical Reasoning18 (LCR) and Reaction Description Language19-21 (RDL), which attempt to capture detailed molecular and reactivity information, our approach is concerned only with capturing the recipe knowledge. Thus, while LCR and RDL would require identification of reactive sites on the molecule, our representation is sufficiently abstract that a reaction is merely described as a sequence of chemicals being added, various processing steps, and the collection of a product at the other end. The tradeoffs between the two types of representations should be readily apparent. The simple, highly abstract representation is fully reflective of the information that is normally reported in the course of synthetic organic research. It is limited, but it is readily available. LCR and RDL, on the other hand, require significantly more information to define a chemical reaction. In some cases, additional experimentation may be needed to obtain all the information required. From an analysis perspective, LCR and RDL allow for much more detailed reaction analysis, since they have more information front loaded into the system. Thus, RDL can generate the entire theoretical mechanism or reaction network for any given feed;19 however, many of the reactions in its network may never actually occur. 
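To illustrate how little structure the recipe-level representation needs compared with LCR or RDL, the following Python sketch reads mass-transfer lines written in the keyword forms given later in this section (add, charge reactor with, collect, transfer to reactor, with a trailing asterisk marking a product of an earlier step). The regular expression, the `Step` class, and the helper names are illustrative assumptions and do not reproduce the PDDSS parser, which was written in C++.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed line shape: <keyword> <chemical-name>[*] (<amount> g)
_MASS_TRANSFER = re.compile(
    r"^(?P<keyword>add|charge reactor with|collect|transfer to reactor)\s+"
    r"(?P<name>.+?)(?P<linked>\*)?\s*"
    r"\((?P<grams>\d+(?:\.\d+)?)\s*g\)$"
)


@dataclass
class Step:
    keyword: str
    chemical: str
    grams: float
    linked: bool  # True when the name carried a trailing asterisk


def parse_step(line: str) -> Optional[Step]:
    """Return a Step for a mass-transfer line, or None for any other line."""
    match = _MASS_TRANSFER.match(line.strip())
    if match is None:
        return None
    return Step(
        keyword=match["keyword"],
        chemical=match["name"].strip(),
        grams=float(match["grams"]),
        linked=match["linked"] is not None,
    )


def mass_in(steps: List[Step]) -> float:
    """Total mass (g) entering the system through add and charge steps."""
    return sum(s.grams for s in steps if s.keyword in ("add", "charge reactor with"))
```

Lines that use other keywords, such as a stirring time, simply return `None` here; a fuller parser would dispatch them to their own handlers.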
Although our representation of the chemical reaction provides little knowledge with which to reason mechanistically (i.e., which atoms of the molecules are reactive sites interacting with other molecules) about the chemical process, it is based directly on actual experimental results obtained by the chemist. Our representation is designed to use actual experimental information to construct a feasible process topology that can be used for subsequent analysis. Thus, to the extent that this laboratory reaction information can be translated directly into an appropriate process topology, it represents a conceivable process that has been verified (if at a much smaller volume) experimentally and provides a solid basis for analysis. LCR and RDL, on the other hand, are intended for detailed, theoretical, molecular-level reasoning and are not appropriate for this application. Since our representation comes out of the standard recipe descriptions found in the experimental sections of synthetic chemistry journals, it accurately captures this level of information. Although the syntax is intended to be as close to the natural terminology of the chemist as possible, certain syntactical conventions are enforced in order to facilitate the implementation of a parser to read and understand the information. Keywords are used to describe major procedural knowledge from the recipe. These are broken down into mass-transfer keywords (where primary material is physically entering or leaving the system), separation keywords (which describe various methods that could be employed to separate components), and descriptive keywords (which indicate time, temperatures, pressures, etc.). Our representation has many parallels with the operations-based language of BDK6-8 although they
were developed independently and from different perspectives. Our representation arose from the manner in which syntheses are reported in the chemistry literature.5 The BDK representation language was developed using “terms that sound familiar to both chemists and engineers”.6 Thus, it is not surprising that the representations are quite similar. BDK does, however, more closely correspond to terminology used in batch-processing operations than in the chemistry laboratory by including, for example, operations for synchronization of parallel and sequential operations. Mass-transfer keywords provide a mechanism to track which chemicals are being added to and collected from the system. The keywords add and charge reactor with both indicate that one or more chemicals are to be added to the system. The keyword collect indicates that a product is to be removed. The syntax for these keywords is
add chemical-name (amount-in-grams g)
charge reactor with chemical-name (amount-in-grams g)
collect chemical-name (amount-in-grams g)
transfer to reactor chemical-name (amount-in-grams g)
The keywords appear exactly as shown above. The placeholders chemical-name and amount-in-grams are to be replaced by the appropriate chemical name and amount. Transfer is a useful shortcut for collect followed by add or charge. Following the add keyword, an asterisk (*) is appended to the name of a chemical which is the product of a previous reaction step. This alerts the system to link the overall reaction sequence through this chemical. Separation keywords allow for the major types of separations that are performed in the laboratory to be specified. These keywords include
centrifuge
crystallize
distill
extract into chemical-name (amount-in-grams g)
filter
recrystallize from chemical-name (amount-in-grams g)
remove chemical-name (amount-in-grams g) via function
Extract displays elements of both a separation keyword and a mass-transfer keyword, since implicit in its description is the addition of a new chemical into which the desired chemical is extracted. Remove is used when a component other than a desired product is specifically removed from the process system. (As such, it is also a mass-transfer keyword, explicitly indicating the removal of material from the system.) Although the amount of material removed or the amount used for extraction is not usually reported in the literature, it is often recorded in the chemist’s notebook or can be estimated by the chemist. It must be entered as part of the system input. Filter and extract have one additional keyword that is generally associated with each of them:
wash with chemical-name (amount-in-grams g)
Unlike extract, which implies the addition of a new chemical to the overall mass balance, wash with does not affect the overall mass balance for our kind of preliminary analysis, since the majority of the material used to wash a filtrate or an extract is then separated and either recycled or disposed of. Again, although the amount of material used in the wash is not usually reported in the literature, that information is typically recorded in the chemist’s notebook and can be entered without difficulty. Other keywords that can be used to describe additional details of the chemical process include:
addition temp temp-in-°C
addition time time-in-min
adjust pH to pH with chemical-name
cool
cool to temp-in-°C
dry
final temp temp-in-°C
heat
heat to temp-in-°C
note: any text describing something that does not correspond to a keyword, such as color
reaction time time-in-min
reference reference to laboratory
stir for time-in-min
yield ##% based on chemical-name
A useful feature of this syntax is that new keywords can be added to support new processes by simply updating the appropriate database tables. For example, pressure specifications could be included by adding syntax and database entries analogous to that used for temperature changes. In the current prototype, pressure changes are neglected. With these keywords a chemical reaction can be represented in sufficient detail to allow a feasible process topology to be generated and analyzed.
Florfenicol Case Study. Figure 4 shows the completed reaction entry with graphic for reaction 1990a. The original reaction text in journal format based on Schumacher et al.16 is shown in the inset for comparison. Table 1 shows a comparison of the original text in the journal describing the reaction with the chemical syntax used to input information into PDDSS for all the reactions in Schumacher et al.’s synthesis pathways. In general, PDDSS syntax very closely follows the language in the original description of the reaction. However, additional information is required in those steps that involve additional materials. For example, the user needs to include an estimate of the amount of material used in a wash step. Again, this information is typically available in the laboratory notebook, even though it is not usually reported in a formal paper.
Process Topology Development
Since the purpose of representing the process chemistry is to develop a process topology that can be critiqued, it is essential that it provide a ready link to
Figure 4. Example of the Reaction Information dialogue with a fully entered sample reaction description. The inset is the reaction description in a format similar to that presented in J. Org. Chem.
engineering process information. This link is achieved by querying an engineering knowledge database that is structured to map between function and device. We have previously described the structure and knowledge content of such a database.5,12-14 The database is organized using a formalism called Functional Representation (FR).5,12-14 (Extensive discussion of the functional representation formalism and the engineering knowledge database is outside the scope of this paper but can be found most completely in refs 5 and 12.) FR distinguishes four main data types: DEVICE (structure), MODE (categories of operation), FUNCTION (expected behavior), and CAUSAL PROCESS DESCRIPTION (CPD) (behavior). As shown in Figure 5, these data structures are linked to provide a breakdown of device behavior. The DEVICE data structure includes knowledge of the topological aspects, such as components, ports, and so forth. It describes what an entity is. Associated with each DEVICE is a set of MODES used to describe an overall state of the device such as valve-normal-operation or pump-seal-fail. The MODE data structure then associates functions expected when in a particular mode of operation. In general, MODE describes when a function can occur. A FUNCTION describes what the device does. A CAUSAL PROCESS DESCRIPTION (CPD) describes how that function is achieved. The CPD represents the internal behavior of a device or component through a set of state descriptions. Annotated-state-transitions (ASTs) provide additional detail on how a final state (state B) is attained from an initial state (state A). ASTs are links that assign responsibility for a transition to a physical law, to a function, or to a more detailed CPD. In general, CPDs can be viewed as digraph representations of process variable transitions. Thus, FR provides data types that answer the questions what is it (DEVICE), what does it do (FUNCTION), when does it do it (MODE), and how does it do it (CPD).
FR is a general-purpose structure that can be used for multiple applications such as diagnosis, hazard analysis, and design.5,12-14 For the purposes of PDDSS, we are interested only in the intended or normal functions of the device. The mode descriptions assist in identifying normal operations.
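The four FR data types and the links between them can be pictured with a minimal Python sketch. The class and field names are illustrative assumptions, not the schema of the authors' database. The pump device, its normal-mode name, its functions, and its state descriptions are hypothetical; only the pump-seal-fail mode name comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CPD:
    """How a function is achieved: states plus annotated state transitions."""
    states: List[str]
    # (initial state, final state) -> what is responsible for the transition
    transitions: Dict[Tuple[str, str], str] = field(default_factory=dict)


@dataclass
class Function:
    """What the device does."""
    name: str
    cpd: Optional[CPD] = None


@dataclass
class Mode:
    """When a function can occur."""
    name: str
    normal: bool
    functions: List[Function] = field(default_factory=list)


@dataclass
class Device:
    """What the entity is."""
    name: str
    ports: List[str] = field(default_factory=list)
    modes: List[Mode] = field(default_factory=list)

    def normal_functions(self) -> List[str]:
        # Only functions reachable through normal modes are of interest here.
        return [f.name for m in self.modes if m.normal for f in m.functions]


# Hypothetical example device (not taken from the authors' database).
transfer = Function(
    "material transfer",
    CPD(
        states=["liquid at inlet", "liquid at outlet"],
        transitions={("liquid at inlet", "liquid at outlet"): "function: pressurize liquid"},
    ),
)
pump = Device(
    "pump",
    ports=["inlet", "outlet"],
    modes=[
        Mode("pump-normal-operation", normal=True, functions=[transfer]),
        Mode("pump-seal-fail", normal=False, functions=[Function("leak to environment")]),
    ],
)
```

Calling `pump.normal_functions()` skips the fault mode, which mirrors the restriction to intended or normal functions.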
Table 1. Reaction Text Based on That Presented by Schumacher et al.16 in J. Org. Chem. and PDDSS Reaction Syntax for the 1990 Reaction Pathway

original reaction text (based on Schumacher et al.,16 1990) | PDDSS reaction syntax
Case Study 3 (3). A mixture of glycerol (10 mL), K2CO3 (0.43 g, 3.1 mmol), and Case Study 2 (5.0 g, 20.4 mmol) was heated to 115 °C. Benzonitrile (3.5 mL, 3.5 g, 33.9 mmol) was added, and the mixture was stirred for 18 h at 115 °C. The reaction was cooled and diluted with water. The product was collected by filtration, washed with cold water and methylene chloride, and dried under vacuum to give 6.4 g (HPLC purity 97%; yield 95%) of Case Study 3 as an off-white solid; mp 206-209 °C.
PDDSS reaction syntax:
name: reaction 1990a
charge reactor with glycerol (10 mL, 10 g)
add K2CO3 (0.43 g)
add Case Study 2 (5.0 g)
heat to 115 °C
add benzonitrile (3.5 g)
stir for 840 min
cool
add water (5 g)
filter
wash with water (2 g)
wash with MeCl2 (2 g)
dry
collect Case Study 3 (6.4 g)
yield 95% based on Case Study 2
Case Study 5 (5). To a suspension of Case Study 3 (30.0 g, 0.091 mol) in 300 mL of CH2Cl2 was added 21.4 mL (26.3 g, 0.118 mol) of Ishikawa reagent at room temperature under nitrogen. The reaction mixture was enclosed in a pressure reactor and heated at 100 °C for 2 h. After cooling to near 0 °C, the vessel was opened and HPLC analysis of the solution gave 28.7 g (0.086 mol, yield 95%) of Case Study 4. After being washed with water, this solution was used directly in the next step.
PDDSS reaction syntax:
name: reaction 1990b
charge reactor with MeCl2 (397.5 g)
add *Case Study 3 (30 g)
add Ishikawa reagent (26.3 g)
note: under nitrogen, then pressurize
heat to 100 °C
reaction time 120 min
cool to 0 °C
note: open pressure vessel
collect Case Study 4-i
note: -i stands for intermediate that is not purified
The CH2Cl2 solution of Case Study 4 was added over 0.5 h to 300 mL of 6 N HCl heated at near reflux, allowing the CH2Cl2 to distill from the reaction vessel. After removal of the CH2Cl2, the reaction mixture was heated at 100-105 °C for 12 h. The solution was cooled and extracted twice with dichloroethane. The aqueous layer was adjusted to pH 12, and the product was extracted in CH2Cl2. The CH2Cl2 solution was dried over MgSO4, filtered, and concentrated to a residual solid of 24.0 g (HPLC 85%, 0.0826 mol).
PDDSS reaction syntax:
name: reaction 1990c
charge reactor with 6 N HCl (360 g)
heat to reflux
add *Case Study 4-i
addition time 30 min
remove MeCl2 (395 g) by distillation
heat to 100 °C
reaction time 720 min
cool
wash with EtCl2 (75 g) 2×
adjust pH to 12 with 6 N NaOH solution
extract into MeCl2 (75 g)
filter
remove MeCl2 (75 g) by distillation
collect Case Study 5 (24.0 g)
Florfenicol (1). A solution of 11.1 g of Case Study 5, 6.12 mL (4.44 g, 43.9 mmol) of triethylamine, and 22.7 mL (31.4 g, 0.220 mol) of methyl dichloroacetate in 110 mL of dry methanol was stirred at room temperature for 18 h. The reaction mixture was concentrated to a low volume and precipitated by the addition of toluene and H2O. The product was collected by filtration, washed with H2O, and dried under vacuum to afford 16.8 g (HPLC purity 90%, 42.2 mmol, 96% yield) of crude 1. The precipitate was recrystallized from 2-propanol/H2O and dried under vacuum to yield 13.4 g (HPLC purity 98%, 36.7 mmol, 84% yield) of florfenicol (1) as a white solid: mp 152-154 °C.
PDDSS reaction syntax:
name: reaction 1990d
charge reactor with dry methanol (110 g)
add *Case Study 5 (11.1 g)
add triethylamine (4.44 g)
add methyl dichloroacetate (31.4 g)
reaction time 1080 min
remove solvent (105 g) by distillation
add toluene (10 g)
add water (10 g)
dry
recrystallize from 2-propanol/water (12 g)
filter
dry
collect florfenicol (16.8 g)
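To illustrate how regular the reaction syntax in Table 1 is, the following Python sketch splits a reaction description into keyword, argument, and mass. It assumes one step per line and is not the PDDSS parser itself; the function names, the regular expression, and the returned dictionary layout are hypothetical choices made for this example.

```python
import re

# Keywords taken from the syntax description and Table 1; longest first so that
# "cool to" is tried before "cool".
KEYWORDS = sorted(
    [
        "name:", "note:", "charge reactor with", "add", "heat to", "heat",
        "cool to", "cool", "stir for", "filter", "wash with", "dry", "collect",
        "yield", "reaction time", "addition time", "addition temp", "final temp",
        "remove", "extract into", "adjust pH to", "recrystallize from", "reference",
    ],
    key=len,
    reverse=True,
)

# Matches "(0.43 g)" as well as "(10 mL, 10 g)" and captures the mass in grams.
AMOUNT = re.compile(r"\((?:[^()]*,\s*)?([\d.]+)\s*g\)")


def parse_step(line):
    """Return the keyword, remaining argument, and mass in grams for one step."""
    line = line.strip()
    for keyword in KEYWORDS:
        if line == keyword or line.startswith(keyword + " "):
            rest = line[len(keyword):].strip()
            match = AMOUNT.search(rest)
            grams = float(match.group(1)) if match else None
            argument = AMOUNT.sub("", rest).strip() if match else rest
            return {"keyword": keyword, "argument": argument, "grams": grams}
    raise ValueError(f"unrecognized step: {line!r}")


def parse_reaction(text):
    return [parse_step(line) for line in text.splitlines() if line.strip()]


REACTION_1990A = """
name: reaction 1990a
charge reactor with glycerol (10 mL, 10 g)
add K2CO3 (0.43 g)
add Case Study 2 (5.0 g)
heat to 115 °C
add benzonitrile (3.5 g)
stir for 840 min
cool
add water (5 g)
filter
wash with water (2 g)
wash with MeCl2 (2 g)
dry
collect Case Study 3 (6.4 g)
yield 95% based on Case Study 2
"""

steps = parse_reaction(REACTION_1990A)
```

Each parsed step keeps the chemical name and mass separate, which is the information later needed for mass balances and for linking functions to chemical records.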
The DEVICE-MODE-FUNCTION-BEHAVIOR links within the database make it possible to associate equipment with functions (via modes) and functions with behavior. Because the functions, behaviors, modes of operation, and device descriptions are linked to one another (as opposed to being directly embedded in a single, complex data structure), the database supports access by function or behavior just as readily as access by device. When selecting a function, links can be followed to determine devices and behaviors associated with that function. Thus, the flexible linkages allow multiple points of access. This is important in the context of the present application, because we wish to determine equipment capable of achieving the functions and/or behaviors that were specified in the process chemistry.
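The idea of multiple points of access can be sketched by storing the links as separate records and indexing them in both directions. This is a hedged illustration only: the link records, mode names, and helper functions below are hypothetical and do not reproduce the contents of the engineering knowledge database.

```python
from collections import defaultdict

# Hypothetical (device, mode, function) link records.
LINKS = [
    ("reactor", "normal-mode-1", "material loading"),
    ("reactor", "normal-mode-1", "agitation"),
    ("reactor", "normal-mode-1", "heat transfer-heating"),
    ("distillation column", "normal-mode-1", "distillation"),
    ("distillation column", "normal-mode-1", "heat transfer-heating"),
    ("filtration system", "normal-mode-1", "filter"),
]

# Because links are stored outside the device records, the same data can be
# indexed by function or by device with equal ease.
devices_by_function = defaultdict(set)
functions_by_device = defaultdict(set)
for device, mode, function in LINKS:
    devices_by_function[function].add((device, mode))
    functions_by_device[device].add(function)


def devices_for(function):
    """Devices able to achieve a function, whatever the mode."""
    return sorted({device for device, _ in devices_by_function.get(function, ())})


def functions_for(device):
    """Functions associated with a device."""
    return sorted(functions_by_device.get(device, ()))
```

For example, `devices_for("heat transfer-heating")` returns both the distillation column and the reactor, while `functions_for("filtration system")` starts from the device side.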
Development of a process topology begins by determining a set of functions and behaviors required for the process. Function provides the most direct and natural link to process devices. This is because a function such as distillation or extraction is fundamentally equivalent in both the chemistry and the engineering domains. Behavior provides a useful link that allows consideration of alternative functions; however, for at least the initial evaluation, the topology should be based on the actual laboratory chemistry. The database is queried to determine one or more devices (i.e., equipment or unit operations) capable of achieving the required functions as shown in Figure 6. Table 2 shows the functions related to the keywords in the reaction syntax. Thus, a distillation column would
Figure 5. Schematic showing the linkages between the major components of FR.
Figure 6. Linking of function to potential devices retrieved from the engineering knowledge database.

Table 2. Linkages between Chemical Syntax Keywords and Functions in the Database

chemical keyword | function
add | material loading
charge reactor | material loading
collect | material collection
cool | heat transfer-cooling
dry | drying
filter | filter
heat to | heat transfer-heating
remove | material removal
transfer to reactor | material transfer consisting of material collection and material loading
wash with | filtrate washing
stir for | agitation
distill | distillation
extract | extraction
crystallize | crystallization
centrifuge | centrifugation
be selected on the basis of its ability to achieve certain types of separation functions. Depending on the number of equipment items returned, further refinement is possible; however, when developing an initial feasible process topology (based on chemistry laboratory information), it is useful to select the most generic representation of a device capable of achieving the required function. Instead of trying to determine the most appropriate detailed reactor type (for which the data required to select may not be available), a generic reactor can be
selected to encapsulate the common features of any chemical reactor. Coupled closely with this selection criterion is also a desire to select devices that can be used to satisfy multiple functions. In this way, we can minimize the equipment required for the process. Figure 7 abstractly shows multiple device options for a series of state transitions from A to E. In the figure, the state transition from A to B (top left) could be achieved by either of two functions (shown by following the links downward). Continuing downward, each function could be achieved by one of three devices (linked through different modes of operation). The state transition from B to C to D can be achieved using one function to go from B to C and one to go from C to D. Alternatively, the functions associated with those state transitions could be described as part of a single state transition going directly from B to D. Depending on which scenario is preferred, either one or two devices would be selected to go from B to D. Upon selecting a piece of equipment for the initial function, before selecting a piece of equipment for the next function, the database is queried to determine whether the existing equipment can achieve this new function in addition to the function which caused it to be initially selected. If the device already selected can achieve the new function, then that function is added to the list of functions specified for that piece of equipment to carry out. This process continues for each function in the ordered list that was developed from the laboratory specification. Thus, for a generic sequence of functions as given in Table 3, a generic reactor will handle the material loading functions and the material mixing function, as shown schematically in Figure 8. A potential difficulty arises when functions capable of being achieved by a common device are mutually exclusive. These cases can be handled by specifying different modes of operation for each of the functions. 
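The selection procedure described above (map keywords to functions, then reuse the current device whenever it can achieve the next function) can be sketched in Python. The keyword mapping follows Table 2. The device capability sets are assumptions made for this example, and the sketch does not handle mutually exclusive functions or modes of operation.

```python
# Keyword-to-function mapping following Table 2 (subset).
KEYWORD_TO_FUNCTION = {
    "charge reactor": "material loading",
    "add": "material loading",
    "heat to": "heat transfer-heating",
    "cool": "heat transfer-cooling",
    "stir for": "agitation",
    "filter": "filter",
    "wash with": "filtrate washing",
    "dry": "drying",
    "collect": "material collection",
}

# Hypothetical capability sets standing in for queries to the FR database.
DEVICE_CAPABILITIES = {
    "reactor": {"material loading", "agitation",
                "heat transfer-heating", "heat transfer-cooling"},
    "filtration system": {"filter", "filtrate washing"},
    "dryer": {"drying", "material collection"},
}


def assign_devices(functions):
    """Greedy assignment: keep the current device while it can do the next function."""
    topology = []  # ordered list of (device, [functions])
    for function in functions:
        if topology and function in DEVICE_CAPABILITIES[topology[-1][0]]:
            topology[-1][1].append(function)
            continue
        for device, capabilities in DEVICE_CAPABILITIES.items():
            if function in capabilities:
                topology.append((device, [function]))
                break
        else:
            raise LookupError(f"no device found for function {function!r}")
    return topology


# Keywords of reaction 1990a in order.
keywords_1990a = ["charge reactor", "add", "add", "heat to", "add", "stir for",
                  "cool", "add", "filter", "wash with", "wash with", "dry", "collect"]
functions_1990a = [KEYWORD_TO_FUNCTION[k] for k in keywords_1990a]
topology_1990a = assign_devices(functions_1990a)
```

With these assumed capability sets, the reactor absorbs every step up to the final water addition, the filtration system takes the filter and wash steps, and the dryer takes the last two functions.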
This takes into account the fact that identical equipment can be used for different purposes. Thus, one function could be achieved when the device is in normal-mode-1 and the other function could be achieved when the device is in normal-mode-2. The use of modes of operation helps to handle the complexity that can arise when
Figure 7. Schematic showing how a group of state transitions can be achieved by any one of a number of functions which in turn can be achieved by any one of a number of devices.
Figure 9. Potential functions associated with a selected device through its normal modes of operation.
Figure 8. How multiple functions are associated with a single reactor.

Table 3. Sample Reaction Description with Corresponding Ordered Function List

reaction description | ordered function list
charge reactor with A | material loading
add B | material loading
add C | material loading
stir for X min | material mixing
extract into D | extraction
crystallize | crystallization
filter | filter
wash with E | filtrate washing
wash with F | filtrate washing
dry | drying
collect G | material collection
multiple functions are associated with a single piece of equipment. By modularizing functions around modes of operation, only those functions associated with a particular mode need to be considered in an analysis. It is also possible that the modes associated with two incompatible functions would themselves preclude a device being used in both modes. In that case, two separate pieces of equipment are required.

At this point, an ordered list of devices (i.e., equipment or more generic unit operations) and connections between them has been established. In the course of developing this process topology, certain functions were associated with each device. While this provides a level of understanding of what each device is intended to achieve, it does not provide a description of the complete behavior of the device because each device typically has multiple associated functions that are necessary for its proper operation. Although these functions are ancillary to the function that caused the device to be initially selected, they often have a major effect on the overall process. Therefore, it is necessary to once again query the database to pull out the entire description of the device to obtain the knowledge structure shown schematically in Figure 9. This description will include the functions that caused the device to be selected in the first place (as well as the modes of operation through which those functions are linked to a particular device). These are called intended functions. In addition, other functions that are an integral part of the normal
operation of a device can be identified and linked with the device. These functions are called side effects. For example, a distillation column will have functions associated with the heating (reboiler) and cooling (condenser) that affect the operability of the device. In addition, most separation functions will implicitly cause waste to be generated. While never an intended function, these kinds of additional behaviors are required for proper operation. It is important to remember that, in developing a process topology based on laboratory chemistry, the devices are not fully detailed equipment specifications. Instead, they are instances of general equipment classes (e.g., reactor, distillation column, filtration unit). As such, they have abstract functions associated with them (e.g., a make-reaction function is used instead of a more specific make-sulfur-reaction function). Thus, for the purposes of PDDSS, we do not require the greater specificity that can be associated with these descriptions as described in the examples of Miller et al.12 In addition, since we are interested in analyzing the design as it is intended to operate, we do not make use of abnormal or fault modes when pulling the device descriptions from the database. Instead, only those modes that are considered normal are retrieved when building the device representations that will be used for process analysis. Just as the database allows for independent access of information for each data type, the topology representation follows a similar structure. In addition to the structural, functional, and behavioral information retrieved from the database, information about chemicals is linked to those functions that refer to them. Thus, for a material loading function, a link will be maintained to information about the chemical species that is being added to the system.
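The distinction between intended functions and side effects, and the restriction to normal modes, can be illustrated with a short Python sketch. The device record, the mode names, and the `describe_device` helper are hypothetical; the record loosely follows the distillation column example in the text and is not copied from the authors' database.

```python
# Hypothetical device record; the fault mode is included only to show it is skipped.
DISTILLATION_COLUMN = {
    "name": "distillation column",
    "modes": [
        {
            "name": "normal-mode-1",
            "normal": True,
            "functions": ["distillation", "heat transfer-heating",
                          "heat transfer-cooling", "waste generation"],
        },
        {
            "name": "reboiler-fouled",
            "normal": False,
            "functions": ["reduced heat transfer"],
        },
    ],
}


def describe_device(record, selected_for):
    """Split the normal-mode functions of a device into intended and side effects.

    selected_for: the functions that caused the device to be chosen.
    """
    intended, side_effects = [], []
    for mode in record["modes"]:
        if not mode["normal"]:
            continue  # abnormal or fault modes are not retrieved
        for function in mode["functions"]:
            target = intended if function in selected_for else side_effects
            if function not in target:
                target.append(function)
    return intended, side_effects


intended, side_effects = describe_device(DISTILLATION_COLUMN, {"distillation"})
```

Here the reboiler heating, condenser cooling, and waste generation appear as side effects of a column that was selected only for its distillation function.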
This information can be developed from the reaction description by querying a database of chemical information to get physical property, regulatory, and pricing information. The resulting structure allows for independent access of device, function, and chemical information. No matter which piece of information is the point of access, the linkages between information types allow easy migration between knowledge types. Thus, the equipment associated with a particular function or chemical species can be readily identified. In addition, through the function data type, the representation maintains a link back to the original chemical reaction description. Thus, an analysis of the process topology can point back to the elements in the original chemistry that gave rise to various aspects of the analysis. Florfenicol Case Study. To begin the analysis, the system reads the reaction descriptions that were previously entered. It identifies the functions that are involved with each reaction step and builds an ordered list of functions that will be used to query the FR database for suitable devices. Upon identifying functions that involve adding or removing chemicals, PDDSS queries a chemical information database that has been constructed for this prototype. If PDDSS cannot find a particular chemical, it first asks the user to provide the CAS number or another unique identifier that is used internally to index the information. When the CAS number is entered, the system first queries the chemical database to see if it already has information on that chemical listed under a name not yet known to the database. For example, isopropyl alcohol, 2-propanol,
Table 4. Original Text from the Reaction with Resulting Function and Link to Chemical Information if Applicable

original reaction text | ordered function list | chemical
charge reactor with glycerol (10 g) | material loading | glycerol
add K2CO3 (0.43 g) | material loading | K2CO3
add Case Study 2 (5.0 g) | material loading | Case Study 2
heat to 115 °C | heat transfer-heating | N/A
add benzonitrile (3.5 g) | material loading | benzonitrile
stir for 840 min | agitation | N/A
cool | heat transfer-cooling | N/A
add water (5 g) | material loading | water
filter | filter | N/A
wash with water (2 g) | filtrate washing | water
wash with MeCl2 (2 g) | filtrate washing | MeCl2
dry | drying | N/A
collect Case Study 3 (6.4 g) | material collection | Case Study 3
and IPA all refer to the same chemical species and may be used in a reaction description. Rather than requiring chemical data to be entered repeatedly for each synonym, the database that houses the actual chemical data is indexed by CAS number. CAS numbers are in turn indexed through a table that allows multiple synonyms to be used to access a single chemical record. If the system does not yet have any information about the chemical, the system prompts the user for basic physical property and price information. The chemical information database can be readily updated to reflect changes in prices or to provide additional data not available when initially entered. When PDDSS has finished parsing the reaction information, it has developed an ordered list of the functions involved with each chemical transformation. Associated with mass-transfer functions are the chemicals entering or leaving the system. The ordered list developed when parsing Schumacher’s florfenicol synthesis case study results in the function list shown in Table 4. This table shows the reaction syntax (left column) which gives rise to functions (center column). The right column shows the chemicals associated with mass-transfer functions. Using this ordered list of functions, PDDSS queries the engineering database to find high-level descriptions of the devices that can be used to achieve the functions that are required on the basis of the chemistry as entered into the system. A reactor is initially selected for its ability to handle the material loading function derived from the charge reactor with keywords. The database is then queried to determine whether this device can handle the next function. The reactor is also capable of handling the additional material loading functions as well as the agitation and heat transfer functions. When determining whether the reactor is capable of achieving the filter function, the query returns false. 
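The synonym handling described above for the chemical information database amounts to a two-table lookup: names map to CAS numbers, and CAS numbers map to a single record. The sketch below assumes that layout; the table contents, the `find_chemical` function, and the `UnknownChemical` exception are hypothetical, and the price field is left empty rather than invented.

```python
# Synonym table: many names point to one CAS number.
SYNONYMS = {
    "isopropyl alcohol": "67-63-0",
    "2-propanol": "67-63-0",
    "ipa": "67-63-0",
}

# Chemical records indexed by CAS number (illustrative fields only).
CHEMICALS = {
    "67-63-0": {"cas": "67-63-0", "name": "2-propanol", "price_per_kg": None},
}


class UnknownChemical(LookupError):
    """Raised where the real system would prompt the user for more information."""


def find_chemical(name, cas=None):
    key = name.strip().lower()
    if key in SYNONYMS:
        return CHEMICALS[SYNONYMS[key]]
    if cas is None:
        # Name not known: the user would be asked for a CAS number.
        raise UnknownChemical(f"no record for {name!r}; CAS number required")
    if cas not in CHEMICALS:
        # CAS number not known either: property and price data would be requested.
        raise UnknownChemical(f"no data for CAS {cas}; properties required")
    SYNONYMS[key] = cas  # remember the new synonym for next time
    return CHEMICALS[cas]
```

A call such as `find_chemical("isopropanol", cas="67-63-0")` registers the new name against the existing record, so the chemical data are entered only once.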
An additional query is then made to determine a new device that can perform the filter function. Table 5 shows the devices (left column) which are selected to perform the required functions (right column). Upon determining a set of devices, a new window appears with graphics representing the devices and text strings indicating material coming into the system (listed above the graphics) and material leaving the system (listed below the graphics), as shown in Figure 10. The first three devices (reactor, filtration system, and dryer) correspond to the functions derived from the experimental description of the first reaction as shown
Table 5. Devices for Reaction 1990a Shown with Their Corresponding Functions and Chemicals

device | ordered function list
reactor | material loading; material loading; material loading; heat transfer-heating; material loading; agitation; heat transfer-cooling; material loading
filtration system | filter; filtrate washing; filtrate washing
dryer | drying; material collection
Figure 11. Parallel knowledge structures showing the linkage between transformation and equipment through function. The highlighted functions heat transfer-cooling are the same function.
Figure 10. Partial display of the devices selected for the case study.
in the above tables. The second reactor corresponds to the second reaction. (A connection is shown here to indicate that the product of the dryer is the feed to the reactor.) The materials leaving the filtration system (i.e., the aqueous waste and MeCl2) were not explicitly identified in the reaction description or in the ordered function list. Instead, they result from additional functions that the filtration system performs as a part of its normal operation. As stated earlier, such functions are called side effects and are contrasted with intended functions, which drive the decision for choosing a particular device. In this case, the filtration system has a side-effect function that causes waste to be removed from the system. Figure 11 shows how the ordered list of functions developed from reaction 1990a corresponds to process equipment. This link between the chemistry representation and the engineering representation is crucial for reporting the results of the analysis back to the chemist. The left side of the figure shows a tree diagram that displays the linkages between knowledge types under the chemical transformations. In this diagram, two reactions are seen, reaction 1990a and reaction 1990b. Under each of these reaction data types are the functions associated with that reaction. The reaction named reaction 1990a has 12 functions associated with it: five material loading functions, two filtrate washing functions, and one each of heat transfer-cooling, heat transfer-heating, filter, drying, and material collection. Attached to each material loading and filtrate washing function is a link to the chemical involved in the addition or wash function. The right side
of the figure shows a tree diagram that displays the linkages between knowledge types by equipment. The first piece of equipment, reactor vessel, performs the first seven functions found in the transformation representation. Filtration system performs the next three, and dryer, the final two. The additional equipment items listed further down the tree correspond with other chemical transformations. The highlighted function heat transfer-cooling directly shows the linkage between the reaction-based representation (left side) and the equipment-based representation (right side). This linkage between domains (i.e., chemistry and engineering) allows various types of analysis to proceed in the most natural domain and then be reflected in the other. Thus, an engineering-based analysis can be performed using the structures in the engineering domain (right window). The results can then be linked (through the corresponding functions) back to the reactions that contribute to certain parts of the analysis. Thus, the chemist can directly see how the chemistry influences the equipment.

Once a process topology is specified, the resulting mass requirements for each chemical are determined. The numbers associated with each material description are updated to reflect the production basis being used for the remainder of the analysis. Waste streams are also identified and their masses calculated. This allows material and waste costs to be estimated. In addition, since we have determined the volume of material passing through each device, we can estimate the processing costs associated with the chemistry. This cost is currently estimated using a proprietary empirical correlation, but any appropriate model could be used.

Process Topology Critique

To critique a process topology, we employ independent critics. Critics are much like outside experts.
They are external to the data structures developed for the process topology; however, they have complete access to the information contained therein. While it is possible that a given critic can be dependent on the information from another critic, those applied to date have been independent of each other. Figure 12 schematically shows external critics interacting with the process topology
Figure 12. Schematic of the relationship between the process topology and various critics. Note that the critics are all external to the process topology itself.
knowledge representation, which consists of interconnected devices as well as their associated functions. Each critic is designed to evaluate a particular aspect of the design. These can range from material issues (i.e., regulatory and hazard information associated with the chemicals in the process) to equipment issues (i.e., whether a particular piece of equipment is expected to operate properly or whether special materials of construction must be employed) and to cost issues (i.e., processing cost, waste treatment/disposal cost, raw material cost, and overall cost). The critics share a similar approach. Analysis begins with a critic accessing information about the first device in the topology. It searches for relevant information in the device structure itself and in the functions that are linked to that device through its normal modes of operation. The critic then proceeds to the next device in the topology. Because the critics have access to all the information contained within the data types that make up the process topology, they can pull together information from equipment classes, chemical classes, and function classes to analyze a variety of potential issues. Once a critic has enough information, it attaches the results of its critique to the data structure most directly aligned with its purpose. For example, if the Regulatory critic identifies a chemical not registered under the Toxic Substances Control Act (TSCA), it will attach a comment to the function through which that chemical is added. This comment indicates that significant regulatory hurdles must be overcome in order to be able to use that chemical. The results from this critic can be viewed directly by examining the comments attached to a particular function, or they can be viewed as critic results aggregated by device. Additionally, and most importantly, critic results can be viewed as direct consequences of specific actions in the laboratory procedure.
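The common critic pattern (walk the devices, inspect their functions and linked chemicals, attach a comment where it belongs) can be sketched as follows. This is a minimal illustration under stated assumptions: the node classes, the `regulatory_critic` and `comments_by_device` functions, the placeholder chemicals, and the set of registered chemicals are all hypothetical and carry no real regulatory information.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class FunctionNode:
    name: str
    source_step: str                  # link back to the original reaction syntax
    chemical: Optional[str] = None    # chemical linked to this function, if any
    comments: List[str] = field(default_factory=list)


@dataclass
class DeviceNode:
    name: str
    functions: List[FunctionNode]


def regulatory_critic(topology: List[DeviceNode], registered: Set[str]) -> None:
    """External critic: reads the topology and attaches comments to functions."""
    for device in topology:
        for function in device.functions:
            chemical = function.chemical
            if chemical is not None and chemical not in registered:
                function.comments.append(
                    f"Regulatory: {chemical} is not registered; "
                    "significant regulatory hurdles expected"
                )


def comments_by_device(topology: List[DeviceNode]) -> Dict[str, List[Tuple[str, str]]]:
    """Aggregate critic results by device, keeping the originating lab step."""
    return {
        device.name: [(f.source_step, c) for f in device.functions for c in f.comments]
        for device in topology
    }


# Placeholder topology with hypothetical chemicals A and B.
topology = [
    DeviceNode("reactor", [
        FunctionNode("material loading", "charge reactor with chemical A (10 g)", "chemical A"),
        FunctionNode("material loading", "add chemical B (1 g)", "chemical B"),
        FunctionNode("agitation", "stir for 60 min"),
    ]),
    DeviceNode("dryer", [FunctionNode("drying", "dry")]),
]
regulatory_critic(topology, registered={"chemical A"})
report = comments_by_device(topology)
```

Because each comment sits on a function that remembers its source step, the same result can be read per function, per device, or per laboratory action.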
Because of the way the topology was generated, all data types have links back to the original reaction syntax used to input the experimental laboratory process. This is best illustrated in Figure 11, which shows functions derived from the laboratory reaction description on the left and their corresponding equipment-based manifestations on the right. The results of equipment-based and function-based critics are similarly associated with the other data types to facilitate access to the information from multiple points of view. While this is straightforward for critics that rely only on information contained directly within their own data
type, such as a Regulatory critic, other critics require information from the other data types. For example, an equipment-based critic, although it is concerned with the operation of a particular device, cannot effectively critique a device unless it has access to information about its functions and the chemicals involved. Thus, a distillation critic requires certain physical property information in order to determine the effectiveness of a particular distillation. Similarly, a waste critic requires information from all the functions of all the devices in order to determine where waste is generated. It then requires chemical information to determine the amount and type of waste. This knowledge structure allows critics to be independently developed on the basis of the type of critique they perform. Since access is equally easy for any of the available data types, the same overall structure can be employed for all the critics. Thus, an equipment-based critic will iterate through the equipment structures and use the links to functional and chemical information to perform its critique. The results of the criticism are stored in all the data structures associated with the equipment. Several different critics are available, including, but not limited to, Unit Operation, TSCA, Reactive Chemical Hazards, other Regulations, and Cost critics. To provide a more tangible example of the way in which the FR database structure allows for the development and application of critique agents, we will describe in detail the manner in which the distillation critic works (a unit operation critic). The purpose of the distillation critic is to determine whether distillation may be sufficiently difficult to warrant evaluation of other separation methods. This is important, since the process topology was originally based on laboratory chemistry. It is quite possible that a separation method works adequately in the laboratory but may be difficult from a scaled-up process perspective. 
The rules employed in the critic are based on those put forth by Barnicki and Fair.22 According to them, the appropriateness of distillation is governed by relative volatility. If the relative volatility between the desired chemical and the others in the mixture is >1.50, then simple distillation is the preferred separation method. For a relative volatility between 1.10 and 1.50, simple distillation could still be a competitive separation method, but it should be compared with other techniques such as extraction, adsorption, and crystallization. If the relative volatility is