Java Bytecode to Native Code Translation: The Caffeine Prototype and Preliminary Results
Cheng-Hsueh A. Hsieh John C. Gyllenhaal Wen-mei W. Hwu
Center for Reliable and High-Performance Computing
Copyright 1996 IEEE. Published in the Proceedings of the 29th Annual International Symposium on Microarchitecture, December 2-4, 1996, Paris, France. Personal use of this material is permitted. However, permission to reprint/republish this material for resale or redistribution purposes or for creating new collective works for resale or redistribution of servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P. O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966

Abstract

The Java bytecode language is emerging as a software distribution standard. With major vendors committed to porting the Java run-time environment to their platforms, programs in Java bytecode are expected to run without modification on multiple platforms. These first generation run-time environments rely on an interpreter to bridge the gap between the bytecode instructions and the native hardware. This interpreter approach is sufficient for specialized applications such as Internet browsers where application performance is often limited by network delays rather than processor speed. It is, however, not sufficient for executing general applications distributed in Java bytecode. This paper presents our initial prototyping experience with Caffeine, an optimizing translator from Java bytecode to native machine code. We discuss the major technical issues involved in stack to register mapping, run-time memory structure mapping, and exception handlers. Encouraging initial results based on our

1. Introduction

The software community has long desired a universal software distribution language. If such a language is widely supported across systems, software vendors can compile and validate their software products once in this distribution language, rather than repeating the process for multiple platforms. Software complexity is rapidly increasing and validation has become the deciding factor in software cost and time to market. Therefore, substantial economic motivation exists behind efforts to create such a software distribution language. The progress, however, has been very slow due to

On the legal side, many software vendors have been skeptical about the ability of the proposed software distribution languages to protect their intellectual property. In practice, such concern may have to be addressed empirically after a standard emerges. Although the protection of intellectual property in software distribution languages is an intriguing issue, it is not the topic addressed by this paper. For the purpose of our work, we expect Java to be accepted by a sufficient number of software vendors in the near future to

On the technical side, the performance of programs distributed in a universal software distribution language has been a major concern. The problem lies in the mismatch between the virtual machine assumed by the software distribution language and the native machine architecture. The task of bridging the gap is made more difficult by the lack of source code information in the distributed code in order to protect intellectual property. As a result, software interpreters have been the main execution vehicles in the proposed standards. The disadvantage of software interpreters is poor performance. This disadvantage has been partially compensated for by the fast advance of microprocessor speed. For applications such as Internet browser applets where overall performance is often more limited by network delays than processor speed, sacrificing processor performance in favor of reducing software cost has become acceptable. This is, however, not true for general applications.

This paper presents our initial prototyping experience with Caffeine, an optimizing Java bytecode to native machine code translator. Although our techniques are presented in the context of handling Java, they are applicable to other software distribution languages such as Visual Basic P-code. We are by no means arguing that Java is the ultimate software distribution language. Rather, we intend to develop a strong portfolio of techniques from our Java implementation efforts that will contribute to the creation and acceptance of whatever language becomes the final standard. The objective of this work is to run the translated code at nearly the full performance of native code directly generated from a source representation such as the C
Due to space limitations, we will limit our discussion to three critical issues involved in the translation process. The first issue is the mapping of the stack computation model of the bytecode Virtual Machine to the register computation model of modern processors. A performance enhancing algorithm that takes advantage of the register computation model is presented. This algorithm requires analysis to identify the precise stack pointer contents at every point of the program. In addition, most compilation infrastructures require that each virtual register contains just one type of data, and that virtual registers do not overlap. We present a live-range-based register-renaming algorithm that can resolve such inconsistencies in non-pathological cases. The second issue is mapping the bytecode memory organization to the native architecture. A more efficient memory organization than the one used by the Java interpreter is introduced. The third issue is how to translate the exception handling semantics of Java. We describe the preliminary method used by Caffeine and some of

A prototype of Caffeine has been developed based on the IMPACT compilation infrastructure [1]. The prototype is sufficiently stable to handle Java bytecode programs of substantial size. This paper presents some initial experiments comparing the real machine execution time of Java bytecode programs using the SUN Java interpreter 1.0.2, the Symantec Java Just-in-time (JIT) compiler 1.0, and the IMPACT Java to X86 native code translator 1.0 running under Windows 95. Also included in the comparison is the execution time of equivalent C programs directly compiled by the Microsoft Visual C/C++ compiler 4.0 into X86 native code. Preliminary results show that the optimizing translator is currently capable of achieving, on average, 68% of the speed of the directly

The remaining sections are organized as follows. Section 2 introduces different approaches to execute Java bytecode programs and an overview of our translation steps. Section 3 presents the stack computation model used by Java followed by a proposed stack to register mapping. An overview of the stack analyses required to perform and validate this mapping is introduced in Section 4. Section 5 discusses the run-time memory model adopted by the SUN Java interpreter and presents a more efficient organization. Complications due to exception handling are discussed in Section 6. Preliminary performance results are presented in Section 7. Section 8 provides some concluding remarks and directions of future

2. Background

We will not cover the Java bytecode Virtual Machine model in this paper due to space limitations. Interested readers are referred to the Java web site [2] and a large collection of Java literature [3-11]. We will instead introduce three competing and sometimes complementary approaches to execute Java bytecode programs: interpreters, just-in-time compilers, and optimizing native code translators. A preliminary performance comparison between these approaches and native code execution is presented in Section 7.

Interpreters are the most widely understood approach to execute Java bytecode programs. A software interpreter emulates the Java bytecode Virtual Machine by fetching, decoding, and executing bytecode instructions. In the process, it faithfully maintains the contents of the computation stack, local memory state, and structure memory. The Java interpreter from SUN Microsystems is available to the public [2].

Just-in-time compilers do on-the-fly code generation and cache the native code sequences to speed up the processing of the original bytecode sequences in the future. The current generations of just-in-time compilers do not save the native code sequences in external files for future invocations of the same program. Rather, they keep the native code sequence to speed up the handling of the corresponding bytecode sequence during the same invocation of the program. Thus, they take advantage of iterative execution patterns such as loops and recursion. At the time of this work, Borland [12] and Symantec [13] had both announced just-in-time compiler products, and the Symantec JIT compiler is used in this paper. Due to the code generation overhead that occurs during program execution, just-in-time compilers are still intrinsically slower than

Optimizing native code translators use compiler analysis to translate bytecode programs into native code programs off-line. This is the least understood approach among the three alternatives. Without extensive analysis and transformation capabilities, the native code generated may not be much better than that cached in the just-in-time compilers. Therefore, optimizing native code translators must perform extensive analysis and optimization in order to offer value beyond just-in-time compilers. Such analysis and transformations tend to make the translation process more expensive in time and space. In general, only those applications that will be repeatedly invoked or those applications whose execution time is much longer than the translation time should be translated. Thus, optimizing native code translators will not eliminate the need for interpreters and just-in-time compilers.

Figure 1 shows an overview of the steps in our prototype optimizing native code translator. The Java class files [4] required to execute the program are identified and decoded into sequences of bytecode operations, which are later used for construction of an internal representation (IR), called the Java IR, which is organized into functions and basic blocks. The construction of the Java IR is straightforward due to the absence of indirect jumps, indirect calls, self-modified code, embedded data, and branch target alignment “filler” code in bytecode. Due to the nature of the Java Virtual Machine specification [4] and the class file format, data recognition is also straightforward. Thus, the information recovered from Java bytecode ensures complete control flow graph

The IMPACT low-level intermediate code (Lcode) serves as a machine-independent IR for our prototype translator. Translation from the Java IR to an efficient Lcode IR requires extensive analyses, as discussed in Section 4. The stack computation model is mapped to a more efficient register computation model. Bytecode operations which do not have corresponding Lcode operations are translated into sequences of
Figure 1. Java bytecode to native code translation steps. (Labels recovered from the figure: Inlining, Data Dependence Analysis, Interclass Analysis, Classic Optimization.)
(Column headings recovered from Figure 2: Stack operations; Translated code.)
efficient object code distribution over the Internet. Besides the operand stack, the Java Virtual Machine also provides a
memory array, called the local variable array, for storage of
No stack analysis is required if the translated native code
maintains a run-time operand stack in memory and manipulates
it in the same way that the interpreter does. This
straightforward approach is able to handle any situation that the interpreter can handle. The run-time cost of this
straightforward approach, however, can be expensive due to the
Figure 2. Translated intermediate code example without stack to register mapping.
unnecessary memory traffic caused by inefficient register utilization. In Figure 2, the stack operations and the corresponding unoptimized translated intermediate code are
Lcode operations or into function calls to the emulation library.
presented side-by-side for an add operation to illustrate this
After the Lcode IR is constructed, it is optimized by the X86
approach. A load/store architecture is assumed in this example.
compilation path in the IMPACT compiler to generate assembly
Note that the original add operation pops two operands off the
code and then an executable which runs under Windows 95.
stack, adds them, and pushes the result back on.
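As a rough illustration of the straightforward approach, the following Python sketch emulates an add on a run-time operand stack kept in memory. The function name `add_on_memory_stack` and the use of a Python list as the stack are assumptions made for this example. It only counts the pops (loads) and pushes (stores) a load/store machine would perform, and it is not Caffeine's actual Lcode output.

```python
def add_on_memory_stack(stack):
    """Emulate an add the way the unoptimized translation would.

    `stack` is a Python list standing in for the in-memory operand stack
    (an assumption of this sketch). Returns the number of memory accesses.
    """
    memory_accesses = 0

    right = stack.pop()          # pop  -> load from the memory stack
    memory_accesses += 1
    left = stack.pop()           # pop  -> load from the memory stack
    memory_accesses += 1

    result = left + right        # the add itself works on registers

    stack.append(result)         # push -> store to the memory stack
    memory_accesses += 1
    return memory_accesses


operand_stack = [7, 5]
accesses = add_on_memory_stack(operand_stack)
```

Under these assumptions every add costs two loads and one store, which is the memory traffic that stack to register mapping is meant to remove.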
The Lcode IR construction phase is generic and will be
Optimizations can be performed on the translated code to
retargeted to other code generation paths supported in IMPACT
eliminate some of the loads (pops) and stores (pushes).
However, many will still exist due to the use of stack operations across basic blocks. Global removal of unnecessary
3. Stack to Virtual Register Mapping
loads and stores requires an analysis equivalent to that discussed in Section 4 and is not the focus of this paper.
3.1 Stack Computation Model
Java bytecode Virtual Machine uses a stack computation
3.2 Register Mapping: Location Register Mapping and Renaming
model to avoid making assumptions about the architectural register file size available to the interpreter [4]. Source
The performance of the translated code can be improved by
operands are fetched from the top of operand stack and the
mapping the run-time stack to the virtual register file. The
result is pushed back on. The instruction size in this model is
approach used by Caffeine is to assign each stack location a
small since the operands are implicitly defined and require no
unique virtual register number. Register allocation is later used
operand fields in the instruction encoding, which facilitates
Figure 3. Translated intermediate code example with stack to register mapping. (Column headings recovered from the figure: Translated operation; After copy propagation and dead code removal.)
Figure 5. Stack balance analysis example.
neighboring register r3. Because Java has no union and all
type conversions are made explicit, accesses to different types
should never alias. Bytecode generated from a valid Java compiler should always have the type state property [6] that guarantees neither type conflict nor aliasing problems should
Figure 4. Example of type or size mismatch
occur. An algorithm presented in Section 4.2 is used to
validate this assumption and to disambiguate virtual registers
which hold different types in this mapping scheme.
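The conflicts described for Figure 4 can be detected mechanically. The sketch below is a minimal Python illustration, assuming 4-byte stack words and a hypothetical list of `(slot, type, size_in_bytes)` pushes. It reports every stack slot, and hence every virtual register under this mapping, that would hold more than one type. An 8-byte value is counted as also occupying the next slot. The function name and data layout are assumptions of this example, not part of Caffeine.

```python
WORD_BYTES = 4  # assumed stack word size, matching the 4-byte float/int in the text


def find_slot_conflicts(pushes):
    """Return {slot: set_of_types} for slots that hold more than one type.

    `pushes` is a list of (slot, type_name, size_in_bytes) tuples.
    """
    types_per_slot = {}
    for slot, type_name, size in pushes:
        words = max(1, size // WORD_BYTES)
        # An 8-byte double pushed at slot s also occupies slot s + 1.
        for covered in range(slot, slot + words):
            types_per_slot.setdefault(covered, set()).add(type_name)
    return {s: t for s, t in types_per_slot.items() if len(t) > 1}


# The situation described for Figure 4: float and int in slot 3,
# and a double that starts in slot 2 and spills into slot 3.
figure4_pushes = [(3, "float", 4), (3, "int", 4), (2, "double", 8)]
conflicts = find_slot_conflicts(figure4_pushes)
```

Slots reported here are the ones that would need the disambiguation and renaming of Section 4.2 before the register mapping can be used.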
Second, parallelism may be lost for wide-issue machines
because different variables use the same stack location in the
to allocate the virtual registers to physical registers during the
original Java bytecode and get assigned to the same virtual
code generation phase. After this register mapping, a push to
register in this mapping scheme. This reuse of the virtual
the operand stack is translated to a move to the register
registers introduces artificial output and anti-dependencies.
assigned to the stack location pointed to by the current stack
The same algorithm used to disambiguate virtual registers
pointer, and a pop is translated to a move from the register
which hold different types can be applied to perform global
assigned to the stack location. This algorithm can only be
virtual register renaming to remove the artificial dependencies.
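The stack to register mapping can be sketched for straight-line code as follows. This is a minimal Python illustration under stated assumptions. The tuple encoding of bytecodes, the `STACK_BASE` numbering that keeps stack-slot registers apart from local-variable registers, and the function name are all hypothetical. Only `iload`, `istore`, and `iadd` are handled. The stack pointer is tracked at translation time, so every push becomes a move into the register for the current slot and every pop a move out of it.

```python
STACK_BASE = 100  # assumed offset so stack-slot registers do not collide with locals


def map_stack_to_registers(bytecodes):
    """Translate (opcode, argument) pairs into register operations.

    Local variable i maps to virtual register r<i>; stack slot s maps to
    r<STACK_BASE + s>. Returns (operations, final_stack_pointer).
    """
    sp = 0          # stack pointer, a known constant at each operation
    ops = []
    for opcode, arg in bytecodes:
        if opcode == "iload":            # push local -> move into slot register
            ops.append(("mov", f"r{STACK_BASE + sp}", f"r{arg}"))
            sp += 1
        elif opcode == "istore":         # pop into local -> move out of slot register
            sp -= 1
            ops.append(("mov", f"r{arg}", f"r{STACK_BASE + sp}"))
        elif opcode == "iadd":           # pop two, push one: result lands in lower slot
            sp -= 2
            ops.append(("add", f"r{STACK_BASE + sp}",
                        f"r{STACK_BASE + sp}", f"r{STACK_BASE + sp + 1}"))
            sp += 1
        else:
            raise ValueError(f"opcode not handled in this sketch: {opcode}")
    return ops, sp


# c = a + b with a, b, c in local variables 1, 2, 3
program = [("iload", 1), ("iload", 2), ("iadd", None), ("istore", 3)]
operations, final_sp = map_stack_to_registers(program)
```

The moves produced here are the ones that copy propagation and dead code removal would later try to eliminate. The sketch does not perform those optimizations.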
applied when a constant stack offset can be determined for every push and pop at translation time. Algorithms to determine when this transformation can be applied are
4. Stack analysis
discussed in Section 4.1. The local variable array can be mapped to virtual registers using the array indices. Figure 3
4.1 Stack balance analysis
shows the translated code using this approach for the same add operation as Figure 2. The moves of operands to virtual
For our register mapping scheme to function correctly, the
registers r1 and r2 before add will be forward copy propagated
position of the stack pointer must be a known constant for each
if possible. They are then removed as dead code if they are not
operation at translation time. Although bytecode generated
from valid Java compilers should satisfy this property [6], we
There are two issues associated with this approach that need
cannot assume all loaded bytecode came from valid sources.
to be resolved. First, variables with different types or different
A basic block may push more items on the stack than it
sizes may be pushed to the same stack location and thus
consumes, and vice versa. The residue of each basic block,
assigned to the same virtual register, causing some virtual
which is defined as the total number of pushes minus the total
registers to hold multiple types of operands or to alias with
number of pops in the block, is computed first. The control
adjacent registers. In Figure 4, a push of 4-byte float onto
flow graph is then traversed depth-first and each node is
stack location 3 is translated to “r3 ← float_value” in this
marked “visited” along the path from the first block. The stack
register mapping scheme. At another point in the program, a 4-
pointer position upon entering a block is equal to the
byte integer could be pushed to the same stack location and be
accumulated residue. If a marked basic block is revisited, the
translated to “r3 ← integer_value”. As a result, register r3
accumulated residue is checked against the stack pointer
holds two different types, which is not allowed in many
position in the revisited block. If they disagree, the stack to
compiler infrastructures. Another conflict arises if an 8-byte
register mapping cannot be applied to this control flow graph.
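The stack balance analysis of this section can be sketched in a few lines of Python. The graph encoding (a dict from block to successor list), the `residue` dict, and the function name are assumptions of this example. The sketch walks the control flow graph depth-first, records the stack pointer position on entry to each block, and gives up when a revisited block disagrees or when a leaf block does not leave the stack at zero. The leaf check is applied here to the position after the leaf's own residue, which is an interpretation made for this sketch.

```python
def stack_balance(cfg, residue, entry):
    """Return {block: entry stack position}, or None if mapping cannot be applied.

    cfg:     dict mapping each block to a list of successor blocks
    residue: dict mapping each block to (pushes - pops) inside the block
    entry:   the first block of the control flow graph
    """
    entry_position = {}
    worklist = [(entry, 0)]                 # LIFO worklist gives depth-first order
    while worklist:
        block, position = worklist.pop()
        if block in entry_position:         # block already marked "visited"
            if entry_position[block] != position:
                return None                 # paths disagree on the stack pointer
            continue
        entry_position[block] = position
        exit_position = position + residue[block]
        successors = cfg.get(block, [])
        if not successors and exit_position != 0:
            return None                     # leaf reached with unbalanced stack
        for successor in successors:
            worklist.append((successor, exit_position))
    return entry_position


# A diamond-shaped graph: block 1 pushes one item, block 4 pops it.
diamond = {1: [2, 3], 2: [4], 3: [4], 4: []}
balanced = stack_balance(diamond, {1: 1, 2: 0, 3: 0, 4: -1}, 1)
# Only one arm pushes, so block 4 is reached with two different positions.
unbalanced = stack_balance(diamond, {1: 0, 2: 1, 3: 0, 4: -1}, 1)
```

Each block is expanded once and each arc is examined once, which matches the linear running time stated for the algorithm. When the function returns None, a translator following the text would fall back to the stack model.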
double is pushed to stack location 2 and 3. The translated
In such cases, Caffeine reverts to the stack model. The
statement “r2 ← double_value” causes register r2 to alias with
accumulated residue is also checked against zero whenever a
Figure 6. Example of where stack based approach must be used.
Figure 8. Run-time memory organization used by Java interpreter. (Labels recovered from the figure: External Heap Memory, Shared Memory Reference.)
possible, using the technique presented below. The live ranges of each register can be characterized by their def-use chains.
Since the access of a double also takes the next contiguous memory word, the normal reaching-definition analysis [16] is slightly modified to take this effect into account. To be specific, the definition of r2 by an 8-byte double in Figure 4 also reaches the contiguous register r3.
For each register rx, the identified def-use chains are
grouped into non-overlapping live ranges. In Figure 7, operations op1, op3, and op5 define the register rx and operations op2, op4, and op6 use the same register rx. The def-use chains for register rx are 1→2, 1→4, 3→4, and 5→6 as shown. Each connected graph forms a non-overlapping live range of register rx. As a result, op1, op2, op3, and op4 form a
leaf block (a block with no successor) is reached to ensure that
live range (LR#1) while op5 and op6 form another (LR#2).
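Grouping def-use chains into live ranges amounts to finding connected components. The sketch below does this in Python with a small union-find structure, using the chains listed for Figure 7. Union-find is a choice made for this illustration, not necessarily the method used in Caffeine. The function names and the `rx_<n>` naming of renamed registers are also hypothetical.

```python
def group_live_ranges(def_use_chains):
    """Group operations connected by def-use chains into live ranges.

    def_use_chains: list of (defining_op, using_op) pairs for one register.
    Returns a list of sets of operations, ordered by smallest member.
    """
    parent = {}

    def find(op):
        parent.setdefault(op, op)
        while parent[op] != op:
            parent[op] = parent[parent[op]]   # path halving
            op = parent[op]
        return op

    for definition, use in def_use_chains:
        parent[find(definition)] = find(use)  # merge the two components

    ranges = {}
    for op in list(parent):
        ranges.setdefault(find(op), set()).add(op)
    return sorted(ranges.values(), key=min)


def rename_register(live_ranges, register="rx"):
    """Give the register a distinct name in each live range."""
    return {op: f"{register}_{index}"
            for index, ops in enumerate(live_ranges, start=1)
            for op in ops}


# Def-use chains for register rx as listed for Figure 7.
chains = [(1, 2), (1, 4), (3, 4), (5, 6)]
live_ranges = group_live_ranges(chains)
new_names = rename_register(live_ranges)
```

With the Figure 7 chains this yields two groups, corresponding to LR#1 and LR#2 in the text. A type-consistency check per live range would follow before the renaming is accepted, and it is not shown here.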
the stack is balanced in each control flow graph. This
Register rx in each live range is renamed to a different register
algorithm runs in linear time in the number of basic blocks and
id. If the type of register rx is not consistent inside a live range
control flow arcs. Figure 5 shows a control flow graph whose
after this renaming, the stack to register mapping cannot be
blocks are numbered in the order that they are visited by this
applied and the translation falls back to the stack computation
algorithm. In this example, we assume blocks 4 and 13 have a
residue of one, and blocks 9, 11, and 12 have a residue of minus one. All paths from block 1 to block 10 are stack balanced. The position of the stack pointer for each block is
5. Run-time memory organization
also a known constant. Specifically, the stack pointer at the entrance to blocks 5, 6, 7, 8, 9, 11, and 12 points to location
Java programs are name-binding rather than address-binding
one. For the rest of the blocks, the stack pointer initially points
and thus allow flexibility in the run-time memory organization
implemented by the interpreter. Dynamically allocated objects
An example of when register mapping cannot be currently
in the heap can be roughly categorized into class objects and
applied is shown in Figure 6. Depending on the path traversed,
array objects. Figure 8 illustrates the heap memory
the stack offset for the pop is either zero or one. For this case,
organization used by the SUN Java interpreter. In this
the stack based method needs to be used. However, since we
organization, neither a class object nor an array object points
are using a valid Java compiler, the register-based approach can
directly to its associated data. Rather, there is an 8-byte handle
in between. Accesses to both class instance data and array body require two levels of indirection. Accesses to the method block for method invocation need three levels of indirection. Since these access events take place frequently during program execution, such high levels of indirection can cause significant
4.2 Live range disambiguation and register renaming
The mapping of stack locations and local variables to
registers could have type and size conflicts as discussed in
The enhanced memory model proposed in this paper (in
Section 3.2. Variables of different types which reside in the
Figure 9) reduces the amount of indirection by combining the
same virtual register are separated into separate registers, when
class instance data block and the method table into one object block. The reference to object block now requires only one level of indirection. Since the class run-time type information in our implementation is of constant size, the method block can be accessed by a constant offset from a pointer to the class descriptor. The method_ptr in Figure 8 is thus eliminated, which reduces a method block reference to two indirection levels. The enhanced model also consumes less memory. Changes made to the run-time library, which is source licensed from SUN, to support this enhanced memory model are minimal due to the library’s heavy use of preprocessor macros

Figure 9. Run-time memory organization used by Java bytecode translator. (Labels recovered from the figure: External Heap Memory, Shared Memory Reference.)

6. Exception Handler Considerations

Exception handlers are sections of code that are reached when a run-time exception occurs. The try-block in Java is designed to enclose statements which may cause run-time exceptions. Exceptions which occur within a try-block are captured by an associated catch-block of the same exception type. A Java method can have many exception handlers cascaded together to guard ordinary code, or to guard other handlers. In Figure 10, block 14 is an exception handler that guards its try-block consisting of blocks 4 to 8. There are four issues that must be addressed during translation.

Figure 10. Example of exception handling.

First, after exception handling, control may be transferred back to the original program (e.g. in Figure 10, block 14 → 10). As a result, exception handlers need to be connected to the control flow graph as shown in Figure 10.

Second, an exception handler might use local variables defined before its associated try-block. In Figure 10, the definition of local variable entry LV[1] reaches the use in exception handler block 14. A pseudo arc, shown as a dotted line, and a null block preceding the try-block are created to allow live variable information to be passed to the handler. Otherwise incomplete flow analysis may lead to incorrect

Third, during optimization and scheduling, an instruction inside the try-block cannot be moved outside its try-block without enlarging the try-block in general. However, if the try-block has to be enlarged, to avoid changing the program behavior, the added instructions should not cause exceptions that can be captured by the try-block’s handler.

Fourth, for maximum portability, exception handling support in the Java interpreter does not rely on the underlying architecture or operating system. Thus, the interpreter explicitly checks for null references, array index bounds, divide by zero, etc. It is expensive and often unnecessary for the translated code to do all of these explicit checks. Caffeine currently explicitly checks array index bounds in the translated code. This checking costs about 10% of the performance across our benchmark programs. Optimization opportunities exist to conduct analysis to eliminate unnecessary explicit checks. Previous work has shown that program analysis can be done to determine if it is possible for a load or store to ever have an address of zero or to ever access outside of its intended array, etc., for the purpose of speculative code motion [20].

The benchmarks presented in Section 7 do not cause exceptions and thus do not exercise the exception handler capabilities of Java. Although Caffeine does not currently support many of these capabilities, we believe that the underlying hardware architectures can be used to support the remaining exception-handling capabilities without affecting

7. Benchmarks and Preliminary Results

A suite of six integer programs was selected to evaluate our prototype translator. There were no standard Java benchmarks generally available at the time of this work. For each program, we hand translated the C source code into equivalent Java source code. By equivalence we mean that the
algorithm, data structures, and operand types used in the Java code and the C code are the same. Due to the fundamental differences between C and Java with regard to the object-oriented concept, array accessing, array index bounds checking and library routines, an exact correspondence is not always feasible. When this occurred, we modified the C program so that it could be translated with close correspondence. The Java sources thus generated are then compiled into Java bytecode by

Figure 11 shows preliminary results that compare the real machine execution time of Java bytecode programs using the SUN Java interpreter, the Symantec Java Just-in-time compiler (JIT), and different configurations of the IMPACT Java to X86 native code translator Caffeine. All of the programs are executed on an Intel Pentium processor running Windows 95. Performance is shown in Figure 11 as a percentage of the benchmark performance for the equivalent C code compiled by the Microsoft Visual C/C++ compiler with optimization level two. The first Caffeine model (Stk.-Orig.Mem) uses the stack computation model and the interpreter’s memory model. The performance is, on average, 2.8 times higher than the JIT compiler. This is because of the optimizations that remove unnecessary pushes and pops, and because no initial code-generation is required. The second Caffeine model (Reg.-Orig.Mem) uses the register computation model instead of the stack model. This results in a 55% performance improvement over the stack model. The final Caffeine model (Reg.-Enh.Mem) also uses the proposed memory organization instead of the interpreter’s memory organization. This results in a 7% performance improvement. This final model of our prototype Java native code translator Caffeine is capable of generating code that runs on average at 68% of the speed of the equivalent C code, 4.7 times faster than the Symantec Java JIT compiler, and more than 20 times faster than the Java interpreter. For these preliminary results, the Caffeine translated code is optimized using classic C code optimization techniques without profiling and inlining.

Figure 11. Experiment results on different approaches. All numbers are relative speed to the equivalent C code compiled by Microsoft Visual C/C++ compiler with optimization level two. (Legend recovered from the figure: SUN (interpreter), Symantec (JIT), Caffeine (Stk-Orig.Mem), Caffeine (Reg-Orig.Mem), Caffeine (Reg-Enh.Mem). Axes: Benchmarks; Percentage of C performance.)

8. Conclusion and Future Work

In this paper, we presented our initial prototyping experience with Caffeine, a Java-bytecode-to-native-machine-code translator, to demonstrate the feasibility of efficient universal software distribution languages. The preliminary results show that it is capable of achieving 68% of the speed of the native code directly compiled from the equivalent C code. Besides the fact that it removes the interpretation overhead, much of the performance gain over the SUN Java interpreter comes from the stack to register mapping, which fully utilizes the register computation model of modern processors. The
requirements and algorithms for the stack to register mapping were presented and discussed. Although these requirements will hold for all Java bytecode generated by a valid Java compiler, the stack computation model is kept as a fall-back when these requirements are not met. The penalty for using the stack model is about a 35% performance degradation.

We also presented and compared two different run-time memory organizations. Preliminary results showed that a 7% performance gain can be achieved by moving the data associated with dynamically allocated objects closer to their

Several aspects of translating Java bytecode to native code that were not exercised by these benchmarks are now being investigated. These aspects include garbage collection, Java’s extensive exception handling capabilities, threading support, and the use of the Java graphic library.

In addition, substantial ongoing efforts are focusing on removing indirection overhead for method invocation. By doing interclass reaching-definition analysis, we should be able to trace the class type of a current object from its definition and convert, if possible, the indirect method invocation to an absolute method invocation. Inlining is also made possible by this conversion. Another direction for research is to perform better memory disambiguation by taking advantage of well-protected class boundaries to eliminate dereferencing overhead. We also observe that the array index bounds checking as required by Java semantics is a major source of performance degradation. We feel that with aggressive analysis, most of these checks can be removed. We would also like to target other platforms and make use of more advanced instruction-level parallelism enhancing techniques such as predication and

Acknowledgments

The authors would like to thank Daniel M. Lavery for proofreading the various versions of this paper, all the members of the IMPACT research group, and the anonymous reviewers whose comments and suggestions helped to improve the quality

This research has been supported by the National Science Foundation (NSF) under grant MIP-9308013, Intel Corporation, Advanced Micro Devices, Hewlett-Packard, SUN Microsystems, NCR, and the National Aeronautics and Space Administration (NASA) under Contract NASA NAG 1-613 in cooperation with the Illinois Computer Laboratory for Aerospace Systems and Software (ICLASS).

References

[1] P. P. Chang, S. A. Mahlke, W. Y. Chen, N. J. Warter, and W. W. Hwu, IMPACT: An architectural framework for multiple-instruction-issue processors, Proc. 18th Ann. Int’l Symp. Computer Architecture, (Toronto, Canada), pp. 266-275, Jun 1991.
[2] Java™ – Programming for the Internet, Sun Microsystems,
[3] James Gosling and Henry McGilton, The Java Language Environment, A White Paper, Sun Microsystems Computer
[4] The Java Virtual Machine Specification, Release 1.0 Beta DRAFT, Sun Microsystems Computer Corporation, August
[5] The Java Language Specification, Version 1.0 Beta DRAFT, Sun Microsystems Computer Corporation, October 30,
[6] James Gosling, Java intermediate Bytecodes, ACM SIGPLAN Workshop on Intermediate Representations,
[7] Arthur van Hoff, Sami Shaio, and Orca Starbuck, Hooked on Java, Addison-Wesley, December 1995.
[8] David Flanagan, Java in a Nutshell, O’Reilly & Associates,
[9] Gary Cornell and Cay S. Horstmann, Core Java, The Sunsoft Press Java Series, March 1996.
[10] Michael C. Daconta, Java for C/C++ Programmers, Wiley Computer Publishing, March 1996.
[11] Ken Arnold and James Gosling, The Java Programming Language, Addison Wesley, May 1996.
[12] Borland C++ Development Suite, Borland International,
[13] Café – Visual Java Development and Debugging Tools, Symantec Corporation, 1996, http://www.symantec.com/
[14] Tim Wilkinson, KAFFE – A JIT virtual machine to run http://web.soi.city.ac.uk/homes/tim/kaffe/kaffe.html
[15] Guava – High-performance Environment for Running Java Programs, Softway Pty. Ltd., 1996, http://www.softway.com.au/softway/products/guava/
[16] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, Addison Wesley, March, 1988.
[17] Jeffrey Richter, Advanced Windows - Chap.9 Thread Synchronization and Chap.14 Structured Exception Handling, Microsoft Press, 1995.
[18] Matt Pietrek, Windows 95 System Programming Secrets,
[19] Walter Oney, Extend Your Application with Dynamically Loaded VxDs Under Windows 95, MSJ, May 1995.
[20] Roger Alexander Bringmann, Enhancing Instruction Level Parallelism Through Compiler-Controlled Speculation, Ph.D. thesis, Department of Computer Science, University of Illinois, Urbana-Champaign, 1995.