The Automatic Discovery of Alarm Rules for the Validation of Microbiological Data

E. Lamma (a), M. Manservigi (b), P. Mello (c), A. Nanetti (d), F. Riguzzi (a), S. Storari (a)

(a) Dipartimento di Ingegneria, Università di Ferrara, Ferrara, Italy
(b) Dianoema S.p.A., Bologna, Italy
(c) D.E.I.S., Università di Bologna, Bologna, Italy
(d) Clinical, Specialist and Experimental Medicine Department, Microbiology section, Università di Bologna, Bologna, Italy

Abstract
In this work, we describe a project, jointly started by the University of Bologna and Dianoema S.p.A., to build a system able to validate microbiological data. Within the project we have experimented with data mining techniques in order to automatically discover association rules from microbiological data and to obtain from them alarm rules to be used for data validation. To this purpose, we have exploited the WEKA system and applied it to a database containing data about bacterial antibiograms. Discovered association rules are then transformed into alarm rules, to be used for data validation within an expert system named ESMIS. Among the automatically produced alarm rules, we have identified some already considered in ESMIS and suggested by experts according to the NCCLS compendium, and new rules which were not present in that report but were recommended by interviewed microbiologists.

Keywords: Data mining, Knowledge Based System, Microbiology, Microbiological Data validation

Introduction

Today, in a modern microbiological laboratory of a hospital, the process of analysis result production is similar to an assembly line where both efficiency and quality are fundamental. With respect to efficiency, in Italy a great number of hospitals manage microbiological analysis results by means of a software system named Italab C/S, developed by Dianoema S.p.A., an Italian information technology company operating in the Health Care market. Italab C/S is a Laboratory Information System based on a Client/Server architecture, which manages all the activities of the various analysis laboratories of the hospital. Italab C/S stores all the information concerning patients, the analysis requests and the analysis results. In particular, for microbiological analyses it stores:
• information about the patient: sex, age, hospital unit where the patient has been admitted;
• the kind of material (specimen) to be analysed (e.g., blood, urine, saliva, pus, etc.) and its origin (the body part where the specimen was collected);
• the date when the specimen was collected (often substituted with the analysis request date);
• for every different bacterium identified, its species and its antibiogram.

For each isolated bacterium, the antibiogram represents its resistance to a series of antibiotics. The set of antibiotics used to test bacterial resistance can be defined by the user, and the antibiogram is a vector of couples (antibiotic, resistance), where four types of resistance can be recorded: R when resistant, I when intermediate, S when susceptible, …

The antibiogram is not uniquely determined by the bacterium species: it can vary significantly among bacteria of the same species, because bacteria of the same species may have evolved differently and developed different resistances to antibiotics. However, groups of antibiotics very often give similar results when tested on a given bacterium species, regardless of the strain.

With respect to the quality of the results produced through microbiological analysis, an important step of the entire process is validation. Some instruments already execute intelligent controls on antibiotic test results, but these controls are limited because the instruments have no information about the specimen, the patient's characteristics or the infection history. A system capable of using all available information may represent a better support for laboratory personnel in the validation task. Such a system should also control the application of standard antibiotic testing guidelines: these guidelines, used by almost all microbiological laboratories, suggest antibiotic test execution methods and result interpretation. Examples of problems that this system should manage are: automatic correction of antibiotic results for particular species that present in vitro susceptibility but in vivo resistance, controls on the list of tested antibiotics, and predictions of test results for a group of antibiotics using a representative antibiotic (e.g., Tetracycline is representative of all Tetracyclines).
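As a minimal sketch of the data involved, the Python fragment below represents an antibiogram as the vector of (antibiotic, resistance) couples described above and performs one of the controls just listed, the check on the list of tested antibiotics. The `Antibiogram` class, the `check_antibiogram` function and the required-antibiotics panel are hypothetical illustrations, not part of Italab C/S or ESMIS. Only the R, I and S codes named in the text are accepted.

```python
from dataclasses import dataclass

# Result codes named in the text. The source mentions a fourth recorded
# type that is not named here, so it is not accepted (assumption).
VALID_RESULTS = {"R", "I", "S"}


@dataclass
class Antibiogram:
    species: str
    # Vector of (antibiotic, resistance) couples.
    results: list[tuple[str, str]]


def check_antibiogram(ab: Antibiogram, required: dict[str, set[str]]) -> list[str]:
    """Return the problems found in one antibiogram (hypothetical helper)."""
    problems = []
    tested = set()
    for antibiotic, result in ab.results:
        tested.add(antibiotic)
        if result not in VALID_RESULTS:
            problems.append(f"{antibiotic}: unknown result code {result!r}")
    # Control on the list of tested antibiotics: every antibiotic that the
    # (hypothetical) panel requires for this species must have been tested.
    for missing in sorted(required.get(ab.species, set()) - tested):
        problems.append(f"{missing}: required for {ab.species} but not tested")
    return problems


if __name__ == "__main__":
    # Illustrative panel only; it is not taken from the NCCLS tables.
    panel = {"Staphylococcus aureus": {"Oxacillin", "Penicillin", "Vancomycin"}}
    ab = Antibiogram("Staphylococcus aureus", [("Oxacillin", "R"), ("Penicillin", "X")])
    for problem in check_antibiogram(ab, panel):
        print(problem)
```

Run as a script, the example reports the unknown code for Penicillin and the untested Vancomycin. A real panel would come from the per-species guideline tables rather than from a hard-coded dictionary.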
In the validation task, one would like the system to check the results reported in antibiograms in order to detect inconsistencies and alarming situations (e.g., some results for given antibiotics should be in accordance with one another, or the result for an antibiotic is not the expected and usual one but an unexpected one).

To guide this task, NCCLS [1], an international standards organization recognised by almost all laboratories as a reference in routine work, publishes an annual compendium titled “Performance Standards for Antimicrobial Susceptibility Testing” [2], containing testing guidelines for microbiological laboratories. For each species, the NCCLS guidelines basically consist of a table that specifies the antibiotics to be tested, a table that specifies how to interpret the antibiotic test results, and a list of exceptions regarding particular antibiotic test results. Nonetheless, the validation task, when performed manually, can be long and difficult, and a laboratory management system helping microbiologists in this task would be very useful.

During the last few years, many surveillance systems have been developed in order to validate and monitor microbiological analysis results, and to identify infective and epidemiological events early.

Within a joint project between the University of Bologna and Dianoema S.p.A., we have implemented an expert system (named ESMIS [3]) for validating microbiological data and generating alarms for critical situations. ESMIS has been built following a knowledge-based approach. One of the main and well-known problems in building expert systems is knowledge acquisition, which is in general a very time-consuming and hard task. For building the ESMIS knowledge base, we were interested in extracting knowledge about anomalous situations of resistance to antibiotics by the isolated bacterium, in order to generate suitable alarm rules. This kind of knowledge can be extracted by hand, in accordance with NCCLS documents and through intensive colloquia with experts in microbiology (and this approach has been followed in building the first version of ESMIS).

Another approach could be to use the existing database, where a large number of antibiograms is stored, in order to automatically extract "rules" representing anomalous situations. This latter approach, described in this paper, is not antithetic but complementary to the former one, and can be very effective both in validating the ESMIS knowledge base and in extending it by "discovering" new rules not yet considered by official documents. Last but not least, these newly discovered rules, taking into account the history of the specific laboratory, are better tailored to the considered hospital, and this is very important since some resistances to antibiotics are specific to particular, local hospital environments. In this work, we report on the application of data mining techniques in ESMIS. In particular, we have experimented with these techniques in order to automatically discover association rules to be used for the validation of microbiological data and for the generation of alarms. Other details about the ESMIS architecture and knowledge base can be found in [3].

The paper is organised as follows. Section 2 describes the discovery of association rules by exploiting the APRIORI algorithm and the WEKA system. Section 3 shows how alarm rules are generated from the discovered association rules. Section 4 describes the experiments done. Related work is surveyed in Section 5. We conclude and mention future work in Section 6.

Discovery of Association Rules

Association rules describe correlations of events and can be regarded as probabilistic rules. "Correlation of events" means that events are frequently observed together. A good real-life example is databases of sales transactions, which are very frequently used by the marketing departments of many companies, because knowledge about sets of items frequently bought together is useful to develop successful marketing strategies.

The problem of discovering association rules can be formally stated as follows.

Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of literals, called items. A transaction $T$ is a set of items such that $T \subseteq I$. A database of transactions $D$ is a set of transactions and is organised as in Table 1.

Table 1: Schema of a database of transactions
Transaction ID | Items

Let an itemset $X$ be a set of items such that $X \subseteq I$. We say that a transaction $T$ contains an itemset $X$ if $X \subseteq T$.

An association rule is an implication of the form $X \Rightarrow Y$, where $X$ and $Y$ are itemsets and $X \cap Y = \emptyset$.
• The rule $X \Rightarrow Y$ holds with confidence $c$ in database $D$ if and only if $c\%$ of the transactions in $D$ that contain $X$ also contain $Y$.
• The rule $X \Rightarrow Y$ has support $s$ in transaction set $D$ if and only if $s\%$ of the transactions in $D$ contain $X \cup Y$.

Given a set of transactions $D$, the task of mining association rules can be reformulated as finding all association rules with at least a minimum support (called minsup) and a minimum confidence (called minconf), where minsup and minconf are user-specified values.

The task of discovering association rules can be decomposed into two subtasks:
1. Find all itemsets that have at least the minimum transaction support. The support for an itemset is the number of transactions that contain the itemset (equivalently, as in the rule definitions above, their percentage in $D$). Itemsets with minimum support are called large itemsets, all others are called small itemsets. This subtask is addressed by the algorithm APRIORI [4].
2. Generate all association rules with minimum support and confidence from the set of all large itemsets. This subtask can be addressed by a straightforward
algorithm:
- For each large itemset $l$, find all non-empty proper subsets of $l$.
- For each such subset $a$ of $l$, output the rule $a \Rightarrow (l - a)$ if and only if the ratio of support($l$) to support($a$) is at least minconf.

The APRIORI Algorithm

The APRIORI algorithm discovers large itemsets by means of multiple passes over the data:
• In the first pass, APRIORI counts the support of individual items and determines which of them are large.
• Each subsequent pass starts with a seed set, represented by the itemsets found to be large in the previous pass. From this set it generates new potentially large itemsets, called candidate itemsets. The actual support of these candidate itemsets is counted during a new pass over the data.
• At the end of the pass, we determine which of the candidate itemsets are actually large. These itemsets become the seed for the next pass. This process continues until no large itemsets are found.

For the sake of completeness, the algorithm is reported in …

Notation:
k-itemset: an itemset having $k$ items.
$L_k$: set of large $k$-itemsets (those with minimum support).
$C_k$: set of candidate $k$-itemsets (potentially large itemsets).
…
Ct = subset(Ck, t); // Candidates contained in t
…

The apriori-gen function takes $L_{k-1}$, the set of all large $(k-1)$-itemsets, as an argument, and returns a set of candidates for being large $k$-itemsets. It exploits the fact that expanding an itemset cannot increase its support, so a $k$-itemset can be large only if all of its $(k-1)$-subsets are large. Hence apriori-gen generates only candidates with this property, which can easily be achieved given the set $L_{k-1}$.

Learning Association Rules by WEKA

In order to learn association rules for validating microbiological data, we have exploited the WEKA system [5], a collection of machine learning algorithms for solving real-world data mining problems. WEKA is written in Java and runs on almost any platform; it is open source software issued under the GNU General Public License. WEKA contains algorithms for performing classification, numeric prediction, clustering and learning association rules.

As regards association rule learning, WEKA employs a version of the APRIORI algorithm that is able to learn association rules from a generic table (like Table 2) with $n$ attributes.

Table 2: Example of a table for knowledge extraction
Attribute 1 | Attribute 2 | … | Attribute n

In this case, an association rule is a rule of the form

$$A_1 = v_{A_1}, A_2 = v_{A_2}, \ldots, A_j = v_{A_j} \Rightarrow B_1 = v_{B_1}, B_2 = v_{B_2}, \ldots, B_k = v_{B_k}$$

where $A_1, A_2, \ldots, A_j, B_1, B_2, \ldots, B_k$ are attribute names and $v_{A_1}, v_{A_2}, \ldots, v_{A_j}, v_{B_1}, v_{B_2}, \ldots, v_{B_k}$ are values such that $v_{A_l}$ ($v_{B_h}$) belongs to the domain of the attribute $A_l$ ($B_h$).

In practice, each record is considered as a transaction and each possible Attribute=Value pair as an item. WEKA's version of the APRIORI algorithm works as if Table 2 were first transformed into a transaction database with the schema of Table 3:

Table 3: Example of a table from which WEKA extracts association rules
Transaction ID | …

and the standard version of the APRIORI algorithm were then applied.

The algorithm in WEKA takes into account two numbers: the number of records verifying the rule antecedent ($N_A$), and the number of records verifying both the antecedent and the consequent of the rule ($N_R$). Starting from these two values, confidence and support are assigned to the rule as the ratios $N_R/N_A$ and $N_R/N$ respectively, where $N$ is the total number of records considered. Rules are generated and presented by decreasing value of confidence.
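The steps described in this section (the level-wise APRIORI search with apriori-gen, the generation of rules $a \Rightarrow (l - a)$, and WEKA's reading of each record as a set of Attribute=Value items with confidence $N_R/N_A$ and support $N_R/N$) can be sketched in Python as follows. This is a minimal illustration written from the description above, not WEKA's Java implementation. The function names and the toy records are hypothetical. The join in `apriori_gen` unions pairs of large $(k-1)$-itemsets instead of using the sorted-prefix join of the original algorithm, which yields the same candidates once the pruning step is applied.

```python
from itertools import combinations


def to_transactions(records):
    """Read each record (attribute -> value) as a transaction whose items
    are the 'Attribute=Value' strings, as in the Table 2 -> Table 3 view."""
    return [
        frozenset(f"{attr}={value}" for attr, value in rec.items() if value is not None)
        for rec in records
    ]


def apriori_gen(prev_large, k):
    """Candidate k-itemsets from the large (k-1)-itemsets: join pairs whose
    union has k items, then prune any candidate with a small (k-1)-subset."""
    prev = list(prev_large)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) == k and all(
                frozenset(sub) in prev_large for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def apriori(transactions, minsup):
    """Return {itemset: count} for every large itemset, i.e. every itemset
    contained in at least minsup * N transactions."""
    min_count = minsup * len(transactions)
    counts = {}
    for t in transactions:  # first pass: support of individual items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {c: n for c, n in counts.items() if n >= min_count}
    result = dict(large)
    k = 2
    while large:  # one further pass over the data for each k
        candidates = apriori_gen(set(large), k)
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate contained in transaction t
                    counts[c] += 1
        large = {c: n for c, n in counts.items() if n >= min_count}
        result.update(large)
        k += 1
    return result


def generate_rules(large, n_records, minconf):
    """Emit a => (l - a) for every large itemset l and non-empty proper subset
    a, with confidence N_R / N_A and support N_R / N, sorted by decreasing
    confidence."""
    rules = []
    for itemset, n_r in large.items():
        for size in range(1, len(itemset)):
            for subset in combinations(itemset, size):
                a = frozenset(subset)
                n_a = large[a]  # subsets of a large itemset are large too
                confidence = n_r / n_a
                if confidence >= minconf:
                    rules.append((set(a), set(itemset - a), confidence, n_r / n_records))
    rules.sort(key=lambda rule: rule[2], reverse=True)
    return rules


if __name__ == "__main__":
    # Toy antibiograms invented for illustration only.
    records = [
        {"Oxacillin": "R", "Penicillin": "R"},
        {"Oxacillin": "R", "Penicillin": "R"},
        {"Oxacillin": "S", "Penicillin": "R"},
        {"Oxacillin": "S", "Penicillin": "S"},
    ]
    transactions = to_transactions(records)
    large = apriori(transactions, minsup=0.5)
    for rule in generate_rules(large, len(transactions), minconf=0.9):
        print(rule)
```

On these four toy records, with minsup 0.5 and minconf 0.9, the only rule produced is Oxacillin=R ⇒ Penicillin=R, with confidence 1.0 and support 0.5. The reverse rule Penicillin=R ⇒ Oxacillin=R reaches only 2/3 confidence. Because `generate_rules` reads $N_A$ directly from the counts collected by `apriori`, no extra pass over the data is needed for rule generation.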
Generation of Alarm Rules

Discovered association rules can be transformed into alarm rules, to be used for data validation, as follows.

We have first filtered the discovered rules, in order to keep only the most general ones among them. A rule $R_1$ is more general than a rule $R_2$ if they have the same consequent but the conditions in $R_1$'s antecedent are a proper subset of those in $R_2$'s antecedent. For instance, among the four rules below:
…
rule 1 is the most general, rule 4 is the most specific, and rules 2 and 3 are intermediate (and not comparable with each other).

To the selected most general rules, we have then applied syntactic transformations in order to produce alarm rules, to be used in ESMIS [3]. Alarm rules have been obtained by considering that an association rule of the kind

$$X \Rightarrow Y$$

represents a regular (and usually quite frequent) situation, whereas the rule

$$X, \mathrm{not}(Y) \Rightarrow \mathrm{alarm}(Y)$$

where the consequent is complemented and moved to the antecedent, represents an abnormal situation. When $X$ and not $Y$ occur simultaneously, an alarm has to be raised, because the usual value for $Y$ should be true instead of false.

In order to apply this kind of transformation when $Y$ is a singleton condition, we have considered the result for an antibiotic in an antibiogram as two-valued, where R is the complementary value of S and vice versa. For instance, the … (for the sake of simplicity, we omitted support and confidence in the reported alarm rules).

Otherwise, when $Y$ is a composed condition, e.g.:
537. Oxacillin=R ==> Amoxicillin+ClavulanicAcid=R, Penicillin=R
we just move its negation to the body of the alarm rule. In this case, for the sample rule n. 537 above, we obtain the alarm rule:
537'. Oxacillin=R, not([Amoxicillin+ClavulanicAcid=R, Penicillin=R]) ==> alarm([Amoxicillin+ClavulanicAcid=R, Penicillin=R])

Experimental Results

We have applied WEKA to an Italab C/S database containing data about bacterial antibiograms. We have considered all the bacteria belonging to the species Staphylococcus aureus and Escherichia coli, and four species belonging to the Enterobacteriaceae. All the data have been collected from the Clinical, Specialist and Experimental Medicine Department of the University of Bologna, in Bologna, Italy. We report on the experiments in the following subsections.

Staphylococcus aureus

The considered dataset for Staphylococcus aureus contains 7009 records, having as attributes 41 different antibiotics, plus the site of the considered sample, patient sex, the hospital department hosting the patient and information about the therapy for the patient.

First experiments have been done by running the system with decreasing values for minimal support and confidence. In particular, we first ran the system with minimal support equal to 0.5, 0.4, 0.3 and 0.2, and confidence equal to 0.9. These experiments produced neither known rules nor new rules confirmed by experts. Then, we chose to further diminish the requested minsup, and ran the system with minimum support equal to 0.1 and minconf equal to 0.9. With this experiment, among the produced alarm rules, we have identified some rules already suggested by the NCCLS report and already considered in the ESMIS knowledge base. In particular, we have discovered the rules which relate to each other the results of two classes of antibiotics, i.e., Oxacillin and Penicillin (when a bacterium is resistant to Oxacillin it must also be resistant to any kind of Penicillin), and the resistance results for Oxacillin and Penicillin with β-lactamase inhibition (when a bacterium is resistant to Oxacillin it must also be resistant to any Penicillin with β-lactamase inhibition). For instance, the following two instances of these general rules were found:
… ==> alarm([Amoxicillin+ClavulanicAcid=R, Penicillin=R])
… ==> alarm([Amoxicillin+ClavulanicAcid=R, …
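The two steps of the Generation of Alarm Rules section, keeping only the most general rules and rewriting $X \Rightarrow Y$ as an alarm rule, can be sketched in Python as follows. This is a minimal illustration, not the code used to feed ESMIS. The `Rule` class, the function names and the output format (modelled on rules 537' and 1539') are assumptions. Conditions are (attribute, value) pairs, and only R and S are complemented, following the two-valued reading described for singleton consequents.

```python
from dataclasses import dataclass

# Two-valued reading of an antibiotic result for singleton consequents:
# R and S complement each other (I has no complement in this sketch).
COMPLEMENT = {"R": "S", "S": "R"}


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # set of (attribute, value) conditions
    consequent: frozenset


def most_general(rules):
    """Keep a rule unless another rule with the same consequent has an
    antecedent that is a proper subset of its antecedent."""
    return [
        r for r in rules
        if not any(o.consequent == r.consequent and o.antecedent < r.antecedent
                   for o in rules)
    ]


def fmt(conditions):
    """Render conditions as 'Attr=Value, ...' in a stable (sorted) order."""
    return ", ".join(f"{attr}={value}" for attr, value in sorted(conditions))


def to_alarm(rule):
    """Rewrite X => Y as X, not(Y) ==> alarm(Y)."""
    body = fmt(rule.antecedent)
    if len(rule.consequent) == 1:
        # Singleton Y: negate it by complementing the result value.
        [(attr, value)] = list(rule.consequent)
        if value not in COMPLEMENT:
            raise ValueError(f"no complement for {attr}={value}")
        return f"{body}, {attr}={COMPLEMENT[value]} ==> alarm({attr}={value})"
    # Composed Y: move its negation into the body of the alarm rule.
    y = fmt(rule.consequent)
    return f"{body}, not([{y}]) ==> alarm([{y}])"


if __name__ == "__main__":
    rule_537 = Rule(
        frozenset({("Oxacillin", "R")}),
        frozenset({("Amoxicillin+ClavulanicAcid", "R"), ("Penicillin", "R")}),
    )
    print(to_alarm(rule_537))
```

Applied to rule 537, the sketch prints the body and head of 537' (without the rule number). A rule of the shape Vancomycin=S ⇒ Teicoplanin=S gives the text of 1539', "Vancomycin=S, Teicoplanin=R ==> alarm(Teicoplanin=S)". Filtering with `most_general` before calling `to_alarm` avoids emitting one alarm per specialisation of the same regularity.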
The discovery of this set of rules confirms part of the content of the NCCLS compendium and of the rules elicited from the experts and already considered in ESMIS. Furthermore, the experiment has also discovered new rules which were not present in the NCCLS report or in the ESMIS knowledge base, but have been validated and recommended by the interviewed microbiologists, in particular
1080'. Teicoplanin=S, Vancomycin=R ==> alarm(Vancomycin=S)
1539'. Vancomycin=S, Teicoplanin=R ==> alarm(Teicoplanin=S)
which relate to each other the results in an antibiogram of two (last-generation) antibiotics, i.e., Teicoplanin and Vancomycin.

Further experiments for Staphylococcus aureus have been done by filtering the data and removing from the dataset useless antibiograms, i.e., all those for which the bacterium was susceptible to every antibiotic in the antibiogram (apart from Penicillin, to which Staphylococcus aureus can be sometimes susceptible and sometimes resistant). This filtering was suggested by the interviewed microbiologists, and reduced the dataset to 3734 records. With this last experiment (done with decreasing minimum support, down to 0.1, and minimum confidence equal to 0.9) we have rediscovered the above-mentioned rules 537', 1080' and 1539', but at a higher minimum support, since noisy and useless data had been removed from the database.

Escherichia coli

The considered dataset for Escherichia coli contains 7165 records, having as attributes 25 different antibiotics, plus the site of the considered sample, patient sex and information on the hospital department hosting the patient.

The most significant experiments for this bacterium were done by deleting from the dataset useless antibiograms, i.e., all those for which the bacterium was susceptible to every antibiotic in the antibiogram. From the remaining data (3285 records), with a minimum support equal to 0.8 (and confidence equal to 1), a new couple of rules was discovered and confirmed by …

This couple of rules relates to each other the results of two antibiotics, i.e., Cefotaxime and Ceftazidime (when a bacterium is susceptible to Cefotaxime it must also be susceptible to Ceftazidime, and vice versa).

With lower support, but with confidence equal to 1, we have also discovered rules already considered in ESMIS in accordance with the NCCLS compendium, e.g., those relating, when the bacterium was isolated from the urinary tract, the resistance to Piperacillin with the resistance to …

Enterobacteriaceae

We have also done further experiments by considering four different bacteria belonging to the same family (Enterobacteriaceae). The considered dataset contains 3387 records, having as attributes the bacterium species, 28 different antibiotics, plus patient sex and information about the therapy for the patient.

Also for Enterobacteriaceae, the most significant experiments were done by deleting from the dataset useless antibiograms, i.e., all those for which the considered bacteria were susceptible to every antibiotic in the antibiogram. From the remaining data (2656 records), with support equal to 0.68 (and confidence equal to 1), we have rediscovered the couple of rules relating to each other the results of Cefotaxime and Ceftazidime (previously discovered for Escherichia coli). With lower support, but with confidence still equal to 1, we have also discovered rules already considered in ESMIS in accordance with the NCCLS compendium, e.g., those relating the resistance to Cefotaxime with the resistance to …

Related Work

During the last few years, many surveillance systems have been developed in order to monitor microbiological analysis results and to identify infection and epidemiological events early. Some of them also include data validation according to the NCCLS compendium. We survey the most significant among them.

WHONET 5 [6] is database software for the management of microbiology laboratory test results. The software was developed for the management of routine laboratory results but has also been used for research studies. Software development has focused on data analysis, particularly of the results of antimicrobial susceptibility testing.

GermWatcher [7] is an expert system which applies both local and international culture-based criteria for detecting potential nosocomial infections. Its knowledge base was
obtained by analysing some documents written by CDC's NNIS [8] (Center for Disease Control, National Nosocomial Infection Surveillance), which provide explicit culture-based and clinical-based definitions for the most …

TheraTrac 2 [9] is a system for microbiological data validation and real-time alarming. It directly interacts with Vitek, an expert system for test result validation that is integrated in particular analytical instruments.

All the systems mentioned above use international standard guidelines in order to define controls to be executed on …

Our data mining approach is deeply integrated with the expert system ESMIS [3], under development within a joint project between the University of Bologna and Dianoema S.p.A. ESMIS is able to validate microbiological data according to the NCCLS document. In particular, given a newly isolated bacterium, ESMIS performs five main tasks: (i) it validates the culture results; (ii) it identifies the most suitable antibiotics list; (iii) it issues alarms regarding the newly isolated bacterium; (iv) it issues alarms regarding the patient's clinical situation; and (v) it identifies epidemic events inside the hospital. Furthermore, ESMIS is also able to consider alarm rules discovered through the application of data mining techniques, once they are confirmed by microbiologists. In this respect, ESMIS is able both to consider standard validation rules as they are stated in the NCCLS documents and to extend itself, embracing new rules once they have been discovered from data that are peculiar to a given hospital (or region).

In the past, the University of Bologna and Dianoema S.p.A. designed and implemented an expert system for the validation of clinical analysis [10] named DNSEV (Expert System for clinical result Validation). DNSEV was developed in order to improve the quality of the validation process performed by a specific Laboratory Information System, which is an Italab C/S database. The improvement of the validation process has led to a decrease in the time required by medical doctors for validating clinical analysis data, permitting them to direct their energies toward other important tasks. In DNSEV, the medical laboratory expertise on the validation process is translated into rules that perform all the necessary checks on analysis results. The reasoning made by the system is documented in order to explain it to the medical team. The type of reasoning and the rules used are clearly shown and easy to change by a laboratory expert manager.

Previous work on the detection of data inconsistencies at the level of every patient record has been done by applying inductive learning to a database of atherosclerotic coronary heart disease patients [11]. In particular, confirmation rules for the detection of outliers are discovered in that work by exploiting inductive methods. The authors also consider the application of …

In [12], data mining techniques are applied to patient data from several hospitals over three years in order to discover associations, e.g., within diagnoses and medical treatment, with the purpose of enhancing medical quality …

Finally, as concerns the application of data mining techniques to microbiological data, two previous works have considered the analysis of microbiological data ([3] …). In [14], the system PTAH is presented, which was developed for the analysis of antibiogram data in order to help medical doctors in the prescription of antibiotics for the cure of nosocomial infections. PTAH performs four types of analysis:
…
• effectiveness of antibiotics over time
…
In [9], the demographic clustering algorithm included in Intelligent Miner [15,16] is applied in order to find interesting clusters of antibiograms.

We differ from these works because we consider the problem of discovering potential correlations among the tests of different antibiotics, to be used later on for result validation.

Conclusions

In this paper we have described the application of data mining techniques in order to automatically discover association rules from microbiological data, and to obtain from them alarm rules for data validation. This has been done within a project, supported by MURST, jointly started by the University of Bologna and Dianoema S.p.A. Among the automatically discovered alarm rules, we have identified some already considered in the knowledge base of the expert system ESMIS (to be used for monitoring microbiological data) and suggested by experts according to the NCCLS compendium. Furthermore, we have also discovered new rules which were not present in that report, but were recommended by interviewed microbiologists.

We are currently extending the ESMIS knowledge base by considering other bacterium species, by interviewing experts and by applying, in parallel, the WEKA system to a database containing data about various bacteria.

Acknowledgements

This work has been partially supported by Dianoema S.p.A. under MURST (Ministero dell'università e della ricerca scientifica e tecnologica) Project n. 23204/DSPAR/99. The authors would like to thank Giovanni Pizzi of Dianoema S.p.A. The authors are indebted to Massimo Perelli for his help in doing the experiments.
References

[1] NCCLS, National Committee for Clinical Laboratory Standards, …
[2] M. J. Ferraro et al., Performance Standards for Antimicrobial Susceptibility Testing; Eleventh Informational Supplement, NCCLS document M100-…
[3] E. Lamma, P. Mello, A. Nanetti, G. Poli, F. Riguzzi, S. Storari, An Expert System for Microbiological Data Validation and Surveillance, to appear in Proceedings of ISMDA 2001, Lecture Notes in Computer Science, …
[4] R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules, Proceedings of the 20th International Conference on Very Large Data Bases, …
[5] I. H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, October 1999.
[6] WHONET 5 - Microbiology Laboratory Database …, communicable disease surveillance and response, …
[7] M. G. Kahn, S. A. Steib, V. J. Fraser, W. C. Dunghan, An Expert System for Culture-Based Infection Control Surveillance, Washington University, 1992.
[8] Center for Disease Control National Nosocomial Infection Surveillance, CDC NNIS, www.cdc.org, 9 …
[9] TheraTrac, bioMérieux, web site: http://www.theratrac.com, 9 July 2001.
[10] M. Boari, E. Lamma, P. Mello, S. Storari, S. Monesi, An Expert System Approach for Clinical Analysis Result Validation, Proceedings of ICAI2000, Las Vegas, Nevada, CSREA Press, USA, 2000.
[11] D. Gamberger, N. Lavrac, G. Krstacic, T. Smuc, Inconsistency tests for patient records in a coronary heart disease database, Proceedings of IDAMAP2000.
[12] W. Stuhlinger, O. Hogl, H. Stoyan, M. Muller, Intelligent data mining for medical quality management, Proceedings of IDAMAP2000.
[13] E. Lamma, M. Manservigi, P. Mello, F. Riguzzi, R. Serra, S. Storari, A System for Monitoring Nosocomial Infections, Proceedings of …
[14] M. Bohanec, M. Rems, S. Slavec, B. Urh, PTAH: A system for supporting nosocomial infection therapy, in N. Lavrac, E. Keravnou, B. Zupan (eds), "Intelligent Data Analysis in Medicine and Pharmacology", Kluwer Academic Publishers, 1997.
[15] … http://www.software.ibm.com/data/iminer/fordata, 9 July 2001.
[16] Cabena, Hadjinian, Stadler, Verhees, Zanasi, Discovering Data Mining: From Concept to Implementation, Prentice Hall - IBM.

Address for correspondence: … Tel. ++39 051 2093818, Fax ++39 051 2093073