
Diagram Corpora

As part of our experiments, we are collecting sets of diagrams (diagram corpora) on which to test our marking algorithms. Each corpus contains diagrams produced by students in answer to an assessment question (either in an exam or a course assignment). The diagrams have been marked and moderated by expert human markers, and a grade has been awarded to each. We treat these grades as a ‘gold standard’ against which we judge our marking algorithms.
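To make the idea of judging an algorithm against the gold standard concrete, here is a minimal sketch. The two measures it computes (the proportion of exact matches and the mean absolute difference between marks) are our illustrative assumptions, not necessarily the measures used in the experiments. The function name and the sample marks are hypothetical.

```python
def agreement(human_marks, auto_marks):
    """Compare automatic marks with human (gold standard) marks.

    Returns (exact_match_rate, mean_absolute_difference).
    Both measures are illustrative assumptions, not the project's own.
    """
    if len(human_marks) != len(auto_marks) or not human_marks:
        raise ValueError("mark lists must be non-empty and of equal length")
    # Absolute difference between the two marks for each diagram
    diffs = [abs(h - a) for h, a in zip(human_marks, auto_marks)]
    exact_match_rate = sum(d == 0 for d in diffs) / len(diffs)
    mean_abs_diff = sum(diffs) / len(diffs)
    return exact_match_rate, mean_abs_diff


# Hypothetical marks for four diagrams (not taken from any corpus)
human = [7, 5, 9, 4]
auto = [7, 6, 9, 2]
exact, mad = agreement(human, auto)
print(exact, mad)
```

With these made-up marks, two of the four diagrams receive identical marks and the differences are 0, 1, 0 and 2, so the exact-match rate is 0.5 and the mean absolute difference is 0.75.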

Typically, we randomly divide each corpus into two sets: a development set and a test set. The development set is used to develop and debug the algorithms; we permit ourselves to look at the development diagrams to see where any discrepancies arise. The test set is treated as a black box and is used to measure the effectiveness of an algorithm. That is, we run the algorithm against the test set and derive measures of performance from the results.
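The random division described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the identifier format and the fixed seed are hypothetical, and the actual procedure used to divide the corpora is not documented here. The sizes in the example match those listed for corpus E04 below.

```python
import random


def split_corpus(diagram_ids, dev_size, seed=0):
    """Randomly partition diagram identifiers into development and test sets.

    A fixed seed (an assumption made here for repeatability) ensures the
    same split is produced each time, so the test set stays untouched.
    """
    if not 0 <= dev_size <= len(diagram_ids):
        raise ValueError("dev_size must be between 0 and the corpus size")
    shuffled = list(diagram_ids)           # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)  # private generator, reproducible
    return shuffled[:dev_size], shuffled[dev_size:]


# Hypothetical identifiers standing in for the 593 diagrams of corpus E04
ids = [f"E04-{n:03d}" for n in range(1, 594)]
dev, test = split_corpus(ids, dev_size=199)
print(len(dev), len(test))  # 199 394
```

Every diagram falls into exactly one of the two sets, so the development diagrams can be inspected freely without leaking information about the test set.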

We intend to make all our corpora available to other researchers via downloads from this site.

Each corpus is accompanied by the question which gave rise to the diagrams and one or more specimen solutions (there can be many solutions to a problem; some are partial solutions for which credit should be given). There is also a mark scheme that was used by the human markers and on which the automatic marker’s marking scheme is based. The following list describes the essential features of each corpus.

Corpus E04

Type of diagram: Entity-relationship (without subtype relationships)
Source: Examination
No of diagrams in corpus: 593
Size of development/test sets: 199/394


Corpus E06

Type of diagram: Entity-relationship (includes subtype relationships)
Source: Examination
No of diagrams in corpus: 169
Size of development/test sets: 72/97


Corpus A05

Type of diagram: Entity-relationship (includes subtype relationships)
Source: Assignment
No of diagrams in corpus: 584
Size of development/test sets: not yet defined


