Pilots
Pilot evaluations with basic skills tutors and students are listed below.
Planned
- Pilot08, December 2005, further investigation of user preferences
and lexical choice.
Completed
- Final Evaluation, Sept. - Nov. 2005, final evaluation experiment using
SkillSum vs. 05.
192 new-entrant student participants (Sept) and 40 new entrants (Nov) from
Peterborough Regional College.
Pre-test questionnaire, literacy and numeracy tests, post-test questionnaires,
self-assessments of skills, reading rates, reading errors, comprehension, and preferences.
SkillSum reports improved students' knowledge of their own skills.
This experiment was unable to reproduce either the preference
results of Pilot07 or the readability results of Pilot05 (probably due to
substantial content improvements in both control and readability models).
- Pilot07, June 2005, 18 participants at
Peterborough Regional College, trial of
SkillSum vs. 05
and the experimental design of our final evaluation experiment. We found
statistically significant preference for SkillSum reports over baseline reports.
- Pilot06, March 2005, 14 participants at
South Lanarkshire
College, think-alouds and interviews on use of technical terms
(e.g. "grammar").
These indicated that illustrative examples are more generally useful
than paraphrases or the terms themselves.
- Pilot05, October 2004, 60 participants at
University of
Derby College Buxton,
SkillSum vs. 04
readability experiment showing a statistically significant increase in
reading rates using SkillSum readability model over the control model
(see our ENLG'05 publication)
and motivation interviews (see Nava Tintarev's Masters Thesis).
- Pilot04, September 2004, 10 participants at
Total People,
SkillSum vs. 03
small readability experiment that confirmed results of earlier GIRL
evaluations and motivation
interviews that revealed large
variations between individuals.
- Pilot03, June 2004, 8 participants at
South Lanarkshire
College,
SkillSum vs. 02
interactive report with buttons to
study browsing behaviour and interviews to determine
usefulness/accuracy of generated reports. We found that people generally
click on a vertical list of buttons in order from top to bottom, and that
the activities the NLG system selected to illustrate an individual's skills
were often inaccurate.
- Pilot02, May 2004, 5 participants at
Karten CTEC Centre,
SkillSum vs. 01
trial with shorter "screener" tests vs. longer Target Skills tests,
generation of shorter reports and interviews
to elicit opinions and suggestions for improvement. Generally, the
shorter tests worked well, but this pilot highlighted in particular
problems with what to do when students cannot answer any questions in
a test.
- Pilot01, April 2004, 8 participants at
Total People,
SkillSum vs. 00
trial with short student reports vs. long tutor reports and interviews
to elicit opinions and suggestions for improvement. Students generally
preferred the shorter reports and didn't want to read longer ones.
Tutors were enthusiastic about longer reports for tutors.