This short guide is intended to help Informatics researchers write scientific papers, whether for conferences, journals, dissertations or some other purpose. Although it is not itself a scientific paper, it is based on a hypothesis:
The key to successful paper writing is an explicit statement of both a scientific hypothesis and the evidence to support (or refute) it.
We will show how this key idea underpins the overall structure of the paper and determines the story it tells.
Informatics is an engineering science. Like other branches of both engineering and science it contributes to the advancement of knowledge by formulating hypotheses and evaluating them. It is not enough merely to describe some new technique or system; some claim about it must be first stated and then evaluated. This claim has the status of a scientific hypothesis; the evaluation provides the evidence that will support or refute it.
Of course, the whole story may be spread across several papers and several authors. For instance, the initial paper about a new idea may not contain all the evidence needed to support a hypothesis; further evidence may be provided in later papers.
In experimental research, hypotheses typically take one of the following two forms:
A paper may contain one or more hypotheses. However, it is a mistake to try to cover too many hypotheses in a single paper: it leads to confusion.
Explicit hypotheses are rarely stated in Informatics papers. This is a Bad Thing. It makes it hard for the reader to understand and assess the contribution of the paper. If the reader misidentifies the hypothesis then s/he is bound to find the evidence for it unconvincing. If the reader is also a referee or examiner s/he may reject the paper. Worse still, the absence of an explicit hypothesis may indicate that the author is unclear about the contribution of the paper. In the absence of a clear hypothesis it is impossible to know what evidence would support or refute it.
The symptoms of this malaise are commonplace: papers whose contribution is vague, ambiguous or absent; papers with a confused mixture of multiple, implicit hypotheses; evaluations that are inconclusive or non-existent; referee reports that appear harsh or inconsistent. One of the main purposes of this guide is to help to reverse this unhappy situation.
Theoretical papers are welcome exceptions: they usually contain both hypotheses and convincing evidence to support them. The hypotheses are the statements of theorems and the supporting evidence is their proofs. This may account for the relatively healthy state of theoretical research in Informatics compared with experimental research.
There is a default structure for writing an experimental Informatics paper, whether this be for a conference or journal or as a dissertation. When reporting experimental work, you should use this structure unless you have a good reason not to. Theoretical papers have a different default structure, which I hope to include at a later date.
The main parts of an experimental Informatics paper should be as follows. Each part could be a section of a paper or chapter of a dissertation. To reduce clutter I will refer to sections and papers below. Some parts may need to be spread over several sections/chapters, e.g., if there is a lot of material to be covered or it naturally falls into disjoint subparts. Some parts, especially adjacent parts, may be merged into a single section/chapter, e.g., where there is not much to be said or two or more topics are interlinked. Parts marked with * are optional, but you should think hard before deciding to omit them; if there is something that should be said, you should say it.
A thorough evaluation usually requires large-scale experimentation, with system X being applied to many examples of task Y. To aid the reader's understanding, the results of these experiments are best presented graphically. To verify the hypothesis, the results must usually be statistically processed. (Cohen's book "Empirical Methods for Artificial Intelligence", MIT Press, 1995, is a good guide to statistical methods for Informatics researchers. Toby Walsh has also collected some useful resources on empirical methods in Informatics.) It can aid the reader's understanding of the processing of system X to give one or two worked examples before the results are presented in detail. Further details of the results can be presented in an appendix.
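As an illustration of the kind of statistical processing meant here, the following is a minimal sketch in Python of a paired randomisation (sign-flip) test. It is one of several possible tests, not a prescribed method. It assumes that system X and some baseline have each been scored on the same examples of task Y, and it estimates how likely a mean difference at least as large as the observed one would be if the two systems were really interchangeable. The function name, the baseline and all the scores are hypothetical, invented purely for this illustration.

```python
import random
from statistics import mean


def paired_permutation_test(scores_x, scores_baseline, trials=10_000, seed=0):
    """Two-sided paired randomisation (sign-flip) test.

    Returns the observed mean difference (X minus baseline) and an
    estimated p-value for the null hypothesis that, on each example,
    X and the baseline are interchangeable.
    """
    if len(scores_x) != len(scores_baseline):
        raise ValueError("scores must be paired: one per example for each system")

    # Per-example differences between system X and the baseline.
    diffs = [x - b for x, b in zip(scores_x, scores_baseline)]
    observed = mean(diffs)

    rng = random.Random(seed)  # fixed seed so the analysis is repeatable
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the sign of each difference is arbitrary,
        # so flip each sign with probability one half.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed):
            extreme += 1

    # Add-one correction: the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (trials + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical per-example scores, made up for illustration only.
    system_x = [0.82, 0.75, 0.91, 0.68, 0.88, 0.79, 0.85, 0.73]
    baseline = [0.78, 0.74, 0.85, 0.70, 0.80, 0.77, 0.81, 0.69]
    difference, p = paired_permutation_test(system_x, baseline)
    print(f"mean difference = {difference:.3f}, estimated p-value = {p:.4f}")
```

A small p-value is evidence in support of a hypothesis of the form "X is better than the baseline on task Y"; a large one means that the experiment has not provided such evidence. The choice of test, the number of trials and the significance threshold are all decisions that should be stated alongside the results.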
I make no apology for the length of this discussion of evaluation. Evaluation is the most important part of the paper as it provides the evidence for the hypothesis. It is also one of the most neglected parts: in many papers it is absent altogether. If this guide succeeds in raising the profile of evaluation then half my battle will be won.
This guide has both a descriptive and a normative role: descriptive of best practice in the presentation of Informatics research and normative in highlighting the importance of explicit hypotheses in such presentations. In particular, I have tried to show how the conventional structure of papers describing experimental research should be used to emphasise these hypotheses and their supporting (or refuting) evidence.
I believe that the neglect of explicit hypotheses has caused methodological problems in Informatics. At worst, it leads to work that fails to advance the state of the field. Systems are built with no clear idea of the contribution they will make to the advancement of knowledge. Such systems are described without convincing evaluation, since it is unclear what the purpose of evaluation would be. Readers of the research may not explicitly notice the absence of hypotheses, but will feel some vague unease about the contribution of the research, sometimes summarised in the comment "so what?". It is tragic to witness such talent, energy and opportunity being wasted in this way. I hope this guide can make some small contribution to preventing such waste in the future.
This guide is under development. I would be grateful for any comments on ways to correct, improve or extend it. More material on methodology can be found in my AI Research Methodologies course.