Thursday, 26 February 2015

Just why is citation important anyway?


The four capital mistakes of open source by opensource.com, on Flickr



I recently had it hammered home to me just how important citations are in scientific research. This came about as the result of me reviewing a document*.

Me being me, the first thing I did was turn to the back to look at the bibliography**. It was a mess, but I can understand how citation strings get all mucked up. I remember when I was writing my PhD, I had to copy and paste, or even retype, all my citations into the files that were my thesis chapters (files - multiple, because Word couldn't cope with having all the chapters in the one file). Nowadays I have discovered the wonder that is Mendeley, and citations are so much easier to deal with - it even does data citations!

Then I read the document, and at one point I said to myself, "Self, this equation looks a bit funny to me. Oh look, here's the citation for the paper it comes from - let's look at the original source to make sure that there are no copying errors in the equation." So verily, I looked up the cited paper, and yay! It was open and accessible. But could I find the quoted equation in the cited paper? Er, no.

There was another moment, where one of my publications was cited as the source for a particular figure. I looked at the figure, and at my name in the caption next to it, and went and checked the cited document. Again, this figure was not contained in the cited publication.

These were the only examples of mis-citation that I caught, but I did find myself scrawling [citation needed] repeatedly in various places throughout the whole work. And every time I did so, my confidence in the research being presented waned a little bit more.

(Unfortunately, it goes without saying that none of the data presented in this work was cited properly either...)

Yes, all researchers stand on the shoulders of giants, and use work that has been published before to support their arguments. But it's important to not rely on unsupported statements of fact being "stuff everyone knows". Yes, the report might be written for a specialist audience who do indeed know all that, and know the citations you'd use to support the statement, but they're not your only audience. And providing citations demonstrates that you've done your due diligence, and can back up your assertions properly.

At the end of the day, when I read a paper or report, I can't check everything that the author(s) have done, so I have to take a certain amount on trust. This trust can be damaged seriously by some silly little things, like too many typos or unreadable graphs (curves all printed in similar shades of grey), and by some serious things, like mis-citations, or no citations at all.

So, citations. They're not just for helping reproducibility, or assigning credit - they also act as a marker that the author(s) know their background and pay attention to those tricky details that can easily catch you out in science. Honestly - citations are the easy part (even though they're annoying), but if you don't have the energy to care about them then how can your reader be sure you've applied enough care to the "more important" bits of your research?

____________
* I'm not going to give any names or details about the document, because that's not fair, and not the point of this post.

** Yes, I am a pedant!

My biography


Dr Sarah Callaghan is a senior scientific researcher and programme manager for the Centre for Environmental Data Analysis (CEDA), at STFC Rutherford Appleton, UK. CEDA also incorporates the British Atmospheric Data Centre, the NERC Earth Observation Data Centre, the IPCC Data Distribution Centre, and the UK Solar System Data Centre. CEDA also collaborates with the STFC Scientific Computing Department, to host the JASMIN super-data-computer.

She is Editor-in-Chief for the Data Science Journal and is a member of the CODATA Executive Committee. She is Communications Manager for the NERC Data Operations Group - working with members of the other NERC data centres, and is a member of the Belmont Forum e-Infrastructures and Data Management Collaborative Research Action.

She has experience of both creating and managing large datasets, and so understands well the frustrations that scientists can experience as a result of dealing with data!

Her publication list can be found here.

(last updated 29th Nov 2017)