Friday, November 26, 2010

Open Science: from good intentions to hesitant reality

At the start of the Artificial Culture project we made a commitment to an Open Science approach. Actually translating those good intentions into reality has proven much more difficult than I had expected. But now we've made a start, and interestingly the open science part of this research project is turning into a project within a project.

So what's the story? Well, firstly we didn't really know what we meant by open science. We were, at the start, motivated by two factors. First, a strong sense that open science is a Good Thing. Second, a rather more pragmatic idea that the project might be helped by a pool of citizen scientists who would help us interpret the results. We knew that we would generate a lot of data and also believed we would benefit from fresh eyes looking over that data: eyes not coloured - as ours are - by the weight of hypotheses and high expectations. We thought we could achieve this simply by putting the whole project, live - as it happens - on the web.

Sounds simple: put the whole project on the web. And now that I put it like this, hopelessly naive. Especially given that we had not budgeted for the work this entails. So, this became a DIY activity fitted into spare moments using free Web tools, in particular Google Sites.

We started experimental work, in earnest, in March 2010 - about two and a half years into the project (building the robots and experimental infrastructure took about two years). Then, by July 2010, I started to give some thought to uploading the experimental data to the project web pages. But it took me until late October to actually make it happen. Why? Well, it took a surprising amount of effort to figure out the best way of structuring and organising the experiments, and the data sets from those experiments, together with the structure of the web pages on which to present that data. But then even when I'd decided on these things I found myself curiously reluctant to actually upload the data sets. I'm still not sure why that was. It's not as if I was uploading anything important, like WikiLeaks posts. Perhaps it's because I'm worried that someone will look at the data and declare that it's all trivial, or obvious. Now this may sound ridiculous but posting the data felt a bit like baring the soul. But maybe not so ridiculous given the emotional and intellectual investment I have in this project.

But, having crossed that hurdle, we've made a start. There are more data sets to be loaded (the easy part), and a good deal more narrative to be added (which takes a deal of effort). The narrative is of course critical because without it the data sets are just meaningless numbers. To be useful at all we need to explain (starting at the lowest level of detail):
  1. what each of the data fields in each of the data files in each data set means;
  2. the purpose of each experimental run: number of robots, initial conditions, algorithms, etc;
  3. the overall context for the experiments, including the methodology and the hypotheses we are trying to test.
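To make the first two levels a little more concrete, here is a minimal sketch of how such a description might be recorded alongside a data file. Everything in it is an assumption made for illustration: the class names, the column names, the units and the run details are invented and are not taken from our actual data sets. The overall context (level 3) would still need to be written as ordinary prose.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """Meaning of one column in a data file (level 1 in the list above)."""
    name: str
    unit: str
    description: str


@dataclass
class RunDescription:
    """Purpose and set-up of one experimental run (level 2 in the list above)."""
    run_id: str
    purpose: str
    num_robots: int
    fields: list[FieldSpec] = field(default_factory=list)

    def undocumented_columns(self, csv_text: str) -> list[str]:
        """Return columns in the file's header row that have no FieldSpec."""
        header = next(csv.reader(io.StringIO(csv_text)))
        documented = {f.name for f in self.fields}
        return [col for col in header if col not in documented]

    def readme(self) -> str:
        """Render the description as plain text to publish beside the data."""
        lines = [f"Run {self.run_id}: {self.purpose} ({self.num_robots} robots)"]
        lines += [f"  {f.name} [{f.unit}]: {f.description}" for f in self.fields]
        return "\n".join(lines)


# Hypothetical example values, invented for illustration only.
example_run = RunDescription(
    run_id="example-001",
    purpose="illustrative run",
    num_robots=4,
    fields=[
        FieldSpec("time", "s", "seconds since the start of the run"),
        FieldSpec("robot_id", "-", "which robot the row refers to"),
    ],
)

# A made-up data file whose header contains one column nobody has explained.
example_csv = "time,robot_id,x\n0.0,1,0.25\n"
print(example_run.undocumented_columns(example_csv))
print(example_run.readme())
```

The point of the check is simply that a column with no explanation gets flagged before the file is uploaded, rather than being discovered later by a puzzled reader.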
I said at the start of this blog post that the open science has become a project within a project and happily this aspect is now receiving the attention it deserves: yesterday project co-investigator Frances Griffiths spent the day in the lab here in Bristol, supported by Ann Grand (whose doctoral project is on the subject of Open Science and Public Engagement).

Will anyone be interested in looking inside our data, and - better still - will we realise our citizen science aspirations? Who knows. Would I be disappointed if no-one ever looks at the data? No, actually not. The openness of open science is its own virtue. And we will publish our findings confident that if anyone wants to look at the data behind the claims or conclusions in our papers they can.

Postscript: See also Frances Griffiths' blog post Open Science and the Artificial Culture Project


  1. Hi Alan,

    Long time no see! Ann told me that your site is coming along.

    It occurs to me to wonder whether the difficulty in making the project open might have anything to do with applying old information paradigms to the problem?

    You and I are extremely ancient (i.e. over 25) and so we're still used to Web 1.0, in which any kind of information needs to be composed, formatted and presented like a book or magazine. But in Web 2.0 the medium is the message, in the sense that a lot of our thoughts are actually made flesh using Web technologies right from the start, or communicated using social media as a primary method. The new Web is a tool for thinking with as much as a presentation medium, so ideally the stuff is already online and doesn't need publishing.

    Facebook and Twitter, for instance, are conceptually very different from the older ideas of how we transmit our thoughts. A Twitter feed from a project can afford to have a very low level of content, because it doesn't make demands on either the reader or author. Just knowing that someone's finished writing a paper and is about to make coffee if anyone wants one might not sound like much but it actually adds a lot of information for those of us on the outside who are interested in the project. And if team members expect to communicate with each other and keep tabs on the collective through tweets too, then the rest of us can get a feel for what's happening at no cost to the tweeters.

    Meanwhile, if instead of developing talk slides on PowerPoint, you each do it on Prezi, then by the time you finish composing the slides, they're already published. All you have to do is tweet and blog a link, which ought to become as second-nature as clicking SAVE. The same is true if raw data are recorded in an online spreadsheet that can simply have its permissions changed when the author is happy with it.

    I'm not sure that polished, high-effort web pages are necessarily the right paradigm. It may be more a question of embracing WITHIN THE PROJECT TEAM ITSELF the new direct-to-the-web style of thinking that is emerging from social media and cloud-based tools. The more these are integrated into the day-to-day communications and records of the team, the easier, richer and more natural it will become as open science.

    So the trick might be to use a Wiki instead of a lab notebook, Google Calendar and Google Apps instead of Office, encourage people to tweet about their day, etc. The hard part is getting over the psychological fear you mentioned of letting people see unfinished, unpolished work. But that's something we're all getting used to these days. I don't think people measure organizations by polish any more but by friendliness and openness.

    Of course I SAY this, but I'm a one-man team, so I don't know how easy it would be in practice.

  2. Hi Steve

    Great to hear from you, and thank you for your thoughtful comments.

    You are of course absolutely right. By adopting a Web 1.0 approach to open science we're making life hard for ourselves. As you say the tools are all there for 'open notebook science' - if only we can be confident and relaxed enough to use them. I especially appreciate your point about how this kind of uncut 'as it happens' open science is much more interesting than the edited version on the project web pages.

    The one aspect on which I will slightly disagree with you is making experimental data available online. Here I think it's essential to have both structure and narrative explaining what the data is. But once that is in place we then ideally need the much more free-flowing Web 2.0 style discussion around those experiments and what they mean.

  3. Yes, that's a good point about raw results. I wouldn't understand them without some supporting info.

    This whole business of having one's daily life exposed on the Web, warts and all, is quite fascinating, I think. I'm pretty shy, not to mention English, so it took a little getting used to, but it's kind of like being on a nudist beach - once you're all in it together it's quite liberating! Hopefully your peers will avoid carping from the sidelines just because they're too scared to get their own kits off!

    Best of luck.

  4. I just added your feed to my favorites. I really enjoy reading your posts.

  5. Thank you - very much appreciated:)

  6. Hi Alan,
    I guess there are two main meanings of "open science".
    One is like open source in computer science, meaning that anyone has access to the code and can add, improve, change, etc. That is how I understand your idea that somebody will have a look at your data.
    The other meaning I see is open science in the sense of being open to the public about how science is actually done, which is very important in a democracy. What has to be done is to show how your consortium is working: what the questions, the pitfalls, the disagreements, the results, etc. are.
    But indeed, in both cases, it opens vast questions about how to do that properly!
    Science works as a black box for the moment: we keep everything secret until we publish some polished results. Opening the box may be important. Who dares to?

  7. I agree with the comments of José, and am certainly in favour of things such as publishing raw data from experiments. In the past such supporting information was often missing, which meant that you just had to take the researcher's word for it that a given result had been achieved, or that the results published were representative of the data obtained.

    Recent events, such as the Climategate scandal, have shown that people need to be able to have confidence not just in the end products of science (publications/journals) but also in the process by which those results were obtained. A more open style might also help to encourage debate around methodologies or assumptions.

    Looking to Web 3.0, it might be that a metadata format can be devised which describes experimental data in a more formalised way, making it amenable to indexing by search engines and some kinds of automated analysis.

  8. On Steve's comment about the liberating effects of social networking, this is probably true, but only up to a point. There is probably an optimal level of social interaction beyond which the effects of behavioural sink come into play. Each individual's optimum social interaction level will differ, depending upon the degree of introversion or extroversion.

  9. Sounds good, I like to read your blog, just added to my favorites ;).

  10. Thanks José and Bob for your great comments. I like your idea, Bob, of a standardised metadata format for formatting and publishing science data. At present each chunk of data needs an accompanying narrative to explain what each data field means, etc. That's not only a significant amount of work to write, but it's very hard (for me, the experimentalist) to anticipate what someone will need to know in order to later download the data, understand it, analyse it, and so on.