PAMELA (Payload for Antimatter Matter Exploration and Light-nuclei Astrophysics) is a Russian-Italian satellite measuring the composition of cosmic rays. One of the motivations for the measurements is the indirect detection of dark matter: the very-weakly-interacting particles that make up about 25% of the matter and energy in the Universe (with, as I’m sure you all know by now, normal matter about 5% and the so-called Dark Energy the remaining 70%). By observing the decay products of the dark matter, with more decay occurring in the densest locations, we can probe the properties of the dark particles. So far, these decays haven’t been unequivocally observed. Recently, however, members of the PAMELA collaboration have been out giving talks, carefully labelled “preliminary”, showing the kind of excess cosmic ray flux that dark matter might be expected to produce.
But preliminary data is just that, and there’s a (usually) unwritten rule that the audience certainly shouldn’t rely on the numerical details in talks like these. Cirelli & Strumia have written a paper based on those numbers, “Minimal Dark Matter predictions and the PAMELA positron excess” (arXiv:0808.3867), arguing that the data fits their pet dark-matter model, so-called minimal dark matter (MDM). MDM adds just a single type of particle to those we know about, compared to the generally-favored supersymmetric (SUSY) dark matter model which doubles the number of particle types in the Universe (but has other motivations as well). What do the authors base their results on? As they say in a footnote, “the preliminary data points for positron and antiproton fluxes plotted in our figures have been extracted from a photo of the slides taken during the talk, and can thereby slightly differ from the data that the PAMELA collaboration will officially publish” (originally pointed out to me in the physics arXiv blog).
This makes me very uncomfortable. It would be one thing to write a paper saying that recent presentations from the PAMELA team have hinted at an excess — that’s public knowledge. But a photograph of the slides sounds more like amateur spycraft than legitimate scientific data-sharing.
Indeed, it’s to avoid such inadvertent data-sharing (which has happened in the CMB community in the past) that the Planck Satellite team has come up with its rather draconian communication policy (which is itself located in a password-protected site): essentially, the first rule of Planck is you do not talk about Planck. The second rule of Planck is you do not talk about Planck. And you don’t leave paper in the printer, or plots on your screen. Not always easy in our hot-house academic environments.
Update: Bergstrom, Bringmann, & Edsjo, “New Positron Spectral Features from Supersymmetric Dark Matter – a Way to Explain the PAMELA Data?” (arXiv:0808.3725) also refers to the unpublished data, but presents a blue swathe in a plot rather than individual points. This seems a slightly more legitimate way to discuss unpublished data. Or am I just quibbling?
Update 2: One of the authors of the MDM paper comments below. He makes one very important point, which I didn’t know about: “Before doing anything with those points we asked the spokesperson of the collaboration at the Conference, who agreed and said that there was no problem”. Essentially, I think that absolves them of any “wrongdoing”: if the owners of the data don’t have a problem with it, then we shouldn’t, either (although absent that I think the situation would still be dicey, despite the arguments below and elsewhere). And so now we should get onto the really interesting question: is this evidence for dark matter, and, if so, for this particular model? (An opportunity for Bayesian model comparison!?)
7 responses to “Stealing data?”
It’s a weird situation, since they have shown the plot in public in multiple locations, so at this point one can’t really claim that it’s private data to be kept within the collaboration. It seems strange to publicize a plot in that way but then not release it in any form that can be cited. Of course people will start trying to work with it as soon as it’s public.
Wait a minute: nobody is ‘stealing’ anything here, don’t you agree? If you call this ‘stealing’, then any citation in any paper is ‘stealing’. Any time someone quotes a formula from another author, one is ‘stealing’.
We are fully referencing the source of the data. We even stress clearly that data are preliminary and they are not the more precise ones that PAMELA will publish on paper soon.
So I think the title of your post is a bit too strong.
Having clarified this, one can ask oneself whether there is anything wrong in what we did. I understand that someone may disagree or feel uncomfortable, so I am open to criticism. However, I’d like to make a few points:
– Were the data presented in public? Yes, and it was not even the first time. So, while I do understand that one can feel a difference between the locutions (a) presented in public, (b) published, (c) printed on paper, (d) released on a pdf slide, etc., I personally find it difficult to draw any line of ‘legitimacy’ here.
– Before doing anything with those points we asked the spokesperson of the collaboration at the Conference, who agreed and said that there was no problem. More precisely, let me stress: he did not give anything directly to us! He just said that if we wanted to use our photo we were free to do it. That’s pretty obvious and reasonable, and this is why we specify in our paper that those data points are not the official ones.
– The paper we posted on the arXiv is just the proceedings for the conference. The predictions of the model were already published in a previous ordinary paper (published in Nuclear Physics B last Spring) and what we did is simply to make the check by superimposing the preliminary points on the old figure. Proceedings are there to tell what happens during conferences; this is what they are for.
I say all the above with no intent at all of provocation; it’s just to clarify the issue a bit. I am happy to hear comments and to change or shape my own opinion on this, if I am wrong.
But after all, I don’t think it is what we should focus on. The science results are quite interesting.
Well, it may not be stealing, but it sure as heck must be bad computational science. How can one possibly conclude that preliminary data points, photographed from a conference presentation, fit one model better than any other? Such conclusions must be supported by very careful analysis, taking into account each model’s precise fit together with the number of free parameters required to fit the data.
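To make the point about free parameters concrete, here is a minimal sketch of one common way to penalize them: the Bayesian information criterion (BIC), which for Gaussian errors is the chi-square plus a term that grows with the number of fitted parameters. It is only a rough approximation to a full Bayesian model comparison. Everything in it is an assumption for illustration: the numbers are made up and are not PAMELA data, and `model_a` and `model_b` are hypothetical predictions, not the MDM or SUSY curves.

```python
import math

def chi_square(data, errors, model):
    """Sum of squared, error-weighted residuals between data and a model."""
    return sum(((d - m) / e) ** 2 for d, e, m in zip(data, errors, model))

def bic(chi2, n_params, n_points):
    """BIC for Gaussian errors, up to a constant shared by all models.
    Lower is better; each extra free parameter costs ln(n_points)."""
    return chi2 + n_params * math.log(n_points)

# Made-up measurements and uncertainties (NOT real data).
data = [1.0, 2.1, 2.9, 4.2]
errors = [0.2, 0.2, 0.2, 0.2]

# Hypothetical model predictions at the same points.
model_a = [1.0, 2.0, 3.0, 4.0]  # assumed to have 1 free parameter
model_b = [1.0, 2.1, 2.9, 4.1]  # assumed to have 3 free parameters

chi2_a = chi_square(data, errors, model_a)
chi2_b = chi_square(data, errors, model_b)
bic_a = bic(chi2_a, n_params=1, n_points=len(data))
bic_b = bic(chi2_b, n_params=3, n_points=len(data))

print(f"model A: chi2 = {chi2_a:.2f}, BIC = {bic_a:.2f}")
print(f"model B: chi2 = {chi2_b:.2f}, BIC = {bic_b:.2f}")
```

In this toy case model B hugs the points more closely (smaller chi-square) but ends up with the larger BIC, because its two extra parameters cost more than the improvement in fit buys. With points read off a photograph, the uncertainties themselves would be uncertain, which is one more reason to treat any such comparison with caution.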
Indeed, there are some reports of this kind of data theft by a Mr. H. Bethe around 1940, using some experimental data reported at a conference, but not actually published, to advance some ideas about field theory. I cannot tell whether Mr. Bethe or his friends had memorized the numbers or whether he was allowed to take them down on paper.
But I’d suggest to conference organizers that from now on, besides the usual complimentary ink and paper, they should include a disposable camera in the conference bag.
Tony Readhead gave a talk at the Jan 2002 AAS meeting, and showed a slide of the CBI measurements of the CMB angular power spectrum. Max Tegmark took a picture of it and inserted the jpeg into the powerpoint for his talk later in the session. When nothing had been posted to the arxiv by late February, I asked Max for a copy of the jpeg and measured the data points off of it. You can do a very good job since you have the plot frame with tick marks, although it is easier to get data from postscript files. I used these points in a few talks I gave, but not any published papers. Finally in May 2002 the CBI papers hit the arxiv.
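For the curious, measuring points off an image comes down to a linear calibration per axis: note the pixel position of two labelled tick marks, then interpolate (in log space for a logarithmic axis). The following is a minimal sketch of that idea, not the procedure actually used for the CBI slide. The function name `make_axis_map` and all pixel positions and tick values are hypothetical.

```python
import math

def make_axis_map(pix_a, val_a, pix_b, val_b, log=False):
    """Return a function converting a pixel coordinate to a data value,
    calibrated from two tick marks (pixel position, labelled value)."""
    if log:
        # Interpolate in log10 space for logarithmic axes.
        val_a, val_b = math.log10(val_a), math.log10(val_b)
    slope = (val_b - val_a) / (pix_b - pix_a)

    def to_data(pix):
        value = val_a + slope * (pix - pix_a)
        return 10 ** value if log else value

    return to_data

# Hypothetical calibration, read off the image by hand.
# Log x axis: tick "1" at pixel column 100, tick "100" at column 500.
x_map = make_axis_map(100, 1.0, 500, 100.0, log=True)
# Linear y axis: tick "0" at pixel row 400, tick "0.35" at row 50
# (image rows increase downward, so the slope comes out negative).
y_map = make_axis_map(400, 0.0, 50, 0.35)

# Hypothetical pixel positions of plotted points.
points_px = [(300, 225), (400, 150)]
points = [(x_map(px), y_map(py)) for px, py in points_px]
print(points)
```

The accuracy is limited by how well you can locate the ticks and the centre of each plotted symbol, so error bars recovered this way deserve extra skepticism.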
I have also extracted the temperature map data from BOOMERanG from the jpeg figures on nature.com from their April 2000 paper. A comparison of this map to the map data I extracted from the DASI postscript figure shows a good correlation in the overlap region. These data have yet to be released, but you can get my extractions at http://www.astro.ucla.edu/~wright/BOOMdat.html
WMAP tries to contain preliminary data like Planck, and submits papers to the arxiv and journals, releases datasets, and issues press information simultaneously. No talks are given until the data release.
Of course Planck hasn’t even released their scan strategy, which is a bit absurd.
That model requires a boost factor anyway.